in Data
ClickHouse ReplicatedReplacingMergeTree Engine

You now have a large single-node cluster with a `ReplacingMergeTree` table that can deduplicate itself. The next step is to add replicated nodes, either to serve more data users or to improve availability.

in Data
ClickHouse ReplacingMergeTree Engine

My favorite ClickHouse table engine is `ReplacingMergeTree`. The main reason is that it behaves like `MergeTree` but also automatically deduplicates rows that share the same values in the `ORDER BY` columns.

in Data
ClickHouse MergeTree Engine

After starting the ClickHouse on Kubernetes series, you can now configure your first single-node ClickHouse server. Let's dive into creating your first table and understanding the basic concepts behind the ClickHouse engine, its data storage, and some cool features.

in Data
Monitoring ClickHouse on Kubernetes

Now that you have your first ClickHouse instance on Kubernetes and are starting to use it, monitoring and observing what happens on it is an important step toward stability.

in Data
ClickHouse SELECT Advances

Dynamic column selection (also known as a `COLUMNS` expression) allows you to match some columns in a result with a re2 regular expression.

in Data
ClickHouse on Kubernetes

In my experience migrating from Iceberg to ClickHouse and scaling from zero to a large cluster holding trillions of rows, ClickHouse has been both exciting and incredibly challenging. I have had to deal with many use cases and resolve many issues. I have been trying to take notes every day for myself, although it takes time to publish them as a series of blog posts. I hope to do so in this ClickHouse on Kubernetes series.

in Story
2023 - A Year of Moving

So, I hadn't really planned on writing a summary post for this year, as lazy as I am, but somehow, here we are.

in Rust 🦀
Apache OpenDAL in Rust to Access Any Kind of Data Services

OpenDAL is a data access layer that lets users retrieve data easily and efficiently, in a unified way, from various storage services such as S3, FTP, FS, Google Drive, and HDFS. Its core has been rewritten in Rust, and it has bindings for many languages such as Python, Node.js, and C.

in Productivity
My Neovim Setup in 2023

It's been years since I first started using Neovim, and I've been updating my setup regularly ever since.

in Data
DuckDB

In this post, I want to explore the features and capabilities of DuckDB, an open-source, in-process SQL OLAP database management system written in C++11 that has been gaining popularity recently. According to what people have said, DuckDB is designed to be easy to use and flexible, allowing you to run complex queries on relational datasets using either local, file-based DuckDB instances or the cloud service MotherDuck.
