Heads up: posts on this site are drafted by Claude and fact-checked by Codex. Both can still get things wrong — read with care and verify anything load-bearing before relying on it.

Why columnar storage won for analytics

Row stores read every column to answer a one-column question. Columnar stores refuse — and that refusal is what makes Parquet, ClickHouse, DuckDB, and every modern data warehouse fast.

Data · intermediate · Apr 29, 2026

Why it exists

Imagine a table of a billion clickstream events with 40 columns: timestamp, user_id, country, browser, url, referrer, latency_ms, and 33 others. You want one number out of it — median latency last week. That answer lives in two columns: timestamp and latency_ms.

A traditional row-oriented database stores each row contiguously: all 40 fields of event 1, then all 40 fields of event 2, and so on. To compute the median latency, the engine has to read past 38 columns it doesn’t care about for every single row, because rows are the unit on disk. You pay for the data you didn’t ask for, and on a billion rows that bill is the whole query.

Columnar storage is what you get when you stop accepting that bill. Rotate the layout 90°: store all the timestamps together, then all the latencies, then all the user_ids. The query reads exactly the two columns it needs. The other 38 never leave disk.

This isn’t a new idea — the C-Store paper (Stonebraker et al., 2005) and MonetDB before it laid the academic groundwork, and the design appears in proprietary systems earlier than that. What changed is that the format went mainstream: Apache Parquet, Apache ORC, and Apache Arrow turned columnar into the default shape of analytical data, and engines like ClickHouse, DuckDB, BigQuery, Snowflake, and Redshift all bet the farm on it.

Why it matters now

If you work anywhere near data, columnar is already under your feet:

- Every Parquet or ORC file in your data lake.
- The storage engines under BigQuery, Snowflake, and Redshift.
- DuckDB and ClickHouse, columnar from disk to execution.
- Apache Arrow, the columnar in-memory format those tools use to hand tables to each other.

You don’t choose columnar by typing it. You choose it by picking a tool whose authors already did.

The short answer

columnar storage = transpose the table on disk + compress each column + skip the columns you don't need

Three wins stack. Layout: each column is a contiguous run, so a one-column query reads one column's worth of bytes. Compression: values in a column are the same type and often similar, so they compress 5–20x where mixed rows would barely compress at all. Skipping: with per-block min/max statistics, the engine can prove an entire chunk of a column is irrelevant and never decompress it.

Row stores aren’t wrong; they’re just answering a different question — give me this whole record — that columnar stores are bad at.

How it works

Three things make it work, and they reinforce each other.

1. The transpose

A row layout for a tiny table:

row 1: 2026-04-29 | alice | US | 142
row 2: 2026-04-29 | bob   | DE | 88
row 3: 2026-04-30 | carol | US | 201

The same data, columnar:

timestamp:  2026-04-29 | 2026-04-29 | 2026-04-30
user:       alice      | bob        | carol
country:    US         | DE         | US
latency_ms: 142        | 88         | 201

Now SELECT avg(latency_ms) FROM events reads exactly one column. The bytes you don’t read are the bytes you don’t pay for — in disk I/O, in memory bandwidth, and (on a cloud warehouse) in literal dollars.
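A minimal sketch of that projection, assuming the pyarrow library and a hypothetical events.parquet holding the table above:

import pyarrow.compute as pc
import pyarrow.parquet as pq

# Column projection: only the latency_ms bytes are read and decoded.
# The file name is a stand-in, not from the original post.
table = pq.read_table("events.parquet", columns=["latency_ms"])
print(pc.mean(table["latency_ms"]))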

The catch: INSERT of one row touches every column. That’s why columnar formats are usually written in batches (Parquet groups rows into “row groups” sized in bytes — the spec recommends roughly 512 MB to 1 GB; concrete row counts depend on the writer and the schema) and why most columnar systems are bad at single-row updates. Updates are typically expressed as appends plus a periodic rewrite, which is the same bargain LSM trees make for a different reason.
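A sketch of that batch-write pattern with pyarrow's ParquetWriter, using a toy two-column schema and toy chunk sizes (real writers accumulate far more rows before flushing):

import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema([("user_id", pa.int64()), ("latency_ms", pa.int64())])

# Accumulate rows and flush in chunks; each write_table call appends
# one or more row groups rather than touching every column per row.
with pq.ParquetWriter("events.parquet", schema) as writer:
    for start in range(0, 30, 10):
        chunk = pa.table({
            "user_id": pa.array(range(start, start + 10), pa.int64()),
            "latency_ms": pa.array([100] * 10, pa.int64()),
        })
        writer.write_table(chunk)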

2. The compression

A column is type-homogeneous and often value-homogeneous. That makes it dramatically more compressible than a row.

Common encodings columnar formats stack:

- Dictionary encoding: repeated values (country codes, URLs) become small integer ids.
- Run-length encoding: a run of identical values collapses to one value plus a count.
- Delta encoding: store differences between consecutive values; sorted timestamps and ids shrink to near nothing.
- Bit-packing: use only as many bits per value as the actual range requires.
- A general-purpose codec (Snappy, ZSTD, GZIP) over the encoded bytes as the final layer.

Real-world Parquet files routinely compress 5–20x versus the equivalent CSV — I’m giving a range here because the actual ratio is brutally dependent on the data, and any single number would be a lie. The point is that compression on a column is qualitatively better than compression on a row, because the entropy is lower when types and values are clustered.
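To see the effect on your own data, a throwaway comparison along these lines works. This is illustrative only, with synthetic low-cardinality data chosen to flatter the encodings; per the caveat above, the ratio you get depends entirely on the data:

import os
import random

import pyarrow as pa
import pyarrow.csv
import pyarrow.parquet as pq

n = 1_000_000
table = pa.table({
    # Low-cardinality, repetitive values: exactly what column
    # encodings exploit.
    "country": random.choices(["US", "DE", "FR", "JP"], k=n),
    "latency_ms": [random.randint(1, 500) for _ in range(n)],
})
pq.write_table(table, "events.parquet", compression="zstd")
pa.csv.write_csv(table, "events.csv")
print(os.path.getsize("events.csv") / os.path.getsize("events.parquet"))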

3. The skipping (this is the underrated one)

A Parquet file is divided into row groups, and each row group carries per-column statistics: min, max, null count, and optionally a Bloom filter and finer-grained page indexes. When you ask WHERE timestamp >= '2026-04-22', the reader checks each row group’s timestamp.max. If it’s less than 2026-04-22, the entire row group — every column, not just timestamp — is skipped without reading or decompressing.
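You can inspect those statistics directly with pyarrow; a sketch, assuming a hypothetical events.parquet whose first column is the timestamp:

import pyarrow.parquet as pq

md = pq.ParquetFile("events.parquet").metadata
for i in range(md.num_row_groups):
    # Column 0 is assumed to be the timestamp column in this sketch.
    stats = md.row_group(i).column(0).statistics
    if stats is not None:  # writers may omit statistics
        print(i, stats.min, stats.max, stats.null_count)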

This is predicate pushdown + min/max pruning, and it’s where a lot of the real-world speed comes from. A query that selectively touches a small slice of the data can physically read a small slice of the data, instead of streaming the whole table and filtering — assuming the layout cooperates. ClickHouse builds on this idea even more aggressively with sparse primary indexes and “skip indexes.” DuckDB, BigQuery, Snowflake — different implementations, same shape.
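In DuckDB you get this without asking; a query like the sketch below (file name hypothetical) reads only the columns it names and, where row-group statistics allow, skips whole chunks:

import duckdb

# DuckDB pushes the projection (two columns) and the predicate
# (the timestamp range) down into its Parquet reader.
duckdb.sql("""
    SELECT avg(latency_ms)
    FROM 'events.parquet'
    WHERE timestamp >= TIMESTAMP '2026-04-22 00:00:00'
""").show()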

The corollary: columnar is only fast if your data is laid out to be skippable. A Parquet file where timestamps are scrambled across row groups can’t be pruned by timestamp. That’s why warehouses care so much about clustering / partitioning / sort keys — those are the knobs that decide whether min/max pruning actually works.
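The write-side counterpart is cheap to sketch: sort on the filter column before writing, so each row group covers a narrow min/max range. A toy version with pyarrow (tiny table and row groups, just to show the shape):

import pyarrow as pa
import pyarrow.parquet as pq

# Toy stand-in for the events table; real data would be far larger.
table = pa.table({
    "timestamp": pa.array([3, 1, 2, 5, 4], pa.timestamp("s")),
    "latency_ms": [201, 142, 88, 90, 155],
})

# Sorting clusters each row group into a narrow timestamp range,
# which is what lets min/max pruning skip groups on a timestamp filter.
table = table.sort_by("timestamp")
pq.write_table(table, "events_sorted.parquet", row_group_size=2)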

Show the seams

A few things the marketing slides skip.

- Whole-record reads are the row store's home game: fetching all 40 fields of one event touches 40 separate column runs.
- Single-row writes and updates are awkward; in practice you batch appends and rewrite periodically, as noted above.
- The skipping win is conditional: if the data isn't sorted or partitioned on the columns you filter by, min/max pruning buys you little.

Going deeper