Key concepts, architecture diagram, and interactive demo
A Log-Structured Merge-Tree (LSM Tree) converts every write into a sequential append — no read-modify-write cycle, no random I/O. Writes land in an in-memory MemTable, get flushed to immutable on-disk files called SSTables, and are merged in the background by a process called compaction. Bloom filters let reads skip irrelevant SSTables without any disk I/O, keeping latency manageable despite multi-level storage. The tradeoff: reads are slower than B-Trees, but writes are dramatically faster — which is why LSM Trees power Cassandra, RocksDB, LevelDB, HBase, and ClickHouse.
| LSM Tree | B-Tree | |
|---|---|---|
| Write I/O | Sequential append (fast, even on HDD) | Random read-modify-write (slow for small records) |
| Read latency | Multi-level search, mitigated by Bloom filters | O(log n) with low constant — 3–4 levels for billions of keys |
| Write amplification | Moderate — 14x (RocksDB, 128B records) | High — 64x (WiredTiger, 128B records) |
| Space overhead | 1.1x leveled / up to 2x size-tiered | ~1.5x (pages average 2/3 full) |
| Best for | Logs, metrics, IoT, event streams, time-series | OLTP, point queries, mixed read-write |
See It In Action
Try the simulator below. Write some key-value pairs, watch the WAL and MemTable fill up, then trigger a flush by filling the MemTable to capacity. Use Read to trace the search path across levels, and Delete to write a tombstone and see how deletions work.
Anatomy of an LSM Tree
An LSM Tree is not a single data structure — it is a system of components that work together across memory and disk. Understanding each component is essential before we trace the write and read paths.
The Three Amplification Factors
Every storage engine makes tradeoffs across three fundamental costs. Understanding these is key to choosing the right engine for your workload.
| Amplification Factor | Definition | LSM Tree | B-Tree |
|---|---|---|---|
| Write Amplification | Ratio of total bytes written to disk vs. bytes written by the application | Moderate (compaction rewrites data, but all I/O is sequential) | High for small records (read-modify-write on full pages, random I/O) |
| Read Amplification | Number of disk reads required to serve a single point query | Higher (must check multiple levels; mitigated by Bloom filters) | Lower (B-Tree height is typically 3-4 levels for billions of keys) |
| Space Amplification | Ratio of total disk space used vs. actual live data size | Varies by strategy: ~1.1x (leveled) to ~2x (size-tiered) | ~1.5x (pages are typically 2/3 full on average due to splits) |
Real-World Systems Using LSM Trees
The LSM Tree is not an academic curiosity — it is the storage engine behind some of the most heavily used databases in production today.
| System | Origin | LSM Tree Role | Notable Users |
|---|---|---|---|
| LevelDB | Google (2011) | Embedded key-value store; the reference implementation of leveled compaction | Chrome, Bitcoin Core |
| RocksDB | Facebook/Meta (2012) | Fork of LevelDB, heavily optimized for SSDs with multi-threaded compaction, column families, and tunable compaction | Meta, Netflix, Uber, LinkedIn |
| Apache Cassandra | Facebook/Apache (2008) | Distributed wide-column store using LSM Trees for local storage with size-tiered compaction | Netflix, Apple, Discord |
| Apache HBase | Hadoop ecosystem | Distributed wide-column store modeled after Google Bigtable, using LSM Trees per region | Facebook Messenger, Yahoo |
| ClickHouse | Yandex (2016) | Column-oriented OLAP database using a MergeTree engine inspired by LSM principles for batch inserts | Cloudflare, eBay, Uber |
| CockroachDB | Cockroach Labs (2014) | Distributed SQL database using RocksDB (now Pebble) as its storage engine | DoorDash, Netflix |
| ScyllaDB | ScyllaDB (2015) | C++ reimplementation of Cassandra's architecture with per-core LSM Trees | Discord, Comcast, Samsung |
When to Use LSM Trees vs. B-Trees
The choice between LSM Trees and B-Trees is not about which is "better" — it is about matching the storage engine to your workload characteristics.