Skip to content

Tiered Storage in KafkaπŸ”—


1. Why Kafka needed tiered storageπŸ”—

Before tiered storage, Kafka stored all data locally on broker disks. This worked well for moderate retention, but created hard limits when Kafka was used for high-throughput or long-retention use cases.

Problems with local-only storageπŸ”—

a. Storage capacity limitsπŸ”—

  • Each partition replica resides on a single broker and cannot be split across disks.
  • The partition size is therefore limited by the disk size of a single mount point.
  • If a team wanted to retain weeks or months of data, Kafka’s local disks often filled up quickly.

b. Cost inefficiencyπŸ”—

  • To increase storage, you had to add more brokers, even if CPU and memory were underutilized.
  • This made Kafka clusters larger and more expensive than necessary, since scaling storage required scaling compute too.

c. Low elasticityπŸ”—

  • When scaling up or down (e.g., adding/removing brokers), moving large partitions between brokers took a long time.
  • Kafka rebalance and recovery operations were slower because huge amounts of data had to be copied between nodes.

d. External pipelinesπŸ”—

  • Organizations often created additional pipelines to offload old Kafka data to cheaper storage (e.g., S3, HDFS).
  • This duplicated infrastructure and increased operational complexity.

So Kafka needed a way to: βœ… Keep recent, β€œhot” data close to the broker for low-latency access. βœ… Offload older, β€œcold” data to cheaper and more scalable storage. βœ… Do this transparently, without breaking existing clients or APIs.

That’s what Tiered Storage provides.


2. What is Tiered Storage in Kafka?πŸ”—

Tiered storage introduces a two-layer (dual-tier) architecture for Kafka logs:

Tier Description Typical storage type Purpose
Local Tier Existing Kafka storage on local broker disks SSDs or HDDs For hot, recent data; low latency
Remote Tier New, external long-term storage S3, HDFS, or cloud blob store For older, cold data; cheap & scalable

In simple terms:πŸ”—

Kafka keeps recent segments locally (fast disk), and older, completed segments are uploaded to remote storage.

This allows Kafka to behave both as a high-performance message broker and a long-term event store β€” without separate data pipelines.


3. How it worksπŸ”—

a. Log segmentsπŸ”—

  • Kafka stores each partition’s data as a series of log segments (files).
  • As the partition grows, older segments are rolled over (closed) and new ones are started.

b. Remote offloadingπŸ”—

  • Once a segment is closed (no new messages are appended), Kafka can upload it to the remote tier (e.g., S3 or HDFS).
  • The local copy can then be retained for a shorter time and eventually deleted based on local retention policy.

c. Dual retention policiesπŸ”—

You can define different retention policies for the two tiers:

  • Local retention: how long segments stay on broker disk (e.g., a few hours).
  • Remote retention: how long segments stay in the remote tier (e.g., days, months, or indefinitely).

This means brokers maintain only the active tail of the log locally, while the remote tier keeps historical data for long-term access.


4. Accessing data from both tiersπŸ”—

Kafka abstracts both storage layers under the same log interface.

  • Consumers reading recent data (tail reads) β†’ Served from local tier (fast disk I/O and page cache). These are typically real-time stream processors or dashboards.

  • Consumers reading older data (backfill or reprocessing) β†’ Served from remote tier (fetched over the network from S3/HDFS). These are typically analytics, ETL, or replay jobs.

The switch between local and remote is transparent β€” clients don’t need to know where the data physically lives.


5. Components introduced by Tiered StorageπŸ”—

a. RemoteLogManagerπŸ”—

A new internal Kafka component introduced in KIP-405. It handles:

  • Tracking which log segments have been uploaded.
  • Coordinating uploads, downloads, and deletions of remote segments.
  • Integrating with retention policies for both tiers.

Each broker runs a RemoteLogManager to manage the remote copies of segments it owns as leader.


b. Remote Storage Connector (plugin layer)πŸ”—

Kafka uses a pluggable interface to talk to various remote systems. For example:

  • AWS S3 connector
  • HDFS connector
  • Azure Blob connector

These connectors implement how to upload, download, list, and delete log segments.


c. Metadata trackingπŸ”—

Kafka must keep track of:

  • Which segments exist locally,
  • Which are in remote storage,
  • And which offsets correspond to each.

This metadata is replicated via the Kafka controller and metadata topic, ensuring consistency across brokers.


6. Benefits of Tiered StorageπŸ”—

1. Infinite (practically) retentionπŸ”—

  • Because old data is offloaded, retention is no longer constrained by broker disk size.
  • You can store months or years of data at a fraction of the cost.

2. Lower cost per GBπŸ”—

  • Brokers only need fast storage for recent data.
  • Older data can live in cheaper cloud object storage (S3, GCS, Azure Blob, HDFS).

3. Independent scalingπŸ”—

  • Storage can now scale independently from compute.
  • You can add capacity to S3 without adding brokers.

4. Faster elasticity and recoveryπŸ”—

  • Rebalancing or replacing brokers becomes much faster:

  • Old segments already in remote storage don’t need to be copied again.

  • Brokers only rebuild metadata and local active segments.

5. Isolation between workloadsπŸ”—

  • Previously, consumers reading old data competed with real-time consumers for disk I/O.
  • With tiered storage:

  • Old reads β†’ remote storage (network I/O)

  • New reads β†’ local disk
  • Result: better latency isolation and less interference between workloads.

6. Simplified architectureπŸ”—

  • No need for separate ETL pipelines to move data from Kafka to S3 or HDFS.
  • Kafka itself becomes a complete event store.

7. Performance impact (based on KIP-405 measurements)πŸ”—

The Kafka team tested tiered storage in several workloads:

Use Case Without Tiered Storage With Tiered Storage Observation
Normal high-throughput workload p99 latency β‰ˆ 21 ms p99 latency β‰ˆ 25 ms Slight latency increase due to background uploads to remote storage.
Consumers reading old data p99 latency β‰ˆ 60 ms p99 latency β‰ˆ 42 ms Significant improvement because old reads come from remote storage, not competing with hot disk reads.

So even though there’s a small penalty for offloading, the overall cluster performance improves for mixed workloads.


8. Trade-offs and considerationsπŸ”—

Aspect Notes
Latency Slightly higher due to remote fetch for old data, but negligible for most use cases.
Complexity Additional moving parts (RemoteLogManager, remote connectors, new metadata tracking).
Network usage Uploading/downloading segments adds bandwidth requirements.
Consistency Remote segment metadata must stay consistent during leader changes and replica catch-up.
Cost Remote storage is cheaper per GB, but total cost depends on egress and API call pricing in cloud environments.

9. Comparison: Before vs After Tiered StorageπŸ”—

Feature Pre-Tiered Storage With Tiered Storage
Storage Entirely local Split: local + remote
Retention Limited by disk Practically unlimited
Scaling Add brokers for more storage Scale storage separately (e.g., add S3 capacity)
Broker recovery Copies all data Copies only local data, references remote
Cost efficiency High cost per TB Low cost per TB (cheap cloud storage)
Performance isolation Old readers compete for disk I/O Old readers use remote storage, no competition

10. Key takeawayπŸ”—

Tiered storage transforms Kafka from a short-term streaming buffer into a long-term, cost-efficient event store.

It provides: βœ… Infinite retention (practically) βœ… Lower storage cost βœ… Easier scaling and faster recovery βœ… Isolation between real-time and historical consumers βœ… Seamless access to historical data

All while maintaining Kafka’s same client APIs and streaming semantics.


11. AnalogyπŸ”—

Think of Kafka Tiered Storage like your phone’s photo library:

  • Recent photos β†’ stored locally on your phone (fast, small).
  • Old photos β†’ automatically moved to cloud storage (cheap, large). You can still view any photo seamlessly β€” the system just fetches it from the right place when needed.

Kafka’s tiered storage works the same way for log segments.


12. SummaryπŸ”—

Concept Description
Motivation Overcome storage limits, reduce cost, improve elasticity.
Architecture Two tiers β€” local (fast disks) and remote (cheap, scalable).
New component RemoteLogManager manages uploads, downloads, metadata.
Retention Separate policies for local and remote tiers.
Performance Slight latency increase for writes, improved isolation for reads.
Benefit Kafka becomes a long-term, scalable, cost-effective event store.

One interesting result that is documented in KIP-405 is the performance implications of tiered storage. The team implementing tiered storage measured performance in several use cases. The first was using Kafka’s usual highthroughput workload. In that case, latency increased a bit (from 21 ms in p99 to 25 ms), since brokers also have to ship segments to remote storage. The second use case was when some consumers are reading old data.

Without tiered storage, consumers reading old data have a large impact on latency (21 ms versus 60 ms p99), but with tiered storage enabled, the impact is significantly lower (25 ms versus 42 ms p99); this is because tiered storage reads are read from HDFS or S3 via a network path. Network reads do not compete with local reads on disk I/O or page cache, and leave the page cache intact with fresh data. This means that in addition to infinite storage, lower costs, and elasticity, tiered storage also delivers isolation between historical reads and real-time reads.