
Why are Closed Segment Files kept 'open' in Kafka?


1. Background: Kafka partitions and log segments

Every Kafka partition is stored on disk as a log, and that log is split into multiple segment files.

For example, if you have a topic called orders with 3 partitions, and each partition has multiple segment files, you might see something like this on disk:

/data/kafka/orders-0/
    00000000000000000000.log
    00000000000000000000.index
    00000000000000000000.timeindex
    00000000000000001000.log
    00000000000000001000.index
    00000000000000001000.timeindex

/data/kafka/orders-1/
    ...

Each .log file is one segment: a contiguous chunk of that partition's message stream. Kafka rolls over to a new segment file once a size or time limit is reached (by default roughly 1 GB or 7 days, both configurable).
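For reference, the broker settings that control this rollover are log.segment.bytes and log.roll.hours (or log.roll.ms). A minimal server.properties sketch showing the stock defaults:

    # Segment rolling controls (values shown are the defaults)
    # Roll a new segment once the active one reaches ~1 GB...
    log.segment.bytes=1073741824
    # ...or once it is 7 days (168 hours) old, whichever comes first
    log.roll.hours=168

Topics can override these with the per-topic segment.bytes and segment.ms settings.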


2. How brokers handle these segment files

For performance reasons, the Kafka broker keeps every segment file "open", meaning it maintains an open file handle to it.

An open file handle is the operating system's way of saying "this file is currently in use by a program."

  • When a program (like Kafka) opens a file, the OS creates a file descriptor.
  • The broker uses this descriptor to read or write data quickly without reopening the file every time.

Kafka keeps these files open because:

  • Consumers may need to read from any segment at any moment (for example, when replaying old offsets), and producers continuously append to the active segment.
  • Opening and closing files repeatedly would be slow (system call overhead).
  • Keeping them open allows faster reads and writes, since the file descriptors are already established (the commands below show how to count them on a live broker).
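You can see the effect on a running broker by counting its open descriptors. A quick sketch, assuming the broker was started with the standard kafka.Kafka main class (adjust the pgrep pattern to your deployment):

    # Find the broker's PID and count its open file descriptors
    BROKER_PID=$(pgrep -f kafka.Kafka | head -n 1)
    ls /proc/$BROKER_PID/fd | wc -l

    # Alternatively, with lsof (this also counts sockets and other descriptors)
    lsof -p $BROKER_PID | wc -l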

3. What this means in practice

Each partition can have multiple segment files, and Kafka keeps an open handle for every one of them, even if the segment is inactive (not currently being written to).

Example:

Suppose:

  • You have 1,000 partitions on a broker.
  • Each partition has 10 segment files (one active, the rest already closed to writes). → That's 10,000 open file handles just for the .log data files.

And Kafka opens three files per segment:

  • .log
  • .index
  • .timeindex

So the total could be: 1,000 partitions × 10 segments × 3 files = 30,000 open file handles.

That's quite a lot, and for larger clusters it can reach hundreds of thousands.
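If you want a concrete lower bound for your own broker, counting the segment-related files under the log directories is enough, since each of them is held open. A sketch using the /data/kafka path from the example above (substitute your own log.dirs):

    find /data/kafka -type f \( -name '*.log' -o -name '*.index' -o -name '*.timeindex' \) | wc -l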


4. Why the OS needs to be tuned

Operating systems (Linux, macOS, etc.) have a limit on how many files a process can keep open at once. This is called the open file descriptor limit (ulimit).

On Linux, you can check it with:

ulimit -n

Typical defaults might be 1024 or 4096, far too low for a Kafka broker.

If Kafka tries to open more files than the OS allows, you'll get errors like:

Too many open files

and the broker may crash or fail to serve requests.
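Note that the soft and hard limits can differ, and the limit that matters is the one in effect for the broker process itself, not your login shell. A quick way to check both (the pgrep pattern is the same assumption as before):

    # Soft and hard limits for the current shell
    ulimit -Sn
    ulimit -Hn

    # Limit actually in effect for the running broker
    grep 'open files' /proc/$(pgrep -f kafka.Kafka | head -n 1)/limits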


5. How to fix (tune) it

Kafka administrators must increase the OS limit on open file descriptors for the Kafka process.

This is usually done in:

  • Systemd service configuration (LimitNOFILE; a sketch follows below), or
  • Shell config (ulimit -n 1000000), depending on your deployment.

A common best practice:

ulimit -n 1000000
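For systemd-managed brokers, a shell ulimit is not enough, because systemd applies its own limits to the service; the usual route is a drop-in override. A sketch, assuming the unit is named kafka.service (match it to your installation):

    # /etc/systemd/system/kafka.service.d/override.conf
    [Service]
    LimitNOFILE=1000000

After adding the override, run systemctl daemon-reload and restart the broker so the new limit takes effect.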

Kafka's documentation recommends setting it to at least 100,000 or more (a rough estimate is sketched after this list), depending on:

  • Number of partitions per broker
  • Segment count per partition
  • Replication factor
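A rough back-of-envelope estimate (a sketch, not an official formula; it ignores sockets and other descriptors, so leave generous headroom):

    required_fds ≈ partitions_per_broker × avg_segments_per_partition × 3
                 + client_and_replication_sockets
                 + headroom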

6. In short, the passage means:

"Each Kafka broker keeps every log segment file open (even old ones), which can lead to a huge number of open files. Because of this, you must tune your operating system to allow many open file handles."


7. Quick summary table

Concept                         Explanation
Log segment                     A chunk of a partition's data stored as a file (e.g., roughly 1 GB each).
File handle / file descriptor   An OS-level resource for accessing an open file.
Kafka behavior                  Keeps all segment files open for faster I/O (no reopening overhead).
Impact                          Tens of thousands of open files (or more) per broker.
Solution                        Increase OS limits (ulimit -n / LimitNOFILE) to allow large numbers of open files.

Example scenario

  • A topic with 100 partitions and replication factor 3 → 300 replicas spread across the brokers.
  • Each replica has 20 segment files.
  • Each segment has 3 files (log, index, timeindex).
  • That topic alone accounts for 300 × 20 × 3 = 18,000 file handles across the cluster, and a broker hosting many such topics can easily approach ~100,000 open file descriptors.

So you must tune Linux to handle that load.


✅ In short: Kafka brokers keep all segment files open for speed, which can lead to thousands (or even hundreds of thousands) of open file handles. Therefore, you need to increase the OS's open file limit so Kafka can function reliably.