Kafka File Descriptors and Overcommit Memory
Alright 👉 let's simplify this down so it's clear, no kernel jargon needed.
🔹 1. Kafka and File Descriptors
- Every log segment (chunk of a partition stored on disk) = one open file descriptor; Kafka also memory-maps each segment's index files.
- Every client connection (producer, consumer, replication) = another file descriptor.
- So if a broker has lots of partitions and lots of connections → it needs a very large number of file descriptors open at once (that limit is the `nofile` ulimit), plus a large number of memory mappings.
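A minimal sketch of how you might check both counts on a live broker (the `pgrep -f kafka.Kafka` pattern is an assumption; match it to however your broker process is started):

```bash
# Find the broker's PID (pattern is an assumption -- adjust to your setup).
KAFKA_PID=$(pgrep -f kafka.Kafka | head -n 1)

# File descriptor limit for the process vs. how many are actually in use.
grep 'Max open files' /proc/$KAFKA_PID/limits
ls /proc/$KAFKA_PID/fd | wc -l

# Memory mappings currently held -- this is what vm.max_map_count caps.
wc -l < /proc/$KAFKA_PID/maps
```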
🔹 2. Why vm.max_map_count?
- Linux limits how many memory-mapped regions a single process can have (Kafka memory-maps the index files of its log segments, so every segment consumes mappings).
- If this limit is too low, Kafka crashes or can't open new log segments.
- Setting `vm.max_map_count` to 400,000 or 600,000 gives Kafka enough room for large clusters.
👉 Think of it like: "How many drawers can Kafka keep open at once?" If too few, Kafka gets stuck. Raising the limit gives Kafka more drawers.
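A quick sketch of checking and raising the limit (the value 400000 follows the guidance above; tune it to your cluster):

```bash
# Check the current limit (the kernel default is often 65530).
sysctl vm.max_map_count

# Raise it for the running kernel; this change is lost on reboot.
sudo sysctl -w vm.max_map_count=400000
```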
🔹 3. Why vm.overcommit_memory=0?
- This tells Linux: 👉 "Use a sane heuristic: don't promise applications obviously more memory than you actually have."
- If set to 1, the kernel always over-commits (promises more than is available); if set to 2, it enforces a strict accounting limit that can make legitimate allocations fail.
- For Kafka, both extremes are bad because:
  - Kafka needs predictable memory for high ingestion.
  - If the OS over-promises (mode 1), it may run out and start killing processes (the OOM Killer); if it is too strict (mode 2), the JVM's large allocations may be refused.
- So 0 = safe mode: the kernel sanity-checks available memory before granting it to Kafka.
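To verify and set this (0 is already the default on most distributions, so often no change is needed):

```bash
# 0 = heuristic (recommended), 1 = always over-commit, 2 = strict accounting.
cat /proc/sys/vm/overcommit_memory

# Set it back to heuristic mode if something changed it.
sudo sysctl -w vm.overcommit_memory=0

# Optional: compare what the kernel has promised vs. its commit limit.
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```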
🔹 4. Putting It Together
- Kafka needs a lot of open files and memory mappings → raise the `nofile` ulimit and increase `vm.max_map_count`.
- Kafka needs reliable memory allocation → keep `vm.overcommit_memory=0`.
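To make both settings survive a reboot, a minimal sketch using a sysctl drop-in file (the filename `99-kafka.conf` is an arbitrary choice):

```bash
# Persist both settings in a sysctl drop-in file.
sudo tee /etc/sysctl.d/99-kafka.conf <<'EOF'
vm.max_map_count = 400000
vm.overcommit_memory = 0
EOF

# Apply all sysctl config files immediately, no reboot needed.
sudo sysctl --system
```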
🔹 5. Simple Analogy
- Imagine Kafka is running a library.
- Every log segment = a book on the table.
- Every client connection = another open book.
- If Linux says: "You can only keep 65,000 books open," Kafka will choke → raise `vm.max_map_count` to 400k+ so all books can stay open.
- Memory is like seats in the library.
- If the librarian over-commits ("Sure, 200 people can sit here" when only 100 seats exist), people fight for space → chaos.
- Setting `vm.overcommit_memory=0` ensures only as many people as seats → stable Kafka.
✅ In short:
- Raise `vm.max_map_count` so Kafka can keep lots of log segments + connections open.
- Keep `vm.overcommit_memory=0` so Kafka only uses real, available memory → avoids crashes.
👉 Do you want me to also give you a practical command + formula for calculating how high `vm.max_map_count` should be for your Kafka cluster (based on partitions + segment size)?