
Kafka File Descriptors and Overcommit Memory

Alright πŸ‘ let’s break this down so it’s clear, no kernel jargon needed.


πŸ”Ή 1. Kafka and File Descriptors

  • Every log segment (a chunk of a partition stored on disk) keeps file descriptors open for its .log file and its index files.
  • Every client connection (producer, consumer, or replication follower) is a socket, which also consumes a file descriptor.
  • So a broker with lots of partitions and lots of connections needs a very large number of file descriptors open at once; that cap is the process’s open-files limit (ulimit -n).

πŸ”Ή 2. Why vm.max_map_count?

  • Linux limits how many memory-mapped regions a single process can have; the default is 65530.
  • Kafka memory-maps the index files of its log segments, so every segment adds mapped regions. This is a separate limit from file descriptors, and both need headroom.
  • If this limit is too low, Kafka crashes or can’t roll new log segments.
  • Setting vm.max_map_count = 400,000 or 600,000 gives Kafka enough room for large clusters.

πŸ‘‰ Think of it like: β€œHow many drawers can Kafka keep open at once?” If too few, Kafka gets stuck. Raising the limit gives Kafka more drawers.
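To see where a broker host currently stands, the two limits can be read directly (the sysctl write shown in the comment needs root, and 400000 is just the example figure from above, not a universal recommendation):

```shell
# Check the current per-process memory-map limit (Linux default is 65530)
cat /proc/sys/vm/max_map_count

# Check the open-files soft limit, a separate knob from the map count
ulimit -n

# To raise the map limit at runtime (needs root; does not survive reboot):
#   sysctl -w vm.max_map_count=400000
```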


πŸ”Ή 3. Why vm.overcommit_memory=0?

  • This is the kernel’s default heuristic mode: πŸ‘‰ β€œrefuse only memory requests that are obviously more than the system can back.”
  • Setting it to 1 makes Linux always over-commit (promise memory regardless of what’s available); 2 enforces strict accounting, which can make legitimate large allocations fail.
  • For Kafka, mode 1 is dangerous because:

  • Kafka needs predictable memory for high ingestion.

  • If the OS over-promises, it may run out under load and start killing processes (the OOM Killer), possibly the broker itself.

So 0 = the balanced mode: the kernel sanity-checks requests against available memory without rejecting Kafka’s legitimate large ones.
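The current mode can be read straight from /proc (the runtime change in the comment needs root):

```shell
# 0 = heuristic (default), 1 = always over-commit, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory

# To set the heuristic mode explicitly at runtime (needs root):
#   sysctl -w vm.overcommit_memory=0
```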


πŸ”Ή 4. Putting It Together

  • Kafka memory-maps many index files β†’ increase vm.max_map_count (and raise the open-files limit alongside it).
  • Kafka needs reliable memory allocation β†’ keep vm.overcommit_memory=0.
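To make both settings survive a reboot, the usual pattern is a sysctl drop-in file; a sketch, where the file name 99-kafka.conf and the values are examples to adjust:

```shell
# Create a drop-in sysctl file (example name and example values; needs root)
sudo tee /etc/sysctl.d/99-kafka.conf >/dev/null <<'EOF'
vm.max_map_count=400000
vm.overcommit_memory=0
EOF

# Reload every sysctl config file, including the new one
sudo sysctl --system
```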

πŸ”Ή 5. Simple Analogy

  • Imagine Kafka is running a library.
  • Every log segment = a book open on the table.
  • Every client connection = another open book.
  • If Linux says β€œyou can only keep about 65,000 books open” (the 65530 default), Kafka will choke β†’ raise vm.max_map_count to 400k+ so all the books can stay open.
  • Memory is like seats in the library.
  • If the librarian over-commits (β€œsure, 200 people can sit here” when only 100 seats exist), people fight for space β†’ chaos.
  • Setting vm.overcommit_memory=0 keeps admissions close to the number of real seats β†’ stable Kafka.


βœ… In short:

  • Raise vm.max_map_count so Kafka can keep lots of log segments (and their memory-mapped index files) open.
  • Keep vm.overcommit_memory=0 so the kernel only promises Kafka memory it can actually back β†’ avoids OOM crashes.
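A rough sizing formula can be sketched with shell arithmetic. Every figure below is an assumed example to replace with your broker’s real numbers; the one solid part is that each active segment memory-maps its offset index and time index, hence 2 maps per segment:

```shell
# All figures are example assumptions; substitute your broker's real numbers
partitions=5000            # partition replicas hosted on this broker
segments_per_partition=20  # driven by retention and segment.bytes
maps_per_segment=2         # offset index + time index are memory-mapped
headroom=2                 # safety factor for rebalances, JVM, misc mappings

needed=$(( partitions * segments_per_partition * maps_per_segment * headroom ))
echo "vm.max_map_count should be at least: $needed"   # prints 400000 here
```

Before settling on a value, it is worth comparing the estimate against reality on a running broker, e.g. by counting the lines of /proc/&lt;broker-pid&gt;/maps.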

πŸ‘‰ Do you want me to also give you a practical command + formula for calculating how high vm.max_map_count should be for your Kafka cluster (based on partitions + segment size)?