Skip to content

Kafka OS Tuning : Dirty Page Handling๐Ÿ”—

Kafka relies on disk I/O operations to give good response time to producers sending the data.

Hence the log segments are put on a fast disk, it can be an individual disk with fast response time like SSD or a disk subsystem with lot of NVRAM for caching like RAID.


๐Ÿ”น 1. What are โ€œdirty pagesโ€?๐Ÿ”—

  • A page = a chunk of memory.
  • A page is called dirty if it contains data that has been changed in memory but not yet written to disk.

  • Example: You write a log entry โ†’ it first goes into memory.

  • That memory page is now โ€œdirty.โ€
  • Later, the kernel flushes it to disk.

This buffering is good, because writing to RAM is faster than writing to disk, and the kernel can group small writes together.


๐Ÿ”น 2. What is vm.dirty_background_ratio?๐Ÿ”—

  • Itโ€™s a kernel parameter that sets: ๐Ÿ‘‰ โ€œHow much of total system memory can be dirty pages before the background flusher starts writing them to disk.โ€
  • Value is a percentage of total RAM.
  • Default = 10.

  • Means: if 10% of system memory contains dirty pages, the kernelโ€™s background flush process starts.


๐Ÿ”น 3. What happens if you lower it?๐Ÿ”—

  • Example: set vm.dirty_background_ratio = 5.
  • Now the kernel starts flushing earlier (when only 5% of RAM is dirty).
  • Result:

  • Dirty data stays in memory for less time.

  • Disk writes happen more continuously instead of in big bursts.
  • Good for workloads where you donโ€™t want long spikes of writes.

๐Ÿ”น 4. Why not set it to 0?๐Ÿ”—

  • If itโ€™s 0, the kernel would flush immediately whenever any page becomes dirty.
  • That means:

  • No buffering.

  • Every write goes straight to disk.
  • Eliminates the kernelโ€™s ability to smooth out temporary spikes (e.g., a burst of writes).
  • Performance would tank, because youโ€™d lose batching benefits.

๐Ÿ”น 5. Analogy๐Ÿ”—

Think of RAM like a kitchen counter and disk like the pantry in another room:

  • You prepare groceries (data) on the counter first.
  • When the counter gets 10% full (default), the helper (kernel flusher) starts moving items to the pantry.
  • If you lower it to 5%, the helper starts earlier โ†’ the counter never gets too messy.
  • But if you set it to 0, the helper runs every single time you put something down โ†’ you canโ€™t work efficiently anymore.

โœ… In summary:

  • vm.dirty_background_ratio controls when the kernel starts flushing dirty pages to disk.
  • Lower values โ†’ smoother, more frequent flushes.
  • Too low (like 0) โ†’ kills performance because every write goes straight to disk.

Synchronous and Background Dirty Page Handling๐Ÿ”—

Perfect ๐Ÿ‘ youโ€™re asking about the second half of Linux dirty page handling โ€” vm.dirty_ratio. Letโ€™s break it down step by step.


๐Ÿ”น 1. What is vm.dirty_ratio?๐Ÿ”—

  • Itโ€™s another Linux kernel setting (percentage of total RAM).
  • It controls the absolute upper limit of dirty pages allowed in memory.

๐Ÿ‘‰ Difference vs vm.dirty_background_ratio:

  • dirty_background_ratio โ†’ threshold when background flushing starts.
  • dirty_ratio โ†’ hard limit where normal processes are forced to stop and flush data themselves (synchronous flush).

๐Ÿ”น 2. Default Value๐Ÿ”—

  • Default = 20% of total memory.
  • Example: If you have 16GB RAM โ†’ at most 3.2GB can be dirty pages.
  • If you reach this limit:

  • Any new writes block until data is flushed โ†’ app threads do the flushing.

  • This is much slower, because the app itself is waiting on disk I/O.

๐Ÿ”น 3. Raising the Limit๐Ÿ”—

  • If you set vm.dirty_ratio = 60 or 80:

  • Now up to 60โ€“80% of memory can be dirty pages.

  • More data can stay in RAM before being forced to disk.
  • This means higher throughput for write-heavy apps like Kafka, because they batch more writes in memory before hitting disk.

๐Ÿ”น 4. Risks๐Ÿ”—

  • More unflushed data in RAM = if the server crashes, more data is lost.
  • Longer I/O pauses = if memory suddenly fills with dirty pages, the kernel will force a big synchronous flush โ†’ applications block until gigabytes of data are written out.
  • These pauses are sometimes called โ€œI/O stormsโ€ or โ€œwrite cliffs.โ€

๐Ÿ”น 5. Why Mention Kafka Replication?๐Ÿ”—

  • Kafka brokers rely on disk writes for durability.
  • If you allow too many dirty pages (high vm.dirty_ratio), and the machine crashes, all that unflushed data is lost.
  • Replication across brokers ensures that even if one machine loses data, others have a copy.
  • So the text is saying: ๐Ÿ‘‰ โ€œIf you increase dirty_ratio to improve throughput, you must rely on replication to stay safe against data loss.โ€

๐Ÿ”น 6. Analogy๐Ÿ”—

  • Imagine youโ€™re taking notes (data) on paper (RAM) before typing them into the computer (disk).
  • Rule 1: At 5 pages of notes โ†’ your friend (background flush) starts typing them up slowly.
  • Rule 2: At 20 pages โ†’ youโ€™re not allowed to write more notes until you type everything in yourself (synchronous flush).
  • If you increase the limit to 60 pages:

  • You can keep writing more before being forced to stop.

  • But if you trip the limit, youโ€™ll have to type 60 pages at once โ€” long pause.
  • If you spill coffee (server crash), you lose a lot more notes.

โœ… In summary:

  • vm.dirty_ratio = maximum dirty memory before synchronous flush kicks in.
  • Default = 20%. Increasing to 60โ€“80% improves write throughput for Kafka, but:

  • Increases risk of data loss on crash.

  • Increases risk of long blocking flushes.
  • Thatโ€™s why Kafka clusters should use replication if you tune this aggressively.