Skip to content

Explain the difference between vm_dirty_ratio and vm_background_dirty_ratiođź”—


🧠 1. What these settings are about — the big picture🔗

When an application (like Kafka) writes data to disk, it doesn’t write it directly to the physical disk each time — that would be too slow.

Instead, Linux uses something called the page cache (a part of RAM) to hold recently written or read data temporarily.

So:

  • Kafka writes → it first goes to memory (the page cache)
  • Later → the Linux kernel flushes (writes) that data from memory to disk in the background.

These two settings:

vm.dirty_ratio
vm.background_dirty_ratio

control how much of your system’s memory can fill up with "dirty" (unflushed) pages before the kernel starts writing them to disk.


đź§© 2. What are "dirty pages"?đź”—

“Dirty pages” = memory pages that contain data that’s been modified but not yet written (“flushed”) to disk.

Example:

  • Kafka writes messages to its log segment file.
  • The OS keeps those writes in memory first (as dirty pages).
  • Eventually, those pages are written to disk asynchronously.

⚙️ 3. vm.background_dirty_ratio — “start cleaning soon”🔗

This setting tells Linux:

“When this percentage of total memory has dirty pages, start writing them to disk in the background.”

It’s like an early warning threshold for the kernel’s background flusher thread.

  • The flushing happens asynchronously (in the background).
  • The goal is to keep the number of dirty pages low so they don’t pile up.

Example:

vm.background_dirty_ratio = 10

→ When 10% of your RAM is full of dirty pages, Linux starts slowly writing them out to disk.

So, the system starts cleaning early — gently.


⚙️ 4. vm.dirty_ratio — “stop the writers!”🔗

This is the hard limit.

It tells Linux:

“If this percentage of total memory is full of dirty pages, stop all new writes until some are written to disk.”

At this point, applications like Kafka or MySQL will be forced to wait because the OS won’t accept more dirty pages.

Example:

vm.dirty_ratio = 20

→ When 20% of memory is dirty, Linux will block new writes from user processes until the kernel finishes flushing enough pages.

This is how Linux prevents the system from running out of memory.


🪜 5. How they work together🔗

You can think of these two settings like a bathtub and a drain:

Setting What it does Analogy
vm.background_dirty_ratio When this much water fills the tub, start draining slowly in the background. “Start draining at 10% full.”
vm.dirty_ratio If the water gets this high, stop pouring in more until it drains. “Stop pouring at 20% full.”

So:

  • Between 0% → 10%: fine, OS just caches data in memory.
  • Between 10% → 20%: OS starts background flushing.
  • Above 20%: OS blocks applications until flushing catches up.

đź§° 6. Typical values (defaults and tuning)đź”—

Setting Default (Linux) Recommended for Kafka or database workloads
vm.dirty_ratio 20 10–15
vm.background_dirty_ratio 10 5

For Kafka or large I/O systems, you generally want to lower both so that:

  • The OS starts flushing earlier (background_dirty_ratio=5)
  • You never hit the hard stop (dirty_ratio=10–15)

This avoids long I/O pauses where Kafka threads get blocked waiting for the OS to flush dirty pages.


🔍 7. Real-world impact on Kafka🔗

Kafka relies on the Linux page cache for writing logs (it doesn’t fsync every message). That’s why these settings matter directly for Kafka performance.

If ratios are too high:đź”—

  • Kafka writes very fast initially (everything goes to memory).
  • Memory fills up with dirty pages.
  • Suddenly, OS stops accepting new writes.
  • Kafka’s disk I/O stalls for seconds while flushing happens. → You get high latency spikes and possible broker timeouts.

If ratios are tuned properly:đź”—

  • Linux flushes gradually and continuously.
  • Kafka writes stay smooth and predictable.
  • No sudden stalls.

So, lowering these ratios makes Kafka more stable under heavy write loads.


đź§® 8. Quick example with numbersđź”—

Let’s say your broker has 64 GB RAM.

Setting Value Dirty memory threshold
vm.background_dirty_ratio=10 Start flushing when 10% dirty 6.4 GB
vm.dirty_ratio=20 Block new writes at 20% dirty 12.8 GB

That means:

  • Once 6.4 GB of writes are cached but not yet flushed → kernel starts background writes.
  • If 12.8 GB pile up → all write calls will block until the kernel flushes some data.

If you change to:

vm.background_dirty_ratio = 5
vm.dirty_ratio = 10

→ Flushing starts earlier (at ~3 GB) and never lets dirty pages exceed ~6 GB.

That means smoother, steadier disk I/O and fewer Kafka pauses.


đź§­ 9. How to view and change these valuesđź”—

Check current values:

cat /proc/sys/vm/dirty_ratio
cat /proc/sys/vm/background_dirty_ratio

Change temporarily:

sudo sysctl -w vm.dirty_ratio=10
sudo sysctl -w vm.background_dirty_ratio=5

Make it permanent: Add to /etc/sysctl.conf:

vm.dirty_ratio=10
vm.background_dirty_ratio=5

Then reload:

sudo sysctl -p

🧠 10. TL;DR — like you’re explaining to a 10-year-old🔗

Imagine your computer’s memory as a notebook where you jot down things to save to disk later.

  • vm.background_dirty_ratio = “When 10 pages are filled, start copying them quietly to disk.”
  • vm.dirty_ratio = “If 20 pages are filled and you’re still writing, stop! Clean some up first.”

You want to start cleaning before the notebook gets full, otherwise, you’ll have to stop writing until the old pages are cleared.


âś… In short:

Setting Meaning Safe value for Kafka
vm.background_dirty_ratio When to start background flushing dirty pages 5
vm.dirty_ratio When to block writes until flush completes 10–15

Kafka benefits when these are low, because it avoids long OS-level pauses caused by massive flushes.