Partition Allocation Concepts in Kafka
1. The goal: balanced, fault-tolerant partition placement
When you create a topic in Kafka, you specify:
- Number of partitions (e.g., 10)
- Replication factor (e.g., 3)
This means:
For each partition, Kafka needs to create 3 copies (replicas) — one leader and two followers.
So, 10 partitions × 3 replicas = 30 total partition replicas to be distributed across the brokers.
In the example:
- Brokers = 6
- Partitions = 10
- Replication factor = 3
- Total replicas = 30
Kafka’s job is to spread those 30 replicas across the 6 brokers in a way that achieves:
- Even distribution (no broker overloaded with replicas)
- No duplicate replicas of the same partition on one broker
- Rack-level fault tolerance (if rack info is available)
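As a concrete starting point, here is a minimal sketch of creating such a topic with the confluent-kafka Python client; the client choice, topic name, and bootstrap address are illustrative assumptions, not part of the example above.

```python
# Minimal sketch, assuming the confluent-kafka Python client and a local broker.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# 10 partitions x replication factor 3 = 30 replicas for Kafka to place.
topic = NewTopic("orders", num_partitions=10, replication_factor=3)

# create_topics() returns {topic_name: future}; wait on each future for the outcome.
for name, future in admin.create_topics([topic]).items():
    future.result()  # raises if the topic could not be created
    print(f"Created topic {name}")
```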
2. Step 1 — Even distribution across brokers
Kafka tries to assign roughly the same number of replicas per broker.
In the example:
- 30 replicas ÷ 6 brokers = 5 replicas per broker (on average).
This ensures load balance — each broker holds approximately the same amount of data and handles similar traffic.
So, no single broker becomes a hotspot for storage or leader requests.
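The same arithmetic in a quick Python sketch, using the numbers from the example above:

```python
# Average replica load per broker for the running example.
partitions, replication_factor, brokers = 10, 3, 6
total_replicas = partitions * replication_factor   # 30
print(total_replicas / brokers)                    # 5.0 replicas per broker on average
```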
3. Step 2 — One replica per broker per partition
Kafka ensures that:
For any given partition, its replicas (leader and followers) are placed on different brokers.
Example:
- Partition 0 leader → Broker 2
- Followers → Brokers 3 and 4
Not allowed:
- Two replicas of partition 0 on the same broker.
- The leader and one of its own followers (for the same partition) on the same broker.
This guarantees that if one broker fails, at least one copy of each partition still exists on another broker.
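A tiny sketch of that constraint as a check over a hypothetical assignment map (the broker numbers below are illustrative):

```python
# For every partition, its replica list (leader first) must use distinct brokers.
assignment = {
    0: [2, 3, 4],   # partition 0: leader on broker 2, followers on brokers 3 and 4
    1: [3, 4, 5],   # partition 1: leader on broker 3, followers on brokers 4 and 5
}

for partition, replicas in assignment.items():
    assert len(set(replicas)) == len(replicas), f"partition {partition} repeats a broker"
```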
4. Step 3 — Assigning partition leaders (round-robin)
Kafka determines which broker will hold the leader replica for each partition using a round-robin approach.
Example:
- Brokers: 0, 1, 2, 3, 4, 5
- Random starting broker: say Broker 4
Kafka then walks the broker list in order, assigning one leader per partition:
| Partition | Leader Broker |
|---|---|
| 0 | 4 |
| 1 | 5 |
| 2 | 0 |
| 3 | 1 |
| 4 | 2 |
| 5 | 3 |
| 6 | 4 |
| 7 | 5 |
| 8 | 0 |
| 9 | 1 |
This round-robin pattern keeps leaders evenly distributed across all brokers.
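The pattern in the table can be reproduced with a short Python sketch; the starting broker is fixed at 4 here to match the table, whereas Kafka picks it at random:

```python
# Round-robin leader placement, starting from a (normally random) broker index.
brokers = [0, 1, 2, 3, 4, 5]
num_partitions = 10
start = 4  # Kafka picks the starting broker at random

leaders = {p: brokers[(start + p) % len(brokers)] for p in range(num_partitions)}
print(leaders)  # {0: 4, 1: 5, 2: 0, 3: 1, 4: 2, 5: 3, 6: 4, 7: 5, 8: 0, 9: 1}
```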
5. Step 4 — Assigning follower replicas
Once the leader brokers are chosen, Kafka assigns the follower replicas.
Rule:
Each follower is placed on a broker at an increasing offset from the leader.
Example:
- Leader for partition 0 → Broker 4 → Follower 1 on Broker 5 → Follower 2 on Broker 0
- Leader for partition 1 → Broker 5 → Follower 1 on Broker 0 → Follower 2 on Broker 1
This ensures:
- Replicas are spread across multiple brokers.
- No two replicas of the same partition end up on the same node.
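A simplified sketch of that placement rule follows; it reproduces the two examples above, though Kafka's real assignment also varies the follower shift between partitions, so exact placements can differ in practice.

```python
# Simplified sketch of the follower rule above: each follower sits at an
# increasing offset from the leader, wrapping around the broker list.
brokers = [0, 1, 2, 3, 4, 5]
replication_factor = 3
start = 4  # same starting broker as in the leader example

assignment = {}
for p in range(10):
    leader_index = (start + p) % len(brokers)
    assignment[p] = [brokers[(leader_index + offset) % len(brokers)]
                     for offset in range(replication_factor)]  # index 0 is the leader

print(assignment[0])  # [4, 5, 0] -> leader on broker 4, followers on brokers 5 and 0
print(assignment[1])  # [5, 0, 1] -> leader on broker 5, followers on brokers 0 and 1
```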
6. Step 5 — Rack awareness (added in Kafka 0.10.0+)
If your cluster defines rack information (for example, broker.rack=rackA in each broker’s config), Kafka can make smarter placement decisions.
Why this matters
If an entire rack fails (for example, due to power or network outage), you don’t want all replicas of a partition to be on that rack — otherwise the partition becomes unavailable.
How Kafka handles it
Instead of assigning brokers in numeric order (0, 1, 2, 3…), Kafka builds a rack-alternating broker list.
Example:
- Rack 1 → Brokers 0 and 1
- Rack 2 → Brokers 2 and 3
Normal order: 0, 1, 2, 3
Rack-alternating order: 0, 2, 1, 3
Now, if the leader for a partition is on broker 2 (rack 2):
- The first follower might be on broker 1 (rack 1).
- The second follower could be on broker 3 (rack 2) or broker 0 (rack 1), depending on the pattern.
Result:
- ✅ Each partition has replicas spread across different racks.
- ✅ If one rack goes down, there is still a live replica on another rack.
This dramatically improves fault tolerance and availability.
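One way to sketch the rack-alternating ordering in Python, using the two-rack layout above (the interleaving approach here is an illustration, not Kafka's exact code):

```python
from itertools import zip_longest

# Interleave the brokers of each rack to get a rack-alternating order.
racks = {"rack1": [0, 1], "rack2": [2, 3]}

alternating = [broker
               for group in zip_longest(*racks.values())
               for broker in group
               if broker is not None]
print(alternating)  # [0, 2, 1, 3] instead of the plain numeric order [0, 1, 2, 3]
```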
7. Step 6 — Choosing the disk directory for each partition
Once the broker for each replica is chosen, Kafka decides which disk (directory) on that broker will store the partition.
This is where the log.dirs setting comes in.
Example (a hypothetical log.dirs value with three data directories; actual paths depend on your mount points):
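```properties
# three data directories, one per physical disk (illustrative paths)
log.dirs=/data1/kafka,/data2/kafka,/data3/kafka
```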
Each path represents a mount point or disk.
Kafka uses a simple rule:
Place the new partition on the directory that currently has the fewest partitions.
So if /data3/kafka is empty or least used, new partitions go there first.
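A sketch of that rule in Python (the directory names and counts are made up for illustration):

```python
# "Fewest partitions wins": pick the log directory with the lowest partition count.
partition_counts = {
    "/data1/kafka": 12,
    "/data2/kafka": 11,
    "/data3/kafka": 0,   # a freshly added, still-empty disk
}

target_dir = min(partition_counts, key=partition_counts.get)
print(target_dir)  # /data3/kafka -> new partitions land on the emptiest directory
```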
Why this helps
- Balances storage evenly across disks.
- If you add a new disk, Kafka automatically starts using it immediately because it initially has fewer partitions.
- This dynamic balancing happens only during partition creation, not for existing data (Kafka doesn’t move old partitions automatically).
8. Example end-to-end
Let’s put it all together.
Cluster setup:
- 6 brokers (0–5)
- 10 partitions
- Replication factor = 3
- Rack-aware placement enabled
Resulting behavior:
- Kafka creates 30 replicas total.
- Distributes leaders evenly among brokers.
- Assigns followers at increasing broker offsets.
- Ensures all replicas of a partition are on different brokers.
- Uses rack info to ensure replicas are on different racks.
- On each broker, assigns replicas to the disk (directory) with the fewest partitions.
This yields:
- ✅ Balanced leader load
- ✅ Balanced follower load
- ✅ Fault tolerance across brokers and racks
- ✅ Balanced use of storage disks
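Those properties are easy to check programmatically; the sketch below verifies them for a hypothetical rack-aware assignment (all broker-to-rack mappings and replica lists are made up for illustration):

```python
# Check: every partition's replicas sit on distinct brokers and distinct racks.
rack_of = {0: "rack1", 1: "rack1", 2: "rack2", 3: "rack2", 4: "rack3", 5: "rack3"}
assignment = {
    0: [4, 2, 0],   # partition 0: leader on broker 4, followers on brokers 2 and 0
    1: [5, 1, 3],   # partition 1: leader on broker 5, followers on brokers 1 and 3
}

for partition, replicas in assignment.items():
    assert len(set(replicas)) == len(replicas), f"partition {partition} repeats a broker"
    racks = {rack_of[b] for b in replicas}
    assert len(racks) == len(replicas), f"partition {partition} repeats a rack"
```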
9. Why this matters
- Performance: Load is evenly distributed, so no broker or disk becomes a bottleneck.
- Reliability: Replicas on different brokers and racks protect against single points of failure.
- Scalability: Easy to add new brokers or disks — Kafka will place future partitions intelligently.
- Simplicity: Rules are deterministic and easy to reason about; administrators can predict where data will go.
10. Summary table
| Step | Goal | Mechanism |
|---|---|---|
| 1 | Even distribution | Spread replicas evenly across brokers |
| 2 | Avoid duplication | No two replicas of the same partition on one broker |
| 3 | Balanced leaders | Assign leaders round-robin |
| 4 | Balanced followers | Assign followers with offset placement |
| 5 | Rack fault tolerance | Use rack-alternating broker list |
| 6 | Disk balancing | Assign to directory with fewest partitions |
11. Key takeaway
When you create a topic, Kafka’s internal partition assignment algorithm:
- Ensures balanced, resilient distribution of data and leadership across brokers,
- Accounts for rack-level redundancy if available,
- And balances disk usage across available mount points.
This allows Kafka clusters to scale predictably and stay highly available even when disks, brokers, or entire racks fail.