What happens when a relection happens and how is idempotency still preserved?🔗
1. The situation: a leader re-election happens🔗
Let’s start with what causes this.
A leader re-election occurs when the broker currently leading a partition:
- Crashes,
- Is shut down,
- Loses connection,
- Or is demoted by the controller.
Then, one of the in-sync replicas (ISR) becomes the new leader.
So now:
- Old leader = broker A (down)
- New leader = broker B
2. What happens to producer writes in flight?🔗
Suppose a producer was sending messages to partition P0, which had leader = broker A.
Before broker A crashed:
- The producer sent batches with sequence numbers 100–109.
- Some of them were acknowledged.
- Some were still in-flight (not acknowledged yet).
Now broker A goes down, broker B becomes the leader.
The question is:
How does broker B know what the last sequence number was for this producer, so it can continue correctly and not write duplicates?
3. The challenge🔗
- Producer’s idempotence relies on
(ProducerID, Partition, Sequence)tracking. - The old leader (broker A) kept an in-memory table tracking this information:
So, how does broker B know where to continue?
4. The solution — Leader Epoch and Log Recovery🔗
When a new leader is elected, it does log recovery based on leader epochs.
Let’s unpack these two important ideas.
a. Leader Epoch🔗
Each time a new leader is elected for a partition, Kafka increments the leader epoch (a small integer counter).
This number identifies “who was leader when this data was written”.
Example:
| Epoch | Leader Broker | Description |
|---|---|---|
| 0 | Broker A | Original leader |
| 1 | Broker B | After re-election |
| 2 | Broker C | Next re-election |
Every log entry on disk includes its leader epoch, so when replicas sync, they can tell which messages were written by which leader.
This is part of the batch header:
b. Log Recovery and Truncation🔗
When a new leader (say broker B) takes over, it:
- Loads the partition log from disk.
- Uses the last known committed offsets and leader epochs to truncate any uncommitted or divergent messages.
- Ensures the log is consistent with the rest of the in-sync replicas.
So, when broker B becomes leader, it knows exactly which messages are truly committed.
This guarantees that the new leader’s log is identical to what all in-sync replicas have.
5. Producer recovery after leader failover🔗
Now, back to the producer.
After broker A fails:
- The producer keeps retrying (because
acks=allorretries > 0). - It gets a NotLeaderForPartition error from a broker (meaning “the leader changed”).
- The producer then fetches new metadata from Kafka and learns that broker B is now the leader.
- It resends its next message batches to broker B.
6. How the new leader validates sequence numbers🔗
Broker B (the new leader) looks at the incoming batch:
(PID=1234, seq=100–109, epoch=E1)
Broker B maintains producer state on disk, stored inside the partition’s log. When it became leader, it rebuilt this state from the log segments (the “producer state snapshot”).
So it knows the latest acknowledged sequence for each producer.
Example:
Now if the producer retries an old batch (sequence 100–109), broker B detects:
“These sequence numbers are <= last seen → duplicates → ignore.”
If the producer sends a new batch (seq=110–119), broker B appends them as expected.
✅ Result:
- Duplicates avoided
- Order preserved
- Producer resumes seamlessly after leader re-election
7. What if the producer itself restarts after failover?🔗
If the producer crashes and restarts, it loses its local sequence counters.
That’s where the producer epoch comes in.
Producer Epoch = version number for the producer session.🔗
Each time a producer with the same transactional.id restarts:
- The transaction coordinator assigns it a new producer epoch.
- The new epoch signals to brokers that this is a new session.
Brokers then:
- Accept messages from the new epoch,
- Reject any late or duplicate messages from old epochs.
So, if an old batch (epoch=1) arrives after a restart (new epoch=2), the broker discards it.
8. How all these pieces fit together🔗
| Mechanism | Controlled by | Purpose |
|---|---|---|
| Producer ID (PID) | Broker assigns | Identifies producer instance |
| Sequence numbers | Producer assigns | Ordered numbering of messages per partition |
| Producer epoch | Broker increments | Distinguishes new producer session after restart |
| Leader epoch | Broker increments per partition | Distinguishes new partition leader during re-election |
| Producer state snapshot | Broker (on disk) | Tracks last sequence for each PID |
| Log truncation | New leader broker | Removes uncommitted data from previous leader |
| Metadata refresh | Producer | Learns new leader after failover |
Together, these ensure that idempotence and exactly-once semantics remain intact even across leader failures.
9. Step-by-step: EOS through leader failover🔗
Let’s walk through a real example.
| Step | Action | What Happens |
|---|---|---|
| 1 | Broker A is leader | Producer sends (PID=1234, seq=100–109) |
| 2 | Broker A crashes mid-write | Some messages written, some pending |
| 3 | Broker B becomes leader | Rebuilds state, truncates uncommitted data |
| 4 | Producer retries batch | Still uses (PID=1234, seq=100–109) |
| 5 | Broker B checks snapshot | Sees duplicates → ignores |
| 6 | Producer sends new batch (seq=110–119) | Broker B accepts and appends |
| ✅ | Result | No duplicates, no lost messages, consistent order |
10. Summary: Kafka’s idempotence + leader epochs🔗
| Concept | Description | Role in EOS |
|---|---|---|
| Producer ID (PID) | Unique ID per producer | Identify source of messages |
| Sequence Number | Increment per partition | Detect duplicates, maintain order |
| Producer Epoch | Increment per restart | Ignore stale messages after restart |
| Leader Epoch | Increment per leader re-election | Ensure log consistency after failover |
| Producer State Snapshot | Stored on broker | Recover last sequence after failover |
| Retry behavior | Retains same sequence for same batch | Prevent duplicates across retries or failovers |
✅ In short:
- Offsets are still assigned by the broker.
- Sequence numbers are assigned by the producer.
- When the leader changes, the new leader rebuilds its producer state from the log and uses the leader epoch to stay consistent.
- If the producer retries the same batch, the sequence numbers remain the same, and the new leader detects duplicates and discards them.
- Producer epoch protects against old messages from previous producer sessions.
Together, this combination of PID, sequence numbers, producer epoch, and leader epoch allows Kafka to maintain exactly-once semantics — even across broker failovers and leader re-elections.