How Does the Producer Guarantee Exactly-Once Semantics in Kafka?🔗
Excellent question — this gets to the core of Kafka’s delivery guarantees and offset management model.
Let’s go step-by-step: first, we’ll clarify who controls offsets, and then explain how the producer achieves exactly-once semantics (EOS) using producer IDs, epochs, and transactional commits.
🧩 1. Who maintains offsets — broker or producer?🔗
✅ Offsets are maintained by the broker.🔗
Let’s unpack that clearly.
Producer’s view🔗
When a producer sends messages to a Kafka topic-partition, it does not know the offsets in advance. Offsets are assigned by the partition leader broker when it writes the batch to its log.
Process:
- The producer creates a batch of messages.
- It sets the batch's base offset to 0 as a placeholder (records carry offsets relative to the batch).
- It sends the batch to the leader broker for that partition.
- The leader writes the batch to the partition's log file, assigning real offsets sequentially.
- The broker then returns a ProduceResponse to the producer that includes:
  - The base offset of the first record.
  - Any error codes.
So, the producer never generates offsets — the broker does.
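A minimal sketch of this round trip, assuming a local broker and a placeholder topic named orders: the producer learns the real offset only from the RecordMetadata that carries the ProduceResponse result.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OffsetCallbackSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "key-1", "value-1"),
                    (metadata, exception) -> {
                        if (exception == null) {
                            // The offset was assigned by the leader broker, not by the producer.
                            System.out.printf("partition=%d offset=%d%n",
                                    metadata.partition(), metadata.offset());
                        }
                    });
        } // close() flushes the batch and waits for the ProduceResponse
    }
}
```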
Consumer’s view🔗
For consumers, offsets represent their position in the log — how far they’ve read.
- Consumers read offsets assigned by brokers.
- They commit offsets (to Kafka or external store) to mark progress.
So, offsets are a broker-managed sequence that both producers and consumers depend on.
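A minimal sketch of the consumer side, with placeholder topic and group names: it reads the broker-assigned offsets and commits its progress back to Kafka.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "orders-processor");        // placeholder group
        props.put("enable.auto.commit", "false");         // commit explicitly below
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // saves the position to __consumer_offsets
            }
        }
    }
}
```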
⚙️ 2. How Kafka assigns offsets (summary)🔗
| Stage | Who assigns | Description |
|---|---|---|
| Produce | Broker (leader) | Assigns offsets sequentially per partition as batches are appended. |
| Consume | Broker provides | Consumer fetch requests specify the starting offset; the broker returns records with their offsets. |
| Commit | Consumer | Saves last processed offset to Kafka (__consumer_offsets topic). |
Offsets are strictly increasing and immutable within a partition — ensuring total order for that partition.
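One practical consequence, shown in this minimal sketch (topic, partition, and offset are placeholders): because an offset always points at the same record, a consumer can deterministically replay from any position.

```java
import java.util.List;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class SeekSketch {
    // Rewinds a consumer to a fixed offset; safe because offsets never change.
    static void replayFrom(KafkaConsumer<String, String> consumer, long offset) {
        TopicPartition tp = new TopicPartition("orders", 0); // placeholder partition
        consumer.assign(List.of(tp));
        consumer.seek(tp, offset); // the next poll() returns records from this offset onward
    }
}
```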
🧠 3. How producers ensure exactly-once semantics (EOS)🔗
Kafka’s exactly-once semantics were introduced in Kafka 0.11 through idempotent producers and transactions.
Let’s break this into two layers:
Layer 1 — Idempotent producer (no duplicates)🔗
Normally, when a producer retries (due to a temporary network failure), the same message could be written multiple times.
Example:
- Producer sends message M1.
- Broker receives it, writes it, but the response back to producer is lost.
- Producer retries M1.
- Without EOS, the broker writes M1 again — duplicates.
To fix this, Kafka introduced the idempotent producer, which ensures:
"No message will be written twice to the same topic-partition, even if retries happen."
🧩 How idempotent producers work🔗
Each producer is assigned three key identifiers:
| Field | Description |
|---|---|
| Producer ID (PID) | Unique 64-bit ID assigned by the broker when the producer starts. |
| Producer Epoch | Incremented when the producer restarts (used to detect old sessions). |
| Sequence Number | Incremented for each record sent to a partition. |
Each partition the producer writes to has its own sequence counter.
🔁 Example🔗
| Message | PID | Partition | Sequence |
|---|---|---|---|
| M1 | 500 | 0 | 0 |
| M2 | 500 | 0 | 1 |
| M3 | 500 | 0 | 2 |
If a retry occurs, and M2 is sent again with the same PID and sequence, the broker checks:
“Have I already seen (PID=500, partition=0, seq=1)?”
If yes → duplicate ignored. If no → accept and append.
Thus, retries no longer create duplicates.
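The check itself is broker-internal. The following is a conceptual sketch of the logic only, not Kafka's actual broker code (the real broker tracks the last few batch sequence numbers per producer per partition):

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model of one partition's duplicate check; not Kafka's real implementation.
class PartitionDedupSketch {
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();

    // Returns true if the record should be appended, false if it is a duplicate retry.
    boolean shouldAppend(long pid, int seq) {
        Integer lastSeq = lastSeqByPid.get(pid);
        if (lastSeq != null && seq <= lastSeq) {
            return false; // already saw (PID, seq) on this partition: ignore it
        }
        lastSeqByPid.put(pid, seq);
        return true;
    }
}
```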
✅ Key guarantees from idempotent producer🔗
- Each (PID, partition, sequence) tuple is unique and ordered.
- The broker uses this metadata to detect and ignore duplicate writes.
- This works automatically when enable.idempotence=true is set on the producer (the default since Kafka 3.0); it also requires acks=all and retries enabled, as shown in the sketch below.
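A minimal configuration sketch for Layer 1 (the broker address is a placeholder):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class IdempotentProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("enable.idempotence", "true"); // broker dedups on (PID, partition, seq)
        props.put("acks", "all");                // required by idempotence
        props.put("max.in.flight.requests.per.connection", "5"); // must be <= 5
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}
```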
Layer 2 — Transactions (atomic multi-partition writes)🔗
Idempotent producers guarantee no duplicates per partition, but what if a producer writes to multiple partitions or topics as part of one logical operation?
Example: a single logical operation that writes, say, an order event to an orders topic and a matching payment event to a payments topic.
We want either both messages committed, or neither (atomicity).
This is achieved with Kafka transactions.
🧩 How transactions work🔗
Each transaction groups multiple produce requests into a single atomic unit.
Steps:
1. The producer registers its transactional.id and obtains a PID and epoch by calling initTransactions().
2. It begins a transaction with beginTransaction() and sends records to one or more topic-partitions.
3. It commits with commitTransaction() or rolls back with abortTransaction() (see the sketch after this list).
4. The broker uses a transaction coordinator (one per transactional.id) to manage transaction state.
5. Kafka marks the affected records as:
   - Committed (visible to consumers)
   - Aborted (hidden from consumers)
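A minimal sketch of this flow, with placeholder transactional.id and topic names. Error handling follows a common pattern: a fatal fencing error closes the producer, while recoverable errors abort the transaction.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TransactionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder address
        props.put("transactional.id", "order-processor-1"); // placeholder id; implies idempotence
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions(); // registers with the transaction coordinator
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("orders", "k1", "order-created"));
            producer.send(new ProducerRecord<>("payments", "k1", "payment-captured"));
            producer.commitTransaction(); // both records become visible atomically
        } catch (ProducerFencedException e) {
            producer.close();             // a newer producer epoch fenced this one off
        } catch (KafkaException e) {
            producer.abortTransaction();  // neither record becomes visible
        }
    }
}
```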
🧱 Transaction metadata on broker🔗
Brokers store transaction state in an internal topic, __transaction_state.
It contains info about:
- Transaction IDs
- Producers’ current epochs
- Whether transactions are ongoing, committed, or aborted
Consumers that are transaction-aware (using the isolation level read_committed) only see messages from committed transactions, as in the sketch below.
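A minimal consumer configuration sketch (the address and names are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "orders-processor");        // placeholder group
        props.put("isolation.level", "read_committed");   // default is read_uncommitted
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // poll() will now return only records from committed transactions
        // and skip over aborted ones.
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}
```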
⚖️ Combining the two layers🔗
| Feature | Provided by | Ensures |
|---|---|---|
| Idempotence | Producer ID + sequence numbers | No duplicates per partition |
| Transactions | Transaction coordinator | Atomicity across multiple partitions |
| Read committed isolation | Consumers | Consumers only see committed data |
Together, these provide exactly-once semantics (EOS) end-to-end.
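To see the layers working together, here is a hedged sketch of the classic consume-transform-produce loop, where the output records and the consumer's offsets commit in one transaction. The topics, group, and the uppercase "transform" step are placeholders; it assumes a read_committed consumer and a transactional producer configured as in the earlier sketches.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

public class ExactlyOnceLoopSketch {
    static void run(KafkaConsumer<String, String> consumer,
                    KafkaProducer<String, String> producer) {
        consumer.subscribe(List.of("orders"));
        producer.initTransactions();
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            if (records.isEmpty()) continue;
            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("processed-orders", r.key(),
                            r.value().toUpperCase())); // stand-in transform step
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                            new OffsetAndMetadata(r.offset() + 1)); // next offset to read
                }
                // Consumer progress commits atomically with the produced records.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (KafkaException e) {
                producer.abortTransaction(); // output and offsets both roll back
            }
        }
    }
}
```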
✅ Exactly-once delivery flow (step-by-step)🔗
| Step | Action | Who handles it | Guarantee |
|---|---|---|---|
| 1 | Producer sends messages with PID + seq | Producer → Broker | Broker detects duplicates |
| 2 | Broker appends unique messages | Broker | No duplicate writes |
| 3 | Producer commits transaction | Producer → Coordinator | Atomic commit across partitions |
| 4 | Consumer reads messages | Consumer | Sees each committed record exactly once |
💡 4. Putting it all together🔗
| Concept | Controlled by | Purpose |
|---|---|---|
| Offsets | Broker | Identify message position in partition log |
| Producer ID, Epoch, Sequence | Producer + Broker | Deduplicate writes (idempotence) |
| Transaction Coordinator | Broker (special role) | Group multiple writes into a single atomic commit |
| Consumer offsets | Consumer (stored in __consumer_offsets) | Track progress per consumer group |
| Exactly-once semantics (EOS) | Producer + Broker + Consumer | Ensure one logical event = one delivery, no duplicates, no loss |
🔐 5. Summary🔗
| Question | Answer |
|---|---|
| Who assigns offsets? | The broker assigns offsets sequentially per partition. Producers do not set them. |
| Who maintains offsets? | Brokers maintain partition offsets; consumers maintain their read offsets separately. |
| How does the producer ensure exactly-once? | By using idempotence (PID + sequence numbers) and transactions (atomic commits). |
| How are duplicates prevented? | Broker checks (PID, partition, sequence) to detect and ignore duplicates. |
| How are multi-partition writes atomic? | Kafka’s transaction coordinator manages commit or abort for all partitions in the transaction. |
✅ In simple terms:
- Offsets are always assigned and managed by the broker.
- Producers achieve exactly-once semantics using a combination of:
  - Idempotent producers (no duplicates),
  - Transactions (atomic commits), and
  - Read-committed consumers (see only committed data).
Together, these make Kafka capable of true end-to-end exactly-once delivery — even across failures, retries, and multi-partition writes.