
Kafka: Consuming Records with a Specific Offset🔗

Let's go through, step by step, how a Kafka consumer can be positioned at a specific offset before it starts reading.


1. Default Consumer Behavior🔗

When you use the Kafka Consumer API, you usually call:

consumer.poll(Duration.ofMillis(100));

When you call poll(), the consumer returns records starting from its current position in each partition it is assigned. On startup, that position is initialized to the last committed offset for your consumer group.

  • The offset is a numerical position within a Kafka partition.
  • The committed offset is the last message position your consumer group marked as “processed.”

So, by default, Kafka resumes consumption exactly where it left off, which under normal operation means no messages are skipped or reprocessed.
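
For context, here is a minimal sketch of that default loop. The broker address, group id, and topic name ("events") are illustrative assumptions, not values from the original:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DefaultConsumeLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "example-group");           // group whose committed offsets are used
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // hypothetical topic name
            while (true) {
                // Resumes from the group's last committed offset; no seeking involved.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}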


2. Starting from a Different Offset🔗

Sometimes, you might not want to start reading from the last committed position. You may want to:

  • Reprocess all messages from the beginning.
  • Skip to the most recent messages only.
  • Or start at an exact offset you specify.

Kafka provides APIs that allow you to manually control where the consumer begins reading.
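
One prerequisite applies to all three APIs below: a consumer can only seek on partitions it is currently assigned. When you want this level of manual control, a common pattern is to assign() partitions explicitly instead of subscribing. A minimal sketch, assuming consumer is a configured KafkaConsumer<String, String> as above; the topic and partition are illustrative:

import java.util.List;
import org.apache.kafka.common.TopicPartition;

// Manual assignment: no group rebalancing, and the consumer may seek immediately.
TopicPartition partition0 = new TopicPartition("events", 0); // hypothetical topic and partition
consumer.assign(List.of(partition0));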


3. seekToBeginning()🔗

If you want to read all messages in a partition — from the very first offset — you can use:

consumer.seekToBeginning(Collection<TopicPartition> partitions);

What happens:

  • This tells Kafka to move the consumer’s position to the earliest offset available in the log for each partition in the list.
  • On the next poll(), Kafka will start returning messages from the oldest record onward.

Use case example:

  • You’re debugging or replaying historical data.
  • You want to rebuild a downstream database or reprocess all records.
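
A minimal sketch, continuing from the manually assigned consumer above:

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecords;

// Rewind every assigned partition to the earliest offset still retained in the log.
consumer.seekToBeginning(consumer.assignment());

// The seek is applied lazily: the next poll() starts at the oldest available record.
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));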

4. seekToEnd()🔗

If you only care about new messages that arrive after you start consuming, use:

consumer.seekToEnd(Collection<TopicPartition> partitions);

What happens:

  • This moves the consumer’s position to the latest offset in each partition.
  • On the next poll(), Kafka will start returning only new messages published after this point.

Use case example:

  • You’re monitoring live events and only want real-time updates, not historical data.
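
The same pattern, sketched with seekToEnd() on the consumer from the earlier snippets:

// Jump past everything already in the log for the assigned partitions.
consumer.seekToEnd(consumer.assignment());

// From here on, poll() returns only records produced after this point.
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));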

5. seek() — Moving to a Specific Offset🔗

For even more control, you can use:

consumer.seek(TopicPartition partition, long offset);

This allows you to position the consumer exactly at a specific offset.

What happens:

  • You manually tell Kafka which record position to start from.
  • On the next poll(), consumption begins from that offset, as the sketch below shows.
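
A minimal sketch; partition0 is the manually assigned partition from earlier, and lastProcessedOffset is a hypothetical value your application persisted itself:

long lastProcessedOffset = 41L; // hypothetical: restored from your own store

// Resume just after the last record that was fully processed.
consumer.seek(partition0, lastProcessedOffset + 1);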

Use case examples:

  1. Recovery scenario — if your application crashed while writing data to a file, you can restart it and seek() back to the last successfully written offset.
  2. Skipping stale data — if you detect that you’re falling behind, you can seek() ahead to a newer offset and catch up faster.
  3. Time-based replay — with the help of offsetsForTimes(), you can find the offset corresponding to a timestamp, then use seek() to start from that point, as sketched after this list.
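
A sketch of that time-based replay, again using the hypothetical partition0:

import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

long oneHourAgo = System.currentTimeMillis() - 60 * 60 * 1000L;

// Ask the brokers for the earliest offset whose timestamp is at or after the target.
Map<TopicPartition, OffsetAndTimestamp> result =
        consumer.offsetsForTimes(Map.of(partition0, oneHourAgo));

OffsetAndTimestamp hit = result.get(partition0);
if (hit != null) { // null when no record at or after that timestamp exists
    consumer.seek(partition0, hit.offset());
}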

6. When These APIs Are Useful🔗

These APIs are particularly useful when:

  • You manage offsets manually instead of relying on Kafka’s automatic commits.
  • You need custom recovery or replay logic.
  • You are building tools for data reprocessing, debugging, or time-travel (reading old events).

7. Important Note🔗

After you use seekToBeginning(), seekToEnd(), or seek(), the new position takes effect only on the next call to poll(). Unless auto-commit is enabled, Kafka will not commit this new position for you: commit offsets yourself (using commitSync() or commitAsync()) if you want Kafka to remember your new starting point for the next session.
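
For example, to persist the position explicitly after processing (partition0 and lastProcessedOffset are the illustrative values from earlier, and the +1 follows Kafka's convention that the committed offset is the next record to read):

import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;

// Commit the offset of the *next* record to read for this partition.
consumer.commitSync(Map.of(partition0, new OffsetAndMetadata(lastProcessedOffset + 1)));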


Summary🔗

Method              What It Does                               When to Use
seekToBeginning()   Start reading from the earliest offset     Replay all data
seekToEnd()         Skip to the latest offset                  Only process new messages
seek()              Jump to a specific offset                  Custom replay/recovery logic