Processing vs Event Time in Flinkπ
1. Processing Time (System Time)π
Processing time means:
βUse the time of the machine where the event is being processed.β
So the timestamp comes from the system clock of the Task Manager.
Characteristicsπ
- Fastest
- No need for watermarks
- No waiting for late events
- Sensitive to system load, network delays, backpressure
Exampleπ
If an event arrives at 12:00:10 PM system time, thatβs its processing time β even if that event was actually produced at 11:59:55 AM.
Use casesπ
- Basic dashboards
- Monitoring where slight inaccuracies are okay
- High-throughput counters
- No out-of-order handling needed
2. Event Timeπ
Event time means:
βUse the timestamp inside the event β when it actually occurred in the real world.β
Flink extracts the timestamp from the event using:
Key characteristicsπ
- Handles late and out-of-order events
- Uses watermarks to decide when to close windows
- Most accurate
- Slightly slower (waiting for late events)
Exampleπ
Suppose a sensor generated data at 11:59:55 AM but reached Flink at 12:00:10 PM.
- Event time = 11:59:55
- Processing time = 12:00:10
Event time gives βreal-world correctnessβ instead of βarrival-time correctness.β
3. Why event time is criticalπ
Real streaming data is almost always out-of-order.
Events can arrive late due to:
- Network delays
- Clock skew
- Mobile devices going offline
- Batch uploads
- Retries in Kafka
If you use processing time, windows close too early.
If you use event time, windows only close when watermarks say βwe have seen almost all events.β
4. Watermarks (core to event time)π
Event time requires watermarks β these tell Flink:
βNo more events older than this timestamp should arrive.β
Watermarks = progress indicators of event time.
Example:
If watermark = 12:00:00, Flink believes:
- All events with timestamps β€ 12:00:00 have arrived
- Safe to close windows up to 12:00:00
Late events after this watermark go to late arrival handling (drop or side outputs).
5. Event Time vs Processing Time: Side-by-Side Comparisonπ
| Feature | Processing Time | Event Time |
|---|---|---|
| Timestamp source | System clock | Inside event |
| Handles out-of-order? | No | Yes |
| Accuracy | Low | High |
| Throughput | High | Medium |
| Requires watermarks? | No | Yes |
| Windows | Close based on arrival time | Close based on event time |
| Late event support | Not possible | Supported |
| Use cases | Monitoring, quick stats | Analytics, correctness-critical |
6. Example with a windowπ
Letβs say you have a 1-minute window.
If using processing time:π
Window 12:00:00β12:01:00 closes at 12:01:00 system time. If an event with timestamp 11:59:50 arrives at 12:00:50, it goes into the running window. If it arrives at 12:01:05, it is too late.
If using event time:π
Window 12:00:00β12:01:00 closes when watermark passes 12:01:00.
If watermark = event_time β 5 seconds:
- Out-of-order events up to 5 seconds late are accepted
- Windows close correctly based on event timestamps
7. When to use what?π
Use event time when:π
- Data comes late, out of order (common in Kafka, IoT, logs)
- Need accurate results (analytics, billing, fraud detection)
- You use windows, joins, aggregations
Use processing time when:π
- You need ultra-low latency
- Out-of-order is not a concern
- Simple transformations, monitoring, counters
8. One-line summaryπ
- Processing time = use machine clock β fast but inaccurate
- Event time = use timestamp inside event β accurate, requires watermarks