Operator Chaining and Slot Parallelism
Here is a clear explanation of how Flink chains operators, how slots are assigned, and how this compares with Spark executor cores.
1. How Flink Chains Operators
Flink improves performance through operator chaining.
What is chaining?
If multiple operators can run in the same thread, Flink merges them into one physical task.
Example: if consecutive operators (such as a source followed by map and filter) share the same parallelism and no shuffle is needed between them, Flink chains them into one task, executed inside a single slot.
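A minimal sketch of such a chainable pipeline using Flink's DataStream API (the elements, function bodies, and class name are illustrative):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // source -> map -> filter: same parallelism, no shuffle,
        // so Flink fuses all three into one physical task in one slot
        env.fromElements("a", "bb", "ccc")   // source
           .map(String::length)              // map (chained)
           .returns(Types.INT)               // helps Flink's type extraction for the lambda
           .filter(len -> len > 1)           // filter (chained)
           .print();                         // sink

        env.execute("chaining-example");
    }
}
```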
When are operators chained?
Operators are chained when:
- They have the same parallelism
- They don't require data redistribution (no keyBy, window, join, or shuffle)
- They run in the same task slot
- Their chaining strategy allows it (chaining has not been explicitly disabled)
Examples of chainable operators: map, flatMap, filter, process, async I/O.
When are operators NOT chained?
Operators are not chained when:
- You use keyBy (introduces a network shuffle)
- You use windowing (watermark alignment)
- You use join operations
- You use repartitioning operators (rebalance, broadcast, global)
- Parallelism changes between operators
These break the chain and create a new task.
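Chaining can also be controlled explicitly. A short sketch (the pipeline body is illustrative; disableChaining, startNewChain, and disableOperatorChaining are the actual DataStream API calls):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainControlExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Disable chaining for the whole job:
        // env.disableOperatorChaining();

        env.fromElements(1, 2, 3)
           .map(x -> x * 2)
           .disableChaining()      // this operator is never chained to its neighbors
           .filter(x -> x > 2)
           .startNewChain()        // chainable, but begins a fresh chain here
           .print();

        env.execute("chain-control-example");
    }
}
```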
2. How Slots Are Assigned (Important Concept)
Main rule:
One slot holds one operator chain (one pipeline of tasks).
Example:
TM1 slot1: source → map → filter
TM1 slot2: keyBy → window → reduce
TM2 slot1: keyBy → window → reduce
TM2 slot2: sink
Slots DO NOT correspond to individual operators.
If parallelism = 4:
- Each operator chain runs as 4 parallel subtasks
- Flink places those 4 chains into available slots across all TMs
If you have a cluster with:
- 3 Task Managers
- Each with 2 slots
- Total slots = 6
You cannot run a job with parallelism > 6.
Slots are the capacity limit for subtasks.
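A sketch of how that budget constrains a job (assuming the 3 × 2 cluster above; the pipeline body is illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismBudget {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Total slots = 3 TMs x 2 slots = 6, the maximum this cluster can run:
        env.setParallelism(6);

        // Requesting parallelism 8 would need 8 slots (one per pipeline instance);
        // with only 6 available, the job fails with a slot allocation timeout.

        env.fromElements(1, 2, 3).map(x -> x + 1).print();
        env.execute("parallelism-budget");
    }
}
```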
3. Comparison with Spark Executors
Flink and Spark look similar at first, but they are fundamentally different.
Spark Executor (batch or micro-batch)
Spark Executor = JVM process. Inside it:
- N CPU cores
- M memory
- Runs multiple task threads concurrently
Executors are long-running, but they process batches, not continuous events.
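For reference, those resources are requested at submission time. A sketch (the numbers and my-job.jar are illustrative placeholders; the flags are standard spark-submit options):

```bash
# 3 executor JVMs, each with N = 4 cores (task threads) and M = 8 GB of memory:
spark-submit \
  --num-executors 3 \
  --executor-cores 4 \
  --executor-memory 8g \
  my-job.jar
```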
Flink Task Manager (true streaming)
Task Manager = JVM process. Inside it:
- N slots (logical)
- Slots run continuous streaming subtasks
- Each subtask is like a small dedicated worker that never stops
Slots ≠ cores. Slots run operator chains continuously.
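The slot count is a configuration choice per Task Manager, not a hardware fact. A sketch of the relevant flink-conf.yaml entries (the values are illustrative; the keys are the actual Flink configuration options):

```yaml
# Each Task Manager JVM offers 2 logical slots:
taskmanager.numberOfTaskSlots: 2

# Job parallelism used when none is set in code:
parallelism.default: 4
```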
Key Differences Table
| Concept | Spark | Flink |
|---|---|---|
| Processing model | Micro-batch (mostly) | True continuous streaming |
| Execution units | Executors and tasks | Task Managers and slots |
| Task lifetime | Short-lived tasks | Long-running subtasks |
| Cores | Dedicated per executor | Shared across all slots |
| Operator chaining | Rare | Core optimization |
| Backpressure | Coarse | Very fine-grained |
| State | Mostly external | Native, large, fault-tolerant state |
| Checkpoints | RDD lineage recompute | Consistent snapshots |
4. Putting it all together (Process Flow)
Example job:
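The job discussed throughout is source → map → filter → keyBy → window → reduce → sink at parallelism 4. A minimal sketch of such a job (the socket source, key selector, window size, and function bodies are illustrative):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ExampleJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // every operator (or chain) gets 4 subtasks

        env.socketTextStream("localhost", 9999)                        // source
           .map(line -> Tuple2.of(line, 1))                            // map (chained with source)
           .returns(Types.TUPLE(Types.STRING, Types.INT))
           .filter(t -> !t.f0.isEmpty())                               // filter (chained)
           .keyBy(t -> t.f0)                                           // network shuffle: chain breaks here
           .window(TumblingProcessingTimeWindows.of(Time.seconds(10))) // window
           .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))             // reduce
           .print();                                                   // sink

        env.execute("example-job");
    }
}
```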
Step-by-step:
1. Flink chains map and filter with the source (same parallelism).
2. Parallelism = 4 creates 4 subtasks per operator chain:
   - 4 chain subtasks for source → map → filter
   - 4 subtasks for keyBy → window → reduce
   - 4 subtasks for sink
3. Total subtasks = 12.
4. These 12 subtasks are placed into available slots.
5. Each subtask runs continuously and maintains state.
5. Short Summary
- Operators are chained when possible to reduce overhead.
- One slot = one chain = one subtask, not one operator.
- Spark executors run many short tasks; Flink slots run long-lived pipelines.
- Slots determine parallelism; cores determine actual compute power.
You get 4 subtasks for each operator (or operator chain) because you set parallelism = 4.
So Flink creates:
- 4 parallel instances of the source→map→filter chain
- 4 parallel instances of the keyBy→window→reduce chain
- 4 parallel instances of the sink
This gives 12 total subtasks, not 3.
Why? Because parallelism applies per operator (or operator chain), not per whole flow
In Flink:
- The program is a DAG of operators
- Each operator runs in parallel
- Parallelism defines how many copies of each operator you run
Think of it like this:
If parallelism = 4:
- You are telling Flink: "Run 4 copies of every operator independently."
So even if you write a single linear pipeline (like the example job above), Flink actually constructs 4 parallel pipelines.
Let's visualize it
Logical code (what you write):

source → map → filter → keyBy → window → reduce → sink

Physical execution (parallelism = 4):
Step 1: Chainable operators
source, map, filter → get chained. This becomes one operator chain, but it is still parallel.
Step 2: Flink creates 4 copies (subtasks) of this chain

[source → map → filter] subtask 1
[source → map → filter] subtask 2
[source → map → filter] subtask 3
[source → map → filter] subtask 4

These run in 4 different slots (distributed across Task Managers).
Why do keyBy → window → reduce also get 4 subtasks?
Because they also have parallelism = 4.
They are not chained to the previous chain because keyBy requires a network shuffle.
So Flink generates:
- 4 separate receiving tasks
- Each gets a partition of keyed data
Then why does the sink also have 4 subtasks?
Same reason: parallelism = 4.
Unless you explicitly set a sink-specific parallelism, the sink inherits its parallelism from upstream.
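A sketch of that override (setParallelism on a sink is the actual DataStream API call; the pipeline body is illustrative):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SinkParallelism {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // default for all operators

        env.fromElements(1, 2, 3)
           .map(x -> x * 2)     // runs with 4 subtasks
           .print()             // sink...
           .setParallelism(1);  // ...overridden to a single subtask

        env.execute("sink-parallelism");
    }
}
```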
Key Idea
Parallelism applies per-operator, not per-entire-pipeline.
So if your DAG has 3 physical operators (chains), and each has parallelism = 4:
| Operator or chain | Parallelism | Subtasks |
|---|---|---|
| Source→Map→Filter | 4 | 4 |
| KeyBy→Window→Reduce | 4 | 4 |
| Sink | 4 | 4 |
Total = 12 subtasks
Analogy
Imagine a factory line:
- You have 4 workers doing Step 1 (source→map→filter)
- 4 workers doing Step 2 (keyBy→window→reduce)
- 4 workers doing Step 3 (sink)
Even though it's one "flow", each step needs 4 workers to keep up.
This is exactly how Flink distributes work.