Operator Chaining and Slot Parallelism

This page explains how Flink chains operators, how slots are assigned, and how Flink's execution model compares with Spark executor cores.


1. How Flink Chains Operators

Flink optimizes performance through operator chaining.

What is chaining?

If multiple operators can run in the same thread, Flink merges them into one physical task.

Example:

source → map → filter → flatMap → sink

If they share the same parallelism and no shuffle is needed, Flink chains them into one task, executed inside a single slot.
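
As a rough sketch of what such a fully chained pipeline looks like in the Java DataStream API (the source and the transformation logic are made up for illustration), every operator below ends up in a single chained task per parallel instance:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class ChainedPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromSequence(1, 100)                                // source (parallel)
           .map(n -> "item-" + n)                               // map: chains with the source
           .filter(s -> !s.isEmpty())                           // filter: chains with map
           .flatMap((String s, Collector<String> out) -> {      // flatMap: chains with filter
               for (String part : s.split("-")) {
                   out.collect(part);
               }
           })
           .returns(Types.STRING)                               // output type hint needed for the flatMap lambda
           .print();                                            // sink: chains as well

        env.execute("one chained task per parallel instance");
    }
}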

When are operators chained?

Operators are chained when:

  • They have the same parallelism
  • They don't require data redistribution (no keyBy, window, join, shuffle)
  • They run in the same task slot
  • They are compatible for chaining

Example of chainable operators: map, flatMap, filter, process, async I/O

When are operators NOT chained?

Operators are not chained when:

  • You use keyBy (introduces a network shuffle)
  • You use windowing (watermark alignment)
  • Join operations
  • Repartitioning operators (rebalance, broadcast, global)
  • Parallelism changes between operators

These break the chain and create a new task.
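
Chaining can also be influenced explicitly. A minimal sketch using the DataStream API's chaining hints (the pipeline itself is arbitrary; disableOperatorChaining, startNewChain, and disableChaining are the relevant calls):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingHints {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // env.disableOperatorChaining();   // global switch: never chain anything in this job

        env.fromSequence(1, 100)
           .map(n -> n * 2)                 // would normally chain with the source
           .startNewChain()                 // hint: begin a new chain at this map
           .filter(n -> n % 3 == 0)
           .disableChaining()               // hint: never chain this filter with anything
           .print();

        env.execute("explicit chaining hints");
    }
}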


2. How Slots Are Assigned (Important Concept)

Main rule:

One slot holds one operator chain (one pipeline of tasks).

Example:

TM1 slot1: source → map → filter
TM1 slot2: keyBy → window → reduce
TM2 slot1: keyBy → window → reduce
TM2 slot2: sink

Slots DO NOT correspond to individual operators.

If parallelism = 4:

  • Each operator chain runs as 4 parallel subtasks
  • Flink places those 4 subtasks into available slots across all TMs

If you have a cluster with:

  • 3 Task Managers
  • Each with 2 slots
  • Total slots = 6

You cannot run a job with parallelism > 6.

Slots are the capacity limit for subtasks.
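
A small sketch of how the two sides fit together (the slot count per Task Manager comes from the cluster's taskmanager.numberOfTaskSlots setting; the job itself only declares its parallelism):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotBudget {
    public static void main(String[] args) throws Exception {
        // Cluster side (flink-conf.yaml on each Task Manager), not part of the job code:
        //   taskmanager.numberOfTaskSlots: 2     -> 3 TMs x 2 slots = 6 slots in total
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Default parallelism for every operator in this job.
        // Anything up to 6 fits into the cluster above; a higher value
        // would leave subtasks without slots and the job could not start.
        env.setParallelism(4);

        env.fromSequence(1, 1000)
           .map(n -> n + 1)
           .print();

        env.execute("parallelism bounded by available slots");
    }
}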


3. Comparison with Spark Executors

Flink and Spark look similar at first, but they are fundamentally different.


Spark Executor (batch or micro-batch)

A Spark executor is a JVM process. Inside it:

  • N CPU cores
  • M memory
  • Runs multiple task threads concurrently

Executors are long-running, but they process batches, not continuous events.


Flink Task Manager (continuous streaming)

A Flink Task Manager is a JVM process. Inside it:

  • N slots (logical)
  • Slots run continuous streaming subtasks
  • Each subtask is like a small dedicated worker that never stops

Slots ≠ cores. Slots run operator chains continuously.


Key Differences Table

Concept              Spark                      Flink
Processing model     Micro-batch (mostly)       True continuous streaming
Execution units      Executors and tasks        Task Managers and slots
Task lifetime        Short-lived tasks          Long-running subtasks
Cores                Dedicated per executor     Shared across all slots
Operator chaining    Rare                       Core optimization
Backpressure         Coarse                     Very fine-grained
State                Mostly external            Native, large, fault-tolerant state
Checkpoints          RDD lineage recompute      Consistent snapshots

4. Putting it all together (Process Flow)

Example job:

source → map → filter → keyBy → window → reduce → sink

Step-by-step:

  1. Flink chains source + map + filter (same parallelism, no shuffle between them).

  2. With parallelism = 4, each operator chain runs as 4 subtasks:

     • 4 chain subtasks for source → map → filter
     • 4 subtasks for keyBy → window → reduce
     • 4 subtasks for the sink

  3. Total subtasks = 12.

  4. These 12 subtasks are placed into available slots.

  5. Each subtask runs continuously and maintains state.
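
A runnable sketch of this layout in the Java DataStream API (the source, key, and window size are arbitrary choices; the sink is explicitly kept out of the window chain so the physical plan matches the three chains described above):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TwelveSubtasks {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);                                          // 4 subtasks per operator chain

        env.fromSequence(0, 9999)                                       // source  \
           .map(n -> Tuple2.of("key-" + (n % 10), 1))                   // map      } one chain, 4 subtasks
           .returns(Types.TUPLE(Types.STRING, Types.INT))               //          |
           .filter(t -> t.f1 > 0)                                       // filter  /
           .keyBy(t -> t.f0)                                            // network shuffle: the chain ends here
           .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))   // keyBy -> window -> reduce: 4 subtasks
           .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1))
           .print()                                                     // sink: 4 subtasks
           .disableChaining();                                          // keep the sink as its own chain

        env.execute("3 chains x parallelism 4 = 12 subtasks");
    }
}

In the Flink web UI this job should show up as three vertices (one per chain), each with 4 running subtasks.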


5. Short Summary

  • Operators are chained when possible to reduce overhead.
  • One slot = one chain = one subtask, not one operator.
  • Spark executors run many short tasks; Flink slots run long-lived pipelines.
  • Slots determine parallelism; cores determine actual compute power.

Why 12 subtasks and not 3? You get 4 subtasks for each operator (or operator chain) because you set parallelism = 4.

So Flink creates:

  • 4 parallel instances of the source → map → filter chain
  • 4 parallel instances of the keyBy → window → reduce chain
  • 4 parallel instances of the sink

This gives 12 total subtasks, not 3.


Why? Because parallelism applies per operator (or operator chain), not per whole flow

In Flink:

  • The program is a DAG of operators
  • Each operator runs in parallel
  • Parallelism defines how many copies of each operator you run

Think of it like this:

If parallelism = 4:

  • You are telling Flink: "Run 4 copies of every operator independently."

So even if you write:

stream
  .map(...)
  .filter(...)
  .keyBy(...)
  .window(...)
  .reduce(...)
  .addSink(...)

Flink actually constructs parallel pipelines.


Let's visualize it

Logical code (what you write):

source → map → filter → keyBy → window → reduce → sink

Physical execution (parallelism = 4):

Step 1: Chainable operators

source, map, and filter get chained. This becomes one operator chain, but it is still parallel:

Chain 1
Chain 2
Chain 3
Chain 4

These run in 4 different slots (distributed across Task Managers).


Why do keyBy → window → reduce also get 4 subtasks?

Because they also have parallelism = 4.

They are not chained with the previous chain because keyBy requires a network shuffle.

So Flink generates:

  • 4 separate receiving tasks
  • Each gets a partition of keyed data

Then why does sink also have 4 subtasks?

Same reason: parallelism = 4.

Unless you explicitly set:

sink.setParallelism(1)

the sink uses the job's default parallelism (4 in this example), just like every other operator.
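
For example, a minimal sketch (the job itself is arbitrary) where only the sink is forced down to a single subtask while everything else keeps the default of 4:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SinkParallelismOverride {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);              // default: 4 subtasks per operator

        env.fromSequence(1, 100)
           .map(n -> "value-" + n)          // runs with 4 subtasks
           .print()                         // sink
           .setParallelism(1);              // override: a single sink subtask collects everything

        env.execute("per-operator parallelism override");
    }
}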


Key Idea

Parallelism applies per-operator, not per-entire-pipeline.

So if your DAG has 3 physical operators (chains), and each has parallelism = 4:

Operator or chain          Parallelism   Subtasks
Source → Map → Filter      4             4
KeyBy → Window → Reduce    4             4
Sink                       4             4

Total = 12 subtasks


Analogy

Imagine a factory line:

  • You have 4 workers doing Step 1 (source → map → filter)
  • 4 workers doing Step 2 (keyBy → window → reduce)
  • 4 workers doing Step 3 (sink)

Even though it's one "flow", each step needs 4 workers to keep up.

This is exactly how Flink distributes work.