Skip to content

Task Manager in FlinkπŸ”—

A Task Manager in Apache Flink is a worker process that performs the actual data processing. If the Job Manager is the brain, the Task Manager is the muscle.

Here is a clear, complete explanation:


What is a Task Manager?πŸ”—

A Task Manager (TM) is a JVM process that:

  • Runs your operators (map, filter, window, join, aggregate, etc.)
  • Stores state for stateful streaming jobs
  • Manages network buffers
  • Processes data from sources and sends results to sinks

Every Task Manager contributes slots (parallel execution units) to the cluster.


What does the Task Manager do?πŸ”—

1. Executes tasks assigned by the Job ManagerπŸ”—

Each Flink operator (map, flatMap, join, window, etc.) becomes one or more subtasks based on parallelism.

Task Manager runs these subtasks.

Example: If parallelism = 4, you get 4 instances of each operator spread across available TMs.


2. Manages task slotsπŸ”—

Each Task Manager has a fixed number of slots.

For example, a TM with:

  • 4 CPU cores
  • 4 slots

can run 4 parallel subtasks at the same time.

A slot is not just a CPU coreβ€”it's a share of the TM's JVM resources (CPU, memory, network buffers).


3. Holds state for stateful operationsπŸ”—

Task Manager stores operator state:

  • Keyed state
  • Window state
  • Timers
  • RocksDB or heap state

When checkpoints happen:

  • TM snapshots this state
  • Sends it to the checkpoint storage

This is what enables exactly-once processing.


4. Handles network data exchangeπŸ”—

TMs are responsible for:

  • Reading from Kafka / Kinesis / files
  • Sending records to downstream operators
  • Shuffling data between subtasks
  • Handling backpressure

Backpressure signals also originate from TMs and propagate upstream.


5. Restarts tasks after failureπŸ”—

When a failure occurs:

  • JM restarts tasks
  • TM reloads state from the last checkpoint
  • Processing resumes

TMs are stateless in terms of job coordination but stateful in terms of operator data.


Memory and CPU in Task ManagerπŸ”—

Each TM has:

  • Heap memory
  • Off-heap memory
  • Network memory
  • Managed memory
  • RocksDB state backend memory (if used)

Memory config is critical for stable performance.


How many Task Managers do you typically have?πŸ”—

Depends on your workload, but usually:

  • Many Task Managers (10s to 100s)
  • Each with multiple slots (2 to 16)
  • Total parallelism = number of available slots

Example:

  • 10 Task Managers Γ— 4 slots each = 40 parallel workers

AnalogyπŸ”—

Think of a Task Manager as a kitchen worker in a restaurant.

  • The Job Manager (head chef) tells them what tasks to do.
  • Each slot is one hands-on workstation (chopping, frying, baking, etc.).
  • The Task Manager performs the work: cooking the dishes, maintaining prep, storing ingredients (state), and communicating with others.

Short summaryπŸ”—

A Task Manager in Flink is the worker responsible for running operators, storing state, managing memory, handling network communication, and executing the actual computations of your streaming job. The number of Task Managers and their slots determines how much parallelism and throughput your Flink cluster can handle.