Skip to content

Shuffle = Map Phase + Reduce PhaseπŸ”—

MAP PHASE (Shuffle Write)        REDUCE PHASE (Shuffle Read)
---------------------------      ----------------------------
Process partitions               Fetch shuffled data
↓                                ↓
Local aggregation (optional)     Merge data from all mappers
↓                                ↓
Partition by key                 Group by key
↓                                ↓
Write shuffle files              Final aggregation

1. Map Phase (Shuffle Write)πŸ”—

Runs on executors where data already exists.

StepsπŸ”—

a. Process Input PartitionπŸ”—

Example:

("A",1), ("A",2), ("B",3)

b. Local Aggregation (Map-Side Combine)πŸ”—

Only for operations like reduceByKey:

("A",1), ("A",2) β†’ ("A",3)

Important:

  • Happens within a single partition only
  • Reduces data before shuffle

c. PartitioningπŸ”—

Spark decides which reducer gets which key:

partition = hash(key) % numPartitions

d. Write Shuffle FilesπŸ”—

  • Data written to disk
  • Organized per target reducer

2. Reduce Phase (Shuffle Read)πŸ”—

Runs on executors responsible for final output.


a. Fetch DataπŸ”—

Each reducer:

  • Pulls its partition data from all map tasks
  • This is where network transfer happens

b. Merge + Group by KeyπŸ”—

Example incoming data:

("A",3), ("A",9)

Grouped as:

("A", [3,9])

c. Final AggregationπŸ”—

("A",3) + ("A",9) β†’ ("A",12)

Critical Question:πŸ”—

If we already got ("A",3), why do we need reducer?πŸ”—

Because that "A",3 is not global, it is only local to one partition.


Example Across PartitionsπŸ”—

Partition 1πŸ”—

("A",1), ("A",2) β†’ ("A",3)

Partition 2πŸ”—

("A",4), ("A",5) β†’ ("A",9)

After Map PhaseπŸ”—

Partition 1 β†’ ("A",3)
Partition 2 β†’ ("A",9)

At this point:

  • Same key exists in multiple partitions
  • Results are partial

Reducer’s JobπŸ”—

Bring all "A" values together:

("A",3) + ("A",9) β†’ ("A",12)

This requires:

  • Shuffle (to co-locate keys)
  • Reduce phase (to finalize aggregation)

What If We Skip Reduce Phase?πŸ”—

You would get:

("A",3)
("A",9)

Which is:

  • Incorrect
  • Not a true aggregation

Key InsightπŸ”—

Stage Scope Role
Map-side combine Within partition Partial aggregation
Reduce phase Across partitions Final aggregation

Why This Design MattersπŸ”—

Map PhaseπŸ”—

  • Reduces data early
  • Minimizes shuffle size

Reduce PhaseπŸ”—

  • Ensures correctness
  • Combines distributed partial results

Final TakeawayπŸ”—

Map-side aggregation reduces data locally, but reducers are required to combine results across partitions and produce the final correct output.