SMJ Spill to Disk Q2
Explain how streaming data from disk to the executor works for SMJ?
🔹 Sort-Merge Join Execution (with skew + spill)
1. Both sides sorted & partitioned
   - Table A (smaller side) → all rows sorted by key.
   - Table B (skewed side) → rows also sorted by key, but because the skewed key group is huge, Spark may spill many of its sorted chunks to disk.
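To make the setup concrete, here is a minimal sketch using Spark's Scala API; the table shapes and the `smj-spill-demo` app name are made up for illustration. Disabling broadcast pushes the planner toward sort-merge join, and the physical plan should show an Exchange plus Sort on each side before the join.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("smj-spill-demo").master("local[*]").getOrCreate()

// Disable broadcast joins so the planner falls back to sort-merge join.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

val a = spark.range(0, 100).withColumnRenamed("id", "key")                 // small side
val b = spark.range(0, 10000000L).selectExpr("id % 10 AS key", "id AS v")  // few hot keys -> skew

// Expect SortMergeJoin over (Exchange hashpartitioning -> Sort) on both sides.
a.join(b, "key").explain()
```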
2. Executor merge phase
   - Spark creates one iterator per side (sketched below):
     - One for A (fits in memory).
     - One for B (some rows in memory, some spilled to disk).
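What those iterators look like can be sketched in plain Scala (these are illustrative stand-ins, not Spark's internal classes). The important property is that both sides are key-sorted and peekable; whether B's next rows live in memory or in a spill file is hidden behind the iterator interface.

```scala
// Illustrative stand-in, not a Spark class.
case class Row(key: Int, value: String)

// Small side A: fully materialized in memory, sorted by key.
val sortedA: BufferedIterator[Row] =
  Seq(Row(1, "a1"), Row(2, "a2")).iterator.buffered

// Big side B: produced lazily, also sorted by key. In Spark the rows may come
// from an in-memory run or a sequential scan of a spill file; the merge logic
// only ever calls hasNext / head / next.
val sortedB: BufferedIterator[Row] =
  Iterator.tabulate(1000000)(i => Row(i / 500000, s"b$i")).buffered
```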
3. When a matching join key is encountered
   - Spark buffers all rows for that key from A (usually small enough to keep in memory).
   - Spark then pulls rows for that key from B in batches (see the batch helper below):
     - If the rows are in memory, they are read directly.
     - If the rows were spilled, they are read back sequentially from disk (streaming).
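The batching step can be sketched with the same illustrative `Row` type. In recent Spark versions the buffered side's matches go through a spill-aware buffer (`ExternalAppendOnlyUnsafeRowArray`, to the best of my knowledge); here a plain key-sorted iterator stands in for both the in-memory and spilled cases.

```scala
case class Row(key: Int, value: String)

// Pull at most `maxRows` consecutive rows carrying `key` from a key-sorted
// iterator. Where those rows physically come from (memory or a spill file)
// is the iterator's problem; the join just asks for the next bounded chunk.
def nextBatch(it: BufferedIterator[Row], key: Int, maxRows: Int): Seq[Row] = {
  val batch = scala.collection.mutable.ArrayBuffer.empty[Row]
  while (it.hasNext && it.head.key == key && batch.size < maxRows)
    batch += it.next()
  batch.toSeq
}
```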
4. Join output
   - For each batch of rows from B, Spark computes the Cartesian product with A's buffered rows.
   - Results are emitted in a streaming fashion.
   - If B's key group is gigantic, Spark keeps pulling more batches from disk until all pairs are produced (full loop sketched below).
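Putting the pieces together, here is a compact, self-contained sketch of the whole loop (plain Scala, inner join only, illustrative names; not Spark's actual `SortMergeJoinExec`): buffer A's group once, then stream B's group batch by batch and emit the Cartesian product per batch.

```scala
case class Row(key: Int, value: String)

// Walk two key-sorted iterators. On a key match, buffer A's (small) group
// fully, then stream B's group `batchSize` rows at a time.
def mergeJoin(
    a: BufferedIterator[Row],
    b: BufferedIterator[Row],
    batchSize: Int
)(emit: (Row, Row) => Unit): Unit = {

  // Drain up to `max` consecutive rows with `key` from a sorted iterator.
  def drain(it: BufferedIterator[Row], key: Int, max: Int): Seq[Row] = {
    val buf = scala.collection.mutable.ArrayBuffer.empty[Row]
    while (it.hasNext && it.head.key == key && buf.size < max) buf += it.next()
    buf.toSeq
  }

  while (a.hasNext && b.hasNext) {
    val ka = a.head.key
    val kb = b.head.key
    if (ka < kb) a.next()            // A row with no match: advance past it
    else if (ka > kb) b.next()       // B row with no match: advance past it
    else {
      val groupA = drain(a, ka, Int.MaxValue)  // small side: buffer the whole group
      var batch  = drain(b, ka, batchSize)     // big side: one bounded batch
      while (batch.nonEmpty) {
        for (rb <- batch; ra <- groupA) emit(ra, rb) // Cartesian product per batch
        batch = drain(b, ka, batchSize)              // next batch (possibly from "disk")
      }
    }
  }
}
```

Because `emit` fires pair by pair and B is drained `batchSize` rows at a time, the consumer never waits for a whole key group, and the working set for B stays bounded no matter how skewed the key is.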
🔹 Key Insight
- Spark never tries to load all of skewed table B into memory at once. Instead:
  - A is small → fully in memory.
  - B is big → read a batch → join with A → emit results → read the next batch → repeat.
- If B is insanely large, Spark may spill intermediate join buffers again, but the logic is still "stream & spill", not "load all at once" (quick check below).
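A quick check using the `mergeJoin` sketch above (it reuses that block's `Row` and `mergeJoin`, so it is not standalone): a single giant key group on B still joins against a bounded working set.

```scala
// Reuses Row and mergeJoin from the sketch above.
val smallA = Seq(Row(1, "a1")).iterator.buffered
val hugeB  = Iterator.tabulate(1000000)(i => Row(1, s"b$i")).buffered // one giant key group

var pairs = 0L
mergeJoin(smallA, hugeB, batchSize = 1024)((_, _) => pairs += 1)
println(pairs) // 1000000 pairs, with at most 1024 B-rows materialized at a time
```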
🔹 Analogy
Imagine:
- Table A = a tiny bowl of 5 apples 🍎.
- Table B = a giant truckload of apples 🍎.
- Spark doesn't dump the whole truck into memory.
- Instead, it unloads one crate at a time, joins it with the 5 apples from A, writes the results out, and then grabs the next crate.
✅ So yes, exactly: the executor keeps small table A in memory and streams batches of rows for the skewed key from table B (from memory and disk), joining them incrementally.