Spark Dynamic Partition Pruning

Lecture 32: Dynamic Partition Pruning

In this example, a filter is applied to select only the data for 19th April 2023.

As a result, only the one file for 19th April 2023 is read, not all of them.
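To make the mechanics concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's actual implementation. It assumes the table is stored in Hive-style partition directories, one per date, and shows how a filter on the partition column lets the reader skip every other directory. The column name `sales_date`, the paths, and the helpers `build_layout` and `prune_partitions` are hypothetical.

```python
from datetime import date, timedelta

def build_layout(start, days):
    """Hypothetical layout: one partition directory per day, one file in each."""
    layout = {}
    for i in range(days):
        d = start + timedelta(days=i)
        layout[d] = [f"sales/sales_date={d.isoformat()}/part-00000.parquet"]
    return layout

def prune_partitions(layout, wanted):
    """Return only the files whose partition value equals `wanted`."""
    return [f for d, files in layout.items() if d == wanted for f in files]

# 30 daily partitions for April 2023 (made-up data for illustration).
layout = build_layout(date(2023, 4, 1), 30)

# The filter value is known up front, so the other 29 directories are skipped.
files_to_read = prune_partitions(layout, date(2023, 4, 19))
print(len(layout), "partitions on disk,", len(files_to_read), "file(s) read")
```

The point of the sketch is that the filter value is a literal known when the query is planned, so the directories to skip can be decided before any data is read.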

DPP with 2 tables

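When the filter is on a column of the second table, static partition pruning cannot happen on the first table, because the matching partition values are not known when the query is planned. Dynamic Partition Pruning (DPP) lets Spark build that filter at runtime from the filtered second table and apply it to the first table's partitions.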

Two conditions must hold:

  • The data in the first table must be partitioned on the join column.
  • The second table should be small enough to be broadcast.
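The two conditions map onto Spark SQL settings and a join roughly as follows. This is a sketch, not the lecture's code. The table paths, the column names `sales_date`, `date`, and `week_of_year`, and the function `run_dpp_join` are assumptions. The two configuration keys are standard Spark SQL options. Calling the function requires pyspark and an active SparkSession.

```python
# Spark SQL options relevant to DPP (both keys are standard Spark settings).
DPP_CONF = {
    # DPP is enabled by default in Spark 3.x; set to "false" to compare plans.
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
    # Tables below this size in bytes are broadcast automatically (10 MB default).
    "spark.sql.autoBroadcastJoinThreshold": str(10 * 1024 * 1024),
}

def run_dpp_join(spark, sales_path, dimdate_path, week):
    """Join a partitioned fact table to a filtered, broadcast dimension table.

    Assumes `sales_path` holds Parquet data partitioned by `sales_date`, and
    `dimdate_path` holds a small table with `date` and `week_of_year` columns.
    """
    from pyspark.sql import functions as F  # imported lazily; requires pyspark

    for key, value in DPP_CONF.items():
        spark.conf.set(key, value)

    sales = spark.read.parquet(sales_path)
    # The filter is on the dimension table only, not on the partitioned table.
    dimdate = spark.read.parquet(dimdate_path).filter(F.col("week_of_year") == week)

    # Explicit broadcast hint on the small side of the join.
    joined = sales.join(F.broadcast(dimdate), sales["sales_date"] == dimdate["date"])

    # With DPP active, the sales scan's PartitionFilters typically include a
    # dynamic pruning expression on sales_date.
    joined.explain()
    return joined
```

Running the same function with the first setting switched to `"false"` is one way to compare the physical plans with and without DPP.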

Without Dynamic Partition Pruning

A total of 123 files are read from the first table, not just one as in the previous case.

With Dynamic Partition Pruning

The smaller dimdate table is broadcast and a broadcast hash join is performed. Only 3 files are read this time.

At runtime a subquery is run...

Because of the runtime filter, only 4 partitions are now read/scanned.
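The runtime behaviour can be sketched in plain Python as two steps: first filter the small table and collect its join keys, then scan only the partitions of the large table that match those keys. This is a toy model under stated assumptions, not Spark's implementation. The data is made up, and the names `fact_partitions`, `dim_date`, `runtime_partition_filter`, and `scan_with_dpp` are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical fact table: one partition (directory) per day of 2023.
fact_partitions = [date(2023, 1, 1) + timedelta(days=i) for i in range(365)]

# Hypothetical small dimension table: each row maps a date to its ISO week.
dim_date = [{"date": d, "week_of_year": d.isocalendar()[1]} for d in fact_partitions]

def runtime_partition_filter(dim_rows, week):
    """Step 1 (the runtime subquery): filter the dimension table, collect join keys."""
    return {row["date"] for row in dim_rows if row["week_of_year"] == week}

def scan_with_dpp(partitions, allowed_keys):
    """Step 2: scan only the fact partitions whose key is in the runtime filter."""
    return [p for p in partitions if p in allowed_keys]

keys = runtime_partition_filter(dim_date, week=16)
scanned = scan_with_dpp(fact_partitions, keys)
print(f"without DPP: {len(fact_partitions)} partitions, with DPP: {len(scanned)}")
```

In this toy data the filter on `week_of_year` reduces the scan from 365 partitions to the 7 days of that week. The counts come from the made-up data and are unrelated to the file counts quoted in the lecture.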