Spark Job Stage Task
Lecture 18: Job, Stage and Tasks🔗
- One application is created per Spark program.
- One job is created per action.
- One stage is created per shuffle boundary: every wide transformation (e.g. repartition, groupBy) starts a new stage, while narrow transformations like filter stay inside the current stage.
- A task is the actual unit of work that runs on a single partition of the data.
Example of Job, Action and Task🔗
Complete flow diagram🔗
Every job has at least one stage, and every stage has at least one task.

Repartition through filter is one job because we don't hit an action in between.

Every wide dependency transformation gets its own stage. All consecutive narrow dependency transformations are pipelined together into a single stage of the DAG.
How do tasks get created? [Read and Write Exchange]🔗
- The repartition stage is a wide dependency transformation: it creates two partitions from one, which is a write exchange of data.
- The filter-and-select stage then reads this repartitioned data (read exchange), and because there are two partitions, filter runs as two tasks.
- Next we need to find how many people earn > 90000 and are older than 25, so we do a groupBy. That is a wide dependency transformation, so it creates another stage. By default the shuffle creates 200 partitions (`spark.sql.shuffle.partitions`).
- So some of those partitions will hold data and some will be empty.