
Spark Union Vs Unionall

Lecture 19: union vs unionAll()


The sample data contains a duplicate id.

In PySpark, union and unionAll behave the same way: both retain duplicates.

In Spark SQL, however, UNION drops duplicate records, while UNION ALL keeps them.


Selecting data and unioning the same table


What happens when we change the order of the columns?

wrong_manager_df has its columns in the wrong order, but the union still succeeds. union matches columns by position, not by name, so the values end up under the wrong columns.

If the two DataFrames have a different number of columns, an exception is thrown.

unionByName matches columns by name instead of by position, so the column names on both DataFrames must be the same.