Spark Union Vs Unionall
Lecture 19: union vs unionAll()
Here we can see that we have a duplicate id.
In PySpark, union and unionAll behave the same way: both retain duplicates.
In Spark SQL, however, UNION drops the duplicate records, while UNION ALL keeps them.
Selecting data and unioning the same table
What happens when we change the order of the columns?
wrong_manager_df has its columns in the wrong order. The union still succeeds, because union matches columns by position rather than by name, but the values end up under the wrong columns.
If the two DataFrames have a different number of columns, an AnalysisException is thrown.
If we use unionByName, columns are matched by name instead of by position, so the column names in both DataFrames must be the same.