Spark Unique Sorted Records
Lecture 21: Unique and Sorted Records
distinct()
Original Data
Distinct Data
Distinct based on certain columns
⚠️ distinct() takes no arguments, so we need to select the columns first and then apply distinct().
Dropping duplicate records
Note that the DataFrame manager_df itself is not changed. The call only returns (and here shows) the records that remain after the duplicates have been dropped.
sort()
Descending order
Sorting on multiple columns
Here the records are first arranged by salary in descending order; records with the same salary are then arranged by name in ascending order.