
Spark Unique Sorted Records

Lecture 21: Unique and Sorted Records

distinct()

[Image: original data]

[Image: distinct data]

[Image: distinct based on certain columns]

⚠️ distinct() takes no arguments and compares entire rows. To get distinct values for certain columns, select those columns first and then apply distinct().

Dropping duplicate records

Note that the DataFrame manager_df itself is not changed; the output only shows the records after duplicates have been dropped. To keep the deduplicated records, assign the result to a new variable.

sort()

[Image: sort in descending order]

Sorting on multiple columns

Here the records are first arranged by salary in descending order; records with the same salary are then arranged by name in ascending order.