Deletion Vectors in Delta Lake

🔹 What are Deletion Vectors?

Normally, when you delete rows in a Delta table, Delta rewrites every Parquet file that contains a matching row, this time without those rows.

This is called copy-on-write → expensive for big tables.

Deletion Vectors (DVs) are a new optimization:

Instead of rewriting files, Delta just marks the deleted rows with a bitmap (a lightweight “mask”). The data is still physically there, but readers skip the “deleted” rows.

Think of it like putting a red X mark ❌ on rows instead of erasing them immediately.

🔹 Why are they useful?

🚀 Much faster deletes/updates/merges (because files aren’t rewritten).

⚡ Less I/O → good for big data tables.

✅ Efficient for streaming + time travel.

Example without deletion vectors

  1. Create a sales table
CREATE TABLE dev.bronze.sales AS
SELECT * FROM
read_files(
  'dbfs:/databricks-datasets/online_retail/data-001/data.csv',
  header => true,
  format => 'csv'
);
  2. Turn deletion vectors off
ALTER TABLE dev.bronze.sales SET TBLPROPERTIES (delta.enableDeletionVectors = false);

image

  3. Delete some rows
-- Delete every row for InvoiceNo '540644'
DELETE FROM dev.bronze.sales
WHERE InvoiceNo = '540644';
  4. Describe history
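
The command for this step is not shown on the page. A minimal sketch, assuming the same dev.bronze.sales table, would be:

```sql
-- Show only the most recent commit, including its operationMetrics column
DESCRIBE HISTORY dev.bronze.sales LIMIT 1;
```

For a DELETE commit, the operationMetrics column should report counts such as numRemovedFiles, numAddedFiles, numDeletedRows and numCopiedRows, which is where the rewrite becomes visible.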

image

Observe that all 65,000+ rows were removed and rewritten, even though the delete targeted only a single invoice.

Example with deletion vectors
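
The page shows only the result for this case. A rough sketch of the steps that would lead to it, assuming the same dev.bronze.sales table, might look like the following. The invoice number '536365' is a placeholder chosen for illustration, not a value from the original run.

```sql
-- Turn deletion vectors on for the existing table
ALTER TABLE dev.bronze.sales SET TBLPROPERTIES (delta.enableDeletionVectors = true);

-- Delete the rows for one invoice
-- ('536365' is a placeholder; use any InvoiceNo that still exists in the table)
DELETE FROM dev.bronze.sales
WHERE InvoiceNo = '536365';

-- Inspect the latest commit and its operationMetrics
DESCRIBE HISTORY dev.bronze.sales LIMIT 1;
```

Note that enabling deletion vectors upgrades the table protocol, so older Delta clients that do not support the feature can no longer read the table.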

image

We can see that one deletion vector is added and no files are rewritten.

Running OPTIMIZE later rewrites the affected data files, which physically removes the records that the deletion vector marks as deleted.
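
As a hedged sketch of that cleanup, assuming the same dev.bronze.sales table, the commands below are the usual ways to apply the soft deletes physically. Which files OPTIMIZE actually rewrites depends on its own file-selection rules, so REORG is shown as the more targeted alternative.

```sql
-- Compact the table; any file rewritten here drops the rows marked by its deletion vector
OPTIMIZE dev.bronze.sales;

-- Alternatively, rewrite only the files that carry soft-deleted data
REORG TABLE dev.bronze.sales APPLY (PURGE);

-- The old files remain on storage until VACUUM removes them after the retention period
VACUUM dev.bronze.sales;
```

Until VACUUM runs, the older table versions remain available for time travel.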