Skip to content

Streaming and Materialized Views in Databricks SQL🔗

There is no auto option for incremental load while developing streaming tables and views in SQL, hence additional option has to be provided.

CREATE OR REFRESH STREAMING TABLE dev.bronze.orders_st
AS
SELECT * FROM 
STREAM read_files(
  "/Volumes/dev/bronze/landing/input/",
  format => 'csv',
  includeExistingFiles => false
)
CREATE OR REPLACE MATERIALIZED VIEW dev.bronze.ordes_mv 
-- SCHEDULE EVERY 4 HOURS
AS
SELECT Country,sum(UnitPrice) as agg_total_price FROM dev.bronze.orders_st 
group by Country

Replacing the materialized view does not refresh entire data, just incrementally. The group aggregate is also calculated incrementally.

CREATE OR REPLACE MATERIALIZED VIEW dev.bronze.ordes_mv 
-- SCHEDULE EVERY 4 HOURS
AS
SELECT Country,sum(UnitPrice) as agg_total_price FROM dev.bronze.orders_st 
group by Country

Every run of the query on Serverless Warehouse spins up DLT job in background.

image