Streaming and Materialized Views in Databricks SQL🔗
There is no auto option for incremental load while developing streaming tables and views in SQL, hence additional option has to be provided.
CREATE OR REFRESH STREAMING TABLE dev.bronze.orders_st
AS
SELECT * FROM
STREAM read_files(
"/Volumes/dev/bronze/landing/input/",
format => 'csv',
includeExistingFiles => false
)
CREATE OR REPLACE MATERIALIZED VIEW dev.bronze.ordes_mv
-- SCHEDULE EVERY 4 HOURS
AS
SELECT Country,sum(UnitPrice) as agg_total_price FROM dev.bronze.orders_st
group by Country
Replacing the materialized view does not refresh entire data, just incrementally. The group aggregate is also calculated incrementally.
CREATE OR REPLACE MATERIALIZED VIEW dev.bronze.ordes_mv
-- SCHEDULE EVERY 4 HOURS
AS
SELECT Country,sum(UnitPrice) as agg_total_price FROM dev.bronze.orders_st
group by Country
Every run of the query on Serverless Warehouse spins up DLT job in background.