Metrics Views in Databricksπ
Example Use Caseπ
How to Create Metrics Views in Databricks?π
Measures are quantitative values like sales, revenue etc.
Dimensions are used to support measures like year, month etc. Eg: sales per year, revenue per month
Any joins in metric views are left outer joins.
Metric views are defined in yaml format
Eg:
version: 0.1
source: dev.bronze.orders_raw
dimensions:
- name: Order Key
expr: o_orderkey
- name: Customer Key
expr: o_custkey
- name: Order Status
expr: o_orderstatus
- name: Order Status Readable
expr: >
case
when o_orderstatus = 'O' then 'Open'
when o_orderstatus = 'F' then 'Fulfilling'
when o_orderstatus = 'P' then 'Processing'
end
- name: Order Date
expr: o_orderdate
- name: Order Year
expr: DATE_TRUNC('year',o_orderdate)
- name: Order Priority
expr: o_orderpriority
- name: Clerk
expr: o_clerk
- name: Ship Priority
expr: o_shippriority
measures:
- name: Total Price
expr: SUM(o_totalprice)
- name: Average Total Price
expr: AVG(o_totalprice)
- name: Count of Orders
expr: COUNT(DISTINCT(o_orderkey))
Querying the metric view
Window Functionsπ
measures:
- name: Total Price
expr: SUM(o_totalprice)
- name: Average Total Price
expr: AVG(o_totalprice)
- name: Count of Orders
expr: COUNT(DISTINCT(o_orderkey))
- name: Total Orders Current Year
expr: COUNT(DISTINCT o_orderkey)
window:
- order: Order Year
range: current
semiadditive: last
- name: Total Orders Last Year
expr: COUNT(DISTINCT o_orderkey)
window:
- order: Order Year
range: trailing 1 year
semiadditive: last
Final Metric Views with Joinsπ
version: 0.1
source: dev.bronze.orders_raw
joins:
- name: cust_dim
source: dev.bronze.customer_raw
on: cust_dim.c_custkey = source.o_custkey
dimensions:
- name: Order Key
expr: o_orderkey
- name: Customer Key
expr: o_custkey
- name: Order Status
expr: o_orderstatus
- name: Order Status Readable
expr: >
case
when o_orderstatus = 'O' then 'Open'
when o_orderstatus = 'F' then 'Fulfilling'
when o_orderstatus = 'P' then 'Processing'
end
- name: Customer Market Segment
expr: cust_dim.c_mktsegment
- name: Order Date
expr: o_orderdate
- name: Order Year
expr: DATE_TRUNC('year',o_orderdate)
- name: Order Priority
expr: o_orderpriority
- name: Clerk
expr: o_clerk
- name: Ship Priority
expr: o_shippriority
measures:
- name: Total Price
expr: SUM(o_totalprice)
- name: Average Total Price
expr: AVG(o_totalprice)
- name: Count of Orders
expr: COUNT(DISTINCT(o_orderkey))
- name: Total Orders Current Year
expr: COUNT(DISTINCT o_orderkey)
window:
- order: Order Year
range: current
semiadditive: last
- name: Total Orders Last Year
expr: COUNT(DISTINCT o_orderkey)
window:
- order: Order Year
range: trailing 1 year
semiadditive: last
- name: Year on Year Growth
expr: MEASURE(`Total Orders Current Year`)-MEASURE(`Total Orders Last Year`)
Clear Example of Metric View vs Standard Viewπ
1. Standard View Approachπ
If you only had standard views, youβd need separate SQL queries (or separate views) for each grouping level.
(a) Group by Stateπ
SELECT
state,
SUM(revenue) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
GROUP BY state;
(b) Group by Regionπ
SELECT
region,
SUM(revenue) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
GROUP BY region;
(c) Group by Countryπ
SELECT
country,
SUM(revenue) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
GROUP BY country;
π Notice: you either write multiple queries/views, or you compute all combinations in one big query with GROUP BY CUBE(country, region, state)
(which can be heavy).
2. Metric View Approachπ
With a metric view, you define the metric once:
CREATE OR REPLACE METRIC VIEW revenue_metrics
AS SELECT
country,
region,
state,
SUM(revenue) AS total_revenue,
COUNT(DISTINCT customer_id) AS distinct_customers
FROM orders
GROUP BY country, region, state;
- Metric defined:
Now, when analysts query this metric view, they donβt need to rewrite SQL for each grouping. The query engine rewrites it under the hood.
Example Queries on Metric Viewπ
(a) Group by Stateπ
(b) Group by Regionπ
SELECT
region,
SUM(total_revenue) / SUM(distinct_customers) AS revenue_per_customer
FROM revenue_metrics
GROUP BY region;
(c) Group by Countryπ
SELECT
country,
SUM(total_revenue) / SUM(distinct_customers) AS revenue_per_customer
FROM revenue_metrics
GROUP BY country;
π Difference: You donβt redefine the metric β you just group by a different dimension (state, region, country). Databricks rewrites the query correctly to maintain metric consistency (so that KPIs mean the same thing everywhere).
β In short:
- Standard views = you must predefine each grouping (state/region/country).
- Metric views = define the metric once (
revenue_per_customer
) and reuse across any dimension.