Skip to content

Metrics Views in DatabricksπŸ”—

image

image

Example Use CaseπŸ”—

image

How to Create Metrics Views in Databricks?πŸ”—

Measures are quantitative values like sales, revenue etc.

Dimensions are used to support measures like year, month etc. Eg: sales per year, revenue per month

Any joins in metric views are left outer joins.

Metric views are defined in yaml format

Eg:

version: 0.1

source: dev.bronze.orders_raw

dimensions:
  - name: Order Key
    expr: o_orderkey

  - name: Customer Key
    expr: o_custkey

  - name: Order Status
    expr: o_orderstatus

  - name: Order Status Readable
    expr: >
      case 
        when o_orderstatus = 'O' then 'Open'
        when o_orderstatus = 'F' then 'Fulfilling'
        when o_orderstatus = 'P' then 'Processing'
      end

  - name: Order Date
    expr: o_orderdate

  - name: Order Year
    expr: DATE_TRUNC('year',o_orderdate)

  - name: Order Priority
    expr: o_orderpriority

  - name: Clerk
    expr: o_clerk

  - name: Ship Priority
    expr: o_shippriority

measures:
  - name: Total Price
    expr: SUM(o_totalprice)

  - name: Average Total Price
    expr: AVG(o_totalprice)

  - name: Count of Orders
    expr: COUNT(DISTINCT(o_orderkey))

image

image

Querying the metric view

image

Window FunctionsπŸ”—

measures:
  - name: Total Price
    expr: SUM(o_totalprice)

  - name: Average Total Price
    expr: AVG(o_totalprice)

  - name: Count of Orders
    expr: COUNT(DISTINCT(o_orderkey))

  - name: Total Orders Current Year
    expr: COUNT(DISTINCT o_orderkey)
    window:
      - order: Order Year
        range: current
        semiadditive: last

  - name: Total Orders Last Year
    expr: COUNT(DISTINCT o_orderkey)
    window:
      - order: Order Year
        range: trailing 1 year
        semiadditive: last

Final Metric Views with JoinsπŸ”—

version: 0.1

source: dev.bronze.orders_raw

joins:
  - name: cust_dim
    source: dev.bronze.customer_raw
    on: cust_dim.c_custkey = source.o_custkey

dimensions:
  - name: Order Key
    expr: o_orderkey

  - name: Customer Key
    expr: o_custkey

  - name: Order Status
    expr: o_orderstatus

  - name: Order Status Readable
    expr: >
      case 
        when o_orderstatus = 'O' then 'Open'
        when o_orderstatus = 'F' then 'Fulfilling'
        when o_orderstatus = 'P' then 'Processing'
      end

  - name: Customer Market Segment
    expr: cust_dim.c_mktsegment

  - name: Order Date
    expr: o_orderdate

  - name: Order Year
    expr: DATE_TRUNC('year',o_orderdate)

  - name: Order Priority
    expr: o_orderpriority

  - name: Clerk
    expr: o_clerk

  - name: Ship Priority
    expr: o_shippriority

measures:
  - name: Total Price
    expr: SUM(o_totalprice)

  - name: Average Total Price
    expr: AVG(o_totalprice)

  - name: Count of Orders
    expr: COUNT(DISTINCT(o_orderkey))

  - name: Total Orders Current Year
    expr: COUNT(DISTINCT o_orderkey)
    window:
      - order: Order Year
        range: current
        semiadditive: last

  - name: Total Orders Last Year
    expr: COUNT(DISTINCT o_orderkey)
    window:
      - order: Order Year
        range: trailing 1 year
        semiadditive: last

  - name: Year on Year Growth
    expr: MEASURE(`Total Orders Current Year`)-MEASURE(`Total Orders Last Year`)

Clear Example of Metric View vs Standard ViewπŸ”—


1. Standard View ApproachπŸ”—

If you only had standard views, you’d need separate SQL queries (or separate views) for each grouping level.

(a) Group by StateπŸ”—

SELECT
    state,
    SUM(revenue) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
GROUP BY state;

(b) Group by RegionπŸ”—

SELECT
    region,
    SUM(revenue) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
GROUP BY region;

(c) Group by CountryπŸ”—

SELECT
    country,
    SUM(revenue) / COUNT(DISTINCT customer_id) AS revenue_per_customer
FROM orders
GROUP BY country;

πŸ‘‰ Notice: you either write multiple queries/views, or you compute all combinations in one big query with GROUP BY CUBE(country, region, state) (which can be heavy).


2. Metric View ApproachπŸ”—

With a metric view, you define the metric once:

CREATE OR REPLACE METRIC VIEW revenue_metrics
AS SELECT
    country,
    region,
    state,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS distinct_customers
FROM orders
GROUP BY country, region, state;
  • Metric defined:
revenue_per_customer = total_revenue / distinct_customers

Now, when analysts query this metric view, they don’t need to rewrite SQL for each grouping. The query engine rewrites it under the hood.


Example Queries on Metric ViewπŸ”—

(a) Group by StateπŸ”—

SELECT
    state,
    total_revenue / distinct_customers AS revenue_per_customer
FROM revenue_metrics;

(b) Group by RegionπŸ”—

SELECT
    region,
    SUM(total_revenue) / SUM(distinct_customers) AS revenue_per_customer
FROM revenue_metrics
GROUP BY region;

(c) Group by CountryπŸ”—

SELECT
    country,
    SUM(total_revenue) / SUM(distinct_customers) AS revenue_per_customer
FROM revenue_metrics
GROUP BY country;

πŸ‘‰ Difference: You don’t redefine the metric β€” you just group by a different dimension (state, region, country). Databricks rewrites the query correctly to maintain metric consistency (so that KPIs mean the same thing everywhere).


βœ… In short:

  • Standard views = you must predefine each grouping (state/region/country).
  • Metric views = define the metric once (revenue_per_customer) and reuse across any dimension.