Define Metrics & Dimensions
the core of your semantic layer for trustworthy analytics
Read here, why you need them.
Dimensions
A characteristic or attribute of your data. Examples include customer ID, product name, or date. It's usually a column in your database. Dimensions are what you use to segment your data.
They are defined in the dimensions
section of a cube
dimensions:
- name: short_id
label: Short Identifier
sql: SUBSTRING(${table}._AIRBYTE_ORDERS_HASHID, 8)
- name: status
label: Order Status
sql: ${table}.STATUS
- name: payment_method
label: Payment Method
sql: ${table}.PAYMENT_METHOD
- name: date
label: Payment Date
sql: date_trunc(${time_frame},${table}.DATE_CREATED)
variants:
- time_frame: [day, week, month, quarter, year]
In the example above we defined 4 dimensions:
Short Identifier: is a calculated dimension, where we take a part of the column
_AIRBYTE_ORDERS_HASHID
Status, Payment Method: are just passed directly and only labelled more user friendly
Date: is also a calculated dimension, but with a variant variable
time_frame
that will create multiple variations:date_day, date_week, date_month, date_quarter, date_year
. This allows the end user to query multiple the data in different granularities
Variants
Variants allow us to keep the semantic layer DRY and easy to read. We can either just have a list of values like above:
- name: date
sql: date_trunc(${time_frame},${table}.DATE_CREATED)
variants:
- time_frame: [day, week, month, quarter, year]
or we specify key value pairs, if we want to pass more complex values along, but keep the created fields clean (date_day, date_month, date_year
):
- name: date
sql: date_trunc(${time_frame},${table}.DATE_CREATED)
variants:
- time_frame:
- day: "%Y-%m-%d"
- month: "%Y-%m-01"
- year: "%Y-01-01"
Metrics
A calculation based on your dimensions. Examples include sum of sales, average order value, or count of users. Measures are values you want to analyze.
They are defined in the metrics
section of a cube.
metrics:
- name: total_revenue
label: Total Revenue
sql: sum(${table}.TOTAL)
- name: average_order_value
label: Average Order Value
sql: avg(${table}.TOTAL)
- name: number_of_orders
label: Number of Orders
sql: count(${table}._AIRBYTE_ORDERS_HASHID)
- name: distinct_customers
label: Number of Customers
sql: count(distinct ${table}.CUSTOMER_ID)
- name: orders_over_customers
label: Ratio of Orders per Customer
sql: ${number_of_orders} / ${distinct_customers}
- name: orders_credit_card
label: Amount of Orders with Credit Card
sql: count(case when ${table}.PAYMENT_METHOD = 'credit_card' then 1 else null end)
- name: revenue_credit_card
label: Revenue from Credit Card Payments
sql: sum( case when ${table}.PAYMENT_METHOD = 'credit_card' then ${table}.TOTAL else 0 end )
In the example above we defined 7 metrics:
Total Revenue: is a sum of the column
TOTAL
Average Order Value: is the average of the column
TOTAL
Number of Orders: counts unique transaction ids
Number of Customers: counts unique customer ids
Ratio of Orders per Customer: is a metric composed of the two previous metrics
Amount of Orders with Credit Card: counts orders whose payment method is
credit_card
Revenue from Credit Card Payments: sum up
TOTAL
of orders whose payment method iscredit_card
SQL Notation and ${placeholder}
DotML is database agnostic, which means you can write any SQL expressions you want for your database. If you database supports something cool like Correlation(col1, col2)
DotML also supports it.
The placeholder notation ${placeholder}
allows DotML to dynamically generate the correct SQL statement or to reuse existing dimensions and metrics in other definitions.
There are 3 different placeholders:
${table}.column_name
for correctly identifying a column name in the base table${dimension_name}
or${metric_name}
for reusing an existing definition${variant_name}
for configuring the substitution place of the variants
It's like specifying a template that gets filled with the correct value at runtime.
Last updated