Quick iteration and reusability of metric calculations for powerful data exploration.
At Looker, we want to make it easier for data analysts to service the needs of the data-hungry users in their organizations. We believe too much of their time is spent responding to ad hoc data requests and not enough time is spent building, experimenting, and embellishing a robust model of the business. Worse yet, business users are starving for data, but are forced to make important decisions without access to data that could guide them in the right direction. Looker addresses both of these problems with a YAML-based modeling language called LookML.
This paper walks through a number of data modeling examples, demonstrating how to use LookML to generate, alter, and update reports without rewriting any SQL. With LookML, you build your business logic by defining your important metrics once and then reusing them throughout a model. This allows rapid iteration of data exploration while also ensuring the accuracy of the generated SQL. Small updates are quick and can be made immediately available to business users to manipulate, iterate, and transform in any way they see fit.
Data Modeling in Looker (White Paper)
The Reusability Paradigm of LookML
E-Commerce Example: Starting with Total Amount of Order
A key difference of LookML is that, unlike older approaches, LookML combines modeling,
transformations, and derivations at the same layer (late-binding modeling). This allows vast
amounts of data to be captured in relatively inexpensive databases (mirrored or copied), and then
derivations and transformations occur much closer to, or at, query time. The traditional approach
is to transform the data as it’s loaded (ETL), whereas LookML allows for transform and derivation
on demand (ELT). The result is a very agile data environment where user questions can change and
the data environment can better keep up.
Let’s take a look at a simple e-commerce example. We will create one dimension, the Total Amount
of Order, which can then be reused and built on throughout a single LookML model.
First, a quick primer on a typical e-commerce data model, which will help answer questions about
the buying and selling of items online. In this example, we’ll work with a subset of tables:
Orders, Order Items, and Inventory Items. As a business that tracks Orders, it’s probably
important to determine the distribution of customer orders based on cost. In our current Orders
table, we don’t have a field that tells us the cost of an order, because each order contains
multiple items of varying costs. So we need to calculate the total amount of an order by summing
the sale prices of the items in the order.
Orders
id  created_at  user_id
1   2014-04-01  5656
2   2014-04-01  7263

Order Items
id  created_at  order_id  inventory_item_id  sale_price
1   2014-04-01  5656      3                  $12
2   2014-04-03  7263      5                  $45

Inventory Items
id  created_at  cost    sold_at     product_id
1   2014-04-01  $8.50   2014-04-05  5
2   2014-04-02  $24.00  2014-04-04  7
Measures and Dimensions in Looker
Looker divides data exploration into dimensions and measures. A dimension is something you can
group by, and a measure is an aggregated dimension (for example, a sum, an average, or a count).
Correlated Subqueries
In a SQL database query, a correlated subquery (also known
as a synchronized subquery) is a subquery (a query nested
inside another query) that uses values from the outer query.
The subquery is evaluated once for each row processed by
the outer query.
Suppose we want to calculate a new dimension for Orders that will determine the Total Amount of Order.
In this case, the field is not stored in our database, but can be calculated from the sale
price of order items in the order. We’ll use a simple technique called a correlated subquery. (For
databases that don’t support correlated subqueries or when performance becomes a problem,
Looker supports more complex and powerful mechanisms via derived tables.)
For any given order, the SQL to calculate the Total Amount of Order is:
SELECT SUM(order_items.sale_price)
FROM order_items
WHERE order_items.order_id = orders.id
We sum the sale price of each item in a given order, where the order_items.order_id
field matches the primary key in the orders table. In Looker, we’d want to create this dimension
in the Orders view, since it’s an attribute of an order.
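- view: orders
  fields:
  - dimension: total_amount_of_order_usd
    type: number
    decimals: 2
    # Correlated subquery: sums the sale prices of the items
    # that belong to the current row of orders
    sql: |
      (SELECT SUM(order_items.sale_price)
       FROM order_items
       WHERE order_items.order_id = orders.id)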
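To see the correlated subquery behave as a per-order attribute, here is a minimal sketch that runs the same SQL against an in-memory SQLite database. The rows are hypothetical sample data, and SQLite is only an assumption made to keep the example self-contained; any database that supports correlated subqueries should behave the same way.

```python
import sqlite3

# Hypothetical sample rows; column names follow the Orders and Order Items tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, user_id INTEGER);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, sale_price REAL);
    INSERT INTO orders VALUES (1, '2014-04-01', 5656), (2, '2014-04-01', 7263);
    INSERT INTO order_items VALUES (1, 1, 12.0), (2, 1, 45.0), (3, 2, 20.0);
""")

# The dimension's SQL is used as a column expression; the subquery is
# evaluated once for each row of the outer orders query.
rows = conn.execute("""
    SELECT orders.id,
           (SELECT SUM(order_items.sale_price)
              FROM order_items
             WHERE order_items.order_id = orders.id) AS total_amount_of_order_usd
      FROM orders
     ORDER BY orders.id
""").fetchall()

print(rows)  # [(1, 57.0), (2, 20.0)]
```

Because the subquery returns exactly one value per order, the result can be grouped by or filtered on like any stored column.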
Tiering Total Amount of Order
We now have a wide range of order amounts, so it probably
makes sense to bucket these values across set intervals.
Normally, if we were writing SQL, we’d have to write a CASE
WHEN branch for each discrete bucket. Conveniently,
LookML has a tier dimension type, so we can use that.
Now let’s see this dimension in action.
- dimension: total_amount_of_order_usd_tier
  type: tier
  sql: ${total_amount_of_order_usd}
  tiers: [0,10,50,150,500,1000]
Notice that we can reference our existing Total Amount of Order dimension in the ‘sql:’ parameter of
the tier dimension. Now when we use the tier, orders are bucketed into their respective tiers.
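For comparison, the hand-written CASE WHEN logic that a tier replaces can be sketched as follows. The helper `tier_case_sql`, the bucket labels, and the `order_totals` table are all hypothetical and chosen for illustration; the labels and SQL that Looker itself generates may differ.

```python
import sqlite3

def tier_case_sql(expr, tiers):
    """Build a CASE expression that buckets `expr` into half-open intervals.

    Hypothetical helper for illustration; the labels are our own choice.
    """
    branches = [
        f"WHEN {expr} IS NULL THEN 'undefined'",
        f"WHEN {expr} < {tiers[0]} THEN 'below {tiers[0]}'",
    ]
    # One branch per adjacent pair of boundaries: [low, high)
    for low, high in zip(tiers, tiers[1:]):
        branches.append(f"WHEN {expr} < {high} THEN '{low} to {high}'")
    branches.append(f"ELSE '{tiers[-1]} or above'")
    return "CASE " + " ".join(branches) + " END"

case_sql = tier_case_sql("amount", [0, 10, 50, 150, 500, 1000])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_totals (amount REAL)")  # hypothetical per-order totals
conn.executemany("INSERT INTO order_totals VALUES (?)",
                 [(5.0,), (57.0,), (20.0,), (1200.0,)])

buckets = conn.execute(
    f"SELECT {case_sql} AS tier, COUNT(*) FROM order_totals "
    "GROUP BY tier ORDER BY MIN(amount)"
).fetchall()

print(buckets)
# [('0 to 10', 1), ('10 to 50', 1), ('50 to 150', 1), ('1000 or above', 1)]
```

Changing the bucket boundaries here means regenerating every branch, which is the maintenance burden the one-line `tiers:` list avoids.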
Determining Order Profit
What if we wanted to know more about each order, maybe
the profit? To determine the profit of an order, we will need a
Total Cost of Order dimension.
- dimension: total_cost_of_order
  type: number
  decimals: 2
  sql: |
    (SELECT SUM(inventory_items.cost)
     FROM order_items
     LEFT JOIN inventory_items
       ON order_items.inventory_item_id = inventory_items.id
     WHERE order_items.order_id = orders.id)
In this case, our SQL sums over the cost of inventory items for a specific order.
Now, to determine the Order Profit dimension, we must subtract the Total Cost of Order dimension
from the Total Amount of Order dimension. Normally, we’d have to subtract the SQL for the Total
Cost of Order from the SQL for Total Amount of Order. But with LookML, we can just reference our
already existing dimensions.
- dimension: order_profit
  type: number
  decimals: 2
  sql: ${total_amount_of_order_usd} - ${total_cost_of_order}
When we use the Order Profit dimension, Looker substitutes the existing business logic for both the
Total Amount of Order and Total Cost of Order dimensions. Let’s run a new query using the new Order
Profit dimension.
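The substitution step can be pictured with a small sketch. The `expand` function below is a hypothetical stand-in that replaces each `${field}` reference with that field's SQL; it is not Looker's actual implementation. The sample rows are also hypothetical, with column names taken from the tables earlier in this paper.

```python
import re
import sqlite3

# Dimension SQL keyed by field name, mirroring the definitions in this section.
dimensions = {
    "total_amount_of_order_usd": (
        "(SELECT SUM(order_items.sale_price) FROM order_items "
        "WHERE order_items.order_id = orders.id)"
    ),
    "total_cost_of_order": (
        "(SELECT SUM(inventory_items.cost) FROM order_items "
        "LEFT JOIN inventory_items "
        "ON order_items.inventory_item_id = inventory_items.id "
        "WHERE order_items.order_id = orders.id)"
    ),
    "order_profit": "${total_amount_of_order_usd} - ${total_cost_of_order}",
}

def expand(name):
    """Recursively replace each ${field} reference with that field's SQL."""
    return re.sub(r"\$\{(\w+)\}", lambda m: expand(m.group(1)), dimensions[name])

expanded = expand("order_profit")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE inventory_items (id INTEGER PRIMARY KEY, cost REAL);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER,
                              inventory_item_id INTEGER, sale_price REAL);
    INSERT INTO orders VALUES (1);
    INSERT INTO inventory_items VALUES (1, 8.5), (2, 24.0);
    INSERT INTO order_items VALUES (1, 1, 1, 12.0), (2, 1, 2, 45.0);
""")

# The fully expanded expression is plain SQL that the database can run.
profit = conn.execute(f"SELECT {expanded} AS order_profit FROM orders").fetchone()[0]
print(profit)  # 24.5
```

The one-line definition of `order_profit` stays readable while the expanded SQL carries both subqueries, so a change to either underlying dimension flows through automatically.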
Calculating Profit Per User
Another valuable metric for an e-commerce business may
be Profit Per User. In Looker, we can reference dimensions
or measures from other views. In this case, to determine the
Profit Per User, we’ll reference our Count measure from the
Users view as the denominator of a measure in the Orders
view, where the numerator is our Order Profit dimension. We
use the Count measure from the Users view to scope the
count with ‘users.’
- measure: profit_per_user
  type: number
  decimals: 2
  # Total profit divided by the number of users; NULLIF avoids division by zero
  sql: 1.0 * SUM(${order_profit}) / NULLIF(${users.count}, 0)
Now we can see how our Profit Per User varies by any order dimension. In this case, we see how it
varies by order date.
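A ratio measure of this kind can be sketched directly in SQL. In the example below we assume that the Users Count measure corresponds to COUNT(DISTINCT users.id), and we store a hypothetical `order_profit` column on the orders table purely to keep the sketch short; neither detail comes from the model above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    -- order_profit is stored directly only to keep this sketch short
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT,
                         user_id INTEGER, order_profit REAL);
    INSERT INTO users VALUES (5656), (7263);
    INSERT INTO orders VALUES
        (1, '2014-04-01', 5656, 10.0),
        (2, '2014-04-01', 7263, 20.0),
        (3, '2014-04-02', 5656, 9.0);
""")

# Total profit divided by the number of distinct users, per order date.
rows = conn.execute("""
    SELECT orders.created_at,
           1.0 * SUM(orders.order_profit)
               / NULLIF(COUNT(DISTINCT users.id), 0) AS profit_per_user
      FROM orders
      LEFT JOIN users ON users.id = orders.user_id
     GROUP BY orders.created_at
     ORDER BY orders.created_at
""").fetchall()
print(rows)  # [('2014-04-01', 15.0), ('2014-04-02', 9.0)]

# NULLIF turns a zero denominator into NULL, so the division yields NULL
# instead of failing or returning a misleading number.
guarded = conn.execute("SELECT 5.0 / NULLIF(0, 0)").fetchone()[0]
print(guarded)  # None
```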
Creating an Average Total Amount of Order Measure
What if we wanted a measure that computes the Average
Total Amount of Order whenever we group by a dimension
in Looker? For instance, we might look at the Average
Total Amount of Order by Month, by the State of the
customers who placed the orders, or by the Lifetime Number of
Orders of a customer. When we create a measure in Looker,
we can reuse it in many different contexts.
Let’s first build our Average Total Amount of Order measure.
- measure: average_total_amount_of_order_usd
  type: average
  sql: ${total_amount_of_order_usd}
  decimals: 2
Again, we can reference our already existing Total Amount of Order dimension and set the measure
type to average. Now when we use this measure, it will aggregate over all total order amounts
within each group, calculating the average.
Here we see how the Average Total Amount of Order varies by the Lifetime Number of Orders of
customers and by the Week the order was created.
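The idea of defining the aggregate once and reusing it under different groupings can be sketched as below. The flattened `orders` table, its `created_month` and `state` columns, and the `average_total_amount_by` helper are hypothetical simplifications for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened table: one row per order with its total amount
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_month TEXT,
                         state TEXT, total_amount REAL);
    INSERT INTO orders VALUES
        (1, '2014-04', 'CA', 10.0),
        (2, '2014-04', 'NY', 30.0),
        (3, '2014-05', 'CA', 50.0);
""")

GROUPABLE = {"created_month", "state"}  # dimensions we allow grouping by

def average_total_amount_by(dimension):
    """Apply the same AVG aggregate under whichever dimension is requested."""
    if dimension not in GROUPABLE:
        raise ValueError(f"unknown dimension: {dimension}")
    sql = (f"SELECT {dimension}, AVG(total_amount) FROM orders "
           f"GROUP BY {dimension} ORDER BY {dimension}")
    return conn.execute(sql).fetchall()

by_month = average_total_amount_by("created_month")
by_state = average_total_amount_by("state")
print(by_month)  # [('2014-04', 20.0), ('2014-05', 50.0)]
print(by_state)  # [('CA', 30.0), ('NY', 30.0)]
```

Only the grouping column changes between the two calls; the definition of the average itself is written once.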
Creating Conditional Measures: First Purchase and Return
We can also create measures that calculate Total Amount
of Order based on conditions of the order, such as whether
it was a customer’s first purchase or if a return customer
made the purchase. This way, we can determine how much
revenue was generated from new or returning customers.
It’s likely we have discrete teams focused on new user
acquisition and on current user retention, so it may be
important to break these revenues apart.
- measure: total_first_purchase_revenue
  type: sum
  sql: ${total_amount_of_order_usd}
  decimals: 2
  filters:
    is_first_purchase: yes

- measure: total_returning_shopper_revenue
  type: sum
  sql: ${total_amount_of_order_usd}
  decimals: 2
  filters:
    is_first_purchase: no
Again, both of these measures, Total First Purchase Revenue and Total Returning Shopper Revenue,
take advantage of our existing Total Amount of Order dimension. We can now directly compare
both types of revenue.
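One common way to express filtered sums like these in plain SQL is a conditional aggregate. The sketch below assumes a hypothetical `orders` table that already stores each order's total amount and an `is_first_purchase` flag; the SQL that Looker actually generates for filtered measures may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: total_amount and is_first_purchase stored per order
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total_amount REAL,
                         is_first_purchase INTEGER);
    INSERT INTO orders VALUES (1, 57.0, 1), (2, 20.0, 0), (3, 30.0, 0);
""")

# Each SUM only sees the rows that satisfy its own condition, so both
# revenue figures come back side by side from a single query.
first_purchase, returning = conn.execute("""
    SELECT SUM(CASE WHEN is_first_purchase = 1 THEN total_amount END),
           SUM(CASE WHEN is_first_purchase = 0 THEN total_amount END)
      FROM orders
""").fetchone()

print(first_purchase, returning)  # 57.0 50.0
```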
Putting It All Together
Given the dimensions and measures we’ve just created,
let’s build a report that shows us Total Returning Shopper
Revenue, Total First Purchase Revenue, Average Total
Amount of Order, and Average Order Profit, broken out by
the Total Amount of Order tier and the Week in which the
order was created.
To generate such a result set, we’d have to write nearly 200 lines of SQL.
Maybe this makes sense to write one time, but what if we want to look at this by a customer’s State
instead of by order Week?
Or maybe we want to see Lifetime Number of Purchases by a customer, tiered?