Join 2017_Deep Dive_To Use or Not Use PDT's

SAN FRANCISCO PALACE OF FINE ARTS
SPEAKERS
JONATHON MILLER-GIRVETZ
Data Analyst, Customer Support
To Use or Not Use PDTs

What we will cover
•  Why we derive and persist
•  Types of derived tables
•  When to use them
•  What to LOOK out for
•  When to move to ETL
•  Balance
•  Best practices

Why we derive and persist?
A derived table is a SQL query that defines a set of business logic, returns a reduced amount of data, and can include complex calculations and data transformations.
Persistence is when data survives after the process that created it has terminated.
Some examples
•  Persisting form data in a web app for a better UX
•  Persisting a data aggregation in an embedded visualization so complex analysis is quick and easy to access
•  Persisting in Looker to ensure data is ready for analysis

Types of derived tables
Ephemeral derived tables (EDTs)
    WITH tmp AS (
      SELECT user_id,
             SUM(active_usage_min) AS total_active_usage_min
      FROM usage
      GROUP BY 1
    )
    -- the CTE exists only for the query that reads from it
    SELECT * FROM tmp ORDER BY 2 DESC
Persistent derived tables (PDTs)
    CREATE TABLE usage_facts AS (
      SELECT user_id,
             SUM(active_usage_min) AS total_active_usage_min
      FROM usage
      GROUP BY 1
      ORDER BY 2 DESC
    )

PDTs are built on a persistence schedule and/or a trigger, and the result is cached as a table.
EDTs are rebuilt every time the query runs.

When to build a derived table?
To name a few from the top...
•  Historical summaries
•  Entity and transaction tables
•  Roll-ups/aggregations
•  Overcome SQL structural limitations
    •  Window functions
    •  Required subqueries
    •  Nested aggregates
    •  Correlated subqueries

When to build an EDT instead of a PDT?
•  When the view is quick to run
•  When the view should include real-time data
    •  A UNION ALL between a historical PDT and a current slice filtered on the sort key, index, and partition (multi-node databases)
•  When it should be built dynamically from user filter inputs
    •  Templated filters
•  When a view needs to be dynamic, but the number of permutations is manageable and likely to be reused
    •  User selections
    •  Filter values
    •  User attributes
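
As a rough sketch of the templated-filter case above, the view below is an ephemeral derived table whose SQL is regenerated from the user's filter input each time a query runs. The `usage` table and its columns follow the earlier example; the view name, the `created_at` column, and the `usage_date` filter are assumptions made for illustration.

```lookml
view: usage_by_user_filtered {
  derived_table: {
    # No persist_for, sql_trigger_value, or datagroup_trigger,
    # so this table stays ephemeral.
    sql:
      SELECT user_id,
             SUM(active_usage_min) AS total_active_usage_min
      FROM usage
      -- Looker substitutes the user's filter input here at query time
      WHERE {% condition usage_date %} usage.created_at {% endcondition %}
      GROUP BY 1 ;;
  }

  # Filter-only field that feeds the templated filter above
  filter: usage_date {
    type: date
  }

  dimension: user_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.user_id ;;
  }

  dimension: total_active_usage_min {
    type: number
    sql: ${TABLE}.total_active_usage_min ;;
  }
}
```

Because the generated SQL changes with the filter value, a view like this fits the EDT side of the decision rather than the PDT side.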

“I love ephemeral derived tables because they
feel light-weight and focused, but they make
the most sense when you're doing something
small and quick and/or if what you're doing is
sensitive to frequent ETL. If you don't mind the
[computation] cost and redoing the
computation each time, then I'd say don't
persist.”
Maxie Corbin
Looker Data Analyst, Customer Support

When should we build a PDT?
•  Data freshness requirements
•  The ratio of available database resources to the resources consumed by the build
•  Prototyping: laying the groundwork for views, business logic, and future ETL processes
How to?
datagroup: sets build and caching policies (release 4.16+)
persist_for: builds when first queried and keeps the table for a set length of time
sql_trigger_value: rebuilds when the result of a trigger query changes
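
A minimal sketch of how the three options above might look in LookML, reusing the `usage_facts` query from the earlier slide. The datagroup name `nightly_etl` and the `etl_log` table with its `finished_at` column are hypothetical, as are the interval values.

```lookml
# Model file: one caching policy that several PDTs can share
datagroup: nightly_etl {
  # Rebuild when this value changes (etl_log is an assumed table)
  sql_trigger: SELECT MAX(finished_at) FROM etl_log ;;
  max_cache_age: "24 hours"
}

# View file
view: usage_facts {
  derived_table: {
    sql:
      SELECT user_id,
             SUM(active_usage_min) AS total_active_usage_min
      FROM usage
      GROUP BY 1 ;;
    datagroup_trigger: nightly_etl
    # Alternatives (pick one strategy per table):
    # persist_for: "4 hours"
    # sql_trigger_value: SELECT CURRENT_DATE ;;
  }
}
```

Tying the PDT to a datagroup keeps the rebuild schedule in one place, which helps when many PDTs depend on the same upstream load.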

What to LOOK out for?
PDTs are very powerful, but they are not perfect
•  Be aware of the front-end UX and the derived table aggregations that affect it
•  Computational resources
    •  Available database resources
    •  Time, query queue, and (potentially) money

[Illustration: speech bubbles reading "How much usage per customer?", "How has our retention rate changed over the past 6 years?", "None of the queries appear to be working?", "Select margin of error?", "[SQL ERROR]: Table lock?", "Table lock."]

When should a PDT be part of the ETL?
•  When a powerful ETL/transformation tool can be leveraged
•  When a PDT is consistently being used
•  When a PDT’s logic is well understood, stable, and rarely changing
•  When raw data only needs to be processed “once” or incrementally
•  When a PDT is being used outside of Looker
•  To avoid table locks, which halt the query queue and back up your query breadline
•  When the naming of the ETL’d table clearly communicates its contents and/or a data dictionary exists to view the definition of the ETL’d table

1. Extract data from sources
2. Transform data with a PDT in Looker
3. Excellent user experience
Prototype the PDT, load it in Looker, and move it to ETL if merited
Collect more data and iterate

Move it!
PDTs to ETL
You already have the SQL
lkml provides models across dialects
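
One way the move might look on the LookML side, sketched under the assumption that an ETL job now builds the table as `analytics.usage_facts` (a hypothetical schema and table name). The view drops its `derived_table` block and points at the ETL'd table, while the field definitions stay the same.

```lookml
view: usage_facts {
  # Before: derived_table: { sql: SELECT ... ;; datagroup_trigger: ... }
  # After: the ETL job builds the table and Looker only reads it.
  sql_table_name: analytics.usage_facts ;;

  dimension: user_id {
    primary_key: yes
    type: number
    sql: ${TABLE}.user_id ;;
  }

  measure: total_active_usage_min {
    type: sum
    sql: ${TABLE}.total_active_usage_min ;;
  }
}
```

Since the fields reference `${TABLE}` rather than a hard-coded name, explores and dashboards built on the view should not need to change.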

The high wire balance
It’s a pragmatic balance between flexibility and reliability: a few PDTs give you flexibility, but many PDTs can become unreliable.
Flexibility (PDT) vs. Reliability (ETL)
PDT: Too many to keep track of? Not feeling reliable or manageable? ETL!
ETL: Feeling stiff and rigid? Need to stretch out those analytical thoughts? PDTs!

When to use: takeaways
EDT
•  Real-time data
•  Quick query
•  Dynamically built
PDT
•  Data freshness
•  Available database resources
•  Prototyping
ETL
•  Powerful ETL tool
•  Well-understood PDT
•  Consistently used PDT
•  Used outside Looker

Development best practices
•  Use consistent naming conventions
    •  Make primary keys easy to locate and identify without reading the entire PDT definition
•  Development guidelines
    •  Iterative development
    •  Test the SQL as you develop
    •  Validate the lkml often
•  File and code structure
    •  Horizontal vs vertical rules
•  Changing and pushing to prod
    •  Updating a PDT's SQL definition or datagroup in dev and pushing to prod leaves prod without a built PDT, which forces a build in production. So, BUILD IN DEV! and then push :)

Horizontal Development
connection: "myconnection"
label: "My Marketing Team"
# includes marketing views
include: "marketing.*.view"
# includes marketing dashboards
include: "marketing.*.dashboard"

Vertical Development
Single view file: pct_usage_per_user.view.lkml, containing the views usage_per_user, total_usage, and pct_usage_per_user

explore: pct_usage_per_user {hidden: yes}

view: pct_usage_per_user {
  derived_table: {
    # total_usage returns a single row, so the comma join attaches it to every user row
    sql: SELECT email,
                usage_per_user,
                SUM(1.0 * usage_per_user / NULLIF(total_usage, 0))
                  OVER (ORDER BY usage_per_user DESC) AS running_total_usage
         FROM ${usage_per_user.SQL_TABLE_NAME}, ${total_usage.SQL_TABLE_NAME} ;;
  }
}

view: usage_per_user {
  derived_table: {
    # the usage_fact view is defined outside this file
    sql: SELECT users.id AS user_id,
                users.email,
                SUM(usage_fact.usage_minutes) AS usage_per_user
         FROM users
         INNER JOIN ${usage_fact.SQL_TABLE_NAME} AS usage_fact
           ON users.id = usage_fact.user_id
         GROUP BY 1, 2
         ORDER BY 3 DESC ;;
  }
}

view: total_usage {
  derived_table: {
    sql: SELECT SUM(usage_minutes) AS total_usage FROM usage_fact ;;
  }
}

Questions?
https://discourse.looker.com/t/join-2017-deep-dive-to-use-or-not-use-pdts/5846

Jonathon M-G
Data Analyst, Customer Support
