Transforming your analytics workflow with dbt
#Siligong.Data May 2021 Meetup
Jon Su
Jon Su @ Internetrix
Who knows what SQL is?
(pronounced sequel or ss-cue-l depending on who you ask)
https://www.commitstrip.com/en/?
Tonight’s Talk: dbt (data build tool)
● A History Lesson
● What is dbt?
● Demo: dbt in action
ETL was all the rage!
https://blog.bismart.com/en/what-do-we-do-etl
● How to build a data infrastructure that can scale
● Controlling storage costs
● Performance Tuning
The focus for data teams was on extraction
But then came the “Cloud”...
This is what the Cloud actually looks like in real life ;)
Industry Trend #1: Move away from “do-it-all-in-1-tool”
https://lakefs.io/the-state-of-data-engineering-in-2021/
Industry Trend #2: Shift from ETL to ELT
https://www.striim.com/etl-vs-elt/
But there are still problems...
https://www.striim.com/etl-vs-elt/
Data Science
BI Tools
Analytics workflow problems...
● Data consumers don’t have the data when they need it
○ Silos between different members of a traditional data team
Triangle of Madness: Data Analyst, Data Engineer, Business
Analytics workflow problems...
● Beautiful dashboards that suddenly break when something upstream goes wrong or the source schema changes
● Having to rewrite the same piece of SQL again and again
○ Not sharing analytics code in a team
○ Analysts work in isolation, knowledge isn’t shared
○ Different definitions of a shared metric
● Hard for a business to adopt BI
○ time + $
○ Low BI adoption
Does this have to be the way?
dbt (data build tool) lets anyone who knows SQL author their analytics workflow and build their own data pipelines.
If you know SQL, you can use it: essentially no barrier to entry
Supports a large number of warehouses through adapters
● Can build your own adapter
Introduces basic software engineering principles to solve the workflow problems we mentioned!
● Version Control
● Quality Assurance
● Modularity
● Multiple Environments
● Documentation
● Automated Tools
● Code Maintainability
https://docs.getdbt.com/docs/about/viewpoint
How does dbt work?
https://www.getdbt.com/product/
How does dbt work?
A dbt project consists of .sql and .yml files:
1. Write dbt code (SQL + Jinja templating)
2. Run dbt command from the CLI or dbt Cloud
3. dbt compiles your dbt code into raw SQL and executes that code against your warehouse.
4. The transformed data is created as tables/views back in the data warehouse
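
The steps above can be sketched in a few lines of Python. This is a toy illustration only, not how dbt is implemented: the model names, the `raw_sessions` table and the helper functions are all hypothetical, a regular expression stands in for Jinja, and an in-memory SQLite database stands in for the warehouse. The only dbt concept borrowed is the `{{ ref('...') }}` placeholder that points one model at another.

```python
import re
import sqlite3

# Hypothetical models (step 1): name -> SQL with dbt-style {{ ref('...') }} placeholders.
MODELS = {
    "stg_sessions": "select user_id, browser from raw_sessions",
    "chrome_users": (
        "select distinct user_id from {{ ref('stg_sessions') }} "
        "where browser = 'Chrome'"
    ),
}

# Matches {{ ref('model_name') }}; a stand-in for real Jinja rendering.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*'(\w+)'\s*\)\s*\}\}")


def compile_model(sql: str) -> str:
    """Step 3 (toy version): turn templated SQL into raw SQL."""
    return REF_PATTERN.sub(lambda match: match.group(1), sql)


def run(conn: sqlite3.Connection, models: dict) -> None:
    """Steps 3-4 (toy version): compile each model and create it as a view.

    Dict insertion order stands in for the dependency ordering
    that a real tool would work out from the ref() calls.
    """
    for name, sql in models.items():
        conn.execute(f"create view {name} as {compile_model(sql)}")


# An in-memory SQLite database plays the role of the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("create table raw_sessions (user_id text, browser text)")
conn.executemany(
    "insert into raw_sessions values (?, ?)",
    [("u1", "Chrome"), ("u2", "Firefox"), ("u1", "Chrome")],
)

run(conn, MODELS)  # step 2: the "dbt run" moment
rows = conn.execute("select user_id from chrome_users").fetchall()
print(rows)  # [('u1',)]
```

The point of the placeholder is that each model only names what it depends on; the tool decides the build order and the final relation names.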
dbt Core dbt Cloud
Demo
Scenario:
● Google Merchandise Store
Dataset:
● Google Analytics sample dataset for BigQuery
Goal:
● Find all Purchases made in Feb 2017 by Users who previously visited the site using a Chrome browser in Jan 2017
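
The goal can be sketched as a single query over two simplified tables. Everything here is an assumption made for illustration: the `sessions` and `purchases` tables, their columns and the sample rows are hypothetical, and SQLite stands in for BigQuery. The real Google Analytics sample dataset has a different, nested schema, and the actual demo code lives in the repository linked at the end of the talk.

```python
import sqlite3

# Hypothetical, flattened stand-ins for the real dataset.
conn = sqlite3.connect(":memory:")
conn.execute("create table sessions (user_id text, visit_date text, browser text)")
conn.execute("create table purchases (user_id text, purchase_date text, revenue real)")

conn.executemany(
    "insert into sessions values (?, ?, ?)",
    [
        ("u1", "2017-01-10", "Chrome"),   # Chrome visit in Jan: qualifies
        ("u2", "2017-01-12", "Firefox"),  # wrong browser
        ("u3", "2017-02-03", "Chrome"),   # Chrome, but not in Jan
    ],
)
conn.executemany(
    "insert into purchases values (?, ?, ?)",
    [
        ("u1", "2017-02-05", 20.0),
        ("u2", "2017-02-06", 15.0),
        ("u3", "2017-02-07", 30.0),
        ("u1", "2017-03-01", 5.0),  # outside Feb
    ],
)

# Each CTE is the kind of building block that could become its own model.
QUERY = """
with chrome_jan_visitors as (
    select distinct user_id
    from sessions
    where browser = 'Chrome'
      and visit_date >= '2017-01-01'
      and visit_date <  '2017-02-01'
),
feb_purchases as (
    select user_id, purchase_date, revenue
    from purchases
    where purchase_date >= '2017-02-01'
      and purchase_date <  '2017-03-01'
)
select p.user_id, p.purchase_date, p.revenue
from feb_purchases as p
join chrome_jan_visitors as c on c.user_id = p.user_id
order by p.purchase_date
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('u1', '2017-02-05', 20.0)]
```

Splitting the logic into named steps is what makes it reusable: the "Chrome visitors in January" definition is written once and shared, rather than rewritten in every analyst's query.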
● dbt website: https://getdbt.com
● Demo Source Code: https://github.com/jkersu/dbt-basic-demo
● Get in touch by email at: jon@irx.io
The End.