SlideShare a Scribd company logo
Transforming your analytics
workflow with dbt
#Siligong.Data May 2021 Meetup
Jon Su
Jon Su @ Internetrix
Who knows what SQL is?
(pronounced sequel or ss-cue-l depending on who you ask)
https://www.commitstrip.com/en/?
Jon Su @ Internetrix
Tonight’s Talk: dbt (data build tool)
● A History Lesson
● What is dbt?
● Demo: dbt in action
Jon Su @ Internetrix
ETL was all the craze!
https://blog.bismart.com/en/what-do-we-do-etl
Jon Su @ Internetrix
● How to build a data
infrastructure that can
scale
● Controlling storage
costs
● Performance Tuning
The focus for data teams was on extraction
Jon Su @ Internetrix
But then came the “Cloud”...
This is how the Cloud
actually looks like in real-
life ;)
Jon Su @ Internetrix
Jon Su @ Internetrix
Industry Trend #1: Move away from “do-it-all-in-1-tool”
https://lakefs.io/the-state-of-data-engineering-in-2021/
Jon Su @ Internetrix
Industry Trend #2: Shift from ETL to ELT
https://www.striim.com/etl-vs-elt/
Jon Su @ Internetrix
But there are still problems...
https://www.striim.com/etl-vs-elt/
Data Science
BI Tools
Jon Su @ Internetrix
Analytics workflow problems...
● Data consumers don’t have the data when they need it
○ Silos between different members of a traditional data team
Data Analyst
Data Engineer Business
Triangle of
Madness
Jon Su @ Internetrix
Analytics workflow problems...
● Beautiful dashboards that suddenly break when something upstream goes wrong /
source schema changes
● Having to rewrite and rewrite the same piece of SQL again & again….
○ Not sharing analytics code in a team
○ Analysts work in isolation, knowledge isn’t shared
○ Different definitions of a shared metric
● Hard for a business to adopt using BI easily
○ time + $
○ Low BI adoption
Jon Su @ Internetrix
Does this have to be the way?
Jon Su @ Internetrix
Jon Su @ Internetrix
dbt (data build tool) lets anyone who knows SQL author their
analytics workflow and make their own data pipelines.
If you know SQL, you can use it - essentially no barrier to entry
Supports a large number of warehouse through adapters
● Can build your own adapter
“
Jon Su @ Internetrix
Introduces basic software engineering principles to
solve the workflow problems we mentioned!
Jon Su @ Internetrix
Version
Control
Quality
Assurance
Modularity
Multiple
Environments
Documentation
Automated
Tools
Code
Maintainability
https://docs.getdbt.com/docs/about/viewpoint
Jon Su @ Internetrix
How does dbt work?
https://www.getdbt.com/product/
Jon Su @ Internetrix
How does dbt work?
A dbt project consists of .sql and .yml files:
1. Write dbt code (SQL + Jinja templating)
2. Run dbt command from the CLI or dbt Cloud
3. dbt compiles your dbt code into raw SQL and executes that code against your warehouse.
4. Data is transformed and then created as tables/views back in the data warehouse
Jon Su @ Internetrix
dbt Core dbt Cloud
Jon Su @ Internetrix
Demo
Scenario:
● Google Merchandise Store
Dataset:
● Google Analytics sample dataset for
BigQuery
Goal:
● Find all Purchases made in Feb 2017
by Users who previously visited the site
using a Chrome browser in Jan 2017
Jon Su @ Internetrix
● dbt website: https://getdbt.com
● Demo Source Code:
https://github.com/jkersu/dbt-basic-demo
● Get in touch by email at:
jon@irx.io
The End.
Jon Su @ Internetrix

More Related Content

What's hot

Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Adam Doyle
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
Omid Vahdaty
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
Databricks
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
Databricks
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
Alex Ivy
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
Snowflake Computing
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
Mykola Zerniuk
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
Databricks
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
Modern Data Stack France
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
Dremio Corporation
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and Iceberg
Walaa Eldin Moustafa
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
Databricks
 

What's hot (20)

Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Data Pipline Observability meetup
Data Pipline Observability meetup Data Pipline Observability meetup
Data Pipline Observability meetup
 
Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Introduction to Dremio
Introduction to DremioIntroduction to Dremio
Introduction to Dremio
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and Iceberg
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 

Similar to Siligong.Data - May 2021 - Transforming your analytics workflow with dbt

Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
GoDataDriven
 
SoftElegance Services: Data Science, Data Engineering, Big Data Architecture
SoftElegance Services: Data Science, Data Engineering, Big Data Architecture SoftElegance Services: Data Science, Data Engineering, Big Data Architecture
SoftElegance Services: Data Science, Data Engineering, Big Data Architecture
Daryna Dubitska
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
Travis Oliphant
 
Reinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data AnalyticsReinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data Analytics
Memoori
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Daniel Zivkovic
 
Demystifying IoT skills : What does it take to become a FullStack IoT engineer?
Demystifying IoT skills : What does it take to become a FullStack IoT engineer?Demystifying IoT skills : What does it take to become a FullStack IoT engineer?
Demystifying IoT skills : What does it take to become a FullStack IoT engineer?
Emertxe Information Technologies Pvt Ltd
 
Data Con LA 2022 - Open Source Large Knowledge Graph Factory
Data Con LA 2022 - Open Source Large Knowledge Graph FactoryData Con LA 2022 - Open Source Large Knowledge Graph Factory
Data Con LA 2022 - Open Source Large Knowledge Graph Factory
Data Con LA
 
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...
MongoDB
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
VMware Tanzu
 
Challenges of applying Blockchain to enterprise systems in NTTDATA
Challenges of applying Blockchain to enterprise systems in NTTDATAChallenges of applying Blockchain to enterprise systems in NTTDATA
Challenges of applying Blockchain to enterprise systems in NTTDATA
Hyperleger Tokyo Meetup
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
Vladislav Supalov
 
Mihai Tataran - Building Windows 8 Applications with HTML5 and JS
Mihai Tataran - Building Windows 8 Applications with HTML5 and JSMihai Tataran - Building Windows 8 Applications with HTML5 and JS
Mihai Tataran - Building Windows 8 Applications with HTML5 and JS
ITCamp
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
Márton Kodok
 
A Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsA Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of Things
Inside Analysis
 
Maximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesMaximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and Practices
Jeff Bertman
 
MEF Üniversitesi - IoT & Data Dersi
MEF Üniversitesi - IoT & Data DersiMEF Üniversitesi - IoT & Data Dersi
MEF Üniversitesi - IoT & Data Dersi
İbrahim KIVANÇ
 
How a Time Series Database Contributes to a Decentralized Cloud Object Storag...
How a Time Series Database Contributes to a Decentralized Cloud Object Storag...How a Time Series Database Contributes to a Decentralized Cloud Object Storag...
How a Time Series Database Contributes to a Decentralized Cloud Object Storag...
InfluxData
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
Jordan Chung
 
Internet of Things in Tbilisi
Internet of Things in TbilisiInternet of Things in Tbilisi
Internet of Things in Tbilisi
Alexey Bokov
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
Clusterpoint
 

Similar to Siligong.Data - May 2021 - Transforming your analytics workflow with dbt (20)

Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
SoftElegance Services: Data Science, Data Engineering, Big Data Architecture
SoftElegance Services: Data Science, Data Engineering, Big Data Architecture SoftElegance Services: Data Science, Data Engineering, Big Data Architecture
SoftElegance Services: Data Science, Data Engineering, Big Data Architecture
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Reinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data AnalyticsReinventing DDC in the Age of Data Analytics
Reinventing DDC in the Age of Data Analytics
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Demystifying IoT skills : What does it take to become a FullStack IoT engineer?
Demystifying IoT skills : What does it take to become a FullStack IoT engineer?Demystifying IoT skills : What does it take to become a FullStack IoT engineer?
Demystifying IoT skills : What does it take to become a FullStack IoT engineer?
 
Data Con LA 2022 - Open Source Large Knowledge Graph Factory
Data Con LA 2022 - Open Source Large Knowledge Graph FactoryData Con LA 2022 - Open Source Large Knowledge Graph Factory
Data Con LA 2022 - Open Source Large Knowledge Graph Factory
 
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
Challenges of applying Blockchain to enterprise systems in NTTDATA
Challenges of applying Blockchain to enterprise systems in NTTDATAChallenges of applying Blockchain to enterprise systems in NTTDATA
Challenges of applying Blockchain to enterprise systems in NTTDATA
 
Workflow Engines + Luigi
Workflow Engines + LuigiWorkflow Engines + Luigi
Workflow Engines + Luigi
 
Mihai Tataran - Building Windows 8 Applications with HTML5 and JS
Mihai Tataran - Building Windows 8 Applications with HTML5 and JSMihai Tataran - Building Windows 8 Applications with HTML5 and JS
Mihai Tataran - Building Windows 8 Applications with HTML5 and JS
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
 
A Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of ThingsA Connected Data Landscape: Virtualization and the Internet of Things
A Connected Data Landscape: Virtualization and the Internet of Things
 
Maximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and PracticesMaximize Big Data ROI via Best of Breed Patterns and Practices
Maximize Big Data ROI via Best of Breed Patterns and Practices
 
MEF Üniversitesi - IoT & Data Dersi
MEF Üniversitesi - IoT & Data DersiMEF Üniversitesi - IoT & Data Dersi
MEF Üniversitesi - IoT & Data Dersi
 
How a Time Series Database Contributes to a Decentralized Cloud Object Storag...
How a Time Series Database Contributes to a Decentralized Cloud Object Storag...How a Time Series Database Contributes to a Decentralized Cloud Object Storag...
How a Time Series Database Contributes to a Decentralized Cloud Object Storag...
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
 
Internet of Things in Tbilisi
Internet of Things in TbilisiInternet of Things in Tbilisi
Internet of Things in Tbilisi
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 

Recently uploaded

Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
Fwdays
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
Fwdays
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
ScyllaDB
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
LizaNolte
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
Ajin Abraham
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
Jason Yip
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
c5vrf27qcz
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
Ivo Velitchkov
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 

Recently uploaded (20)

Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
A Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's ArchitectureA Deep Dive into ScyllaDB's Architecture
A Deep Dive into ScyllaDB's Architecture
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
 
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillinQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
inQuba Webinar Mastering Customer Journey Management with Dr Graham Hill
 
AppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSFAppSec PNW: Android and iOS Application Security with MobSF
AppSec PNW: Android and iOS Application Security with MobSF
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Y-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PPY-Combinator seed pitch deck template PP
Y-Combinator seed pitch deck template PP
 
Apps Break Data
Apps Break DataApps Break Data
Apps Break Data
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 

Siligong.Data - May 2021 - Transforming your analytics workflow with dbt

  • 1. Transforming your analytics workflow with dbt #Siligong.Data May 2021 Meetup Jon Su Jon Su @ Internetrix
  • 2. Who knows what SQL is? (pronounced sequel or ss-cue-l depending on who you ask) https://www.commitstrip.com/en/? Jon Su @ Internetrix
  • 3. Tonight’s Talk: dbt (data build tool) ● A History Lesson ● What is dbt? ● Demo: dbt in action Jon Su @ Internetrix
  • 4.
  • 5. ETL was all the craze! https://blog.bismart.com/en/what-do-we-do-etl Jon Su @ Internetrix
  • 6. ● How to build a data infrastructure that can scale ● Controlling storage costs ● Performance Tuning The focus for data teams was on extraction Jon Su @ Internetrix
  • 7. But then came the “Cloud”... This is how the Cloud actually looks like in real- life ;) Jon Su @ Internetrix
  • 8. Jon Su @ Internetrix
  • 9. Industry Trend #1: Move away from “do-it-all-in-1-tool” https://lakefs.io/the-state-of-data-engineering-in-2021/ Jon Su @ Internetrix
  • 10. Industry Trend #2: Shift from ETL to ELT https://www.striim.com/etl-vs-elt/ Jon Su @ Internetrix
  • 11. But there are still problems... https://www.striim.com/etl-vs-elt/ Data Science BI Tools Jon Su @ Internetrix
  • 12. Analytics workflow problems... ● Data consumers don’t have the data when they need it ○ Silos between different members of a traditional data team Data Analyst Data Engineer Business Triangle of Madness Jon Su @ Internetrix
  • 13. Analytics workflow problems... ● Beautiful dashboards that suddenly break when something upstream goes wrong / source schema changes ● Having to rewrite and rewrite the same piece of SQL again & again…. ○ Not sharing analytics code in a team ○ Analysts work in isolation, knowledge isn’t shared ○ Different definitions of a shared metric ● Hard for a business to adopt using BI easily ○ time + $ ○ Low BI adoption Jon Su @ Internetrix
  • 14. Does this have to be the way? Jon Su @ Internetrix
  • 15. Jon Su @ Internetrix
  • 16. dbt (data build tool) lets anyone who knows SQL author their analytics workflow and make their own data pipelines. If you know SQL, you can use it - essentially no barrier to entry Supports a large number of warehouse through adapters ● Can build your own adapter “ Jon Su @ Internetrix
  • 17. Introduces basic software engineering principles to solve the workflow problems we mentioned! Jon Su @ Internetrix
  • 19. How does dbt work? https://www.getdbt.com/product/ Jon Su @ Internetrix
  • 20. How does dbt work? A dbt project consists of .sql and .yml files: 1. Write dbt code (SQL + Jinja templating) 2. Run dbt command from the CLI or dbt Cloud 3. dbt compiles your dbt code into raw SQL and executes that code against your warehouse. 4. Data is transformed and then created as tables/views back in the data warehouse Jon Su @ Internetrix
  • 21. dbt Core dbt Cloud Jon Su @ Internetrix
  • 22. Demo Scenario: ● Google Merchandise Store Dataset: ● Google Analytics sample dataset for BigQuery Goal: ● Find all Purchases made in Feb 2017 by Users who previously visited the site using a Chrome browser in Jan 2017 Jon Su @ Internetrix
  • 23. ● dbt website: https://getdbt.com ● Demo Source Code: https://github.com/jkersu/dbt-basic-demo ● Get in touch by email at: jon@irx.io The End. Jon Su @ Internetrix