CONFIDENTIAL. Copyright © 1
DBT (DATA BUILD TOOL): AN ELT APPROACH FOR ADVANCED ANALYTICS
8+ years swimming in data @
A Researcher, Engineer and Blogger
Agenda
01 Motivation
02 DBT Approach
03 How to work with DBT
04 Demo
05 Key takeaways
06 Discussion
Motivation
We start with Excel files
Data Analytics (DA) daily job
How to prepare master table?
• Drag and drop to visualization tool?
• Modeling on the fly?
• Write complex queries?
Multiple data sources Multiple tables Data modeling
Source: link
BIG DATA!?
Volume: 10 GB, 5 years of data.
Variety: multiple data sources.
Velocity: real-time analytics.
We moved to a data warehouse
Lead time of at least 2 weeks
DAs don’t understand what DEs did
And vice versa
Extract → Transform → Load → Data warehouse
DE challenges
Readability
• How to read and understand this query?
• Where to start?
Accessibility
• How to verify the output?
• Can we break the script into smaller pieces for testing?
Collaboration
• How to reuse this query for other analysis?
• How to onboard new members?
• How to explain if there’re 100 tables?
Scripting
• How to reuse this query for other analysis?
• How to manage model versions?
Customer segmentation: Segmentation is a technique used to divide
customers into groups based on certain characteristics or behaviors. This can
help businesses understand their customers better and tailor their
marketing efforts to specific groups. SQL can be used to create customer
segments by grouping customers based on demographic information (age,
gender, location) or transactional data (purchase history, frequency,
monetary value).
Cohort Analysis: DBT can be used to perform cohort analysis by
transforming raw data into a format suitable for analysis. By using DBT to
transform the data, analysts can quickly identify patterns and trends in user
behavior and track the performance of different customer segments over
time.
Marketing Attribution: DBT can be used to perform marketing attribution
analysis by transforming raw data into a format suitable for analysis. By
using DBT to transform the data, marketers can better understand which
channels and campaigns are driving the most conversions and optimize their
marketing spend accordingly.
Financial Reporting: DBT can be used to transform financial data into a
format suitable for reporting and analysis. By using DBT to transform the
data, financial analysts can quickly generate accurate and consistent reports
that provide insights into company performance, revenue, expenses, and
other key financial metrics.
Demand forecasting: DBT can be used to create a series of transformations
on raw transactional data to prepare it for predictive modeling. For example,
it can be used to aggregate transactional data by time periods (e.g., days,
weeks, or months) and join it with other relevant data sources such as
weather data, holidays, or other events that can affect demand.
Recommendation engines: Recommendation engines are used to suggest
products or services to customers based on their past behavior or
preferences. SQL can be used to create recommendation engines by
analyzing customer purchase history and identifying patterns or similarities
between customers. This can be used to suggest similar products or to
identify cross-selling opportunities.
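The demand forecasting use case above describes aggregating transactions by time period and joining them with external data such as holidays. In a dbt project this would normally be a SQL model; the Python below is only a minimal sketch of the same logic, and the sample transactions, the holiday calendar, and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw transactions: (order_date, amount). Illustrative values only.
transactions = [
    (date(2023, 1, 5), 120.0),
    (date(2023, 1, 20), 80.0),
    (date(2023, 2, 3), 50.0),
    (date(2023, 2, 14), 150.0),
]

# Hypothetical holiday calendar standing in for the external data source.
holidays = [date(2023, 1, 1), date(2023, 2, 14)]


def month_key(d: date) -> str:
    """Return the month bucket for a date, e.g. '2023-01'."""
    return f"{d.year:04d}-{d.month:02d}"


def monthly_demand(transactions, holidays):
    """Aggregate amounts per month and attach the number of holidays in that month."""
    totals = defaultdict(float)
    for order_date, amount in transactions:
        totals[month_key(order_date)] += amount

    holiday_counts = defaultdict(int)
    for holiday in holidays:
        holiday_counts[month_key(holiday)] += 1

    # Left join: every month with sales is kept, months without holidays get 0.
    return [
        {
            "month": month,
            "total_amount": totals[month],
            "holiday_count": holiday_counts.get(month, 0),
        }
        for month in sorted(totals)
    ]


features = monthly_demand(transactions, holidays)
for row in features:
    print(row)
```

The resulting rows, one per month, are the kind of table a predictive model could consume as input features.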
USE CASES
ADVANCED ANALYTICS
How to go fast with Data-driven
culture and Advanced Analytics?
DBT Approach
DBT (Data Build Tool) an ELT approach for Advanced Analytics
Migration from Imperative to Declarative
LEADING INSURANCE COMPANY
Front end: Say goodbye to spaghetti code and complex DOM manipulations with ReactJS
Dev Ops: Infrastructure as code (IaC) with Terraform
Cluster orchestration: Managing containerized applications at scale has never been easier with K8s
Data job/op: More accurate and efficient analytics with DBT
DBT philosophy
DDL, DML-free
Just write SELECT * FROM table instead of managing DDL (tables, views), DML (CRUD), transactions, schemas, Pandas DataFrames, etc.
DRY (Don’t Repeat Yourself)
Modularize the data model and reuse it in many places instead of rewriting it from scratch for each new analysis (macros, hooks, package management).
Avoid copying and pasting SQL scripts in many places: copies are not reusable and easily introduce errors when the original data model needs to be edited.
Model versioning
Data models are versioned, making it easier to follow how business logic was built over time and to collaborate with team members (branching, pull requests, code reviews, documentation).
Data quality control
Writing tests for data models is quick and convenient. Analysis errors often occur in corner cases; catching these cases makes the model more reliable later on.
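To make the data quality point concrete, the sketch below mimics two checks that dbt ships as generic tests, `not_null` and `unique`. In dbt these are declared in YAML and run as SQL against the warehouse; the Python functions, the `order_id` column, and the sample rows here are hypothetical and only illustrate what such tests look for.

```python
from collections import Counter

# Hypothetical model output: one row per order. Illustrative values only.
rows = [
    {"order_id": "o1"},
    {"order_id": "o2"},
    {"order_id": "o2"},   # duplicate key: a typical corner case
    {"order_id": None},   # missing key: another typical corner case
]


def not_null_failures(rows, column):
    """Return the rows whose value in `column` is missing."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


null_rows = not_null_failures(rows, "order_id")
duplicate_ids = unique_failures(rows, "order_id")
print("not_null failures:", null_rows)
print("unique failures:", duplicate_ids)
```

A test passes when its list of failures is empty, so both checks would fail on this sample and flag the model before it reaches a report.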
dbt and the modern BI stack
Source: link
dbt (data build tool) is a command line tool that enables data analysts and engineers to transform data in their
warehouses more effectively. Today, dbt has ~850 companies using it in production, including companies like Casper,
Seatgeek, and Wistia.
Extract → Load → Transform
How to work with DBT
Step 1: Develop models
Write business logic in a simple SQL file.
Step 2: Compile project
DBT infers the dependencies between the data models and builds the DAG (directed acyclic graph) for us.
Step 3: Build tables + views
When dbt runs, the business logic is built as tables or views in the data warehouse.
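The dependency inference in Step 2 can be pictured with a small sketch. dbt models refer to each other through the `ref()` Jinja function; the code below scans hypothetical model SQL for those calls and sorts the models so that each one is built after its dependencies. The regex scan and the three model names are assumptions made for illustration; dbt itself renders the Jinja templates rather than pattern-matching the text.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

# Hypothetical models: name -> SQL text. Names and SQL are illustrative only.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_order_items": "select * from raw.order_items",
    "sales_by_category": """
        select i.category, sum(i.price) as total_sales
        from {{ ref('stg_orders') }} o
        join {{ ref('stg_order_items') }} i on o.order_id = i.order_id
        group by i.category
    """,
}


def build_order(models):
    """Return model names in an order where every dependency comes first."""
    # Map each model to the set of models it references (its predecessors).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


order = build_order(models)
print(order)
```

Because the graph is acyclic, a valid build order always exists; a circular reference would make `TopologicalSorter` raise `CycleError` instead of returning an order.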
Demo
Goal: calculate monthly sales values by category
Tech stack: DBT, Databricks, Azure Blob
Data: Brazilian E-Commerce Public Dataset by Olist (Kaggle)
Github: https://github.com/ongxuanhong/de05-dbt-databricks
Youtube: https://youtube.com/playlist?list=PLR0bWeb09-BxoexgE1JD-CUC7TAtNVeyO
Calculate monthly sales values by category
values_per_bills = total_sales / total_bills
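As a worked example of the formula, the sketch below groups order lines by month and category, then divides total sales by the number of bills. It assumes that a "bill" is a distinct order ID and that sales are summed from a per-line payment amount; the field names and sample rows are hypothetical and are not taken from the Olist dataset.

```python
from collections import defaultdict

# Hypothetical order lines. Two lines share order_id "o1", so they form one bill.
order_lines = [
    {"order_id": "o1", "month": "2018-01", "category": "toys", "payment": 100.0},
    {"order_id": "o1", "month": "2018-01", "category": "toys", "payment": 50.0},
    {"order_id": "o2", "month": "2018-01", "category": "toys", "payment": 30.0},
    {"order_id": "o3", "month": "2018-02", "category": "toys", "payment": 60.0},
]


def monthly_sales_by_category(lines):
    """Compute total_sales, total_bills and values_per_bills per (month, category)."""
    sales = defaultdict(float)
    bills = defaultdict(set)
    for line in lines:
        key = (line["month"], line["category"])
        sales[key] += line["payment"]
        bills[key].add(line["order_id"])  # assumption: one bill per distinct order

    result = {}
    for key in sorted(sales):
        total_sales = sales[key]
        total_bills = len(bills[key])
        result[key] = {
            "total_sales": total_sales,
            "total_bills": total_bills,
            "values_per_bills": total_sales / total_bills,
        }
    return result


report = monthly_sales_by_category(order_lines)
for key, metrics in report.items():
    print(key, metrics)
```

For January the two bills total 180.0, giving 90.0 per bill; February has a single bill of 60.0.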
DBT on Databricks Data Lakehouse with Brazilian Ecommerce dataset
Source: link
Data Lakehouse: ingested data (JSON, Parquet, Avro, Delta)
Brazilian E-Commerce Public Dataset by Olist | Kaggle
Data Lakehouse: create external tables (JSON, Parquet, Avro, Delta)
Brazilian Ecommerce tables
Initialize project
DBT run
Full pipeline
Full pipeline
Macros
Pivot table
DBT packages
https://hub.getdbt.com/
DBT data lineage and output reports
Key takeaways
• Enables seamless data transformation: DBT
automates the transformation of raw data into
a format that is useful for analytics. This allows
data analysts and engineers to focus on
insights and analysis rather than spending
time on data preparation.
• Provides a modular approach to data
transformation: DBT’s modular approach
makes it easy to break down complex
transformations into smaller, more
manageable steps. This allows teams to work
collaboratively on specific parts of a project
and to easily modify and test those parts
without affecting the entire project.
• Promotes data consistency and quality: DBT
supports data tests and documentation alongside
the models, helping keep data accurate,
consistent, and reliable. This gives analysts
and engineers confidence in the data
they are working with, leading to better
insights and more informed decision-making.
Benefits
DATA BUILD TOOL (DBT)
Has grown 10% every single month (GitHub)
• Requires SQL knowledge: While dbt makes
it easier to work with SQL, it still requires
a certain level of SQL knowledge to
use effectively. If you don't have experience
with SQL, you may need to invest time in
learning it in order to use dbt effectively.
• Performance overhead: Depending on the
complexity of your dbt models and the size
of your data, there may be a performance
overhead associated with using dbt.
• Limited scope: While dbt can help automate
some aspects of data modeling, it doesn't
solve all data-related problems. It's
important to understand the limitations of
dbt and when other tools or approaches
might be more appropriate.
Be aware of
DATA BUILD TOOL (DBT)
Source (link)
Discussion
What is analytics engineering?
dbt: Model contract v1.5
dbt + Machine Learning: What makes a great baton pass?
dbt Cloud integrations (Snowflake, Airflow, Monte Carlo)
References
• What is analytics engineering?
• What is dbt?
• Quickstart for dbt Core
• Tristan Handy — The Work Behind the Data Work

 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 

DBT ELT approach for Advanced Analytics.pptx

  • 1. CONFIDENTIAL. Copyright © 1 1 DBT (DATA BUILD TOOL) AN ELT APPROACH FOR ADVANCED ANALYTICS
  • 2. 8+ years swimming in data @ A Researcher, Engineer and Blogger
  • 3. Agenda: 01 Motivation, 02 DBT Approach, 03 How to work with DBT, 04 Demo, 05 Key takeaways, 06 Discussion
  • 5. We start with Excel files
  • 6. Data Analytics (DA) daily job: multiple data sources, multiple tables, data modeling. How to prepare the master table?
      • Drag and drop into a visualization tool?
      • Modeling on the fly?
      • Write complex queries?
      • BIG DATA!? Volume: 10 GB, 5 years of data. Variety: multiple data sources. Velocity: real-time analytics.
      • Source: link
  • 7. We moved to a data warehouse (Extract, Transform, Load). Lead time is at least 2 weeks. DA don't understand what DE did, and vice versa.
  • 8. DE challenges
      • Readability: How to read and understand this query? Where to start?
      • Accessibility: How to verify the output? Can we break the script into smaller pieces for testing?
      • Collaboration: How to reuse this query for other analysis? How to onboard new members? How to explain if there are 100 tables?
      • Scripting: How to reuse this query for other analysis? How to manage model versions?
  • 9. Use cases for advanced analytics: how to go fast with a data-driven culture and advanced analytics?
      • Customer segmentation: Segmentation is a technique used to divide customers into groups based on certain characteristics or behaviors. This can help businesses understand their customers better and tailor their marketing efforts to specific groups. SQL can be used to create customer segments by grouping customers based on demographic information (age, gender, location) or transactional data (purchase history, frequency, monetary value).
      • Cohort analysis: DBT can be used to perform cohort analysis by transforming raw data into a format suitable for analysis. With the data transformed, analysts can quickly identify patterns and trends in user behavior and track the performance of different customer segments over time.
      • Marketing attribution: DBT can be used to perform marketing attribution analysis by transforming raw data into a format suitable for analysis. With the data transformed, marketers can better understand which channels and campaigns are driving the most conversions and optimize their marketing spend accordingly.
      • Financial reporting: DBT can be used to transform financial data into a format suitable for reporting and analysis. With the data transformed, financial analysts can quickly generate accurate and consistent reports that provide insights into company performance, revenue, expenses, and other key financial metrics.
      • Demand forecasting: DBT can be used to create a series of transformations on raw transactional data to prepare it for predictive modeling. For example, it can be used to aggregate transactional data by time period (e.g., days, weeks, or months) and join it with other relevant data sources such as weather data, holidays, or other events that can affect demand.
      • Recommendation engines: Recommendation engines are used to suggest products or services to customers based on their past behavior or preferences. SQL can be used to create recommendation engines by analyzing customer purchase history and identifying patterns or similarities between customers. This can be used to suggest similar products or to identify cross-selling opportunities.
  • 10. DBT Approach: DBT (Data Build Tool), an ELT approach for Advanced Analytics
  • 11. Migration from Imperative to Declarative
      • Front end: say goodbye to spaghetti code and complex DOM manipulations with ReactJS.
      • DevOps: infrastructure as code (IaC) with Terraform.
      • Cluster orchestration: managing containerized applications at scale has never been easier with K8s.
      • Data job/op: more accurate and efficient analytics with DBT.
  • 12. DBT philosophy
      • DDL- and DML-free: just write a SELECT statement (for example, SELECT * FROM table) instead of managing multiple DDL statements (creating tables and views), DML statements (inserts, updates, deletes), transactions, schemas, Pandas DataFrames, etc.
      • DRY (Don't Repeat Yourself): modularize the data model and reuse it in many places instead of rewriting it from scratch for each new analysis (macros, hooks, package management). Avoid copying and pasting SQL scripts in many places: the copies are not reusable and easily introduce errors when the original data model needs to be edited.
      • Model versioning: data models are versioned, which makes it easier to follow how the business logic was built over time and to collaborate with team members (branching, pull requests, code reviews, documentation).
      • Data quality control: writing tests for data models is quick and convenient. Analysis errors often occur in corner cases; catching these cases with tests makes the model more reliable later on.
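The data quality point on slide 12 can be illustrated with a small sketch. dbt's built-in `not_null` and `unique` tests are, in effect, queries that select the rows violating a rule, and a test passes when no rows come back. The Python below imitates that idea against an in-memory SQLite database; it is not dbt's implementation, and the `customers` table, its columns, and the helper function names are assumptions made up for the example.

```python
import sqlite3

# Hypothetical table with two data problems: a duplicate id and a missing id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (customer_id text, city text);
    insert into customers values
        ('c1', 'sao paulo'),
        ('c2', 'rio de janeiro'),
        ('c2', 'curitiba'),
        (null, 'recife');
""")


def not_null_failures(conn, table, column):
    """Return the rows where `column` is NULL (an empty list means the test passes)."""
    sql = f"select * from {table} where {column} is null"
    return conn.execute(sql).fetchall()


def unique_failures(conn, table, column):
    """Return (value, count) pairs for values of `column` that appear more than once."""
    sql = (
        f"select {column}, count(*) from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )
    return conn.execute(sql).fetchall()


null_rows = not_null_failures(conn, "customers", "customer_id")
duplicate_rows = unique_failures(conn, "customers", "customer_id")
print("not_null failures:", null_rows)
print("unique failures:", duplicate_rows)
```

Both checks report one failing case here, which is the kind of corner case the slide suggests catching before a model is used for analysis.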
  • 13. dbt and the modern BI stack (Extract, Load, Transform): dbt (data build tool) is a command-line tool that enables data analysts and engineers to transform data in their warehouses more effectively. According to the cited source, about 850 companies were using dbt in production at the time of writing, including Casper, Seatgeek, and Wistia. Source: link
  • 14. How to work with DBT
  • 15. Working with DBT in three steps
      • Step 1: Develop models. Write business logic in a simple SQL file.
      • Step 2: Compile project. DBT infers the dependencies between the data models and builds the DAG (directed acyclic graph) for us.
      • Step 3: Build tables + views. When dbt runs, the business logic is built as tables or views in the data warehouse.
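Step 2 on slide 15 can be sketched in a few lines. In dbt, a model points at another model with `{{ ref('model_name') }}`, and those references are what define the DAG. The Python below is a simplified illustration of that idea, not dbt's actual compiler: it extracts `ref()` calls with a regular expression and orders the models with the standard library's `graphlib`. The model names and SQL text are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL text containing dbt-style ref() calls.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_order_items": "select * from raw.order_items",
    "order_values": (
        "select o.order_id, i.price "
        "from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_order_items') }} i on o.order_id = i.order_id"
    ),
    "monthly_sales": "select * from {{ ref('order_values') }}",
}

# Matches {{ ref('name') }} or {{ ref("name") }} and captures the model name.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_order(models):
    """Return model names ordered so that every dependency comes before its dependents."""
    # Each model maps to the set of models it references (its predecessors).
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


run_order = build_order(models)
print(run_order)
```

The staging models may come out in either order, but both always precede `order_values`, which in turn precedes `monthly_sales`. This ordering is what lets the build step create tables and views without the author scheduling them by hand.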
  • 16. Demo
      • Goal: calculate monthly sales values by category
      • Tech stack: DBT, Databricks, Azure Blob
      • Data: Brazilian E-Commerce Public Dataset by Olist (Kaggle)
      • GitHub: https://github.com/ongxuanhong/de05-dbt-databricks
      • YouTube: https://youtube.com/playlist?list=PLR0bWeb09-BxoexgE1JD-CUC7TAtNVeyO
  • 17. Calculate monthly sales values by category: values_per_bills = total_sales / total_bills
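The metric on slide 17 can be worked through on a toy dataset. The sketch below runs the aggregation in an in-memory SQLite database rather than Databricks, and it assumes that a "bill" corresponds to one distinct order. The `order_items` table, its columns, and the sample rows are invented for illustration and are not the schema of the Olist dataset or of the demo repository.

```python
import sqlite3

# Hypothetical order lines: one row per item sold.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table order_items (
        order_id text, category text, purchased_at text, price real
    );
    insert into order_items values
        ('o1', 'toys',  '2018-01-05', 10.0),
        ('o1', 'toys',  '2018-01-05', 20.0),
        ('o2', 'toys',  '2018-01-20', 30.0),
        ('o3', 'books', '2018-01-11', 15.0),
        ('o4', 'toys',  '2018-02-02', 40.0);
""")

# values_per_bills = total_sales / total_bills, per month and category.
# Assumption: total_bills counts distinct orders.
MONTHLY_SALES_SQL = """
    select
        strftime('%Y-%m', purchased_at)              as month,
        category,
        sum(price)                                   as total_sales,
        count(distinct order_id)                     as total_bills,
        sum(price) * 1.0 / count(distinct order_id)  as values_per_bills
    from order_items
    group by month, category
    order by month, category
"""

rows = conn.execute(MONTHLY_SALES_SQL).fetchall()
for row in rows:
    print(row)
```

For January, toys have total sales of 60.0 over 2 bills, so values_per_bills is 30.0. In a dbt project, a SELECT like this would live in its own model file and be built as a table or view.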
  • 18. DBT on Databricks: Data Lakehouse with the Brazilian E-Commerce dataset. Source: link
  • 19. Data Lakehouse: ingested data (JSON, Parquet, Avro, Delta). Brazilian E-Commerce Public Dataset by Olist | Kaggle
  • 20. Data Lakehouse: create external tables (JSON, Parquet, Avro, Delta)
  • 21. Brazilian E-Commerce tables
  • 22. Initialize project
  • 24. Full pipeline
  • 25. Full pipeline
  • 27. Pivot table
  • 28. DBT packages: https://hub.getdbt.com/
  • 29. DBT data lineage and output reports
  • 30. Key takeaways
  • 31. Benefits of Data Build Tool (DBT)
      • Enables seamless data transformation: DBT automates the transformation of raw data into a format that is useful for analytics. This allows data analysts and engineers to focus on insights and analysis rather than spending time on data preparation.
      • Provides a modular approach to data transformation: DBT's modular approach makes it easy to break down complex transformations into smaller, more manageable steps. This allows teams to work collaboratively on specific parts of a project and to modify and test those parts without affecting the entire project.
      • Promotes data consistency and quality: DBT builds data testing and documentation into the project, which helps keep data accurate, consistent, and reliable. This gives analysts and engineers confidence in the data they are working with, leading to better insights and more informed decision-making.
      • Adoption: grown at 10% every single month (github).
  • 32. Be aware of: Data Build Tool (DBT)
      • Requires SQL knowledge: While dbt makes it easier to work with SQL, it still requires a certain level of SQL knowledge. If you don't have experience with SQL, you may need to invest time in learning it in order to use dbt effectively.
      • Performance overhead: Depending on the complexity of your dbt models and the size of your data, there may be a performance overhead associated with using dbt.
      • Limited scope: While dbt can help automate some aspects of data modeling, it doesn't solve all data-related problems. It's important to understand the limitations of dbt and when other tools or approaches might be more appropriate.
      • Source: link
  • 33. Discussion
      • What is analytics engineering?
      • dbt: model contracts (v1.5)
      • dbt + Machine Learning: What makes a great baton pass?
      • dbt Cloud integrations (Snowflake, Airflow, Monte Carlo)
  • 34. References
      • What is analytics engineering?
      • What is dbt?
      • Quickstart for dbt Core
      • Tristan Handy: The Work Behind the Data Work

Editor's Notes

  1. How to move faster? How to ensure data quality?