How to build data accessibility for everyone with open source?
Karen Hsieh, 2022/7/31
Karen Hsieh
A product manager who builds company-wide data literacy and empowers the product team to create value for people and grow the company toward profitability.
Feel free to connect 👋
www.linkedin.com/in/karenhsieh/
● Contributed “Using Metabase for self-service product analytics” to the Metabase Community.
● Moderator of #dbt-local-taipei.
Data Accessibility
The ability to access data
Prerequisites
● A data-informed culture.
○ You let data act as a check on your intuition.
● People who work in spreadsheets are tired of repeating the same work.
○ “My computer is so slow! 🤬” (when opening a spreadsheet)
○ “😩 I spend 2 hours producing the weekly report.” (The report is compiled from multiple spreadsheets.)
Current
● Raw Data: accessed by Engineers only
● A data user 🤬, B data user 😩, C data user 😰: no direct access
Only Engineers have data accessibility.
Goal
● Raw Data: Engineers
● Transformed Data: Analysts
● 📊 Business Intelligence (BI) tool: A data user, B data user, C data user
Everyone has data accessibility.
Why don’t we let everyone access raw data?
Let everyone access raw data
● Everyone needs to understand the raw data
○ Raw data are not that clean 🥹
○ It takes effort to document them
● Everyone needs to know how to write SQL
○ It requires them to learn a new skill
Everyone accesses transformed data
● It’s clearer and easier to understand
● It’s much easier to generate reports from it, e.g. by creating a pivot table in a spreadsheet
Why don’t we expect everyone to access raw data?
Goal 💪
Empower everyone to do self-serve analysis.
● Understand data
● Access data easily
● Build reports easily
Example: a subscription business
● Subscription channel analysis
● Monthly subscriptions
● Subscription coupon usage
How we did it
1. What reports do people want?
2. What raw data do we have?
○ 🤯 Mostly by asking someone who has worked here for a long time. (Time for archaeology. ⛏)
3. Go back and forth between 1 and 2 = How should we transform the data?
○ 🤯🤯 Make sure the numbers are consistent with the figures users previously counted by hand, so they feel comfortable and confident using the transformed data. (You may find that some of the manual data contain errors. 😰)
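Step 3 can be partly automated. A minimal sketch, assuming both the transformed data and the manual spreadsheet can be reduced to totals keyed by month; the `reconcile` helper and all figures are hypothetical, not part of the original workflow.

```python
def reconcile(transformed, manual, tolerance=0.0):
    """Return {key: (transformed_value, manual_value)} for every key where the two sources disagree.

    A key present on only one side is reported with None on the other side.
    """
    mismatches = {}
    for key in sorted(set(transformed) | set(manual)):
        t = transformed.get(key)
        m = manual.get(key)
        if t is None or m is None or abs(t - m) > tolerance:
            mismatches[key] = (t, m)
    return mismatches

# Hypothetical monthly subscription counts: from the transformed data vs. counted by hand
transformed = {"2022-05": 120, "2022-06": 135, "2022-07": 140}
manual = {"2022-05": 120, "2022-06": 133, "2022-07": 140}
print(reconcile(transformed, manual))  # {'2022-06': (135, 133)}
```

A mismatch only says the two sources disagree; as noted above, the manual figure may be the wrong one.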
Data models (details in this Miro board)
● Raw data: subscriptions, orders, coupons, channels, users
● Transformed data (stage and mart): order_user, order_revenue, subscription_user
● Reports: Subscription channel, Monthly subscriptions, Subscription coupon usage
Annotations: 1. Understand needs · 2. What we have · 3.
Data models (details in this Miro board)
● Raw data: subscriptions, orders, coupons, channels, users
● Transformed data (stage and mart): order_user, order_revenue, subscription_user
● Reports: Subscription channel, Monthly subscriptions, Subscription coupon usage, more...
Callouts in the diagram: “1 table”, “More...”
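A minimal sketch of the raw → stage → mart → report layering, with an in-memory SQLite database standing in for the data warehouse. The table names loosely follow the diagram, but the columns, the `stg_`/`mart_` prefixes and the SQL are assumptions for illustration; in a dbt project each SELECT would live in its own model file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Raw layer: hypothetical columns for two of the raw tables
cur.executescript("""
CREATE TABLE users (id INTEGER, name TEXT);
CREATE TABLE orders (id INTEGER, user_id INTEGER, amount REAL, status TEXT, created_at TEXT);
INSERT INTO users VALUES (1, 'Amy'), (2, 'Ben');
INSERT INTO orders VALUES
  (10, 1, 300.0, 'paid', '2022-06-03'),
  (11, 2, 150.0, 'refunded', '2022-06-10'),
  (12, 2, 150.0, 'paid', '2022-07-01');
""")

# Stage layer: clean and rename raw columns (here: keep paid orders, derive the month)
cur.execute("""
CREATE VIEW stg_orders AS
SELECT id AS order_id, user_id, amount, substr(created_at, 1, 7) AS order_month
FROM orders
WHERE status = 'paid'
""")

# Mart layer: join staged orders with users into one business-friendly table
cur.execute("""
CREATE VIEW mart_order_user AS
SELECT o.order_id, o.order_month, o.amount, u.id AS user_id, u.name AS user_name
FROM stg_orders AS o
JOIN users AS u ON u.id = o.user_id
""")

# Report: a question a data user can answer from the mart alone
report = cur.execute("""
SELECT order_month, SUM(amount) AS revenue, COUNT(DISTINCT user_id) AS paying_users
FROM mart_order_user
GROUP BY order_month
ORDER BY order_month
""").fetchall()
print(report)  # [('2022-06', 300.0, 1), ('2022-07', 150.0, 1)]
```

The report query never touches the raw tables, so the cleaning rule (only paid orders count) is written once in the stage layer instead of in every report.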
Data pipeline: from ETL to ELT
ETL
● Extract
● Transform
● Load
Because cloud storage was expensive, we wanted to make sure we loaded only valuable data.
ELT
● Extract
● Load
● Transform
Since cloud storage and computing are now easy to use and cheaper, we can load everything we extract and do the transformation later.
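The difference lies in where the transformation happens and what ends up stored. A minimal sketch with SQLite standing in for the warehouse; the `extract()` function and its records are hypothetical.

```python
import sqlite3

def extract():
    # Hypothetical records pulled from a source system
    return [
        {"id": 1, "amount": "300", "status": "paid"},
        {"id": 2, "amount": "150", "status": "refunded"},
        {"id": 3, "amount": "150", "status": "paid"},
    ]

def etl(conn):
    # ETL: transform in the pipeline, then load only the rows judged valuable
    rows = [(r["id"], float(r["amount"])) for r in extract() if r["status"] == "paid"]
    conn.execute("CREATE TABLE paid_orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO paid_orders VALUES (?, ?)", rows)

def elt(conn):
    # ELT: load everything as-is, then transform inside the warehouse with SQL
    conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, status TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (:id, :amount, :status)", extract())
    conn.execute("""
        CREATE VIEW paid_orders AS
        SELECT id, CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE status = 'paid'
    """)

for pipeline in (etl, elt):
    conn = sqlite3.connect(":memory:")
    pipeline(conn)
    print(pipeline.__name__, conn.execute("SELECT SUM(amount) FROM paid_orders").fetchone())
```

Both pipelines report a paid total of 450.0, but only the ELT warehouse still holds the refunded order in `raw_orders`, so a new question (say, a refund rate) can be answered later with SQL instead of a pipeline change.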
Roles & responsibilities (R&R)
Engineers
build the data pipeline
● Knowledge of data & platform structure
● Set up the environment, including the data warehouse and BI tool
Analysts
transform the data & maintain a single source of truth
● dbt, GitHub, data warehouse
● SQL
● Understand business logic & documentation
Everyone
uses the transformed data
● Advanced: build reports
○ SQL
○ Know the transformed data
● Basic: use reports
○ BI tool
Note: Analytics Engineers provide clean data sets to end users.
Data models (details in this Miro board)
● Raw data: subscriptions, orders, coupons, channels, users
● Transformed data (stage and mart): order_user, order_revenue, subscription_user
● Reports: Subscription channel, Monthly subscriptions, Subscription coupon usage
Who does what: 1. Engineers for EL · 2. Analysts for T · 3. Everyone for reports
Open source tools
● dbt for data transformation, with GitHub and the data warehouse
● Metabase, the BI tool
Modularized SQL queries
● Use ref() or source()
● Auto-generated DAG
Source: On DAGs, Hierarchies, and IDEs
Don’t throw 🗑 your query away. 💎 It’s reusable.
See the upstream and downstream relationships.
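dbt builds this DAG from the `ref()` and `source()` calls in the models. A minimal sketch of the idea using Python's `graphlib` (3.9+): a regex finds single-argument `ref('...')` calls in hypothetical model SQL, a topological sort gives a valid build order, and a helper lists everything downstream of a model. This illustrates the concept only; it is not dbt's actual parser.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical model files: model name -> SQL body using dbt-style ref()/source() calls
models = {
    "stg_orders": "select * from {{ source('app', 'orders') }}",
    "stg_users": "select * from {{ source('app', 'users') }}",
    "order_user": "select * from {{ ref('stg_orders') }} join {{ ref('stg_users') }} using (user_id)",
    "order_revenue": "select order_month, sum(amount) from {{ ref('order_user') }} group by 1",
}

# Simplified: only matches the single-argument form ref('model_name')
REF_PATTERN = re.compile(r"ref\(\s*['\"]([^'\"]+)['\"]\s*\)")

def build_dag(models):
    """Map each model to the set of models it refs (its upstream dependencies)."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}

def downstream(dag, model):
    """Return every model that depends, directly or indirectly, on `model`."""
    children = {m for m, parents in dag.items() if model in parents}
    result = set(children)
    for child in children:
        result |= downstream(dag, child)
    return result

dag = build_dag(models)
order = list(TopologicalSorter(dag).static_order())  # upstream models come first
print(order)
print(sorted(downstream(dag, "stg_orders")))  # ['order_revenue', 'order_user']
```

Because dependencies are declared in the SQL itself, a reused query is simply another node in the graph, and the impact of changing it is visible before anything runs.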
dbt docs
● Write docs in YAML
● Source data:
○ src_xx.yml
● Transferred data:
○ stg_xx.yml,
○ mar_xx.yml
Source: Documentation
Sync dbt docs to Metabase
● persist_docs
○ Persists descriptions to the data warehouse (as table and column comments).
● dbt_metabase
○ Synchronizes models from dbt to Metabase.
● Source data are not supported.
It’s easy to keep the docs up to date.
The docs are useful only if they are kept up to date.
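Keeping docs up to date is easier to enforce with a coverage check. A minimal sketch, assuming the YAML file has already been parsed into a dict shaped like dbt's `models:` / `columns:` / `description` structure (parsing YAML itself needs a third-party library such as PyYAML, so it is left out). The `undocumented` helper, model, and column names are hypothetical.

```python
# Hypothetical content of a stg_xx.yml file after YAML parsing
schema = {
    "version": 2,
    "models": [
        {
            "name": "stg_orders",
            "description": "One row per paid order.",
            "columns": [
                {"name": "order_id", "description": "Primary key of the order."},
                {"name": "amount", "description": ""},
            ],
        },
    ],
}

def undocumented(schema, warehouse_columns):
    """List 'model.column' pairs that exist in the warehouse but have no non-empty description."""
    documented = {
        (m["name"], c["name"])
        for m in schema.get("models", [])
        for c in m.get("columns", [])
        if c.get("description", "").strip()
    }
    return sorted(
        f"{model}.{column}"
        for model, columns in warehouse_columns.items()
        for column in columns
        if (model, column) not in documented
    )

# Columns actually present in the warehouse (hypothetical)
warehouse_columns = {"stg_orders": ["order_id", "amount", "order_month"]}
print(undocumented(schema, warehouse_columns))  # ['stg_orders.amount', 'stg_orders.order_month']
```

Run in CI, a check like this fails the build when a new column ships without a description, before the gap reaches Metabase.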
dbt test
● Ensure data quality.
● tests:
- unique
- not_null
- relationships
- accepted_values
Source: Tests
Everyone trusts the data. Earn the trust.
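dbt's generic tests work by compiling each test into a SQL query that selects the rows violating the rule; the test passes when the query returns zero rows. A minimal sketch of the four listed tests as hand-written SQL against SQLite; the tables, columns, and accepted values are hypothetical, and the queries are simplified versions of what dbt generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER);
CREATE TABLE subscriptions (id INTEGER, user_id INTEGER, plan TEXT);
INSERT INTO users VALUES (1), (2);
INSERT INTO subscriptions VALUES
  (100, 1, 'monthly'),
  (101, 2, 'yearly'),
  (101, 1, 'monthly'),    -- duplicate id
  (102, 3, 'monthly'),    -- user 3 does not exist in users
  (103, NULL, 'yearly'),  -- missing user_id
  (104, 2, 'weekly');     -- plan outside the accepted values
""")

# Each query returns the failing rows; zero rows means the test passes.
tests = {
    "unique(subscriptions.id)":
        "SELECT id FROM subscriptions WHERE id IS NOT NULL GROUP BY id HAVING COUNT(*) > 1",
    "not_null(subscriptions.user_id)":
        "SELECT * FROM subscriptions WHERE user_id IS NULL",
    "relationships(subscriptions.user_id -> users.id)":
        """SELECT s.user_id FROM subscriptions AS s
           LEFT JOIN users AS u ON u.id = s.user_id
           WHERE s.user_id IS NOT NULL AND u.id IS NULL""",
    "accepted_values(subscriptions.plan)":
        "SELECT plan FROM subscriptions WHERE plan NOT IN ('monthly', 'yearly')",
}

results = {name: len(conn.execute(sql).fetchall()) for name, sql in tests.items()}
for name, failures in results.items():
    print("PASS" if failures == 0 else f"FAIL ({failures} rows)", name)
```

Each bad row in the sample trips exactly one test, which is the point: a failing test names the rule and the rows, so the fix happens before a wrong number reaches a dashboard.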
dbt seed
Some data are entered manually. Seeds are CSV files in your dbt project.
dbt seed loads the CSV files into the data warehouse as tables that models can ref(), so manual input becomes part of the data source.
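A minimal sketch of the load step behind a seed, using Python's `csv` and `sqlite3` modules. The CSV content (a hand-maintained channel mapping), the `load_seed` helper, and the choice to store every column as TEXT are assumptions for illustration; dbt itself infers column types.

```python
import csv
import io
import sqlite3

# Hypothetical manually maintained CSV, e.g. seeds/channel_mapping.csv
CSV_TEXT = """utm_source,channel
google,Paid Search
facebook,Paid Social
newsletter,Email
"""

def load_seed(conn, table, csv_file):
    """Create `table` from a CSV file (header row = column names), replacing any previous version."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()

conn = sqlite3.connect(":memory:")
load_seed(conn, "channel_mapping", io.StringIO(CSV_TEXT))
print(conn.execute("SELECT channel FROM channel_mapping WHERE utm_source = 'google'").fetchone())
# ('Paid Search',)
```

Dropping and recreating the table makes reruns idempotent: editing the CSV and loading again replaces the old mapping instead of appending duplicates.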
Schedule the dbt_prod run
● e.g., a daily run
Source: dbt Cloud overview
Set it up once.
Configure incremental models
An incremental run processes only the rows in your source data that have been created or updated since the last time dbt ran.
Source: Configuring incremental models
Save cost and reduce errors.
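In dbt, an incremental model typically filters on a timestamp inside `{% if is_incremental() %}` so that only rows newer than what the target table already holds are processed. A minimal sketch of that logic in plain Python and SQLite; the table and column names and the `run_incremental` helper are hypothetical, and the delete-then-insert by id is a simplified stand-in for dbt's `unique_key` merge.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (id INTEGER, amount REAL, updated_at TEXT);
CREATE TABLE orders_incremental (id INTEGER, amount REAL, updated_at TEXT);
INSERT INTO raw_orders VALUES (1, 100.0, '2022-07-01'), (2, 200.0, '2022-07-02');
""")

def run_incremental(conn):
    """Process only source rows created or updated since the last run; return how many were processed."""
    # High-water mark: the newest updated_at already in the target (None on the first run)
    (last_run,) = conn.execute("SELECT MAX(updated_at) FROM orders_incremental").fetchone()
    new_rows = conn.execute(
        "SELECT id, amount, updated_at FROM raw_orders WHERE ? IS NULL OR updated_at > ?",
        (last_run, last_run),
    ).fetchall()
    # Upsert by id: remove stale versions of changed rows, then insert the fresh ones
    conn.executemany("DELETE FROM orders_incremental WHERE id = ?", [(r[0],) for r in new_rows])
    conn.executemany("INSERT INTO orders_incremental VALUES (?, ?, ?)", new_rows)
    conn.commit()
    return len(new_rows)

print(run_incremental(conn))  # 2 (first run: full load)
conn.execute("UPDATE raw_orders SET amount = 250.0, updated_at = '2022-07-03' WHERE id = 2")
conn.execute("INSERT INTO raw_orders VALUES (3, 300.0, '2022-07-03')")
print(run_incremental(conn))  # 2 (only the updated row and the new row)
print(conn.execute("SELECT id, amount FROM orders_incremental ORDER BY id").fetchall())
# [(1, 100.0), (2, 250.0), (3, 300.0)]
```

A strict `>` can miss late-arriving rows that share the latest timestamp; using `>=` together with the upsert is a common, safer variant at the cost of reprocessing a few rows.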
Version control with GitHub
● Collaborate on SQL
● Enable CI
Source: Enabling CI
Metabase
Question vs Dashboard
In Metabase, a query is called a question. A question can be added to multiple dashboards.
Source: Writing SQL
Source: Dashboard
Easy to adopt
User K built a dashboard on Metabase on the 11th day after starting to learn SQL.
(The dashboard screenshot is a sample.)
Know your data
View table and column descriptions while writing a query.
Source: Data Reference
No misunderstanding. Don’t guess.
Variables for filtering
Write {{variable name}} in a SQL query to add a variable.
Source: SQL parameters
Enable basic users to use the reports.
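In a Metabase SQL question, the double-brace placeholder marks where a filter value goes, and Metabase shows a filter widget for it. A minimal sketch of the same idea in Python: a hypothetical `run_question` helper rewrites `{{name}}` placeholders into named bind parameters for `sqlite3`, so the user's input is never pasted into the SQL text. This illustrates the concept; it is not how Metabase implements variables.

```python
import re
import sqlite3

# Matches {{name}} or {{ name }} where name is letters, digits, or underscores
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def run_question(conn, sql_template, **filters):
    """Replace each {{name}} with a :name bind parameter and run the query with the given filter values."""
    missing = set(PLACEHOLDER.findall(sql_template)) - set(filters)
    if missing:
        raise ValueError(f"missing values for: {sorted(missing)}")
    sql = PLACEHOLDER.sub(lambda m: ":" + m.group(1), sql_template)
    return conn.execute(sql, filters).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subscriptions (id INTEGER, plan TEXT, channel TEXT);
INSERT INTO subscriptions VALUES
  (1, 'monthly', 'Email'), (2, 'yearly', 'Email'), (3, 'monthly', 'Paid Search');
""")

# Hypothetical question written once by an advanced user
question = "SELECT channel, COUNT(*) FROM subscriptions WHERE plan = {{plan}} GROUP BY channel ORDER BY channel"
print(run_question(conn, question, plan="monthly"))  # [('Email', 1), ('Paid Search', 1)]
```

The advanced user writes the SQL once; a basic user only supplies the value for `plan`.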
Visualizing data
Supports 16 visualization types.
Source: Visualizing results
Subscribe to dashboards via email or Slack
Dashboards are refreshed and sent automatically.
Source: Dashboard subscription
Set it up once.
Detailed permission controls
Set permissions on datasets, tables, and collections by group.
Source: Data permissions
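A minimal sketch of group-based access: permissions are granted to groups, and a user can view a resource if any of their groups has been granted it. The group names, users, resources, and the `can_view` helper are hypothetical, and Metabase's real permission model has more levels (for example, separate data and collection permissions with different access levels).

```python
# Hypothetical permission grants: group -> resources the group may view
GROUP_GRANTS = {
    "analysts": {"table:mart_order_user", "table:stg_orders", "collection:Finance"},
    "marketing": {"table:mart_order_user", "collection:Marketing"},
}

# Hypothetical group membership
USER_GROUPS = {
    "amy": {"analysts"},
    "ben": {"marketing"},
}

def can_view(user, resource):
    """A user can view a resource if at least one of their groups has been granted it."""
    return any(resource in GROUP_GRANTS.get(group, set()) for group in USER_GROUPS.get(user, set()))

print(can_view("ben", "table:mart_order_user"))  # True
print(can_view("ben", "table:stg_orders"))       # False
```

Granting to groups rather than individuals means onboarding a new data user is just adding them to the right group.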
Take Away
🤩 Wow, I’d like to do this!
Engineer: I want to stop checking for data errors.
Data user: I don’t want to wait for someone to provide the data.
Build data accessibility for everyone
Raw Data → Transformed Data → 📊 Business Intelligence (BI) tool
Engineers and Analysts ensure data quality and maintain the data pipeline. 🤝
Everyone owns the reports and does self-serve data analysis. 😄
Reinforce the data-informed culture
= Raise data literacy
Self-serve analysis is easy and quick.
Plenty of good-quality data.
Data users like to check the data. 😄 📊
How we did it
1. What reports do people want?
2. What raw data do we have?
3. Transformed data
4. Advocate SQL
5. Share how to use Metabase
Recurring reports are sent out automatically. 🤖
Ad hoc questions are self-served. 🎉
Q&A (via Slido)
Thank you 🙌
Give me feedback 🎁
Feedback is a gift. 🙏🙏🙏
Examples - transformed data
Before:
● An operations staff member produced 20 revenue reports every month.
● Each report took 6 hours of checking plus 1 day of importing.
After:
● 5 minutes to import one report.
Examples - transformed data
Before:
● Waited 10 minutes to open a spreadsheet with more than 10 tabs and more than 10K rows.
● Emailed the reports to the partner as attachments.
After:
● Data are updated automatically in a dashboard on Data Studio.
● The dashboard is shared with the partner, who can check it anytime.
