1. How to build data accessibility for everyone with open source?
Karen Hsieh, 2022/7/31
2. Karen Hsieh
A product manager who builds company-wide data literacy and empowers the product team to create value for people and grow the company profitably.
Welcome to connect 👋
www.linkedin.com/in/karenhsieh/
● Contributed “Using Metabase for Self-service Product Analytics” to the Metabase Community.
● Moderator of #dbt-local-taipei.
4. Prerequisites
● A data-informed culture.
○ You let data act as a check on your intuition.
● People doing spreadsheet work are tired of repeating it.
○ “My computer is so slow 🤬!” (when opening a spreadsheet)
○ “😩 I spend 2 hours producing the weekly report.” (The report is generated from multiple spreadsheets.)
7. Why don’t we let everyone access raw data?
Everyone accesses raw data
● Everyone needs to understand the raw data
○ Raw data is not that clean 🥹
○ Documentation takes effort
● Everyone needs to know how to write SQL
○ Requires them to learn a new skill
Everyone accesses transformed data
● It’s clearer and easier to understand
● It’s much easier to generate reports from it, e.g. creating a pivot table in a spreadsheet
So why would we expect everyone to access raw data?
8. Goal 💪
Empower everyone to do self-serve analysis.
● Understand data
● Access data easily
● Build reports easily
Example reports for a subscription business: subscription channel analysis, monthly subscriptions, subscription coupon usage.
9. How do we do it
1. What reports do people want?
2. What raw data do we have?
○ 🤯 Mostly by asking someone who has worked here for a long time. (Time for archeology. ⛏)
3. Back and forth between 1 and 2 = how to transform the data?
○ 🤯🤯 Make sure the numbers are consistent with the figures users previously counted by hand, so they are comfortable and confident using the transformed data. (You may find errors in some of the manual data. 😰)
10. Data models (detail in this Miro board)
● Raw data: subscriptions, orders, coupons, channels, users
● Transformed data (stage): order_user, order_revenue, subscription_user
● Transformed data (mart) → Reports: Subscription channel, Monthly subscriptions, Subscription coupon usage
1. Understand needs
2. What we have
3. How to transform
11. Data models (detail in this Miro board)
One stage table (order_user) feeds multiple mart models and reports: Subscription channel, Monthly subscriptions, Subscription coupon usage, and more.
● Raw data: subscriptions, orders, coupons, channels, users
● Transformed data (stage → mart) → Reports
12. Data pipeline: from ETL to ELT
ETL
● Extract
● Transform
● Load
Because cloud storage used to be expensive, we wanted to make sure we only loaded valuable data.
ELT
● Extract
● Load
● Transform
Since cloud storage and computing are now easy and cheap, we can load everything we extract and do the transformation later.
13. R&R
Engineers
build the data pipeline
● Knowledge of data & platform
structure
● Setup the environment,
including data warehouse and
BI tool
Analysts
do data transfer & single
source of truth
● dbt, github, data warehouse
● SQL
● Understand business logic &
doc
Everyone
uses the transferred data
● Advanced - build reports
○ SQL
○ Know transferred data
● Basic - use reports
○ BI Tool
Note: Analytics Engineers provide clean data sets to end users
14. Data models (detail in this Miro board)
● Raw data: subscriptions, orders, coupons, channels, users → 1. Engineers for EL
● Transformed data (stage, mart): order_user, order_revenue, subscription_user → 2. Analysts for T
● Reports: Subscription channel, Monthly subscriptions, Subscription coupon usage → 3. Everyone for reports
15. Open Source Tools
● dbt for the data transformation, with GitHub and the data warehouse
● Metabase as the BI tool
17. Modularized SQL queries
● Use ref() or source()
● Auto-generated DAG
Source: On DAGs, Hierarchies, and IDEs
Don’t throw 🗑 your query away. 💎 It’s reusable.
See the upstream and downstream relationships.
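As a sketch of how this modularization looks in a dbt model (the model, source, and column names here are illustrative, borrowed from the data-model slides): source() points at raw tables declared in YAML, ref() points at other dbt models, and dbt builds the DAG from those calls.

```sql
-- models/stage/order_user.sql (illustrative names)
select
    o.order_id,
    o.user_id,
    u.signup_channel
from {{ source('app', 'orders') }} as o   -- raw table, a DAG source node
left join {{ ref('users') }} as u         -- another dbt model, a DAG edge
    on o.user_id = u.user_id
```

Downstream mart models can in turn select from `{{ ref('order_user') }}`, so the query is reused instead of copy-pasted.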
19. Sync dbt docs to Metabase
● persist_docs
○ Sync docs to the data warehouse.
● dbt_metabase
○ Synchronizes models from dbt to Metabase.
● Source data is not supported.
It’s easy to keep the docs up to date.
Docs are only useful if they stay up to date.
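The persist_docs setting is enabled in dbt_project.yml; a minimal sketch (the project name `your_project` is a placeholder):

```yaml
# dbt_project.yml — write dbt descriptions into the warehouse
models:
  your_project:
    +persist_docs:
      relation: true   # persist model (table/view) descriptions
      columns: true    # persist column descriptions
```

The dbt_metabase package then reads the dbt project and pushes the model and column metadata into Metabase; see its README for the exact invocation against your setup.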
20. dbt test
● Ensure data quality.
● Built-in generic tests: unique, not_null, relationships, accepted_values
Source: Tests
Everyone trusts the data. Earn the trust.
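A sketch of how these four generic tests attach to a model in a schema .yml file (the model, column, and accepted values are illustrative, not from the actual project):

```yaml
# models/schema.yml (illustrative names)
version: 2
models:
  - name: order_user
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'canceled']
      - name: user_id
        tests:
          - relationships:   # every user_id must exist in users.id
              to: ref('users')
              field: id
```

Running `dbt test` then executes each test as a query and fails loudly on bad rows.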
21. dbt seed
Some data is input manually. Seeds are CSV files in your dbt project.
dbt seed turns those CSV files into models, so manually input data becomes part of the data sources.
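For example (the file name and contents are made up for illustration), a hand-maintained lookup dropped into the seeds folder:

```yaml
# seeds/channel_mapping.csv — manually maintained lookup
# channel_code,channel_name
# fb,Facebook
# gg,Google
```

Running `dbt seed` loads it into the warehouse as a table, and downstream models can select from it with `{{ ref('channel_mapping') }}` just like any other model.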
23. Configure incremental models
An incremental run processes only the rows in your source data that have been created or updated since the last time dbt ran.
Source: Configuring incremental models
Save cost and reduce errors.
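A minimal incremental model sketch (source and column names assumed from the data-model slides):

```sql
-- models/stage/order_revenue.sql (illustrative)
{{ config(materialized='incremental', unique_key='order_id') }}

select
    order_id,
    amount,
    updated_at
from {{ source('app', 'orders') }}

{% if is_incremental() %}
-- on incremental runs, only pick up rows changed since the last run
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run dbt builds the full table; afterwards it only scans and merges the new or changed rows, which is where the cost saving comes from.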
24. Version control with GitHub
● Collaborate on SQL
● Enable CI
Source: Enabling CI
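As one possible sketch of CI for a dbt project on GitHub Actions (the adapter, Python version, and profiles location are assumptions; credentials handling depends on your warehouse):

```yaml
# .github/workflows/dbt_ci.yml — illustrative sketch
name: dbt CI
on: pull_request
jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - run: pip install dbt-bigquery   # assumed adapter
      - run: dbt deps
      - run: dbt build --profiles-dir ci_profiles   # assumed profiles location
```

Every pull request then builds the changed models and runs their tests before the SQL is merged.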
34. 🤩 Wow, I’d like to do this!
Engineer: I want to stop checking data errors by hand.
Data user: I don’t want to wait for someone to provide the data.
35. Build data accessibility for everyone
Raw Data → Transformed Data → 📊 Business Intelligence (BI) tool
Engineers and analysts ensure data quality and keep the data pipeline running. 🤝
Everyone owns the reports and does self-serve data analysis. 😄
36. Reinforce the data-informed culture
= Raise data literacy
Self-serve analysis is easy and quick.
There is plenty of good-quality data.
😄 People like to check the data. 📊
37. How do we do it
1. What reports do people want?
2. What raw data do we have?
3. Transform the data
4. Advocate SQL
5. Share how to use Metabase
Recurring reports are sent out automatically. 🤖
Ad hoc questions are self-served. 🎉
42. Examples - transformed data
Before:
● An operations staff member produced 20 revenue reports monthly.
● Per report, she waited 6 hours for data checking plus 1 day for importing.
After:
● 5 minutes to import 1 report.
43. Examples - transformed data
Before:
● Waited 10 minutes to open a spreadsheet with >10 tabs and >10K rows.
● Emailed the reports to the partner as attachments.
After:
● Data updates automatically to a dashboard on Data Studio.
● Share the dashboard with the partner; they can check it anytime.