1.
Data Discoverability
Maggie Hays
Senior Product Manager -- Data Services
DataHub Town Hall -- September 25, 2020
2.
Agenda
● Overview of Teams & Data Stack
● Current State of Data Discoverability
● Data Catalog Evaluation
● DataHub POC - Hypotheses, Progress, and Next Steps
● Brief Demo!
3.
SpotHero’s Data-Focused Teams
Data Engineering
3 Engineers
SpotHero IQ
2 Engineers
1-3 Data Scientists
(We’re hiring!!)
Analytics
5 Business Analysts
4.
SpotHero’s Data Stack
● Sources: SH Application Data, Workflow Tools, Marketing Tools, Microservices, Clickstream Analytics
● Ingestion: Fivetran, Segment, Kafka
● Storage: Redshift, S3/Parquet
● ETL: SQL, Python, Spark
● Also shown: Looker, Airflow
5.
Current State of Data Discoverability
1) Data Lineage is difficult to discover and navigate, regardless of role or tenure
● Impact analysis is arduous; Engineers avoid breaking changes at all costs
● Prolonged debugging/troubleshooting data issues
2) Difficult to discover what data exists and/or what it represents
● Reliance on tribal knowledge
● Large burden on the Analytics team to answer any/all questions
3) Confidence in Data Accuracy is neutral, but room for improvement
● Once folks track down the data, they are relatively confident in its accuracy
Source: May 2020 Internal Survey - Engineering, Product, Analytics, Data Science teams; 47% response rate
6.
Data Catalog Evaluation
● Catalogs compared: DataHub, Amundsen / Marquez, Apache Atlas, Alation
● Criteria: Ease of Integration, Lineage Support, Configurable Metadata, Affordability
7.
DataHub POC - Hypotheses
1) Increase visibility into what data exists, what it represents, and how it’s used across the company
2) Decrease the effort required by SpotHero teammates to use and interpret data
3) Increase SpotHero teammates’ confidence in the accuracy and/or relevance of data