Data Discoverability with DataHub
Maggie Hays
Senior Product Manager -- Data Services
Data Quality Meetup -- November 19, 2020
Agenda
● Overview of Teams
● Current State of Data Discoverability
● Data Catalog Evaluation
● DataHub POC - Progress & Level of Effort
● Highlight: DataHub Functionality
SpotHero’s Data-Focused Teams
● Data Engineering: 3 Engineers
● SpotHero IQ: 2 Engineers, 3 Data Scientists
● Analytics: 3 Business Analysts
(We’re hiring!!)
Current State of Data Discoverability
1. Data Lineage is difficult to discover and navigate, regardless of role or tenure
   ● Impact analysis is arduous; Engineers avoid breaking changes at all costs
   ● Prolonged debugging/troubleshooting of data issues
2. Difficult to discover what data exists and/or what it represents
   ● Reliance on tribal knowledge
   ● Large burden on the Analytics team to answer any/all questions
3. Confidence in Data Accuracy is neutral, with room for improvement
   ● Once folks track down the data, they are relatively confident in its accuracy
Source: May 2020 internal survey of the Engineering, Product, Analytics, and Data Science teams; 47% response rate
Data Catalog Evaluation
Candidates: DataHub, Amundsen / Marquez, Apache Atlas, Alation
Criteria: Ease of Integration, Lineage Support, Configurable Metadata, Affordability
[Comparison matrix: each candidate rated against each criterion]
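The level-of-effort slide says a Pugh matrix was used to force-rank these candidates, but the individual ratings are not reproduced in this deck. As a sketch of the technique only, the Python below rates each option against a baseline (datum) on the four criteria above using -1 (worse), 0 (same), or +1 (better), multiplies by a per-criterion weight, and sorts by total. The weights, option names, and ratings in the example are hypothetical placeholders, not SpotHero's evaluation.

```python
from __future__ import annotations

# Criteria from the evaluation slide. Any weights passed in are hypothetical.
CRITERIA = ("Ease of Integration", "Lineage Support", "Configurable Metadata", "Affordability")


def pugh_scores(
    ratings: dict[str, dict[str, int]],
    weights: dict[str, int] | None = None,
) -> list[tuple[str, int]]:
    """Rank options by weighted Pugh totals, best first.

    `ratings` maps option -> {criterion: -1 | 0 | +1}, each rating relative to a
    baseline (datum) option. Criteria left out of a row count as 0 (same as datum).
    """
    weights = weights or {c: 1 for c in CRITERIA}  # unweighted by default
    totals: dict[str, int] = {}
    for option, row in ratings.items():
        for criterion, score in row.items():
            if criterion not in CRITERIA:
                raise ValueError(f"unknown criterion: {criterion!r}")
            if score not in (-1, 0, 1):
                raise ValueError(f"{option}/{criterion}: rating must be -1, 0 or +1")
        totals[option] = sum(weights[c] * row.get(c, 0) for c in CRITERIA)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical options and ratings, purely to exercise the function.
    print(
        pugh_scores(
            {
                "Baseline": {},  # the datum scores 0 on every criterion
                "Option A": {"Ease of Integration": 1, "Lineage Support": 1, "Affordability": -1},
                "Option B": {"Lineage Support": -1, "Configurable Metadata": 1},
            },
            weights={"Ease of Integration": 2, "Lineage Support": 3,
                     "Configurable Metadata": 1, "Affordability": 1},
        )
    )
```

With those placeholder inputs it prints `[('Option A', 4), ('Baseline', 0), ('Option B', -2)]`. Because every rating is relative to the datum, re-running the matrix with the current leader as the new baseline is a cheap way to check that the ranking holds.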
SpotHero’s Data Stack & DataHub POC
[Diagram: data flow across Sources, Ingestion, Storage, and ETL stages. Components shown: SH Application Data, Microservices, Clickstream Analytics, Workflow Tools, Marketing Tools, Fivetran, Segment, Kafka, Redshift, S3/Parquet, SQL, Python, Spark, Airflow, Looker. Legend: DataHub POC coverage marked "Complete" or "Q4 2020".]
DataHub POC - Level of Effort
1. Research & Tool Evaluation: 180 hrs
   ● Built a Pugh matrix to force-rank the candidates
   ● Rapid side-by-side POC of DataHub and Amundsen/Marquez
2. Initial Rollout of DataHub POC: 300 hrs
   ● Terraform for Elasticsearch, MySQL, Neo4j, and Aiven; Helm chart for the API, frontend, and Kafka components
   ● Data lake & ETL scrapers, including lineage
   ● Enrichment with ETL ownership and links to GitHub Enterprise (GHE)
3. Looker & Kafka Metadata Ingestion & Lineage: est. 160 hrs
   ● Building a Looker/LookML scraper, with plans to contribute it back to the DataHub codebase
   ● Partnering with the DataHub team to inform the design of Dashboard entities
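The third work item mentions a Looker/LookML scraper. The sketch below is not that scraper; it is a minimal illustration, assuming the simplest case where each LookML view points at a physical table through `sql_table_name: ... ;;`. It extracts (view, table) pairs that could then be emitted as lineage edges. Derived tables, Liquid templating, and `include:` resolution are deliberately out of scope, and the sample view and table names are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical sketch: derive view -> table lineage edges from LookML source text.
# Handles only the simple `sql_table_name: <table> ;;` form.
VIEW_RE = re.compile(r"\bview:\s*(\w+)\s*\{")
TABLE_RE = re.compile(r"\bsql_table_name:\s*(.+?)\s*;;", re.DOTALL)


def lookml_table_edges(lookml: str) -> list[tuple[str, str]]:
    """Return (view_name, table_name) pairs for views that declare sql_table_name."""
    edges = []
    views = list(VIEW_RE.finditer(lookml))
    for i, view in enumerate(views):
        # Treat a view's body as the text up to the next `view:` declaration.
        end = views[i + 1].start() if i + 1 < len(views) else len(lookml)
        match = TABLE_RE.search(lookml[view.end():end])
        if match:
            edges.append((view.group(1), match.group(1)))
    return edges


# Hypothetical LookML: two table-backed views and one derived-table view.
SAMPLE = """
view: orders {
  sql_table_name: analytics.orders ;;
  dimension: id {
    primary_key: yes
    sql: ${TABLE}.id ;;
  }
}

view: users {
  sql_table_name: analytics.users ;;
}

view: order_facts {
  derived_table: {
    sql: SELECT user_id, COUNT(*) AS n FROM analytics.orders GROUP BY 1 ;;
  }
}
"""

if __name__ == "__main__":
    print(lookml_table_edges(SAMPLE))
```

On the sample it returns `[('orders', 'analytics.orders'), ('users', 'analytics.users')]`; the derived-table view is skipped, which is exactly the gap a production scraper would need to close by parsing the embedded SQL.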
DataHub Functionality: Cross-Platform Search
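One way to picture cross-platform search: every dataset, whatever system holds it, gets an identifier that encodes its platform, so tables, lake paths, and topics can live in one index. DataHub's dataset URNs follow the pattern `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. The sketch below builds URNs in that pattern with a local, hypothetical `dataset_urn` helper (DataHub's Python package ships its own builder, `make_dataset_urn` in `datahub.emitter.mce_builder`) and runs a toy keyword search over a catalog mixing Redshift, S3, and Kafka entries. The dataset names are hypothetical, and the substring match is only a stand-in: DataHub's real search is backed by Elasticsearch.

```python
from __future__ import annotations

from dataclasses import dataclass


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical helper mimicking DataHub's dataset URN pattern."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


@dataclass(frozen=True)
class CatalogEntry:
    platform: str  # e.g. "redshift", "s3", "kafka"
    name: str      # table, path, or topic name (hypothetical below)
    description: str = ""

    @property
    def urn(self) -> str:
        return dataset_urn(self.platform, self.name)


def search(catalog: list[CatalogEntry], query: str) -> list[str]:
    """Toy search: case-insensitive substring match on name or description,
    across every platform at once. Returns matching URNs in catalog order."""
    q = query.lower()
    return [e.urn for e in catalog if q in e.name.lower() or q in e.description.lower()]


# Hypothetical catalog mixing three platforms from the data stack slide.
CATALOG = [
    CatalogEntry("redshift", "analytics.reservations", "One row per parking reservation"),
    CatalogEntry("s3", "datalake/reservations", "Parquet copy of reservations"),
    CatalogEntry("kafka", "reservation-events", "Reservation lifecycle events"),
    CatalogEntry("redshift", "analytics.users", "Registered users"),
]

if __name__ == "__main__":
    for urn in search(CATALOG, "reservation"):
        print(urn)
```

A single query for "reservation" returns the Redshift table, the S3 path, and the Kafka topic together, which is the experience the search slide highlights.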
DataHub Functionality: Dataset Metadata
[Screenshot callouts: DDL & Ownership, External Docs]
DataHub Functionality: Lineage
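The current-state survey called impact analysis arduous. Once lineage is captured as a graph, the question "what breaks if I change this dataset?" becomes a downstream traversal. A minimal breadth-first sketch follows; the graph and node names are hypothetical, and plain `platform:name` strings stand in for DataHub URNs. It illustrates the idea and is not how DataHub itself computes lineage.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets built from it.
LINEAGE: dict[str, list[str]] = {
    "kafka:reservation-events": ["s3:datalake/reservations"],
    "s3:datalake/reservations": ["redshift:analytics.reservations"],
    "redshift:analytics.reservations": [
        "looker:revenue_dashboard",
        "redshift:analytics.daily_bookings",
    ],
    "redshift:analytics.daily_bookings": ["looker:revenue_dashboard"],
}


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk returning every dataset that depends on `start`,
    nearest first, each listed once even if reachable by several paths.
    The `seen` set also keeps the walk finite if the graph has cycles."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    print(downstream(LINEAGE, "s3:datalake/reservations"))
```

Changing the S3 reservations path in this toy graph would touch the Redshift table, the dashboard, and the daily-bookings rollup; breadth-first order lists direct consumers before indirect ones, which is usually the order an engineer wants to check them.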
Yay Data Discoverability!
