SlideShare a Scribd company logo
1 of 9
Download to read offline
Data Discoverability
Maggie Hays
Senior Product Manager -- Data Services
DataHub Town Hall -- September 25, 2020
2
Agenda
● Overview of Teams & Data Stack
● Current State of Data Discoverability
● Data Catalog Evaluation
● DataHub POC - Hypotheses, Progress, and Next Steps
● Brief Demo!
3
SpotHero’s Data-Focused Teams
Data Engineering
3 Engineers
SpotHero IQ
2 Engineers
1-3 Data Scientists
(We’re hiring!!)
Analytics
5 Business Analysts
4
Looker
Airflow
SpotHero’s Data Stack
SH Application
Data
Workflow Tools
Marketing Tools
Microservices
Clickstream
Analytics
Redshift
S3/Parquet
Fivetran
Segment
Kafka
SQL
Python
Spark
Sources Ingestion Storage ETL
5
1
2
3
Current State of Data Discoverability
Data Lineage is difficult to discover and navigate,
regardless of role or tenure
● Impact analysis is arduous; Engineers avoid breaking changes at all costs
● Prolonged debugging/troubleshooting data issues
Difficult to discover what data exists and/or
what it represents
● Reliance on tribal knowledge
● Large burden on the Analytics team to answer any/all questions
Confidence in Data Accuracy is neutral, but room for
improvement
● Once folks track down the data, they are relatively confident in its
accuracy
May 2020 Internal Survey - Engineering, Product, Analytics, Data Science teams; 47% response rate
6
Data Catalog Evaluation
DataHub
Amundsen
/ Marquez
Apache
Atlas Alation
Ease of Integration
Lineage Support
Configurable
Metadata
Affordability
7
1
2
3
DataHub POC - Hypotheses
Increase visibility into what data exists, what it
represents, and how it’s used across the
company
Decrease the effort required by SpotHero
teammates to use and interpret data
Increase SpotHero teammates’ confidence in
the accuracy and/or relevance of data
8
Looker
Airflow
DataHub POC - Progress & Next Steps
SH Application
Data
Workflow Tools
Marketing Tools
Microservices
Clickstream
Analytics
Redshift
S3/Parquet
Fivetran
Segment
Kafka
SQL
Python
Spark
Sources Ingestion Storage ETL
Complete
Q4 2020
9
Quick Demo!

More Related Content

What's hot

(The life of a) Data engineer
(The life of a) Data engineer(The life of a) Data engineer
(The life of a) Data engineerAlex Chalini
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Diving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction LogDiving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction LogDatabricks
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkDatabricks
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lakeMykola Zerniuk
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingTill Rohrmann
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeDatabricks
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedDatabricks
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDatabricks
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Timothy Spann
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...HostedbyConfluent
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?confluent
 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingScyllaDB
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 

What's hot (20)

(The life of a) Data engineer
(The life of a) Data engineer(The life of a) Data engineer
(The life of a) Data engineer
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Diving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction LogDiving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction Log
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Modularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache SparkModularized ETL Writing with Apache Spark
Modularized ETL Writing with Apache Spark
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Making Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta LakeMaking Apache Spark Better with Delta Lake
Making Apache Spark Better with Delta Lake
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016Apache NiFi Meetup - Princeton NJ 2016
Apache NiFi Meetup - Princeton NJ 2016
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
Data Mess to Data Mesh | Jay Kreps, CEO, Confluent | Kafka Summit Americas 20...
 
How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?How to govern and secure a Data Mesh?
How to govern and secure a Data Mesh?
 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data Modeling
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 

Similar to Data Discoverability at SpotHero

Data Discoverability with DataHub
Data Discoverability with DataHubData Discoverability with DataHub
Data Discoverability with DataHubGleb Mezhanskiy
 
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav MisraFrom Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav MisraMolly Alexander
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarHortonworks
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyTamrMarketing
 
Predictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptxPredictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptxSaminaNawaz14
 
rough-work.pptx
rough-work.pptxrough-work.pptx
rough-work.pptxsharpan
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about DataBigDataExpo
 
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptxarpit206900
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupCaserta
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal GreenplumSimplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal GreenplumVMware Tanzu
 
Max Cottica slides from Future of Business Intelligence
Max Cottica slides from Future of Business Intelligence Max Cottica slides from Future of Business Intelligence
Max Cottica slides from Future of Business Intelligence Lauren Campbell Assoc CIPD
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadatamarkgrover
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Denodo
 
Data Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data DiscoveryData Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data DiscoveryInside Analysis
 
P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14
P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14
P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14p6academy
 

Similar to Data Discoverability at SpotHero (20)

Data Discoverability with DataHub
Data Discoverability with DataHubData Discoverability with DataHub
Data Discoverability with DataHub
 
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav MisraFrom Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
From Foundation to Mastery – Building a Mature Analytics Roadmap - Manav Misra
 
Joe C
Joe CJoe C
Joe C
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
 
Predictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptxPredictive Human Capital Analytics (1).pptx
Predictive Human Capital Analytics (1).pptx
 
rough-work.pptx
rough-work.pptxrough-work.pptx
rough-work.pptx
 
Rabobank - There is something about Data
Rabobank - There is something about DataRabobank - There is something about Data
Rabobank - There is something about Data
 
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
1.-DE-LECTURE-1-INTRO-TO-DATA-ENGG.pptx
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing Meetup
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal GreenplumSimplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
Simplified Machine Learning, Text, and Graph Analytics with Pivotal Greenplum
 
Max Cottica slides from Future of Business Intelligence
Max Cottica slides from Future of Business Intelligence Max Cottica slides from Future of Business Intelligence
Max Cottica slides from Future of Business Intelligence
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)Advanced Analytics and Machine Learning with Data Virtualization (India)
Advanced Analytics and Machine Learning with Data Virtualization (India)
 
Data Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data DiscoveryData Wrangling and the Art of Big Data Discovery
Data Wrangling and the Art of Big Data Discovery
 
P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14
P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14
P6 analytics product roadmap and overview - Oracle Primavera P6 Collaborate 14
 

Recently uploaded

Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
Business Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business AnalysisBusiness Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business AnalysisDEEPRAJ PATHAK
 
What is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docxWhat is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docxTechnogeeks
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...
Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...
Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...Milind Agarwal
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...
Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...
Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...Piyovi
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
ETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBU
ETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBUETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBU
ETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBUsamruddhijedgule2004
 
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...OnePlan Solutions
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
full course of software engineering mid term.pdf
full course of software engineering mid term.pdffull course of software engineering mid term.pdf
full course of software engineering mid term.pdfAbdul salam
 
The State of the Green IT at the beginning of 2024
The State of the Green IT at the beginning of 2024The State of the Green IT at the beginning of 2024
The State of the Green IT at the beginning of 2024Artur Skowroński
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Preparing BitVisor for Supporting Multiple Architectures
Preparing BitVisor for Supporting Multiple ArchitecturesPreparing BitVisor for Supporting Multiple Architectures
Preparing BitVisor for Supporting Multiple ArchitecturesAke Koomsin
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flinkconfluent
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxAS Design & AST.
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Chapter -5 Agile Testing types and its examples.pptx
Chapter -5 Agile Testing types and its examples.pptxChapter -5 Agile Testing types and its examples.pptx
Chapter -5 Agile Testing types and its examples.pptxManishaPatil932723
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 

Recently uploaded (20)

Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
Business Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business AnalysisBusiness Analyzopedia - Your Pocket Gita for Business Analysis
Business Analyzopedia - Your Pocket Gita for Business Analysis
 
What is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docxWhat is Mendix and the concept of low-code development.docx
What is Mendix and the concept of low-code development.docx
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...
Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...
Leveraging the Expertise of a Social Media Fraud Analyst to Safeguard Brand R...
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...
Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...
Explore the Three Main Types of Logistics - Inbound Logistics, Outbound Logis...
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
ETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBU
ETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBUETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBU
ETE PPT.pdf LMMKLMKLMLKMLLMJKBHJBHBNUIHBU
 
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
Transform your Corporate Strategy Office - Harness OnePlan’s Strategic Portfo...
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
full course of software engineering mid term.pdf
full course of software engineering mid term.pdffull course of software engineering mid term.pdf
full course of software engineering mid term.pdf
 
The State of the Green IT at the beginning of 2024
The State of the Green IT at the beginning of 2024The State of the Green IT at the beginning of 2024
The State of the Green IT at the beginning of 2024
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Preparing BitVisor for Supporting Multiple Architectures
Preparing BitVisor for Supporting Multiple ArchitecturesPreparing BitVisor for Supporting Multiple Architectures
Preparing BitVisor for Supporting Multiple Architectures
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Mastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptxMastering Project Planning with Microsoft Project 2016.pptx
Mastering Project Planning with Microsoft Project 2016.pptx
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Chapter -5 Agile Testing types and its examples.pptx
Chapter -5 Agile Testing types and its examples.pptxChapter -5 Agile Testing types and its examples.pptx
Chapter -5 Agile Testing types and its examples.pptx
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 

Data Discoverability at SpotHero

  • 1. Data Discoverability Maggie Hays Senior Product Manager -- Data Services DataHub Town Hall -- September 25, 2020
  • 2. 2 Agenda ● Overview of Teams & Data Stack ● Current State of Data Discoverability ● Data Catalog Evaluation ● DataHub POC - Hypotheses, Progress, and Next Steps ● Brief Demo!
  • 3. 3 SpotHero’s Data-Focused Teams Data Engineering 3 Engineers SpotHero IQ 2 Engineers 1-3 Data Scientists (We’re hiring!!) Analytics 5 Business Analysts
  • 4. 4 Looker Airflow SpotHero’s Data Stack SH Application Data Workflow Tools Marketing Tools Microservices Clickstream Analytics Redshift S3/Parquet Fivetran Segment Kafka SQL Python Spark Sources Ingestion Storage ETL
  • 5. 5 1 2 3 Current State of Data Discoverability Data Lineage is difficult to discover and navigate, regardless of role or tenure ● Impact analysis is arduous; Engineers avoid breaking changes at all costs ● Prolonged debugging/troubleshooting data issues Difficult to discover what data exists and/or what it represents ● Reliance on tribal knowledge ● Large burden on the Analytics team to answer any/all questions Confidence in Data Accuracy is neutral, but room for improvement ● Once folks track down the data, they are relatively confident in its accuracy May 2020 Internal Survey - Engineering, Product, Analytics, Data Science teams; 47% response rate
  • 6. 6 Data Catalog Evaluation DataHub Amundsen / Marquez Apache Atlas Alation Ease of Integration Lineage Support Configurable Metadata Affordability
  • 7. 7 1 2 3 DataHub POC - Hypotheses Increase visibility into what data exists, what it represents, and how it’s used across the company Decrease the effort required by SpotHero teammates to use and interpret data Increase SpotHero teammates’ confidence in the accuracy and/or relevance of data
  • 8. 8 Looker Airflow DataHub POC - Progress & Next Steps SH Application Data Workflow Tools Marketing Tools Microservices Clickstream Analytics Redshift S3/Parquet Fivetran Segment Kafka SQL Python Spark Sources Ingestion Storage ETL Complete Q4 2020