
Data Discoverability at SpotHero


  1. Data Discoverability
     Maggie Hays, Senior Product Manager -- Data Services
     DataHub Town Hall -- September 25, 2020
  2. Agenda
     ● Overview of Teams & Data Stack
     ● Current State of Data Discoverability
     ● Data Catalog Evaluation
     ● DataHub POC - Hypotheses, Progress, and Next Steps
     ● Brief Demo!
  3. SpotHero’s Data-Focused Teams
     ● Data Engineering: 3 Engineers
     ● SpotHero IQ: 2 Engineers, 1-3 Data Scientists (We’re hiring!!)
     ● Analytics: 5 Business Analysts
  4. SpotHero’s Data Stack
     ● Sources: SH Application Data, Workflow Tools, Marketing Tools, Microservices, Clickstream
     ● Ingestion: Fivetran, Segment, Kafka
     ● Storage: Redshift, S3/Parquet
     ● ETL: Airflow, SQL, Python, Spark
     ● Analytics: Looker
  5. Current State of Data Discoverability
     1. Data Lineage is difficult to discover and navigate, regardless of role or tenure
        ● Impact analysis is arduous; Engineers avoid breaking changes at all costs
        ● Prolonged debugging/troubleshooting of data issues
     2. Difficult to discover what data exists and/or what it represents
        ● Reliance on tribal knowledge
        ● Large burden on the Analytics team to answer any/all questions
     3. Confidence in Data Accuracy is neutral, with room for improvement
        ● Once folks track down the data, they are relatively confident in its accuracy
     Source: May 2020 Internal Survey - Engineering, Product, Analytics, Data Science teams; 47% response rate
  6. Data Catalog Evaluation
     ● Candidates: DataHub, Amundsen / Marquez, Apache Atlas, Alation
     ● Criteria: Ease of Integration, Lineage Support, Configurable Metadata, Affordability
  7. DataHub POC - Hypotheses
     1. Increase visibility into what data exists, what it represents, and how it’s used across the company
     2. Decrease the effort required by SpotHero teammates to use and interpret data
     3. Increase SpotHero teammates’ confidence in the accuracy and/or relevance of data
  8. DataHub POC - Progress & Next Steps
     ● Sources: SH Application Data, Workflow Tools, Marketing Tools, Microservices, Clickstream
     ● Ingestion: Fivetran, Segment, Kafka
     ● Storage: Redshift, S3/Parquet
     ● ETL: Airflow, SQL, Python, Spark
     ● Analytics: Looker
     Status legend: Complete; Q4 2020
  9. Quick Demo!
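
The impact analysis called out on slide 5 can be pictured as a walk over a lineage graph: starting from the dataset about to change, collect everything downstream of it. The sketch below is only an illustration of that idea in plain Python. It does not use DataHub's API, and the dataset names and the `LINEAGE` mapping are hypothetical, not taken from the slides.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed as its value.
# These names are illustrative assumptions, not SpotHero's actual datasets.
LINEAGE = {
    "kafka.events": ["s3.events_parquet"],
    "s3.events_parquet": ["redshift.fact_events"],
    "redshift.fact_events": ["looker.usage_dashboard", "redshift.daily_rollup"],
    "redshift.daily_rollup": ["looker.usage_dashboard"],
}


def downstream(lineage, start):
    """Return every dataset reachable from `start`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # visit each dataset once, even with shared parents
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Everything that could be affected by a breaking change to the Kafka topic.
    for dataset in downstream(LINEAGE, "kafka.events"):
        print(dataset)
```

Without recorded lineage, the same answer has to be pieced together from tribal knowledge, which is the burden the survey findings describe.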
