Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017

Insights can only be as good as the data behind them. The data quality domain is very large, so you need to understand your company's pain points to know what to focus on first.

https://www.bigdataspain.org/2017/talk/big-data-big-quality

Big Data Spain 2017
November 16th - 17th Kinépolis Madrid

Published in: Technology

  1. Irene Gonzálvez, Product Manager at Spotify: Big Data, Big Quality?
  2. Irene Gonzálvez, Product Manager, Data Infrastructure
  3. Music Streaming Service; Launched in 2008; Premium and Free Tiers; Available in 61 Countries
  4. Over 140M Monthly Active Users
  5. More than 30M Songs
  6. Over 1 billion plays per day
  7. Data enables recommendations, advertising, label and artist payments and more
  8. Data First
  9. Data of Good Quality First
  10. Data quality problems cost US business $600B a year! (Data Warehouse Institute)
  11. Data Quality Dimensions: Timely, Correctness, Completeness, Consistency
  12. DataMon
  13. Data Counters
  14. MetriLab
  15. MetriLab
  16. MetriLab
  17. Data Quality Dimensions: Timely, Correctness, Completeness, Consistency. Tools: DataMon, Data Counters, MetriLab
  18. TC4D: Test Certified for Data. Level 1: Set-up, monitoring, alerting and documentation. Level 2: Data management and unit tests. Level 3: Build your defenses.
  19. What’s next? Build an algorithm library for anomaly detection (ML4ALL); provide the infrastructure to ‘plug&play’ more algorithms; provide parameter recommendations to tweak the algorithms.
  20. What’s next? Spotify-wide strategy: have metrics to understand when a dataset qualifies as ‘good’ quality; identify which datasets are critical/central to Spotify and make them of ‘good’ quality.
  21. Key Takeaways
  22. Lesson #1: Think Big. Understand your org’s pain points.
  23. Lesson #2: Start small. And start NOW!
  24. Lesson #3: Data Quality is not an add-on. Insights can ONLY be as good as the data.
  25. Data will increase 10x by 2025 (International Data Corp). 1 ZB = 1 trillion GB.
  26. 20% Critical Data; 10% Hypercritical Data
  27. Q&A. Irene Gonzálvez, Product Manager, Spotify. irene@spotify.com