Promoting a Data Driven Culture in a Microservices Environment

Kelly Burdine

Published in: Technology

Promoting a Data Driven Culture in a Microservices Environment

  1. 1. Democratizing Data: Promoting a data-driven culture in a world of microservices
  2. 2. Overview: 1. Introduction to Hudl; 2. Hudl Data Journey; 3. #DataProblems; 4. Data Engineering; 5. Data Analytics; 6. Key Takeaways; 7. Summary
  3. 3. [Placeholder: basketball workflow animation or static images]
  6. 6. Capture and bring value to every moment in sports. ● 4.9 million users ● 150 thousand teams ● 4.5 billion video views in the last 12 months
  7. 7. Our data journey.
  8. 8. 2006
  9. 9. 2010
  10. 10. 2014
  11. 11. 2014
  12. 12. 2015
  13. 13. 2015
  14. 14. #DataProblems
  15. 15. “Find all football teams that had 3 or more users watch video in 3 different months.”
  16. 16. SSH + SQL + Mongo + Excel/Python/etc.
  17. 17. Data Engineering + Data Analytics
  18. 18. Data Engineering
  19. 19. Data Engineering: Just give me my damn data.
  20. 20. Three questions: 1. Where do we put the data? 2. How does it get there? 3. How do people access it?
  21. 21. Three questions: 1. Where do we put the data? 2. How does it get there? 3. How do people access it?
  22. 22. Amazon Redshift ● SQL ● Fully managed on AWS ● Reasonably priced
  23. 23. Amazon Redshift ● SQL ● Fully managed on AWS ● Reasonably priced (Rob Story, Data Engineering Architecture at Simple, PyData Chicago)
  24. 24. Alternatives? ● For the Google Cloud user: Google BigQuery ● For the do-it-yourselfer: Hive / Impala / PrestoDB / Druid ● For the enterprise user: Vertica / Teradata
  25. 25. Three questions: 1. Where do we put the data? 2. How does it get there? 3. How do people access it?
  26. 26. E T L
  27. 27. Extract Transform Load
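The three ETL stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV export, its column names, and the `events` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the column names are illustrative only.
RAW_CSV = """user_id,team_id,event,timestamp
1,1234,watch_video,2016-08-15T10:02:00
2,2345,upload_video,2016-08-15T11:30:00
1,1234,watch_video,2016-08-15T12:45:00
"""


def extract(raw_text):
    """Extract: read rows from the source system (here, an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: cast IDs to integers and keep only the date part of the timestamp."""
    return [
        (int(r["user_id"]), int(r["team_id"]), r["event"], r["timestamp"][:10])
        for r in rows
    ]


def load(conn, records):
    """Load: write the cleaned records into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(user_id INTEGER, team_id INTEGER, event TEXT, event_date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_etl(conn, raw_text=RAW_CSV):
    """Run the three stages in order and return the resulting row count."""
    load(conn, transform(extract(raw_text)))
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Keeping each stage a separate function makes it easier to repeat the pattern per source, which is where the later slides on workflow managers come in.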
  29. 29. Extract Extract Extract Transform Transform Transform Load Load Load
  31. 31. Use a workflow manager.
  32. 32. Luigi (Spotify) Airflow (Airbnb) Azkaban (LinkedIn)
  34. 34. ● Dependency management ● Parallelism ● Idempotence
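The three properties named on this slide can be illustrated with a toy runner. This is not the API of Luigi, Airflow, or Azkaban; `Task`, `collect`, `build`, and `make_pipeline` are hypothetical names used only for this sketch, which uses the standard library and marks a task as done by the existence of its output file.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


class Task:
    """A toy task with upstream dependencies and a single output file."""

    def __init__(self, name, output_path, requires=()):
        self.name = name
        self.output_path = output_path
        self.requires = list(requires)

    def complete(self):
        # Idempotence: a task whose output already exists is not run again.
        return os.path.exists(self.output_path)

    def run(self):
        with open(self.output_path, "w") as handle:
            handle.write(f"output of {self.name}\n")


def collect(task, pending):
    """Gather every incomplete task needed to produce `task`."""
    if task.complete() or task.name in pending:
        return
    pending[task.name] = task
    for dep in task.requires:
        collect(dep, pending)


def build(target, max_workers=4):
    """Run whatever is still missing for `target`; return the task names run."""
    pending = {}
    collect(target, pending)
    ran = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Dependency management: a task is ready once its upstreams are done.
            ready = [t for t in pending.values()
                     if all(d.complete() for d in t.requires)]
            if not ready:
                raise RuntimeError("no runnable task: dependency cycle?")
            # Parallelism: tasks that are ready together run concurrently.
            list(pool.map(Task.run, ready))
            for task in ready:
                ran.append(task.name)
                del pending[task.name]
    return ran


def make_pipeline(directory):
    """Hypothetical graph: three extracts feed one transform, then one load."""
    extracts = [
        Task(f"extract_{source}", os.path.join(directory, f"{source}.raw"))
        for source in ("zendesk", "salesforce", "google_sheets")
    ]
    transform = Task("transform", os.path.join(directory, "clean.csv"), requires=extracts)
    return Task("load", os.path.join(directory, "load.done"), requires=[transform])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        pipeline = make_pipeline(workdir)
        print(build(pipeline))  # first call runs all five tasks
        print(build(pipeline))  # second call finds every output and runs nothing
```

Checking for an existing output before running is what makes a failed nightly job safe to restart: only the missing pieces are rebuilt.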
  35. 35. Think about your tooling.
  36. 36. ● UI ● Logging ● Triggers ○ Cron ○ Dependency ○ GitHub
  37. 37. Single-machine jobs
  38. 38. Single-machine jobs ● Zendesk ● Salesforce ● Google Sheets
  39. 39. Multi-machine jobs ● Database exports ● Mongo processing
  40. 40. Three questions: 1. Where do we put the data? 2. How does it get there? 3. How do people access it?
  41. 41. Our needs ● Everyone has access (430+ Hudlies) ● Lots of data ○ 24+ TB ○ 100B+ rows
  42. 42. Commercial options ● Looker ● Periscope ● Tableau
  43. 43. re:dash ● Open source (Python!) ● Query editor + visualizations ● Hosted version or host your own
  44. 44. SSH + SQL + Mongo + Excel/Python/etc.
  46. 46. Data Analytics: Helping employees use data to make better decisions.
  47. 47. Access for All
  48. 48. Finding Data isn’t Easy ● SQL ● So much data ● Only 3 data analysts
  49. 49. Removing Roadblocks ● Education ● Derived Tables ● Report Automation
  51. 51. Certification Topics ● Relational Database Model ● Basic & intermediate SQL ● Table Familiarity ● Using re:dash ● Data Visualization
  52. 52. Data Dictionary [screenshot placeholder]
  53. 53. Understanding Relationships [screenshot placeholder]
  54. 54. Removing Roadblocks ● Education ● Derived Tables ● Report Automation
  55. 55. “Find how many football teams had 3 or more users watch video in 3 different months this year.”
  56. 56. hudl_daily_active_users
        Userid | Teamid | Date       | Has_Watched_Video | Has_Tagged_Video | Has_uploaded_video
        1      | 1234   | 2016-08-15 | True              | True             | False
        2      | 2345   | 2016-08-15 | False             | False            | True
        3      | 5678   | 2016-08-15 | True              | False            | False
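With a derived table like this, the question from the earlier slide becomes a single query. The sketch below makes several assumptions: SQLite stands in for the warehouse, booleans are stored as 0/1, the `teams` lookup table with a `sport` column is hypothetical (the slide does not show where sport lives), and the question is read as "at least 3 distinct users watched video in each of at least 3 different months".

```python
import sqlite3

SCHEMA = """
CREATE TABLE hudl_daily_active_users (
    userid INTEGER, teamid INTEGER, date TEXT,
    has_watched_video INTEGER, has_tagged_video INTEGER, has_uploaded_video INTEGER
);
-- Hypothetical lookup table: the slide does not show where sport is stored.
CREATE TABLE teams (teamid INTEGER PRIMARY KEY, sport TEXT);
"""

QUERY = """
WITH monthly AS (
    -- Distinct video watchers per team per calendar month.
    SELECT d.teamid,
           strftime('%Y-%m', d.date) AS month,
           COUNT(DISTINCT d.userid) AS watchers
    FROM hudl_daily_active_users AS d
    JOIN teams AS t ON t.teamid = d.teamid
    WHERE t.sport = 'football' AND d.has_watched_video = 1
    GROUP BY d.teamid, strftime('%Y-%m', d.date)
)
SELECT teamid
FROM monthly
WHERE watchers >= :min_users      -- months with enough watchers
GROUP BY teamid
HAVING COUNT(*) >= :min_months    -- enough such months
ORDER BY teamid
"""


def football_teams_with_repeat_viewers(conn, min_users=3, min_months=3):
    """Return IDs of football teams meeting both thresholds."""
    rows = conn.execute(QUERY, {"min_users": min_users, "min_months": min_months})
    return [teamid for (teamid,) in rows]
```

On a different engine the month bucketing would be written with that engine's own date functions rather than SQLite's `strftime`; the two-level grouping is the part that carries over.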
  57. 57. Removing Roadblocks ● Education ● Derived Tables ● Report Automation
  58. 58. Dashboard example [image placeholder]
  59. 59. Slackalytics
  60. 60. September Stats ● 194 unique users executed a query ● 14,000 ad hoc queries executed ● 940 unique scheduled queries/week
  61. 61. Cons ● Bad Data ● Slow Queries
  62. 62. Key Takeaways ● Being data-driven is a team sport ● Get the data architecture in place ● Make data and metrics accessible ● Be flexible
  63. 63. Summary: 1. Introduction to Hudl; 2. Hudl Data Journey; 3. #DataProblems; 4. Data Engineering; 5. Data Analytics; 6. Key Takeaways; 7. Summary
  64. 64. Summary: Tools we use
        Jenkins: Scheduling
        Luigi: Workflow management
        Sqoop: RDBMS extraction
        Spark: Data transformation
        AWS Lambda: Event-driven processing
        Redshift: Data warehouse
        re:dash: Query interface + visualization
  65. 65. Alex DeBrie alex.debrie@hudl.com Kelly Burdine kelly.burdine@hudl.com