Don't build a data science team

Many companies start their big data and AI journey by hiring a team of data scientists, giving them some data, and expecting them to work miracles. Although this may yield results, it is not an efficient way to use data scientists. We will explain the problems that occur and how to adapt the context to get business value from data scientists.

- Why data science teams might fail to deliver results
- What data scientists need to be efficient
- What talent you need in addition to data scientists

Published in: Data & Analytics

  1. Don't build a data science team. Data 2020 Summit, 2018-09-13. Lars Albertsson, www.mimeria.com
  2. Many journeys start ...
      Big Data! AI!! Data driven! Blockchains? Real-time analytics!
  3. Many journeys start with data science
      Big Data! AI!! Data driven! Blockchains? Real-time analytics!
  4. The typical results
      Diagram labels: Proof of value, Prototype, Product, ROI
  5. Wrong data scientists?
      What we asked them:
      ● When to use Jaccard or cosine distance?
      ● How do you implement an LSTM with TensorFlow?
      ● When do you terminate an A/B test?
  6. Wrong data scientists?
      What we asked them:
      ● When to use Jaccard or cosine distance?
      ● How do you implement an LSTM with TensorFlow?
      ● When do you terminate an A/B test?
      What we should have asked?
      ● How to get data from a PO using email only?
      ● How to recover Hadoop namenode?
      ● How to debug AWS "403 permission denied"?
      ● How to get sysadmin to open firewall from Jupyter to MySQL?
  7. Machine learning products
      Credits: “Hidden Technical Debt in Machine Learning Systems”, Google, NIPS 2015
      Legend: size = effort, colour = code complexity
      Diagram labels: Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML
  8. Machine learning products
      Highlighted: Data science
      Diagram labels: Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML
  9. The data science team
      Diagram labels: ML
  10. Data science hierarchy of needs
      Credits: “The data science hierarchy of needs”, Monica Rogati
      Diagram labels: AI, Deep learning, A/B testing, Machine learning, Analytics, Segments, Curation, Anomaly detection, Data infrastructure, Pipelines, Instrumentation, Data collection
  11. Data science hierarchy of needs
      Credits: “The data science hierarchy of needs”, Monica Rogati
      Highlighted: Data science
      Diagram labels: AI, Deep learning, A/B testing, Machine learning, Analytics, Segments, Curation, Anomaly detection, Data infrastructure, Pipelines, Instrumentation, Data collection
  12. The data science team
      Diagram labels: AI, Deep learning, A/B testing, Machine learning
  13. AI first
      ● Might work once or twice
      ● Not a sustainable strategy
      Diagram labels: AI, Deep learning, A/B testing, Machine learning
  14. AI first
      ● Might work once or twice
      ● Not a sustainable strategy
      ● Machine learning is difficult
      Diagram labels: AI, Deep learning, A/B testing, Machine learning, Effort
  15. AI first
      ● Might work once or twice
      ● Not a sustainable strategy
      ● Machine learning is difficult
      ● Low return on investment
      Diagram labels: AI, Deep learning, A/B testing, Machine learning, Value, Effort
  16. AI last
      ● Lots of low-hanging fruit
        ○ Push notifications
        ○ Simple recommendations
        ○ Risk & forecasting
        ○ Reporting
        ○ Product insights
        ○ Data-driven product development
        ○ Anomaly detection
        ○ ...
      ● High return on investment
      ● Media attention != business value
      Diagram labels: Analytics, Segments, Curation, Anomaly detection, Data infrastructure, Pipelines, Instrumentation, Data collection, Value, Effort
  17. How do we make best use of data scientists?
      ● They need
        ○ Supporting roles
        ○ Continuous access to fresh data
        ○ Feedback from validation, monitoring, ...
      ● But where, how, from whom?
  18. What do we need?
      Roles: Data engineering, Domain expertise, Product management, QA
      Diagram labels: Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML
  19. What do we want?
      Roles: Data engineering, Data science, Frontend, Domain expertise, DevOps / DataOps, QA, Product management
      Diagram labels: Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification, ML
  20. Most data-driven products
      Roles: Data engineering, Domain expertise, Product management
      Diagram labels: Configuration, Data collection, Monitoring, Serving infrastructure, Feature extraction, Process management tools, Analysis tools, Machine resource management, Data verification
  21. How to get to the summit?
  22. Service-oriented architectures
      ● Data lives with services
      ● Heterogeneous coupling
      Diagram labels: Service, App, Poll, Aggregate logs, NFS, Hourly dump, Data warehouse, ETL, Queue, scp, DB, HTTP
  23. Service-oriented organisations
      ● Teams own services
      ● Teams own data
  24. Data-centric innovation
      ● Need data from teams
        ○ willing?
        ○ backlog?
        ○ collected?
        ○ useful?
        ○ quality?
        ○ extraction?
        ○ data governance?
        ○ history?
  25. Data-centric innovation
      ● Need data from teams
        ○ willing?
        ○ backlog?
        ○ collected?
        ○ useful?
        ○ quality?
        ○ extraction?
        ○ data governance?
        ○ history?
      ● Innovation friction
      Diagram labels: Value adding, Waste
  26. Big data - a collaboration paradigm
      Diagram labels: Stream storage, Data lake, Data democratised
  27. Data pipelines
      Diagram labels: Data lake
  28. More data - decreased friction
      Diagram labels: Data lake
  29. In the lab
      Diagram labels: One shot
  30. In the lab vs in production
      Diagram labels: One shot, Iterative, Data lake
  31. Data agility
      ● Siloed: 6+ months
      ● Autonomous: 1 month
      ● Coordinated: days
      Diagram labels: Data lake, ∆, ∆, Latency?
  32. Data agility
      ● Siloed: 6+ months
        Cultural work
      ● Autonomous: 1 month
        Technical work
      ● Coordinated: days
      Diagram labels: Data lake, ∆, ∆, Latency?
  33. What to do with my data scientists?
      ● Get them out into production
      ● Pair them with
        ○ Data engineers
        ○ Domain experts
        ○ Product owners
      ● Invest in processing capabilities
  34. Key takeaways
      ● Machine learning is a team sport
      ● Solid data processing is necessary
      ● Learning happens in production
      Lars Albertsson, founder of Mimeria. Data-value-as-a-service - tailored data platforms & data pipelines
