Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Best Practices for Scaling Data Science Across the Organization

1,182 views

Published on

Effective data science in the enterprise is about aligning the right model, data, and infrastructure with the right outcomes. Most organizations today struggle to unlock the potential of data science to enhance decision-making and drive business value.

Guest speaker Kjell Carlsson, a Forrester Senior Analyst, and Peter Wang, Anaconda CTO, will share their unique perspectives on how to tackle five key challenges facing organizations today:

- Identifying, defining, and prioritizing valuable problems
- Building the right teams
- Leveraging the proper tools and platforms
- Iterating and deploying effectively
- Reaching end-users to generate value

Published in: Data & Analytics

Best Practices for Scaling Data Science Across the Organization

  1. 1. Best Practices For Scaling Data Science Across the Organization Peter Wang – CTO - Anaconda Guest Speaker: Kjell Carlsson, PHD – Senior Analyst - Forrester May 2018
  2. 2. 2© 2018 FORRESTER. REPRODUCTION PROHIBITED. Data scientists & business executives are frustrated *Rexer Analytics Data Science Survey 2015 We should drive so much more business value using data science We should drive so much more business value using data science “I’ve built 10 prototypes, none of them have gone live and I’m not confident any of them ever will” At ~1/3 of firms models are deployed only sometimes, rarely or never*
  3. 3. 3© 2018 FORRESTER. REPRODUCTION PROHIBITED. Typical challenges go beyond the data science team • Solves the wrong problem or insights are not actionable • Can not communicate results • Stalls on data access • Doesn’t support key data management technologies • Does not actively participate or advocate for resources • Does not implement the results Data Science Data Engineering IT The Business • Does not support the infrastructure and tools • Abdicates operationalization and app development
  4. 4. 4© 2018 FORRESTER. REPRODUCTION PROHIBITED. Effective data science is about aligning the right model, data and infrastructure with the right outcomes Outcomes The Business Model Data Science Compute IT Data Data Engineering Tackle the right problems 1 Build the right team2 Iterate effectively4 Have the right tools3
  5. 5. 5© 2018 FORRESTER. REPRODUCTION PROHIBITED. Outcomes The Business Model Data Science Compute IT Data Data Engineering Build the right team2 Iterate effectively4 Have the right tools3 Tackle the right problems 1
  6. 6. 6© 2018 FORRESTER. REPRODUCTION PROHIBITED. Tackle projects with large, clearly defined business value 1 The issue: • Stakeholders do not actively participate • Stakeholders do not advocate for resources • Stakeholders do not implement the results Drives senior executive support Minimizes churn on goals & requirements Ensures active, ongoing stakeholder participation Provides commitment to take action on results Benefits “Identifying the objective function is key to getting everyone aligned”
  7. 7. 7© 2018 FORRESTER. REPRODUCTION PROHIBITED. How do you identify valuable projects? Christensen’s Jobs To Be Done Framework (Adapted) 1 What “job” would someone “hire” your solution to do? Who is the “customer”? How else could you do the job? How much is it worth to them? Customer insight Increase conversion, Reduce attrition The organization Sales, Account Mgmt. X% * $XM Marketing campaigns Sales coaching
  8. 8. Tackling the Right Problem 8 Business value • Clear understanding of value to the business • Often counterintuitive. Be careful what you wish for: “You can’t handle the truth!” • Know your customer sponsor
  9. 9. Tackling the Right Problem 9 Business value • Clear understanding of value to the business • Often counterintuitive. Be careful what you wish for: “You can’t handle the truth!” • Know your customer sponsor Data availability • Can we get timely access to the data we need? • Goldilocks problem
  10. 10. Tackling the Right Problem 10 Business value • Clear understanding of value to the business • Often counterintuitive. Be careful what you wish for: “You can’t handle the truth!” • Know your customer sponsor Data availability • Can we get timely access to the data we need? • Goldilocks problem IT Deployability & Maintainability • Realistic assessment of tech/skills • Select tools and approaches that fit within the IT capability envelope
  11. 11. 11© 2018 FORRESTER. REPRODUCTION PROHIBITED. Outcomes The Business Model Data Science Compute IT Data Data Engineering Tackle the right problems 1Iterate effectively4 Have the right tools3 Build the right team2
  12. 12. 12© 2018 FORRESTER. REPRODUCTION PROHIBITED. Data Scientist Consultant Data Engineer Developer Companies complain that good data scientists are unicorns But what they usually want are chimeras Build the right team2 Chimeras are too few, too expensive, hard to retain and inefficient Dev Ops Image Source: Lilla Frerichs, OpenClipart-Vectors
  13. 13. 13© 2018 FORRESTER. REPRODUCTION PROHIBITED. Build hybrid teams, not unicorns (continued) Image Source: LadyofHats, Clker-Free-Vector-Images, Clker-Free-Vector-Images, OpenClipart-Vectors, 2 Teams of fantastic beasts are easier to find, cheaper and achieve synergistic results Data Scientist + a bit of Data Engineer Data Engineer + a bit of Data Scientist Business Analyst + a bit of DeveloperConsultant + a bit of Data Scientist
  14. 14. Building Data Science Teams Because data science is interdisciplinary, the single biggest mistake is to pattern-match against existing tasks or technology footprint 14 Data Science Task Existing Role Query data, build reports Analyst Create datasets, define schemas and data reqs Database specialist, Data engineer Write code Software engineer Explore data, build models Advanced Analyst, Predictive Modeler
  15. 15. 15© 2018 FORRESTER. REPRODUCTION PROHIBITED. Outcomes The Business Model Data Science Compute IT Data Data Engineering Tackle the right problems 1 Build the right team2 Iterate effectively4 Have the right tools3
  16. 16. 16© 2018 FORRESTER. REPRODUCTION PROHIBITED. Deploy platforms for efficiency3 Data Engineering Platform(s) Model Development & Operationalization platform(s) Visualization Platform(s) › Reduces need to rebuild models for deployment › Shares data science knowledge › Leverages common infrastructure › Faster development of end-user apps › Broadens access to insights › Shared, re-usable data pipelines accelerates data discovery and improves data quality
  17. 17. Have the Right Tools • Data science is code-heavy. Tools of choice are Python and R, but Matlab, SAS, and a few others are also used ◦ Scala is mostly used with the Spark framework ◦ Java, in a data science context, is most frequently used at the deployment stage by some teams • Data Science != Software Development 17
  18. 18. 18
  19. 19. Python - Most Misunderstood Language • Python is probably the most misunderstood language ◦ There are “tribes” and ecosystems in Python: web dev, scipy, pydata, embedded, scripting, 3D graphics, etc. • But businesses tend to pigeonhole it: ◦ IT/software/data engineering view: competes with Java, C#, Ruby… ◦ Analyst, statistician view: competes with R, SAS, Matlab, SPSS, BI systems 19
  20. 20. 20© 2018 FORRESTER. REPRODUCTION PROHIBITED. Outcomes The Business Model Data Science Compute IT Data Data Engineering Tackle the right problems 1 Build the right team2Have the right tools3 Iterate effectively4
  21. 21. 21© 2018 FORRESTER. REPRODUCTION PROHIBITED. Successful Data Science projects are more agile than “Agile” 4 Iterate on model development Iterate on data discovery Iterate on the business use case
  22. 22. Iterate Effectively • Exploration part of data science clearly requires agility • But need to design for agile iteration of models once they are deployed • Biggest hurdle to agile iteration is: How to break data science out of the sandbox in a way that is repeatable and maintainable? 22
  23. 23. Sandboxing Data Science • Data Science Sandbox is on isolated network, outside of “GRC reservation” ◦ Provides freedom to data scientists ◦ Protects production ETL, DW, event processing ◦ …but moving anything from Sandbox to Production is a huge pain • Multiple orgs / LOBs interface with Data Science team in the mixed sandbox environment • Compliance, audit, & risk control? 23
  24. 24. Contrasting Concerns 24
  25. 25. Looking forward: SDLC and ALM Have to Evolve • DSDLC emerged in a time when software was (mostly) independent of hardware and data itself • New applications are highly data-dependent • New applications are blends of code, services, and specialized hardware • Very different and broader set of change management, risk, governance concerns • Cannot insist on tech monoculture (language, architecture, DBs): The future is heterogeneous 25
  26. 26. Data Science & IT Lifecycles 26
  27. 27. Anaconda Enterprise 27
  28. 28. Where to go next 28 Download Anaconda https://www.anaconda.com/distribution/ Test Drive Anaconda Enterprise ambassador@anaconda.com Learn about consulting, training, and support ambassador@anaconda.com © 2018 Anaconda - Confidential & Proprietary
  29. 29. FORRESTER.COM Thank you © 2018 FORRESTER. REPRODUCTION PROHIBITED. Kjell Carlsson, PhD kcarlsson@forrester.com Twitter: @kjellkeli Peter Wang pwang@anaconda.com Twitter: @pwang

×