The Five Dysfunctions of a Data Science Team

Avoid common collaboration snares. Foster healthy data science collaboration.

While the mantra “data science is a team sport” has been around for a few years, many organizations are only now starting to grow their data science teams beyond a small collection of loosely associated individuals. But whether you’re on a team of two or leading a team of hundreds, you're at risk of falling into these common collaboration pitfalls.

Data science is about exploring numbers to identify business challenges and identify solutions — initiatives that require cross-functional teams of storytellers, programmers, statisticians, designers and accountants.
Source: Ritika Puri, The Next Web Insider, August 2015

This webinar will identify the five most common challenges and offer actions to improve your data science team’s collaboration.

You will learn to avoid common pitfalls including:

• Collaboration challenges among data science teams, including reproducibility and sharing notebooks
• Data science deployment challenges preventing business value realization
• Understanding your data science collaboration requirements

Tune in and get your questions answered during our Live Q&A.

Published in: Technology


  1. Overcoming the Five Dysfunctions of a Data Science Team: How to Foster Healthy Data Science Collaboration. Anaconda Enterprise Webinar, March 28, 2017. © 2017 Continuum Analytics - Confidential & Proprietary
  2. INTRODUCTION: Stephen Kearns - Product Marketing Manager at Continuum Analytics. Prior to Continuum Analytics, Steve was director for the portals & collaboration practice area of a boutique consulting firm in Toronto. Steve holds an undergraduate degree in computer engineering from the University of Waterloo.
  3. “Data science is about exploring numbers to identify business challenges and identify solutions - initiatives that require cross-functional teams of storytellers, programmers, statisticians, designers and accountants.” Ritika Puri, The Next Web Insider, August 2015. Data Science Collaboration
  4. DATA SCIENCE AS A TEAM SPORT
  5. ENTERPRISE DATA SCIENCE: INHERENTLY A TEAM SPORT. Manage & Govern • Ingest & Prepare • Experiment & Train • Optimize & Deploy • Explore & Analyze
  6. “We strongly believe that having people from different backgrounds collaborating around a problem is more important than selecting some fancy algorithms…” Steven Hillion, Wall Street Journal, March 2017. Views on Collaboration
  7. LENCIONI’S FIVE DYSFUNCTIONS © Table Group
  8. FIVE DYSFUNCTIONS OF A DATA SCIENCE TEAM: Data (in)Access(ibility) • Dependency Nightmares • Collaboration Breakdowns • Reproducibility Headaches • Deployment Barriers
  9. Duplication of effort: • Searchability & discoverability • Lack of reuse. Team frustrations: • Onboarding costs • Synchronization • Versioning. Governance and compliance issues: • Data loss • Data access challenges. THESE ARE REAL PROBLEMS… The cost to dysfunctional data science teams
  10. Increased data science workflow velocity. Increased retention with happier data science teams: • Streamlined sharing reduces communication overhead • Less friction accessing sensitive data • Teams are automatically synchronized (esp. geo-dispersed) • Centralized hosting, discovery and search empower reuse • Easier to ramp up on new projects and packages. Improved compliance and governance. BUT WHEN WE SOLVE THESE DYSFUNCTIONS: Benefits of healthy collaboration
  11. People, Process, Tech. REQUIREMENTS FRAMEWORK: Breaking the challenges down
  12. DYSFUNCTION #1: Data (in)Access(ibility)
  13. How it affects collaboration: • Assumptions made by the data engineer can silently affect the results prepared by the data scientist • Data scientists mistrust the data and the data engineer when they have neither direct access nor some way of viewing the provenance of the transformed data • Using different tools reduces visibility of what’s been done to the data. DATA (IN)ACCESS(IBILITY): Trusting the data sources
  14. Data Science Team Requirements: • People: Data engineers who relentlessly question and expose their own assumptions about the data; writing their ingest and ETL scripts in a language the data scientists use can build trust • Tech: Which language ecosystem is going to facilitate data access across many data sources in the fastest, easiest way, and has the critical mass to be ubiquitous? DATA (IN)ACCESS(IBILITY): Connecting to diverse sets of data
  15. How people are solving it today: • Languages: Python, R, Java, C/C++ • Open Source: Pandas, NumPy, Blaze, others • Enterprise Ready: Anaconda Enterprise. DATA (IN)ACCESS(IBILITY): Laborious process to connect to diverse sets of data
  16. DYSFUNCTION #2: Dependency Nightmares
  17. DEPENDENCY NIGHTMARES
  18. “DOWN THE RABBIT HOLE”: Down the rabbit hole we go…
  19. How it affects collaboration: • Extra time spent curating packages, specific versions, and handling updates • Teams trying to collaborate across different projects may have different dependency needs, which can cause conflicts when they share tools and packages. DEPENDENCY NIGHTMARES: Large effort to set up and maintain list of required packages
  20. Data Science Team Requirements: • Process: Sourcing packages & mirroring (incl. behind firewall & air gap), whitelisting/blacklisting, publishing new packages • Tech: Support for multiple languages AND multiple operating systems; strong dependency solver; integration with source control, CI/CD. DEPENDENCY NIGHTMARES: Large effort to set up and maintain list of required packages
  21. How people are solving it today: • Open Source: conda, pip (Python), install.packages (R), Maven (Java), Composer (PHP) • Enterprise Ready: Anaconda Enterprise. DEPENDENCY NIGHTMARES: Package Management
  22. • Intelligent dependency management • Cross-platform, cross-language. OPEN DATA SCIENCE DATA COMPUTATION. Leading Open Data Science Platform Powered by Python. Open Data Science • Data Science Collaboration • AI • Data Science for Big Data • Data Science Governance • Dashboards & Applications • Self-Service Analytics
  23. DYSFUNCTION #3: Collaboration Breakdowns
  24. Email latest version of Notebook? • True Notebook Collaboration. Email: Where Information Goes to Die
  25. How it affects collaboration: • Recipients-only → not discoverable → duplication of work • Terrible version control system → working with incorrect information. COLLABORATION BREAKDOWNS: Email is where information goes to die
  26. Data science team requirements: • People: value information sharing • Process: minimize collaboration tax – simple, as close to working locally as possible • Tech: Security and permissions, discoverable, searchable, seamless integration working from local workstation to hosted collaboration environment to cluster – has to work anywhere your data lives. COLLABORATION BREAKDOWNS: Email is where information goes to die
  27. How people are solving it today: • Open Source: JupyterHub, Git/GitHub • Enterprise Ready: Anaconda Enterprise Notebooks. COLLABORATION BREAKDOWNS: Email is where information goes to die
  28. CUSTOMER PERSPECTIVE: Girish Modgil, Senior Director of Data & Analytics, General Electric
  29. ANACONDA ENTERPRISE DEMO: • Shared notebooks • Locking • Versions & differences
  30. DYSFUNCTION #4: REPRODUCIBILITY HEADACHES
  31. DYSFUNCTION #4: REPRODUCIBILITY HEADACHES
  32. How it affects collaboration: • Slows exchange of ideas as time is spent reconstructing exact source environments • Other environment constraints may prevent you from running a new project at all in your current environment, which then triggers the requirement for a whole new environment. REPRODUCIBILITY HEADACHES: “Works on my machine!?!”
  33. Data science team requirements: • Tech: Isolation, lightweight, cross-platform, easy-to-use. REPRODUCIBILITY HEADACHES: “Works on my machine!?!”
  34. How people are solving it today: • Open Source: Anaconda Project (see Blog Post), conda (w/ conda Environments), VirtualBox (VM), Docker • Enterprise Ready: Anaconda Enterprise. REPRODUCIBILITY HEADACHES: “Works on my machine!?!”
  35. DYSFUNCTION #5: Deployment Barriers
  36. How it affects collaboration: • Data Scientists and IT/DevOps have very different areas of focus; models often aren’t built to be operationalized in existing production systems. DEPLOYMENT BARRIERS: “You can’t put THAT into prod!?!”
  37. Data science team requirements: • People: Ability to balance other needs in the system • Process: Sketch out the “road to production” blending stability and speed; consider enterprise support requirements – security for open source, assurance, indemnification • Tech: Ability to run original data scientist code in production at required performance levels; APIs and interoperability between applications and projects; also security, scalability, high-availability and version control → alleviate these infrastructure concerns from the Data Scientist • This blog post provides an in-depth review of processes and tools required to deploy enterprise data science and meet the stringent requirements of enterprise DevOps. DEPLOYMENT BARRIERS: “You can’t put THAT into prod!?!”
  38. How people are solving it today: • Convert from R or Python into Java, C/C++ or .NET • Open Source: conda, Java, C/C++, .NET • Enterprise Ready: Anaconda Enterprise (Join Innovators Program Now). DEPLOYMENT BARRIERS: “You can’t put THAT into prod!?!”
  39. BONUS - DYSFUNCTION #6: Excel as Data Science Tool?
  40. But I use Excel® • Bridging the divide between Analysts and Data Scientists. What Excel Looks Like to a Data Scientist
  41. How it affects collaboration: • No common ground to share tools and exchange information • Broken workflow. EXCEL® AS A DATA SCIENCE TOOL: Bridging the divide between Business Analysts and Data Scientists
  42. Data science team requirements: • Process: Allow Data Scientists to use Data Science tools; don’t make Business Analysts learn a programming language • Tech: bridge the gap to provide Business Analyst support for basic use cases as well as machine learning, visualization and connecting to Big Data from inside Excel. EXCEL® AS A DATA SCIENCE TOOL: Bridging the divide between Business Analysts and Data Scientists
  43. How people are solving it today: • Open Source: xlwings • Enterprise Ready: Anaconda Fusion. EXCEL® AS A DATA SCIENCE TOOL: Bridging the divide between Business Analysts and Data Scientists
  44. ANACONDA FUSION DEMO: • Pulling data from a data source and querying • Machine learning • Visualization example
  45. People: Team members who value collaboration. Process: Minimize the collaboration tax. Tech: A platform that enables healthy data science collaboration. TAKEAWAYS: Optimizing for collaboration can increase data science workflow velocity. Result: engaged, productive teams, working towards their shared goals
  46. Next Steps: DOWNLOAD Breaking Data Science Open eBook: Go.continuum.io/download-ebook-breaking-data-science-open. CHECK OUT Anaconda Fusion Trial: Go.continuum.io/2017-anaconda-fusion-trial. EXPERIENCE Anaconda Enterprise on your own: Know.continuum.io/Anaconda-Enterprise-Test-Drive.html
  47. Q&A
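
The dependency conflict described on slide 19 (teams on different projects pinning different versions of the same package) can be illustrated with a small sketch. This is not from the webinar: the function name `find_conflicts`, the project names, and the version pins are all hypothetical, invented for illustration. Real tools such as conda or pip resolve version ranges and transitive dependencies, which this sketch does not attempt.

```python
def find_conflicts(project_a, project_b):
    """Return packages that both projects pin, but to different versions.

    Each argument is a dict mapping a package name to an exact version
    string. The result maps each conflicting package name to a
    (version_in_a, version_in_b) tuple.
    """
    shared = project_a.keys() & project_b.keys()
    return {
        name: (project_a[name], project_b[name])
        for name in sorted(shared)
        if project_a[name] != project_b[name]
    }


# Hypothetical pins for two projects, invented for illustration only.
forecasting = {"numpy": "1.11.3", "pandas": "0.19.2", "scipy": "0.18.1"}
churn_model = {"numpy": "1.12.0", "pandas": "0.19.2", "scikit-learn": "0.18.1"}

conflicts = find_conflicts(forecasting, churn_model)
print(conflicts)  # {'numpy': ('1.11.3', '1.12.0')}
```

Only packages pinned by both projects are compared, so `scipy` and `scikit-learn` are ignored here. A check like this could plausibly run before two teams merge their tooling, though an isolated environment per project (slide 34) avoids the conflict rather than just reporting it.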
