Data Science Beyond the Sandbox

From Peter Wang's talk at Strata Data Conference in New York City on September 27, 2017.

Most businesses have figured out that the value of data science originates in the dynamic, exploratory activities of data scientists, equipped with their favorite tools and algorithms. However, data science teams often produce artifacts that are difficult for others in the enterprise to consume directly. Business analysts are often intimidated by the code, and software developers are sometimes unfamiliar with the sophisticated math and data analysis involved.

Anaconda can help resolve both of these issues and help data science teams easily move their work out of the exploratory sandbox and into production servers in a way that IT can feel good about. Meanwhile, other capabilities within the Anaconda platform can expose the results of analysis easily to business analysts and their spreadsheet-based workflows.

Peter Wang explores the typical problems data science teams experience when working with other teams and explains how these issues can be overcome through cohesive collaborative efforts among data scientists, business analysts, IT teams, and more.


  1. Data Science Beyond the Sandbox • Peter Wang, CTO and Co-founder, Anaconda, Inc.
  2. Agenda • Data Science in the Enterprise • Data Scientists in the Enterprise • Anaconda in the Enterprise
  3. Produce Very Popular OSS Data Science Platform • Lead Large Ecosystem of Foundational OSS Data Science Projects • Extend OSS Innovation with Products and Services that Target IT Departments
  4. Anaconda - The Most Popular Data Science Platform (© 2017 Anaconda, Inc. - Confidential & Proprietary)
  5. Anaconda Enterprise 5
  6. Customers Across Industries
      • Financial Services: Risk management, quant modeling, data exploration and processing, algorithmic trading, compliance reporting
      • Government: Fraud detection, data crawling, web and cyber data analytics, statistical modeling
      • Healthcare & Life Sciences: Genomics data processing, cancer research, natural language processing for health data science
      • High Tech: Customer behavior, recommendations, ad bidding, retargeting, social media analytics
      • Retail & CPG: Engineering simulation, supply chain modeling, scientific analysis
      • Oil & Gas: Pipeline monitoring, noise logging, seismic data processing, geophysics
  7. Data Science in the Enterprise
  8. Some Perspective…
      • It’s not “one size fits all”
        - Many success stories are at young, fast startups
        - Larger companies tout successes but… talk != reality
      • Algorithms matter, but people matter more
      • Core challenge of data science within modern businesses
  9. Conway’s Law: The design of any piece of software reflects the communications structure of the organization that produced it.
  10. Peter’s Corollary to Conway’s Law: The architecture of any business data system evolves to reflect the budget structure of the IT groups that maintain it.
      • … not strategic or operational needs
      • … not ensuring future analytical agility
      • … not optimizing for rapid insights
  11. (Figure; source: Master Data Management and Data Governance, 2e)
  12. Data Science “Sandbox” (figure; source: Master Data Management and Data Governance, 2e)
  13. Common Problems
      • Data Science Sandbox is on isolated network, outside of “GRC reservation”
        - Provides freedom to data scientists
        - Protects production ETL, DW, event processing
        - … but moving anything from Sandbox to Production is a huge pain
      • Multiple orgs / LOBs interface with Data Science team in the mixed sandbox environment
      • Compliance, audit, & risk control?
  14. Contrasting Concerns: Exploration vs. Production
      • Data
        - Exploration: Fast, unfettered access; ease of introducing new, varied, messy datasets; reproducibility
        - Production: Strict, governed access; well-defined schema; provenance & auditability
      • Compute Infrastructure
        - Exploration: High performance; low latency, interactive; individualized & specialized
        - Production: Scalable, high-availability; manageable at scale; cost amortization over many machines and users
      • Organization
        - Exploration: Individual high-achievers with lots of context & capability; agile, able to quickly learn new skills and approaches
        - Production: Sustain operations at lowest possible cost; robustness against unintended change
  15. Core Challenges
      • Data Exploration generates insight & is required to respond to business challenges
      • Production data processing & analytics requires different operational concerns
      • Over-engineering for either leads to structural deficiencies
      • Modern & future needs will require more agile exploration
  16. Data Scientists in the Enterprise
  17. What is Data Science?
  18. What is Data Science?
  19. What is Data Science?
  20. The Data Science Team
  21. What About the Rest of the Organization? • Sales • Marketing • User research • Domain experts
  22. What About the Rest of the Organization? • Sales • Marketing • User research • Domain experts
  23. Biz Analyst • Data Scientists • Domain experts
      • Biz Analyst (analytics)
        - Works with: Excel, Tableau, SQL
        - Thinks data is: spreadsheets, tables
        - Delivers: Reports, dashboards, spreadsheets
        - Other titles: Data analyst, analyst
      • Data Scientists (data science)
        - Works with: Python, Hadoop, Spark
        - Thinks data is: dataframes, arrays
        - Delivers: Notebooks, code, interactive visualizations
        - Other titles: Research scientist, Machine learning engineer
      • Domain experts (self-service analytics)
        - Works with: Excel, Salesforce, Marketo
        - Thinks data is: information to act on
        - Delivers: Recommendations, decisions, actions
        - Other titles: Executives, managers (in Sales, marketing, HR…)
  24. Anaconda Enterprise
  25. Anaconda Enterprise Users
      • Users: Data Scientists; Business Analysts & Managers; IT/Administrators; Developers & ML Engineers
      • Capabilities: Collaborate • Reproduce • Deploy • Govern • Secure • Scale • Self-Serve • Interact • Understand • Build • Publish • Consume
  26. Anaconda Enterprise Overview (diagram labels)
      • Data Science Platform: Anaconda Enterprise
      • Users: Data Scientists, IT & Admin, BA & Managers
      • Deployment: On-Premises, Cloud
      • Infrastructure: Amazon AWS, Microsoft Azure, Google Cloud
      • Packages, Projects, Deployments
      • ML REST APIs, Dashboards, Notebooks, Web Apps, Bokeh Apps, Shiny Apps
  27. Collaboration
      • Centralized, browser-based notebook collaboration with versioning and access control
      • Integrated data science environments with Jupyter Notebooks and JupyterLab
      • Manage and share data science projects and dependencies
  28. Reproducibility
      • Upload and share projects and notebooks with portable data science environments
      • Enterprise-grade data science reproducibility and portability with Anaconda Project
      • Mirror data science packages and dependencies within your organization
  29. Deployment
      • One-click deployment of self-service notebooks, interactive visualizations, machine learning models, REST APIs and other apps
      • Industry standard containerization and cluster orchestration technology
  30. Governance
      • On-premises data science package repository
      • License filtering and license audit reports
      • Event logging and auditing of package, project and deployment activity
  31. Security
      • Integrated with enterprise-grade identity providers: LDAP, AD, SAML, Kerberos
      • Secure network communications and end-to-end TLS/SSL encryption
      • Centralized user management portal and token-based access to deployed models and applications
  32. Scalability
      • Distribute Anaconda libraries across Hadoop and Spark clusters
      • Connect Anaconda Enterprise to Spark or Dask and perform distributed computations interactively
      • Scalable distributed computation resources for project editing and user-deployed data science apps
  34. Analysts and Domain Experts can use Excel • Data Scientists can keep using Python & R
  35. Key Strengths / Unique Differentiators
      • Most comprehensive data science development cycle
        - ML algorithm development and package management
        - Self-service dashboards and reports for managers
      • Widest range of deployment options
        - Deployment of: Notebooks (Python, R, Spark); ML REST APIs; interactive applications (Bokeh apps, Shiny apps); web apps (Django, Flask); composable applications
      • Unique continuity of experience and reproducibility
        - Identical experience on laptop and Anaconda: JupyterLab, Jupyter Notebooks
        - Run platform data science projects on Linux, Windows, macOS
      • Only vendor to support and indemnify Anaconda Distribution
        - First-class integration with Anaconda Distribution
        - Protect: Reduce Risk of Intrusions, Vulnerabilities & Infringement
        - Solve: Get Questions Answered, Guided by Experts
        - Deliver: Resolve Issues Quickly & Minimize Downtime
  37. Get Started with Anaconda Enterprise
      • Register now for a free 30-day Anaconda Enterprise 5 Test Drive: https://go.anaconda.com/test-drive-anaconda-enterprise-5/
      • Read Deploying Secure and Scalable Data Science Projects: http://go.continuum.io/wp-productionizing-deploying-secure-scalable-ds-projects/
      • Contact us by phone or email: +1 (512) 776-1066 | sales@anaconda.com
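
The Deployment and Security slides mention ML REST APIs and token-based access to deployed models. As a rough illustration of that idea only, the sketch below shows how a request handler might check a bearer token before scoring a model. It is not Anaconda Enterprise's API: the names `VALID_TOKENS`, `predict`, `is_authorized`, and `handle_predict`, the token value, and the fixed weights are all hypothetical, and only the Python standard library is used.

```python
import hmac
import json

# Hypothetical token store for illustration; a real platform would
# delegate this to its identity provider and token service.
VALID_TOKENS = {"demo-token-123"}


def predict(features):
    """Stand-in 'model': a fixed linear score, not a trained model."""
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


def is_authorized(headers):
    """Accept only an 'Authorization: Bearer <token>' header with a known token."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer":
        return False
    # compare_digest compares in constant time to avoid timing leaks.
    return any(
        hmac.compare_digest(token.encode(), valid.encode())
        for valid in VALID_TOKENS
    )


def handle_predict(headers, body):
    """Return (status_code, json_text) for a POST /predict style request."""
    if not is_authorized(headers):
        return 401, json.dumps({"error": "unauthorized"})
    try:
        features = [float(x) for x in json.loads(body)["features"]]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "bad request"})
    return 200, json.dumps({"score": predict(features)})
```

A caller would pass the request headers and raw JSON body, for example `handle_predict({"Authorization": "Bearer demo-token-123"}, '{"features": [2, 4, 1]}')`. Keeping the handler a plain function, separate from any web framework, is a design choice made here so the sketch stays self-contained.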
