DataOps: Nine steps to transform your data science impact Strata London May 18

According to Forrester Research, only 22% of companies are currently seeing a significant return from data science expenditures. Most data science implementations are high-cost IT projects, local applications that are not built to scale for production workflows, or laptop decision support projects that never impact customers. Despite this high failure rate, we keep hearing the same mantra and solutions over and over again. Everybody talks about how to create models, but not many people talk about getting them into production where they can impact customers.

Harvinder Atwal offers an entertaining and practical introduction to DataOps, a new and independent approach to delivering data science value at scale, used at companies like Facebook, Uber, LinkedIn, Twitter, and eBay. The key to adding value through DataOps is to adapt and borrow principles from Agile, Lean, and DevOps. However, DataOps is not just about shipping working machine learning models; it starts with better alignment of data science with the rest of the organization and its goals. Harvinder shares experience-based solutions for increasing your velocity of value creation, including Agile prioritization and collaboration, new operational processes for an end-to-end data lifecycle, developer principles for data scientists, cloud solution architectures to reduce data friction, self-service tools giving data scientists freedom from bottlenecks, and more. The DataOps methodology will enable you to eliminate daily barriers, putting your data scientists in control of delivering ever-faster cutting-edge innovation for your organization and customers.

Published in: Data & Analytics

  1. 1. DataOps: 9 steps to transform your data science impact 21-24 May 2018
  2. 2. // Harvinder Atwal MoneySuperMarket // Web dunnhumby {"previous" : "Insight Director, Tesco Clubcard"} Lloyds Banking Group {"previous" : "Senior Manager, Customer Strategy and Insight"} {"Current" : "Head of Data Strategy and Advanced Analytics"} @harvindersatwal British Airways {"previous" : "Senior Operational Research Analyst"} {"about" : "me"} @gmail.com
  3. 3. £2B SAVINGS: 2017 estimate total of UK savings. 1993: We started life as mortgages 2000. 24.9M: Adults choose to share their data with us. 24 million: Average monthly users 2017. £323M: Revenue 2017. 989: Product Providers.
  4. 4. Sometimes it’s simple things that work really well From one version to 1400+ customised variants of the newsletter +19% Increase in Revenue Per Send
  5. 5. Sometimes it’s more complicated solutions Worried about whether you can afford a personal loan? With UK interest rates at record lows, it’s worth checking to see how reasonable the cost could be. Whether you need to borrow to buy something, or you want to bring your existing debts under one roof, have a look at these competitive deals we’ve assembled. Thanks to our Smart Search tool, you can get an idea of the loans you’re likely to be accepted for before you proceed with your application. Same message but Language tailored to the customer’s Financial Attitude
  6. 6. Only 22% of companies are currently seeing a significant return from data science expenditures* *Obligatory conference presentation quote from GartnerForresterMcKinsey Consulting. Sorry.
  7. 7. The mantra is wrong
  8. 8. HIRE DATA SCIENTISTS How businesses think they become data-driven 1 2 3 MONEY FLOWS HOARD DATA 4
  9. 9. Warning: A data-driven customer focussed strategy will not paper over cracks in operational performance or product deficiency
  10. 10. Putting Data ahead of the Customer or Financial Performance
  11. 11. Source: tamr
  12. 12. Multiple challenges in the process of turning data into value on existing infrastructure Business Problem Evaluate available data Request Data Access from IT Request Compute Resources from IT Negotiate with IT for requested resources Wait for resources to be provisioned Install Languages and tools Configure connectivity, Access and security RAM/CPU Availability, scaling, monitoring Request network Config Change Request to install another package Model building Compose PowerPoint to share results Edit Confluence to document work Negotiate with business stakeholder on deployment timeline Wait for Data Engineering to implement the model Test Newly implemented model to ensure valid results Request Modifications to model due to unexpected results Release model to production and schedule Document release notes and deployment steps Prepare for change management
  13. 13. Data Science trapped in laptops
  14. 14. Thinking real-life Data Science is a Kaggle competition
  15. 15. Treating Data Science as a Death Star Technology Project
  16. 16. Insight does not scale! Using data to generate ad hoc Decision Support Insight INSTEAD OF ACTION
  17. 17. Money is wasted Time is wasted Talent is wasted
  18. 18. Eliminate waste LEAN THINKING The Optimist The Pessimist The Lean Thinker THE GLASS IS HALF FULL THE GLASS IS HALF EMPTY WHY IS THE GLASS TWICE AS BIG AS IT SHOULD BE?
  19. 19. Alignment of data science with the rest of the organisation and its goals
  20. 20. It’s a sprint not a marathon
  21. 21. Problems with Agile Data Science - How do you define business value?
  22. 22. DATA STORAGE Cloud File Storage Distributed File System NoSQL DB RDBMS COMPUTE INFRASTRUCTURE Resource Management/Monitoring/Auditing Scheduling Project and Data Governance Data Engineering Distributed SQL Query Engine Distributed Compute Framework Compute Instances Coding Workspace & Language Libraries Output Files ANALYTICS LAYER Machine Learning libraries Data Visualisation libraries BI Tools Interactive dashboards/ Web Apps Security/Identity Access control APIs Data Prep/Exploration tools Summary Analysis, Analysis of Experiments, Segmentation, Machine Learning, Data Matching Revision/Deployment Tools Interactive Dashboards/ Web App development Applications (Business Layer) Insight Marketing Optimisation External Data Products Internal Reporting Website Optimisation Commercial Optimisation Production Code DevOps/Infrastructure DBAs ETL DQM Metadata Management Agile Data Science does not solve tech complexity problem Container Service Resource Vertical requirements DATA PRODUCTS (Presentation/ Service Layer) Deployment, Orchestration and Scaling Configuration Management Revision Control Knowledge Management Data Scientists DATA SOURCES Stream Processing Framework
  23. 23. Changing the way we work
  24. 24. Data Science can’t happen in a vacuum Situational Awareness is needed
  25. 25. Your business already has a hypothesis for what creates value Actively avoid work on anything else It’s the Corporate Strategy and Objectives (everyone is aligned behind)
  26. 26. Measurement of everything gives feedback of not just individual deliverables (fast loop) but also the organisation’s hypothesis of what adds value (slow loop) Situational Awareness Objectives (Themes) Strategies (Initiatives) Tactics (Epics) Actions (Stories) Strategies (Initiatives) Strategies (Initiatives) Objectives (Themes) Tactics (Epics) Tactics (Epics) Tactics (Epics) Tactics (Epics) Tactics (Epics) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Actions (Stories) Corporate strategy is broken down into many options (Epics) for Agile delivery
  27. 27. We reduce Batch sizes of work and have options to keep flow going
  28. 28. Collaboration is key Shared Buy-in from Senior management Organizational behavior structured around the ideal data-journey model Shared Priorities Shared Trust in data Shared Rewards based on measured outcomes, not outputs
  29. 29. Test & Collect Model Embed Roll Out Feedback Plan Pilot test Collect Data Build Model, Identify segments Adjust model to fit organisation Re-engineer business processes to support segmented execution Train organisation Creation of fast feedback loop
  30. 30. Data cycles are measured to eliminate bottlenecks
  31. 31. Shortened Data Cycles to be Agile Data Engineering Dev Ops/Infrastructure DB Management Cloud File Storage Distributed File System NoSQL DB RDBMS Distributed SQL Query Engine Distributed Compute Framework Compute Instance Container Service Data Prep/Exploration tools Coding Workspace & Language Libraries Machine Learning Data Visualisation Interactive Dashboards/ Web App development Version/ Deployment Tool Output Files BI Tools Interactive dashboards /Web Apps APIs Knowledge Management Security/Identity Access control Revision Control Configuration Management Orchestration and scaling Project and Data Governance Scheduling Resource Management/Monitoring/Auditing ETL DQM Data Scientists Epic Customer Feedback & Iteration Data Product Strategy Story Stream Processing Data Sources
  32. 32. Agile Practice DevOps Culture Lean Thinking We had accidentally stumbled on DataOps Data Analytics
  33. 33. DataOps was popularised by Andy Palmer in a 2015 Blog post
  34. 34. DataOps is an independent approach to data analytics Data Analytics team moves at lightning speed using highly optimized tools and processes across the whole data lifecycle Agile Collaboration to break down silos and work on “The Right Things” that add value Lean Manufacturing like focus on eliminating waste & bottlenecks, improving quality, monitoring and control Iterative project management Continuous delivery Automated test and deployment Monitoring Self-serve Quality Governance Organisational alignment Ease of use Predictability Reproducibility Strategic Objectives
  35. 35. Further steps to Trust DevOps Reproducibility implementation Self-serve Organisation
  36. 36. Why do we have brakes on a car?
  37. 37. Accept the delivery pipeline is governed by rules and constraints
  38. 38. Trust part 1: Make the “What you do to data” people in the organisation happy Identity and Access Management Custom role permissions Audit trail logs Data Loss Prevention Encryption of Data at Rest Encryption of Data in Motion Resource Monitoring Firewall rules Resource and Object Isolation Penetration Testing Code Encryption and Backup Segregation of Duties Authorisation protocols Data Access and Privacy Policy Metadata Management Data Lineage Tracking Data Stewards and Owners
  39. 39. Trust part 2: Make the “What you do with data” people in the organisation happy Data Quality Testing Transformation Testing End-User Testing ETL Integration Testing Metadata Testing Data Completeness Testing ETL Regression Testing Incremental ETL Testing Reference Data Testing ETL Performance Testing
  40. 40. Automated reproducibility is a must
  41. 41. Configuration Management For consistently reproducible computational environments
  42. 42. Continuous Integration: Commit Code Regularly Data Cleaning Master Data Cleaning Dev Branch Feature Extraction Dev Feature Extraction Master Model Train Master Model Train Dev Branch Machine Learning Pipeline Product Development (e.g. App, Website, Marketing system, Operational System, Dashboard, etc.)
  43. 43. Run tests and review code (please integrate safely)
  44. 44. Continuous Delivery and Beyond: Accelerating Deployment Dev Integration test Application test Acceptance test Production Continuous Integration Dev Integration test Application test Continuous Delivery Dev Integration test Application test Acceptance test Production Continuous Deployment Automated Manual
  45. 45. Continuous Operations: Resources that scale
  46. 46. Chemistry is not about tubes DataOps is not about tools (but the right ones help)
  47. 47. Align your spine Needs Principles Practices Tools Values How do you know it is the best possible tool? How do you know that the Practices actively help the system? How do you know which Principles you want to apply? “We use _____ to get our work done” “We DO Self-Service and DataOps to continuously create VALUE for the customer and business” We LEVERAGE Agile and Lean PRINCIPLES to change the system and make sure resources work on the right thing We OPTIMISE for Speed, Accuracy, Experimentation/Feedback and Security. We are here to SATISFY THE NEED to help customers save money and the business to execute its strategy It all starts at Needs. Why does this system exist in the first place? Source: Kevin Trethewey, Danie Roux, Joanne Perold
  48. 48. Avoid building your own anything or being on the bleeding edge. Cost of Delay is high.
  49. 49. Data Scientists need a way to manage their projects end-to-end with self-service data AND ARCHITECTURE Business Problem Evaluate available data Request Data Access from IT Request Compute Resources from IT Negotiate with IT for requested resources Wait for resources to be provisioned Install Languages and tools Configure connectivity, Access and security RAM/CPU Availability, scaling, monitoring Request network Config Change Request to install another package Model building Compose PowerPoint to share results Edit Confluence to document work Negotiate with business stakeholder on deployment timeline Wait for Data Engineering to implement the model Test Newly implemented model to ensure valid results Request Modifications to model due to unexpected results Release model to production and schedule Document release notes and deployment steps Prepare for change management
  50. 50. Modern serverless and managed infrastructure makes it easy to create data products just bring code and data A single unified platform reduces data fragmentation, overcomes business silos and helps enforce consistent governance
  51. 51. You can make the data supply chain more efficient by unifying data and tools in one platform Data Warehouse(s) ETL Analytics Platform Core Data Other Data Extract/Load OffLoad Main Source(s) of Truth Presentation/ Service Layer(s) Analytical tools Predictive and Prescriptive analytics Flatten/Merge columns Data Sharing BI Tools Descriptive and Diagnostic analytics Source Cubes on Dimensions Reload Data Microservices Flatten/Merge columns Data Sharing
  52. 52. Data Science Platforms add further self-serve capabilities Data Access, Prep and Exploration Jupyter, Rstudio, Zeppelin, etc. Automation and Machine Learning Run experiments, track and compare results Delivery and Model Management Publish APIs, Interactive web apps Schedule reports Collaboration and Version Control Discover, discuss and build on existing work Compute Environment Library Customised software stack Compute Grid Orchestrate hardware for development and deployment Source: Domino Data Labs
  53. 53. The market for platforms is exploding
  54. 54. Data Scientist Data Analyst Data Engineer Self-serve enables reduced DataOps roles ETL Quality Testing Descriptive Analytics Advanced Analytics BI Dev Ops Infrastructure Engineers DBAs X X X Business Stakeholders Operations Sys admin X Developers ML Product Managers
  55. 55. Implement AI: Actionable Intelligence
  56. 56. #1 Eliminate wasted effort Find the FASTEST, CHEAPEST path between data and consumers
  57. 57. #2 Align with the Organisation through Agile Collaboration
  58. 58. #3 Deliver Products not Projects Prioritize solutions that fit into a DataOps workflow over others
  59. 59. #4 Build a measurement and feedback culture
  60. 60. #5 Embrace Development best practice in Data Science Version Control, Configuration management, Continuous Integration, Continuous Operations
  61. 61. #6 KEEP CALM AND BUILD TRUST IN DATA Put Effective Data Governance, Security and Testing in place
  62. 62. #7 Invest in tools and process to reduce bottlenecks and increase quality Managed Infrastructure and Serverless Cloud, Automation and Data Science Platforms
  63. 63. #8 Decentralise Self-service analytics AND cloud infrastructure
  64. 64. #9 Organise around the ideal data journey instead of teams Fewer roles, more end-to-end ownership, less friction Store Share Use Manage Acquire Process Data Engineering Data Scientists Data Analysts Business Stakeholders
  65. 65. #9.5 Optimise data cycles for… SPEED!
  66. 66. Data Science Today Customer Data ? Hamster wheel Analytics The Roadblock The Aimless crash and burn The “So What happened?” The “We did it once, why doesn’t it work again?”
  67. 67. The DataOps Data Science Factory Epic Customer Data Product Strategy Story Data Rest of Business Analytics Agile Collaboration Data Governance Automated testing Value Measurement Version Control Configuration Management Self-Serve Infrastructure Automation Continuous Integration
  68. 68. Sequences shortened
  69. 69. Questions?
  70. 70. // Harvinder Atwal // Web var current: { companyName : "MoneySuperMarket", position : "Head of Data Strategy" + " and Advanced Analytics" }; var previous1: { companyName : "Dunnhumby", position : "Insight Director," + " Tesco Clubcard" }; var previous2: { companyName : "Lloyds Banking Group", position : "Senior Manager" }; var previous3: { companyName : "British Airways", position : "Senior Operational Research Analyst" }; {"about" : "me"} var username = "harvindersatwal"; var linkedIn = "/in/" + username; var twitter = "@" + username; var email = username + "@gmail.com";
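
Illustration for slide 30 ("Data cycles are measured to eliminate bottlenecks"): the deck does not show how the measurement is done, so the following is only a minimal sketch of one possible approach. It takes timestamps for the hand-offs in a single data cycle and reports the total cycle time and the slowest hand-off. The stage names loosely follow slide 12; the dates and the helper names `stage_durations` and `bottleneck` are invented for this example.

```python
from datetime import datetime

# Hypothetical hand-off timestamps for one piece of work (invented dates).
events = [
    ("request data access", datetime(2018, 5, 1)),
    ("resources provisioned", datetime(2018, 5, 15)),
    ("model built", datetime(2018, 5, 22)),
    ("model implemented by data engineering", datetime(2018, 6, 19)),
    ("released to production", datetime(2018, 6, 26)),
]


def stage_durations(events):
    """Return (stage, days) pairs: elapsed days between consecutive events."""
    return [
        (f"{prev_name} -> {name}", (when - prev_when).days)
        for (prev_name, prev_when), (name, when) in zip(events, events[1:])
    ]


def bottleneck(events):
    """Return the (stage, days) pair with the longest elapsed time."""
    return max(stage_durations(events), key=lambda pair: pair[1])


durations = stage_durations(events)
total_days = sum(days for _, days in durations)
slowest_stage, slowest_days = bottleneck(events)

print(f"total cycle time: {total_days} days")
print(f"bottleneck: {slowest_stage} ({slowest_days} days)")
```

With these invented dates the longest wait is the hand-off to data engineering, which is the kind of delay the self-service argument on slide 49 is aimed at.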
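
Illustration for slides 39 and 43 (data completeness and data quality testing, run as automated tests): the deck names the test types but not an implementation, so this is a hedged sketch using plain Python only. The rows, the `transform` step, and the check functions are all hypothetical; in practice checks like these would probably run inside whatever test runner and CI system a team already uses.

```python
# Hypothetical rows, as they might arrive from an upstream extract.
source_rows = [
    {"customer_id": 1, "email": "a@example.com", "monthly_spend": 120.0},
    {"customer_id": 2, "email": "b@example.com", "monthly_spend": 75.5},
    {"customer_id": 3, "email": None, "monthly_spend": 40.0},
]


def transform(rows):
    """Hypothetical transformation step: add an annual_spend column."""
    return [{**row, "annual_spend": row["monthly_spend"] * 12} for row in rows]


def check_completeness(source, target):
    """Completeness: the transformation must not lose or duplicate rows."""
    problems = []
    if len(source) != len(target):
        problems.append(f"row count changed: {len(source)} -> {len(target)}")
    ids = [row["customer_id"] for row in target]
    if len(ids) != len(set(ids)):
        problems.append("duplicate customer_id in target")
    return problems


def check_quality(rows, required=("customer_id", "email")):
    """Quality: required fields must not be missing."""
    return [
        f"row {i}: missing {field}"
        for i, row in enumerate(rows)
        for field in required
        if row.get(field) is None
    ]


target_rows = transform(source_rows)
completeness_problems = check_completeness(source_rows, target_rows)
quality_problems = check_quality(target_rows)

print(completeness_problems)
print(quality_problems)
```

Returning a list of problems rather than raising on the first failure is a design choice for this sketch: it lets a pipeline log every issue in one run before deciding whether to block a release.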
