Agile Data Science


To be successful as a data science team, we need to continuously deliver data-driven insights and data products that generate business value. Identifying the best opportunities and building solutions that actually get used in production requires very close collaboration with business users and subject matter experts. What can we learn from agile software development methodologies, and how can we apply them to data science projects?

Published in: Technology

  1. 1. Agile Data Science. Alexander Bauer, Lead Data Scientist @ Lidl. Frankfurt Analytics Meetup, 2017/02/24
  2. 2. Agenda • Data Science • Challenges • Agile Data Science Projects • Case Study
  3. 3. What is Data Science? • Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured – Wikipedia
  4. 4. Business Goals: Why do companies hire data scientists? • Reduce costs • Increase revenue • Reduce risk • Create innovation
  5. 5. Deliverables: How do data scientists deliver? • Actionable insights (reports) • Data products • New product features • Trials, A/B testing
  6. 6. Challenges: Why do many data science projects fail? • Lack of business understanding • Data access (security, privacy) • Deployment and operation (scalability, acceptance) • Time to market (competition, budget)
  7. 7. Case Study: Data Science for Sales Department. Business user: "I want a recommender system for my sales reps." Data scientist: "Sure, we can use Alternating Least Squares Singular Value Decomposition!"
  8. 8. Case Study: Data Science for Sales Department. Business user: "Show me what you can do with Deep Learning." Data scientist: "Cool, we can do something with TensorFlow on your data."
  9. 9. Case Study: Data Science for Sales Department. Business user: "I want a dashboard of sales by country and product." Data scientist: "Well, we can do visualizations, but that's actually not my job!"
  10. 10. Typical pitfalls during project execution. Phases: Modeling • Trial/Pilot • Operationalization. Pitfalls: No access to data • Model does not scale • Users don't accept solution • Fails to meet business objective • Not enough signal • Out of budget. Timeline: 12 months
  11. 11. Solution: Iterative Approach (CRISP-DM)
  12. 12. Agile Data Science: How can we implement CRISP-DM in practice? • Agile Product Management • Agile Development • Data Science Platform / Data Lake
  13. 13. Agile Product Management: The Product Vision Statement [1]. Vision: "Leverage data science to increase sales team productivity". Target Group: Sales Reps • Sales Manager. Needs: Close deals • Prioritize leads • Prevent churn • Acquire new leads • Up-sell • Cross-sell. Product: ? Business Goals: Increase conversion rate • Increase average basket size • Reduce churn rate • Grow customer base. [1] Roman Pichler: Agile Product Management with Scrum
  14. 14. User Stories: Bridging the gap between algorithms and business needs • Association Rules: As a sales rep, I need to understand which products are often bought together, so that I can recommend additional products during sales calls and increase upselling. • Churn Factor Analysis: As a sales rep, I need to understand the factors that drive churn, so that I can select customers to call, make sure they are satisfied with our products, and reduce churn. • Recommender System: As a sales rep, for each customer I need to understand which products were bought by customers with a similar purchase history, so that I can make personalized recommendations and increase upselling.
  15. 15. Story Mapping and Release Planning. Themes: Up/Cross-Selling • Churn Prevention • Leads Prioritization • User Interface/Deployment. Stories: Association Rules • Factor Analysis • Conversion Factor Analysis • Item-Item Recommender • Viz: Top N items per customer • A/B Testing • Simple predictive model for churn (sales history data) • Improved predictive model for churn (incl. CRM data) • Content-based recommender for cold-start (incl. CRM data) • A/B Testing • Viz: Top N customers likely to churn. Releases: Release 1 • Release 2 • Release 3
  16. 16. Agile Development with Scrum: Data Science is a Team Sport
  17. 17. Data Lake / Agile Platform. Data sources: CRM • Purchase Data • Call Center Tickets • Legacy Systems. Layers: Platform Layer • Application Layer. Components: Docker/VMs • Apps • Security/Auth • Auditing • Monitoring • Unstructured Data • Structured Data • Scalable Job Execution / Query Engine • REST • ETL • Query Interface / Notebooks • Visualization Tools • Scheduling. Users: Business Users • Analysts / Data Scientists
  18. 18. Summary / Call for Action • Data science projects rarely fail because of insufficient modeling skills • Focus on business value, deliver "good enough" models first • Deliver in small increments that already provide value end-to-end, and present them in Sprint Reviews to all stakeholders • Manage stakeholders using a clear product vision, a user story backlog and release plans • Deploy as early as possible to ensure user acceptance, and label early releases as "beta" • Build an infrastructure that enables agile development
  19. 19. Thank you! Questions?
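
The association-rules user story on slide 14 (which products are often bought together) can be illustrated with a small sketch of a "good enough" first increment. This is not taken from the talk: the function name `pair_rules`, the `min_support` parameter, and the toy baskets are assumptions made for illustration. It only scores pairs of items, whereas a full approach such as Apriori would mine larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.2):
    """Score item pairs by support, confidence and lift (hypothetical helper).

    baskets: list of lists of item names, one list per purchase.
    min_support: minimum share of baskets that must contain both items.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate, fix a stable pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair yields two directed rules: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = count / item_counts[antecedent]
            lift = confidence / (item_counts[consequent] / n)
            rules.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": support,
                "confidence": confidence,
                "lift": lift,
            })
    # Highest lift first: these are the strongest "bought together" signals.
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


if __name__ == "__main__":
    # Toy purchase data, invented for this example.
    example_baskets = [
        ["coffee", "milk", "sugar"],
        ["coffee", "milk"],
        ["tea", "sugar"],
        ["coffee", "sugar"],
        ["tea", "milk"],
    ]
    for rule in pair_rules(example_baskets, min_support=0.4):
        print(rule)
```

A sales rep would read a rule as "customers who buy the antecedent also tend to buy the consequent"; a lift above 1 means the pair occurs together more often than chance would suggest.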