The presentation discusses the role of data science in achieving business goals and highlights challenges that often cause projects to fail, such as a lack of business understanding, data access issues, and deployment obstacles. It advocates an agile approach, combined with methods like CRISP-DM, for iterative delivery of data-driven insights and products. It emphasizes aligning data science work with business objectives and deploying early to win user acceptance.
Introduces Agile Data Science by Alexander Bauer and gives a brief agenda of the presentation.
Defines Data Science, its business goals like reducing costs and increasing revenue, and deliverables such as actionable insights.
Outlines common reasons for project failures, such as lack of understanding and data access issues, and typical pitfalls during execution.
Introduces CRISP-DM as a solution and discusses how to implement Agile methodologies in Data Science practices.
Details on creating a product vision statement targeting business goals and bridging algorithms to business needs with user stories.
Describes story mapping, release planning, and emphasizes that Data Science is a team effort.
Describes the framework for a Data Lake/Agile platform that supports various data types to facilitate Data Science.
Summarizes key takeaways and encourages agile development and early deployment for better project success.
What is Data Science?
• "Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured" (Wikipedia)
Business Goals
Why do companies hire data scientists?
• Reduce costs
• Increase revenue
• Reduce risk
• Create innovation
Deliverables
How do data scientists deliver?
• Actionable insights (reports)
• Data products
• New product features
• Trials, A/B Testing
Challenges
Why do many data science projects fail?
• Lack of Business Understanding
• Data Access (Security, Privacy)
• Deployment and Operation (Scalability, Acceptance)
• Time to market (Competition, Budget)
Case Study: Data Science for Sales Department
"I want a recommender system for my Sales Reps."
"Sure, we can use Alternating Least Squares Singular Value Decomposition!"
"Show me what you can do with Deep Learning."
"Cool, we can do something with TensorFlow on your data."
"I want a dashboard of sales by country and product."
"Well, we can do visualizations, but that's actually not my job!"
Typical pitfalls during project execution
Phases: Modeling, Trial/Pilot, Operationalization (timeline: 12 months)
• No access to data
• Model does not scale
• Users don't accept solution
• Fails to meet business objective
• Not enough signal
• Out of budget
Agile Data Science
How can we implement CRISP-DM in practice?
• Agile Product Management
• Agile Development
• Data Science Platform / Data Lake
Agile Product Management: The Product Vision Statement¹
Vision: "Leverage data science to increase sales team productivity"
• Target Group: Sales Reps, Sales Manager
• Needs: Close deals, prioritize leads, prevent churn, acquire new leads, up-sell, cross-sell
• Product: ?
• Business Goals: Increase conversion rate, increase average basket size, reduce churn rate, grow customer base
¹ Roman Pichler: Agile Product Management with Scrum
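The four business goals on the vision board only become measurable once each has an agreed metric. The sketch below shows one possible set of definitions, assuming period-level aggregates are available. The field names, the items-per-order notion of basket size (revenue per order is an equally common choice), and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SalesPeriod:
    """Hypothetical aggregate figures for one reporting period."""
    leads: int               # leads worked by the sales team
    deals_closed: int        # leads converted into deals
    orders: int              # orders placed
    items_sold: int          # line items summed over all orders
    customers_at_start: int  # active customers at the start of the period
    customers_churned: int   # customers from that base lost during the period
    new_customers: int       # customers acquired during the period


def conversion_rate(p: SalesPeriod) -> float:
    """Share of leads that turned into deals ("Increase conversion rate")."""
    return p.deals_closed / p.leads


def average_basket_size(p: SalesPeriod) -> float:
    """Items per order ("Increase average basket size")."""
    return p.items_sold / p.orders


def churn_rate(p: SalesPeriod) -> float:
    """Share of the starting customer base lost ("Reduce churn rate")."""
    return p.customers_churned / p.customers_at_start


def customer_base_growth(p: SalesPeriod) -> float:
    """Net relative growth of the customer base ("Grow customer base")."""
    at_end = p.customers_at_start - p.customers_churned + p.new_customers
    return (at_end - p.customers_at_start) / p.customers_at_start


if __name__ == "__main__":
    q = SalesPeriod(leads=200, deals_closed=30, orders=30, items_sold=90,
                    customers_at_start=500, customers_churned=25, new_customers=40)
    print(f"conversion rate:      {conversion_rate(q):.1%}")       # 15.0%
    print(f"average basket size:  {average_basket_size(q):.1f}")   # 3.0
    print(f"churn rate:           {churn_rate(q):.1%}")            # 5.0%
    print(f"customer base growth: {customer_base_growth(q):.1%}")  # 3.0%
```

Fixing the definitions early lets every release be judged against the same numbers in Sprint Reviews.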
User Stories: Bridging the gap between algorithms and business needs
Association Rules:
As a sales rep, I need to understand which products are often bought together so that I can recommend additional products during sales calls and increase up-selling.
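The association-rules story can be prototyped with the three standard rule metrics: support, confidence and lift. The following is a minimal sketch restricted to one-to-one product pairs, written in plain Python instead of a full Apriori or FP-Growth implementation. The product names and the `min_support` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support: float     # share of baskets containing both products
    confidence: float  # P(consequent | antecedent)
    lift: float        # confidence / P(consequent); > 1 means bought together more often than by chance


def pair_rules(baskets: list[set[str]], min_support: float = 0.2) -> list[Rule]:
    """Mine one-to-one association rules A -> B from a list of baskets."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(basket)  # sorted so each pair is counted under one key
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue  # pair too rare to be worth recommending
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append(Rule(lhs, rhs, support, confidence, lift))
    # Strongest associations first: by lift, then by confidence.
    return sorted(rules, key=lambda r: (-r.lift, -r.confidence))


if __name__ == "__main__":
    orders = [
        {"printer", "toner", "paper"},
        {"printer", "toner"},
        {"laptop", "mouse"},
        {"paper", "toner"},
        {"laptop", "mouse", "paper"},
    ]
    for r in pair_rules(orders, min_support=0.4):
        print(f"{r.antecedent} -> {r.consequent}: support={r.support:.2f} "
              f"confidence={r.confidence:.2f} lift={r.lift:.2f}")
```

On this toy data the laptop/mouse rules rank first with a lift of 2.5, because every order containing one also contains the other.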
Churn Factor Analysis:
As a sales rep, I need to understand the factors that drive churn so that I can select customers to call, make sure they are satisfied with our products and reduce churn.
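Churn factor analysis can be done in many ways, for example with logistic regression or survival models. A deliberately simple first version, in the "good enough first" spirit of the deck, compares the churn rate of customers with and without each binary factor (a risk ratio). The sketch below assumes such yes/no attributes already exist per customer. The factor names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    churned: bool
    # Hypothetical binary attributes, e.g. "open_ticket" from call center data.
    factors: dict[str, bool] = field(default_factory=dict)


def churn_risk_ratios(customers: list[Customer]) -> dict[str, float]:
    """Per factor: churn rate with the factor divided by churn rate without it.

    A ratio above 1 flags a factor associated with higher churn; it says
    nothing about causation.
    """
    names = sorted({name for c in customers for name in c.factors})
    ratios = {}
    for name in names:
        with_factor = [c.churned for c in customers if c.factors.get(name, False)]
        without_factor = [c.churned for c in customers if not c.factors.get(name, False)]
        if not with_factor or not without_factor:
            continue  # factor present for everyone or no one: nothing to compare
        rate_with = sum(with_factor) / len(with_factor)
        rate_without = sum(without_factor) / len(without_factor)
        if rate_without:
            ratios[name] = rate_with / rate_without
        else:
            # No churn without the factor: infinitely higher risk, or no difference.
            ratios[name] = float("inf") if rate_with else 1.0
    # Most churn-associated factors first.
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))
```

The ranked factors give sales reps a first, explainable answer to "who should I call?". The predictive churn models in later releases can then refine it.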
Recommender system:
As a sales rep, for each customer I need to understand which products were bought by customers with a similar purchase history so that I can make personalized recommendations and increase up-selling.
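The recommender story describes user-based collaborative filtering: find customers whose purchase history resembles the target customer's, then suggest what they bought. The sketch below uses Jaccard similarity on sets of purchased products, which is an assumption; cosine similarity on purchase counts is a common alternative. It is not the item-item or content-based variants named in the release plan, and the customer and product names are hypothetical.

```python
from collections import defaultdict


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two purchase histories: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target: str, history: dict[str, set[str]],
              k: int = 2, top_n: int = 3) -> list[str]:
    """Top-N products bought by the k customers most similar to `target`
    that `target` has not bought yet (user-based collaborative filtering)."""
    own = history[target]
    neighbours = sorted(
        ((jaccard(own, items), customer)
         for customer, items in history.items() if customer != target),
        reverse=True,
    )[:k]
    scores: dict[str, float] = defaultdict(float)
    for similarity, customer in neighbours:
        if similarity == 0:
            continue  # no shared purchases: not a useful neighbour
        for product in history[customer] - own:
            scores[product] += similarity  # weight each vote by neighbour similarity
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]


if __name__ == "__main__":
    history = {
        "acme": {"printer", "toner"},
        "globex": {"printer", "toner", "paper"},
        "initech": {"printer", "paper", "stapler"},
        "umbrella": {"laptop", "mouse"},
    }
    print(recommend("acme", history))
```

The returned list maps directly onto the "Viz: Top N items per customer" card in the release plan. A customer with no overlapping purchases gets an empty list, which is the cold-start case the later content-based recommender targets.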
Story Mapping and Release Planning
Story map with three activities. The stories under each activity are sliced into Release 1, Release 2 and Release 3, and the map includes a User Interface/Deployment row:
• Up/Cross-Selling: Association Rules; Item-Item Recommender; Viz: Top N items per customer; A/B Testing; Content-based recommender for cold-start (incl. CRM data)
• Churn Prevention: Factor Analysis; Simple predictive model for churn (sales history data); Viz: Top N customers likely to churn; A/B Testing; Improved predictive model for churn (incl. CRM data)
• Leads Prioritization: Conversion Factor Analysis
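The A/B Testing cards in the release plan need a decision rule for whether a new feature actually moved the business metric. A common choice for conversion rates is the two-proportion z-test, sketched below in plain Python. It assumes independent control and treatment groups large enough for the normal approximation. The pilot figures are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two conversion rates.

    Group A is the control, group B the treatment. The standard error uses
    the pooled proportion (normal approximation). Returns (z, p_value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical pilot: 1,000 leads per group; reps in group B see recommendations.
    z, p = two_proportion_z_test(100, 1000, 130, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

In this example the conversion rate rises from 10% to 13%, giving z of about 2.1 and a p-value below 0.05. Whether that difference justifies a wider rollout is still a business decision for the Sprint Review.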
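Data Lake / Agile Platform
Architecture diagram with a Platform Layer and an Application Layer. Components shown:
• Data sources: CRM, Purchase Data, Call Center Tickets, Legacy Systems
• Storage and processing: Unstructured Data, Structured Data, Scalable Job Execution / Query Engine, ETL, Scheduling
• Applications and access: Docker/VMs, App, App REST, Query Interface/Notebooks, Visualization Tools
• Governance and operations: Security/Auth, Auditing, Monitoring
• Users: Business Users, Analysts/Data Scientists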
Summary / Call to Action
• Data science projects rarely fail because of insufficient modeling skills
• Focus on business value; deliver "good enough" models first
• Deliver in small increments that already provide end-to-end value, and present them to all stakeholders in Sprint Reviews
• Manage stakeholders using a clear product vision, a user story backlog and release plans
• Deploy as early as possible to ensure user acceptance, and label early releases as "beta"
• Build an infrastructure that enables agile development