This document discusses using the Scrum framework for data science projects. It outlines a typical data science project plan whose long, sequential phases give it a high risk of failure. It then presents a Scrum alternative in which short sprints cycle through business understanding, data preparation, model creation, evaluation, and deployment. It describes Scrum's roles, events (such as stand-up meetings and sprint reviews), and supporting tools such as kanban boards. Finally, it advocates a unified team working in one shared environment, rather than today's practice of data silos, to improve collaboration and adaptability.
4. Typical 2018 Data Science
Common Data Science Project Plan That Has a High Risk of Failure
Phases: Requirement Proposal → Business Understanding → Data Preparation → Create Model → Evaluation → Deployment
Timeline: 2 weeks, 1 month, 1 month, 1 month, 1 month
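Much of the risk in this plan comes from feedback latency. Because Deployment comes last, the business sees nothing usable until every phase has finished. The Python sketch below works through that arithmetic using the durations on this timeline (2 weeks followed by four 1-month stages). It rests on two assumptions that do not come from the slides: 1 month is counted as 4 weeks, and sprints are 2 weeks long. The function names are hypothetical.

```python
# Sketch: when does the business first see a working result?
# Assumptions (not from the slides): 1 month = 4 weeks, and a Scrum team
# ships a reviewable increment at the end of every 2-week sprint.

WEEKS_PER_MONTH = 4  # assumption for illustration

# Durations on the "typical 2018" timeline, converted to weeks.
sequential_durations_weeks = [2] + [1 * WEEKS_PER_MONTH] * 4


def first_feedback_week_sequential(durations):
    """Deployment is the last phase, so stakeholders see a working
    result only after every phase has finished."""
    return sum(durations)


def first_feedback_week_scrum(sprint_length_weeks=2):
    """Each sprint cycles through understanding, preparation, modelling,
    evaluation and deployment, so the first increment is reviewed at the
    end of sprint 1."""
    return sprint_length_weeks


def review_weeks_scrum(total_weeks, sprint_length_weeks=2):
    """Weeks at which a sprint review (an inspect-and-adapt point) happens."""
    return list(range(sprint_length_weeks, total_weeks + 1, sprint_length_weeks))


total = first_feedback_week_sequential(sequential_durations_weeks)
print("Sequential plan, first working result at week:", total)
print("Scrum, first increment reviewed at week:", first_feedback_week_scrum())
print("Scrum sprint reviews in the same period:", len(review_weeks_scrum(total)))
```

Under these assumptions, the sequential plan delivers its first working result at week 18. In the same 18 weeks, a Scrum team would hold nine sprint reviews, and each one is a chance to change direction.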
6. Scrum Data Science
Scrum: a framework for complex adaptive problems
Data science: a unified blend of statistics, technology, data analysis, and business knowledge, used to understand and analyze actual phenomena with data
7-10. Big Data
[Figure: Google Trends history of data science popularity, today (2018). Google data, worldwide, 2014 - now. Series added across slides 7-10: Big Data; Hadoop (2005); Data Science; Software Development Scrum.]
11. SCRUM
What Works and Doesn’t in Data Science Activity
https://www.scrumguides.org/
12. 3 Pillars of Scrum
Transparency · Inspection · Adaptation
14. Data Scientists
[Image panels: What my mom thinks I do · What my boss/client thinks I do · What I think I do · What I actually do]
16. What Is a Data Scientist?
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”
-- DJ Patil --
DJ Patil is a former LinkedIn and White House data scientist. Together with Jeff H (formerly of Facebook), he coined the term "data scientist" in 2008.
20. Roles
Scrum Data Science Team
• Data Engineer
• Data Scientist
• Business Analysts
• Product Owner
• Scrum Master
• Business Unit
• Business Manager
• External Party
Diagram labels: Development Team · Assist · Presentation · Assist
30. Project Goals
[Diagram: Data and a Business Problem feed the Data Science Team, which applies Graph Analytics, Text Analytics, Path Analytics, and Machine Learning.]
DataLabs AGILE ANALYTICS Service
We help companies to:
• Define the business problem
• Provide insights, recommendations, and workflow
• Do data exploration
• Deliver insights
Contact for engagement: adi@datalabs.id
33. Data Science Team 2018 Life Cycle
• Data Engineer (Big Data Environment): creates an ETL job to extract sample data to CSV
• The data is transferred via FTP, or even USB
• Data Scientist 1 works in Jupyter Notebook; Data Scientist 2 works in RStudio
• The Jupyter notebook goes back to the Data Engineer (again, sometimes via USB)
• Data Engineer: rewrites the notebook as scripts, creates an API in another language, and deploys
35. Ideal Data Science Life Cycle
Data Engineer (Big Data Environment): Maintain DataLake · Maintain Production Model · Optimize Performance
Data Scientist 1 and Data Scientist 2 (Analytics Environment): Experimentation on Big Data · Create Model · Evaluate & Deploy
✓ One Environment
✓ Self-Organizing
✓ Cross-Functional
36. Ideal Data Science Life Cycle
Data Engineer (Cloudera Hadoop): Maintain DataLake · Maintain Production Model · Optimize Performance
Data Scientist 1 and Data Scientist 2 (Data Science Workbench): Experimentation on Big Data · Create Model · Evaluate & Deploy
✓ One Environment
✓ Self-Organizing
✓ Cross-Functional