The Convergence of Data Science and Software Development
1. @MargrietGr
Margriet Groenendijk
Developer Advocate | IBM
25 April 2018 | IP EXPO Manchester
12. Data Scientists vs. Developers
            Data Scientists    Developers
Data        Static data        Dynamic databases
Code        Python, R          JavaScript
Platform    Notebooks          Text editors
Design      Models             Web apps
14. Data Science is a Team Sport
Extract data - Data Engineer
Prepare data - Data Scientist
Build & train models - Data Scientist
Evaluate - Business Analyst
Deploy - DevOps
Use models - Developer
Monitor - DevOps
18. The LocalCart Project
Customer behaviour information, such as demographics and shopping cart values
A recommendation engine to encourage additional purchases based on past buying behaviour
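As a rough illustration of recommending additional purchases from past buying behaviour, here is a minimal sketch using simple co-occurrence counts. The slides do not say which algorithm LocalCart uses, so the technique, the function names (`build_cooccurrence`, `recommend`) and the sample baskets are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each pair of products appears in the same past basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(cart, co, top_n=2):
    """Suggest the products most often bought alongside items already in the cart."""
    scores = Counter()
    for item in cart:
        for other, count in co[item].items():
            if other not in cart:
                scores[other] += count
    return [product for product, _ in scores.most_common(top_n)]


# Hypothetical purchase history, invented for illustration (not LocalCart data).
past_baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["coffee", "milk"],
]

co = build_cooccurrence(past_baskets)
suggestions = recommend(["bread"], co)
```

With these baskets, a cart holding only bread is matched with butter and jam, the two products that appeared alongside it in earlier purchases. A production engine would typically also weigh the demographic information mentioned above.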
36. PixieDust - open-source Python library for Jupyter notebooks to load and visualize data
PixieDebugger – first visual debugging tool for Jupyter notebooks
PixieApps - create dashboards in a notebook
PixieGateway - run charts or PixieApps as standalone web applications
49. Build, train and deploy a model
1. Prepare data in notebook
2. Build and train model in notebook
3. Deploy to Watson Machine Learning
4. Test from anywhere
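The four steps above can be sketched end to end in plain Python. This is only an outline under stated assumptions: the toy data, the nearest-centroid model and the helper names (`train`, `predict`) are invented for illustration, and a JSON round-trip stands in for the deployment to Watson Machine Learning, whose client API is not shown here.

```python
import json
from statistics import mean

# 1. Prepare data: hypothetical (age, cart_value, label) rows, invented for this sketch.
raw = [
    (25, 40.0, "electronics"),
    (31, 55.0, "electronics"),
    (52, 15.0, "groceries"),
    (47, 20.0, "groceries"),
]
features = [(age, value) for age, value, _ in raw]
labels = [label for _, _, label in raw]


# 2. Build and train: a nearest-centroid classifier (one mean point per label).
def train(features, labels):
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, row):
    def squared_distance(center):
        return sum((x - c) ** 2 for x, c in zip(row, center))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


model = train(features, labels)

# 3. "Deploy": serialising to JSON stands in for publishing to a hosted service.
deployed = json.dumps(model)

# 4. Test from anywhere: any client that receives the payload can score new rows.
served_model = json.loads(deployed)
prediction = predict(served_model, (28, 48.0))
```

The point of the sketch is the separation of concerns: training happens once in the notebook, while scoring only needs the serialised model, so it can run wherever the model is served.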
60. @MargrietGr
1. Load training data
2. Build model
3. Deploy model
4. Feedback data
5. Feedback evaluation
6. If accuracy is too low, retrain the model with all data
7. Deploy the new model if accuracy has improved
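The feedback loop in these steps might look roughly like the sketch below. Everything concrete in it is an assumption for illustration: the threshold "model", the `min_accuracy` value of 0.8, the helper names and the sample rows do not come from the slides.

```python
def accuracy(model, rows):
    """Fraction of (feature, label) rows the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def train_threshold(rows):
    """Toy trainer: pick the cut-off on x that best separates the labels."""
    candidates = sorted({x for x, _ in rows})
    best = max(candidates, key=lambda t: sum((x >= t) == y for x, y in rows))
    return lambda x, t=best: x >= t


def feedback_cycle(deployed, training, feedback, min_accuracy=0.8):
    """Steps 4-7: evaluate on feedback, retrain on all data, redeploy only if better."""
    current = accuracy(deployed, feedback)
    if current >= min_accuracy:
        return deployed, current  # accuracy is acceptable, keep the deployed model
    candidate = train_threshold(training + feedback)  # retrain with all data
    improved = accuracy(candidate, feedback)
    if improved > current:
        return candidate, improved  # deploy the new model
    return deployed, current


# Hypothetical rows: (feature value, label).
training = [(1, False), (2, False), (8, True), (9, True)]
feedback = [(4, True), (5, True), (3, False), (6, True)]

deployed = train_threshold(training)  # steps 1-3
deployed, score = feedback_cycle(deployed, training, feedback)
```

Here the retrained model is scored on the same feedback rows it was trained on, which keeps the example short but is optimistic; in practice a held-out portion of the feedback data would give a fairer comparison.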
66. Summary
Data scientists use a Python notebook to load, enrich, and analyze data, and to create analytics
From the same notebook, developers create a PixieApp to operationalize these analytics
Developers publish the PixieApp as a web application
The PixieApp can be viewed interactively by line-of-business users with no need to access the notebook
Data scientists and developers both work in the same cloud
68. Benefits of bringing the right tools into the Data Science workflow
ONE GOAL: Develop data-driven applications
Data science is maturing
NOW is the time for integration with the software development workflow
• Competitive advantage
• Discover new insights
• Real-time decision making
• Reduce complexity and lower cost
• Accelerate time to market and deployment of data science and analytics