@MargrietGr
The Convergence of Data Science and Software Development
Margriet Groenendijk
Developer Advocate | IBM
25 April 2018 | IP EXPO Manchester
Artificial Intelligence
(Big) Data
Data Science
Deep Learning
Machine Learning
Machine learning: algorithm selection
Deep learning: neural network design
Artificial intelligence: systems architecture
Data science: extract insights from data
Use historical data to train a model, and then make predictions
Train: Historical Loans (with label) + Algorithm → Loan Approval Model
John X
§ credit_score=800
§ age=25
§ income=$900,000
§ works in Oil & Gas
Label: Approve
Predict: New Applicant → Loan Approval Model → Output: Approve or Reject
James X
§ credit_score=900
§ age=55
§ income=$1,200,000
§ works in Insurance
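The slides do not name the algorithm behind the loan approval model. As a minimal sketch of the train-then-predict pattern, the Python below uses 1-nearest-neighbour classification, which is a stand-in choice and not the talk's method. Only John X's record comes from the slides. The other historical loans are hypothetical, and the industry field is left out because a categorical feature would first need encoding.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Applicant:
    credit_score: float
    age: float
    income: float

# Labelled historical loans. John X is from the slides; the other rows are hypothetical.
HISTORICAL_LOANS: List[Tuple[Applicant, str]] = [
    (Applicant(800, 25, 900_000), "Approve"),  # John X
    (Applicant(550, 40, 30_000), "Reject"),
    (Applicant(720, 35, 85_000), "Approve"),
    (Applicant(600, 22, 20_000), "Reject"),
]

def _features(a: Applicant) -> List[float]:
    return [a.credit_score, a.age, a.income]

def train(rows: List[Tuple[Applicant, str]]) -> Callable[[Applicant], str]:
    """Fit min-max scaling on the training rows and return a 1-nearest-neighbour predictor."""
    columns = list(zip(*(_features(a) for a, _ in rows)))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]

    def scale(values: List[float]) -> List[float]:
        # Rescale each feature over its training range so income does not dominate the distance.
        return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(values, mins, maxs)]

    examples = [(scale(_features(a)), label) for a, label in rows]

    def predict(applicant: Applicant) -> str:
        # Return the label of the closest historical loan (Euclidean distance on scaled features).
        target = scale(_features(applicant))
        _, label = min(examples, key=lambda ex: math.dist(ex[0], target))
        return label

    return predict

model = train(HISTORICAL_LOANS)
print(model(Applicant(900, 55, 1_200_000)))  # James X, the new applicant; prints Approve
```

James X falls outside the range of the training data (his income is higher than that of any historical loan), so his scaled features exceed 1. A real loan model would need far more data and a proper evaluation before it is used.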
Deep Learning – Imaginary celebrities
https://arxiv.org/abs/1710.10196
Data
https://unsplash.com/photos/vpR0oc4X8Mk
Data Scientists vs. Developers
            Data Scientists    Developers
Data        Static data        Dynamic databases
Code        Python, R          JavaScript
Platform    Notebooks          Text editors
Design      Models             Web apps
https://unsplash.com/photos/AtgRjx271ks
Data Science is a Team Sport
Extract data: Data Engineer
Prepare data: Data Scientist
Build & train models: Data Scientist
Evaluate: Business Analyst
Deploy: DevOps
Use models: Developer
Monitor: DevOps
Data Scientist: prepare data; build & train models
Software Developer: use models
LocalCart
https://github.com/ibm-watson-data-lab/localcart-at-think-conf
The LocalCart Project
Customer behaviour information, such as demographics and shopping cart values
A recommendation engine to encourage additional purchases based on past buying behaviour
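The slides do not say how the LocalCart recommendation engine works. A simple way to base recommendations on past buying behaviour is item co-occurrence ("customers who bought X also bought Y"). The sketch below assumes that approach and uses hypothetical shopping baskets. It is not the project's actual model.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, List

# Hypothetical past baskets, one list of items per purchase.
BASKETS = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
]

def build_cooccurrence(baskets: List[List[str]]) -> Dict[str, Counter]:
    """For each item, count how often every other item was bought in the same basket."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):  # ordered pairs of distinct items
            counts[a][b] += 1
    return counts

def recommend(counts: Dict[str, Counter], cart: List[str], k: int = 3) -> List[str]:
    """Sum co-occurrence counts over the cart and return the top-k items not already in it."""
    scores: Counter = Counter()
    for item in cart:
        scores.update(counts.get(item, Counter()))
    for item in cart:
        scores.pop(item, None)
    return [item for item, _ in scores.most_common(k)]

counts = build_cooccurrence(BASKETS)
print(recommend(counts, ["bread"], k=1))  # prints ['butter']
```

Items with equal scores come back in no guaranteed order. A production engine would also need to handle ties, popularity bias, and new customers with no history.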
Extract & Prepare Data
Explore Data
Jupyter Notebooks
notebook.ipynb
Jupyter Notebooks
For Data Scientists and Developers
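A notebook.ipynb file is a JSON document in the nbformat 4 layout, so a developer can also read it programmatically outside Jupyter. The minimal sketch below assumes a tiny hypothetical notebook and uses only the standard library to pull out the code cells. In practice, the official `nbformat` package (`nbformat.read`) is the more robust choice.

```python
import json
from typing import List

# A tiny hypothetical notebook in nbformat 4 JSON, the format stored in .ipynb files.
SAMPLE_NOTEBOOK = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Explore customer data\n"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None, "outputs": [],
         "source": ["ages = [23, 27, 34]\n", "print(sum(ages) / len(ages))"]},
    ],
})

def code_cells(notebook_json: str) -> List[str]:
    """Return the source of every code cell; a cell's `source` may be a string or a list of lines."""
    notebook = json.loads(notebook_json)
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            sources.append("".join(source) if isinstance(source, list) else source)
    return sources

for src in code_cells(SAMPLE_NOTEBOOK):
    print(src)
```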
https://ibm-watson-data-lab.github.io/pixiedust/
PixieDust: open-source Python library for Jupyter notebooks to load and visualize data
PixieDebugger: first visual debugging tool for Jupyter notebooks
PixieApps: create dashboards in a notebook
PixieGateway: run charts or PixieApps as standalone web applications
Display data as a table
Display data as a chart
Customers by age in a histogram
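The charts in the talk appear to come from PixieDust's interactive display. The binning behind an age histogram can also be sketched in plain Python. The ages and the 10-year bin width below are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

def age_histogram(ages: Iterable[int], bin_width: int = 10) -> Dict[str, int]:
    """Count customers per age bin (e.g. "20-29"), including empty bins between the extremes."""
    counts = Counter((age // bin_width) * bin_width for age in ages)
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    return {
        f"{lo}-{lo + bin_width - 1}": counts[lo]  # Counter returns 0 for empty bins
        for lo in range(first, last + 1, bin_width)
    }

HYPOTHETICAL_AGES = [23, 27, 34, 38, 39, 45, 61]

if __name__ == "__main__":
    # Crude text rendering of the histogram, one bar per bin.
    for label, n in age_histogram(HYPOTHETICAL_AGES).items():
        print(f"{label:>6} {'#' * n}")
```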
Customers clustered by gender
Customers by state in a map
Combine customer data with Census open data of income by zip code
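Combining customer records with Census income data is a join on the ZIP code. The minimal sketch below assumes hypothetical field names (`zip`, `income`) and 5-digit US ZIP codes. It keeps every customer (a left join) and normalizes ZIP codes to strings, because parsing them as numbers drops leading zeros.

```python
from typing import Dict, List, Union

def normalize_zip(zip_code: Union[str, int]) -> str:
    """Keep ZIP codes as 5-character strings; reading them as numbers drops leading zeros."""
    return str(zip_code).strip().zfill(5)

def enrich_with_income(customers: List[dict], income_by_zip: Dict[str, float]) -> List[dict]:
    """Left join: every customer is kept, with income None when the ZIP code has no Census match."""
    lookup = {normalize_zip(z): income for z, income in income_by_zip.items()}
    enriched = []
    for customer in customers:
        row = dict(customer)  # copy so the input records are not mutated
        row["income"] = lookup.get(normalize_zip(customer["zip"]))
        enriched.append(row)
    return enriched

# Hypothetical customers and Census figures; 2139 was read as a number and lost its leading zero.
CUSTOMERS = [{"id": 1, "zip": 2139}, {"id": 2, "zip": "94105"}, {"id": 3, "zip": "99999"}]
INCOME_BY_ZIP = {"02139": 70000, "94105": 120000}

for row in enrich_with_income(CUSTOMERS, INCOME_BY_ZIP):
    print(row)
```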
Build, train and deploy a model
1. Prepare data in notebook
2. Build and train model in notebook
3. Deploy to Watson Machine Learning
4. Test from anywhere
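The talk deploys the model to Watson Machine Learning and then tests it from anywhere. That service's API is not used here. As a hedged stand-in for the four steps, the sketch below prepares data, fits a hypothetical age-band model, serializes it to JSON as the deployable artifact, and loads it again for scoring.

```python
import json
from statistics import mean
from typing import Dict, List, Optional, Tuple

class MeanCartValueModel:
    """Hypothetical model: predicts cart value as the mean cart value of the customer's age band."""

    def __init__(self, band_width: int = 10) -> None:
        self.band_width = band_width
        self.band_means: Dict[int, float] = {}
        self.overall_mean = 0.0

    def fit(self, rows: List[Tuple[int, float]]) -> "MeanCartValueModel":
        by_band: Dict[int, List[float]] = {}
        for age, value in rows:
            by_band.setdefault(age // self.band_width, []).append(value)
        self.band_means = {band: mean(values) for band, values in by_band.items()}
        self.overall_mean = mean([value for _, value in rows])
        return self

    def predict(self, age: int) -> float:
        # Unseen age bands fall back to the overall mean.
        return self.band_means.get(age // self.band_width, self.overall_mean)

    def to_json(self) -> str:
        # JSON object keys must be strings, so band numbers are converted.
        return json.dumps({
            "band_width": self.band_width,
            "band_means": {str(band): m for band, m in self.band_means.items()},
            "overall_mean": self.overall_mean,
        })

    @classmethod
    def from_json(cls, payload: str) -> "MeanCartValueModel":
        data = json.loads(payload)
        model = cls(data["band_width"])
        model.band_means = {int(band): m for band, m in data["band_means"].items()}
        model.overall_mean = data["overall_mean"]
        return model

def prepare(raw_rows: List[Tuple[Optional[int], float]]) -> List[Tuple[int, float]]:
    """Step 1: drop records with a missing age."""
    return [(age, value) for age, value in raw_rows if age is not None]

# Hypothetical raw records: (age, cart value)
RAW_ROWS = [(23, 40.0), (27, 60.0), (34, 100.0), (None, 80.0), (38, 120.0)]

model = MeanCartValueModel().fit(prepare(RAW_ROWS))  # steps 1-2: prepare, build and train
artifact = model.to_json()                            # step 3 stand-in: the deployable artifact
served = MeanCartValueModel.from_json(artifact)       # step 4 stand-in: load elsewhere and score
print(served.predict(25))  # prints 50.0
```

This sketch uses JSON rather than `pickle` for the artifact because loading a pickle runs code, so a pickle should only ever be loaded from a trusted source.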
Test interactively with a PixieApp
PixieDebugger
PixieApp example – Weather Forecast
Continuous Learning
https://medium.com/ibm-watson-data-lab/keeping-your-machine-learning-models-up-to-date-f1ead546591b
1. Load training data
2. Build model
3. Deploy model
4. Feedback data
5. Feedback evaluation
6. If accuracy is too low, retrain the model with all data
7. Deploy the new model if accuracy improved
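The seven steps can be sketched as one round of a loop. This is a minimal illustration under stated assumptions, not the implementation from the linked post. The model is a hypothetical one-feature threshold rule, the 0.8 accuracy bar is an arbitrary choice, and models are compared on the feedback data.

```python
from typing import List, Tuple

Example = Tuple[float, bool]  # (feature value, label)

def accuracy(threshold: float, data: List[Example]) -> float:
    """Fraction of examples where the rule `x >= threshold` matches the label."""
    if not data:
        return 0.0
    return sum((x >= threshold) == label for x, label in data) / len(data)

def train(data: List[Example]) -> float:
    """Hypothetical model: the observed value that works best as a threshold (data must be non-empty)."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))

def continuous_learning_step(deployed: float, training_data: List[Example],
                             feedback_data: List[Example],
                             min_accuracy: float = 0.8) -> Tuple[float, List[Example]]:
    """Steps 4-7: returns (model to serve, training data for the next round)."""
    current = accuracy(deployed, feedback_data)        # 5. feedback evaluation
    if current >= min_accuracy:
        return deployed, training_data                 # still good enough: keep serving
    all_data = training_data + feedback_data           # 6. retrain with all data
    candidate = train(all_data)
    if accuracy(candidate, feedback_data) > current:   # 7. deploy only if accuracy improved
        return candidate, all_data
    return deployed, all_data

training = [(10, False), (20, False), (30, True), (40, True)]  # 1. load training data
model = train(training)                                         # 2-3. build and deploy
feedback = [(32, False), (34, False), (36, True), (50, True)]   # 4. feedback data
model, training = continuous_learning_step(model, training, feedback)
print(model)
```

The feedback data is also used for retraining, so the comparison in step 7 is optimistic. Holding back part of the feedback as a test set would give a fairer check.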
The Convergence of Data Science and Software Development
Summary
Data scientists use a Python notebook to load, enrich, and analyze data, and to create analytics
From the same notebook, developers create a PixieApp to operationalize these analytics
Developers publish the PixieApp as a web application
Line-of-business users can view the PixieApp interactively, with no need to access the notebook
Data scientists and developers both work in the same cloud
Benefits of bringing the right tools into the data science workflow
ONE GOAL: develop data-driven applications
Data science is maturing: NOW is the time for integration with the software development workflow
• Competitive advantage
• Discover new insights
• Real-time decision making
• Reduce complexity and lower cost
• Accelerate time to market and deployment of data science and analytics
Thank you!
Margriet Groenendijk
mgroenen@uk.ibm.com
@MargrietGr
https://www.linkedin.com/in/margrietgroenendijk/
Slides: https://www.slideshare.net/MargrietGroenendijk/presentations
Blog: https://medium.com/ibm-watson-data-lab
PixieDust: https://ibm-watson-data-lab.github.io/pixiedust/
IBM Watson Studio: https://www.ibm.com/cloud/watson-studio
IBM Cloud: https://ibm.com/cloud
