Continuous
Intelligence
Applying Continuous Delivery for
Machine Learning
Christoph Windheuser
Global Head of Artificial Intelligence
ThoughtWorks Inc.
Munich, May 8, 2019
©ThoughtWorks 2019
5000+ technologists with 40 offices in 14 countries
Partner for technology driven business transformation
join.thoughtworks.com
#1 in Agile and Continuous Delivery
100+ books written
TECHNIQUES
#8 Continuous delivery for machine learning (CD4ML) models
TRIAL
CONTINUOUS INTELLIGENCE
CONTINUOUS INTELLIGENCE CYCLE
PRODUCTIONIZING ML IS HARD
● High number of changing artifacts
● Size and portability of the artifacts
● Different skills and working processes across the workforce, with a “throw it over the fence” attitude
● Serial and parallel processing
● Models must be continuously monitored and improved
DIFFERENT ARCHETYPES: MANY SOURCES OF CHANGE
Data + Model + Code
● Data: Schema, Sampling over Time, Volume, ...
● Model: Research, Experiments, Training on Data, Accuracy, ...
● Code: New Features, Bug Fixes, Performance, ...
Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project
CONTINUOUS INTEGRATION / CONTINUOUS DELIVERY (CI / CD)
“Continuous Delivery is the ability to get changes of
all types — including new features, configuration
changes, bug fixes and experiments — into
production, or into the hands of users, safely and
quickly in a sustainable way.”
- Jez Humble & Dave Farley
PRINCIPLES OF CONTINUOUS DELIVERY
→ Create a Repeatable, Reliable Process for Releasing Software
→ Automate Almost Everything
→ Build Quality In
→ Work in Small Batches
→ Keep Everything in Source Control
→ Done Means “Released”
→ Improve Continuously
CD4ML
PUTTING EVERYTHING TOGETHER
Stages: Data Science, Model Building → Model Evaluation → Productionize Model → Integration Testing → Deployment → Monitoring
Data and artifacts: Training Data, Source Code + Executables, Test Data, Model + parameters, Production Data
Foundations: CD Tools and Repositories, Discoverable and Accessible Data
WHAT DO WE NEED IN OUR STACK?
Doing CD with Machine Learning is still a hard problem
● Model performance assessment
● Version control and artifact repositories
● Monitoring and observability
● Discoverable and accessible data
● Continuous delivery orchestration to combine pipelines
● Infrastructure for multiple environments and experiments
REAL WORLD EXAMPLE
There are many options for tools and technologies to implement CD4ML
MACHINE LEARNING PIPELINE
BASIC DATA SCIENCE WORKFLOW
Gather data and extract features → Separate into training and validation sets → Train model and evaluate performance
SALES FORECAST MODEL TRAINING PROCESS
download_data.py → Data
Data → splitter.py → Training Data, Validation Data
Training Data → decision_tree.py → model.pkl
model.pkl, Validation Data → evaluation.py → metrics.json
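As a rough illustration of the training process above, the sketch below chains the four steps in plain Python. Only the script and file names come from the slide. The synthetic data, the one-split "stump" standing in for a decision tree, the function names and the error metric are all assumptions, chosen so the example runs with the standard library alone.

```python
import json
import pickle
import random
import statistics
import tempfile
from pathlib import Path


def download_data(n_rows=200, seed=0):
    """Stand-in for download_data.py: returns synthetic (promo, sales) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        promo = rng.randint(0, 1)  # hypothetical feature: promotion on/off
        sales = 100 + 50 * promo + rng.gauss(0, 5)
        rows.append((promo, sales))
    return rows


def split(rows, validation_fraction=0.25, seed=0):
    """Stand-in for splitter.py: shuffles, then splits into training/validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_rows, model_path):
    """Stand-in for decision_tree.py: fits a one-split stump and pickles it."""
    model = {
        promo: statistics.mean(s for p, s in training_rows if p == promo)
        for promo in (0, 1)
    }
    model_path.write_bytes(pickle.dumps(model))


def evaluate(model_path, validation_rows, metrics_path):
    """Stand-in for evaluation.py: writes the mean absolute error to a JSON file."""
    model = pickle.loads(model_path.read_bytes())
    errors = [abs(model[p] - s) for p, s in validation_rows]
    metrics = {"mean_absolute_error": statistics.mean(errors)}
    metrics_path.write_text(json.dumps(metrics))
    return metrics


def run_pipeline(workdir):
    """Run all steps in order, leaving model.pkl and metrics.json in workdir."""
    workdir = Path(workdir)
    training, validation = split(download_data())
    train(training, workdir / "model.pkl")
    return evaluate(workdir / "model.pkl", validation, workdir / "metrics.json")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(tmp))
```

Each step reads the previous step's output, so the order matters and a change to any step invalidates everything downstream of it.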
CHALLENGE 1: THESE ARE LARGE FILES!
CHALLENGE 2: AD-HOC MULTI-STEP PROCESS
SOLUTION: dvc
data science version control
● dvc is git porcelain for storing large files using cloud storage
● dvc connects model training steps to create reproducible workflows
[Diagram: git branches master, change-max-depth, try-random-forest; files model.pkl, decision_tree.py, model.pkl.dvc]
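The idea behind pairing a large file such as model.pkl with a small tracked file can be sketched in a few lines: the large file goes into content-addressed storage, and only a small pointer file holding its hash is committed to git. This is a simplified illustration, not dvc's actual file format or API. The `store_large_file` and `restore_large_file` helpers, the `.pointer` suffix and the local directory standing in for cloud storage are all assumptions.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def store_large_file(path, storage_dir):
    """Copy `path` into content-addressed storage and write a small pointer file."""
    path, storage_dir = Path(path), Path(storage_dir)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    storage_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(path, storage_dir / digest)
    # The pointer is tiny, so it can be versioned in git next to the code.
    pointer = path.with_name(path.name + ".pointer")
    pointer.write_text(json.dumps({"sha256": digest, "path": path.name}))
    return pointer


def restore_large_file(pointer, storage_dir):
    """Recreate the large file next to its pointer from the stored copy."""
    pointer = Path(pointer)
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copyfile(Path(storage_dir) / meta["sha256"], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp) / "workspace"
        workspace.mkdir()
        model = workspace / "model.pkl"
        model.write_bytes(b"pretend this is a large trained model")
        pointer = store_large_file(model, Path(tmp) / "remote")
        model.unlink()  # e.g. a fresh checkout that only contains the pointer
        restored = restore_large_file(pointer, Path(tmp) / "remote")
        print(pointer.name, "->", restored.name)
```

Because the pointer changes whenever the file content changes, switching git branches also selects the matching version of the large file.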
HOW DO WE TRACK EXPERIMENTS?
We need to track the scientific process and evaluate our models:
● Which experiments and hypotheses are being explored?
● Which algorithms are being used in each experiment?
● Which version of the code was used?
● How long does it take to run each experiment?
● Which parameters and hyperparameters were used?
● How fast are my models learning?
● How do we compare results from different runs?
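One lightweight way to answer several of these questions is to append one record per run to a log file. The sketch below is not the API of any particular tracking tool. The `track_run`, `load_runs` and `best_run` helpers, the record fields and the JSON Lines file are assumptions chosen for illustration.

```python
import json
import tempfile
import time
from pathlib import Path


def track_run(log_path, experiment, algorithm, code_version, params, train_fn):
    """Run `train_fn(params)`, time it, and append one JSON record for the run."""
    started = time.perf_counter()
    metrics = train_fn(params)
    record = {
        "experiment": experiment,
        "algorithm": algorithm,
        "code_version": code_version,  # e.g. a git commit hash
        "params": params,
        "metrics": metrics,
        "duration_seconds": time.perf_counter() - started,
    }
    with Path(log_path).open("a") as log:
        log.write(json.dumps(record) + "\n")
    return record


def load_runs(log_path):
    """Read all recorded runs back so they can be compared."""
    return [json.loads(line) for line in Path(log_path).read_text().splitlines()]


def best_run(runs, metric):
    """Pick the run with the lowest value of `metric`."""
    return min(runs, key=lambda run: run["metrics"][metric])


if __name__ == "__main__":
    def fake_training(params):
        # Hypothetical stand-in for real training: deeper trees score better here.
        return {"validation_error": 1.0 / params["max_depth"]}

    with tempfile.TemporaryDirectory() as tmp:
        log_file = Path(tmp) / "runs.jsonl"
        for depth in (2, 4, 8):
            track_run(log_file, "change-max-depth", "decision_tree",
                      "abc123", {"max_depth": depth}, fake_training)
        print(best_run(load_runs(log_file), "validation_error")["params"])
```

Recording the code version alongside parameters and metrics is what makes a run reproducible later.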
An Open Source platform for managing the end-to-end machine learning lifecycle
DEPLOYMENT PIPELINE
DEPLOYMENT PIPELINE
Automates the process of building, testing, and deploying applications to production
Application code in version control repository → Container image as deployment artifact → Deploy container to production servers
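The core behaviour of such a pipeline, running stages in order and stopping at the first failure, can be sketched as follows. The stage names and the `run_stages` helper are hypothetical. A real Continuous Delivery server adds triggers, artifact handling, environments and much more.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> List[Tuple[str, str]]:
    """Run stages in order; once one fails, the remaining stages are skipped."""
    results = []
    failed = False
    for name, action in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = action()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results


if __name__ == "__main__":
    # Hypothetical stages mirroring the slide: build an image, test it, deploy it.
    pipeline = [
        ("build container image", lambda: True),
        ("run tests", lambda: True),
        ("deploy to production", lambda: True),
    ]
    for name, status in run_stages(pipeline):
        print(f"{name}: {status}")
```

Skipping later stages after a failure is what keeps a broken build from reaching production.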
An Open Source Continuous Delivery server to model and visualise complex workflows
ANATOMY OF A GOCD PIPELINE
[Diagram: Pipeline Group]
MODEL MONITORING
HOW TO LEARN CONTINUOUSLY?
We need to capture production data to improve our models:
● Track model usage
● Track model inputs
● Track model outputs to identify potential bias or overfitting
● Track model fairness to understand how it behaves against dimensions that could introduce unfair bias
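A common way to capture this data is to emit one structured log event per prediction, which a log collector can then forward to a search index. The sketch below uses only Python's standard `logging` module. The `make_prediction_logger` and `log_prediction` helpers, the field names and the example values are assumptions.

```python
import io
import json
import logging
from datetime import datetime, timezone


def make_prediction_logger(stream):
    """Create a logger that writes one JSON document per line to `stream`."""
    logger = logging.getLogger("model_monitoring_example")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate output if called more than once
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_prediction(logger, model_version, inputs, output):
    """Record which model answered, what it was asked, and what it returned."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "inputs": inputs,
        "output": output,
    }
    logger.info(json.dumps(event))
    return event


if __name__ == "__main__":
    buffer = io.StringIO()
    logger = make_prediction_logger(buffer)
    log_prediction(logger, "2019-05-08.1", {"store": 12, "promo": 1}, 151.3)
    print(buffer.getvalue(), end="")
```

Logging the model version with every event makes it possible to compare the behaviour of different model versions on live traffic.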
EFK STACK
Monitoring and Observability infrastructure
● Fluentd: Open Source data collector for unified logging
● Elasticsearch: Open Source Search Engine
● Kibana: Open Source web UI to explore and visualise data
An Open Source UI that makes it easy to explore and visualise the data indexed in Elasticsearch
REAL WORLD PROJECT EXAMPLES
PRICE ESTIMATION
Chatbot Platform
Workshop at the Strata Conference in London (April 30, 2019)
• 3h hands-on workshop
• Run a real-world scenario on participants' laptops
• Cloud infrastructure (GCP, Kubernetes, GoCD Server) was prepared
SUMMARY - WHAT HAVE WE LEARNED?
CD4ML
● Proper data/model/code versioning tools enable reproducible work to be done in parallel.
● We can put data science work into a Continuous Delivery (CD) workflow.
● Result: Continuous, on-demand AI development and deployment, from research to production, with a single command.
● Benefit: production AI systems that are always as smart as your data science team.
THANK YOU!
Christoph Windheuser
Global Head of Artificial Intelligence
ThoughtWorks Inc.
(cwindheu@thoughtworks.com)

CD4ML - ThoughtWorks MeetUp Munich Christoph Windheuser May 8th 2019

  • 1. 1 Continuous Intelligence Applying Continuous Delivery for Machine Learning Christoph Windheuser Global Head of Artificial Intelligence ThoughtWorks Inc. Munich, May 8, 2019 ©ThoughtWorks 2019
  • 2. 5000+ technologists with 40 offices in 14 countries. Partner for technology-driven business transformation. join.thoughtworks.com
  • 3. #1 in Agile and Continuous Delivery. 100+ books written.
  • 5. TECHNIQUES: Continuous delivery for machine learning (CD4ML) models. #8, TRIAL.
  • 7. CONTINUOUS INTELLIGENCE CYCLE (figure marked ©ThoughtWorks 2018, Commercial in Confidence)
  • 8. PRODUCTIONIZING ML IS HARD ● High number of changing artifacts ● Size and portability of the artifacts ● Different skills and working processes in the workforce, with a “throw over the fence” attitude ● Serial and parallel processing ● Models must be continuously monitored and improved
  • 9. DIFFERENT ARCHETYPES: MANY SOURCES OF CHANGE. Data (Schema, Sampling over Time, Volume, ...) + Model (Research, Experiments, Training on Data, Accuracy, ...) + Code (New Features, Bug Fixes, Performance, ...). Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project.
  • 10. CONTINUOUS INTEGRATION / CONTINUOUS DELIVERY
  • 11. CI / CD
  • 12. “Continuous Delivery is the ability to get changes of all types — including new features, configuration changes, bug fixes and experiments — into production, or into the hands of users, safely and quickly in a sustainable way.” - Jez Humble & Dave Farley 12 ©ThoughtWorks 2019
  • 13. PRINCIPLES OF CONTINUOUS DELIVERY → Create a Repeatable, Reliable Process for Releasing Software → Automate Almost Everything → Build Quality In → Work in Small Batches → Keep Everything in Source Control → Done Means “Released” → Improve Continuously
  • 15. PUTTING EVERYTHING TOGETHER (diagram labels: Data Science, Model Building; Training Data; Source Code + Executables; Model Evaluation; Productionize Model; Integration Testing; Deployment; Test Data; Model + parameters; CD Tools and Repositories; Discoverable and Accessible Data; Monitoring; Production Data)
  • 16. WHAT DO WE NEED IN OUR STACK? Doing CD with Machine Learning is still a hard problem. ● DISCOVERABLE AND ACCESSIBLE DATA ● VERSION CONTROL AND ARTIFACT REPOSITORIES ● CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES ● INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS ● MODEL PERFORMANCE ASSESSMENT ● MONITORING AND OBSERVABILITY
  • 17. REAL WORLD EXAMPLE. There are many options for tools and technologies to implement CD4ML.
  • 19. BASIC DATA SCIENCE WORKFLOW: (1) Gather data and extract features (2) Separate into training and validation sets (3) Train model and evaluate performance
  • 20. SALES FORECAST MODEL TRAINING PROCESS: download_data.py → Data → splitter.py → Training Data, Validation Data → decision_tree.py → model.pkl → evaluation.py → metrics.json
  • 21. CHALLENGE 1: THESE ARE LARGE FILES! (same training process diagram as slide 20)
  • 22. CHALLENGE 2: AD-HOC MULTI-STEP PROCESS (same training process diagram as slide 20)
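
The multi-step process on slides 19 to 22 can be sketched as one scripted pipeline, so the steps no longer have to be run by hand. This is only an illustration: the function names mirror the script names on the slide, but their bodies are assumptions. The data is synthetic, and the "model" is a mean baseline standing in for the decision tree, which the slides do not show.

```python
import json
import pickle
import random
import tempfile
from pathlib import Path


def download_data(n_rows=100, seed=0):
    """Stand-in for download_data.py: synthesise (feature, sales) rows."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + rng.uniform(-1.0, 1.0)) for x in range(n_rows)]


def split(rows, validation_fraction=0.2, seed=0):
    """Stand-in for splitter.py: shuffle, then cut into train/validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


def train(train_rows):
    """Stand-in for decision_tree.py: a mean predictor, NOT a decision tree."""
    mean_sales = sum(y for _, y in train_rows) / len(train_rows)
    return {"kind": "mean_baseline", "prediction": mean_sales}


def evaluate(model, validation_rows):
    """Stand-in for evaluation.py: mean absolute error on the validation set."""
    errors = [abs(model["prediction"] - y) for _, y in validation_rows]
    return {"mae": sum(errors) / len(errors), "n_validation": len(validation_rows)}


def run_pipeline(workdir):
    """Run every step in order and write model.pkl and metrics.json."""
    workdir = Path(workdir)
    train_rows, validation_rows = split(download_data())
    model = train(train_rows)
    (workdir / "model.pkl").write_bytes(pickle.dumps(model))
    metrics = evaluate(model, validation_rows)
    (workdir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return metrics


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(tmp))
```

Keeping each step as its own function with explicit inputs and outputs is what lets a tool later connect the steps and rerun only the ones whose inputs changed.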
  • 23. SOLUTION: dvc (data science version control) ● dvc is git porcelain for storing large files using cloud storage ● dvc connects model training steps to create reproducible workflows (diagram labels: master, change-max-depth, try-random-forest; model.pkl, decision_tree.py, model.pkl.dvc)
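
The idea behind keeping a small pointer file such as model.pkl.dvc in git while the large file lives elsewhere can be illustrated with a content-addressed cache. The sketch below is not dvc's implementation or file format: the `.pointer.json` suffix, the SHA-256 hash, the local cache directory (standing in for cloud storage), and both function names are assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def add_large_file(path, cache_dir):
    """Copy the file into a content-addressed cache and write a small pointer."""
    path, cache_dir = Path(path), Path(cache_dir)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(path, cache_dir / digest)
    # The pointer is tiny, so it can be committed to version control.
    pointer = path.with_name(path.name + ".pointer.json")
    pointer.write_text(json.dumps({"path": path.name, "sha256": digest}))
    return pointer


def checkout(pointer, cache_dir):
    """Restore the large file next to its pointer from the cache."""
    pointer = Path(pointer)
    meta = json.loads(pointer.read_text())
    target = pointer.with_name(meta["path"])
    shutil.copyfile(Path(cache_dir) / meta["sha256"], target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model.pkl"
        model.write_bytes(b"pretend this is a large model")
        pointer = add_large_file(model, Path(tmp) / "cache")
        model.unlink()  # simulate a fresh clone without the large file
        print(checkout(pointer, Path(tmp) / "cache").read_bytes())
```

Because the pointer records a hash of the content, each git branch can reference a different version of the model without the repository ever storing the binary itself.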
  • 24. HOW DO WE TRACK EXPERIMENTS? We need to track the scientific process and evaluate our models: ● Which experiments and hypotheses are being explored? ● Which algorithms are being used in each experiment? ● Which version of the code was used? ● How long does it take to run each experiment? ● What parameters and hyperparameters were used? ● How fast are my models learning? ● How do we compare results from different runs?
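
The questions on slide 24 map onto a small record per run: experiment name, algorithm, code version, parameters, duration and metrics. The following is a minimal in-memory sketch of such a record, not the API of any tracking tool; `Run`, `Tracker`, `fake_train` and the example values are all hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Run:
    experiment: str      # which hypothesis is being explored
    algorithm: str       # which algorithm was used
    code_version: str    # e.g. a git commit id, supplied by the caller
    params: dict         # parameters and hyperparameters
    metrics: dict = field(default_factory=dict)
    duration_s: float = 0.0


class Tracker:
    """Keeps every run so results can be compared afterwards."""

    def __init__(self):
        self.runs = []

    def track(self, experiment, algorithm, code_version, params, train_fn):
        start = time.perf_counter()
        metrics = train_fn(**params)  # train_fn returns a dict of metrics
        duration = time.perf_counter() - start
        run = Run(experiment, algorithm, code_version, params, metrics, duration)
        self.runs.append(run)
        return run

    def best(self, metric, lower_is_better=True):
        pick = min if lower_is_better else max
        return pick(self.runs, key=lambda run: run.metrics[metric])

    def to_jsonl(self):
        return "\n".join(json.dumps(asdict(run), sort_keys=True) for run in self.runs)


if __name__ == "__main__":
    def fake_train(max_depth):
        # Hypothetical training function: pretends deeper trees score better.
        return {"mae": 10.0 / max_depth}

    tracker = Tracker()
    for depth in (2, 5):
        tracker.track("change-max-depth", "decision_tree", "abc123",
                      {"max_depth": depth}, fake_train)
    print(tracker.best("mae").params)
```

Writing the runs out as one JSON object per line keeps them easy to diff, archive alongside the code, or load into another tool for comparison.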
  • 25. An open source platform for managing the end-to-end machine learning lifecycle
  • 27. DEPLOYMENT PIPELINE: automates the process of building, testing, and deploying applications to production. Application code in version control repository → Container image as deployment artifact → Deploy container to production servers
  • 28. An open source Continuous Delivery server to model and visualise complex workflows
  • 29. ANATOMY OF A GOCD PIPELINE (diagram label: Pipeline Group)
  • 31. HOW TO LEARN CONTINUOUSLY? We need to capture production data to improve our models: ● Track model usage ● Track model inputs ● Track model outputs to identify potential bias or overfit ● Track model fairness to understand how it behaves against dimensions that could introduce unfair bias
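
Tracking model usage, inputs and outputs can start with one structured log record per prediction, which a log collector can then ship to a search index. The sketch below uses only Python's standard `logging` module; the logger name, the record fields and the `predict_and_log` helper are assumptions, not part of the talk.

```python
import io
import json
import logging
import time


def make_prediction_logger(stream):
    """Return a logger that writes one bare JSON line per record to `stream`."""
    logger = logging.getLogger("model_predictions")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def predict_and_log(model_fn, model_version, features, logger):
    """Call the model and record its inputs and output for later analysis."""
    prediction = model_fn(features)
    logger.info(json.dumps({
        "timestamp": time.time(),
        "model_version": model_version,
        "inputs": features,
        "output": prediction,
    }, sort_keys=True))
    return prediction


if __name__ == "__main__":
    buffer = io.StringIO()
    logger = make_prediction_logger(buffer)
    # Hypothetical model: forecasts sales as twice the price feature.
    predict_and_log(lambda f: f["price"] * 2, "v1", {"price": 3}, logger)
    print(buffer.getvalue())
```

Logging the model version with every record is what makes it possible to compare behaviour across deployments once the data has been collected.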
  • 32. EFK STACK: Monitoring and Observability infrastructure. ● Open source data collector for unified logging ● Open source search engine ● Open source web UI to explore and visualise data
  • 33. An open source UI that makes it easy to explore and visualise the data indexed in Elasticsearch
  • 34. REAL WORLD PROJECT EXAMPLES
  • 37. Workshop at the Strata Conference in London (April 30, 2019) ● 3h hands-on workshop ● Run a real-world scenario on participants' laptops ● Cloud infrastructure (GCP, Kubernetes, GoCD Server) was prepared
  • 38. SUMMARY - WHAT HAVE WE LEARNED?
  • 39. CD4ML ● Proper data/model/code versioning tools enable reproducible work to be done in parallel. ● We can put data science work into a Continuous Delivery (CD) workflow. ● Result: Continuous, on-demand AI development and deployment, from research to production, with a single command. ● Benefit: production AI systems that are always as smart as your data science team.
  • 40. THANK YOU! Christoph Windheuser, Global Head of Artificial Intelligence, ThoughtWorks Inc. (cwindheu@thoughtworks.com) join.thoughtworks.com