DATA SCIENCE,
DELIVERED CONTINUOUSLY
Arif Wider & Christian Deger
@arifwider @cdeger
Christian Deger
Chief Architect
cdeger@autoscout24.com
@cdeger
Dr. Arif Wider
Senior Consultant/Developer
awider@thoughtworks.com
@arifwider
[Map with country codes: PL, S, RUS, UA, RO, CZ, D, NL, B, F, A, HR, I, E, BG, TR]
18 countries
2.4m+ cars & motos
10m+ users per month
The task: A consumer-facing data product
GOTO Berlin 2017 Data Science, Delivered Continuously – A. Wider & C. Deger
The prediction model: Random forest
Car listings of
last two years
Volkswagen Golf
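As a rough illustration of the idea behind a random forest for price prediction (not the AutoScout24 model): several decision trees each produce an estimate, and the forest averages them. The trees, feature names, and prices below are invented for this sketch; a trained forest would learn its splits from the car listings.

```python
from statistics import mean

# Hypothetical hand-written "trees". A real forest learns many such
# trees from training data instead of having them written by hand.
def tree_a(car):
    return 14000 if car["mileage_km"] < 50000 else 9000

def tree_b(car):
    return 15000 if car["age_years"] < 4 else 8500

def tree_c(car):
    if car["mileage_km"] < 80000 and car["age_years"] < 6:
        return 12000
    return 8000

FOREST = [tree_a, tree_b, tree_c]

def predict_price(car, forest=FOREST):
    """Average the estimates of all trees (regression forest)."""
    return mean(tree(car) for tree in forest)

golf = {"make": "Volkswagen", "model": "Golf", "mileage_km": 60000, "age_years": 3}
print(predict_price(golf))
```

Averaging over many trees is what dampens the overfitting of any single tree.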
How to turn an R-based prediction model
into a high-performance web application?
→ Continuous Delivery!
Typical delivery pipeline
Application code in one repository per service.
CI: Deployment package as artifact.
CD: Deliver package to servers.
Continuous delivery pipelines
Prediction Model Pipeline
Web Application Pipeline
The price for CD: Extensive model validation
Lessons learned
Form a cross-functional team of
data scientists & software engineers!
Software engineers
… learn how data scientists work
… and understand the quirks of a prediction model
Data scientists
… learn about unit testing, stable interfaces, git, etc.
… get quick feedback about the impact of their work
→ Model and product iterations become much faster!
Lessons learned
Generating gigabytes of Java code
is a challenge for the JVM
Use the G1 garbage collector
Turn off Tiered Compilation
→ Do extensive warm-ups
Lessons learned – Warm up
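A language-neutral sketch of the warm-up idea, written in Python for brevity: the service only reports itself ready after the prediction function has been called a number of times with representative inputs. The class name, the call count, and the sample inputs are assumptions made for illustration; the slides do not show how the warm-up was implemented. On a HotSpot JVM, the two settings from the previous slide are usually expressed as the flags `-XX:+UseG1GC` and `-XX:-TieredCompilation`.

```python
import itertools

class WarmedUpService:
    """Wraps a prediction function and reports ready only after warm-up."""

    def __init__(self, predict, warmup_calls=1000):
        self._predict = predict
        self._warmup_calls = warmup_calls
        self.ready = False

    def warm_up(self, sample_inputs):
        # Cycle through representative inputs so the hot code paths run
        # repeatedly (on a JVM this gives the JIT time to compile them).
        samples = itertools.cycle(sample_inputs)
        for car in itertools.islice(samples, self._warmup_calls):
            self._predict(car)
        self.ready = True

    def handle(self, car):
        if not self.ready:
            raise RuntimeError("service is still warming up")
        return self._predict(car)

calls = []

def fake_predict(car):
    # Stand-in for the generated model code.
    calls.append(car)
    return 10000

service = WarmedUpService(fake_predict, warmup_calls=5)
service.warm_up([{"model": "Golf"}, {"model": "Polo"}])
print(service.ready, len(calls))
```

In a deployment, the `ready` flag would typically back a health check so that a load balancer sends traffic only to warmed-up instances.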
Lessons learned
The approach of applying Continuous Delivery to
Data Science is useful independently of the tech
• Successfully applied similarly to a Python- and
Spark-based project
• Even more useful when quick model evolution
is required because of rapidly changing inputs
(e.g. user interaction)
Conclusions
• Continuous Delivery allows us to bring prediction
model changes live very quickly.
• Only extensive automated end-to-end tests
provide confidence to deploy to production
automatically.
• Java code generation allows for very low response
times and excellent scalability for high loads but
requires plenty of memory.
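One way such an automated check could look, as a minimal sketch: a validation step scores a candidate model on held-out listings and fails the pipeline if the error is too high. The metric (median absolute percentage error), the 10% threshold, and the sample data are assumptions chosen for illustration; the slides do not state which checks were actually used.

```python
from statistics import median

def median_ape(model, listings):
    """Median absolute percentage error of `model` on (car, price) pairs."""
    return median(abs(model(car) - price) / price for car, price in listings)

def validate(model, listings, max_error=0.10):
    """Return True if the model may be deployed, False to fail the pipeline."""
    return median_ape(model, listings) <= max_error

# Hypothetical held-out listings: (features, observed price).
HOLDOUT = [
    ({"mileage_km": 20000}, 15000),
    ({"mileage_km": 60000}, 11000),
    ({"mileage_km": 120000}, 7000),
]

def candidate(car):
    # Stand-in for a freshly trained model.
    return 16000 - 0.075 * car["mileage_km"]

print(validate(candidate, HOLDOUT))
```

A pipeline stage would exit with a non-zero status when `validate` returns False, so a bad model never reaches production.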
Conclusions: Price evaluation everywhere
THANK YOU
QUESTIONS?
Arif Wider & Christian Deger
@arifwider @cdeger
Data Science, Delivered Continuously @ GOTO Berlin 2017

Editor's Notes

  1. A This is Christian. Christian is AutoScout24's chief architect, but he actually joined AutoScout as a mere developer and then made his way to his current role as a coding architect. At AutoScout we (TW) have worked a lot with Christian, and I think I can say that we've enjoyed each other's company quite a bit.
  2. C Arif is a developer at TW Germany, where Scala is his language of choice, particularly in the context of Big Data applications. Before joining TW, he was in academia, doing research on applying FP techniques to data synchronization.
  3. C AutoScout24 is the largest online car marketplace Europe-wide, with roughly 2.4 million listings on the platform, which means that they have a lot of data about how cars are sold.
  4. A AutoScout has a lot of data about how cars are sold and at what prices. - Our task was to turn all this data into something actually useful for the end user of the page. - So we set out to create a consumer-facing data product where users can quickly estimate the current value of their car. This works as follows… basic information about the car
  5. A Optionally indicate equipment and condition
  6. A You get a price range
  7. A - What we had when we started working on this was a prediction model, because that's what the data scientists at AutoScout had already built. The language they used for it was R, and the approach is called random forest. - Who of you has heard of random forest before? - Let's have a look at how this works: the data of the last two years is used to train a prediction model, and what you get out of training are many such decision trees. - RF is the algorithm that decides … and it is a technique to work against overfitting, i.e., producing a prediction model that only works on the training data.
  8. C - But our task was to turn this model into a high-performance web application. - And in fact, that is not so common yet in the context of data science, because such data is often only used for internal decision purposes. - But if you want to create a user-facing application, you have a very different situation, where you have to deal with load peaks etc. - That was also the reason why we ruled out running an R server in production pretty early. The problem is that R, at least in its open source version, does not support multi-threading; thus, scaling for many concurrent requests is extremely difficult.
  9. C - Traditional approach that we still see quite often: model is developed by data scientists in some language that suits their way of working best, e.g., R, and then in order to get a good performance, software engineers translate…
  10. C - However, with this manual approach, what do you do if the internal structure of the prediction model changes? If a software engineer has to reimplement these changes, it first of all takes a long time, and mistakes can also be introduced in that translation. For example, the model might change from a random forest to gradient boosted machines. For linear regression, reimplementing the model is not a big problem.
  11. A - We therefore looked at how we can automate this, and the technology that helped us with that was H2O. - Has anybody heard of H2O? - It's a Java-based analytics engine that can be programmed using R (which the data scientists liked) and, which was the important piece for us, provides the possibility to export your fully trained prediction model as Java source code. - This then allowed us to integrate this model generation into a continuous delivery pipeline.
  12. C Commit stage: Unit tests etc.
  13. C Additional database migration scripts.
  14. C Blue/green deployment on the instance.
  15. C This looks as follows: … - Then Java code is generated, actually in our case gigabytes of Java source code, which is then compiled into a JAR which is uploaded to AWS S3. This is the prediction model pipeline. - Now, whenever something is changed in the R-based configuration or, at least as important, when the model should be updated using the latest data from the platform, a new model JAR gets generated automatically and is deployed to S3.
  16. C - Now for the web application, which we implemented in Scala using the Play Framework, there is another CD pipeline. - This pipeline also generates a JAR, the application JAR, which is then deployed to AWS EC2. - Now, every time this application is deployed, the pipeline also pulls the latest prediction model from S3 and then loads both into the same JVM. - But also when the model is updated, this triggers a redeployment of the web application with the newest model. - This way all prediction model changes made by the data scientists go straight to production and users can immediately benefit.
  17. A - However, this only works if you have enough confidence to do so. - Therefore we built an extensive model validation workflow. - Let's start with how a model is usually trained and how the success of the training is evaluated. - You, that is the data scientist, divide the existing historical data into training data and test data, and those two sets need to be disjoint. Then the model is trained using the training data. - The test data is then used to create test estimations, and these results are compared with the actual price in the test data. - They will never be exactly the same, but they indicate how good the model is. - We want to validate how the model reacts to new data.
  18. A - Further down the pipeline we use these test estimation results for a comprehensive end-to-end model validation. - That means we check whether the JAR that was created by compiling the generated Java code gives us exactly the same results as directly asking the model that was created by the data scientists. - Furthermore, we also check whether this model fulfills all the expectations that our web application places on it. - This is called a consumer-driven contract test (CDC); the web application in this case is the consumer of the model. Only if all those things are green do we release to production.
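  The split-and-compare procedure described here can be sketched roughly as follows. This is an illustration only: the listing fields `id` and `price`, the 20% test fraction, the baseline "model" that always predicts the mean training price, and the choice of mean absolute percentage error as the metric are all assumptions made for the example, not details from the talk.

```python
import random

def split_disjoint(listings, test_fraction=0.2, seed=42):
    """Shuffle the listings and cut them into disjoint training and test sets."""
    shuffled = listings[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def mean_absolute_percentage_error(model, test_data):
    """Compare test estimations with the actual prices in the test data."""
    errors = [abs(model(car) - car["price"]) / car["price"] for car in test_data]
    return sum(errors) / len(errors)

# Made-up historical data for illustration.
listings = [{"id": i, "price": 10000 + 500 * i} for i in range(10)]
train, test = split_disjoint(listings)

# Stand-in for real training: always predict the mean training price.
baseline_price = sum(car["price"] for car in train) / len(train)
model = lambda car: baseline_price

error = mean_absolute_percentage_error(model, test)
```

  The error is computed only on listings the model has never seen, which is what makes it an indication of how the model reacts to new data.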
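  As a rough sketch of such a release gate, the two checks can be expressed as plain functions. Everything here is hypothetical: `exported_model` stands in for calling the compiled model JAR, `reference_estimates` for the test estimations produced by the data scientists' original model, and the contract ("every estimate is a positive price") is an invented example of an expectation the web application might have. The `tolerance` parameter is also an assumption; its default of 0.0 corresponds to the "exactly the same results" requirement.

```python
def matches_reference(predict, reference_estimates, tolerance=0.0):
    """Check that the exported model reproduces the reference estimates."""
    return all(
        abs(predict(car) - expected) <= tolerance
        for car, expected in reference_estimates
    )

def fulfills_contract(predict, sample_cars):
    """Hypothetical consumer contract: every estimate is a positive price."""
    return all(predict(car) > 0 for car in sample_cars)

def ready_for_release(predict, reference_estimates):
    """Release only if both the equivalence check and the contract are green."""
    sample_cars = [car for car, _ in reference_estimates]
    return (matches_reference(predict, reference_estimates)
            and fulfills_contract(predict, sample_cars))

# Illustrative stand-ins for the compiled model and the reference results.
exported_model = lambda car: 20000 - 1000 * car["age_years"]
reference_estimates = [({"age_years": 2}, 18000), ({"age_years": 5}, 15000)]

release = ready_for_release(exported_model, reference_estimates)
```

  Keeping the contract check separate from the equivalence check reflects that they guard against different failures: a faulty export versus a correct export of a model the web application cannot work with.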
  19. C
  20. A
  21. C The warmup time becomes especially problematic during an incident. Time to recover is drastically increased. You also need to configure your autoscaling to take the period of high load into account.
  22. A
  23. A
  24. C Added labels for fair price, good price and top price. More labels coming.
  25. On the listing itself
  26. With different categories and respective ranges
  27. And as filter criteria for search itself