Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Arif Wider & Christian Deger
DATA SCIENCE,
DELIVERED CONTINUOUSLY
A CONFERENCE ALL ABOUT TECHNOLOGY
Christian Deger
Chief Architect
cdeger@autoscout24.com
@cdeger
Arif Wider
Senior Consultant/Developer
awider@thoughtworks.com
@arifwider
PL
S
RUS
UA
RO
CZ
D
NL
B
F
A
HR
I
E
BG
TR
18countries
2.4m+cars & motos
10m+users per
month
The task: A consumer-facing data product
6XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C...
The task: A consumer-facing data product
7XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C...
The task: A consumer-facing data product
8XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C...
The prediction model: Random forest
9
Volkswagen GolfCar listings of
last two years
XConf ’17 Hamburg/Manchester Data Scie...
How to turn an R-based prediction model
into a high-performance web application?
10
?
XConf ’17 Hamburg/Manchester Data Sc...
How to turn an R-based prediction model
into a high-performance web application?
11XConf ’17 Hamburg/Manchester Data Scien...
How to turn an R-based prediction model
into a high-performance web application?
12XConf ’17 Hamburg/Manchester Data Scien...
How to turn an R-based prediction model
into a high-performance web application?
13
 Continuous Delivery!
XConf ’17 Hambu...
Application code in
one repository per
service.
CI
Deployment package
as artifact.
CD
Deliver package to
servers
Typical d...
Continuous delivery pipelines
15
Prediction Model Pipeline
XConf ’17 Hamburg/Manchester Data Science, Delivered Continuous...
Continuous delivery pipelines
16
Prediction Model Pipeline
Web Application Pipeline
XConf ’17 Hamburg/Manchester Data Scie...
The price for CD: Extensive model validation
17XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wide...
The price for CD: Extensive model validation
18XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wide...
Lessons learned
19
Form a cross-functional team of
data scientists & software engineers!
Software engineers
… learn how da...
Lessons learned
20
Generating gigabytes of Java code
is a challenge for the JVM
Use the G1 garbage collector
Turn off Ti...
Lessons learned – Warm up
21XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
Lessons learned
22
The approach of applying Continuous Delivery to
Data Science is useful independently of the tech
 Succ...
Conclusions
23
 Continuous Delivery allows us to bring prediction
model changes live very quickly.
 Only extensive autom...
Conclusions: Price evaluation everywhere
24XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & ...
Conclusions: Price evaluation everywhere
25XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & ...
QUESTIONS?
THANK YOU
Arif Wider & Christian Deger
@arifwider @cdeger
You’ve finished this document.
Download and read it offline.
Upcoming SlideShare
What to Upload to SlideShare
Next
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

Share

Data Science - Delivered Continuously - XConf 2017

Download to read offline

AutoScout24 is the largest online car marketplace Europe-wide for new and used cars. With more than 2.4 million listings across Europe, AutoScout24 has access to large amounts of data about historic and current market prices and wants to use this data to empower its users to make informed decisions about selling and buying cars. We created a live price estimation service for used vehicles based on a Random Forest prediction model that is continuously delivered to the end user.

https://info.thoughtworks.com/XConf-2017-EU.html

Presented at XConf 2017 in Hamburg and Manchester by Arif Wider (ThoughtWorks) and Christian Deger (AutoScout24).

  • Be the first to like this

Data Science - Delivered Continuously - XConf 2017

  1. 1. Arif Wider & Christian Deger DATA SCIENCE, DELIVERED CONTINUOUSLY
  2. 2. A CONFERENCE ALL ABOUT TECHNOLOGY
  3. 3. Christian Deger Chief Architect cdeger@autoscout24.com @cdeger
  4. 4. Arif Wider Senior Consultant/Developer awider@thoughtworks.com @arifwider
  5. 5. PL S RUS UA RO CZ D NL B F A HR I E BG TR 18countries 2.4m+cars & motos 10m+users per month
  6. 6. The task: A consumer-facing data product 6XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  7. 7. The task: A consumer-facing data product 7XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  8. 8. The task: A consumer-facing data product 8XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  9. 9. The prediction model: Random forest 9 Volkswagen GolfCar listings of last two years XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  10. 10. How to turn an R-based prediction model into a high-performance web application? 10 ? XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  11. 11. How to turn an R-based prediction model into a high-performance web application? 11XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  12. 12. How to turn an R-based prediction model into a high-performance web application? 12XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  13. 13. How to turn an R-based prediction model into a high-performance web application? 13  Continuous Delivery! XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  14. 14. Application code in one repository per service. CI Deployment package as artifact. CD Deliver package to servers Typical delivery pipeline XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  15. 15. Continuous delivery pipelines 15 Prediction Model Pipeline XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  16. 16. Continuous delivery pipelines 16 Prediction Model Pipeline Web Application Pipeline XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  17. 17. The price for CD: Extensive model validation 17XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  18. 18. The price for CD: Extensive model validation 18XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  19. 19. Lessons learned 19 Form a cross-functional team of data scientists & software engineers! Software engineers … learn how data scientists work … and understand the quirks of a prediction model Data Scientist … learn about unit testing, stable interfaces, git, etc. ... get quick feedback about the impact of their work  Model and product iterations become much faster! XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  20. 20. Lessons learned 20 Generating gigabytes of Java code is a challenge for the JVM Use the G1 garbage collector Turn off Tiered Compilation  Do extensive warm-ups XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  21. 21. Lessons learned – Warm up 21XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  22. 22. Lessons learned 22 The approach of applying Continuous Delivery to Data Science is useful independently of the tech  Successfully applied similarly to a Python- and Spark-based project  Even more useful when quick model evolution is required because of rapidly changing inputs (e.g. user interaction) XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  23. 23. Conclusions 23  Continuous Delivery allows us to bring prediction model changes live very quickly.  Only extensive automated end-to-end tests provide confidence to deploy to production automatically.  Java code generation allows for very low response times and excellent scalability for high loads but requires plenty of memory. XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  24. 24. Conclusions: Price evaluation everywhere 24XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  25. 25. Conclusions: Price evaluation everywhere 25XConf ’17 Hamburg/Manchester Data Science, Delivered Continuously – A. Wider & C. Deger
  26. 26. QUESTIONS? THANK YOU Arif Wider & Christian Deger @arifwider @cdeger

AutoScout24 is the largest online car marketplace Europe-wide for new and used cars. With more than 2.4 million listings across Europe, AutoScout24 has access to large amounts of data about historic and current market prices and wants to use this data to empower its users to make informed decisions about selling and buying cars. We created a live price estimation service for used vehicles based on a Random Forest prediction model that is continuously delivered to the end user. https://info.thoughtworks.com/XConf-2017-EU.html Presented at XConf 2017 in Hamburg and Manchester by Arif Wider (ThoughtWorks) and Christian Deger (AutoScout24).

Views

Total views

409

On Slideshare

0

From embeds

0

Number of embeds

13

Actions

Downloads

7

Shares

0

Comments

0

Likes

0

×