How Mobile.de brings Data Science to Production for a Personalized Web Experience
Dr. Markus Schüler & Dr. Florian Wilhelm
2018-07-08, PyData 2018, Berlin
2
Introduction
@FlorianWilhelm
FlorianWilhelm
florianwilhelm.info
Dr. Florian Wilhelm
Data Scientist
inovex GmbH
Dr. Markus Schüler
Data Scientist & Team Lead
mobile.de GmbH
3
Agenda
• General Introduction
• Personalization Use Cases at mobile.de
• Predicting Car Buying Intent
• Python for Big Data Processing
• Optimizing Performance
4
5
MOBILE.DE
German market leader
13.5 million unique users per month
1.6 million vehicles
290 employees
Headquarters: Dreilinden / Friedrichshain, Berlin
Part of eBay Tech
6
IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
Using technology to inspire our clients. And ourselves.
inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart.
www.inovex.de
7
Why Recommendations? Why Personalization?
Inspiration
Engagement
Memory of past interactions
You are unique!
8
Why Personalization?
Data-Driven Personalization Improves:
• User Experience
• User Engagement
Source: https://www.kleinerperkins.com/perspectives/internet-trends-report-2018
9
Personalization at mobile.de
User Event Tracking & Storage
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Profile
Buyer
Last Action: Yesterday
Frequent User
User 12345
Likelihood to buy: 88 %
Daily preference profiles
Daily activity profiles
Recommendations
Segmentation
User Car Preferences User Interactions
10
User Car Preference Example
User Preferences based on User’s interactions
User Preferences
Anonymous
User 12345
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Profile
Buyer
Marketing
Last Action: Yesterday
Frequent User
11
Uncertainty Quantification
Bayesian Approach
Posterior probability ∝ Likelihood × Prior probability
User profile + Prior based on all users → Posterior User Profile
Prior based on all users: 30% Volkswagen, 25% gray, 50% automatic, 8% SUV, 10,000 €
[Figure: impact of the prior (avg. user) against the number of user events, shown for User Preferences and Posterior User Preferences]
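The shrinkage idea on this slide can be illustrated with a small sketch. It assumes a Dirichlet prior over car makes and a multinomial likelihood, with a hypothetical `prior_strength` pseudo-count; the slides do not say which distributions mobile.de actually uses, so the function, the prior values and the event lists below are illustrative only.

```python
from collections import Counter


def posterior_preferences(user_events, prior, prior_strength=10.0):
    """Blend a user's observed counts with a population-wide prior.

    Assumes a Dirichlet prior with pseudo-counts prior_strength * prior[k]
    and a multinomial likelihood; returns the posterior mean per category.
    All events are expected to use keys that appear in `prior`.
    """
    counts = Counter(user_events)
    total = len(user_events) + prior_strength
    return {
        key: (counts.get(key, 0) + prior_strength * share) / total
        for key, share in prior.items()
    }


# Hypothetical prior over makes, standing in for "prior based on all users".
prior = {"Volkswagen": 0.30, "BMW": 0.20, "Audi": 0.20, "Other": 0.30}

few_events = ["BMW", "BMW"]       # new user: the prior dominates
many_events = ["BMW"] * 200       # active user: the data dominates

print(posterior_preferences(few_events, prior))
print(posterior_preferences(many_events, prior))
```

With only two events the BMW share stays close to the prior, while after 200 events it is almost entirely driven by the user's own behaviour, which matches the "impact of prior vs. number of user events" relationship on the slide.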
12
Recommendation
All Listings
Content-based Information
(User Preferences)
Collaborative Information
Mobile.de Recommendation Engine
Features of vehicle
14
Different User Intents
“I have no idea about cars. I need basic information and guidance.”
“I’m a car expert. Lead me to the best deals in the fastest way.”
“I love to browse expensive cars, yet I have no buying intent.”
“As a dealer, I need detailed data to compare my own listings with my competitors’.”
15
Events of a Car Buying Journey
contacts
parkings
views
16
Analysing events of car buyers
                      control       buyers
events total          72,621,069    2,500,771
median events         153           188
median days active    22            15
17
User Events: Event counts
[Figure: four panels (contact, parking, search, view), each titled "Event count over user journey". x-axis: position in user journey (0.0 to 1.0); y-axis: average count; series: Buyer and Control; fits: local mean, linear model, lowess]
p-values per panel:
contact: Buyer slope p = 1.815e−22 ***; Control intercept diff p = 9.823e−02 .; Control slope diff p = 9.956e−04 ***
parking: Buyer slope p = 7.999e−06 ***; Control intercept diff p = 1.399e−21 ***; Control slope diff p = 6.702e−06 ***
search: Buyer slope p = 6.694e−51 ***; Control intercept diff p = 1.141e−01; Control slope diff p = 9.044e−07 ***
view: Buyer slope p = 1.824e−08 ***; Control intercept diff p = 2.506e−45 ***; Control slope diff p = 2.824e−02 *
18
User Events: Duplicated views
• Buyers look more often at cars they have seen already than the control group, and their ratio increases faster (both significant)
[Figure: amount of duplicated views (axis 0.2 to 0.6) against position in user journey (0.0 to 1.0), for Buyer and Control]
19
When did buyers interact with the car they bought?
§ Buyers view “their” car most often at 4/5 of the way through their user journey
[Figure: When do buyers view the car they buy? % of users (axis 0 to 15) against position in user journey (10% to 100%)]
20
ML Model: How close to buy?
§ Aim: predict how likely a user is to make his buying decision today
§ Personalization
§ Highlight dealer contact details
§ Provide car buying assistance
21
Feature Generation
Features:
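§ Event counts (view, search, contact, parking)
§ Share of each event type among all events (e.g. % of views among all events)
§ a = number of active days, b = max-diff of active days, and the ratio a/b
§ Additional features:
§ Views/(Search+View)
§ % of duplicated views among all views
[Figure: timeline of the 30 days before the buying date (= day 0), split into the windows 0-2 days, 3-9 days and 10-30 days; the figure also carries the label "ratio"]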
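The feature list above could be computed per time window roughly as follows. This is a minimal sketch, not the speakers' code: the event layout `(days_before_reference, event_type, listing_id)`, the feature names, and the reading of "max-diff active days" as the span between the first and last active day are all assumptions made for the example.

```python
from collections import Counter

EVENT_TYPES = ("view", "search", "contact", "parking")


def window_features(events, first_day, last_day):
    """Compute slide-style features for events within [first_day, last_day].

    `events` is a list of (days_before_reference, event_type, listing_id)
    tuples; this layout is an assumption made for the example.
    """
    in_window = [e for e in events if first_day <= e[0] <= last_day]
    counts = Counter(event_type for _, event_type, _ in in_window)
    total = len(in_window)

    features = {}
    for event_type in EVENT_TYPES:
        features[f"n_{event_type}"] = counts[event_type]
        features[f"share_{event_type}"] = counts[event_type] / total if total else 0.0

    # Views / (Search + View)
    browse = counts["view"] + counts["search"]
    features["view_ratio"] = counts["view"] / browse if browse else 0.0

    # Share of views that hit a listing the user has already viewed.
    viewed = [listing for _, event_type, listing in in_window if event_type == "view"]
    duplicated = len(viewed) - len(set(viewed))
    features["dup_view_share"] = duplicated / len(viewed) if viewed else 0.0

    # a = number of active days, b = span between first and last active day.
    active_days = {day for day, _, _ in in_window}
    a = len(active_days)
    b = max(active_days) - min(active_days) if active_days else 0
    features["active_days"] = a
    features["active_span"] = b
    features["active_ratio"] = a / b if b else 0.0
    return features


# Hypothetical events of one user, counted in days before the reference date.
events = [
    (1, "view", "A"),
    (1, "view", "A"),
    (2, "search", None),
    (0, "contact", "A"),
    (5, "view", "B"),
]
print(window_features(events, 0, 2))   # features for the 0-2 days window
```

Calling the function once per window (for example 0-2, 3-9 and 10-30 days) and concatenating the results would give one feature vector per user.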
22
Modelling
§ Logistic Regression
§ Automatic Feature Selection
§ start from different sub-selections of features (like “all”, “no ratios”,
etc.)
§ allow addition and subtraction of features based on minimizing AIC (lower AIC is better)
§ needed to prevent overfitting
§ Window optimization
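The add-and-remove search described in these bullets can be sketched as a greedy stepwise loop. The sketch treats lower AIC as better, which is the usual convention, and takes the scoring function as a parameter: `toy_aic` below is a made-up stand-in for "fit a logistic regression on these features and return its AIC", and the feature names are hypothetical.

```python
def stepwise_select(all_features, start, aic):
    """Greedy forward/backward feature selection.

    `aic` maps a frozenset of feature names to the AIC of the model fitted
    on those features (lower is better). The search stops when neither
    adding nor removing a single feature improves the score.
    """
    current = frozenset(start)
    best = aic(current)
    improved = True
    while improved:
        improved = False
        # Candidate moves: add one unused feature or drop one used feature.
        candidates = [current | {f} for f in all_features if f not in current]
        candidates += [current - {f} for f in current]
        for candidate in candidates:
            score = aic(candidate)
            if score < best:
                best, best_set, improved = score, candidate, True
        if improved:
            current = best_set
    return current, best


USEFUL = {"n_view", "n_contact", "dup_view_share"}


def toy_aic(features):
    # Stand-in score: a 2k complexity penalty minus a fake fit gain of 5
    # for every feature that is "truly" useful.
    return 2 * len(features) - 5 * len(features & USEFUL) + 100


all_features = ["n_view", "n_search", "n_contact", "n_parking", "dup_view_share"]

# Start once from "all" features and once from none, as the slide suggests.
print(stepwise_select(all_features, all_features, toy_aic))
print(stepwise_select(all_features, [], toy_aic))
```

Starting from several different sub-selections and keeping the best result reduces the risk that the greedy search gets stuck in a poor local optimum.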
23
Window size optimization
§ Window size and number of windows were used as optimization criteria
[Figure: 30-day timeline before the buying date (= day 0) with the candidate window splits]
0-2 days | 3-9 days | 10-30 days
0 days | 1-9 days | 10-30 days
0 days | 1-7 days | 8-30 days
0-9 days | 10-19 days | 20-30 days
0 days | 1-4 days | 5-9 days | 10-30 days
24
Modelling
§ Cross-Validation (15 fold, 70/30 train/test split)
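One way to read "15 fold, 70/30 train/test split" is as 15 repeated random splits with 70% of the users for training and 30% for testing. That reading is an assumption, as are the function name and the fixed seed in this minimal sketch.

```python
import random


def repeated_splits(n_samples, n_repeats=15, train_share=0.7, seed=0):
    """Yield (train_idx, test_idx) for repeated random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    n_train = int(round(train_share * n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


for train_idx, test_idx in repeated_splits(100):
    # Fit the model on train_idx and evaluate on test_idx here.
    pass
```

Averaging the test metrics over all repetitions gives a more stable estimate than a single split.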
25
Results
Prediction: The user made his buying decision today
Best Model:
72% Accuracy / 68% Sensitivity / 76% Specificity
[Figure: Modelling statistics: closeToBuy_now_cid. Accuracy, Sensitivity and Specificity (axis 0.65 to 0.80) for five models (Model1 to Model5); series labels: closeToBuy_now_0−1−10−30_cid, closeToBuy_now_0−1−7−30_cid, closeToBuy_now_0−10−20−30_cid, closeToBuy_now_0−3−10−30_cid, closeToBuy_now_0−5−10−30_cid]
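For reference, the three reported metrics follow directly from the confusion matrix of a binary classifier. The helper below is a generic sketch with made-up labels, not the evaluation code behind the numbers on the slide.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = buyer)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),   # share of true buyers detected
        "specificity": tn / (tn + fp),   # share of non-buyers correctly rejected
    }


# Made-up labels for illustration only.
y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```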
26
Buys tomorrow, next week, next two weeks
[Figure: Accuracy, Sensitivity and Specificity (axis 0% to 80%) for the targets Buy Today, Buy Tomorrow, Buy in a Week and Buy in two Weeks]
Considerably lower predictive power when predicting more distant future events
Still room for improvement
27
Python & Big Data
28
Hive for heavy lifting
• Apache project
• built on top of Hadoop
• SQL interface to your data
• essentially an abstraction layer over MapReduce
• robust and mature
• but slow and certainly not “interactive”
Data Team:
• used for batch processing of user preferences, user segmentation, etc.
• PyHive by Dropbox for Python support
• usage of Python-based UD(A)Fs
29
User Defined Functions (UDFs)
User defined (aggregation) functions:
§ needed when native functions aren’t sufficient
§ are always much slower than native functions
§ work on a column or multiple (grouped) columns
§ are vector-valued operations and/or aggregations
transform aggregate apply
30
PySpark for fast analysis and machine learning
Spark: fast and general engine for large-scale data processing
[Figure: Python + Spark = PySpark]
31
Conversion Example of User Preferences
Hive:
• 2483 lines of code
• Jinja2 to generate SQL queries
• Temporary tables for performance
• Runtime 5-10h
• Logic hard to understand at times
Spark:
• 1745 lines of code
• programmatic definition of queries
• No temporary tables needed
• Runtime 1-2 h
• Quite easy to understand
32
How Spark works
e.g. Jupyter lab
Source: Spark documentation
33
How do Python UD(A)Fs work?
Source: Spark documentation
34
Apache Arrow
Source: Arrow documentation
35
PySpark & Pandas
Vectorized UDFs for Spark 2.3:
§ built on top of Apache Arrow,
§ avoid high serialization and invocation overhead,
§ allow row-at-a-time UDFs and cumulative UDAFs,
§ as flexible as Pandas’ apply
Source: databricks blog
36
Performance gains
Source: https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
37
But what if Spark < 2.3?
It’s possible to write flexible UD(A)Fs by
• using RDD functionality, df.rdd.mapPartitions(my_func)
• converting low-level Row objects to a Pandas dataframe
• wrapping everything into a nice decorator
Detailed information under:
https://www.inovex.de/blog/efficient-udafs-with-pyspark/
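The three steps above can be sketched as a small decorator. This is not the code from the linked blog post: `pandas_partition`, `add_price_per_km`, the column names and the `FakeRow` stand-in are hypothetical, and the example deliberately runs without a Spark installation by mimicking the `asDict()` method of `pyspark.sql.Row`.

```python
import functools

import pandas as pd


def pandas_partition(func):
    """Let a DataFrame -> DataFrame function consume an iterator of rows."""

    @functools.wraps(func)
    def wrapper(rows):
        # Each row only needs an asDict() method, as pyspark.sql.Row has.
        records = [row.asDict() for row in rows]
        if not records:
            return iter([])
        result = func(pd.DataFrame.from_records(records))
        # Hand plain dicts back, one per output row.
        return iter(result.to_dict("records"))

    return wrapper


@pandas_partition
def add_price_per_km(pdf):
    # Hypothetical column names, chosen only for this example.
    return pdf.assign(price_per_km=pdf["price"] / pdf["mileage"])


class FakeRow:
    """Stand-in for pyspark.sql.Row so the sketch runs without Spark."""

    def __init__(self, **fields):
        self._fields = fields

    def asDict(self):
        return dict(self._fields)


partition = iter([FakeRow(price=20000, mileage=10000),
                  FakeRow(price=15000, mileage=30000)])
print(list(add_price_per_km(partition)))
```

On a cluster, the decorated function would presumably be passed to `df.rdd.mapPartitions(add_price_per_km)`, so that each partition is converted to a Pandas dataframe once instead of calling Python for every row.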
38
Isolated environments with PySpark
39
Concept
§ create a local environment based on wheels,
§ upload the unpacked wheels to HDFS,
§ read and distribute these Python packages from the Spark driver to the executors with sc.addFile,
§ use the packages on the executors, e.g. in a UDF.
Detailed information under:
https://www.inovex.de/blog/managing-isolated-environments-with-pyspark/
40
Architecture
41
Summary
PyData Stack
Interesting & Challenging Use Cases
Data Science
Data Engineering
Business Impact
42
Any Questions?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

How mobile.de brings Data Science to Production for a Personalized Web Experience

  • 1. How Mobile.de brings Data Science to Production for a Personalized Web Experience Dr. Markus Schüler & Dr. Florian Wilhelm 2018-07-08, PyData 2018, Berlin
  • 2. Introduction: Dr. Florian Wilhelm, Data Scientist, inovex GmbH (@FlorianWilhelm, FlorianWilhelm, florianwilhelm.info); Dr. Markus Schüler, Data Scientist & Team Lead, mobile.de GmbH
  • 3. Agenda: General Introduction; Personalization Use Cases at mobile.de; Predicting Car Buying Intent; Python for Big Data Processing; Optimizing Performance
  • 5. mobile.de: German market leader · 13.5 million unique users per month · 1.6 million vehicles · 290 employees · Headquarters in Berlin (Dreilinden / Friedrichshain) · Part of eBay Tech
  • 6. IT project house for digital transformation: ‣ Agile Development & Management ‣ Web · UI/UX · Replatforming · Microservices ‣ Mobile · Apps · Smart Devices · Robotics ‣ Big Data & Business Intelligence Platforms ‣ Data Science · Data Products · Search · Deep Learning ‣ Data Center Automation · DevOps · Cloud · Hosting ‣ Trainings & Coachings. Using technology to inspire our clients. And ourselves. inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de
  • 9. Personalization at mobile.de. Diagram labels: User Event Tracking & Storage; Daily preference profiles; Daily activity profiles; Recommendations; Segmentation; User Car Preferences; User Interactions. Example user profile (repeated throughout the diagram): User 12345 · Buyer · Frequent User · Last Action: Yesterday · Likelihood to buy: 88 % · Looking For: Used Car (100%) · Prefers (Make): BMW (50%), Audi (50%) · Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) · Searching In: lat 52.5206, lon 13.409 · Search Radius: 300km · Preferred Price: 20 000€ ± 1500€ · Preferred Mileage: 10 000km ± 5000km
  • 10. User Car Preference Example: user preferences based on the user's interactions. Labels: Marketing; Anonymous. Example profile: User 12345 · Buyer · Frequent User · Last Action: Yesterday · Looking For: Used Car (100%) · Prefers (Make): BMW (50%), Audi (50%) · Prefers (Model): Audi A3 (25%), Audi A4 (25%), BMW 318 (50%) · Searching In: lat 52.5206, lon 13.409 · Search Radius: 300km · Preferred Price: 20 000€ ± 1500€ · Preferred Mileage: 10 000km ± 5000km
  • 11. Uncertainty Quantification: Bayesian Approach. Posterior probability ∝ Likelihood × Prior probability. User profile + prior based on all users → posterior user profile (user preferences → posterior user preferences). Example prior: 30% Volkswagen, 25% gray, 50% automatic, 8% SUV, 10,000 €. [Figure: impact of prior (avg. user) plotted against number of user events.]
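
The prior/posterior blending on this slide can be illustrated with a small sketch. The slides do not say which distributions the team used; the version below assumes a Dirichlet prior over categories with multinomial event counts, and `posterior_preferences`, `prior_strength` and the example prior values (other than the 30% Volkswagen share) are hypothetical.

```python
from collections import Counter


def posterior_preferences(prior, events, prior_strength=10.0):
    """Blend a population-wide prior with one user's observed events.

    prior: dict mapping category -> share among all users (sums to 1).
    events: list of categories the user interacted with.
    prior_strength: hypothetical pseudo-count; the larger it is, the more
        events a user needs before their own data outweighs the prior.
    Returns the posterior mean of a Dirichlet-multinomial model.
    """
    counts = Counter(events)
    total = prior_strength + len(events)
    categories = set(prior) | set(counts)
    return {
        c: (prior_strength * prior.get(c, 0.0) + counts.get(c, 0)) / total
        for c in categories
    }


# Made-up prior over makes; only the Volkswagen share comes from the slide.
prior = {"Volkswagen": 0.30, "BMW": 0.20, "Audi": 0.20, "Other": 0.30}

# With two events the prior dominates; with a hundred it barely matters.
few = posterior_preferences(prior, ["BMW", "Audi"])
many = posterior_preferences(prior, ["BMW"] * 50 + ["Audi"] * 50)
```

With few events the posterior stays close to the average user, and with many events it approaches the user's own empirical shares, which is the relationship the "impact of prior vs. number of user events" plot describes.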
  • 12. Recommendation. Diagram labels: All Listings; Features of vehicle; Content-based Information (User Preferences), illustrated with the example user profile (User 12345, Likelihood to buy: 88 %); Collaborative Information; Mobile.de Recommendation Engine
  • 13. Personalization at mobile.de (the overview diagram from slide 9 again). Diagram labels: User Event Tracking & Storage; Daily preference profiles; Daily activity profiles; Recommendations; Segmentation; User Car Preferences; User Interactions; example user profile (User 12345, Likelihood to buy: 88 %)
  • 14. Different User Intents: “I have no idea about cars. I need basic information and guidance.” “I’m a car expert. Lead me to the best deals in the fastest way.” “I love to browse expensive cars, yet I have no buying intent.” “As a dealer, I need detailed data to compare my own listings with my competitors’.”
  • 15. Events of a Car Buying Journey: contacts, parkings, views
  • 16. Analysing events of car buyers (control vs. buyers). Events total: control 72,621,069, buyers 2,500,771. Median events: control 153, buyers 188. Median days active: control 22, buyers 15.
  • 17. User Events: Event counts. [Figure: four panels (contact, parking, search, view) showing average event count over position in user journey for buyers vs. control, each with local mean, linear model and lowess fits.] Reported p-values (buyer slope / control intercept diff / control slope diff): contact 1.815e−22 *** / 9.823e−02 . / 9.956e−04 ***; parking 7.999e−06 *** / 1.399e−21 *** / 6.702e−06 ***; search 6.694e−51 *** / 1.141e−01 / 9.044e−07 ***; view 1.824e−08 *** / 2.506e−45 *** / 2.824e−02 *
  • 18. User Events: Duplicated views. Buyers look at cars they have already seen more often than the control group does, and their ratio increases faster (both significant). [Figure: amount of duplicated views over position in user journey, buyer vs. control.]
  • 19. When did buyers interact with the car they bought? Buyers view “their” car most often about 4/5 of the way through their user journey. [Figure: “When do buyers view the car they buy?”, % of users over position in user journey (10% to 100%).]
  • 20. ML Model: How close to buy? Aim: predict how likely a user is to make his buying decision today; Personalization; Highlight dealer contact details; Provide car buying assistance
  • 21. Feature Generation. Features: event counts (view, search, contact, parking); % of each event type among all events (e.g. % views among all events); a = number of active days, b = max-diff active days, and the ratio a/b. Additional features: Views/(Search+View); % of duplicated views among all views. Time windows over the 30 days before the buying date (= day 0): 0-2 days, 3-9 days, 10-30 days
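
A minimal sketch of how such windowed features could be computed for one user, in plain Python. The event tuple layout, the feature names and the definition of a duplicated view (a view of a listing already viewed within the same window) are assumptions for illustration; the a/b feature is left out because the slide does not define "max-diff active days".

```python
from datetime import date

# Windows in days before the reference date, as on the slide: 0-2, 3-9, 10-30.
WINDOWS = [(0, 2), (3, 9), (10, 30)]
EVENT_TYPES = ("view", "search", "contact", "parking")


def window_features(events, reference_date, windows=WINDOWS):
    """Build per-window features for one user.

    events: list of (day, event_type, listing_id) tuples (hypothetical layout).
    reference_date: the day counted as day 0 (the buying date on the slide).
    """
    features = {}
    for lo, hi in windows:
        in_window = [e for e in events if lo <= (reference_date - e[0]).days <= hi]
        prefix = f"d{lo}_{hi}"
        counts = {t: sum(1 for e in in_window if e[1] == t) for t in EVENT_TYPES}
        total = len(in_window)
        for t in EVENT_TYPES:
            features[f"{prefix}_n_{t}"] = counts[t]
            features[f"{prefix}_pct_{t}"] = counts[t] / total if total else 0.0
        # Views / (Search + View)
        denom = counts["search"] + counts["view"]
        features[f"{prefix}_view_ratio"] = counts["view"] / denom if denom else 0.0
        # Share of views that hit a listing already viewed in this window.
        viewed = [e[2] for e in in_window if e[1] == "view"]
        duplicates = len(viewed) - len(set(viewed))
        features[f"{prefix}_pct_dup_views"] = duplicates / len(viewed) if viewed else 0.0
        # Number of distinct days with at least one event.
        features[f"{prefix}_active_days"] = len({e[0] for e in in_window})
    return features


# Made-up event log for one user.
reference = date(2018, 7, 8)
events = [
    (date(2018, 7, 8), "view", "a"),
    (date(2018, 7, 7), "view", "a"),
    (date(2018, 7, 7), "search", None),
    (date(2018, 7, 6), "contact", "a"),
    (date(2018, 7, 1), "view", "b"),
    (date(2018, 6, 20), "parking", "b"),
]
features = window_features(events, reference)
```

Keeping the window boundaries as a parameter makes it straightforward to regenerate the feature set for other window layouts.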
  • 22. Modelling: Logistic Regression; Automatic Feature Selection (start from different sub-selections of features such as “all”, “no ratios”, etc.; allow addition and removal of features based on AIC, where a lower AIC is better; needed to prevent overfitting); Window optimization
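
For reference, the Akaike information criterion is conventionally defined as follows, where k is the number of estimated parameters and \hat{L} is the maximized likelihood of the model. Under this sign convention a lower value indicates a better trade-off between fit and complexity, so a stepwise step would keep an added or removed feature only if the AIC decreases. The slides do not state which software or sign convention the authors used.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```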
  • 23. Window size optimization: the window sizes and the number of windows were themselves optimized. Window layouts shown for the 30 days before the buying date (= day 0): 0-2 / 3-9 / 10-30 days; 0 / 1-9 / 10-30 days; 0 / 1-7 / 8-30 days; 0-9 / 10-19 / 20-30 days; 0 / 1-4 / 5-9 / 10-30 days; 0 / 1-7 / 8-30 days
  • 24. Modelling: Logistic Regression; Automatic Feature Selection (start from different sub-selections of features such as “all”, “no ratios”, etc.; allow addition and removal of features based on AIC, where a lower AIC is better; needed to prevent overfitting); Window optimization; Cross-Validation (15 fold, 70/30 train/test split)
  • 25. Results. Prediction: the user made his buying decision today. Best model: 72% accuracy / 68% sensitivity / 76% specificity. [Figure: modelling statistics for closeToBuy_now_cid (accuracy, sensitivity, specificity; axis 0.65 to 0.80) for Model1 to Model5. Model names shown: closeToBuy_now_0−1−10−30_cid, closeToBuy_now_0−1−7−30_cid, closeToBuy_now_0−10−20−30_cid, closeToBuy_now_0−3−10−30_cid, closeToBuy_now_0−5−10−30_cid.]
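
The three reported statistics follow directly from the confusion matrix of a binary classifier. The sketch below shows the standard definitions on made-up labels; `classification_metrics` is a hypothetical helper and the data is not from the talk.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = buys today).

    Assumes both classes occur in y_true, otherwise a division by zero occurs.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # buyers found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # buyers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-buyers recognized
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Made-up labels: 4 buyers and 6 non-buyers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
```

When both classes are equally frequent, accuracy is the mean of sensitivity and specificity. The reported 72% is the mean of 68% and 76%, which would be consistent with a balanced evaluation set, although the slide does not state the class balance.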
  • 26. Buys tomorrow, next week, next two weeks. Considerably lower predictive power when predicting more distant future events. Still room for improvement. [Figure: accuracy, sensitivity and specificity (0% to 80%) for Buy Today, Buy Tomorrow, Buy in a Week, Buy in two Weeks.]
  • 27. Python & Big Data
  • 28. Hive for heavy lifting: Apache project; built on top of Hadoop; SQL interface to your data; basically a map&reduce abstraction layer; robust and mature; but slow and surely not “interactive”. Data Team: used for batch-processing of user preferences, user-segmentation etc.; PyHive by Dropbox for Python support; usage of Python-based UD(A)Fs
  • 29. User Defined Functions (UDFs). User defined (aggregation) functions: are needed when native functions aren’t sufficient; are always much slower than native functions; work on a column or multiple (grouped) columns; are vector-valued operations and/or aggregations (transform, aggregate, apply)
  • 30. PySpark for fast analysis and machine learning. Spark: fast and general engine for large-scale data processing. Spark + Python = PySpark
  • 31. Conversion Example of User Preferences. Hive: 2483 lines of code; Jinja2 to generate SQL queries; temporary tables for performance; runtime 5-10 h; logic hard to understand at times. Spark: 1745 lines of code; programmatic definition of queries; no temporary tables needed; runtime 1-2 h; quite easy to understand. Illustrated with the example user profile (User 12345, Likelihood to buy: 88 %)
  • 32. How Spark works [diagram, labelled “e.g. Jupyter lab”]. Source: Spark documentation
  • 33. How do Python UD(A)Fs work? [diagram] Source: Spark documentation
  • 35. PySpark & Pandas. Vectorized UDFs for Spark 2.3: built on top of Apache Arrow; avoid high serialization and invocation overhead; allow row-at-a-time UDFs and cumulative UDAFs; as flexible as Pandas’ apply. Source: databricks blog
  • 37. But what if Spark < 2.3? It’s possible to write flexible UD(A)Fs by: using RDD functionality, df.rdd.mapPartitions(my_func); converting low-level Row objects to a Pandas dataframe; wrapping everything into a nice decorator. Detailed information under: https://www.inovex.de/blog/efficient-udafs-with-pyspark/
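
The decorator idea can be sketched without a Spark cluster. This is not the implementation from the linked blog post: to stay dependency-free, namedtuples stand in for Spark `Row` objects and a dict of column lists stands in for the Pandas dataframe, and `columnar` and `add_price_share` are hypothetical names.

```python
from collections import namedtuple
from functools import wraps


def columnar(func):
    """Turn a column-wise function into one that consumes a row iterator.

    The wrapped function has the shape mapPartitions expects: it takes an
    iterator over the rows of one partition and returns an iterator of rows.
    """
    @wraps(func)
    def partition_func(rows):
        rows = list(rows)
        if not rows:
            return iter([])  # empty partitions must still yield an iterator
        # Rows -> columns (with real Spark Rows, row.asDict() gives the fields).
        fields = rows[0]._fields
        columns = {f: [getattr(r, f) for r in rows] for f in fields}
        result = func(columns)
        # Columns -> rows again, so the output can be turned back into a table.
        out_fields = list(result)
        OutRow = namedtuple("OutRow", out_fields)
        return (OutRow(*values) for values in zip(*(result[f] for f in out_fields)))
    return partition_func


@columnar
def add_price_share(cols):
    """Example aggregation-style logic: each row's share of the total price."""
    total = sum(cols["price"])
    return {
        "user": cols["user"],
        "price_share": [p / total for p in cols["price"]],
    }


# Simulate one partition locally.
Row = namedtuple("Row", ["user", "price"])
result = list(add_price_share(iter([Row("u1", 10.0), Row("u2", 30.0)])))
```

In Spark the decorated function would be passed as `df.rdd.mapPartitions(add_price_share)`. For a grouped aggregation the data would first have to be repartitioned by the grouping key so that each group lands in a single partition.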
  • 39. Concept: create a local environment based on wheels; upload the unpacked wheels to HDFS; read and distribute these Python packages from the Spark driver to the executors with sc.addFile; use the packages on the executors, e.g. in a UDF. Detailed information under: https://www.inovex.de/blog/managing-isolated-environments-with-pyspark/
  • 41. Summary: PyData Stack · Interesting & Challenging Use Cases · Data Science · Data Engineering · Business Impact