How mobile.de brings Data Science to Production for a Personalized Web Experience
Dr. Markus Schüler & Dr. Florian Wilhelm
2018-07-08, PyData 2018, Berlin
3. 3
Agenda
• General Introduction
• Personalization Use Cases at mobile.de
• Predicting Car Buying Intent
• Python for Big Data Processing
• Optimizing Performance
6. 6
IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
Using technology to inspire our clients. And ourselves.
inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart.
www.inovex.de
9. 9
Personalization at mobile.de
User Event Tracking & Storage
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%),
BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Profile
Buyer
Last Action: Yesterday
Frequent User
User 12345
Likelihood to buy: 88 %
Daily preference profiles
Daily activity profiles
Recommendations
Segmentation
User Car Preferences User Interactions
10. 10
Looking For: Used Car (100%)
Prefers (Make): BMW (50%), Audi (50%)
Prefers (Model): Audi A3 (25%), Audi A4 (25%),
BMW 318 (50%)
Searching In: lat 52.5206, lon 13.409
Search Radius: 300km
Preferred Price: 20 000€ ± 1500€
Preferred Mileage: 10 000km ± 5000km
User Profile
Buyer
Marketing
Last Action: Yesterday
Frequent User
User 12345
User Preferences based on User’s interactions
User Car Preference Example
User Preferences
Anonymous
11. 11
Uncertainty Quantification
Bayesian Approach
Posterior probability ∝ Likelihood × Prior probability
Prior based on all users (avg. user): 30% Volkswagen, 25% gray, 50% automatic, 8% SUV, 10,000 €
Prior + User Preferences → Posterior User Preferences (Posterior User Profile)
[Figure: impact of prior (avg. user) plotted against number of user events]
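The Bayesian update on this slide can be illustrated with a small sketch. It assumes a Dirichlet prior over car makes whose pseudo-counts come from the average user, updated with one user's event counts. The talk does not say which model mobile.de actually uses, so the function name `posterior_preferences`, the `prior_strength` parameter and all numbers except the 30% Volkswagen share are hypothetical.

```python
from collections import Counter


def posterior_preferences(prior, events, prior_strength=10.0):
    """Posterior mean of a Dirichlet-multinomial model.

    prior          -- dict mapping category -> prior probability (avg. user)
    events         -- iterable of observed categories for one user
    prior_strength -- hypothetical pseudo-count weight given to the prior
    """
    counts = Counter(events)
    n = sum(counts.values())
    categories = set(prior) | set(counts)
    return {
        c: (prior_strength * prior.get(c, 0.0) + counts.get(c, 0))
        / (prior_strength + n)
        for c in categories
    }


# Assumed prior over makes for the average user (illustrative numbers only).
prior = {"Volkswagen": 0.30, "BMW": 0.20, "Audi": 0.20, "Other": 0.30}

few_events = ["BMW", "Audi"]
many_events = ["BMW"] * 50 + ["Audi"] * 50

print(posterior_preferences(prior, few_events))
print(posterior_preferences(prior, many_events))
```

In this sketch the posterior stays close to the prior when a user has only a few events, and the user's own counts dominate once many events are available.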
12. 12
Recommendation
All Listings
Content-based Information
(User Preferences)
Collaborative Information
Mobile.de Recommendation Engine
Features of vehicle
14. 14
Different User Intents
“I have no idea about cars. I need basic information and guidance.”
“I’m a car expert. Lead me to the best deals in the fastest way.”
“I love to browse expensive cars, yet I have no buying intent.”
“As a dealer, I need detailed data to compare my own listings with my competitors’.”
15. 15
Events of a Car Buying Journey
contacts
parkings
views
16. 16
Analysing events of car buyers
                        control       buyers
events total         72,621,069    2,500,771
median events               153          188
median days active           22           15
17. 17
User Events: Event counts
[Figure: four panels (contact, parking, search, view), each titled "Event count over user journey"; x-axis: position in user journey (0.0 to 1.0); y-axis: average count; groups: Buyer, Control; curves: local mean, linear model, lowess]
Linear model p-values per panel:
panel      Buyer slope      Control intercept diff   Control slope diff
contact    1.815e−22 ***    9.823e−02 .              9.956e−04 ***
parking    7.999e−06 ***    1.399e−21 ***            6.702e−06 ***
search     6.694e−51 ***    1.141e−01                9.044e−07 ***
view       1.824e−08 ***    2.506e−45 ***            2.824e−02 *
18. 18
User Events: Duplicated views
• Buyers look more often at cars they have already seen than the control group does, and their ratio of duplicated views increases faster (both effects significant)
[Figure: amount of duplicated views (y-axis) against position in user journey (x-axis, 0.0 to 1.0); groups: Buyer, Control]
19. 19
When did buyers interact with the car they bought?
§ Buyers view “their” car most often about 4/5 of the way along their user journey
[Figure: "When do buyers view the car they buy?"; x-axis: position in user journey (10% to 100%); y-axis: % of users (0 to 15)]
20. 20
ML Model: How close to buy?
§ Aim: predict how likely a user is to make their buying decision today
§ Personalization
§ Highlight dealer contact details
§ Provide car buying assistance
21. 21
Feature Generation
Features:
§ Event counts (view, search, contact, parking)
§ Share of each event type among all events (e.g. % of views among all events)
§ a = number of active days, b = max-diff active days, and the ratio a/b
§ Additional features:
§ Views/(Search+View)
§ % of duplicated views among all views
Time windows over the 30 days before the buying date (day 0): 0-2 days, 3-9 days, 10-30 days
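The feature list above could be computed roughly as follows. This is only a sketch: the event record layout `(days_before, event_type, listing_id)`, all function and feature names, and the reading of "max-diff active days" as the span between the first and last active day are assumptions, not taken from the talk. Duplicated views are counted within each window only, which is a simplification.

```python
from collections import Counter

EVENT_TYPES = ("view", "search", "contact", "parking")
WINDOWS = ((0, 2), (3, 9), (10, 30))  # days before the reference date


def window_features(events, window):
    """Features for one time window.

    events -- list of (days_before, event_type, listing_id) tuples
              (assumed layout, for illustration only)
    window -- (first_day, last_day), both inclusive
    """
    lo, hi = window
    in_window = [e for e in events if lo <= e[0] <= hi]
    counts = Counter(e[1] for e in in_window)
    total = len(in_window)

    feats = {}
    for t in EVENT_TYPES:
        feats[f"n_{t}"] = counts[t]
        feats[f"share_{t}"] = counts[t] / total if total else 0.0

    # a = number of active days, b = span between first and last active day
    days = {e[0] for e in in_window}
    a = len(days)
    b = max(days) - min(days) if days else 0
    feats["active_days"] = a
    feats["active_span"] = b
    feats["active_ratio"] = a / b if b else 0.0

    # Views / (Search + View)
    denom = counts["search"] + counts["view"]
    feats["view_vs_search"] = counts["view"] / denom if denom else 0.0

    # Share of views that revisit a listing already viewed in this window
    viewed = [e[2] for e in in_window if e[1] == "view"]
    duplicates = len(viewed) - len(set(viewed))
    feats["dup_view_share"] = duplicates / len(viewed) if viewed else 0.0
    return feats


def all_features(events):
    """Concatenate the per-window features into one flat dict."""
    out = {}
    for lo, hi in WINDOWS:
        for name, value in window_features(events, (lo, hi)).items():
            out[f"d{lo}_{hi}_{name}"] = value
    return out


events = [
    (0, "contact", "car-1"),
    (1, "view", "car-1"),
    (1, "view", "car-1"),
    (2, "search", None),
    (5, "view", "car-2"),
    (12, "parking", "car-1"),
]
features = all_features(events)
print(features["d0_2_n_view"], features["d0_2_dup_view_share"])
```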
22. 22
Modelling
§ Logistic Regression
§ Automatic Feature Selection
§ start from different sub-selections of features (like “all”, “no ratios”, etc.)
§ allow addition and subtraction of features based on minimizing AIC
§ needed to prevent overfitting
§ Window optimization
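The add/drop feature selection can be sketched as a greedy search, under the usual convention that a lower AIC is better. The helper names `aic`, `stepwise_select` and `toy_score` are hypothetical, and the toy score merely stands in for fitting a logistic regression and reading off its log-likelihood; the talk does not show the actual procedure.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def stepwise_select(candidates, start, score):
    """Greedy add/drop feature selection.

    candidates -- all available feature names
    start      -- initial feature subset (e.g. "all" or "no ratios")
    score      -- callable mapping a frozenset of features to its AIC;
                  in practice this would fit a model on those features
    """
    candidates = list(candidates)
    current = frozenset(start)
    best = score(current)
    while True:
        # Each neighbour differs from the current set by exactly one feature:
        # the feature is added if absent and dropped if present.
        neighbours = [current ^ {f} for f in candidates]
        candidate = min(neighbours, key=lambda s: (score(s), sorted(s)))
        candidate_score = score(candidate)
        if candidate_score >= best:
            return current, best
        current, best = candidate, candidate_score


# Toy stand-in for a model fit: each feature adds a fixed log-likelihood gain.
GAIN = {"n_view": 5.0, "n_contact": 3.0, "share_view": 0.4, "active_ratio": 0.2}


def toy_score(features):
    log_likelihood = -100.0 + sum(GAIN[f] for f in features)
    return aic(log_likelihood, n_params=len(features) + 1)  # +1 for intercept


selected, best_aic = stepwise_select(GAIN, start=GAIN, score=toy_score)
print(sorted(selected), best_aic)
```

Starting the search from several different subsets, as the slide describes, reduces the risk that the greedy search stops in a poor local optimum.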
23. 23
Window size optimization
§ Used window size and number of windows as optimization criterion
Window layouts shown, over the 30 days before the buying date (day 0):
§ 0-2 days, 3-9 days, 10-30 days
§ 0 days, 1-9 days, 10-30 days
§ 0 days, 1-7 days, 8-30 days
§ 0-9 days, 10-19 days, 20-30 days
§ 0 days, 1-4 days, 5-9 days, 10-30 days
§ 0 days, 1-7 days, 8-30 days
24. 24
Modelling
§ Cross-Validation (15 fold, 70/30 train/test split)
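One possible reading of "15 fold, 70/30 train/test split" is 15 repeated random splits rather than a classic k-fold partition. The sketch below follows that assumption; the function names and the `evaluate` callback are hypothetical.

```python
import random


def repeated_splits(n_samples, n_repeats=15, train_fraction=0.7, seed=0):
    """Yield (train_idx, test_idx) for repeated random train/test splits."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield sorted(idx[:n_train]), sorted(idx[n_train:])


def cross_validate(n_samples, evaluate, **kwargs):
    """Average a user-supplied evaluate(train_idx, test_idx) over all splits."""
    scores = [evaluate(tr, te) for tr, te in repeated_splits(n_samples, **kwargs)]
    return sum(scores) / len(scores)


splits = list(repeated_splits(100))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Here `evaluate` would train the model on the training indices and return a score on the test indices.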
26. 26
Buys tomorrow, next week, next two weeks
[Figure: bar chart of Accuracy, Sensitivity and Specificity (axis 0% to 80%) for Buy Today, Buy Tomorrow, Buy in a Week, Buy in two Weeks]
Considerably lower predictive power when predicting more distant future events
Still room for improvement
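For reference, the three metrics in the chart follow from the confusion matrix of a binary classifier. The sketch below uses made-up labels, not data from the talk, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = buys)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of actual buyers that the model detects
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of non-buyers that the model correctly rejects
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Made-up example labels
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```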
28. 28
Hive for heavy lifting
• Apache project
• built on top of Hadoop
• SQL interface to your data
• essentially an abstraction layer over MapReduce
• robust and mature
• but slow and certainly not “interactive”
Data Team:
• used for batch processing of user preferences, user segmentation etc.
• PyHive by Dropbox for Python support
• usage of Python-based UD(A)Fs
29. 29
User Defined Functions (UDFs)
User defined (aggregation) functions:
§ needed when native functions aren't sufficient
§ are always much slower than native functions
§ work on a column or multiple (grouped) columns
§ are vector-valued operations and/or aggregations
Patterns: transform, aggregate, apply
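The three patterns named on this slide can be illustrated without any cluster, using plain Python dicts as rows. This is one interpretation of the labels (modelled on grouped operations as known from Pandas), not code from the talk; all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_by(rows, key):
    """Group a list of dict rows by the value of one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def transform(groups, column, func):
    """Same number of rows out as in: func maps a group's values to new values."""
    out = []
    for rows in groups.values():
        values = func([r[column] for r in rows])
        out.extend({**r, column: v} for r, v in zip(rows, values))
    return out


def aggregate(groups, column, func):
    """One value per group."""
    return {k: func([r[column] for r in rows]) for k, rows in groups.items()}


def apply(groups, func):
    """Arbitrary result per group: func sees all rows of the group."""
    return {k: func(rows) for k, rows in groups.items()}


def demean(values):
    m = mean(values)
    return [v - m for v in values]


rows = [
    {"user": "a", "price": 10000},
    {"user": "a", "price": 20000},
    {"user": "b", "price": 15000},
]
groups = group_by(rows, "user")

centered = transform(groups, "price", demean)
avg_price = aggregate(groups, "price", mean)
cheapest = apply(groups, lambda rs: min(rs, key=lambda r: r["price"]))
print(centered, avg_price, cheapest)
```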
30. 30
PySpark for fast analysis and machine learning
Spark: fast and general engine for large-scale data processing
Python + Spark = PySpark
31. 31
Conversion Example of User Preferences
Hive:
• 2483 lines of code
• Jinja2 to generate SQL queries
• Temporary tables for performance
• Runtime 5-10h
• Logic hard to understand at times
Spark:
• 1745 lines of code
• programmatic definition of queries
• No temporary tables needed
• Runtime 1-2 h
• Quite easy to understand
35. 35
PySpark & Pandas
Vectorized UDFs for Spark 2.3:
§ built on top of Apache Arrow,
§ avoid high serialization and invocation overhead,
§ allow row-at-a-time UDFs and cumulative UDAFs,
§ as flexible as Pandas' apply
Source: databricks blog
37. 37
But what if Spark < 2.3?
It's possible to write flexible UD(A)Fs by
• using RDD functionality, df.rdd.mapPartitions(my_func)
• converting low-level Row objects to a Pandas dataframe
• wrapping everything into a nice decorator
Detailed information under:
https://www.inovex.de/blog/efficient-udafs-with-pyspark/
39. 39
Concept
§ create a local environment based on wheels,
§ upload the unpacked wheels to HDFS,
§ read and distribute these Python packages from the Spark driver to the executors with sc.addFile,
§ use the packages on the executors, e.g. in a UDF.
Detailed information under:
https://www.inovex.de/blog/managing-isolated-environments-with-pyspark/