"Recommendation systems. A look from inside" — Lucas Bernardi — Principal Data Scientist at Booking.com.
The slides present some of his work at Booking.com, the world's largest online travel resource, and cover:
The danger of blindly optimizing offline metrics and the importance of experimenting;
The importance of high-performance models and how to build them;
The importance of UX design, with examples;
The risks of using integrated models;
What an extended recommender system is and why we need it.
2. ● 28+ million reported listings
● 5.6+ million are homes, apartments and other unique places to stay
● 141+ thousand destinations
● 1.5+ million room nights/day
● Terabytes of data every day
● 200+ Machine Learning Models Deployed
Mission to empower people to experience the world
3. Recommender Systems.
● Accommodations: Default Ranking, Click History, Similar Accommodations, Different Accommodations
● Dates: Alternative Dates, No Dates, Available Dates
● Destinations: Plain Recommendations, Autocomplete, Similar Destinations, Multi-city Trip Destinations, Nearby Destinations
● More: Filters, Districts, Room Types, Policies
4. Recommender Systems are Complex.
● Target Item: What do we want to recommend?
● Audience: Who are we recommending to? When?
● Framing: What is the role of the recs?
● KPI: How do we measure effectiveness?
● Data: What data are we going to use?
● Feedback: How do we define user satisfaction?
5. Recommender Systems are Hard.
● Latency: Compute recs in less than 10ms
● Availability: Hotels run out of rooms
● Cold Start: New Users, New Hotels, New Types, Every day
● Sparsity: Most people only travel once a year
● Complexity: Items are related to each other
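The availability and cold-start constraints listed on this slide can be illustrated with a minimal sketch. It is not taken from the talk: the function `recommend`, its parameters, and the hotel ids are all hypothetical, and a popularity ranking is assumed as the fallback for users with no history.

```python
from typing import Dict, List, Set


def recommend(
    user_id: str,
    personalized: Dict[str, List[str]],  # hypothetical: user id -> ranked hotel ids
    popular: List[str],                  # hypothetical popularity ranking (fallback)
    available: Set[str],                 # hotels that still have rooms
    k: int = 3,
) -> List[str]:
    """Return up to k available hotels, backfilling with popular ones."""
    # Cold start: an unseen user has no personalized list, so we start empty.
    ranked = personalized.get(user_id, [])
    # Backfill with popular hotels, keeping order and dropping duplicates.
    candidates = list(dict.fromkeys(ranked + popular))
    # Availability: never recommend a hotel that has run out of rooms.
    return [hotel for hotel in candidates if hotel in available][:k]


personalized = {"u1": ["h3", "h1", "h7"]}
popular = ["h1", "h2", "h3", "h4"]
available = {"h1", "h2", "h4", "h7"}

known_user_recs = recommend("u1", personalized, popular, available)
new_user_recs = recommend("u_new", personalized, popular, available)
```

In this toy setup the known user loses "h3" because it is sold out, and the new user receives only popular, available hotels.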
6. Offline metrics are just a Health Check.
[Figure: Offline Metric vs. KPI, Relative Improvements]
8. Offline metrics are just a Health Check.
[Figure: Offline Metric vs. KPI, Relative Improvements]
● Selection Bias: Models are trained using a non-random sample of the population of interest
● Feedback Loops: Our Recommender System changes the system
● Non Stationarity: Suddenly everyone goes to Russia
It works in my computer but
● Constraints: Hotels run out of rooms
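The selection-bias point can be made concrete with a small numeric sketch. The data below is invented for illustration, and the correction shown (reweighting per-segment rates by the segment's share of the real population, often called post-stratification) is one common technique, not necessarily what the speaker uses.

```python
# Hypothetical population: 20% frequent travellers, 80% occasional ones.
population_share = {"frequent": 0.2, "occasional": 0.8}

# Hypothetical logged data as (segment, converted) pairs. The log is a
# non-random sample: frequent travellers are heavily over-represented.
logged = (
    [("frequent", 1)] * 60
    + [("frequent", 0)] * 20
    + [("occasional", 1)] * 5
    + [("occasional", 0)] * 15
)

# Naive estimate: treat the log as if it were the population of interest.
naive_rate = sum(converted for _, converted in logged) / len(logged)

# Conversion rate observed within each segment.
segment_rate = {}
for segment in population_share:
    outcomes = [converted for seg, converted in logged if seg == segment]
    segment_rate[segment] = sum(outcomes) / len(outcomes)

# Reweighted estimate: combine segment rates using the true population shares.
reweighted_rate = sum(
    population_share[segment] * segment_rate[segment]
    for segment in population_share
)
```

With these made-up numbers the naive estimate is far more optimistic than the reweighted one, which is the kind of gap that makes an offline metric an unreliable proxy for the KPI.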
9. Offline metrics are just a Health Check.
[Figure: Offline Metric vs. KPI, Relative Improvements]
Run Experiments:
● Evaluate Systems as a whole: Including the model, the UI, and the audience
● Expose wrong intuitions: More clicks is better, is it?
● Show Improvement Direction: Analysing experiment results we can get concrete ideas about what to do next
● Discover Causal Effects: Experiments are the ultimate piece of evidence for causality
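As a rough illustration of what analysing an experiment can look like, the sketch below compares conversion rates of a control and a treatment group with a two-proportion z-test. The slides do not say which statistical test is used, so the test choice, the function name, and the numbers are assumptions.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conv_a: int, n_a: int, conv_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 10,000 users per group.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
```

A positive `z` with a small `p_value` suggests the treatment converts better than control; the same function can be applied to any business metric that is a rate.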
11. Latency is Critical.
[Figure: Latency vs. A very Important Commercial Metric and Another very Important Commercial Metric]
● Precomputed vs Online
● Sparse vs Dense
● Black Box vs White Box
● Client Side vs Server Side
● Work closely with Engineers
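The "Precomputed vs Online" trade-off can be sketched as follows. This is an illustrative toy, not the speaker's system: the embeddings, item ids, and function names are hypothetical. Online scoring does the work at request time, while the precomputed variant moves that work to an offline batch job and serves a plain lookup.

```python
from typing import Dict, List

# Hypothetical dense item embeddings.
EMBEDDINGS: Dict[str, List[float]] = {
    "h1": [1.0, 0.0],
    "h2": [0.9, 0.1],
    "h3": [0.0, 1.0],
    "h4": [0.1, 0.9],
}


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def similar_online(item: str, k: int = 1) -> List[str]:
    """Online: score every other item at request time (cost grows with catalog size)."""
    query = EMBEDDINGS[item]
    others = [other for other in EMBEDDINGS if other != item]
    others.sort(key=lambda other: dot(query, EMBEDDINGS[other]), reverse=True)
    return others[:k]


# Precomputed: run the expensive scoring once, offline, for every item.
PRECOMPUTED: Dict[str, List[str]] = {
    item: similar_online(item, k=2) for item in EMBEDDINGS
}


def similar_precomputed(item: str, k: int = 1) -> List[str]:
    """Precomputed: serving is a dictionary lookup, but results can go stale."""
    return PRECOMPUTED[item][:k]
```

The lookup is cheap at serving time but cannot react to context that is only known at request time, which is one reason the choice has to be made together with engineers.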
12. Latency is Critical.
[Figure: Latency vs. A very Important Commercial Metric and Another very Important Commercial Metric]
● Decouple Training from Prediction
● Centralized Repository of Machine Learned Models
● Every model runs within latency constraints, or does not run at all
● Wide range of model packaging options
● Centralized Monitoring of the system as a whole
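One way to read "every model runs within latency constraints, or does not run at all" is a hard time budget with a fallback. The sketch below assumes that reading; `predict_within_budget`, the toy models, and the budget values are hypothetical and do not describe Booking.com's actual serving infrastructure.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, List

# Shared worker pool; a real service would size and monitor this carefully.
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def predict_within_budget(
    model: Callable[[str], List[str]],
    user_id: str,
    fallback: List[str],
    budget_s: float,
) -> List[str]:
    """Return the model's output if it arrives within budget_s, else the fallback."""
    future = _EXECUTOR.submit(model, user_id)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        # Best effort only: a thread that is already running cannot be interrupted.
        future.cancel()
        return fallback


def fast_model(user_id: str) -> List[str]:
    return ["h1", "h2"]


def slow_model(user_id: str) -> List[str]:
    time.sleep(0.5)  # simulates a model that blows the latency budget
    return ["h9"]


default_recs = ["popular_1", "popular_2"]
fast_result = predict_within_budget(fast_model, "u1", default_recs, budget_s=0.05)
slow_result = predict_within_budget(slow_model, "u1", default_recs, budget_s=0.05)
```

Here the fast model's output is served, while the slow model is replaced by the default recommendations, so the page never waits longer than the budget.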