In this talk at the Netflix Machine Learning Platform Meetup on 12 Sep 2019, Fernando Amat and Elliot Chow from Netflix discuss the bandit infrastructure behind personalized recommendations.
3. Recommendations at Netflix
● Personalized Homepage for each member
● Goal: quickly help members find content they love
● Challenge:
○ 150M+ members in 190 countries
○ New content added daily
● Recommendations Valued at: $1B*
*Carlos A. Gomez-Uribe, Neil Hunt: The Netflix Recommender System: Algorithms, Business Value, and Innovation. ACM Trans. Management Inf. Syst. 6(4): 13:1-13:19 (2016)
9. Evidence personalization
● Multiple choices for each (show, evidence) pair
● For each title, hard to predict what resonates before launch
● Taste can change over time
11. Bandit setup
● Generic framework to obtain unbiased training data
● Vast literature (beyond the scope of this talk)
● Every session is a mini A/B test
[Diagram: each session shows one of two options (probabilities p and 1 - p) and the reward is observed]
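As a rough illustration of the "every session is a mini A/B test" idea, here is a minimal sketch (not from the talk; the treatment names and structure are assumed) of choosing between two treatments with probabilities p and 1 - p, and logging the propensity of whichever one was shown so the sample can be debiased later:

```scala
import scala.util.Random

// Illustrative only: one session chooses treatment A with probability p,
// otherwise treatment B, and records the propensity of whatever was shown
// so the eventual (treatment, reward) sample can be debiased later.
def miniAbTest(p: Double, rng: Random = new Random()): (String, Double) =
  if (rng.nextDouble() < p) ("treatment_A", p) else ("treatment_B", 1.0 - p)
```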
12. Bandit setup for evidence
● Goal: for each mini A/B test collect a sample with
○ Context x
○ Selected action (or treatment) a
○ Propensity p
○ Set of possible actions s
○ Reward r
● Once you have this, very similar to other ML approaches
● Obtaining this data is the hard part infrastructure-wise
(Elliot’s section)
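As a rough sketch of what one such logged sample could look like (field names and types here are assumptions for illustration, not the talk's actual schema):

```scala
// Hypothetical shape of one logged mini-A/B-test sample, mirroring the
// (x, a, s, r, p) tuple listed above; field names and types are illustrative.
case class BanditSample(
  context: Map[String, Double],   // x: member/session features
  action: String,                 // a: selected action (treatment), e.g. an image id
  candidates: Seq[String],        // s: the set of possible actions
  reward: Double,                 // r: e.g. 1.0 if the member played the title
  propensity: Double              // p: probability the policy assigned to `action`
)
```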
13. System requirements
● Scalable: at Netflix’s global size
● Generic: common policy (or model) and offline metrics API
● Flexible:
○ Each recommendation problem has a different attribution, reward, and context definition
○ Easy to add new canvases to the bandit system (image,
video, text, cards, etc)
14. Policy API
● class Slate(List[Item] items)
○ Vector of ids to recreate a composition
(“row_10”, “img_124”, “img_1037”,...,
“synopsis_32”, “card_101”)
[Diagram: homepage canvas with its items numbered 1-9, illustrating one slate composition]
15. Policy API
● Slate select(List[List[Item]] items)
○ Given a list of possible slates, return one of them
(explore-exploit trade-off)
● Map[Slate, Double] propensity(List[List[Item]] items)
○ Return propensity of each possible slate to debias data
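A minimal sketch of what this Policy API could look like in Scala; the Slate, select, and propensity names follow the slides, while the Item fields and the uniform-random example policy are assumptions for illustration:

```scala
import scala.util.Random

case class Item(id: String)          // e.g. "img_124", "synopsis_32"
case class Slate(items: List[Item])  // vector of ids that recreates a composition

trait Policy {
  // Given the candidate slates, return one of them (explore-exploit trade-off).
  def select(candidates: List[List[Item]]): Slate

  // Return the propensity of each possible slate, used to debias the logged data.
  def propensity(candidates: List[List[Item]]): Map[Slate, Double]
}

// Example: pure exploration, where every candidate slate is equally likely,
// so each one has propensity 1 / |candidates|.
class UniformPolicy(rng: Random = new Random()) extends Policy {
  override def select(candidates: List[List[Item]]): Slate =
    Slate(candidates(rng.nextInt(candidates.size)))

  override def propensity(candidates: List[List[Item]]): Map[Slate, Double] =
    candidates.map(c => Slate(c) -> 1.0 / candidates.size).toMap
}
```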
16. Offline policy evaluation
● Unbiased offline evaluation from logged samples
● Inverse Propensity Scoring (IPS), Doubly Robust, etc
[Example table: six users with their logged stochastic treatment, binary play reward, and the new policy's assignment; IPS estimate E[Reward] = 2/3]
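For reference, the standard IPS estimator underlying this example (the textbook formula, not anything Netflix-specific): given logged samples (x_i, a_i, s_i, r_i, p_i) and a new deterministic policy pi,

```latex
\hat{V}_{\mathrm{IPS}}(\pi) \;=\; \frac{1}{n} \sum_{i=1}^{n}
  \frac{\mathbb{1}\{\pi(x_i) = a_i\}}{p_i}\, r_i
```

Only the sessions where the new policy agrees with the logged treatment contribute, each weighted by the inverse of its logged propensity.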
17. Offline policy evaluation system
● Input:
○ Logged sample (x, a, s, r, p)
○ Trained policy
● Output:
○ Metric (IPS, SNIPS, DM, DR, RECAP, etc)
● Observations:
○ Trivially parallelizable (decomposes per sample)
○ Each sample needs to consider all possible candidates
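A rough sketch of why this decomposes per sample, reusing the hypothetical BanditSample shape from the earlier sketch (the deck does not show the real metric implementations):

```scala
// Per-sample IPS term: the reward counts only if the new policy would have
// chosen the logged action, re-weighted by the inverse logged propensity.
def ipsTerm(sample: BanditSample, newPolicyChoice: String): Double =
  if (newPolicyChoice == sample.action) sample.reward / sample.propensity else 0.0

// The estimate is just a mean of per-sample terms, so it parallelizes
// trivially (e.g. a map + average over a Spark dataset of logged samples).
def ips(samples: Seq[BanditSample], newPolicy: BanditSample => String): Double =
  samples.map(s => ipsTerm(s, newPolicy(s))).sum / samples.size
```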
19. Actions taken by a Netflix member
- Log-in
- Search for “stra”
- Play Stranger Things
...
Data:
Member Activity
20. Describes Netflix’s desire to take a specific action
- Recommend Stranger Things
- Display Image
...
Data:
Intent-to-Treat (ITT)
21. The actual experience as seen by the Netflix member
- Showed Stranger Things on Home Page
- Displayed Image
...
Data:
Treatment
22. [Timeline diagram: member actions (Log In, Home Page, Play Stranger Things, Post-Play) interleaved with intent-to-treat and treatment events, e.g. "13 Reasons Why on Popular on Netflix", "Stranger Things on My List"]
24. “Closing the Loop”: Join Intent-to-Treat with Treatment
- Did the intent-to-treat take effect as desired?
- Which policy was used?
- What were the propensities?
- What features were used?
29. Real-time Processing with Flink
Process member activity, treatment, and intent-to-treat events in real-time
- Join intent-to-treat and treatment events
- Prepare data in a format amenable to reward computation
- Kafka and Hive outputs
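A minimal sketch of what the intent-to-treat/treatment join could look like with Flink's interval join; the event types, field names, and the 30-minute join window are assumptions rather than the actual pipeline (which would also need event-time timestamps and watermarks assigned upstream):

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction
import org.apache.flink.util.Collector

// Hypothetical event shapes; only the shared request id matters for the join.
case class IttEvent(requestId: String, policy: String, propensity: Double)
case class TreatmentEvent(requestId: String, shownItemId: String)
case class JoinedSample(requestId: String, policy: String,
                        propensity: Double, shownItemId: String)

def joinIttWithTreatment(itt: DataStream[IttEvent],
                         treatment: DataStream[TreatmentEvent]): DataStream[JoinedSample] =
  itt.keyBy(_.requestId)
    .intervalJoin(treatment.keyBy(_.requestId))
    .between(Time.minutes(-30), Time.minutes(30)) // tolerate out-of-order delivery
    .process(new ProcessJoinFunction[IttEvent, TreatmentEvent, JoinedSample] {
      override def processElement(
          left: IttEvent,
          right: TreatmentEvent,
          ctx: ProcessJoinFunction[IttEvent, TreatmentEvent, JoinedSample]#Context,
          out: Collector[JoinedSample]): Unit =
        out.collect(JoinedSample(left.requestId, left.policy,
                                 left.propensity, right.shownItemId))
    })
```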
30. Spark Client for Attribution and Reward Computation
Provide flexible processing of the joined member activity, treatment, and intent-to-treat data
- Events sorted by time
- Unbounded windowing support
- Optimizations for common access patterns
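A rough sketch of the kind of processing this client enables, assuming a hypothetical joined-events table and a simple binary "played after treatment" reward; the actual attribution logic, table names, and schemas are not from the deck:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema for the joined ITT/treatment/member-activity events.
case class JoinedEvent(ittId: String, eventType: String, eventTime: Long)

object RewardAttributionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("reward-attribution").getOrCreate()
    import spark.implicits._

    val events = spark.table("bandit_joined_events").as[JoinedEvent]

    // Attribute a binary reward to each intent-to-treat: 1.0 if a play event
    // happened any time after the treatment was shown (unbounded window).
    val rewards = events
      .groupByKey(_.ittId)
      .mapGroups { (ittId, it) =>
        val sorted = it.toSeq.sortBy(_.eventTime)      // events sorted by time
        val shownAt = sorted.collectFirst { case e if e.eventType == "treatment" => e.eventTime }
        val played = shownAt.exists(t =>
          sorted.exists(e => e.eventType == "play" && e.eventTime >= t))
        (ittId, if (played) 1.0 else 0.0)
      }
      .toDF("ittId", "reward")

    rewards.write.mode("overwrite").saveAsTable("bandit_rewards")
  }
}
```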
Mention that the $1B study is from 2016; it is just an approximation.
One aspect of recommendations is that once you have selected a title for the user, you want to explain to them why the system thinks it is a good title for them.
This is especially important for Netflix Originals, where there is almost no prior knowledge.
For example, I go to my home page and see this title recommended in the big billboard. OK, it definitely seems like a romance. But is it also a comedy? Or more of a drama? Are any of my favourite actors/actresses in it? Do I have time to watch it?
Imagine a recommendation homepage without any evidence… this is how it would look. You would probably only click on things you already know really well (for example, play The Office S2:E3 for the hundredth time).
More detail about Data and Infra side
The treatment can differ from the ITT: fallbacks, business rules overriding it.
Timeline view of when this data is produced/by whom
Connect these different discrete pieces of data back together to update our policies
Allows us to answer important questions
CTR, abandonment, …
As soon as possible, so that policies stay up to date and reactive to changes.
Cool stuff Fernando mentioned before.
As interest in bandits grew organically, we learned from every individual use case and built a number of components to make things easier and more reusable.
Instead of each application logging to different places in different formats…
Ingesting new data becomes easier because the format is uniform.
Make data available ASAP
Library to access data in batch. Provides events in sorted order, unbounded windowing, …
In addition to library access, we found common patterns in the way rewards are computed. We templatize these so we can materialize the data for our consumers regularly, enabling simple SQL access with no job scheduling/operations work involved.
I’ll highlight a few high-level challenges we faced
Netflix is powered by microservices. All requests from the devices go through a single layer called Edge, which fans out to call many microservices. This makes joining ITT and treatment a bit challenging.
Let’s say we are collecting data for an algorithm used to select the best image for a video.
Simple case: ID1 is minted and threaded through subsequent requests for tracing; it is logged, passed back to the device, and then logged with the treatment. Join directly.
Many cases involve caching/precompute: the computation service is unaware of how its output will be used when logging. We cannot connect T2 and ITT1. We could approximate with time, but that is not accurate.
As part of standardization we introduced a new thing to be logged: id pairs. Each service that uses data (as opposed to computing it) logs id pairs, so we can resolve the ids logged with the treatment and ITT data.
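A toy illustration of how those id pairs let us connect the two sides of the join (the ids, field names, and in-memory maps are made up; in reality this happens inside the streaming join):

```scala
// The service that *uses* a precomputed result logs a pair linking the
// computation id (which the ITT was logged under) to the request id
// (which the treatment was logged under). Names here are hypothetical.
case class IdPair(computationId: String, requestId: String)

// Resolve treatments back to their intent-to-treat via the id pairs:
// a two-step join instead of a direct id match.
def resolve(pairs: Seq[IdPair],
            ittByComputationId: Map[String, String],    // computationId -> ITT record
            treatmentByRequestId: Map[String, String]   // requestId -> treatment record
           ): Seq[(String, String)] =
  pairs.flatMap { p =>
    for {
      itt       <- ittByComputationId.get(p.computationId)
      treatment <- treatmentByRequestId.get(p.requestId)
    } yield (itt, treatment)
  }
```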
Implemented as stream processing - typical challenges faced
A lot of IDs and a lot of state (terabytes) to deal with.
The question you probably want to ask…
Only scratched the surface: many more parts of the experience to personalize using bandits. The infra will make it much simpler...