Online Tuning of Large Scale Recommendation Systems

Team: Yafei Wang, Yunbo Ouyang, Kinjal Basu, Ajith Muralidharan, Shaunak Chatterjee, Shipeng Yu

Session Goal
• Motivate the need for tuning parameters
• Notifications (use-case)
• Current approaches
• Gaussian Processes primer
• Modelling various online metrics using Gaussian Processes
• Using Thompson sampling to find the optimal solution
• Other use-cases at LinkedIn
• Summary

Problem
• Balancing multiple metrics is a core problem at LinkedIn.
• Notifications/Email
• Maximize sessions
• Minimize send volume and app disablement rate
• PYMK (People You May Know)
• Increase invitation accept rate and engagement
• Keep invitations below a certain threshold

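Problem
Figure: m scoring models. Each model takes features f1, f2, …, fn, applies an input transform T1(f1), T1(f2), …, T1(fn), and produces a score; Score(1), …, Score(m) are combined with weights w1, …, wm into the final score:
$\text{Final Score} = w_1 s_1 + w_2 s_2 + \dots + w_m s_m$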
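The final score is a weighted blend of the per-model scores, so tuning means choosing the weights w1, …, wm. A minimal Python sketch of that combination; the function name final_score and the example values are hypothetical:

```python
from typing import Sequence


def final_score(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine m model scores s_1..s_m into w_1*s_1 + ... + w_m*s_m."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per model score")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical example with m = 3 models.
print(final_score([0.2, 0.5, 0.1], [1.0, 0.4, 2.0]))
```

The scoring code itself is trivial; the hard part, which the rest of the deck addresses, is that the effect of each weight only shows up in online metrics.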

LinkedIn Connects the World's Professionals
Members stay updated about the activities of their connections through the newsfeed

Activity-Based Notifications
Non-transactional messages, time-sensitive content
Goal: drive member engagement while creating delightful experiences
Feeds & Events Notification

Mobile App Uses Notifications to Inform
Figure: Actor, Feeds, Recipients, and Response, with notification channels Badging, Push, and InApp.

Notifications Problem Setup
Probability of a click (member, content, recipient): pClick
Probability of a session: pVisit
Final decision: send the notification if pClick + alpha * pVisit > T
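Because the decision is a threshold on a linear blend, the pair (alpha, T) directly controls how many notifications go out. A minimal sketch, assuming pClick and pVisit are already produced by upstream models; the helpers should_send and send_volume and the candidate probabilities are hypothetical:

```python
from typing import Iterable, Tuple


def should_send(p_click: float, p_visit: float, alpha: float, threshold: float) -> bool:
    """Send iff pClick + alpha * pVisit > T."""
    return p_click + alpha * p_visit > threshold


def send_volume(candidates: Iterable[Tuple[float, float]], alpha: float, threshold: float) -> int:
    """Count how many (pClick, pVisit) candidates pass the send rule."""
    return sum(should_send(pc, pv, alpha, threshold) for pc, pv in candidates)


# Hypothetical (pClick, pVisit) pairs for three candidate notifications.
candidates = [(0.10, 0.30), (0.02, 0.05), (0.05, 0.40)]
print(send_volume(candidates, alpha=0.5, threshold=0.2))
print(send_volume(candidates, alpha=0.5, threshold=0.3))
```

Raising T lowers send volume, while raising alpha shifts sends toward notifications likely to start a session; the online effect of both on CTR and Sends is what has to be tuned.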

Notifications Optimization Problem
• The weight vector x = (alpha, T) needs to be tuned.
• We are interested in solving:
Here CTR and Sends are online metrics and are different from the
utility models.

Current Approaches
• Online method: we try several choices of weights.
• Launch A/B experiments with different values of the weights and monitor the progress of the experiments.
• Searching for the optimal weights can take 1-2 months.
• Pain points
• Extremely poor model iteration velocity
• Hampers developer productivity

Proposed Solution
Use Bayesian Optimization via Thompson Sampling.
• Remove the human from the loop: a fully automatic process to find the optimal parameters.
• Drastically improves developer productivity.

Primer on Gaussian Processes
• Given training data $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$ and a new input $x_t$, find $y_t$.
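For reference, the standard Gaussian process answer under a zero-mean prior with kernel k and observation-noise variance $\sigma^2$ is a Gaussian predictive distribution with mean $\mu_t = k_t^\top (K + \sigma^2 I)^{-1} y$ and variance $\sigma_t^2 = k(x_t, x_t) - k_t^\top (K + \sigma^2 I)^{-1} k_t$, where $K_{ij} = k(x_i, x_j)$ and $(k_t)_i = k(x_i, x_t)$. A minimal NumPy sketch; the squared-exponential kernel and the fixed values for noise, length_scale, and variance are assumptions for illustration:

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, noise=1e-2, length_scale=1.0, variance=1.0):
    """Predictive mean and covariance of f(x_test) under a zero-mean GP prior.

    noise is the observation-noise variance sigma^2 added to the diagonal.
    """
    k_tt = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test, length_scale, variance)
    k_ss = rbf_kernel(x_test, x_test, length_scale, variance)
    chol = np.linalg.cholesky(k_tt)  # K + sigma^2 I = L L^T
    # (K + sigma^2 I)^{-1} y via two triangular systems.
    coef = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_ts.T @ coef
    v = np.linalg.solve(chol, k_ts)
    cov = k_ss - v.T @ v
    return mean, cov


x = np.array([[0.0], [1.0], [2.0]])
y = np.sin(x).ravel()
mean, cov = gp_posterior(x, y, np.array([[1.5], [10.0]]))
print(mean, np.diag(cov))
```

Near the training inputs the predictive variance is small; far away (x = 10 here) the mean falls back to the prior mean 0 and the variance to the prior variance.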

Model
• Let denote whether the i-th member, for the j-th notification served with parameter x, performs action k. Here k = clicking the notification.
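Each indicator is a binary outcome. One simple way to turn these indicators into per-parameter observations y that a Gaussian process can be fit to is to average them for each served value of x. A sketch under that assumption; the record layout (x, indicator) and the function name empirical_rates are hypothetical, and the original model may instead place a likelihood directly on the binary outcomes:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Params = Tuple[float, float]  # hypothetical (alpha, T) pair


def empirical_rates(events: Iterable[Tuple[Params, int]]) -> Dict[Params, Tuple[float, int]]:
    """Map each parameter x to (fraction of served notifications with action k, count served)."""
    acted = defaultdict(int)
    served = defaultdict(int)
    for x, indicator in events:
        served[x] += 1
        acted[x] += indicator
    return {x: (acted[x] / served[x], served[x]) for x in served}


# Hypothetical click indicators for notifications served under two parameter values.
events = [((0.5, 0.2), 1), ((0.5, 0.2), 0), ((1.0, 0.3), 1)]
print(empirical_rates(events))
```

Keeping the count alongside the rate matters because a rate estimated from few notifications is noisy, which is one source of the metric variance the summary warns about.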

Thompson Sampling Algorithm
• Place a Gaussian process prior on each utility, i.e. Clicks and Sends.
• The user provides a search region over (alpha, T) as input.
• Explore step (the number of exploration iterations is a user input and is use-case dependent)
• Randomly sample parameter values and observe the metrics (Clicks, Sends). Collect training data (x1, y1), (x2, y2), …, (xn, yn).
• Exploit step
• Fit the Gaussian process: learn the kernel parameters by maximizing P(y|x)
• Sample functions from the posterior distribution
• Get the next distribution over the parameters (alpha, T) by optimizing the objective on the sampled functions
• Repeat the explore-exploit loop until convergence
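The loop above can be sketched in Python with NumPy. This is an illustration under several assumptions, not the production system: the simulator simulate, the grid, the objective (maximize Clicks subject to Sends ≤ budget), and all constants are hypothetical; the kernel parameters are fixed instead of learned by maximizing P(y|x); and one x is drawn per round, whereas a live system would more likely split traffic according to the whole distribution.

```python
import numpy as np


def rbf(a, b, length_scale=0.3):
    """Squared-exponential kernel on rows of a (n, d) and b (m, d), unit signal variance."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def sample_posterior(x_obs, y_obs, grid, n_samples, rng, noise=1e-2):
    """Draw n_samples functions on grid from a GP posterior (prior mean = mean of y_obs)."""
    y_mean = y_obs.mean()
    k_oo = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_og = rbf(x_obs, grid)
    sol = np.linalg.solve(k_oo, np.column_stack([y_obs - y_mean, k_og]))
    mean = y_mean + k_og.T @ sol[:, 0]
    cov = rbf(grid, grid) - k_og.T @ sol[:, 1:]
    cov = 0.5 * (cov + cov.T) + 1e-8 * np.eye(len(grid))  # symmetrize, add jitter
    return rng.multivariate_normal(mean, cov, size=n_samples, check_valid="ignore")


def simulate(x, rng):
    """Hypothetical noisy online metrics for x = (alpha, T) in [0, 1]^2."""
    alpha, t = x[:, 0], x[:, 1]
    clicks = np.sin(3 * alpha) * (1 - t) + 0.05 * rng.standard_normal(len(x))
    sends = (1 - t) + 0.2 * alpha + 0.05 * rng.standard_normal(len(x))
    return clicks, sends


def thompson_sampling(n_explore=10, n_rounds=15, n_samples=50, budget=0.6, seed=0):
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 1.0, 15)
    grid = np.array([(a, t) for a in axis for t in axis])  # user-supplied search region
    # Explore step: observe metrics at randomly chosen parameter values.
    x_obs = grid[rng.choice(len(grid), size=n_explore, replace=False)]
    clicks, sends = simulate(x_obs, rng)
    probs = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_rounds):
        # Exploit step: one posterior sample per metric, per Thompson draw.
        f_clicks = sample_posterior(x_obs, clicks, grid, n_samples, rng)
        f_sends = sample_posterior(x_obs, sends, grid, n_samples, rng)
        # Optimize the hypothetical objective on each sampled pair of functions.
        # If a sample has no feasible point, argmax falls back to grid index 0.
        feasible = np.where(f_sends <= budget, f_clicks, -np.inf)
        best = feasible.argmax(axis=1)
        # Frequencies of the sampled optima form the next distribution over x.
        probs = np.bincount(best, minlength=len(grid)) / n_samples
        x_next = grid[rng.choice(len(grid), p=probs)][None, :]
        c, s = simulate(x_next, rng)
        x_obs = np.vstack([x_obs, x_next])
        clicks = np.append(clicks, c)
        sends = np.append(sends, s)
    return x_obs, clicks, sends, probs


x_obs, clicks, sends, probs = thompson_sampling()
print("last parameters tried:", x_obs[-1])
```

Sampling whole functions, rather than only using the posterior mean, is what makes this Thompson sampling: regions that are uncertain occasionally win a draw and get explored, while regions that look good win most draws.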

Overall Model Tuning Architecture

Plots (Synthetic Data)
Figure panels: Training data; Mean of the posterior

Results
Here the red line indicates the data and the blue lines are samples from the posterior.

PYMK Problem
Setup
• Recommend members that have a high probability of connecting.
• Engage members through interactions with their connections (e.g. member engagement)

Detailed Setup - PYMK
The online metrics Accept, Invite and Engage are functions of the tuned weights, and we would like to solve the following optimization problem:

Summary
• Running A/B experiments to search for the optimal hyper-parameters can sometimes take 1-2 months.
• We have seen increased experimentation velocity for several use-cases, including Notifications, PYMK, ads and feed (search time reduced to 1-2 weeks).
• This method relies on a good search range for the weights that need to be tuned.
• High variance in the metrics may make convergence difficult.
• The technique is very generic and can be used to model non-linear functions.

Next Steps
• Time varying Gaussian Processes
• We assumed the metrics do not change day over day. In reality, weekend traffic differs from weekday traffic.
• Grey Box Optimization
• We assumed each metric in the optimization is a function of all the parameters.
• We may know which metrics are affected by only a subset of the parameters.
• This can help us converge faster.
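One possible way to encode "this metric depends only on a subset of the parameters" is to give that metric's Gaussian process a kernel that reads only the relevant coordinates, so the model has fewer effective dimensions to learn. A minimal sketch of such a kernel; the function name subset_rbf, the length scale, and the example assumption that Sends depends only on T are hypothetical, and this is not necessarily the approach planned here:

```python
import numpy as np


def subset_rbf(a, b, active_dims, length_scale=0.3):
    """Squared-exponential kernel that only looks at the coordinates listed in active_dims."""
    a_s, b_s = a[:, active_dims], b[:, active_dims]
    d2 = ((a_s[:, None, :] - b_s[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)


# Two settings of x = (alpha, T) that differ only in alpha.
x = np.array([[0.1, 0.5], [0.9, 0.5]])
# Hypothetical: if Sends depended only on T (column 1), these points are fully correlated.
print(subset_rbf(x, x, active_dims=[1]))
# Using both coordinates, the same points are only weakly correlated.
print(subset_rbf(x, x, active_dims=[0, 1]))
```

With the restricted kernel, an observation at one alpha informs the prediction at every alpha with the same T, which is where the faster convergence would come from.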
