Personalized list recommendation based on multi-armed bandit algorithms
1. Personalized List Recommendation based on Multi-armed Bandit Algorithms
Weiwen LIU
Computer Science & Engineering
Chinese University of Hong Kong
wwliu@cse.cuhk.edu.hk
2. wwliu, Term Presentation, Term 1
Content
o Background
• Existing Methods
• Multi-armed Bandits
• Dependency Click Model
o Algorithm
o Results
o Experiments
o Conclusion and Future Work
3. Background
o For users:
• How to discover interesting items (music, news, apps) among a large number of items.
o For companies:
• How to create economic opportunities.
• How to provide better personalized services.
4. Existing methods
o Content-based Method/Collaborative Filtering
• Pros: performs well when users have enough click or download records
• Cons: cold-start problem
o Context-based Method/Regression
• Pros: efficient and easy to implement
• Cons: lack of diversity
Exploration vs Exploitation?
5. Multi-armed bandits
o Rewards $x_{i,1}, x_{i,2}, \dots$ of machine $i$ are i.i.d. $\{0,1\}$-valued random variables
o An allocation policy prescribes which machine $I_t$ to play at time $t$ based on the realizations $x_{I_1,1}, \dots, x_{I_{t-1},t-1}$
o The target is to play as often as possible the machine with the largest expected reward
$$\mu^* = \max_{i=1,\dots,K} \mathbb{E}[x_i]$$
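To make the setting concrete, here is a minimal simulation sketch of a Bernoulli bandit. The slide does not name an allocation policy, so the classic UCB1 index is swapped in purely for illustration; the function name `ucb1`, the arm means and the horizon are all hypothetical.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Play a Bernoulli bandit with the UCB1 index; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k        # number of times each machine was played
    totals = [0.0] * k     # sum of observed rewards per machine
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # play every machine once to initialise
        else:
            # empirical mean plus an exploration bonus
            arm = max(
                range(k),
                key=lambda i: totals[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        reward = 1 if rng.random() < means[arm] else 0  # {0,1}-valued reward
        pulls[arm] += 1
        totals[arm] += reward
    return pulls

# Hypothetical machines; the last one has the largest expected reward.
pulls = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pulls)
```

Under these assumed means, most plays should go to the third machine, which is the behaviour the target on this slide asks for.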
6. Bandit Solutions
o Stochastic Bandits:
• Select items repeatedly and separately, one at a time
• Limitations: ignore the underlying relations between items; high computational cost
o Combinatorial Cascade Bandits:
• Select a set or sequence of arms
• Limitations: can only handle the single-click setting
7. Click Models
o Cascade Click Model:
• Stops when the first click occurs
• Can only model a single click
o Dependency Click Model:
• Introduces a set of termination parameters
• Can handle settings with multiple clicks
8. Dependency Click Model
o Allows the user to continue examining more items after a click.
o An extension of the Cascade Model (CM)
• Reduces to CM if all termination weights satisfy $\bar{v}(k) = 1$
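[Figure: DCM flowchart. Start → examine next item $a_k$ → attracted by the item? (probability $\bar{w}(a_k)$). If yes: would like to terminate? (probability $\bar{v}(k)$); if yes, stop satisfied. Otherwise: reached the end of the list? If yes, stop not satisfied; if no, examine the next item.]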
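The examination process on this slide can be sketched as a short simulation. This is a minimal sketch, not the author's code: the function name `simulate_dcm` and its arguments are hypothetical, and the attraction and termination probabilities are passed per position.

```python
import random

def simulate_dcm(attraction, termination, rng):
    """Simulate one user session under the Dependency Click Model.

    attraction[k]  : probability that the item at position k attracts the user
    termination[k] : probability that the user stops after a click at position k
    Returns (clicks, satisfied): a 0/1 click list and whether the user left satisfied.
    """
    clicks = [0] * len(attraction)
    for k, (w, v) in enumerate(zip(attraction, termination)):
        if rng.random() < w:          # attracted by the item, so it is clicked
            clicks[k] = 1
            if rng.random() < v:      # would like to terminate: leaves satisfied
                return clicks, True
    return clicks, False              # reached the end of the list, not satisfied

rng = random.Random(1)
clicks, satisfied = simulate_dcm([0.3, 0.6, 0.2, 0.5], [0.9, 0.7, 0.5, 0.3], rng)
print(clicks, satisfied)
```

Setting every termination probability to 1 makes the session stop at the first click, which is the Cascade Model special case mentioned on the slide.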
9. Problem Formulation
o Given the ground item set $E = \{1, \dots, L\}$, a contextual vector $x_{t,a} \in \mathbb{R}^d$ for each item $a \in E$ is known to the agent at time $t$.
o Attraction weight $\mathbf{w}_t \in \{0,1\}^E$
• $\mathbf{w}_t(a)$ is a $\bar{w}_t(a)$-biased Bernoulli random variable
• denotes whether the user is attracted by item $a$ or not
• the attraction weights $\{\mathbf{w}_t(a)\}_{t=1}^{n}$ are i.i.d.
o Termination weight $\mathbf{v}_t \in \{0,1\}^K$
• $\mathbf{v}_t(k)$ is a $\bar{v}(k)$-biased Bernoulli random variable
• denotes whether the user wants to stop examining the list
• only depends on the position $k$
• the termination weights $\{\mathbf{v}_t(k)\}_{t=1}^{n}$ are i.i.d.
Recommended list: $A_t = (a_1^t, \dots, a_K^t)$
Feedback: 010 ⋯ 100
10. Objective
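o The reward function is defined as
$$f(A, v, w) = 1 - \prod_{k=1}^{K} \bigl(1 - v(k)\, w(a_k)\bigr),$$
indicating that $f(A_t, \mathbf{v}_t, \mathbf{w}_t) = 1$ if the user clicks on an item, feels satisfied and terminates the examination.
o The pseudo-regret is defined as
$$\mathcal{R}(n) = \mathbb{E}\left[\sum_{t=1}^{n} \bigl(f(A_t^*, \mathbf{v}_t, \mathbf{w}_t) - f(A_t, \mathbf{v}_t, \mathbf{w}_t)\bigr)\right]$$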
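As a small worked example of the reward function on this slide, the sketch below evaluates $f$ for one realized session and for a pair of expected weights. The helper name `dcm_reward` and all numbers are hypothetical; `w[k]` is taken to be the weight of the item placed at position `k`.

```python
def dcm_reward(v, w):
    """f(A, v, w) = 1 - prod_k (1 - v[k] * w[k]).

    v[k] is the termination weight of position k and w[k] the attraction
    weight of the item shown at position k.
    """
    miss = 1.0
    for vk, wk in zip(v, w):
        miss *= 1.0 - vk * wk   # probability-style "no satisfying click at k"
    return 1.0 - miss

# Realized binary weights: the user clicks positions 1 and 2 and terminates at 2.
realized = dcm_reward(v=[0, 1, 0], w=[1, 1, 0])

# Expected weights (hypothetical): 1 - (1 - 0.2) * (1 - 0.1), roughly 0.28.
expected = dcm_reward(v=[0.5, 0.5], w=[0.4, 0.2])
print(realized, expected)
```

With binary weights the reward is 1 exactly when some position has both a click and a termination, matching the description of a satisfied user.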
11. Partial Knowledge
o The click sequence is the only feedback available to the agent
• The termination position is unobserved
• The reward is not revealed
[Figure: the click sequence 010011000 over positions 1 to 9, shown twice; the same clicks are consistent with reward = 1 and with reward = 0.]
Use the feedback up to the last click to update the model
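A minimal sketch of how the usable part of a click sequence could be extracted. It assumes that the update uses positions 1 through the last clicked position inclusive, and that a session with no click at all means every position was examined (in the DCM the user only leaves early after a click). The helper name `observed_prefix` is hypothetical.

```python
def observed_prefix(items, clicks):
    """Return (item, click) pairs for the positions the user certainly examined."""
    if 1 in clicks:
        # index of the last click, found by searching the reversed list
        last = len(clicks) - 1 - clicks[::-1].index(1)
    else:
        # no click: the user never terminated early, so the whole list was examined
        last = len(clicks) - 1
    return list(zip(items[: last + 1], clicks[: last + 1]))

# The click sequence from the slide, with hypothetical item labels.
items = list("abcdefghi")
clicks = [0, 1, 0, 0, 1, 1, 0, 0, 0]
pairs = observed_prefix(items, clicks)
print(pairs)
```

Positions 7 to 9 are dropped because the click pattern alone cannot tell whether the user examined them or left after the click at position 6.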
12. Proposed Model: attraction weight
o Assume the expected attraction weight $\bar{w}_t(a)$ follows
$$\bar{w}_t(a) = \mathbb{E}[\mathbf{w}_t(a) \mid \mathcal{H}_t] = \mu(\theta_*^\top x_{t,a})$$
o Use the generalized linear model as a flexible extension of the linear model
• Admits a wider range of distributions, e.g. Gaussian, binomial, Poisson, ...
Attracted or not?
13. Proposed Model: termination weight
o Due to the limited feedback, we assume the order of the expected termination weights is known
• For simplicity of explanation, assume
$$\bar{v}(1) \ge \dots \ge \bar{v}(K)$$
o The expected reward is maximized by placing the more attractive items at the higher positions.
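The ordering claim on this slide can be checked by brute force on a toy instance. This is only a sketch: the termination and attraction values are hypothetical, and the expected reward is computed as $1 - \prod_k (1 - \bar{v}(k)\,\bar{w}(a_k))$, which assumes the weights are independent.

```python
from itertools import permutations

def expected_reward(v_bar, w_bar):
    """1 - prod_k (1 - v_bar[k] * w_bar[k]) for one ordered list."""
    miss = 1.0
    for v, w in zip(v_bar, w_bar):
        miss *= 1.0 - v * w
    return 1.0 - miss

v_bar = [0.9, 0.6, 0.3]                            # hypothetical, decreasing in position
w_bar = {"a": 0.2, "b": 0.7, "c": 0.5, "d": 0.1}   # hypothetical expected attraction

# Exhaustive search over all ordered lists of K = 3 distinct items.
best = max(
    permutations(w_bar, 3),
    key=lambda A: expected_reward(v_bar, [w_bar[a] for a in A]),
)

# Greedy list: the three most attractive items, most attractive first.
greedy = tuple(sorted(w_bar, key=w_bar.get, reverse=True)[:3])
print(best, greedy)
```

On this instance the exhaustive search and the greedy ordering select the same list, so sorting by attraction is enough once the order of the termination weights is known.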
Terminate or not?
14. Proposed Model: parameter estimation
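o The model parameter $\theta$ can be estimated using MLE, by solving
$$\sum_{s=1}^{t} \sum_{k=1}^{C_s} \bigl(\mathbf{w}_s(a_k^s) - \mu(\theta^\top x_{s,a_k^s})\bigr)\, x_{s,a_k^s} = 0,$$
where $C_s$ is the position of the last click in round $s$.
o Upper Confidence Bound (UCB):
$$U_t(a) = \min\bigl\{\mu(\tilde{\theta}_{t-1}^\top x_{t,a}) + \rho(t-1)\, \|x_{t,a}\|_{V_{t-1}^{-1}},\ 1\bigr\},$$
where $\beta_t^a(\delta) = \rho(t)\, \|x_{t,a}\|_{V_t^{-1}}$.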
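A minimal sketch of the estimation step, assuming the logistic link $\mu(z) = 1/(1+e^{-z})$. The slides do not say which solver is used, so plain gradient ascent on the log-likelihood is swapped in here; its stationary point is exactly the estimating equation for $\theta$. The name `fit_glm`, the step size and the toy data are hypothetical.

```python
import math

def fit_glm(X, y, steps=2000, lr=0.5):
    """Solve sum_i (y_i - mu(theta^T x_i)) x_i = 0 by gradient ascent (logistic mu)."""
    d = len(X[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, w in zip(X, y):
            z = sum(t * xi for t, xi in zip(theta, x))
            resid = w - 1.0 / (1.0 + math.exp(-z))   # observed click minus prediction
            for j in range(d):
                grad[j] += resid * x[j]
        # step along the averaged score; it vanishes at the MLE
        theta = [t + lr * g / len(X) for t, g in zip(theta, grad)]
    return theta

# Toy data: one-hot features, so each coordinate converges to the logit
# of that feature's empirical click rate (0.5 and 0.75 here).
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1]]
y = [1, 0, 1, 1, 1, 0]
theta = fit_glm(X, y)
print(theta)
```

With perfectly separable data the MLE does not exist and this loop would keep growing $\theta$, so a practical version would likely add regularization.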
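Lemma: For any $t \ge 1$ and $a \in E$, denote
$$\beta_t^a(\delta) = \frac{2 k_\mu}{c_\mu}\, \|x_{t,a}\|_{V_t^{-1}} \sqrt{\log \frac{\bigl(1 + \frac{Kt}{\lambda d}\bigr)^{d}}{\delta^2}}.$$
For all $0 \le \delta \le 1$, with probability at least $1 - \delta$, it holds that
$$\bigl|\mu(\theta_*^\top x_{t,a}) - \mu(\tilde{\theta}_t^\top x_{t,a})\bigr| \le \beta_t^a(\delta), \quad \forall t \ge 1.$$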
15. Proposed Model: UCB
o Analyze the mean and a measure of uncertainty (variance) for each item
o Make decisions based on mean + variance
[Figure: items A, B and C plotted with their uncertainty on an axis with ticks 0, 0.2, 0.4.]
16. Proposed Model: UCB
o The confidence width $\rho(t)\,\|x_{t,a}\|_{V_t^{-1}}$ of an item typically shrinks as the item is observed more often
o The uncertainty of $A$ decreases after several time steps
o Automatically balances exploration and exploitation
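The shrinking uncertainty can be illustrated with the index $U_t(a) = \min\{\mu(\tilde{\theta}^\top x) + \rho\,\|x\|_{V^{-1}}, 1\}$. This is a sketch under stated assumptions: $\mu$ is the logistic function, $\rho$ is held at a hypothetical constant 0.3, $\tilde{\theta}$ is fixed at zero instead of being re-estimated, and $V^{-1}$ is maintained with the Sherman-Morrison rank-one update starting from $V = I$. All helper names are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def mat_vec(M, x):
    return [dot(row, x) for row in M]

def sherman_morrison(V_inv, x):
    """Return (V + x x^T)^{-1} from V^{-1}, for symmetric positive definite V."""
    Vx = mat_vec(V_inv, x)
    denom = 1.0 + dot(x, Vx)
    n = len(x)
    return [[V_inv[i][j] - Vx[i] * Vx[j] / denom for j in range(n)] for i in range(n)]

def ucb(theta, V_inv, x, rho):
    """min{ mu(theta^T x) + rho * ||x||_{V^{-1}}, 1 } with logistic mu."""
    mean = 1.0 / (1.0 + math.exp(-dot(theta, x)))
    width = rho * math.sqrt(dot(x, mat_vec(V_inv, x)))
    return min(mean + width, 1.0)

theta = [0.0, 0.0]                 # assumed estimate, kept fixed for illustration
V_inv = [[1.0, 0.0], [0.0, 1.0]]   # V = I (lambda = 1)
seen, unseen = [1.0, 0.0], [0.0, 1.0]

before = ucb(theta, V_inv, seen, rho=0.3)
for _ in range(3):                 # the "seen" direction is observed three times
    V_inv = sherman_morrison(V_inv, seen)
after = ucb(theta, V_inv, seen, rho=0.3)
other = ucb(theta, V_inv, unseen, rho=0.3)
print(before, after, other)
```

The index of the observed direction drops while the unobserved one keeps its full bonus, so a list built from the largest indices keeps exploring items it knows little about.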
[Figure: items A, B and C plotted with their uncertainty on an axis with ticks 0, 0.2, 0.4.]
17. Proposed Model: Algorithm
[Figure: algorithm loop with three steps: recommend based on UCB, estimate $\theta$, update statistics.]
18. Theoretical Results
o The regret upper bound is $O(d\sqrt{n}\log n)$, which depends linearly on the dimension $d$ of the feature space but not on the number $L$ of base arms.
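Theorem: If the reward function is given as $f(A, v, w) = 1 - \prod_{k=1}^{K} \bigl(1 - v(k)\, w(a_k)\bigr)$, then the cumulative regret $\mathcal{R}(n)$ of the proposed algorithm has the following bound,
$$\mathcal{R}(n) \le \frac{4 K \Delta_v k_\mu}{c_\mu p^*} \sqrt{d n (K+1) \log \frac{\bigl(1 + \frac{Kn}{\lambda d}\bigr)^{d}}{\delta^2}\, \log\Bigl(1 + \frac{Kn}{\lambda d}\Bigr)},$$
where $k_\mu$ is the Lipschitz constant of $\mu$ and $c_\mu = \inf \mu'$.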
19. Experimental Results
o Synthetic data
• $L = 200$, $K = 4$ and $d = 10$
• $\mu(x) = \frac{1}{1 + \exp(-x)}$
o GL-CDCM outperforms KL-DCM by 80.27% and Lin-CDCM by 49.04%.
20. Experimental Results
o Real-world data
• MovieLens 20M dataset
• $L = 200$, $K = 5$, $d = 100$
o The result of GL-CDCM is 5.69 times that of KL-DCM and 1.45 times that of Lin-CDCM
21. Conclusion
o Conclusion
• Formulate the DCM bandits problem
• Incorporate contextual information
• Make a weaker assumption on the expected attraction weight function
• Prove an upper bound on the regret
o Future work
• Prove a tighter bound
• Consider other practical click models
• Verify the effectiveness using more real-world datasets