ICML 2014 CLUB - Online Clustering of Bandits Poster, 31st ICML, JMLR

We introduce a novel algorithmic approach to content recommendation based on adaptive clustering of exploration-exploitation ("bandit") strategies. We provide a sharp regret analysis of this algorithm in a standard stochastic noise setting, demonstrate its scalability properties, and prove its effectiveness on a number of artificial and real-world datasets. Our experiments show a significant increase in prediction performance over state-of-the-art methods for bandit problems.


Online Clustering of Bandits
Claudio Gentile, Shuai Li: DiSTA, University of Insubria, Italy; Giovanni Zappella: Amazon Development Center Germany, Germany
claudio.gentile@uninsubria.it; shuaili.sli@gmail.com; zappella@amazon.com (work done while the author was a PhD student at the University of Milan)

Overview
• Novel algorithmic approach to content recommendation based on adaptive clustering of bandit strategies
• Relevant to group recommendation
• Relies on sequential clustering of users and deliberately avoids low-rank regularizations (scaling issues are our major concern)

The CLUB Algorithm
• n users, m << n clusters
• User profiles u_i, i = 1, …, n
• Cluster profiles u_j, j = 1, …, m; nodes i within cluster j share the same profile u_j
• One linear bandit per node and one linear bandit per cluster: node i hosts proxy w_i, cluster j hosts proxy z_j
• z_j is an aggregation of the node proxies w_i
• Nodes are served sequentially in random order: node i_t receives contexts x_{t,1}, …, x_{t,c_t} and selects one

[Figure: graph over nodes with proxies w_1, …, w_8, underlying profiles u_1, u_2, u_3, and estimated cluster proxies z_1, z_2]

• Start from the full n-node graph (or a sparsified version thereof) and a single estimated cluster
• If ||w_i − w_j|| > θ(i,j), delete edge (i, j)
• Clusters are the current connected components
• When serving user i in estimated cluster j, update node proxy w_i and cluster proxy z_j
• Recompute clusters after deleting edges

Two main issues
• Statistical: regret analysis
• Computational: running time and memory

The CLUB Algorithm: Solutions
1. Start from a random (Erdős–Rényi) graph
• G is p-randomly sparsified with p = log(n/δ)/s
• All s-node subgraphs are connected w.p. > 1 − δ
• Number of initial edges: n²p = (n²/s) log(n/δ) << n²

2a. Current clusters are unions of underlying ones

[Figure: same graph as above, with estimated clusters covering unions of the underlying clusters]

• Within-cluster edges (w.r.t. the underlying clustering) are never deleted (w.h.p.)
• Between-cluster edges (w.r.t. the underlying clustering) are eventually deleted (w.h.p.), assuming a gap between different cluster profile vectors and enough observed payoff values

2b. Data structure for incremental computation of clusters
• Decremental dynamic connectivity: a randomized construction maintaining a spanning forest
• In our case n >> d and |E| = n·poly(log n), giving d² + d·poly(log n) (amortized) running time per round

3. Derived regret bound:
   Σ_{j=1}^m Σ_{ℓ=1}^m ||u_j − u_ℓ||   (learning the clusters)
   + ( σ d + √d ( 1 + Σ_{j=1}^m √(|V_j|/n) ) ) √T √m   (learning the cluster profile vectors)

Experimental Results
1. Synthetic datasets
• c_t = 10, T = 55,000, d = 25, and n = 500
• Each cluster V_j has a random unit-norm profile vector u_j ∈ R^d
• Context vectors x_{t,k} ∈ R^d are generated uniformly with unit norm
• Cluster relative sizes |V_j| = n · j^{−z} / Σ_{ℓ=1}^m ℓ^{−z}, j = 1, …, m, with z ∈ {0, 1, 2, 3}
• The sequence of served users i_t is generated uniformly at random over the n users
• Payoff within cluster V_j is u_j^⊤ x_{t,k} plus white noise

[Figure: four panels of cumulative regret of each algorithm divided by the cumulative regret of RAN over 55,000 rounds — balanced clusters (2 clusters) and unbalanced clusters (10 clusters), each with payoff noise 0.1 and 0.3; algorithms: CLUB, LINUCB-IND, LINUCB-ONE, GOBLIN, CLAIRVOYANT]

2. LastFM & Delicious ("hits" & "niches") datasets
• c_t = 25, T = 55,000, and d = 25
• LastFM contains 1,892 users and 17,632 artists
• Delicious contains 1,861 users and 69,226 URLs
• Payoff is 1 if the user listened to the artist (LastFM) or bookmarked the URL (Delicious)

[Figure: cumulative regret relative to RAN on the LastFM and Delicious datasets; algorithms: CLUB, LINUCB-IND, LINUCB-ONE]

3. Yahoo! ("ICML 2012 Challenge") dataset
• c_t = 41 (median), T = 55,000 (75,000), and d = 323
• 8,362,905 records, 713,862 users, 323 news items
• Each user is described by a 136-dimensional binary feature vector
• Payoff is 1 if the user clicked the news item

[Figure: click-through rate (CTR) over rounds on the 5K-user and 18K-user Yahoo subsets; algorithms: CLUB, UCB-IND, UCB-ONE, UCB-V, RAN]

Conclusions
• Algorithmic ideas and analyses for group recommendation
• Generalizations:
  – Overlapping clusters?
  – Soft clustering?
  – Shifting profiles (we can handle this)
• Cold start: connect a newcomer to all existing users through directed edges (experiments are ongoing)
• Get rid of the i.i.d. assumption in the analysis?
• Experiments underway with larger datasets

Short References
[1] Cesa-Bianchi, N., Gentile, C., and Zappella, G. A gang of bandits. NIPS 2013.
[2] Crammer, K. and Gentile, C. Multiclass classification with bandit feedback using adaptive regularization. ICML 2011.
[3] Abbasi-Yadkori, Y., Pál, D., and Szepesvári, C. Improved algorithms for linear stochastic bandits. NIPS 2011.
[4] Auer, P. Using confidence bounds for exploitation-exploration trade-offs. JMLR 3:397–422, 2002.
[5] Azar, M. G., Lazaric, A., and Brunskill, E. Sequential transfer in multi-armed bandit with finite set of models. NIPS 2013.
[6] Yue, Y., Hong, S. A., and Guestrin, C. Hierarchical exploration for accelerating contextual bandits. ICML 2012.
[7] Chu, W., Li, L., Reyzin, L., and Schapire, R. E. Contextual bandits with linear payoff functions. AISTATS 2011.
[8] Seldin, Y., Auer, P., Laviolette, F., Shawe-Taylor, J., and Ortner, R. PAC-Bayesian analysis of contextual bandits. NIPS 2011.
[9] Maillard, O. and Mannor, S. Latent bandits. ICML 2014.
[10] Valko, M., Munos, R., Kveton, B., and Kocák, T. Spectral bandits for smooth graph functions. ICML 2014.
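The poster's clustering step (delete edge (i, j) when ||w_i − w_j|| exceeds a threshold, then take the connected components of the surviving graph as the current clusters) can be sketched as follows. This is a minimal illustration: the fixed scalar `theta` stands in for the poster's time- and pair-dependent threshold θ(i,j).

```python
from collections import defaultdict
import numpy as np

def delete_edges(edges, w, theta):
    """Keep only edges whose endpoint proxies are still close.
    `theta` is a fixed stand-in for the poster's threshold theta(i, j)."""
    return [(i, j) for (i, j) in edges
            if np.linalg.norm(w[i] - w[j]) <= theta]

def connected_components(n, edges):
    """Current clusters = connected components of the surviving graph."""
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    comp, label = [-1] * n, 0
    for s in range(n):
        if comp[s] != -1:
            continue
        stack, comp[s] = [s], label
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1
    return comp

# Toy example: 4 users; proxies of users 0,1 agree, and so do those of 2,3.
w = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
     np.array([-1.0, 0.0]), np.array([-0.9, -0.1])]
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]   # initial (here: cycle) graph
edges = delete_edges(edges, w, theta=0.5)  # cross-cluster edges removed
print(connected_components(4, edges))      # -> [0, 0, 1, 1]
```

Recomputing components from scratch, as here, costs O(n + |E|) per round; the poster's decremental dynamic connectivity structure exists precisely to avoid this full recomputation.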
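The Erdős–Rényi initialization can likewise be sketched: each of the n(n−1)/2 possible edges is kept independently with probability p = log(n/δ)/s, so only about (n²/s)·log(n/δ) edges survive instead of ~n². The parameter values below are illustrative, not the poster's.

```python
import math
import random

def sparsified_init_graph(n, s, delta, rng=random.Random(0)):
    """Erdos-Renyi sparsification of the full n-node graph:
    keep each edge independently with p = log(n/delta)/s."""
    p = min(1.0, math.log(n / delta) / s)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

edges = sparsified_init_graph(n=500, s=100, delta=0.1)
full = 500 * 499 // 2
print(len(edges), "of", full, "possible edges kept")
```

With n = 500, s = 100, δ = 0.1 this keeps roughly 8.5% of the edges, while (by the poster's claim) every s-node subgraph stays connected with probability at least 1 − δ.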
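One way to realize "one linear bandit per node and one per cluster" is with ridge-regression proxies whose sufficient statistics are aggregated at cluster level. This is a hedged sketch: the aggregation rule and the LinUCB-style confidence width `alpha * sqrt(x M⁻¹ x)` are standard choices assumed here for illustration, not necessarily the poster's exact definitions.

```python
import numpy as np

d, alpha = 5, 1.0
rng = np.random.default_rng(0)

# Per-node sufficient statistics: M_i = I + sum x x^T, b_i = sum y x,
# so the node proxy is w_i = M_i^{-1} b_i.
M = {i: np.eye(d) for i in range(4)}
b = {i: np.zeros(d) for i in range(4)}

def cluster_proxy(cluster):
    """Aggregate node statistics of one estimated cluster into proxy z_j."""
    Mbar = np.eye(d) + sum(M[i] - np.eye(d) for i in cluster)
    bbar = sum(b[i] for i in cluster)
    return np.linalg.solve(Mbar, bbar), Mbar

def select(cluster, contexts):
    """Pick the context maximizing estimated payoff plus confidence width."""
    z, Mbar = cluster_proxy(cluster)
    Minv = np.linalg.inv(Mbar)
    ucb = [z @ x + alpha * np.sqrt(x @ Minv @ x) for x in contexts]
    return int(np.argmax(ucb))

def update(i, x, y):
    """After observing payoff y for the chosen context x at node i."""
    M[i] += np.outer(x, x)
    b[i] += y * x

# Serve one round for a user in estimated cluster {0, 1}.
contexts = [c / np.linalg.norm(c) for c in rng.standard_normal((10, d))]
k = select({0, 1}, contexts)
update(0, contexts[k], y=1.0)  # node-level update; cluster proxy follows
```

The cluster proxy z_j is recomputed from the (updated) node statistics, matching the poster's scheme of updating both the node proxy w_i and the cluster proxy z_j when user i is served.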
