
# Skew-symmetric matrix completion for rank aggregation


Slides from a talk at Purdue's Machine Learning Seminar on 2011-01-24.

Published in: Technology, Education

### Skew-symmetric matrix completion for rank aggregation

1. Skew-symmetric matrix completion for rank aggregation, and other matrix computations. David F. Gleich, Purdue University, Computer Science Department. January 24th, 12pm. Purdue ML Seminar. David Gleich, Purdue
4. Images copyright by their respective owners.
5. Matrix computations are the heart (and not brains) of many methods of computing.
6. Matrix computations: Physics, Statistics, Engineering, Graphics, Databases, …, Machine learning.
7. Matrix computations. $A = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & \vdots \\ \vdots & \ddots & \ddots & A_{m-1,n} \\ A_{m,1} & \cdots & A_{m,n-1} & A_{m,n} \end{bmatrix}$. Linear systems: $Ax = b$. Least squares: $\min \|Ax - b\|$. Eigenvalues: $Ax = \lambda x$.
8. NETWORK and MATRIX COMPUTATIONS. Why looking at networks of data as a matrix is a powerful and successful paradigm.
9. A new matrix-based sensitivity analysis of Google's PageRank. PageRank: $(I - \alpha P)x = (1 - \alpha)v$. Presented at WAW2007 and WWW2010. Published in J. Internet Mathematics. Led to new results on uncertainty quantification in physical simulations, published in SIAM J. Matrix Analysis and SIAM J. Scientific Computing. Patent pending. Improved web-spam detection! Collaborators: Paul Constantine, Gianluca Iaccarino (physical simulation). Related methods listed on the slide: SimRank, DiffusionRank, BlockRank, IsoRank, TrustRank, ItemRank, ObjectRank, ProteinRank, HostRank, SocialPageRank, Random walk with restart, FoodRank, FutureRank, GeneRank, TwitterRank. [Figure: RAPr on Wikipedia, listing the top pages and categories under $E[x(A)]$ and $\mathrm{Std}[x(A)]$.]
10. Network alignment. maximize $w^T x + \tfrac{1}{2} x^T S x$ subject to $Ax \le e$, $x \in \{0, 1\}$. Sparse network alignment problems; quadratic assignment. Bayati, Gerritsen, Gleich, Saberi, and Wang, ICDM2009. Bayati, Gleich, Saberi, and Wang, submitted. [Figure: two networks A and B joined by candidate matches L.]
11. Overlapping clusters for distributed computation. Andersen, Gleich, and Mirrokni, WSDM2012. [Figure: relative work versus volume ratio (how much more of the graph we need to store), showing swapping probability and PageRank communication for usroads and web-Google, with the Metis partitioner as reference.]
12. Local methods for massive network analysis. Gleich et al., J. Internet Mathematics, to appear. Top-k algorithm for Katz: approximate … where … is sparse. Keep … sparse too. Ideally, don't "touch" all of …. Can solve these problems in milliseconds even with 100M edges!
13. Rank aggregation. David F. Gleich (Purdue) & Lek-Heng Lim (Univ. Chicago).
14. Which is a better list of good DVDs? Standard rank aggregation (the mean rating): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Lost: Season 1; Battlestar Galactica: Season 1; Fullmetal Alchemist; Trailer Park Boys: Season 4; Trailer Park Boys: Season 3; Tenchi Muyo!; Shawshank Redemption. Nuclear norm based rank aggregation (not matrix completion on the Netflix rating matrix): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Star Wars V: Empire Strikes Back; Raiders of the Lost Ark; Star Wars IV: A New Hope; Shawshank Redemption; Star Wars VI: Return of the Jedi; Lord of the Rings 3: Bonus DVD; The Godfather.
15. Rank aggregation. Given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering. Voting: find the winning candidate. Program committees: find the best papers given reviews. Dining: find the best restaurant in Chicago.
16. Ranking is really hard. Ken Arrow: all rank aggregations involve some measure of compromise. John Kemeny: a good ranking is the "average" ranking under a permutation distance. Dwork, Kumar, Naor, Sivakumar: NP-hard to compute Kemeny's ranking.
17. Given a hard problem, what do you do? Numerically relax! It'll probably be easier. (Image: Embody chair, John Cantrell, flickr.)
18. Suppose we had scores. Let $s_i$ be the score of the $i$th movie/song/paper/team to rank. Suppose we can compare the $i$th to the $j$th: $Y_{ij} = s_i - s_j$. Then $Y = s e^T - e s^T$ is skew-symmetric, rank 2. Also works for $s_i / s_j$ with an extra log. Numerical ranking is intimately intertwined with skew-symmetric matrices. Kemeny and Snell, Mathematical Models in Social Sciences (1978).
19. Using ratings as comparisons. Ratings induce various skew-symmetric matrices, for example via the arithmetic mean or the log-odds. From David (1988), The Method of Paired Comparisons.
20. Extracting the scores. Given $Y$ with all entries, then $s = \tfrac{1}{n} Y e$ is the Borda count, the least-squares solution to …. How many $Y_{ij}$ do we have? Most. Do we trust all $Y_{ij}$? Not really. Netflix data: 17k movies, 500k users, 100M ratings; 99.17% filled. [Figure: number of movie pairs versus number of comparisons, log-log axes.]
21. Only partial info? Complete it! Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores. Goal: find the simplest skew-symmetric matrix that matches the data. Two formulations are shown, one noiseless and one noisy. Both of these are NP-hard too.
22. Solution: GO NUCLEAR! (Image from a French nuclear test in 1970, from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/)
23. The nuclear norm: the analog of the 1-norm or $\ell_1$-norm for matrices. For vectors: minimizing the number of nonzeros is NP-hard, while minimizing the 1-norm is convex and gives the same answer "under appropriate circumstances". For matrices: let $A = U \Sigma V^T$ be the SVD; the nuclear norm $\|A\|_* = \sum_i \sigma_i$ is the best convex under-estimator of rank on the unit ball.
24. Only partial info? Complete it! Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores. Goal: find the simplest skew-symmetric matrix that matches the data. The rank formulation is NP-hard; the nuclear norm formulation is a convex heuristic.
25. Solving the nuclear norm problem. Use a LASSO formulation: (1) …; (2) REPEAT; (3) … = rank-$k$ SVD of …; (4) …; (5) …; (6) UNTIL …. Jain et al. propose SVP for this problem without ….
26. Skew-symmetric SVDs. Let $A$ be an $n \times n$ skew-symmetric matrix with eigenvalues …, where … and …. Then the SVD of $A$ is given by … for … and … given in the proof. Proof: use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block. This means that SVP will give us the skew-symmetric constraint "for free".
27. Matrix completion. Only partial info? Complete it! A fundamental question in matrix completion is: when do these problems (the NP-hard one and the convex one) have the same solution?
28. Exact recovery results. David Gross showed how to recover Hermitian matrices, i.e. the conditions under which we get the exact … (Gross, arXiv, 2010). Note that $\imath Y$ is Hermitian. Thus our new result! Excerpt from the paper: "Instead we view the following theorem as providing intuition for the noisy problem. Consider the operator basis for Hermitian matrices: $\mathcal{H} = \mathcal{S} \cup \mathcal{K} \cup \mathcal{D}$ where $\mathcal{S} = \{ \tfrac{1}{\sqrt{2}} (e_i e_j^T + e_j e_i^T) : 1 \le i < j \le n \}$; $\mathcal{K} = \{ \tfrac{\imath}{\sqrt{2}} (e_i e_j^T - e_j e_i^T) : 1 \le i < j \le n \}$; $\mathcal{D} = \{ e_i e_i^T : 1 \le i \le n \}$. Theorem 5. Let $s$ be centered, i.e., $s^T e = 0$. Let $Y = s e^T - e s^T$ where $\theta = \max_i s_i^2 / (s^T s)$ and $\rho = ((\max_i s_i) - (\min_i s_i)) / \|s\|$. Also, let $\Omega \subset \mathcal{H}$ be a random set of elements with size $|\Omega| \ge O(2 n \nu (1 + \beta)(\log n)^2)$ where $\nu = \max((n\theta + 1)/4, n\rho^2)$. Then the solution of: minimize $\|X\|_*$ subject to $\operatorname{trace}(X^* W_i) = \operatorname{trace}((\imath Y)^* W_i)$, $W_i \in \Omega$, is equal to $\imath Y$ with probability at least $1 - n^{-\beta}$. The proof of this theorem follows directly by Theorem 4 if …" [Figure: fraction of trials recovered.]
29. Recovery discussion and experiments. Confession: if …, then just look at differences from a connected set. Constants? Not very good. Intuition for the truth.
31. The ranking algorithm. (0) INPUT $R$ (ratings data) and $c$ (for trust on comparisons). (1) Compute $Y$ from $R$. (2) Discard entries with fewer than $c$ comparisons. (3) Set $\Omega$, $b$ to be indices and values of what's left. (4) … = SVP(…). (5) OUTPUT ….
32. Item response model: synthetic evaluation. The synthetic results came from a model inspired by Ho and Quinn [2008]. Parameters: center rating for user $i$; sensitivity of user $i$; value of item $j$; error level in ratings. Sample ratings uniformly at random such that there are … expected ratings per user.
33. Evaluation. [Figure 3: median Kendall's tau versus error for nuclear norm ranking (left) and mean rating (right), with curves labelled 20, 10, 5, 2, and 1.5.] Figure 3: The performance of our algorithm (left) …
34. Conclusions and future work. Our motto: "aggregate, then complete". Rank aggregation with the nuclear norm is principled and easy to compute. The results are much better than simple approaches. Future work: (1) Additional comparison. (2) Noisy recovery! More realistic sampling. (3) Skew-symmetric Lanczos based SVD?
35. Current research.
36. Data driven surrogate functions. Beyond spectral methods for UQ.
37. Graph spectra.
38. Spectral spikes. [Figure: annotated spikes in a graph spectrum, including 1.33 (two!) and 1.5 (two).]
39. Google: nuclear ranking gleich.
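
Slides 19 and 31 turn a ratings matrix into pairwise comparisons and discard entries supported by too few co-ratings. The slide formulas did not survive extraction, so the sketch below assumes the arithmetic-mean variant: the average of rating differences over users who rated both items. The function name `arithmetic_mean_comparisons`, the NaN encoding for missing ratings, and the toy matrix `R_demo` are illustrative assumptions, not part of the talk.

```python
import numpy as np

def arithmetic_mean_comparisons(R, c=1):
    """Build a skew-symmetric comparison matrix from ratings (hypothetical helper).

    R is a users-by-items array with np.nan for missing ratings.
    Y[i, j] is the mean of R[u, i] - R[u, j] over users u who rated both items.
    Entries with fewer than c co-ratings are discarded (mask False, value 0).
    Assumes c >= 1.
    """
    R = np.asarray(R, dtype=float)
    rated = ~np.isnan(R)
    R0 = np.where(rated, R, 0.0)           # ratings with missing entries zeroed
    M = rated.astype(float)                 # indicator of observed ratings
    counts = M.T @ M                        # counts[i, j] = users who rated both i and j
    sums = R0.T @ M - M.T @ R0              # summed differences over co-raters
    mask = counts >= c
    Y = np.where(mask, sums / np.maximum(counts, 1.0), 0.0)
    return Y, mask

# Toy data (assumed): 3 users, 3 items, each pair of items co-rated once.
R_demo = np.array([[5.0, 3.0, np.nan],
                   [4.0, np.nan, 1.0],
                   [np.nan, 2.0, 1.0]])
Y_demo, mask_demo = arithmetic_mean_comparisons(R_demo, c=1)
Y_strict, mask_strict = arithmetic_mean_comparisons(R_demo, c=2)
```

Raising `c` trades coverage for trust: with `c=2` every off-diagonal entry of the toy example is dropped, because no pair of items has two co-raters.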
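
Slide 18 states that score differences form a skew-symmetric, rank-2 matrix, and slide 20 says the Borda count extracts scores from a fully observed matrix. A small numerical check can make this concrete. It assumes the comparison is the plain difference $Y_{ij} = s_i - s_j$ and that the Borda count is the row mean of $Y$; the score vector and the helper names are made up for illustration.

```python
import numpy as np

def pairwise_difference_matrix(s):
    """Return Y with Y[i, j] = s[i] - s[j], i.e. Y = s e^T - e s^T."""
    s = np.asarray(s, dtype=float)
    return s[:, None] - s[None, :]

def borda_scores(Y):
    """Row means of a fully observed Y, i.e. (1/n) * Y @ e."""
    return Y.mean(axis=1)

# Hypothetical scores for five items.
s = np.array([3.0, 1.0, 4.0, 1.5, 5.0])
Y = pairwise_difference_matrix(s)

is_skew = np.allclose(Y, -Y.T)            # Y equals minus its transpose
rank_Y = np.linalg.matrix_rank(Y)         # numerical rank of Y
recovered = borda_scores(Y)               # scores up to an additive constant
```

The row means return `s` shifted by its mean, so the induced ordering of the items is unchanged; only the additive constant is lost, which pairwise differences cannot determine.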
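
Slides 25 and 26 refer to singular value projection (SVP) and note that it yields the skew-symmetric constraint "for free". The exact iteration on slide 25 was lost, so the following is only a sketch in the spirit of SVP: a gradient step on the observed entries followed by truncation to rank $k$. The fixed step size of 1, the stopping rule, the function name `svp_complete`, and the toy data are assumptions, not the talk's algorithm.

```python
import numpy as np

def svp_complete(Y_obs, mask, rank=2, step=1.0, iters=500, tol=1e-8):
    """Sketch of SVP-style completion (assumed variant, not the talk's exact method).

    Y_obs holds observed values (zeros elsewhere); mask is 1.0 where observed.
    Each pass takes a gradient step on the observed residual, then projects
    onto matrices of the given rank with a truncated SVD.
    """
    X = np.zeros_like(Y_obs)
    for _ in range(iters):
        residual = mask * (X - Y_obs)
        if np.linalg.norm(residual) <= tol:
            break
        G = X - step * residual
        U, sig, Vt = np.linalg.svd(G)
        X = (U[:, :rank] * sig[:rank]) @ Vt[:rank]   # best rank-k approximation of G
    return X

# Toy problem (assumed): rank-2 skew-symmetric Y with three hidden pairs.
s = np.array([2.0, -1.0, 0.5, 3.0, -2.5, 1.0])
Y = s[:, None] - s[None, :]
mask = np.ones_like(Y)
for i, j in [(0, 3), (1, 4), (2, 5)]:
    mask[i, j] = mask[j, i] = 0.0          # hide each pair symmetrically

X = svp_complete(mask * Y, mask, rank=2)
X_full = svp_complete(Y, np.ones_like(Y), rank=2)
```

No symmetry constraint is imposed anywhere in the loop. Because the observed data and the mask are symmetric in structure, each iterate stays skew-symmetric, which is the behavior slide 26 describes. How closely `X` matches the hidden entries depends on the sampling pattern and step size, and this sketch makes no claim about that.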