Skew-symmetric matrix completion for rank aggregation

Slides from a talk at Purdue's Machine Learning Seminar on 2011-01-24.

Published in: Technology, Education

Transcript of "Skew-symmetric matrix completion for rank aggregation"

  1. Skew-symmetric matrix completion for rank aggregation and other matrix computations. David F. Gleich, Purdue University, Computer Science Department. January 24th, 12pm. Purdue ML Seminar.
  4. Images copyright by their respective owners.
  5. Matrix computations are the heart (and not brains) of many methods of computing.
  6. Matrix computations: Physics, Statistics, Engineering, Graphics, Databases, …, Machine learning.
  7. Matrix computations: $A = \begin{bmatrix} A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\ A_{2,1} & A_{2,2} & \cdots & \vdots \\ \vdots & \ddots & \ddots & A_{m-1,n} \\ A_{m,1} & \cdots & A_{m,n-1} & A_{m,n} \end{bmatrix}$. Linear systems: $Ax = b$. Least squares: $\min \|Ax - b\|$. Eigenvalues: $Ax = \lambda x$.
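
The three problem classes named on this slide can each be tried in a few lines. The following is a minimal NumPy sketch; the random test matrices and variable names are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear system: solve A x = b for a square matrix A.
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)
x_lin = np.linalg.solve(A, b)

# Least squares: minimize ||T x - c|| for a tall matrix T.
T = rng.standard_normal((8, 3))
c = rng.standard_normal(8)
x_ls, *_ = np.linalg.lstsq(T, c, rcond=None)

# Eigenvalues: S v = lambda v for a symmetric matrix S.
S = A + A.T
lam, V = np.linalg.eigh(S)
```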
  8. Network and matrix computations: why looking at networks of data as a matrix is a powerful and successful paradigm.
  9. A new matrix-based sensitivity analysis of Google's PageRank. PageRank: $(I - \alpha P)x = (1 - \alpha)v$. Related methods: SimRank, DiffusionRank, BlockRank, IsoRank, TrustRank, ItemRank, ObjectRank, ProteinRank, HostRank, SocialPageRank, Random walk with restart, FoodRank, FutureRank, GeneRank, TwitterRank. Presented at WAW2007, WWW2010. Published in J. Internet Mathematics. Led to new results on uncertainty quantification in physical simulations, published in SIAM J. Matrix Analysis and SIAM J. Scientific Computing. Patent pending. Improved web-spam detection! RAPr on Wikipedia, top pages by $E[x(A)]$: United States, C:Living people, France, United Kingdom, Germany, England, Canada, Japan, Poland, Australia; by $\mathrm{Std}[x(A)]$: United States, C:Living people, C:Main topic classif., C:Contents, C:Ctgs. by country, United Kingdom, France, C:Fundamental, England, C:Ctgs. by topic. Collaborators: Paul Constantine, Gianluca Iaccarino (physical simulation).
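
The PageRank linear system $(I - \alpha P)x = (1 - \alpha)v$ can be solved directly on a small graph. This is a minimal sketch: the 4-node graph, the value $\alpha = 0.85$, and the uniform vector $v$ are illustrative assumptions, and $P$ is taken to be column-stochastic.

```python
import numpy as np

# adj[i, j] = 1 means an edge from node i to node j (toy graph, assumed).
adj = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Column-stochastic transition matrix: P[j, i] = probability of stepping i -> j.
P = (adj / adj.sum(axis=1, keepdims=True)).T

alpha = 0.85                 # teleportation parameter (assumed value)
n = adj.shape[0]
v = np.full(n, 1.0 / n)      # uniform teleportation distribution

# Solve (I - alpha P) x = (1 - alpha) v.
x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
```

Because $P$ is column-stochastic and $v$ sums to one, the solution $x$ also sums to one, so it can be read as a probability distribution over nodes.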
  10. Network alignment: maximize $w^T x + \tfrac{1}{2} x^T S x$ subject to $Ax \le e$, $x_i \in \{0, 1\}$. Sparse network alignment problems. Bayati, Gerritsen, Gleich, Saberi, and Wang, ICDM2009; Bayati, Gleich, Saberi, and Wang, submitted.
  11. Overlapping clusters for distributed computation. Andersen, Gleich, and Mirrokni, WSDM2012. [Figure: relative work versus volume ratio (how much more of the graph we need to store); series for swapping probability and PageRank communication on usroads and web-Google, compared with the Metis partitioner.]
  12. Local methods for massive network analysis. Gleich et al., J. Internet Mathematics, to appear. Top-k algorithm for Katz: approximate … where … is sparse. Keep … sparse too. Ideally, don't "touch" all of …. Can solve these problems in milliseconds even with 100M edges!
  13. David F. Gleich (Purdue) & Lek-Heng Lim (Univ. Chicago). Rank aggregation.
  14. Which is a better list of good DVDs? Standard rank aggregation (the mean rating): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Lost: Season 1; Battlestar Galactica: Season 1; Fullmetal Alchemist; Trailer Park Boys: Season 4; Trailer Park Boys: Season 3; Tenchi Muyo!; Shawshank Redemption. Nuclear norm based rank aggregation (not matrix completion on the Netflix rating matrix): Lord of the Rings 3: The Return of …; Lord of the Rings 1: The Fellowship; Lord of the Rings 2: The Two Towers; Star Wars V: Empire Strikes Back; Raiders of the Lost Ark; Star Wars IV: A New Hope; Shawshank Redemption; Star Wars VI: Return of the Jedi; Lord of the Rings 3: Bonus DVD; The Godfather.
  15. Rank aggregation: given partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering. Voting: find the winning candidate. Program committees: find the best papers given reviews. Dining: find the best restaurant in Chicago.
  16. Ranking is really hard. Ken Arrow: all rank aggregations involve some measure of compromise. John Kemeny: a good ranking is the "average" ranking under a permutation distance. Dwork, Kumar, Naor, Sivakumar: NP hard to compute Kemeny's ranking.
  17. Given a hard problem, what do you do? Numerically relax! It'll probably be easier. (Image: Embody chair, John Cantrell, flickr.)
  18. Suppose we had scores. Let $s_i$ be the score of the $i$th movie/song/paper/team to rank. Suppose we can compare the $i$th to $j$th: $Y_{ij} = s_i - s_j$. Then $Y = s e^T - e s^T$ is skew-symmetric, rank 2. Also works for $Y_{ij} = s_i / s_j$ with an extra log. Numerical ranking is intimately intertwined with skew-symmetric matrices. Kemeny and Snell, Mathematical Models in Social Sciences (1978).
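
A quick numerical check of the claim on this slide, assuming the pairwise comparison is the score difference $s_i - s_j$: the resulting matrix is skew-symmetric and has rank 2. The score values below are made up for illustration.

```python
import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5])   # hypothetical item scores
e = np.ones_like(s)

# Y[i, j] = s[i] - s[j], i.e. Y = s e^T - e s^T.
Y = np.outer(s, e) - np.outer(e, s)

is_skew = np.allclose(Y, -Y.T)
rank = np.linalg.matrix_rank(Y)
```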
  19. Using ratings as comparisons. Ratings induce various skew-symmetric matrices: arithmetic mean, log-odds. From David 1988, The Method of Paired Comparisons.
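
One way to turn ratings into a skew-symmetric comparison matrix is sketched below. It assumes the arithmetic-mean comparison is the average of $R_{ui} - R_{uj}$ over users $u$ who rated both items; the slide's own formula did not survive extraction, so this definition, the function name, and the toy ratings are assumptions.

```python
import numpy as np

def arithmetic_mean_comparisons(R):
    """R is users x items with np.nan for missing ratings.

    Returns (Y, counts): Y[i, j] is the mean of R[u, i] - R[u, j] over users u
    who rated both items (np.nan if no such user), and counts[i, j] is the
    number of such users.
    """
    known = ~np.isnan(R)
    Rz = np.where(known, R, 0.0)
    K = known.astype(float)
    counts = K.T @ K
    # (Rz.T @ K)[i, j] sums R[u, i] over users who rated both i and j.
    totals = Rz.T @ K
    sums = totals - totals.T
    with np.errstate(invalid="ignore", divide="ignore"):
        Y = np.where(counts > 0, sums / counts, np.nan)
    return Y, counts

# Toy ratings: 3 users, 3 items.
R = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 2.0],
    [np.nan, 1.0, 2.0],
])
Y, counts = arithmetic_mean_comparisons(R)
```

The `counts` matrix is what a trust threshold on the number of comparisons would be applied to.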
  20. Extracting the scores. Given $Y$ with all entries, then $s = \frac{1}{n} Y e$ is the Borda count, the least-squares solution to $Y = s e^T - e s^T$. How many $Y_{ij}$ do we have? Most. Do we trust all $Y_{ij}$? Not really. Netflix data: 17k movies, 500k users, 100M ratings; 99.17% filled. [Figure: number of movie pairs versus number of comparisons, log-log axes.]
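
When every pairwise entry is known, the row-average extraction can be checked numerically. The sketch assumes $Y = s e^T - e s^T$ and the estimate $\frac{1}{n} Y e$; under those assumptions the estimate equals the scores shifted to have zero mean, so the ordering is preserved. The scores are made up.

```python
import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5, 9.0])   # hypothetical true scores
n = s.size
e = np.ones(n)
Y = np.outer(s, e) - np.outer(e, s)

# Row averages of Y: (1/n) Y e = s - mean(s).
s_hat = Y @ e / n

# Items ordered from best to worst under the recovered scores.
ranking = np.argsort(-s_hat)
```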
  21. Only partial info? Complete it! Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores. Goal: find the simplest skew-symmetric matrix that matches the data. Noiseless: minimize $\operatorname{rank}(X)$ subject to $X = -X^T$, $X_{ij} = Y_{ij}$ for $(i, j) \in \Omega$. Noisy: minimize $\operatorname{rank}(X)$ subject to $X = -X^T$, $\sum_{(i,j) \in \Omega} (X_{ij} - Y_{ij})^2 \le \sigma$. Both of these are NP-hard too.
  22. Solution: GO NUCLEAR! (Image from a French nuclear test in 1970, from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/)
  23. The nuclear norm: the analog of the 1-norm or $\ell_1$ for matrices. For vectors: minimizing $\|x\|_0$ subject to $Ax = b$ is NP-hard, while minimizing $\|x\|_1$ subject to $Ax = b$ is convex and gives the same answer "under appropriate circumstances". For matrices: let $A = U \Sigma V^T$ be the SVD; then $\|A\|_* = \sum_i \sigma_i$ is the best convex under-estimator of rank on the unit ball.
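
The nuclear norm is the sum of the singular values, which makes it the matrix counterpart of the vector 1-norm. A minimal NumPy check on a rank-2 skew-symmetric matrix (the scores are illustrative):

```python
import numpy as np

s = np.array([3.0, 1.0, 4.0, 1.5])   # hypothetical scores
Y = np.subtract.outer(s, s)          # Y[i, j] = s[i] - s[j]

sigma = np.linalg.svd(Y, compute_uv=False)
nuclear_from_svd = sigma.sum()

# NumPy exposes the same quantity directly through ord="nuc".
nuclear_builtin = np.linalg.norm(Y, "nuc")

# Rank counts the nonzero singular values; the nuclear norm sums them.
rank = np.linalg.matrix_rank(Y)
```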
  24. Only partial info? Complete it! Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores. Goal: find the simplest skew-symmetric matrix that matches the data. Minimizing $\operatorname{rank}(X)$ subject to the data constraints: NP hard. Heuristic: minimizing $\|X\|_*$ subject to the same constraints: convex.
  25. Solving the nuclear norm problem. Use a LASSO formulation: 1. … 2. REPEAT 3. … = rank-k SVD of … 4. … 5. … 6. UNTIL …. Jain et al. propose SVP for this problem without ….
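
The update formulas on this slide were lost, so the following is not the slide's algorithm. It is a rough sketch of singular value projection (SVP) in the style of Jain et al.: take a gradient step on the squared error over the observed entries, then project back to rank $k$ with a truncated SVD. The function name, step size, stopping rule, and test data are all assumptions.

```python
import numpy as np

def svp_complete(n, rows, cols, vals, rank=2, step=1.0, iters=500, tol=1e-8):
    """Rank-constrained completion of an n x n matrix by projected gradient.

    rows, cols, vals give the observed entries. Each iteration takes a
    gradient step on the observed residual and truncates to the top
    `rank` singular triplets.
    """
    X = np.zeros((n, n))
    for _ in range(iters):
        resid = np.zeros((n, n))
        resid[rows, cols] = X[rows, cols] - vals
        if np.linalg.norm(resid) < tol:
            break
        U, sig, Vt = np.linalg.svd(X - step * resid)
        X = (U[:, :rank] * sig[:rank]) @ Vt[:rank]
    return X

# Demo: observe about 70% of the pairs, always including both (i, j) and (j, i).
rng = np.random.default_rng(0)
n = 20
s = rng.standard_normal(n)
Y = np.subtract.outer(s, s)
iu, ju = np.triu_indices(n, k=1)
keep = rng.random(iu.size) < 0.7
rows = np.concatenate([iu[keep], ju[keep]])
cols = np.concatenate([ju[keep], iu[keep]])
X = svp_complete(n, rows, cols, Y[rows, cols], rank=2)
```

Convergence of this plain iteration depends on the step size and the sampling pattern; the sketch makes no recovery guarantee for partially observed data.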
  26. Skew-symmetric SVD. Let $A = -A^T$ be an $n \times n$ skew-symmetric matrix with eigenvalues $\imath\lambda_1, -\imath\lambda_1, \ldots, \imath\lambda_j, -\imath\lambda_j$, where $\lambda_i > 0$ and $j = \lfloor n/2 \rfloor$. Then the SVD of $A$ is given by $A = U \operatorname{diag}(\lambda_1, \lambda_1, \lambda_2, \lambda_2, \ldots, \lambda_j, \lambda_j) V^T$ for $U$ and $V$ given in the proof. Proof: use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block. This means that SVP will give us the skew-symmetric constraint "for free".
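
The pairing of singular values for a real skew-symmetric matrix is easy to observe numerically. This sketch uses a random 6 x 6 skew-symmetric matrix (an illustrative choice) and compares its singular values with the magnitudes of its purely imaginary eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = B - B.T                       # real skew-symmetric matrix

sig = np.linalg.svd(A, compute_uv=False)   # sorted in descending order
eig = np.linalg.eigvals(A)

# Eigenvalues are +/- i*lambda, so their magnitudes match the singular values.
eig_mags = np.sort(np.abs(eig.imag))[::-1]
```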
  27. Only partial info? Complete it! Let $Y_{ij}$ be known for $(i, j) \in \Omega$. We trust these scores. Goal: find the simplest skew-symmetric matrix that matches the data. The rank problem is NP hard; the nuclear norm problem is convex. Matrix completion: a fundamental question in matrix completion is when do these problems have the same solution?
  28. Exact recovery results. David Gross showed how to recover Hermitian matrices, i.e. the conditions under which we get the exact … (Gross, arXiv, 2010). Note that $\imath Y$ is Hermitian. Thus our new result! Excerpt from the paper: "… indices. Instead we view the following theorem as providing intuition for the noisy problem. Consider the operator basis for Hermitian matrices: $\mathcal{H} = \mathcal{S} \cup \mathcal{K} \cup \mathcal{D}$ where $\mathcal{S} = \{ 1/\sqrt{2}\,(e_i e_j^T + e_j e_i^T) : 1 \le i < j \le n \}$; $\mathcal{K} = \{ \imath/\sqrt{2}\,(e_i e_j^T - e_j e_i^T) : 1 \le i < j \le n \}$; $\mathcal{D} = \{ e_i e_i^T : 1 \le i \le n \}$. Theorem 5. Let $s$ be centered, i.e., $s^T e = 0$. Let $Y = s e^T - e s^T$ where $\theta = \max_i s_i^2 / (s^T s)$ and $\rho = ((\max_i s_i) - (\min_i s_i)) / \|s\|$. Also, let $\Omega \subset \mathcal{H}$ be a random set of elements with size $|\Omega| \ge O(2 n \nu (1 + \beta) (\log n)^2)$ where $\nu = \max((n\theta + 1)/4, n\rho^2)$. Then the solution of minimize $\|X\|_*$ subject to $\operatorname{trace}(X^* W_i) = \operatorname{trace}((\imath Y)^* W_i)$, $W_i \in \Omega$ is equal to $\imath Y$ with probability at least $1 - n^{-\beta}$. The proof of this theorem follows directly by Theorem 4 if …" [Figure: fraction of trials recovered.]
  29. Recovery discussion and experiments. Confession: if …, then just look at differences from a connected set. Constants? Not very good. Intuition for the truth.
  31. The ranking algorithm. 0. INPUT $R$ (ratings data) and $c$ (for trust on comparisons). 1. Compute $Y$ from $R$. 2. Discard entries with fewer than $c$ comparisons. 3. Set $\Omega$, $b$ to be indices and values of what's left. 4. … = SVP(…). 5. OUTPUT ….
  32. Item response model (synthetic evaluation). The synthetic results came from a model inspired by Ho and Quinn [2008]. Model parameters: a center rating for user $i$, a sensitivity of user $i$, a value of item $j$, and an error level in ratings. Ratings are sampled uniformly at random to reach a target expected number of ratings per user.
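
A generator in the spirit of this item response model might look like the sketch below. The slide's formula was lost, so the functional form (center plus sensitivity times item value plus Gaussian noise, rounded to a 1 to 5 scale), the parameter ranges, and every name here are assumptions for illustration only.

```python
import numpy as np

def sample_ratings(n_users, n_items, ratings_per_user, noise, rng):
    """Return (R, t): R is users x items with np.nan for unrated entries,
    t holds the hidden item values that a ranking method should recover."""
    a = rng.uniform(2.5, 3.5, n_users)    # center rating for each user (assumed range)
    b = rng.uniform(0.5, 1.5, n_users)    # sensitivity of each user (assumed range)
    t = rng.standard_normal(n_items)      # value of each item
    raw = a[:, None] + b[:, None] * t[None, :]
    raw = raw + noise * rng.standard_normal((n_users, n_items))
    R = np.clip(np.rint(raw), 1, 5)       # discretize to a 1..5 star scale
    # Keep each entry independently so each user rates about
    # ratings_per_user items in expectation.
    mask = rng.random((n_users, n_items)) < ratings_per_user / n_items
    return np.where(mask, R, np.nan), t

rng = np.random.default_rng(0)
R, t = sample_ratings(n_users=50, n_items=10, ratings_per_user=4, noise=0.5, rng=rng)
```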
  33. Evaluation. [Figure: median Kendall's tau versus error, one panel for nuclear norm ranking and one for mean rating; curves labeled 20, 10, 5, 2, and 1.5.] Figure 3: The performance of our algorithm (left) …
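
Kendall's tau, the metric on the vertical axis of this figure, measures agreement between two rankings by counting concordant and discordant pairs. Below is a minimal sketch of the tau-a variant, which makes no correction for ties; which variant the evaluation used is not stated on the slide.

```python
import numpy as np

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    iu = np.triu_indices(x.size, k=1)     # each unordered pair once
    return float((dx[iu] * dy[iu]).sum() / iu[0].size)

tau_same = kendall_tau([1, 2, 3, 4], [10, 20, 30, 40])
tau_reversed = kendall_tau([1, 2, 3, 4], [4, 3, 2, 1])
tau_one_swap = kendall_tau([1, 2, 3], [1, 3, 2])
```

A value of 1 means the two orderings agree on every pair and -1 means they disagree on every pair.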
  34. Conclusions and future work. Our motto: "aggregate, then complete". Rank aggregation with the nuclear norm is principled and easy to compute. The results are much better than simple approaches. Future work: 1. Additional comparison. 2. Noisy recovery! More realistic sampling. 3. Skew-symmetric Lanczos based SVD?
  35. Current research.
  36. Data driven surrogate functions: beyond spectral methods for UQ.
  37. Graph spectra.
  38. Spectral spikes. [Figure annotations: 1.33 (two!); 1.5, 0.5; 1.5; 0.565741; 1.833; 1.767592; 0.725708; 1.607625; 1.5 (two).]
  39. Google nuclear ranking gleich.