Rank aggregation via nuclear norm minimization


Published on

The process of rank aggregation is intimately intertwined with
the structure of skew-symmetric matrices.
We apply recent advances in the theory and algorithms of matrix completion
to skew-symmetric matrices. This combination of ideas
produces a new method for ranking a set of items. The essence
of our idea is that a rank aggregation describes a partially
filled skew-symmetric matrix. We extend an algorithm for
matrix completion to handle skew-symmetric data and use that
to extract ranks for each item.
Our algorithm applies to both pairwise comparison and rating data.
Because it is based on matrix completion, it is robust to
both noise and incomplete data. We show a formal
recovery result for the noiseless case and
present a detailed study of the algorithm on synthetic
data and Netflix ratings.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Rank aggregation via nuclear norm minimization

  1. 1. Rank aggregation via nuclear norm minimization David F. Gleich Purdue University @dgleich Lek-Heng Lim University of Chicago KDD2011 San Diego, CA Lek funded by NSF CAREER award (DMS-1057064); David funded by DOE John von Neumann and NSERCDavid F. Gleich (Purdue) KDD 2011 1/20
  2. 2. Which is a better list of good DVDs?Lord of the Rings 3: The Return of … Lord of the Rings 3: The Return of …Lord of the Rings 1: The Fellowship Lord of the Rings 1: The FellowshipLord of the Rings 2: The Two Towers Lord of the Rings 2: The Two TowersLost: Season 1 Star Wars V: Empire Strikes BackBattlestar Galactica: Season 1 Raiders of the Lost ArkFullmetal Alchemist Star Wars IV: A New HopeTrailer Park Boys: Season 4 Shawshank RedemptionTrailer Park Boys: Season 3 Star Wars VI: Return of the JediTenchi Muyo! Lord of the Rings 3: Bonus DVDShawshank Redemption The Godfather Standard Nuclear Norm rank aggregation based rank aggregation (the mean rating) (not matrix completion on the netflix rating matrix)David F. Gleich (Purdue) KDD 2011 2/20
  3. 3. Rank AggregationGiven partial orders on subsets of items, rank aggregation is the problem of finding an overall ordering.Voting Find the winning candidateProgram committees Find the best papers given reviewsDining Find the best restaurant in San Diego (subject to a budget?)David F. Gleich (Purdue) KDD 2011 3/20
  4. 4. Ranking is really hard John Kemeny Dwork, Kumar, Naor, Ken Arrow SivikumarAll rank aggregationsinvolve some measure A good ranking is theof compromise “average” ranking under NP hard to compute a permutation distance Kemeny’s rankingDavid F. Gleich (Purdue) KDD 2011 4/20
  5. 5. Embody chair John Cantrell (flickr) Given a hard problem, what do you do? Numerically relax! It’ll probably be easier.David F. Gleich (Purdue) KDD 2011 5/20
  6. 6. Suppose we had scoresLet   be the score of the ith movie/song/paper/team to rankSuppose we can compare the ith to jth:  Then   is skew-symmetric, rank 2.Also works for   with an extra log. Numerical ranking is intimately intertwined with skew-symmetric matrices Kemeny and Snell, Mathematical Models in Social Sciences (1978)David F. Gleich (Purdue) KDD 2011 6/20
  7. 7. Using ratings as comparisonsRatings induce various skew-symmetric matrices. Arithmetic Mean   Log-odds   David 1988 – The Method of Paired ComparisonsDavid F. Gleich (Purdue) KDD 2011 7/20
  8. 8. Extracting the scoresGiven   with all entries, then 107   is the Borda Movie Pairs 105 count, the least-squares solution to  How many   do we have? 101 Most. 101 105 Number of ComparisonsDo we trust all   ? Not really. Netflix data 17k movies, 500k users, 100M ratings– 99.17% filledDavid F. Gleich (Purdue) KDD 2011 8/20
  9. 9. Only partial info? Complete it!Let   be known for   We trust these scores.Goal Find the simplest skew-symmetric matrix that matches the data     noiseless noisy   Both of these are NP-hard too.David F. Gleich (Purdue) KDD 2011 9/20
  10. 10. Solution Go Nuclear From a French nuclear test in 1970, image from http://picdit.wordpress.com/2008/07/21/8-insane-nuclear-explosions/David F. Gleich (Purdue) KDD 2011 10
  11. 11. The nuclear norm The analog of the 1-norm or   for matrices.For vectors For matrices Let   be the SVD.is NP-hard while   is convex and gives the same   best convex under- answer “under appropriate estimator of rank on unit ball. circumstances”David F. Gleich (Purdue) KDD 2011 11/20
  12. 12. Only partial info? Complete it!Let   be known for   We trust these scores.Goal Find the simplest skew-symmetric matrix that matches the data     NP hard Heuristic   ConvexDavid F. Gleich (Purdue) KDD 2011 12/20
  13. 13. Solving the nuclear norm problemUse a LASSO formulation 1.   2. REPEAT  3.   = rank-k SVD of    4. 5.     6. UNTIL  Jain et al. propose SVP for this problem without   Jain et al. NIPS 2010David F. Gleich (Purdue) KDD 2011 13/20
  14. 14. Skew-symmetric SVDsLet   be an   skew-symmetric matrix with eigenvalues   , where   and   . Then the SVD of   is given by  for   and   given in the proof.Proof Use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block This means that SVP will give us the skew- symmetric constraint “for free”David F. Gleich (Purdue) KDD 2011 14/20
  15. 15. Exact recovery resultsDavid Gross showed how to recover Hermitian matrices. i.e. the conditions under which we get the exact  Note that   is Hermitian. Thus our new result!   Gross arXiv 2010.David F. Gleich (Purdue) KDD 2011 15/20
  16. 16. Recovery Discussion and ExperimentsConfession If   , then just look at differences from a connected set. Constants? Not very good.   Intuition for the truth.    David F. Gleich (Purdue) KDD 2011 16/20
  17. 17. The Ranking Algorithm0. INPUT   (ratings data) and c (for trust on comparisons)1. Compute   from  2. Discard entries with fewer than c comparisons3. Set   to be indices and values of what’s left4.   = SVP(  )5. OUTPUT  David F. Gleich (Purdue) KDD 2011 17/20
  18. 18. Synthetic ResultsConstruct an Item Response Theory model. Vary number of ratings per user and a noise/error level Our Algorithm! The Average RatingDavid F. Gleich (Purdue) KDD 2011 18/20
  19. 19. Conclusions and Future Work“aggregate, then complete” 1. Compare against othersRank aggregation with 2. Noisy recovery! Morethe nuclear norm is realistic sampling. principled easy to compute 3. Skew-symmetric LanczosThe results are much better based SVD?than simple approaches.David F. Gleich (Purdue) KDD 2011 19/20
  20. 20. Google nuclear ranking gleich