Transcript of "Rank aggregation via skew-symmetric matrix completion"
Rank aggregation via nuclear norm minimization David F. Gleich Sandia National Laboratories Lek-Heng Lim University of Chicago Householder Symposium 18 Tahoe City, (Barely) CASandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.David F. Gleich (Sandia) Householder 18 1/15
Which is a better list of good DVDs?Lord of the Rings 3: The Return of … Lord of the Rings 3: The Return of …Lord of the Rings 1: The Fellowship Lord of the Rings 1: The FellowshipLord of the Rings 2: The Two Towers Lord of the Rings 2: The Two TowersLost: Season 1 Star Wars V: Empire Strikes BackBattlestar Galactica: Season 1 Raiders of the Lost ArkFullmetal Alchemist Star Wars IV: A New HopeTrailer Park Boys: Season 4 Shawshank RedemptionTrailer Park Boys: Season 3 Star Wars VI: Return of the JediTenchi Muyo! Lord of the Rings 3: Bonus DVDShawshank Redemption The Godfather Standard Nuclear Norm rank aggregation based rank aggregation (the mean rating)David F. Gleich (Sandia) Householder 18 2/15
Ranking is really hard John Kemeny Dwork, Kumar, Naor, Ken Arrow SivikumarAll rank aggregationsinvolve some measure A good ranking is theof compromise “average” ranking under NP hard to compute a permutation distance Kemeny’s rankingDavid F. Gleich (Sandia) Householder 18 3/15
Embody chair John Cantrell (flickr) Given a hard problem, what do you do? Numerically relax! It’ll probably be easier.David F. Gleich (Sandia) Householder 18 4 of 15
Suppose we had scoresLet be the score of the ith movie/song/paper/team to rankSuppose we can compare the ith to jth: Then is skew-symmetric, rank 2.Also works for with an extra log. Numerical ranking is intimately intertwined with skew-symmetric matrices Kemeny and Snell, Mathematical Models in Social Sciences (1978)David F. Gleich (Sandia) Householder 18 5/15
Using ratings instead of comparisonsRatings induce various skew-symmetric matrices. Arithmetic Mean Log-odds David 1988 – The Method of Paired ComparisonsDavid F. Gleich (Sandia) Householder 18 6/15
Extracting the scoresGiven with all entries, then 107 is the Borda Movie Pairs 105 count, the least-squares solution to How many do we have? 101 Most. 101 105 Number of ComparisonsDo we trust all ? Not really. Netflix data 17k movies, 500k users, 100M ratings– 99.17% filledDavid F. Gleich (Sandia) Householder 18 7/15
Only partial info? Complete it!Let be known for We trust these scores.Goal Find the simplest skew-symmetric matrix that matches the data NP hard Heuristic Convex best convex underestimator of rank on unit ball.David F. Gleich (Sandia) Householder 18 8/15
Solving the nuclear norm problemUse a LASSO formulation 1. 2. REPEAT 3. = rank-k SVD of 4. 5. 6. UNTIL Jain et al. propose SVP for this problem without Jain et al. NIPS 2010David F. Gleich (Sandia) Householder 18 9/15
Skew-symmetric SVDsLet be an skew-symmetric matrix with eigenvalues , where and . Then the SVD of is given by for and given in the proof.Proof Use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block This means that SVP will give us the skew- symmetric constraint “for free”David F. Gleich (Sandia) Householder 18 10/15
Exact recovery resultsDavid Gross showed how to recover Hermitian matrices. i.e. the conditions under which we get the exact Note that is Hermitian. Thus our new result! Gross arXiv 2010.David F. Gleich (Sandia) Householder 18 11/15
Recovery Discussion and ExperimentsIf , then just look at differences from a connected set. Constants? Not very good. Intuition for the truth. David F. Gleich (Sandia) Householder 18 12/15
The Ranking Algorithm0. INPUT (ratings data) and c (for trust on comparisons)1. Compute from 2. Discard entries with fewer than c comparisons3. Set to be indices and values of what’s left4. = SVP( )5. OUTPUT David F. Gleich (Sandia) Householder 18 13/15
Synthetic ResultsConstruct an Item Response Theory model. Vary number of ratings per user and a noise/error level Our Algorithm! The Average RatingDavid F. Gleich (Sandia) Householder 18 14/15
Conclusions and Future Work 1. Compare against othersRank aggregation with van Dooren (fitting)the nuclear norm is: Massey (direct least principled squares for ) easy to computeThe results are much better 2. Noisy recovery! Morethan simple approaches. realistic sampling. 3. Skew-symmetric Lanczos based SVD?David F. Gleich (Sandia) Householder 18 15/15
Google nuclear ranking gleich To appear KDD2011
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.