Rank aggregation via nuclear norm minimization




David F. Gleich
Sandia National Laboratories

Lek-Heng Lim
University of Chicago

Householder Symposium 18
Tahoe City, (Barely) CA


Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
David F. Gleich (Sandia)                                       Householder 18                                                              1/15
Which is a better list of good DVDs?

Standard rank aggregation (the mean rating):
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Lost: Season 1
Battlestar Galactica: Season 1
Fullmetal Alchemist
Trailer Park Boys: Season 4
Trailer Park Boys: Season 3
Tenchi Muyo!
Shawshank Redemption

Nuclear norm based rank aggregation:
Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers
Star Wars V: Empire Strikes Back
Raiders of the Lost Ark
Star Wars IV: A New Hope
Shawshank Redemption
Star Wars VI: Return of the Jedi
Lord of the Rings 3: Bonus DVD
The Godfather

Ranking is really hard

Ken Arrow: All rank aggregations involve some measure of compromise.
John Kemeny: A good ranking is the “average” ranking under a permutation distance.
Dwork, Kumar, Naor, Sivakumar: NP hard to compute Kemeny’s ranking.


Embody chair
John Cantrell (flickr)

Given a hard problem, what do you do?

Numerically relax!

It’ll probably be easier.



Suppose we had scores
Let $s_i$ be the score of the ith movie/song/paper/team to rank.

Suppose we can compare the ith to jth: $Y_{ij} = s_i - s_j$.

Then $Y = s e^T - e s^T$ (with $e$ the all-ones vector) is skew-symmetric, rank 2.

Also works for $Y_{ij} = s_i / s_j$ with an extra log.

Numerical ranking is intimately intertwined with skew-symmetric matrices.
Kemeny and Snell, Mathematical Models in Social Sciences (1978)
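A minimal numerical check of the claim above, assuming the comparison is the score difference (the names `s`, `e`, `Y` and the score values are illustrative choices for this sketch, not data from the talk):

```python
import numpy as np

# Hypothetical scores for four items (illustrative values only).
s = np.array([3.0, 1.0, 4.0, 1.5])
e = np.ones_like(s)

# Pairwise comparisons as score differences: Y[i, j] = s[i] - s[j].
Y = np.outer(s, e) - np.outer(e, s)
is_skew = np.allclose(Y, -Y.T)
rank = np.linalg.matrix_rank(Y, tol=1e-10)

# Ratio comparisons s[i] / s[j] turn into differences of log-scores.
Y_log = np.log(np.outer(s, 1.0 / s))
is_skew_log = np.allclose(Y_log, -Y_log.T)
rank_log = np.linalg.matrix_rank(Y_log, tol=1e-10)
```

Both matrices come out skew-symmetric with rank 2 whenever the scores are not all equal, which is the structure the completion problem later exploits.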
Using ratings instead of comparisons

Ratings induce various skew-symmetric matrices.

Arithmetic Mean
Log-odds

David 1988 – The Method of Paired Comparisons
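One way the arithmetic-mean construction could look in code. This is a sketch under an assumption: the pairwise entry is taken to be the mean rating difference over users who rated both items, since the slide's formula did not survive extraction. The function name and the NaN-for-missing convention are hypothetical.

```python
import numpy as np

def arithmetic_mean_comparisons(R):
    """R is users x items with np.nan for missing ratings.

    Returns (Y, counts): Y[i, j] is the mean of R[u, i] - R[u, j] over
    users u who rated both items (nan if no such user), and counts[i, j]
    is the number of such users.
    """
    rated = ~np.isnan(R)
    R0 = np.where(rated, R, 0.0)
    M = rated.astype(float)
    counts = M.T @ M                 # co-rating counts
    sums = R0.T @ M - M.T @ R0       # summed differences over co-raters
    Y = np.full(counts.shape, np.nan)
    np.divide(sums, counts, out=Y, where=counts > 0)
    return Y, counts

# Three users, three items, each user skips one item.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [np.nan, 2.0, 2.0]])
Y, counts = arithmetic_mean_comparisons(R)
```

The count matrix is returned alongside `Y` because it measures how much each pairwise entry can be trusted.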
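Extracting the scores
Given $Y$ with all entries, then $s = \frac{1}{n} Y e$ is the Borda count, the least-squares solution to $Y = s e^T - e s^T$.

How many $Y_{ij}$ do we have? Most.

Do we trust all $Y_{ij}$? Not really.

[Figure: number of movie pairs (vertical axis, ticks from 10^1 to 10^7) against number of comparisons (horizontal axis, ticks from 10^1 to 10^5).]
Netflix data: 17k movies, 500k users, 100M ratings; 99.17% filled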
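A small sketch of the score-extraction step, assuming a fully known skew-symmetric comparison matrix and taking the scores as its row means. The data is synthetic and the variable names are illustrative; the second half checks numerically that the row means agree with a generic least-squares solve.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
s_true = rng.normal(size=n)
e = np.ones(n)
Y = np.outer(s_true, e) - np.outer(e, s_true)

# Row means of Y recover the scores up to an additive shift.
s_borda = Y @ e / n

# Noisy but still skew-symmetric comparisons.
noise = rng.normal(scale=0.1, size=(n, n))
Y_noisy = Y + (noise - noise.T) / 2

# Least squares over all pairs: row (i, j) of A is e_i - e_j.
I = np.eye(n)
A = (I[:, None, :] - I[None, :, :]).reshape(n * n, n)
s_ls, *_ = np.linalg.lstsq(A, Y_noisy.reshape(-1), rcond=None)
s_rowmean = Y_noisy @ e / n
```

`np.linalg.lstsq` returns the minimum-norm solution of this rank-deficient system, which is the zero-mean score vector, so it can be compared directly with the row means.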



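Only partial info? Complete it!
Let $Y_{ij}$ be known for $(i,j) \in \Omega$. We trust these scores.

Goal: Find the simplest skew-symmetric matrix that matches the data.

NP hard:
  minimize $\operatorname{rank}(X)$ subject to $X = -X^T$, $X_{ij} = Y_{ij}$ for $(i,j) \in \Omega$

Heuristic (convex):
  minimize $\|X\|_*$ subject to $X = -X^T$, $X_{ij} = Y_{ij}$ for $(i,j) \in \Omega$

The nuclear norm $\|X\|_*$ is the best convex underestimator of rank on the unit ball.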
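Solving the nuclear norm problem
Use a LASSO formulation.

Jain et al. propose SVP for this problem without the constraint $X = -X^T$.

Here $\mathcal{A}$ samples the known entries, $b$ holds their values, and $\eta$ is a step size.

1. $X^{(0)} = 0$, $t = 0$
2. REPEAT
3.   $U^{(t)} \Sigma^{(t)} V^{(t)T}$ = rank-k SVD of $X^{(t)} - \eta \mathcal{A}^*(\mathcal{A}(X^{(t)}) - b)$
4.   $X^{(t+1)} = U^{(t)} \Sigma^{(t)} V^{(t)T}$
5.   $t = t + 1$
6. UNTIL $\|\mathcal{A}(X^{(t)}) - b\|_2 < \varepsilon$

Jain et al. NIPS 2010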
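A compact sketch of a singular value projection loop for matrix completion, in the spirit of the SVP method cited above. The function name, the default step size of 1, the stopping tolerance, and the use of a dense SVD are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def svp(shape, rows, cols, vals, k, step=1.0, tol=1e-8, max_iter=500):
    """Fit a rank-k matrix X with X[rows, cols] close to vals."""
    X = np.zeros(shape)
    for _ in range(max_iter):
        resid = X[rows, cols] - vals
        if np.linalg.norm(resid) <= tol:
            break
        # Gradient step on the observed entries only.
        G = np.zeros(shape)
        G[rows, cols] = resid
        # Project back onto rank-k matrices with a truncated SVD.
        U, sig, Vt = np.linalg.svd(X - step * G, full_matrices=False)
        X = (U[:, :k] * sig[:k]) @ Vt[:k]
    return X

# Fully observed rank-2 example: one projection step reproduces Y.
rng = np.random.default_rng(0)
s = rng.normal(size=8)
Y = np.subtract.outer(s, s)
rows, cols = np.nonzero(np.ones((8, 8)))
X_full = svp((8, 8), rows, cols, Y[rows, cols], k=2)
```

With only part of the entries observed, the step size becomes a tuning choice and exact recovery is no longer guaranteed by this simple loop. A large-scale version would also use a sparse or Lanczos-based truncated SVD instead of the dense one here.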
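Skew-symmetric SVDs
Let $A = -A^T$ be an $n \times n$ skew-symmetric matrix with eigenvalues $i\lambda_1, -i\lambda_1, i\lambda_2, -i\lambda_2, \ldots, i\lambda_j, -i\lambda_j$, where $\lambda_i > 0$ and $j = \lfloor n/2 \rfloor$. Then the SVD of $A$ is given by

  $A = U \operatorname{diag}(\lambda_1, \lambda_1, \lambda_2, \lambda_2, \ldots, \lambda_j, \lambda_j) V^T$

for $U$ and $V$ given in the proof.
Proof: Use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block.

This means that SVP will give us the skew-symmetric constraint “for free”.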
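The pairing of singular values can be checked numerically. The sketch below uses a random 6x6 skew-symmetric matrix (an arbitrary test case, not from the talk) and also confirms that a rank-2 truncated SVD stays skew-symmetric, which is the property that makes the constraint come for free when the target rank is even.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(6, 6))
A = B - B.T                          # skew-symmetric test matrix

# Eigenvalues are purely imaginary and come in +/- pairs.
eig = np.linalg.eigvals(A)
lam = np.sort(np.abs(eig.imag))[::-1]

# Singular values equal the eigenvalue magnitudes, so each appears twice.
U, sig, Vt = np.linalg.svd(A)

# Keeping one full pair (rank 2) preserves skew-symmetry.
A2 = (U[:, :2] * sig[:2]) @ Vt[:2]
truncation_is_skew = np.allclose(A2, -A2.T)
```

Truncating at an odd rank would split a pair of equal singular values, and the result would generally not be skew-symmetric.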
Exact recovery results
David Gross showed how to recover Hermitian matrices, i.e. the conditions under which we get the exact matrix.

Note that $iY$ is Hermitian. Thus our new result!

Gross arXiv 2010.
Recovery Discussion and Experiments
If $Y = s e^T - e s^T$, then just look at differences from a connected set. Constants? Not very good.

Intuition for the truth.
                                                  




The Ranking Algorithm
0. INPUT $R$ (ratings data) and $c$ (for trust on comparisons)
1. Compute $Y$ from $R$
2. Discard entries with fewer than $c$ comparisons
3. Set $\Omega$, $b$ to be indices and values of what’s left
4. $U, S, V$ = SVP($\Omega$, $b$, 2)
5. OUTPUT $s = \frac{1}{n} U S V^T e$
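The steps above can be strung together in a short end-to-end sketch. Everything named here is hypothetical: the helper names, the choice of mean rating differences for the pairwise matrix, the dense-SVD projection loop with step size 1, and the row-mean score extraction are assumptions for illustration rather than the authors' code.

```python
import numpy as np

def pairwise_mean_differences(R):
    # R: users x items, np.nan marks a missing rating.
    rated = ~np.isnan(R)
    R0 = np.where(rated, R, 0.0)
    M = rated.astype(float)
    counts = M.T @ M
    sums = R0.T @ M - M.T @ R0
    Y = np.zeros_like(counts)
    np.divide(sums, counts, out=Y, where=counts > 0)
    return Y, counts

def svp_rank_k(n, rows, cols, vals, k=2, step=1.0, tol=1e-9, max_iter=200):
    # Projected gradient onto rank-k matrices (dense SVD for simplicity).
    X = np.zeros((n, n))
    for _ in range(max_iter):
        resid = X[rows, cols] - vals
        if np.linalg.norm(resid) <= tol:
            break
        G = np.zeros((n, n))
        G[rows, cols] = resid
        U, sig, Vt = np.linalg.svd(X - step * G)
        X = (U[:, :k] * sig[:k]) @ Vt[:k]
    return X

def nuclear_norm_rank(R, c):
    Y, counts = pairwise_mean_differences(R)
    rows, cols = np.nonzero(counts >= c)   # keep trusted comparisons only
    X = svp_rank_k(Y.shape[0], rows, cols, Y[rows, cols])
    s = X.sum(axis=1) / X.shape[0]         # row means as scores
    return s, np.argsort(-s)               # best item first

# Idealized noiseless data: rating = item score + per-user offset.
s_true = np.array([2.0, 4.5, 1.0, 3.0, 5.0])
offsets = np.linspace(-1.0, 1.0, 8)
R = s_true[None, :] + offsets[:, None]
for u in range(8):
    R[u, u % 5] = np.nan                   # each user skips one item
scores, order = nuclear_norm_rank(R, c=2)
```

In this idealized case every pair of items keeps at least `c` comparisons and the user offsets cancel in the differences, so the recovered scores match the centered true scores. Real rating data would be noisy and far sparser.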




Synthetic Results
Construct an Item Response Theory model. Vary number of ratings per user and a noise/error level.

[Figure panels: Our Algorithm! | The Average Rating]




Conclusions and Future Work

Rank aggregation with the nuclear norm is:
  principled
  easy to compute
The results are much better than simple approaches.

1. Compare against others
   van Dooren (fitting)
   Massey (direct least squares for $s$)
2. Noisy recovery! More realistic sampling.
3. Skew-symmetric Lanczos based SVD?



Google nuclear ranking gleich


          To appear KDD2011
