Just Count the Love-Hate Squares:
a Rating Network Based Method for
           Recommender Systems
                                                 KDD Cup 2011
                                                        August 21, 2011


          Joseph Kong, Kyle Teague, Justin Kessler




                      Approved for public release by Northrop Grumman Information Systems, ISHQ-2011-0042
Link Prediction in Bipartite Rating Network

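    [Figure: bipartite rating network with items 1-4 on top and users A, B below.
     Upper panel: observed edges labeled with scores 20, 100, 90, 50, 80.
     Lower panel: the same edges labeled with signs -, +, +, -, +.
     In both panels the edge marked "?" is unobserved.]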
    •  Solid edges represent the observed rating pattern

    •  Score >= 80 ( I-love-it, “+” ); score < 80 ( I-hate-it, “-” );

    •  Goal: predict whether the unobserved link is highly rated.
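
    The thresholding rule above can be sketched in a few lines of Python. This is only an illustration: the `(user, item)` assignment of the toy scores is hypothetical (the slide's figure shows the values 20, 100, 90, 50, 80 but the exact endpoints are not recoverable here), and the names `LOVE_THRESHOLD` and `to_sign` are made up for this sketch.

```python
LOVE_THRESHOLD = 80  # from the slide: score >= 80 is "+", score < 80 is "-"


def to_sign(score):
    """Map a numeric rating to +1 (I-love-it) or -1 (I-hate-it)."""
    return 1 if score >= LOVE_THRESHOLD else -1


# Hypothetical toy edges keyed by (user, item); the endpoints are assumed.
ratings = {
    ("A", 1): 20,
    ("A", 2): 100,
    ("B", 2): 90,
    ("B", 3): 50,
    ("B", 4): 80,
}

# Signed rating network: same edges, each labeled +1 or -1.
signed = {edge: to_sign(score) for edge, score in ratings.items()}
```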
Motivation: Happy Hour with Brock and Donald


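     [Figure: two small rating networks. Left: Me and Brock are linked through
      three songs by "-" edges, and Brock has a "+" edge to Song 1. Right: Me
      and Donald are linked through three songs by "+" edges, and Donald has a
      "+" edge to Song 2. My edges to Song 1 and Song 2 are marked "?".]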

     •  Happy hour chat: with Brock, there are 3 songs that we
        both hate; with Donald, we find 3 songs we both love.

     •  Now, Brock loves Song 1 and Donald loves Song 2

     •  Am I more likely to love Song 1 or Song 2?

     •  Main idea: the presence of certain types of squares may be
        highly indicative of love/hate; so, just count them!
The Square Counting Method: How to Count

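     [Figure: the eight square configurations. Each square has the unknown
      target edge "?" on its left side and one signed edge on each of its
      top, right, and bottom sides. Configuration No. denoted in middle.]

          Config. No.   Top   Right   Bottom
          0             -     -       -
          1             +     -       -
          2             -     +       -
          3             +     +       -
          4             -     -       +
          5             +     -       +
          6             -     +       +
          7             +     +       +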

•  Given a target user-item pair (utg-itg): count the number of squares
   of each configuration and form a feature vector

•  For example, in right Fig., the path (utg-i1-u1-itg), which has a
   sign sequence of {-,+,-}, corresponds to configuration No. 2
   (see left Fig.); thus, the count for configuration No. 2 is 1.
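
The counting step can be sketched as follows. This is a minimal Python illustration, not the authors' implementation (the slides state theirs is in C++/OpenMP). The function names are hypothetical. The bit encoding is also an assumption: the slide's example only fixes that the sign sequence {-,+,-} maps to configuration No. 2, so the weights given to the two outer edges (4 for utg-i1, 1 for u1-itg) are a guess that happens to be consistent with that example.

```python
from collections import defaultdict


def build_index(signed_edges):
    """Index signed (user, item) edges by user and by item."""
    by_user, by_item = defaultdict(dict), defaultdict(dict)
    for (user, item), sign in signed_edges.items():
        by_user[user][item] = sign
        by_item[item][user] = sign
    return by_user, by_item


def count_squares(by_user, by_item, u_tg, i_tg):
    """Return an 8-element count vector for the target pair (u_tg, i_tg)."""
    counts = [0] * 8
    raters_of_target = by_item.get(i_tg, {})
    for i1, s_first in by_user.get(u_tg, {}).items():
        if i1 == i_tg:
            continue  # the target edge itself is the unknown "?"
        for u1, s_mid in by_item[i1].items():
            if u1 == u_tg or u1 not in raters_of_target:
                continue  # u1 must close the square by having rated i_tg
            s_last = raters_of_target[u1]
            # Assumed encoding: one bit per edge of the path, "+" = 1.
            config = 4 * (s_first > 0) + 2 * (s_mid > 0) + (s_last > 0)
            counts[config] += 1
    return counts


# The slide's example path utg-i1-u1-itg with sign sequence {-,+,-}.
edges = {("utg", "i1"): -1, ("u1", "i1"): +1, ("u1", "itg"): -1}
by_user, by_item = build_index(edges)
features = count_squares(by_user, by_item, "utg", "itg")
```

With these three edges, `features` has a single nonzero entry at index 2, matching the count for configuration No. 2 in the example above.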
The Square Counting Method: Machine Learning

    •  Counts for different square configurations form the features.

    •  Construct a validation set from user-item pairs with known ratings.

    •  Machine learning framework:

    1.    Perform square counting on rating network for each user-item pair in the
          validation set and generate the validation instance-feature matrix.

    2.    Train a machine learned classifier on validation instance-feature matrix.

    3.    Repeat square counting on the rating network for the test set and generate the
          test instance-feature matrix.

    4.    Apply the machine learned classifier to each instance in the test
          instance-feature matrix.
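
    Steps 2 and 4 can be sketched with a small logistic regression, the model whose coefficients are reported on the last slide. Everything else here is an assumption made for illustration: the plain gradient-descent trainer, the learning rate and epoch count, and the toy 8-column count matrix (which is invented, not taken from the dataset).

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, row):
    """Estimated probability that the user-item pair is highly rated."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


# Toy validation instance-feature matrix: one row per user-item pair,
# one column per square configuration (No. 0 to 7). Labels: 1 = love.
X_val = [
    [0, 3, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 3],
    [3, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 3, 0],
]
y_val = [1, 1, 0, 0]

w, b = train_logistic(X_val, y_val)                    # step 2
p_test = predict_proba(w, b, [0, 2, 0, 0, 0, 0, 0, 0])  # step 4
```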



KDD Cup Track 2-Yahoo! Music Dataset


•  Goal: develop algorithms that separate the items a user rated
   highly (score >= 80) from the items the user did not.

•  For each user in the test set, 6 songs were given; out of the 6
   songs, 3 songs were highly rated by the user and 3 songs were
   not (task is to distinguish them)

•  Winners are determined by the error rate on a hold-out test set
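
The evaluation described above can be sketched as follows. One assumption is labeled loudly: the slides do not say how classifier scores were turned into labels, so marking the 3 highest-scoring of each user's 6 candidates as "loved" is only one plausible way to use the known 3/3 split. The function names and the toy scores are hypothetical.

```python
def label_top_half(scores):
    """Mark the higher-scoring half of one user's candidates as loved (1)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    loved = set(ranked[: len(ranked) // 2])
    return {item: int(item in loved) for item in scores}


def error_rate(predicted, truth):
    """Fraction of user-item pairs whose predicted label is wrong."""
    wrong = sum(predicted[item] != truth[item] for item in truth)
    return wrong / len(truth)


# Toy classifier scores for one user's 6 test songs (invented values).
scores = {"s1": 0.9, "s2": 0.8, "s3": 0.7, "s4": 0.4, "s5": 0.3, "s6": 0.1}
truth = {"s1": 1, "s2": 1, "s3": 0, "s4": 1, "s5": 0, "s6": 0}

predicted = label_top_half(scores)
err = error_rate(predicted, truth)
```

In this toy case s3 and s4 are swapped relative to the truth, so 2 of the 6 labels are wrong.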

                               Statistic          Count
                               Users              249,012
                               Items              296,111
                               Ratings            62,551,438
                               Training Ratings   61,944,406
                               Test Ratings       607,032
Summary of Results-KDD Cup Track 2




    •  Enhancements
       –  Normalizing square counts against random network model
       –  Separate counts based on item hierarchy
       –  Further edge categorization
       –  Removing very popular items
       –  Using bias-removed scores

    •  Square counting
       –  Generate feature-instance matrix
       –  Implemented in C++/OpenMP
       –  ~5 hr on 8-core workstation (2 GB RAM)

    •  Machine learning: ~1 hr

Hate is a Powerful Signal in Predicting Love




    •  Logistic regression coefficients (in units of 10^-3) for each love-hate
       square configuration in predicting a user's highly rated items

    •  Interesting observation: the most powerful configs for predicting
       a user’s love for an item come from hate edges: config. No.
       1 & 4 (2nd top row; 1st bottom row).

    •  Config. No. 1 (2nd top row) means: Item X is recommended
       to you because you hate items Y and Z!