
Just Count the Love-Hate Squares

1,577 views

A square counting machine learning method for recommendation systems.

Published in: Technology, Education

  1. Just Count the Love-Hate Squares: a Rating Network Based Method for Recommender Systems
     KDD Cup 2011, August 21, 2011
     Joseph Kong, Kyle Teague, Justin Kessler
     Approved for public release by Northrop Grumman Information Systems, ISHQ-2011-0042
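  2. Link Prediction in Bipartite Rating Network
     [Figure: bipartite rating network between users A and B and items 1 to 4. One panel labels the observed edges with scores (80, 20, 100, 90, 50) and the unobserved edge with "?"; the other panel labels the same edges with signs (+, -, +, +, -).]
     •  Solid edges represent the observed rating pattern.
     •  Score >= 80: I-love-it (“+”); score < 80: I-hate-it (“-”).
     •  Goal: predict whether the unobserved link is highly rated.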
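
The thresholding rule on this slide (score >= 80 becomes "+", anything lower becomes "-") can be sketched in a few lines of Python. This is only an illustration: the scores are the ones shown in the figure, but which user-item edge carries which score is not recoverable from the transcript, so the assignment below is invented, as are the names `to_sign` and `build_signed_network`.

```python
# Hypothetical toy ratings: (user, item) -> score on a 0-100 scale.
# The edge-to-score assignment is an assumption for illustration only.
ratings = {
    ("A", 1): 80,
    ("A", 2): 20,
    ("A", 3): 100,
    ("B", 2): 90,
    ("B", 4): 50,
}

LOVE_THRESHOLD = 80  # from the slide: score >= 80 is "+", score < 80 is "-"


def to_sign(score, threshold=LOVE_THRESHOLD):
    """Map a numeric score to a love ("+") or hate ("-") edge label."""
    return "+" if score >= threshold else "-"


def build_signed_network(ratings):
    """Return two adjacency dicts: user -> item -> sign and item -> user -> sign."""
    by_user, by_item = {}, {}
    for (user, item), score in ratings.items():
        sign = to_sign(score)
        by_user.setdefault(user, {})[item] = sign
        by_item.setdefault(item, {})[user] = sign
    return by_user, by_item


by_user, by_item = build_signed_network(ratings)
```

Keeping both directions of the adjacency makes it cheap to walk user to item to user to item, which is the kind of path the square counting method relies on.
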
  3. Motivation: Happy Hour with Brock and Donald
     [Figure: two small rating networks with "+"/"-" edges, one connecting Me and Brock to Song 1 and one connecting Me and Donald to Song 2; the edges from Me to Song 1 and to Song 2 are marked "?".]
     •  Happy hour chat: with Brock, there are 3 songs that we both hate; with Donald, we find 3 songs we both love.
     •  Now, Brock loves Song 1 and Donald loves Song 2.
     •  Am I more likely to love Song 1 or Song 2?
     •  Main idea: the presence of certain types of squares may be highly indicative of love/hate; so, just count them!
  4. The Square Counting Method: How to Count
     [Figure, left: the eight square configurations, No. 0 to 7, each a pattern of "+"/"-" edges with the target edge marked "?"; the configuration number is shown in the middle of each square. Figure, right: an example network around the target pair.]
     •  Given a user-item (utg-itg) pair: count the number of squares of each configuration and form a feature vector.
     •  For example, in the right figure, the path (utg-i1-u1-itg), which has a sign sequence of {-,+,-}, corresponds to configuration No. 2 (see left figure); thus, the count for configuration No. 2 is 1.
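
The counting step can be sketched as follows. Everything here is an illustrative assumption rather than the authors' implementation (which the deck says was written in C++/OpenMP). In particular, the numbering of configurations is assumed to be the binary encoding of the three edge signs along the path utg-i1-u1-itg, with "+" as 1 and "-" as 0. That encoding agrees with the one example given on the slide ({-,+,-} is No. 2), but the full numbering in the original figure is not recoverable from the transcript.

```python
from collections import Counter


def config_number(signs):
    """Encode a 3-edge sign sequence as a number 0-7 ("+" = 1, "-" = 0).

    Assumed encoding: it matches the slide's single worked example
    ({-,+,-} -> No. 2); the rest of the numbering is a guess.
    """
    number = 0
    for s in signs:
        number = number * 2 + (1 if s == "+" else 0)
    return number


def count_squares(by_user, by_item, u_tg, i_tg):
    """Count paths u_tg - i1 - u1 - i_tg, grouped by sign configuration.

    Returns a list of 8 counts indexed by configuration number.
    """
    counts = Counter()
    for i1, s1 in by_user.get(u_tg, {}).items():
        if i1 == i_tg:
            continue  # the target edge is what we are trying to predict
        for u1, s2 in by_item.get(i1, {}).items():
            if u1 == u_tg:
                continue  # need a different user to close the square
            s3 = by_user.get(u1, {}).get(i_tg)
            if s3 is None:
                continue  # u1 has not rated the target item
            counts[config_number((s1, s2, s3))] += 1
    return [counts[c] for c in range(8)]


# Hypothetical toy network reproducing the slide's example path.
by_user = {
    "utg": {"i1": "-"},
    "u1": {"i1": "+", "itg": "-"},
}
by_item = {
    "i1": {"utg": "-", "u1": "+"},
    "itg": {"u1": "-"},
}

features = count_squares(by_user, by_item, "utg", "itg")
print(features)  # [0, 0, 1, 0, 0, 0, 0, 0]
```

The cost of this loop grows with the number of items the target user rated times the number of raters of each of those items, which is consistent with the deck reporting hours of compute on the full dataset.
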
  5. The Square Counting Method: Machine Learning
     •  Counts for different square configurations form the features.
     •  Construct the validation set with user-item pairs with known ratings.
     •  Machine learning framework:
        1.  Perform square counting on the rating network for each user-item pair in the validation set and generate the validation instance-feature matrix.
        2.  Train a machine learned classifier on the validation instance-feature matrix.
        3.  Repeat square counting on the rating network for the test set and generate the test instance-feature matrix.
        4.  Apply the machine learned classifier to each instance in the test instance-feature matrix.
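
Steps 2 and 4 of this framework can be sketched with a small logistic regression, the classifier family the final slide reports coefficients for. The code below is a stand-in written in plain Python with batch gradient descent; the deck does not say which trainer or settings were used, and the feature rows, labels, learning rate, and epoch count are all invented for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, with the input clamped to avoid overflow."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Probability that the user-item pair is highly rated."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical validation instance-feature matrix: one row of 8 square
# counts per user-item pair, with label 1 = highly rated, 0 = not.
X_val = [
    [0, 3, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 2],
    [3, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0, 0],
]
y_val = [1, 1, 0, 0]

# Step 2: train on the validation matrix.
w, b = train_logistic(X_val, y_val)

# Step 4: apply to a (hypothetical) test instance-feature matrix.
X_test = [
    [0, 2, 0, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0],
]
probs = [predict_proba(w, b, x) for x in X_test]
```

The learned weights `w` play the role of the per-configuration coefficients discussed at the end of the deck: a positive weight means more squares of that configuration push the prediction toward "highly rated".
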
  6. KDD Cup Track 2: Yahoo! Music Dataset
     •  Goal is to develop algorithms that separate the items a user rated highly (score >= 80) from those the user did not.
     •  For each user in the test set, 6 songs were given; of these, 3 were highly rated by the user and 3 were not (the task is to distinguish them).
     •  Winners are determined by the error rate on a hold-out test set.

     Statistic          Count
     Users              249,012
     Items              296,111
     Ratings            62,551,438
     Training Ratings   61,944,406
     Test Ratings       607,032
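
Because each test user has exactly 3 highly rated songs among 6 candidates, one natural way to turn classifier scores into labels is to mark each user's 3 highest-scoring candidates as loved. The sketch below does that and computes an error rate as the fraction of misclassified user-item pairs. Both choices are assumptions: the slide does not spell out the exact metric or how the authors produced final labels, and the scores and ground truth are invented.

```python
def label_top_half(scores):
    """Given {item: score} for one user's candidates, mark the top half as loved."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    loved = set(ranked[: len(ranked) // 2])
    return {item: (item in loved) for item in scores}


def error_rate(predictions, truth):
    """Fraction of (user, item) pairs whose predicted label differs from the truth."""
    wrong = 0
    total = 0
    for user, items in truth.items():
        for item, is_loved in items.items():
            total += 1
            if predictions[user][item] != is_loved:
                wrong += 1
    return wrong / total


# Hypothetical classifier scores and ground truth for a single test user.
scores = {"u": {"s1": 0.9, "s2": 0.8, "s3": 0.2, "s4": 0.7, "s5": 0.1, "s6": 0.3}}
truth = {"u": {"s1": True, "s2": True, "s3": True,
               "s4": False, "s5": False, "s6": False}}

predictions = {user: label_top_half(s) for user, s in scores.items()}
err = error_rate(predictions, truth)
```

In this toy case the classifier ranks s4 above s3, so two of the six pairs are mislabeled and `err` is 2/6.
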
  7. Summary of Results: KDD Cup Track 2
     •  Enhancements
        –  Normalizing square counts against random network model
        –  Separate counts based on item hierarchy
        –  Further edge categorization
        –  Removing very popular items
        –  Using bias-removed scores
     •  Square counting
        –  Generate feature-instance matrix
        –  Implemented in C++/OpenMP
        –  ~5 hr on 8-core workstation (2 GB RAM)
     •  Machine learning: ~1 hr
  8. Hate is a Powerful Signal in Predicting Love
     •  Logistic regression coefficients (in units of 10^-3) for each love-hate square configuration in predicting a user’s highly rated items.
     •  Interesting observation: the most powerful configurations for predicting a user’s love for an item come from hate edges: config. No. 1 and No. 4 (2nd top row; 1st bottom row).
     •  Config. No. 1 (2nd top row) means: Item X is recommended to you because you hate items Y and Z!
