3. Data Set
Total ratings: 100,000
Number of users: 943
Number of items: 1,682
Density (ratings / possible user-item pairs): 0.0630
4. Evaluation Methods
• We use the hold-out cross-validation method for the experiments.
• We randomly select 5% of the ratings for testing and 5% for validation.
• We repeat this process 3 times and average the results.
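The hold-out protocol above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the toy data are hypothetical, and the split simply shuffles the rating triples and carves off the test and validation fractions.

```python
import random

def holdout_split(ratings, test_frac=0.05, val_frac=0.05, seed=0):
    """Randomly partition (user, item, rating) triples into
    train / validation / test sets (hold-out protocol)."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Toy data: 100 synthetic ratings (10 users x 10 items).
ratings = [(u, i, 3.0) for u in range(10) for i in range(10)]
train, val, test = holdout_split(ratings)
```

Repeating the call with seeds 0, 1, 2 and averaging the resulting MAE values reproduces the "repeat 3 times and average" step.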
5. User-based
Neighbors can have different levels of similarity.
w_uv: similarity of users u and v.
r_vi: rating of user v for item i.
N_i(u): set of neighbors of u who have rated item i.
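Using these symbols, a user-based prediction can be sketched as below. The slides do not show the exact formula, so this assumes a common form: the target user's mean plus a similarity-weighted average of mean-centered neighbor ratings over N_i(u). The similarity values in the toy example are hypothetical.

```python
def predict_user_based(u, i, ratings, sim, k=30):
    """Predict r_ui from the k most similar users who rated item i,
    i.e. the set N_i(u), using mean-centered weighted averaging
    (assumed form; the slides only define the symbols)."""
    mean = lambda v: sum(ratings[v].values()) / len(ratings[v])
    # N_i(u): up to k most similar users v (by w_uv) with a rating r_vi.
    neighbors = sorted((v for v in ratings if v != u and i in ratings[v]),
                       key=lambda v: sim(u, v), reverse=True)[:k]
    num = den = 0.0
    for v in neighbors:
        w = sim(u, v)                       # w_uv
        num += w * (ratings[v][i] - mean(v))
        den += abs(w)
    return mean(u) if den == 0 else mean(u) + num / den

# Toy example with a hand-made similarity table (hypothetical values).
ratings = {'u1': {'a': 4, 'b': 2},
           'u2': {'a': 5, 'b': 1, 'c': 4},
           'u3': {'a': 1, 'c': 2}}
w = {frozenset(('u1', 'u2')): 0.9, frozenset(('u1', 'u3')): 0.1,
     frozenset(('u2', 'u3')): 0.0}
sim = lambda a, b: w[frozenset((a, b))]
pred = predict_user_based('u1', 'c', ratings, sim)
```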
6. Item-based
r_uj: rating of user u for item j.
N_u(i): the items rated by user u that are most similar to item i.
w_ij: similarity of items i and j.
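The item-based counterpart can be sketched the same way. Again the exact weighting is not shown on the slides, so this assumes the standard similarity-weighted average of the user's own ratings r_uj over N_u(i); the toy similarities are hypothetical.

```python
def predict_item_based(u, i, ratings, item_sim, k=30):
    """Predict r_ui as a weighted average of u's ratings r_uj over
    N_u(i): the k items rated by u most similar to i
    (assumed standard form; the slides only define the symbols)."""
    # N_u(i): up to k items j already rated by u, ranked by w_ij.
    neighbors = sorted((j for j in ratings[u] if j != i),
                       key=lambda j: item_sim(i, j), reverse=True)[:k]
    num = den = 0.0
    for j in neighbors:
        w = item_sim(i, j)                  # w_ij
        num += w * ratings[u][j]            # w_ij * r_uj
        den += abs(w)
    return num / den if den else 0.0

# Toy example with hypothetical item similarities.
ratings = {'u': {'a': 4, 'b': 2}}
s = {frozenset(('c', 'a')): 0.8, frozenset(('c', 'b')): 0.2}
item_sim = lambda i, j: s[frozenset((i, j))]
pred = predict_item_based('u', 'c', ratings, item_sim)
```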
9. LSH for Prediction
L : number of hash tables (bands)
C_vi(t): the set of candidate users retrieved from hash table t who have rated item i.
r_vi: rating of user v (in C) on item i.
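A minimal sketch of the LSH candidate retrieval described above, assuming MinHash signatures over each user's rated-item set (the slides do not specify the hash family, so the hashing scheme, parameters, and names here are illustrative): the signature is split into L bands, one hash table per band, and a user's candidates are the union of its bucket mates across the L tables.

```python
import random
from collections import defaultdict

def build_lsh(user_items, n_hashes=12, n_tables=3, seed=0):
    """MinHash each user's rated-item set, split the signature into
    L = n_tables bands, and bucket users per band (one hash table
    per band)."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(32) for _ in range(n_hashes)]
    sigs = {u: tuple(min(hash(i) ^ m for i in items) for m in masks)
            for u, items in user_items.items()}
    rows = n_hashes // n_tables          # signature rows per band
    tables = [defaultdict(set) for _ in range(n_tables)]
    for u, sig in sigs.items():
        for t in range(n_tables):
            tables[t][sig[t * rows:(t + 1) * rows]].add(u)
    return sigs, tables, rows

def candidates(u, sigs, tables, rows):
    """Union over the L tables of u's bucket mates: users sharing at
    least one full band with u. Restricting this set to users who
    rated item i gives C_vi(t) aggregated over t."""
    cand = set()
    for t, table in enumerate(tables):
        cand |= table[sigs[u][t * rows:(t + 1) * rows]]
    cand.discard(u)
    return cand

# Users with identical item sets collide in every band; disjoint
# sets here produce no collisions with this seed.
user_items = {'a': {1, 2, 3}, 'b': {1, 2, 3}, 'c': {9, 10, 11}}
sigs, tables, rows = build_lsh(user_items)
cand_a = candidates('a', sigs, tables, rows)
```

Prediction then proceeds as in the user-based method, but only over the retrieved candidates instead of all users, which is the source of the speed-up.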
10. Computational Complexity
|U|: user set size
|I|: item set size
k: number of neighbors used in the predictions
p: maximum number of ratings per user
q: maximum number of ratings per item
15. Results
User-based
With the optimal k = 30 and Y = 7:
• Average MAE: 0.79527
• Average running time: 9.437 seconds
We compare these results against the LSH method.
20. Conclusion
• LSH tremendously improved scalability.
• Accuracy decreased, but stayed within an acceptable range.
• Running time improved substantially.
• LSH must be configured to balance MAE against running time, according to the expectations from the system.