
# LSH for Prediction Problem in Recommendation

Using LSH to predict user ratings for items.

1. LSH for Prediction Problem in Recommendation. Maruf Aytekin, PhD Student, Computer Engineering Department, Bahcesehir University, May 5, 2015
2. Outline: • User-based • Item-based • LSH • Parameters • Model Build Performance • Accuracy Performance • LSH Parameters
3. Data Set: • Total ratings: 100,000 • Number of users: 943 • Number of items: 1,682 • Sparsity: 0.0630
4. Evaluation Methods: • We use the hold-out cross-validation method for the experiments. • We randomly select 5% of the data for testing and 5% for validation. • We repeat this process 3 times and average the results.
5. User-based: Neighbors can have different levels of similarity, so each neighbor's rating is weighted by its similarity to u. • w_uv: similarity of users u and v. • r_vi: rating of user v for item i. • N_i(u): the set of neighbors of u who have rated item i.
6. Item-based: • r_uj: rating of user u for item j. • N_u(i): the items rated by user u that are most similar to item i. • w_ij: similarity of items i and j.
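7. Locality Sensitive Hashing: AND-Construction [Figure: users U1, U2, U3, …, Um are hashed with binary hash functions H1, H2, … (each outputting 0 or 1) into buckets keyed by the concatenated bits, e.g. bucket 1 key 0101, bucket 2 key 1110, bucket 3 key 1101, bucket 4 key 1001.]
8. Hash Tables [Figure: L = 2 hash tables (t = 1, t = 2) with K = 4 hash functions each; the candidate set for U5, C(U5), collects the users that share a bucket with U5 in either table (U2, U6, U1, U3, …).]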
9. LSH for Prediction: • L: number of hash tables (bands). • C_vi(t): the set of candidate users v, retrieved from hash table t, who have rated item i. • r_vi: rating of candidate user v (in C) for item i.
10. Computational Complexity: • |U|: user set size • |I|: item set size • k: number of neighbors used in the predictions • p: maximum number of ratings per user • q: maximum number of ratings per item
11. Parameters (CF)
12. LSH Parameters
13. LSH Parameters
14. Model Build Time
15. Results: User-based. With the optimal values k = 30 and Y = 7: • Average MAE: 0.79527 • Average running time: 9.437 seconds. We compare these results with the LSH method.
16. LSH & User-based: Hash Functions
17. LSH & User-based: Hash Functions
18. LSH & User-based: Hash Tables
19. LSH & User-based: Hash Tables
20. Conclusion: • LSH greatly improved scalability. • Accuracy decreased, but within an acceptable range. • Computational performance improved considerably. • LSH must be configured to balance MAE against performance according to what is expected from the system.
21. Source Code: User-based Prediction
22. Source Code: LSH Prediction
23. Q&A
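
The sparsity figure on slide 3 can be reproduced from the counts. It is the number of ratings divided by the number of user-item pairs, which is the share of the rating matrix that is filled. Many texts use the complement instead (about 0.937, the share of empty cells) and call that the sparsity, so it helps to state which convention is meant. A quick check (variable names are illustrative):

```python
n_ratings, n_users, n_items = 100_000, 943, 1_682

# Share of the user-item matrix that holds an observed rating.
filled = n_ratings / (n_users * n_items)
# Share of cells with no rating (the complement).
empty = 1 - filled

print(f"filled: {filled:.4f}, empty: {empty:.4f}")  # filled: 0.0630, empty: 0.9370
```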
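
The hold-out protocol on slide 4 (5% test, 5% validation, repeated 3 times and averaged) can be sketched as follows. This is a minimal sketch with several assumptions: the ratings are held in a flat list, the test and validation samples are disjoint and drawn uniformly at random, and `holdout_split`, `repeated_holdout` and the `evaluate` callback are hypothetical names rather than the author's code.

```python
import random

def holdout_split(ratings, test_frac=0.05, val_frac=0.05, seed=None):
    """Randomly split ratings into (train, validation, test) lists.

    Assumption: shuffle all ratings, then take the first 5% as the
    test set and the next 5% as the validation set.
    """
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def repeated_holdout(ratings, evaluate, repeats=3):
    """Call evaluate(train, val, test) on `repeats` random splits and average the scores."""
    scores = [evaluate(*holdout_split(ratings, seed=r)) for r in range(repeats)]
    return sum(scores) / len(scores)
```

For example, if `evaluate` returns the MAE on the test split, `repeated_holdout(ratings, evaluate)` gives the MAE averaged over three random splits.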
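
The user-based rule on slide 5 is commonly written as a similarity-weighted average: r̂_ui = Σ_{v ∈ N_i(u)} w_uv · r_vi / Σ_{v ∈ N_i(u)} |w_uv|. The slide's own formula did not survive extraction, so treat this as an assumed form that is consistent with the slide's definitions.

The sketch below makes further assumptions:
- Similarity is cosine similarity over co-rated items.
- N_i(u) is the set of the k users most similar to u among those who rated i. The default k = 30 is the value reported on slide 15.
- `ratings` maps each user to a dict of item ratings.

The function names are illustrative.

```python
import math

def cosine_sim(ru, rv):
    """Cosine similarity over the items both users rated (assumed measure)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    dot = sum(ru[j] * rv[j] for j in common)
    norm = (math.sqrt(sum(ru[j] ** 2 for j in common))
            * math.sqrt(sum(rv[j] ** 2 for j in common)))
    return dot / norm if norm else 0.0

def predict_user_based(ratings, u, i, k=30):
    """Similarity-weighted average of r_vi over N_i(u), the k most similar raters of i."""
    # (w_uv, r_vi) for every other user v who rated item i
    scored = [(cosine_sim(ratings[u], ratings[v]), ratings[v][i])
              for v in ratings if v != u and i in ratings[v]]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    denom = sum(abs(w) for w, _ in neighbors)
    if denom == 0:
        return None  # no neighbor with a usable similarity
    return sum(w * r for w, r in neighbors) / denom

ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 2},
}
print(predict_user_based(ratings, "u1", "c", k=2))
```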
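
The item-based counterpart on slide 6 can be sketched the same way: r̂_ui = Σ_{j ∈ N_u(i)} w_ij · r_uj / Σ_{j ∈ N_u(i)} |w_ij|. This is also an assumed form, because the slide's formula was lost. The sketch assumes cosine similarity between item rating vectors (over users who rated both items) and uses hypothetical function names.

```python
import math
from collections import defaultdict

def by_item(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    index = defaultdict(dict)
    for user, items in ratings.items():
        for item, r in items.items():
            index[item][user] = r
    return index

def cosine_sim(a, b):
    """Cosine similarity over users who rated both items (assumed measure)."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[x] * b[x] for x in common)
    norm = (math.sqrt(sum(a[x] ** 2 for x in common))
            * math.sqrt(sum(b[x] ** 2 for x in common)))
    return dot / norm if norm else 0.0

def predict_item_based(ratings, u, i, k=30):
    """Weighted average of r_uj over N_u(i), the k items rated by u most similar to i."""
    index = by_item(ratings)
    if i not in index:
        return None  # nobody has rated item i
    # (w_ij, r_uj) for every other item j that user u rated
    scored = [(cosine_sim(index[i], index[j]), r_uj)
              for j, r_uj in ratings[u].items() if j != i]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    denom = sum(abs(w) for w, _ in neighbors)
    if denom == 0:
        return None
    return sum(w * r for w, r in neighbors) / denom

ratings = {
    "u1": {"a": 4, "b": 2},
    "u2": {"a": 4, "b": 2, "c": 4},
    "u3": {"a": 1, "b": 5, "c": 1},
}
print(predict_item_based(ratings, "u1", "c", k=2))
```

In practice, the item-item similarities would be computed once, when the model is built, rather than on every call. They are recomputed here only to keep the sketch short.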
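
Slide 7 shows each user being mapped to a bucket whose key concatenates several binary hash values (AND-construction), so two users share a bucket only if all K bits agree. The slide does not name the hash family. This sketch assumes random-hyperplane hashing, a standard LSH family for cosine similarity: each hash function contributes one bit, which is 1 when the user's rating vector lies on the non-negative side of a random hyperplane. Users are also assumed to be dense rating vectors with 0 for unrated items. The function names are illustrative.

```python
import random
from collections import defaultdict

def make_hyperplanes(dim, K, seed=0):
    """K random Gaussian hyperplanes, one per binary hash function."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(K)]

def and_key(vector, hyperplanes):
    """AND-construction: concatenate the K one-bit hashes into one bucket key."""
    bits = []
    for plane in hyperplanes:
        dot = sum(x * w for x, w in zip(vector, plane))
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)

def build_table(user_vectors, hyperplanes):
    """Place users whose keys agree in all K bits into the same bucket."""
    table = defaultdict(list)
    for user, vec in user_vectors.items():
        table[and_key(vec, hyperplanes)].append(user)
    return dict(table)

users = {"U1": [5, 0, 3], "U2": [5, 0, 3], "U3": [10, 0, 6], "U4": [0, 4, 1]}
planes = make_hyperplanes(dim=3, K=4)
table = build_table(users, planes)
print(table)
```

U3 is a positive multiple of U1, so it falls on the same side of every hyperplane and always gets the same key. This is why the bits approximate angular (cosine) similarity.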
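
Slide 8 combines L = 2 such tables, each keyed by K = 4 bits, and forms the candidate set C(U5) from the users that share U5's bucket in any table (an OR across tables). Here is a self-contained sketch under the same random-hyperplane assumption. `build_tables` and `candidate_set` are illustrative names.

```python
import random
from collections import defaultdict

def bucket_key(vec, planes):
    """K-bit AND-construction key (random-hyperplane bits, an assumption)."""
    return "".join("1" if sum(x * w for x, w in zip(vec, p)) >= 0 else "0"
                   for p in planes)

def build_tables(user_vectors, dim, L=2, K=4, seed=0):
    """Build L independent hash tables, each keyed by K bits."""
    rng = random.Random(seed)
    tables = []
    for _ in range(L):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(K)]
        buckets = defaultdict(set)
        for user, vec in user_vectors.items():
            buckets[bucket_key(vec, planes)].add(user)
        tables.append((planes, buckets))
    return tables

def candidate_set(tables, user, user_vectors):
    """C(u): users sharing u's bucket in at least one table."""
    candidates = set()
    for planes, buckets in tables:
        candidates |= buckets.get(bucket_key(user_vectors[user], planes), set())
    candidates.discard(user)
    return candidates

users = {"U1": [5, 0, 3], "U2": [1, 4, 0], "U5": [5, 0, 3], "U6": [0, 2, 5]}
tables = build_tables(users, dim=3, L=2, K=4)
print(candidate_set(tables, "U5", users))
```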
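
The prediction formula on slide 9 was lost in extraction. Only its symbols survive (L, C_vi(t), r_vi), and no similarity weight appears among them. One reading consistent with those symbols, and an assumption here rather than the author's confirmed rule, pools the ratings of item i over the candidates retrieved from every table:

r̂_ui = Σ_{t=1..L} Σ_{v ∈ C_vi(t)} r_vi / Σ_{t=1..L} |C_vi(t)|

Under this reading, a user retrieved from several tables counts several times. `predict_lsh` is a hypothetical name.

```python
def predict_lsh(ratings, candidates_per_table, i):
    """Assumed LSH rule: pool r_vi over C_vi(t) for t = 1..L.

    ratings: user -> {item: rating}
    candidates_per_table: list of L sets, the users retrieved for the
    target user from each hash table (target user already excluded).
    """
    total, count = 0.0, 0
    for candidates in candidates_per_table:
        # C_vi(t): candidates from table t who have rated item i
        raters = [v for v in candidates if i in ratings[v]]
        total += sum(ratings[v][i] for v in raters)
        count += len(raters)
    return total / count if count else None

ratings = {"U1": {"a": 4}, "U2": {"a": 2}, "U3": {"b": 5}}
print(predict_lsh(ratings, [{"U1", "U3"}, {"U1", "U2"}], "a"))
```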
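
The MAE values on slide 15 are mean absolute errors between predicted and held-out ratings, averaged over the three splits. Below is a minimal helper. The slides do not say how test ratings with no available prediction were handled, so this version simply requires a prediction for every test rating.

```python
def mae(predictions, actuals):
    """Mean absolute error over paired predicted and true ratings."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(predictions)

print(mae([3.5, 4.0, 2.0], [4, 4, 3]))  # 0.5
```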
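
The conclusion's balance between MAE and performance, which the LSH parameter slides explore, follows from the standard AND/OR analysis of LSH. Suppose a single hash bit agrees for two users with probability p. They then share a bucket in one table with probability p^K, and in at least one of L tables with probability 1 - (1 - p^K)^L. Under the random-hyperplane assumption used in the sketches above, p = 1 - θ/π for two rating vectors at angle θ.

Raising K makes buckets more selective: there are fewer candidates, so predictions are faster but possibly less accurate. Raising L retrieves more candidates at a higher cost. `collision_prob` is an illustrative name.

```python
import math

def collision_prob(cos_sim, K, L):
    """Probability that two users become candidates with K bits per table and L tables.

    Assumes random-hyperplane hashing, where one bit agrees with
    probability 1 - theta/pi for vectors at angle theta.
    """
    theta = math.acos(max(-1.0, min(1.0, cos_sim)))
    p = 1.0 - theta / math.pi
    return 1.0 - (1.0 - p ** K) ** L

# How K and L change the chance that two fairly similar users (cosine 0.5) collide.
for K in (4, 8):
    for L in (2, 4):
        print(K, L, round(collision_prob(0.5, K, L), 3))
```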