Music Recommender Systems


  1. Music Recommender Systems
     - 超群.com, [email_address], http://www.fuchaoqun.com
     - 2009.11.22, Beta Tech Salon (Beta 技术沙龙)
     - Official Twitter: @betasalon
     - Official website: http://club.blogbeta.com
     - Mailing list: http://groups.google.com/group/betasalon?hl=zh-CN
  2. Music Recommender Systems
     - 超群.com, [email_address], http://www.fuchaoqun.com
  3. Who is Using Recommender Systems?
  4. Recommender Systems
     - Summary:
       - http://en.wikipedia.org/wiki/Recommender_system
     - Keywords: recommender systems, association rules, collaborative filtering, slope one, SVD, KNN, ...
  5. Algorithms
     - Association Rules
     - Slope one
     - SVD
     - …
  6. Algorithms
     - Association Rules
     - Slope one
     - SVD
     - …
  7. Association Rules

     | TID | Items                      |
     |-----|----------------------------|
     | 1   | Bread, Milk                |
     | 2   | Bread, Diaper, Beer, Egg   |
     | 3   | Diaper, Beer, Cola         |
     | 4   | Bread, Milk, Diaper, Beer  |
     | 5   | Bread, Milk, Diaper, Cola  |

     | Items        | Times |
     |--------------|-------|
     | Beer, Diaper | 3     |
     | Bread, Milk  | 3     |
     | Beer, Bread  | 2     |
     | Diaper, Milk | 2     |
     | Beer, Milk   | 1     |
  8. Association Rules
     - Support:
     - Confidence:
     - Algorithms: Apriori algorithm, FP-growth algorithm
     - http://en.wikipedia.org/wiki/Association_rule_learning
     - Demo: Python + Orange
     - http://www.fuchaoqun.com/2008/08/data-mining-with-python-orange-association_rule/
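
Support and confidence can be worked through on the five transactions from the Association Rules table. The sketch below uses the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). The function names are illustrative and are not taken from Orange or from the slides.

```python
# Transactions from the Association Rules slide (TID 1 to 5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Egg"},
    {"Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


beer_diaper_support = support({"Beer", "Diaper"})       # 3 of 5 transactions
beer_to_diaper = confidence({"Beer"}, {"Diaper"})       # every Beer basket has Diaper
diaper_to_beer = confidence({"Diaper"}, {"Beer"})       # 3 of the 4 Diaper baskets
```

Note that confidence is not symmetric: Beer → Diaper holds in every basket that contains Beer, while Diaper → Beer holds in only three of the four baskets that contain Diaper.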
  9. Algorithms
     - Association Rules
     - Slope one
     - SVD
     - …
  10. Slope One

      | User | That is it | Straight Through My Heart |
      |------|------------|---------------------------|
      | Jim  | 4          | 5                         |
      | Mike | 2          | 4                         |
      | Fred | 3          | ?                         |
  11. Slope One
      - By Daniel Lemire in 2005
        - http://www.daniel-lemire.com/fr/abstracts/SDM2005.html
      - Simpler Could Be Better
      - Weighted Average:
      - http://en.wikipedia.org/wiki/Slope_One
      - Implementations:
        - http://taste.sourceforge.net/ (Java)
        - http://code.google.com/p/openslopeone (PHP & MySQL)
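
The missing rating in the Slope One table can be filled in with a few lines of Python. This is a minimal sketch of the weighted Slope One scheme, not the code of either implementation listed above: for each item the user has rated, take the average rating difference between the target item and that item over users who rated both, add it to the user's own rating, and weight the result by the number of such users. The function name `predict` is illustrative.

```python
# Ratings from the Slope One slide; Fred has not rated the second song.
ratings = {
    "Jim": {"That is it": 4, "Straight Through My Heart": 5},
    "Mike": {"That is it": 2, "Straight Through My Heart": 4},
    "Fred": {"That is it": 3},
}


def predict(ratings, user, target):
    """Weighted Slope One prediction of `user`'s rating for `target`."""
    numerator = 0.0
    denominator = 0
    for item, user_rating in ratings[user].items():
        if item == target:
            continue
        # Differences (target - item) over all users who rated both items.
        diffs = [r[target] - r[item] for r in ratings.values()
                 if target in r and item in r]
        if not diffs:
            continue
        count = len(diffs)
        average_deviation = sum(diffs) / count
        numerator += (average_deviation + user_rating) * count
        denominator += count
    return numerator / denominator if denominator else None


fred_prediction = predict(ratings, "Fred", "Straight Through My Heart")
```

Jim and Mike rate the second song 1 and 2 points higher than the first, an average deviation of 1.5, so Fred's predicted rating is 3 + 1.5 = 4.5.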
  12. Algorithms
      - Association Rules
      - Slope one
      - SVD
      - …
  13. Similarity
      - Similarity:
  14. SVD (image copied from here)
  15. SVD in Image Compression: Original, K=10, K=20
  16. Process SVD
      - Define the original user-item matrix R, of size m x n, which holds the ratings of m users on n items. r_ij refers to the rating of user u_i on item i_j.
      - Preprocess the user-item matrix R to eliminate all missing data values.
      - Compute the SVD of R to obtain matrices U, S and V, of size m x m, m x n and n x n respectively. Their relationship is expressed by R = U * S * V^T.
      - Perform the dimensionality reduction step by keeping only the k largest diagonal entries of S to obtain a k x k matrix, S_k. Similarly, generate matrices U_k and V_k of size m x k and n x k (so that V_k^T is k x n). The "reduced" user-item matrix R' is obtained by R' = U_k * S_k * V_k^T, and r'_ij denotes the rating by user u_i on item i_j in this reduced matrix.
      - Compute sqrt(S_k) and then calculate two matrix products: U_k * sqrt(S_k)^T, which represents the m users, and sqrt(S_k) * V_k^T, which represents the n items, in the k-dimensional feature space. We are particularly interested in the latter matrix, of size k x n.
      - Use KNN on the user matrix and the item matrix, or multiply them to get each user's predicted rating on every item.
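
The steps of the SVD process can be sketched with numpy, which a later slide lists among the SVD options. Two things here are assumptions rather than part of the talk: missing ratings are filled with each item's mean rating (the slide only says that missing values must be eliminated), and the small rating matrix is made up for illustration. The function names `fill_missing` and `svd_features` are also illustrative.

```python
import numpy as np


def fill_missing(R):
    """Replace NaN entries with the mean rating of their item (column)."""
    R = np.array(R, dtype=float)
    col_means = np.nanmean(R, axis=0)
    rows, cols = np.where(np.isnan(R))
    R[rows, cols] = col_means[cols]
    return R


def svd_features(R, k):
    """Return the m x k user matrix and the k x n item matrix."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    sqrt_S_k = np.diag(np.sqrt(s[:k]))   # singular values come sorted, largest first
    users = U[:, :k] @ sqrt_S_k          # U_k * sqrt(S_k), size m x k
    items = sqrt_S_k @ Vt[:k, :]         # sqrt(S_k) * V_k^T, size k x n
    return users, items


# Hypothetical ratings: 4 users x 3 items, NaN marks a missing rating.
R = fill_missing([
    [5, 4, np.nan],
    [4, np.nan, 1],
    [1, 2, 5],
    [np.nan, 1, 4],
])
users, items = svd_features(R, k=2)
R_reduced = users @ items  # equals U_k * S_k * V_k^T, the reduced matrix R'
```

Because sqrt(S_k) is diagonal, its transpose is itself, so `users @ items` reproduces R'. Rows of `users` and columns of `items` can then be passed to a nearest-neighbour search.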
  17. Demo (from here)
      - Which two people have the most similar tastes?
      - Which two seasons are the closest?
  18. Demo
  19. Demo
  20. SVD
      - SVD
        - matlab
        - LAPACK, BLAS (Fortran)
        - numpy, scipy (Python)
        - SVDLIBC, Meschach (C)
        - http://en.wikipedia.org/wiki/Singular_value_decomposition
        - ……
      - KNN:
        - matlab
        - FLANN
        - ……
      - All in one solution:
        - DIVISI
        - ……
  21. MAGIC DIVISI!

          #!/usr/bin/env python
          #coding=utf-8
          import divisi
          from divisi.cnet import *

          data = divisi.SparseLabeledTensor(ndim = 2)
          # read some ratings into data
          # data[user_id, song_id] = 4
          svd_result = data.svd(k = 128)
          # get songs that the user may like
          # predict_features(svd_result, user_id).top_items(100)
          # get similar songs
          # feature_similarity(svd_result, song_id).top_items(100)
          # get users that have similar tastes
          # concept_similarity(svd_result, user_id).top_items(100)
  22. Music Recommender Systems
      - Data collection
      - Data Cleaning
      - Data Preprocessing
      - Data Mining
      - Tracking & Optimization
  23. Data collection
      - User rating
      - User collection
      - User listen log
      - User view log
      - …
  24. Data Cleaning
      - Missing data
      - Wrong data
      - Noise data
      - Duplicate data
      - …

      | UserId | SongId | Times   |
      |--------|--------|---------|
      | 3306   | 3654   | 200     |
      | 3306   | 6950   | 236     |
      | 3306   | 6528   | 268     |
      | 3306   | 5874   |         |
      | 3306   | 9527   | foo     |
      | 3306   | 5624   | 1000000 |
      | 3306   | 9635   | 5       |
      | 3306   | 6950   | 236     |
      | …      | …      | …       |
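
The four problems on the Data Cleaning slide can each be handled with a simple filter. The sketch below runs on the rows of the table above. Which row illustrates which problem is this sketch's reading of the table, and the `MAX_PLAYS` cutoff is an assumed threshold, not a value from the talk.

```python
# Raw rows from the Data Cleaning slide, kept as strings as they might
# arrive from a log file.
rows = [
    ("3306", "3654", "200"),
    ("3306", "6950", "236"),
    ("3306", "6528", "268"),
    ("3306", "5874", ""),          # missing play count
    ("3306", "9527", "foo"),       # wrong (non-numeric) play count
    ("3306", "5624", "1000000"),   # implausibly large, treated as noise here
    ("3306", "9635", "5"),
    ("3306", "6950", "236"),       # duplicate of the second row
]

MAX_PLAYS = 10000  # assumed cutoff for a plausible play count


def clean(rows, max_plays=MAX_PLAYS):
    """Drop missing, non-numeric, out-of-range and duplicate rows."""
    seen = set()
    cleaned = []
    for user_id, song_id, times in rows:
        if not times.isdigit():      # catches both missing and wrong values
            continue
        plays = int(times)
        if plays > max_plays:        # noise
            continue
        key = (user_id, song_id)
        if key in seen:              # duplicate (user, song) pair
            continue
        seen.add(key)
        cleaned.append((int(user_id), int(song_id), plays))
    return cleaned


cleaned = clean(rows)
```

Four rows survive: songs 3654, 6950, 6528 and 9635. A real pipeline would likely choose the noise cutoff from the observed distribution of play counts rather than hard-code it.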
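  25. Data Preprocessing

      | UserId | SongId | Times |
      |--------|--------|-------|
      | 3306   | 3654   | 200   |
      | 3306   | 6950   | 236   |
      | 3306   | 6528   | 268   |
      | 3306   | 5874   | 325   |
      | 3306   | 9527   | 126   |
      | 3306   | 5624   | 98    |
      | 3306   | 9635   | 115   |
      | 3306   | 6962   | 210   |
      | …      | …      | …     |

      | UserId | SongId | Weight |
      |--------|--------|--------|
      | 3306   | 3654   | 0.62   |
      | 3306   | 6950   | 0.73   |
      | 3306   | 6528   | 0.82   |
      | 3306   | 5874   | 1      |
      | 3306   | 9527   | 0.39   |
      | 3306   | 5624   | 0.30   |
      | 3306   | 9635   | 0.35   |
      | 3306   | 6962   | 0.65   |
      | …      | …      | …      |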
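
The slide does not state how play counts become weights, but the numbers in the two tables are consistent with dividing each count by the user's largest count (325 here) and rounding to two decimals. The sketch below assumes that rule; the function name `to_weights` is illustrative.

```python
# Play counts for user 3306 from the Data Preprocessing slide.
plays = {
    3654: 200, 6950: 236, 6528: 268, 5874: 325,
    9527: 126, 5624: 98, 9635: 115, 6962: 210,
}


def to_weights(play_counts):
    """Scale one user's play counts into (0, 1] by their maximum count."""
    top = max(play_counts.values())
    return {song: round(count / top, 2) for song, count in play_counts.items()}


weights = to_weights(plays)
```

Normalising per user keeps heavy listeners from dominating the similarity computation, since every user's most-played song ends up with weight 1.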
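  26. Data Mining

      | UserId | SongId | Weight |
      |--------|--------|--------|
      | 3306   | 3654   | 0.62   |
      | 3306   | 6950   | 0.73   |
      | 3306   | 6528   | 0.82   |
      | 3306   | 5874   | 1      |
      | 3306   | 9527   | 0.39   |
      | 3306   | 5624   | 0.30   |
      | 3306   | 9635   | 0.35   |
      | 3306   | 6962   | 0.65   |
      | …      | …      | …      |

      | UserId | Similar Users' Ids |
      |--------|--------------------|
      | …      | …                  |

      | SongId | Similar Songs' Ids |
      |--------|--------------------|
      | …      | …                  |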
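
One way to go from the weight table to a "similar users" table is a brute-force nearest-neighbour search. The sketch below is an assumption-heavy illustration, not the talk's pipeline: it compares raw weight vectors with cosine similarity instead of the SVD feature vectors, and users 3307 and 3308 are invented so that user 3306 has someone to be compared with.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse {song_id: weight} vectors."""
    common = set(a) & set(b)
    dot = sum(a[s] * b[s] for s in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_users(weights, user_id, n=2):
    """Return the ids of the n users most similar to `user_id`."""
    scores = [(cosine(weights[user_id], vector), other)
              for other, vector in weights.items() if other != user_id]
    scores.sort(reverse=True)
    return [other for _, other in scores[:n]]


# First rows for user 3306 from the slide; 3307 and 3308 are hypothetical.
weights = {
    3306: {3654: 0.62, 6950: 0.73, 6528: 0.82},
    3307: {3654: 0.60, 6950: 0.70, 6528: 0.90},
    3308: {9527: 1.0, 3654: 0.10},
}

neighbours = similar_users(weights, 3306)
```

The same function yields a "similar songs" table if the input is keyed by song id with {user_id: weight} vectors. For large catalogues an approximate search library such as FLANN, listed on the SVD slide, would replace the brute-force loop.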
  27. Tracking & Optimization
      - Recommended results
      - Users view and click what they like
      - Store users' clicks
      - Data Mining
      - Better recommendations
  28. That's it, thanks. Q&A