A Unified Music Recommender System Using Listening Habits and Semantics of Tags

Data Science Society Monthly Meeting (April)
- Hyon Hee Kim, Department of Statistics and Information Science, Dongduk Women's University


A Unified Music Recommender System Using Listening Habits and Semantics of Tags

  1. A Unified Music Recommender System Using Users’ Listening Habits and Semantics of Tags
     Hyon Hee Kim
     Department of Statistics and Information Science, Dongduk Women’s University
  2. Outline
     • Motivation & Objectives
     • Overview of the System
     • Generation of User Profiles
     • A Unified Music Recommendation
     • Performance Evaluation
     • Related Work
     • Conclusions and Future Work
  3. Motivation (1/3)
     • In a social music site
       – Music recommendation is essential.
       – Music recommendation differs from other product recommendation.
         • Explicit information: rating system
         • Implicit information: the number of plays
     • Listening habits-based user profiling
       – Cold start problem
         • New users with little information
         • New items with only a few ratings
       – Data sparsity problem
         • The available data is very small compared to the number of music items
  4. Motivation (2/3)
     [Figure: example tags: classic rock, british, pop, rock]
     • Collaborative tagging
       – A tool for users to express their preferences about web resources
       – Users add freely chosen keywords to web resources
       – Tag data can be used for user profiling in personalized recommender systems
     • Tag-based user profiling
       – Tags are easily added, even without listening to the music
       – Tags are semantically meaningful
  5. Motivation (3/3)
     • In the case of last.fm
     • Factual tags
       – 85% of tags
       – genre, region, instrumentation
     • Emotional tags
       – 10% of tags
       – opinion, sentiment, mood
     • Personal tags
       – 5% of tags
       – to organize, to browse, etc.
  6. Objectives
     • A novel approach to music recommendation
       – Combining listening habits and semantics of tags
     • Using a tag ontology and an emotion ontology
       – UniTag: resolving semantic ambiguity of tags
       – UniEmotion: assigning weighted values to the emotional tags
     → Semantically enhanced music recommendation
  7. Outline
     • Motivation & Objectives
     • Overview of the System
     • Generation of User Profiles
     • A Unified Music Recommendation
     • Performance Evaluation
     • Related Work
     • Conclusions and Future Work
  8. Overview of the System
  9. Outline
     • Motivation & Objectives
     • Overview of the System
     • Tag-based User Profiling
       – Preprocessing of tags
       – Algorithms for generating user profiles
       – Preliminary experimental results
     • A Unified Music Recommendation
     • Performance Evaluation
     • Related Work
     • Conclusions and Future Work
  10. Preprocessing of Tags (1/3)
      • Tags have no predefined vocabulary and no term hierarchy
      • Problems of tag data
        – Synonymy
          • Different words represent the same meaning
          • E.g., hiphop, hip-hop, hip hop / R & B, Rhythm and Blues, Blues
        – Polysemy
          • A single word carries multiple meanings
          • E.g., French => French rock, French pop, French artist
        – Spelling variants
          • Misspellings
          • Foreign languages
  11. Preprocessing of Tags (2/3)
      • Tag ontology
        – Tags, users, items
      • UniTag ontology
        – uniTag:Users
          • uniTag:userID, uniTag:hasAdded, uniTag:hasAddedTo
        – uniTag:Items
          • uniTag:itemID
        – uniTag:Tags
          • uniTag:tagID, uniTag:tagName, uniTag:RTag, uniTag:subTag
          • uniTag:Rtags {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
          • uniTag:classifiedAs, uniTag:isKindOf, uniTag:istheSameAs, uniTag:tagVariation
  12. Preprocessing of Tags (3/3)
      • Rules for reasoning about prefixes
        – French rock, progressive rock, post rock => rock
          Tag(?t) ^ tagPrefix(?t, ?p) ^ Prefix(?p) ^ subTag(?t, ?s) ^ Rtags(?s) -> classifiedAs(?t, ?s)
      • Rules for reasoning with expert knowledge
        – Soul => rhythm and blues, rhythm and blues => blues, then Soul => blues
          Tag(?t) ^ isKindOf(?t, ?A) ^ isKindOf(?A, ?B) -> isKindOf(?t, ?B)
      • Rules for reasoning about synonyms
        – Hip-hop, hiphop => hip hop
          Tag(?t) ^ tagVariation(?t, ?R) ^ istheSameAs(?t, ?s) -> tagVariation(?s, ?R)
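The three rule families above (synonym, expert knowledge, prefix) can be approximated without an ontology reasoner. The following is a minimal sketch, not the UniTag implementation: the dictionaries `SAME_AS` and `IS_KIND_OF` and the function name `find_rtag` are hypothetical stand-ins for ontology assertions, and the prefix rule is simplified to "the last word is a representative tag".

```python
import re

# The ten representative tags listed on the UniTag slide.
RTAGS = {"rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"}

# Hypothetical hand-written facts standing in for ontology assertions.
SAME_AS = {"hip-hop": "hiphop", "hip hop": "hiphop"}
IS_KIND_OF = {"soul": "rhythm and blues", "rhythm and blues": "blues"}


def find_rtag(tag):
    """Map a raw tag to a representative tag, or return None if no rule applies."""
    # Normalize case and whitespace before applying any rule.
    t = re.sub(r"\s+", " ", tag.strip().lower())
    # Synonym rule: spelling variants collapse to one canonical tag.
    t = SAME_AS.get(t, t)
    # Expert-knowledge rule: follow isKindOf links transitively (cycle-safe).
    seen = set()
    while t in IS_KIND_OF and t not in seen:
        seen.add(t)
        t = IS_KIND_OF[t]
    if t in RTAGS:
        return t
    # Prefix rule (simplified): "french rock" -> "rock".
    last = t.split(" ")[-1]
    return last if last in RTAGS else None
```

For example, `find_rtag("Hip-Hop")`, `find_rtag("Soul")` and `find_rtag("French rock")` resolve to `hiphop`, `blues` and `rock` respectively, while an unrelated tag yields `None`.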
  13. Algorithm for Generating User Profiles (1/2)
      Algorithm 1. Generation of a Tag-based Profile
      Input: set of representative tags Tr, set of a user’s tags Tu
      Output: set of frequencies FTr, one for each representative tag of the user
        var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
        var tagFrequency[] = { }, tempFrequency[] = { }
        var RTag = null
        while ∃ next tag t in Tu do
          RTag = FindRTag(t)
          if RTag == RTags[i] then
            { tempFrequency[i] = tempFrequency[i] + 1
              tagFrequency[i] = tempFrequency[i] }
          else
            tagFrequency[i] = tempFrequency[i]
        endwhile

      Table 1. An example of tag-based profiles
              rock  hiphop  electronic  metal  jazz  rap  funk  folk  blues  reggae
      user1      6       2           2      3     2    4     3     1      1       1
      user2      5       0           0      0     0    0     0     0      1       0
      user3      2       2           1      1     1    1     2     0      0       1
      user4     10       1           0      1     2    0     2     3      3       1
      user5      1       4           0      0     0    4     1     0      0       0
  14. Algorithm for Generating User Profiles (2/2)
      Algorithm 2. Generation of a Track-based Profile
      Input: set of tracks of a user TRu, set of representative tags Tr
      Output: set Tn of the number of the user’s tracks for each representative musical genre
        var RTags[] = {rock, hiphop, electronic, metal, jazz, rap, funk, folk, blues, reggae}
        var numTrack[] = { }, tempnumTrack[] = { }
        var RTrack = null
        while ∃ next track t in TRu do
          RTrack = FindGenre(t)
          if RTrack == RTags[i] then
            { tempnumTrack[i] = tempnumTrack[i] + 1
              numTrack[i] = tempnumTrack[i] }
          else
            numTrack[i] = tempnumTrack[i]
        endwhile

      Table 2. An example of track-based profiles
              rock  hiphop  electronic  metal  jazz  rap  funk  folk  blues  reggae
      user1     65     176           5      4     0  168     0     3      0       0
      user2    411       8          11    109     3    5     8     1      0       0
      user3    157       7          11     10     6    2     1    39      4       2
      user4    257      20           9     18     2    5     0     9      0       0
      user5    110     277          15      8     6   85    10     3      2       7
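Algorithm 1 and Algorithm 2 share one idea: map each tag (or track) to a representative tag and count the matches, giving one row of Table 1 or Table 2. A minimal Python sketch of that counting step follows. It assumes the mapping (the slides' `FindRTag` / `FindGenre`) has already been applied, and the function name `build_profile` is hypothetical.

```python
from collections import Counter

# Representative tags, in the column order used by Table 1 and Table 2.
RTAGS = ["rock", "hiphop", "electronic", "metal", "jazz",
         "rap", "funk", "folk", "blues", "reggae"]


def build_profile(labels, rtags=RTAGS):
    """Count occurrences of each representative tag among mapped labels.

    `labels` holds the representative tag found for each of a user's tags
    (or tracks); entries that matched no representative tag are ignored.
    """
    counts = Counter(label for label in labels if label in rtags)
    # One count per representative tag, zero when the user never used it.
    return [counts[r] for r in rtags]
```

Calling `build_profile(["rock", "rock", "jazz", None, "rap"])` gives a ten-element vector with 2 for rock and 1 each for jazz and rap.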
  15. Preliminary Experimental Results (1/3)
      • 1,000-user data set from Last.fm
        – Users, tags, music items
      • Standardization
        – To remove extensive preference
      • K-Means clustering algorithm
        – Canopy clustering
        – 6 centroid points and 6 clusters
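The slides do not say how the standardization was computed. One common reading is a per-column z-score, so that a genre with very large raw counts does not dominate the clustering. The sketch below assumes that reading and uses the population standard deviation. The function name `standardize_columns` is hypothetical, and the input rows are the tag-based profiles of Table 1.

```python
from statistics import mean, pstdev

# Tag-based profiles of user1..user5 from Table 1.
TABLE1 = [
    [6, 2, 2, 3, 2, 4, 3, 1, 1, 1],
    [5, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [2, 2, 1, 1, 1, 1, 2, 0, 0, 1],
    [10, 1, 0, 1, 2, 0, 2, 3, 3, 1],
    [1, 4, 0, 0, 0, 4, 1, 0, 0, 0],
]


def standardize_columns(rows):
    """Return z-scores computed column by column (assumed interpretation)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) for c in cols]
    # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, mus, sigmas)]
            for row in rows]


z_profiles = standardize_columns(TABLE1)
```

After this step every column has mean zero, which is consistent with the negative centroid coordinates reported for the clusters.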
  16. Preliminary Experimental Results (2/3)
      Table 3. Values of Centers of Tag-based Profiles
                    X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
      Cluster1   0.241   1.472   0.626   0.130   1.267   1.621   2.168   0.274   1.078   0.381
      Cluster2   2.171   0.032   0.517   3.052   0.011  -0.030   0.328   1.533   1.245   0.162
      Cluster3  -0.206  -0.273  -0.517  -0.178  -0.180  -0.294  -0.233  -0.171  -0.204  -0.136
      Cluster4  -0.341   0.660  -0.459  -0.284  -0.208   1.178  -0.179  -0.321  -0.166   0.273
      Cluster5  -0.074  -0.155   1.320  -0.230  -0.115  -0.261  -0.209  -0.070  -0.172  -0.071
      Cluster6   2.815   7.640   5.168  -0.136   9.254   6.135   7.000   4.286   4.421   5.254

      Table 4. Values of Centers of Track-based Profiles
                    X1      X2      X3      X4      X5      X6      X7      X8      X9     X10
      Cluster1  -0.411   0.495   0.406  -0.338   1.565   0.131   1.632  -0.135   0.147   0.812
      Cluster2   0.200  -0.444   0.007  -0.341   0.907  -0.468  -0.288   2.617   1.097   0.020
      Cluster3  -0.897   1.651  -0.539  -0.442  -0.213   1.836   0.059  -0.507  -0.415   0.034
      Cluster4   1.925  -0.590  -0.404   0.852  -0.264  -0.491   0.655  -0.002   2.850  -0.108
      Cluster5   0.914  -0.557  -0.216   0.794  -0.296  -0.511  -0.297   0.014  -0.157  -0.147
      Cluster6  -0.472  -0.327   0.380  -0.373  -0.184  -0.371  -0.241  -0.205  -0.300  -0.093

      • Clustering validity
        – Inter-cluster distances
        – Distances between all pairs of centroids using the cosine distance measure
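With six centroids there are 6 × 5 / 2 = 15 centroid pairs, which matches the N = 15 reported in the t-test that follows. The sketch below computes those pairwise distances for the tag-based centroids. It assumes cosine distance is defined as 1 minus cosine similarity, which the slides do not state. The function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Centroids of the six tag-based clusters (Table 3).
TAG_CENTROIDS = [
    [0.241, 1.472, 0.626, 0.130, 1.267, 1.621, 2.168, 0.274, 1.078, 0.381],
    [2.171, 0.032, 0.517, 3.052, 0.011, -0.030, 0.328, 1.533, 1.245, 0.162],
    [-0.206, -0.273, -0.517, -0.178, -0.180, -0.294, -0.233, -0.171, -0.204, -0.136],
    [-0.341, 0.660, -0.459, -0.284, -0.208, 1.178, -0.179, -0.321, -0.166, 0.273],
    [-0.074, -0.155, 1.320, -0.230, -0.115, -0.261, -0.209, -0.070, -0.172, -0.071],
    [2.815, 7.640, 5.168, -0.136, 9.254, 6.135, 7.000, 4.286, 4.421, 5.254],
]


def cosine_distance(a, b):
    """1 - cosine similarity (assumed definition of the distance measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def inter_cluster_distances(centroids):
    """Distances between all unordered pairs of centroids."""
    return [cosine_distance(a, b) for a, b in combinations(centroids, 2)]


distances = inter_cluster_distances(TAG_CENTROIDS)
mean_distance = sum(distances) / len(distances)
```

A larger mean inter-cluster distance indicates better separated clusters, which is the quantity compared between tag-based and track-based profiles.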
  17. Preliminary Experimental Results (3/3)
      – T-test
        • Mean of inter-cluster distances of tag-based profiles
        • Mean of inter-cluster distances of track-based profiles

      Table 5. T-test result for the means of inter-cluster distances
                               N    Mean    Std Dev     t     p-value
      Tag-based profiles      15  0.8325     0.6834  2.55      0.0165
      Track-based profiles    15  0.3785     0.0885
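The t value in Table 5 can be rechecked from the summary statistics alone. The sketch below uses the pooled-variance two-sample formula. Whether the original analysis used the pooled or the Welch variant is not stated, but with equal group sizes both give the same t statistic. The function name `two_sample_t` is hypothetical.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# Summary statistics from Table 5.
t, df = two_sample_t(0.8325, 0.6834, 15, 0.3785, 0.0885, 15)
print(round(t, 2), df)  # prints: 2.55 28
```

The recomputed statistic agrees with the reported t = 2.55, so the tag-based profiles have a significantly larger mean inter-cluster distance at the 5% level.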
  18. Outline
      • Motivation & Objectives
      • Overview of the System
      • Generation of User Profiles
      • A Unified Music Recommendation
        – UniEmotion Ontology
        – Generation of User Profiles
        – Music Recommendation Algorithm
      • Performance Evaluation
      • Related Work
      • Conclusions and Future Work
  19. UniEmotion Ontology (1/5)
      [Figure: Plutchik’s model]
  20. UniEmotion Ontology (2/5)
      • Definition of the intensity of emotional tags
      • SentiWordNet, http://sentiwordnet.isti.cnr.it/
      [Figure: example SentiWordNet scores (P: positive, O: objective, N: negative)]
        – P: 0.625, O: 0.25, N: 0.125
        – P: 0.375, O: 0.625, N: 0
        – P: 1.0, O: 0, N: 0
  21. UniEmotion Ontology (3/5)
      • Intensity of emotional tags
        – Strong
          • Positive value >= 0.75 or Negative value >= 0.75
        – Middle
          • 0.25 <= Positive value <= 0.75 or
          • 0.25 <= Negative value <= 0.75
        – Weak
          • Positive value < 0.25 and Negative value < 0.25
  22. UniEmotion Ontology (4/5)
      • Assigning the weights to the tags
        – Factual tags: 1
        – Positive tags
          • Strong: 2.5
          • Middle: 2
          • Weak: 1.5
        – Negative tags
          • Strong: -2.5
          • Middle: -2
          • Weak: -1.5
      • Final score of an item => sum of the weights
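The intensity thresholds and the weight table combine into a small scoring routine. The sketch below makes two assumptions that the slides leave open: a value of exactly 0.75 counts as strong (the strong and middle ranges overlap there), and the polarity of a tag (factual, positive or negative) is supplied by the caller rather than derived from the scores. All function names are hypothetical.

```python
# Absolute weights per intensity level, as listed on the slide.
WEIGHTS = {"strong": 2.5, "middle": 2.0, "weak": 1.5}


def intensity(pos, neg):
    """Classify an emotional tag from its SentiWordNet positive/negative values."""
    top = max(pos, neg)
    if top >= 0.75:      # assumption: 0.75 itself is treated as strong
        return "strong"
    if top >= 0.25:
        return "middle"
    return "weak"        # both values are below 0.25


def tag_weight(kind, pos=0.0, neg=0.0):
    """Weight of one tag; `kind` is 'factual', 'positive' or 'negative'."""
    if kind == "factual":
        return 1.0
    if kind == "positive":
        return WEIGHTS[intensity(pos, neg)]
    if kind == "negative":
        return -WEIGHTS[intensity(pos, neg)]
    raise ValueError(f"unknown tag kind: {kind!r}")


def item_score(tags):
    """Final score of an item: the sum of the weights of its tags."""
    return sum(tag_weight(kind, pos, neg) for kind, pos, neg in tags)
```

For instance, an item carrying one factual tag, one strong positive tag and one middle negative tag scores 1 + 2.5 − 2 = 1.5.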
  23. UniEmotion Ontology (5/5)
      • Two classes
        – UniEmotion:Positive
          • Emotional tags belonging to the positive emotional categories
          • trust, surprise, anticipation, and happiness
        – UniEmotion:Negative
          • Emotional tags belonging to the negative emotional categories
          • disgust, anger, fear, and sadness
      • Two properties
        – UniEmotion:Intensity
          • Specifying the intensity of tags
        – UniEmotion:Weight
          • Specifying the weight of tags
  24. Generation of User Profiles (1/2)
      1. Listening habits-based user profiles
         – U1 = {u1, u2, …, um}, I1 = {i1, i2, …, in}
         – <u, i, n>
           • n: number of plays
      2. Tag score-based user profiles
         – U2 = {u1, u2, …, um}, I2 = {i1, i2, …, in}
         – <u, i, s>
           • s: score of tags assigned by the UniEmotion ontology
      3. Hybrid user profiles
         – U3 = {u1, u2, …, um}, I3 = I1 ∩ I2
         – <u, i, m>
           • m = α * n + (1 − α) * s; α = 0.5
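The hybrid profile keeps only items present in both profiles (I3 = I1 ∩ I2) and blends play counts with tag scores. A minimal sketch for a single user follows. It assumes each profile is a dictionary from item to value, and the function name `hybrid_profile` is hypothetical. The slides do not say whether play counts and tag scores are rescaled before blending, so no rescaling is done here.

```python
def hybrid_profile(plays, tag_scores, alpha=0.5):
    """Blend one user's play counts and tag scores on the shared items.

    plays:      {item: number of plays n}
    tag_scores: {item: tag score s from the emotion weights}
    Returns {item: alpha * n + (1 - alpha) * s} for items in both inputs.
    """
    common = plays.keys() & tag_scores.keys()
    return {item: alpha * plays[item] + (1 - alpha) * tag_scores[item]
            for item in common}
```

With `plays = {"a": 10, "b": 4}` and `tag_scores = {"b": 2.0, "c": 5.0}`, only item `b` survives the intersection, with value 0.5 × 4 + 0.5 × 2.0 = 3.0.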
  25. Generation of User Profiles (2/2)
      [Figure panels: 1. Listening habits-based user profiles; 2. Tag score-based user profiles; 3. Hybrid user profiles]
  26. Music Recommendation Algorithm (1/2)
      • Finding similar users
        – Pearson correlation similarity
      • Calculating scores of items
        – Considering the similar users’ ratings
      • Recommending the top n items
  27. Music Recommendation Algorithm (2/2)
      Input: a set of user profiles UP
      Output: a set of recommended items RI
      1. For all yi ∈ U
           Compute a similarity s between x and yi.
      2. Sort by similarity
      3. Select top n neighbors
      4. […]
      5. For all […]
           Compute a similarity t between x and […]
           For all […]
             preference += t * pref
      6. Rank by preference
      7. Select top n items
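The recommendation steps (find similar users with Pearson correlation, accumulate similarity-weighted preferences, rank) can be sketched in plain Python. The system itself uses Apache Mahout, so this is only an illustration of the technique: the function names, the rule of skipping neighbors with non-positive similarity, and the restriction to items the target user has not yet listened to are all assumptions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the items two profiles have in common."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def recommend(target, profiles, n_neighbors=2, n_items=3):
    """Top-n items for `target`, scored by similarity-weighted neighbor preferences."""
    x = profiles[target]
    sims = [(pearson(x, y), u) for u, y in profiles.items() if u != target]
    neighbors = sorted(sims, reverse=True)[:n_neighbors]
    preference = {}
    for t, u in neighbors:
        if t <= 0:  # assumption: ignore dissimilar users
            continue
        for item, pref in profiles[u].items():
            if item not in x:  # assumption: only recommend unseen items
                preference[item] = preference.get(item, 0.0) + t * pref
    # Highest preference first; ties broken by item name for determinism.
    ranked = sorted(preference, key=lambda i: (-preference[i], i))
    return ranked[:n_items]


profiles = {
    "x": {"a": 5, "b": 3, "c": 1},
    "y": {"a": 4, "b": 3, "c": 1, "d": 5},  # tastes similar to x
    "z": {"a": 1, "b": 3, "c": 5, "e": 4},  # tastes opposite to x
}
print(recommend("x", profiles))  # prints: ['d']
```

The same routine works unchanged on listening habits-based, tag score-based or hybrid profiles, since each is just a mapping from item to a numeric preference.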
  28. Outline
      • Motivation & Objectives
      • Overview of the System
      • Generation of User Profiles
      • A Unified Music Recommendation
      • Performance Evaluation
      • Related Work
      • Conclusions and Future Work
  29. Performance Evaluation
      • Implementation environment: Apache Web Server
        – User database: MySQL 5.0
        – Listening habits collector, tag score generator: PHP
        – Recommendation engine: Apache Mahout
        – UniTag and UniEmotion ontology: JDK 6.0
      • Experimental data
        – Information on 1,000 users from last.fm [http://mir.dcs.gla.ac.uk/]
        – Containing 18,700 artists and 12,600 tags
        – 70% training data, 30% test data
  30. Performance Evaluation
      • Evaluation model
        – Recommended items
          • Items which users are interested in (True Positive, TP)
          • Items which users are not interested in (False Positive, FP)
        – Items which are not recommended
          • Items which users are interested in (False Negative, FN)
          • Items which users are not interested in (True Negative, TN)
        – Precision P = TP / (TP + FP)
          • # of correct recommendations / # of all recommended items
        – Recall R = TP / (TP + FN)
          • # of correct recommendations / # of preferred items
        – F-measure F = 2 * P * R / (P + R)
          • Harmonic mean of precision and recall
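The three measures above follow directly from the sets of recommended and preferred items. A minimal sketch, with a hypothetical function name and the convention (an assumption) that a measure is 0.0 when its denominator is zero:

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall and F-measure for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)   # recommended and of interest
    fp = len(recommended - relevant)   # recommended but not of interest
    fn = len(relevant - recommended)   # of interest but not recommended
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Recommending `["a", "b", "c", "d"]` to a user whose preferred items are `["a", "b", "e"]` gives TP = 2, FP = 2, FN = 1, so P = 1/2, R = 2/3 and F = 4/7.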
  31. Experimental Results (1/3)
      • Precision
      [Figure panels: by number of similar users; by number of recommended items]
      A: Listening habits-based approach
      B: Tag-based approach
      C: Hybrid approach
  32. Experimental Results (2/3)
      • Recall
      [Figure panels: by number of similar users; by number of recommended items]
      A: Listening habits-based approach
      B: Tag-based approach
      C: Hybrid approach
  33. Experimental Results (3/3)
      • F-measure
      [Figure panels: by number of similar users; by number of recommended items]
      A: Listening habits-based approach
      B: Tag-based approach
      C: Hybrid approach
  34. Statistical Validation
      • One-way ANOVA on three groups
        – Method 1: listening habits-based approach
        – Method 2: tag-based approach
        – Method 3: hybrid approach
      • Tukey multiple comparison test
        – Asymmetric distributions
          • Log transformation
        – Groups that differ significantly are marked with different letters
  35. Table 1. Test for precision (***: p < 0.001)
      Method                          1               2               3          F
      Mean of log(prec)        -3.962 B        -4.036 B        -2.879 A   34.27***
      Mean precision (SD)  0.020 (0.006)   0.020 (0.009)   0.068 (0.040)
      N                              24              24              24

      Table 2. Test for recall (***: p < 0.001)
      Method                          1               2               3          F
      Mean of log(recall)      -3.285 B        -4.099 C        -2.635 A   26.80***
      Mean recall (SD)     0.044 (0.023)   0.019 (0.010)   0.093 (0.056)
      N                              24              24              24

      Table 3. Test for F-measure (***: p < 0.001)
      Method                          1               2               3          F
      Mean of log(F-measure)   -3.748 B        -4.117 C        -2.894 A   41.31***
      Mean F-measure (SD)  0.024 (0.006)   0.018 (0.008)    0.06 (0.034)
      N                              24              24              24
  36. Related Work
      • MusicBox
        – A personalized music recommender system based on social tags
        – 3-order tensor model
        – The method improves the recommendation quality
      • Foafing the music
        – Collecting music information in a semantic web environment
        – User information, music information, concert information
        – Recommendation of similar music items
      • OntoEmotions
        – An ontology of emotional categories covering the basic emotions
        – Armeteo art portal
        – New relations can be inferred by reasoning on the ontology of emotions
  37. Conclusions
      • Solution to the cold start problem
        – It takes time to collect users’ listening habits.
        – Adding tags is easy.
        – Tags work like word-of-mouth.
      • Performance enhancement
        – Precision, recall, F-measure
        – Hybrid approach > listening habits-based approach, tag-based approach
  38. Future Work
      • Elaborating the UniEmotion ontology
        – Emerging Internet slang
      • Item selection
        – Product network analysis considering tags
        – Analyzing short descriptions
