Comparing social tags to microblogs

Presentation at MSM2011

Published in: Technology, Business

  1. Comparing social tags to microblogs
     Victoria Lai, Christopher Rajashekar, William Rand
     Modeling Social Media 2011, October 9, 2011
  2. Social Tags and Social Media
     • Brand manager: what are people saying about a product online?
     • Goal: see if tags about an album reflect Twitter conversations
     • Amazon tags
       • Where purchases take place
       • Easier to collect than tweets
  3. Similarity framework: S(fa(ta), fw(tw)) > θ
     • ta: album tags (all tags or top ten tags)
     • tw: tweet keywords
     • fa: importance measure for tags (tag weights)
     • fw: importance measure for keywords (frequency or tf-idf)
     • Each side yields a ranked list: phrase 1 #, phrase 2 #, phrase 3 #, …
     • S: Spearman, Kendall tau, precision, recall
     • Test: S > θ?
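A minimal sketch of the test S(fa(ta), fw(tw)) > θ for the two rank-correlation choices of S. The slides give no implementation, so the function names, the restriction to phrases present on both sides, and the sample numbers are all assumptions made for illustration.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 (largest) downward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    pairs = list(combinations(range(len(x)), 2))
    score = 0
    for i, j in pairs:
        product = (x[i] - x[j]) * (y[i] - y[j])
        score += (product > 0) - (product < 0)
    return score / len(pairs)


def similarity(tag_weights, keyword_scores, measure, theta):
    """Compare fa(ta) and fw(tw) on shared phrases (an assumption) against theta."""
    shared = sorted(set(tag_weights) & set(keyword_scores))
    x = [tag_weights[p] for p in shared]
    y = [keyword_scores[p] for p in shared]
    s = measure(x, y)
    return s, s > theta


# Hypothetical tag weights and tweet keyword scores, not data from the study.
tag_weights = {"rock": 40, "indie": 25, "live": 10, "guitar": 5}
keyword_scores = {"rock": 0.9, "indie": 0.4, "live": 0.5, "guitar": 0.1, "tour": 0.3}

rho, rho_passes = similarity(tag_weights, keyword_scores, spearman, theta=0.5)
tau, tau_passes = similarity(tag_weights, keyword_scores, kendall_tau, theta=0.5)
```

Here θ is passed in as a plain number; in the slides it comes from a control baseline rather than a fixed constant.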
  4. Baselines (θ)
     • General control
       • I, the, and, a, of
       • Used in tf-idf
     • Music control
       • music
       • Used as threshold
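The slides do not say exactly how the general control enters tf-idf. One plausible reading is that tweets matching very common words serve as a background collection, so that words frequent everywhere are down-weighted. The sketch below follows that reading; the function name, the whitespace tokenizer, the smoothed idf formula, and the sample tweets are assumptions.

```python
import math
from collections import Counter


def tf_idf(album_tweets, background_docs):
    """Score each word in the album tweets by term frequency times smoothed idf.

    Term frequency is counted over all album tweets pooled together;
    document frequency is counted over the background collection.
    """
    tokens = [w for tweet in album_tweets for w in tweet.lower().split()]
    tf = Counter(tokens)
    doc_sets = [set(doc.lower().split()) for doc in background_docs]
    n_docs = len(doc_sets)
    scores = {}
    for word, count in tf.items():
        df = sum(word in doc for doc in doc_sets)
        idf = math.log((1 + n_docs) / (1 + df))  # add-one smoothing avoids division by zero
        scores[word] = count * idf
    return scores


# Hypothetical tweets, not data from the study.
album_tweets = ["the new album is the best", "love the guitar on the album"]
background = ["the cat and the dog", "i saw the movie", "a day of rain"]

scores = tf_idf(album_tweets, background)
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these inputs "the" occurs four times in the album tweets but also in two of the three background tweets, so it scores below "album", which occurs twice and never in the background.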
  5. Relevant Work
     • Heymann, Ramage, and Garcia-Molina (2008): IR measures
     • Eck, Lamere, Bertin-Mahieux, and Green (2007): correlation measures
     • Wagner and Strohmaier (2010): tweet stream properties
     • Inouye and Kalita (2011): automatic tweet summarization
     • Wu, Zhang, and Ostendorf (2010): tf-idf on user tweets
  6. Correlations
     • C1, threshold (music control): ta = all tags, fw = freq, tw = music
     • C2, base case: ta = all tags, fw = freq
     • C3, best case: ta = top tags, fw = tf-idf

     Album  C1 Spearman  C1 Kendall  C2 Spearman  C2 Kendall  C3 Spearman  C3 Kendall
     D1     0.44         0.38        0.29         0.25        0.69         0.43
     D2     0.29         0.24        0.38         0.37        0.78         0.70
     D3     0.24         0.20        0.38         0.33        0.33         0.31
     D4     0.30         0.26        0.40         0.35        0.60         0.51
     J1     0.64         0.55        0.31         0.28        0.31         0.28
     J5     0.20         0.18        0.23         0.18        0.63         0.44
     J6     0.47         0.37        0.28         0.19        0.63         0.45
     F2     0.24         0.20        0.43         0.36        0.30         0.28

     Shaded: strongest correlation listed
     C3 bolded: better than base case
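As a small worked check on the table, the Spearman columns can be compared directly. The values below are copied from the slide; the strict "greater than" comparison and the variable names are assumptions, since the slide does not say how ties are treated.

```python
# Spearman correlations from the table: (C1 threshold, C2 base case, C3 best case).
spearman_by_album = {
    "D1": (0.44, 0.29, 0.69),
    "D2": (0.29, 0.38, 0.78),
    "D3": (0.24, 0.38, 0.33),
    "D4": (0.30, 0.40, 0.60),
    "J1": (0.64, 0.31, 0.31),
    "J5": (0.20, 0.23, 0.63),
    "J6": (0.47, 0.28, 0.63),
    "F2": (0.24, 0.43, 0.30),
}

# Albums where the best case (C3) exceeds the music-control threshold (C1).
above_threshold = [a for a, (c1, c2, c3) in spearman_by_album.items() if c3 > c1]

# Albums where the best case (C3) improves on the base case (C2).
above_base = [a for a, (c1, c2, c3) in spearman_by_album.items() if c3 > c2]

print(f"C3 > C1 for {len(above_threshold)} of {len(spearman_by_album)} albums: {above_threshold}")
print(f"C3 > C2 for {len(above_base)} of {len(spearman_by_album)} albums: {above_base}")
```

Under strict comparison, C3 clears the threshold for every album except J1, and beats the base case for five of the eight albums (J1 ties at 0.31).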
  7. Information Retrieval

     Album       Precision (P1)  Precision threshold (P2)  Recall
     D1          0.48            0.43                      0.002
     D2          0.24            0.62                      0.008
     D3          0.29            0.36                      0.001
     D4          0.36            0.36                      0.0004
     J1          0.20            0.50                      0.0003
     J3          0.00            0.75                      0.00
     J5          0.57            0.40                      0.0002
     J6          0.75            0.38                      0.0004
     F1          0.00            0.50                      0.00
     F2          0.67            0.59                      0.00009
     Average     0.35            0.49                      0.001
     HV average  0.51            0.45                      0.0003
     LV average  0.20            0.53                      0.002
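The slides do not define precision and recall for this setting. The sketch below assumes one reading that fits the very small recall values: album tags are treated as the retrieved set and tweet keywords as the relevant set, so recall shrinks as the keyword vocabulary grows. The function name and sample phrases are hypothetical.

```python
def precision_recall(album_tags, tweet_keywords):
    """Set-based precision and recall under the assumed reading.

    precision = matched phrases / number of album tags
    recall    = matched phrases / number of tweet keywords
    """
    tags = set(album_tags)
    keywords = set(tweet_keywords)
    matched = tags & keywords
    precision = len(matched) / len(tags) if tags else 0.0
    recall = len(matched) / len(keywords) if keywords else 0.0
    return precision, recall


# Hypothetical phrases, not data from the study.
album_tags = ["rock", "indie", "live", "guitar"]
tweet_keywords = ["rock", "indie", "guitar", "tour", "tickets", "new", "single", "video"]

precision, recall = precision_recall(album_tags, tweet_keywords)
```

In this toy case three of the four tags appear among the eight keywords, giving precision 0.75 and recall 0.375; with thousands of distinct tweet keywords and only a handful of tags, recall falls toward zero.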
  8. Conclusions
     • Good proxy for top content when sufficient Twitter activity
     • More relevant tags are higher in tweet keyword rankings
     • TF-IDF is effective
     Next Steps
     • Larger dataset
     • Analysis over time
     • Other sources like LastFM
     • Linguistic analysis (clustering, stemming)
     • Other user-generated data (e.g. user reviews)
  9. Questions?
