Comparing social tags to microblogs

Victoria Lai, Christopher Rajashekar, William Rand
Modeling Social Media 2011
October 9, 2011

Social Tags and Social Media
 Brand manager: what are people saying about a product online?
 Goal: see if tags about an album reflect Twitter conversations
 Amazon tags
 Where purchases take place
 Easier to collect than tweets

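Similarity framework S(fa(ta), fw(tw)) > θ
[Figure: album tags (ta) and album tweet keywords (tw) are each turned into a list of phrases with scores by an importance measure (fa, fw); a similarity measure S compares the two lists and the result is tested against θ.]
 ta (album tags): all tags, or top ten tags
 tw: album tweet keywords
 fa (importance measure for tags): tag weights
 fw (importance measure for tweets): frequency, or tf-idf
 S: Spearman, Kendall tau, Precision, Recall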
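The test S(fa(ta), fw(tw)) > θ with a rank correlation as S can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy weights, the choice to rank only phrases present in both lists, and the use of Kendall's tau-a (no tie correction) are all assumptions.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant list counts as uncorrelated
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    score, pairs = 0, 0
    for i, j in combinations(range(len(x)), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        score += (prod > 0) - (prod < 0)
        pairs += 1
    return score / pairs if pairs else 0.0


def similar(tag_weights, tweet_scores, measure, theta):
    """Evaluate S(fa(ta), fw(tw)) > theta over phrases found in both inputs."""
    shared = sorted(set(tag_weights) & set(tweet_scores))
    a = [tag_weights[p] for p in shared]
    w = [tweet_scores[p] for p in shared]
    return measure(a, w) > theta


# Hypothetical toy data, not taken from the study.
tags = {"rock": 10, "indie": 7, "live": 3, "cd": 1}      # fa(ta): tag weights
tweets = {"rock": 40, "indie": 15, "live": 20, "cd": 2}  # fw(tw): frequencies
print(similar(tags, tweets, spearman, theta=0.5))
```

In this sketch θ is passed in as a plain number; on the slides it comes from a control run rather than a fixed constant.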

Baselines (θ)
 General control
 I, the, and, a, of
 Used in tf-idf
 Music control
 music
 Used as threshold
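One way the general control could feed into tf-idf is as a background collection: words that are common in control tweets get a low inverse document frequency, so album-specific words rise in the ranking. The sketch below assumes exactly that setup; the whitespace tokenizer, the smoothed idf formula, and the sample tweets are illustrative assumptions, not details from the slides.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming.
    return text.lower().split()


def tf_idf(album_tweets, control_tweets):
    """Score each album-tweet word by term frequency times background idf."""
    tf = Counter(word for tweet in album_tweets for word in tokenize(tweet))
    docs = [set(tokenize(tweet)) for tweet in control_tweets]
    n = len(docs)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for doc in docs if word in doc)
        # Smoothed idf (an assumed variant) keeps unseen words finite.
        idf = math.log((n + 1) / (df + 1)) + 1
        scores[word] = count * idf
    return scores


# Hypothetical tweets, for illustration only.
album_tweets = ["the new album is great", "love the album"]
control_tweets = ["the weather is nice", "i love the game", "a day of rest"]

scores = tf_idf(album_tweets, control_tweets)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:.2f}")
```

Here "album" and "the" occur equally often in the album tweets, but "the" also appears in the control tweets and therefore scores lower.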

Relevant Work
 Heymann, Ramage, and Garcia-Molina (2008): IR measures
 Eck, Lamere, Bertin-Mahieux, and Green (2007): correlation measures
 Wagner and Strohmaier (2010): tweet stream properties
 Inouye and Kalita (2011): automatic tweet summarization
 Wu, Zhang, and Ostendorf (2010): tf-idf on user tweets

Correlations
 C1, threshold (music control): ta = all tags, fw = freq, tw = music
 C2, base case: ta = all tags, fw = freq
 C3, best case: ta = top tags, fw = tf-idf

Album  C1 Spearman  C1 Kendall  C2 Spearman  C2 Kendall  C3 Spearman  C3 Kendall
D1     0.44         0.38        0.29         0.25        0.69         0.43
D2     0.29         0.24        0.38         0.37        0.78         0.70
D3     0.24         0.20        0.38         0.33        0.33         0.31
D4     0.30         0.26        0.40         0.35        0.60         0.51
J1     0.64         0.55        0.31         0.28        0.31         0.28
J5     0.20         0.18        0.23         0.18        0.63         0.44
J6     0.47         0.37        0.28         0.19        0.63         0.45
F2     0.24         0.20        0.43         0.36        0.30         0.28

Shaded: strongest correlation listed
Bolded (C3): better than base case

Information Retrieval
Album       Precision (P1)  Precision threshold (P2)  Recall
D1          0.48            0.43                      0.002
D2          0.24            0.62                      0.008
D3          0.29            0.36                      0.001
D4          0.36            0.36                      0.0004
J1          0.20            0.50                      0.0003
J3          0.00            0.75                      0.00
J5          0.57            0.40                      0.0002
J6          0.75            0.38                      0.0004
F1          0.00            0.50                      0.00
F2          0.67            0.59                      0.00009
Average     0.35            0.49                      0.001
HV average  0.51            0.45                      0.0003
LV average  0.20            0.53                      0.002
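For reference, the standard set-based definitions of precision and recall can be sketched as below. The slides do not say which set plays which role, so treating the album tags as the retrieved set and the tweet keywords as the relevant set is an assumption, and the sample phrases are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Standard precision and recall over two collections of phrases."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical phrases, for illustration only.
album_tags = ["rock", "indie", "cd", "favorite"]
tweet_keywords = ["rock", "indie", "tour", "tickets",
                  "live", "new", "album", "video"]

p, r = precision_recall(album_tags, tweet_keywords)
print(f"precision={p:.2f} recall={r:.2f}")
```

Under this assignment, recall shrinks as the tweet vocabulary grows relative to the tag set, because the denominator is the number of distinct tweet keywords.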

Conclusions
 Good proxy for top content when sufficient Twitter activity
 More relevant tags are higher in tweet keyword rankings
 TF-IDF is effective

Next Steps
 Larger dataset
 Analysis over time
 Other sources like LastFM
 Linguistic analysis (clustering, stemming)
 Other user-generated data (e.g. user reviews)
