Leveraging Microblogs                                               for Resource RankingTomáš Majer, Marián Šimkotomasmaje...
Microblog• a brief form of a blog with limited size of a post,  typically 140 characters of length• sharing experiences, o...
Microblog – why important?• „Read-Write Web“ vision• user-generated data ▫ unbiased ▫ not moderated• huge amount of data –...
Microblog – why important?• „Read-Write Web“ vision• user-generated data ▫ unbiased ▫ not moderated                       ...
State-of-the-Art• topic identification/keyword extraction             (Ramage et al., 2010)• opinion mining    (Pendey, Iy...
Twitter Graph                              T1                                          R1       U1                        ...
Twitter Graph                                  T1                                              R1 posts     U1            ...
Resource Ranking: Overview• TweetRank ▫ ranking of a resource based on Twitter graph analysis• Computation 1. UserRank 2. ...
Resource Ranking: Overview• TweetRank ▫ ranking of a resource based on Twitter graph analysis• Computation                ...
UserRank                                1+γ (u)UserRank ( f ) UserRank (u) = ∑               f ∈ followers(u)   ∣ follower...
UserRank                                        ∣ followers(u)∣                                   γ(u)=                   ...
TweetRelevance, TweetRank                      UserRank ( Author (t)) TweetRelevance (t) =                       ∣tweets( ...
TweetRelevance, TweetRank                      UserRank ( Author (t)) TweetRelevance (t) =                         = TR(t)...
Evaluation1. TweetRank ranking vs. explicit user ranking (YouTube )2. Search results ranking study (Search)• Data: ▫ 1,997...
Computing TweetRank• TweetRank computed for each resource ▫ power-law distribution   # of resources                       ...
Experiment YouTube• YouTube – explicit user rating (Y1Rank) ▫ positive/negative vote, normalized• TweetRank vs. Y1Rank    ...
Experiment YouTube 2• YouTube – application-collected user rating (Y2Rank) ▫ „how do you like the video“? ▫ 5 degree scale...
Experiment YouTube 2 ▫ 5-tuplets
Experiment YouTube 2 ▫ pairs
Experiment Search• 20,000 resources• indexing: SOLR (Apache Lucene)• searching: resource ranking extended with TweetRank• ...
Experiment Search• findings:  ▫ in general, „newer“ resources rank better  ▫ ranking does not reflect chronological orderi...
Conclusions• microblog ▫ perspective source of data, information, knowledge• we proposed novel method for resource ranking...
Upcoming SlideShare
Loading in …5
×

Leveraging microblogs for resource ranking

2,423 views

Published on

Published in: Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,423
On SlideShare
0
From Embeds
0
Number of Embeds
1,885
Actions
Shares
0
Downloads
3
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Leveraging microblogs for resource ranking

  1. 1. Leveraging Microblogs for Resource RankingTomáš Majer, Marián Šimkotomasmajer@gmail.com, simko@fiit.stuba.sk23.01.2012, SOFSEM 2012Institute of Informatics and Software EngineeringFaculty of Informatics and Information TechnologiesSlovak University of Technology in Bratislava
  2. 2. Microblog• a brief form of a blog with limited size of a post, typically 140 characters of length• sharing experiences, opinions, comments, links• Twitter, Identi.ca, Jaiku, (Google+, Facebook)• Twitter: ▫ over 300 millions of users (100 mil. at the time of writing paper) ▫ over 300 millions of tweets/day (100 mil. --||--)
  3. 3. Microblog – why important?• „Read-Write Web“ vision• user-generated data ▫ unbiased ▫ not moderated• huge amount of data – text, links• valuable source of information ▫ data mining
  4. 4. Microblog – why important?• „Read-Write Web“ vision• user-generated data ▫ unbiased ▫ not moderated 22 % of posts• huge amount of data – text, links• valuable source of information ▫ data mining
  5. 5. State-of-the-Art• topic identification/keyword extraction (Ramage et al., 2010)• opinion mining (Pendey, Iyer, 2009)• twitter search, tweets ranking (Teevan et al., 2011)• user ranking (Gayo-Avello, 2010)• user modeling (Abel et al., 2011)•…• challenge: resource ranking
  6. 6. Twitter Graph T1 R1 U1 T2 R2 T3 U2 R3 T4 T5 R4 U3 T6 R5 T7 Users Tweets Resources
  7. 7. Twitter Graph T1 R1 posts U1 T2 contains R2 T3 U2 R3 T4 T5 R4 U3 re-posts follows T6 R5 T7 Users Tweets Resources
  8. 8. Resource Ranking: Overview• TweetRank ▫ ranking of a resource based on Twitter graph analysis• Computation 1. UserRank 2. TweetRelevance 3. TweetRank
  9. 9. Resource Ranking: Overview• TweetRank ▫ ranking of a resource based on Twitter graph analysis• Computation T1 R1 1. UserRank U1 T2 R2 T3 2. TweetRelevance U2 R3 T4 T5 R4 3. TweetRank U3 T6 R5 T7
  10. 10. UserRank 1+γ (u)UserRank ( f ) UserRank (u) = ∑ f ∈ followers(u) ∣ followers( f )∣
  11. 11. UserRank ∣ followers(u)∣ γ(u)= ∣tweets(u)∣ 1+γ (u)UserRank ( f ) UserRank (u) = ∑ f ∈ followers(u) ∣ followers( f )∣
  12. 12. TweetRelevance, TweetRank UserRank ( Author (t)) TweetRelevance (t) = ∣tweets( Author (t ))∣
  13. 13. TweetRelevance, TweetRank UserRank ( Author (t)) TweetRelevance (t) = = TR(t) ∣tweets( Author (t ))∣ TweetRank (r) = ∑ t ∈tweets(r ) ( TR(t)+ ∑ rt ∈retweets(t) TR(t)TR(rt) )
  14. 14. Evaluation1. TweetRank ranking vs. explicit user ranking (YouTube )2. Search results ranking study (Search)• Data: ▫ 1,997,466 tweets from 367,824 users  85 % in English ▫ 1,468,365,182 connections between 40,103,281 users ▫ 1,150,168 unique web links  3 % of them: YouTube
  15. 15. Computing TweetRank• TweetRank computed for each resource ▫ power-law distribution # of resources TweetRank intervals
  16. 16. Experiment YouTube• YouTube – explicit user rating (Y1Rank) ▫ positive/negative vote, normalized• TweetRank vs. Y1Rank TweetRank, Y1Rank ▫ correlation coefficient r = 0.02 YouTube videos
  17. 17. Experiment YouTube 2• YouTube – application-collected user rating (Y2Rank) ▫ „how do you like the video“? ▫ 5 degree scale (1-best, 5-worst), 70 participants• TweetRank vs. Y2Rank ▫ Kendall rank correlation coefficient τ= 0.125• Relative video ranking
  18. 18. Experiment YouTube 2 ▫ 5-tuplets
  19. 19. Experiment YouTube 2 ▫ pairs
  20. 20. Experiment Search• 20,000 resources• indexing: SOLR (Apache Lucene)• searching: resource ranking extended with TweetRank• search results manual comparison ▫ 20 randomly selected queries (e.g.: „apple“) ▫ analyzed top-k results
  21. 21. Experiment Search• findings: ▫ in general, „newer“ resources rank better ▫ ranking does not reflect chronological ordering of resources• suitable for sorting search results within a predefined time window
  22. 22. Conclusions• microblog ▫ perspective source of data, information, knowledge• we proposed novel method for resource ranking leveraging microblog network• an important additional knowledge from the crowd ▫ a form of indirect explicit user rating• great potential for search improvement ▫ reflects temporal characteristics (not linearly) ▫ sorting results within a predefined time window

×