Leveraging microblogs for resource ranking
    Presentation Transcript

    • Leveraging Microblogs for Resource Ranking
      Tomáš Majer, Marián Šimko
      tomasmajer@gmail.com, simko@fiit.stuba.sk
      23.01.2012, SOFSEM 2012
      Institute of Informatics and Software Engineering
      Faculty of Informatics and Information Technologies
      Slovak University of Technology in Bratislava
    • Microblog
      • a brief form of blog with a limited post size, typically 140 characters
      • sharing experiences, opinions, comments, links
      • Twitter, Identi.ca, Jaiku, (Google+, Facebook)
      • Twitter:
        ▫ over 300 million users (100 million at the time the paper was written)
        ▫ over 300 million tweets/day (100 million at the time the paper was written)
    • Microblog – why important?
      • “Read-Write Web” vision
      • user-generated data
        ▫ unbiased
        ▫ not moderated
      • 22 % of posts (slide callout)
      • huge amount of data – text, links
      • valuable source of information
        ▫ data mining
    • State-of-the-Art
      • topic identification/keyword extraction (Ramage et al., 2010)
      • opinion mining (Pendey, Iyer, 2009)
      • Twitter search, tweet ranking (Teevan et al., 2011)
      • user ranking (Gayo-Avello, 2010)
      • user modeling (Abel et al., 2011)
      • …
      • challenge: resource ranking
    • Twitter Graph
      [Figure: graph with three groups of nodes: Users (U1–U3), Tweets (T1–T7), Resources (R1–R5); edge labels: posts, contains, re-posts, follows]
    • Resource Ranking: Overview
      • TweetRank
        ▫ ranking of a resource based on Twitter graph analysis
      • Computation
        1. UserRank
        2. TweetRelevance
        3. TweetRank
    • UserRank
      $$\gamma(u) = \frac{|\mathrm{followers}(u)|}{|\mathrm{tweets}(u)|}$$
      $$\mathrm{UserRank}(u) = \sum_{f \in \mathrm{followers}(u)} \frac{1 + \gamma(u)\,\mathrm{UserRank}(f)}{|\mathrm{followers}(f)|}$$
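A minimal sketch of how the UserRank recurrence might be evaluated. It assumes the summand reads (1 + γ(u)·UserRank(f)) / |followers(f)| with γ(u) = |followers(u)| / |tweets(u)|. The slides give neither an initial value nor a stopping rule, so the starting rank of 1.0, the fixed number of sweeps, and the handling of zero counts are assumptions. All names (`user_rank`, `gamma`, `followers`, `tweet_counts`) are hypothetical.

```python
from typing import Dict, List


def gamma(user: str, followers: Dict[str, List[str]],
          tweet_counts: Dict[str, int]) -> float:
    """gamma(u) = |followers(u)| / |tweets(u)|.

    Assumption: returns 0.0 for a user with no tweets to avoid dividing by zero.
    """
    n_tweets = tweet_counts.get(user, 0)
    if n_tweets == 0:
        return 0.0
    return len(followers.get(user, [])) / n_tweets


def user_rank(followers: Dict[str, List[str]], tweet_counts: Dict[str, int],
              iterations: int = 10, initial: float = 1.0) -> Dict[str, float]:
    """Apply the recurrence `iterations` times, starting from `initial` for every user.

    `followers[u]` lists the users who follow u.
    """
    users = set(followers) | set(tweet_counts)
    users |= {f for fs in followers.values() for f in fs}
    ranks = {u: initial for u in users}
    for _ in range(iterations):
        new_ranks = {}
        for u in users:
            g = gamma(u, followers, tweet_counts)
            total = 0.0
            for f in followers.get(u, []):
                n_f = len(followers.get(f, []))
                if n_f == 0:
                    # Assumption: a follower with no followers of their own
                    # contributes nothing (the formula would divide by zero).
                    continue
                total += (1.0 + g * ranks[f]) / n_f
            new_ranks[u] = total
        ranks = new_ranks
    return ranks
```

Nothing in the slides guarantees that repeated sweeps converge, which is why the sketch caps the number of iterations instead of testing for a fixed point.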
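    • TweetRelevance, TweetRank
      $$\mathrm{TR}(t) = \mathrm{TweetRelevance}(t) = \frac{\mathrm{UserRank}(\mathrm{Author}(t))}{|\mathrm{tweets}(\mathrm{Author}(t))|}$$
      $$\mathrm{TweetRank}(r) = \sum_{t \in \mathrm{tweets}(r)} \Bigl( \mathrm{TR}(t) + \sum_{rt \in \mathrm{retweets}(t)} \mathrm{TR}(t)\,\mathrm{TR}(rt) \Bigr)$$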
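The two definitions on this slide can be sketched as follows, assuming UserRank values are already available and reading the formulas as TR(t) = UserRank(Author(t)) / |tweets(Author(t))| and TweetRank(r) = Σ over tweets t of r of (TR(t) + Σ over retweets rt of t of TR(t)·TR(rt)). The dictionary layout and the names `author_of`, `tweets_of`, and `retweets_of` are hypothetical.

```python
from typing import Dict, List


def tweet_relevance(tweet: str, author_of: Dict[str, str],
                    user_rank: Dict[str, float],
                    tweet_counts: Dict[str, int]) -> float:
    """TR(t): the author's UserRank divided by the author's number of tweets."""
    author = author_of[tweet]
    return user_rank[author] / tweet_counts[author]


def tweet_rank(resource: str, tweets_of: Dict[str, List[str]],
               retweets_of: Dict[str, List[str]], author_of: Dict[str, str],
               user_rank: Dict[str, float],
               tweet_counts: Dict[str, int]) -> float:
    """TweetRank(r): sum over tweets containing r, each boosted by its retweets."""
    total = 0.0
    for t in tweets_of.get(resource, []):
        tr_t = tweet_relevance(t, author_of, user_rank, tweet_counts)
        total += tr_t
        for rt in retweets_of.get(t, []):
            # Each retweet adds the product of the two relevance values.
            total += tr_t * tweet_relevance(rt, author_of, user_rank, tweet_counts)
    return total
```

For example, with two tweets of relevance 0.5 each, one of which has a single retweet of relevance 0.5, the resource scores (0.5 + 0.5·0.5) + 0.5 = 1.25.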
    • Evaluation
      1. TweetRank ranking vs. explicit user ranking (YouTube)
      2. Search results ranking study (Search)
      • Data:
        ▫ 1,997,466 tweets from 367,824 users (85 % in English)
        ▫ 1,468,365,182 connections between 40,103,281 users
        ▫ 1,150,168 unique web links (3 % of them YouTube)
    • Computing TweetRank
      • TweetRank computed for each resource
        ▫ power-law distribution
      [Figure: # of resources per TweetRank interval]
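A small sketch of how the number of resources per TweetRank interval could be tallied to inspect the distribution. The slide does not say how the intervals were chosen; equal-width intervals and the names `count_by_interval` and `width` are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_by_interval(tweet_ranks: Iterable[float],
                      width: float) -> Dict[Tuple[float, float], int]:
    """Count values per half-open interval [k * width, (k + 1) * width)."""
    counts = Counter(math.floor(v / width) for v in tweet_ranks)
    return {(k * width, (k + 1) * width): n for k, n in sorted(counts.items())}
```

With a power-law distribution, most resources would fall into the lowest intervals and only a few into the highest ones.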
    • Experiment YouTube
      • YouTube – explicit user rating (Y1Rank)
        ▫ positive/negative vote, normalized
      • TweetRank vs. Y1Rank
        ▫ correlation coefficient r = 0.02
      [Figure: TweetRank and Y1Rank per YouTube video]
    • Experiment YouTube 2
      • YouTube – application-collected user rating (Y2Rank)
        ▫ “How do you like the video?”
        ▫ 5-point scale (1 = best, 5 = worst), 70 participants
      • TweetRank vs. Y2Rank
        ▫ Kendall rank correlation coefficient τ = 0.125
      • Relative video ranking
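For readers unfamiliar with the Kendall rank correlation coefficient, here is a minimal sketch of the tau-a variant, in which tied pairs count as neither concordant nor discordant. The slide does not state which variant was used or how ties on the 5-point scale were treated, so that choice and the name `kendall_tau_a` are assumptions.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(xs: Sequence[float], ys: Sequence[float]) -> float:
    """(concordant - discordant) / (number of pairs), for paired observations."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 means a tie in xs or ys; tau-a ignores it in the numerator.
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Because Y2Rank uses 1 as the best grade while a higher TweetRank is better, the ratings would presumably need to be reversed before comparison; the slides do not say how this was handled.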
    • Experiment YouTube 2
        ▫ 5-tuples
    • Experiment YouTube 2
        ▫ pairs
    • Experiment Search
      • 20,000 resources
      • indexing: Solr (Apache Lucene)
      • searching: resource ranking extended with TweetRank
      • manual comparison of search results
        ▫ 20 randomly selected queries (e.g. “apple”)
        ▫ top-k results analyzed
    • Experiment Search
      • findings:
        ▫ in general, “newer” resources rank better
        ▫ ranking does not reflect chronological ordering of resources
      • suitable for sorting search results within a predefined time window
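The suggested use, sorting results within a predefined time window, might look like the sketch below. It assumes each resource carries a timestamp and a precomputed TweetRank; the `Resource` type, its `published` field, and `rank_within_window` are hypothetical and not part of the presented system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Resource:
    url: str
    published: datetime  # assumption: some timestamp is known per resource
    tweet_rank: float


def rank_within_window(resources: List[Resource], start: datetime,
                       end: datetime) -> List[Resource]:
    """Keep resources with start <= published <= end, highest TweetRank first."""
    in_window = [r for r in resources if start <= r.published <= end]
    return sorted(in_window, key=lambda r: r.tweet_rank, reverse=True)
```

Restricting the comparison to one window keeps the tendency of newer resources to score higher from dominating the ordering.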
    • Conclusions
      • microblog
        ▫ a promising source of data, information, knowledge
      • we proposed a novel method for resource ranking that leverages the microblog network
      • important additional knowledge from the crowd
        ▫ a form of indirect explicit user rating
      • great potential for search improvement
        ▫ reflects temporal characteristics (not linearly)
        ▫ sorting results within a predefined time window