Overcoming Spammers in Twitter – A Tale of Five Algorithms

Micro-blogging services such as Twitter can develop into valuable sources of up-to-date information provided the spam problem is overcome. Thus, separating the most relevant users from the spammers is a highly pertinent question for which graph centrality methods can provide an answer. In this paper we examine the vulnerability of five different algorithms to linking malpractice in Twitter and propose a first step towards "desensitizing" them against such abusive behavior.

    Presentation Transcript

    • CERI 2010, I Congreso Español de Recuperación de Información (First Spanish Conference on Information Retrieval). Daniel Gayo Avello (@pfcdgayo), David J. Brenes (@brenes)
    • “40% of Twitter conversation is pointless babble.” Pear Analytics (2009). “Who would have thought that the status message would be one of the hottest features on the Web?” Jansen, Chowdury & Cook (2010). “Micro-blogging services can develop into valuable sources of up-to-date information provided the spam problem is overcome.” Us (now). How can all of this be reconciled?
    • “40% of Twitter conversation is pointless babble.” Pear Analytics (2009). OK, it may be true, but 60% is not pointless babble, so there is still room for hope: a huuuuuuuuge number of users, info on current events, also valuable contents …
    • “Micro-blogging services can develop into valuable sources of up-to-date information provided the spam problem is overcome.” Us (now). A huuuuuuuuge number of users, info on current events, also valuable contents … Aggregated analysis, topic detection and tracking, opinion mining, … and finding “authoritative” sources/users.
    • Finding “authoritative” sources/users: the black magic, secret sauce approach vs. algorithmic methods based on the users’ graph.
    • Warning! Slight detour…
    • El Listo – Chulerías 2.0 http://listocomics.com/414-chulerias-2-0/
    • http://listocomics.com/394-piramide-del-glamour-twittero/
    • The Brads – Twitter Outage http://bradcolbow.com/archive/view/the_brads_twitter_outage/
    • End of detour. What’s the moral? People want lots of followers (?!) The follower/followee ratio “matters” more than raw number of followers. Following people is a simple way to get followers. In other words, users are “abusing” social networks to grab “prestige”.
    • Research questions? Vulnerability of rank prestige algorithms to link spamming in social graphs. Feasibility of “desensitization”.
    • 5 rank prestige algorithms: PageRank, HITS, NodeRanking, TunkRank, TwitterRank
    • TunkRank Originally proposed by Daniel Tunkelang Rather similar to PageRank
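The slide gives no formula for TunkRank. As a rough sketch of the recurrence usually attributed to Tunkelang's proposal, each user's influence is the sum, over their followers Y, of (1 + p · Influence(Y)) / |Following(Y)|, where p is a constant probability that a tweet gets passed on. The function name `tunkrank`, the default `p=0.05`, and the fixed iteration count are assumptions made for illustration, not values taken from the talk.

```python
def tunkrank(following, p=0.05, iterations=50):
    """Iterate a TunkRank-style recurrence (illustrative sketch).

    `following` maps each user to the collection of users they follow.
    `p` is an assumed constant pass-along probability.
    """
    users = set(following)
    for followees in following.values():
        users |= set(followees)

    influence = {u: 0.0 for u in users}
    for _ in range(iterations):
        new = {u: 0.0 for u in users}
        for follower, followees in following.items():
            if not followees:
                continue
            # Each follower splits its attention evenly among its followees.
            share = (1.0 + p * influence[follower]) / len(followees)
            for followee in followees:
                new[followee] += share
        influence = new
    return influence


# Toy graph: a and b follow c, and c follows a.
toy = {"a": {"c"}, "b": {"c"}, "c": {"a"}}
scores = tunkrank(toy)
print(sorted(scores, key=scores.get, reverse=True))
```

As in PageRank, a follower who follows many accounts passes less influence to each of them, which is why mass-following alone buys little under this scheme.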
    • Desensitizing against link spamming in social graphs: the follower/followee ratio can be interpreted as the user’s value regarding the introduction of new information from the outside world into the Twitter global ecosystem. Reciprocal links are “counterfeit currency” used to increase the follower count.
    • Desensitizing against link spamming in social graphs. HotSEOGuru: 12,000 following, 11,800 followers; ratio = 11,800/12,000 = 0.983. If 11,650 were reciprocal links: ratio_discounted = 150/350 = 0.429
    • Desensitizing against link spamming in social graphs. stevebaker: 225 following, 2,227 followers; ratio = 2,227/225 = 9.898. 140 are reciprocal links: ratio_discounted = 2,087/85 = 24.553
    • Desensitizing against link spamming in social graphs. HotSEOGuru (12,000 following, 11,800 followers): she would prefer 0.983 but she obtains 0.429. stevebaker (225 following, 2,227 followers): he would prefer 24.553 but he obtains 9.898.
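The arithmetic in the two examples above can be checked with a few lines of Python. This is only a sketch of the ratio computation as shown on the slides; the function names are made up for illustration, and the case where every followee is reciprocal (a zero denominator) is deliberately left unhandled.

```python
def follower_ratio(followers: int, following: int) -> float:
    """Plain follower/followee ratio."""
    return followers / following


def discounted_ratio(followers: int, following: int, reciprocal: int) -> float:
    """Ratio after subtracting reciprocal links from both counts.

    Raises ZeroDivisionError if all followees are reciprocal links.
    """
    return (followers - reciprocal) / (following - reciprocal)


# Figures taken from the slides.
examples = {
    "HotSEOGuru": dict(followers=11_800, following=12_000, reciprocal=11_650),
    "stevebaker": dict(followers=2_227, following=225, reciprocal=140),
}

for name, c in examples.items():
    raw = follower_ratio(c["followers"], c["following"])
    disc = discounted_ratio(**c)
    print(f"{name}: ratio={raw:.3f}, discounted={disc:.3f}")
# HotSEOGuru: ratio=0.983, discounted=0.429
# stevebaker: ratio=9.898, discounted=24.553
```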
    • Desensitizing against link spamming in social graphs for this study it has been applied as an extra weight to PageRank and also to prune the graph before applying PageRank
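The slide does not say exactly how the graph is pruned or how the extra weight enters PageRank. The sketch below covers only the pruning variant, under the assumption that pruning means dropping every reciprocal link; it then runs a textbook power-iteration PageRank. The function names, the damping factor of 0.85, and the toy graph are illustrative choices, not details from the study.

```python
def prune_reciprocal(edges):
    """Drop every edge (u, v) whose reverse (v, u) is also present."""
    edge_set = set(edges)
    return {(u, v) for (u, v) in edge_set if (v, u) not in edge_set}


def pagerank(edges, damping=0.85, iterations=50):
    """Basic power-iteration PageRank; an edge (u, v) means u follows v."""
    nodes = {n for edge in edges for n in edge}
    if not nodes:
        return {}
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new = {n: base for n in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank


# Toy graph: "spammer" trades follows with a and b; both also follow "expert".
edges = [
    ("spammer", "a"), ("a", "spammer"),
    ("spammer", "b"), ("b", "spammer"),
    ("a", "expert"), ("b", "expert"),
]
print(pagerank(edges))
print(pagerank(prune_reciprocal(edges)))
```

Note that in this sketch a user whose links are all reciprocal disappears from the pruned graph entirely, so it receives no score at all rather than a low one.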
    • How can we measure performance in this scenario? The lower the ranking spammers reach, the better a method is.
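One way to make this criterion concrete is to look at which positions known spammers occupy in each method's ranking. The exact measure used in the study is not given on the slide, so the helpers below (`spammer_positions`, `spammers_in_top_k`) and the toy scores are assumptions for illustration only.

```python
def spammer_positions(scores, spammers):
    """Return the 1-based ranking positions of known spammers (1 = top)."""
    ranking = sorted(scores, key=scores.get, reverse=True)
    return [pos for pos, user in enumerate(ranking, start=1) if user in spammers]


def spammers_in_top_k(scores, spammers, k):
    """Count how many known spammers appear among the top k users."""
    return sum(1 for pos in spammer_positions(scores, spammers) if pos <= k)


# Hypothetical scores produced by some ranking method.
scores = {"expert": 0.50, "spammer1": 0.30, "fan": 0.15, "spammer2": 0.05}
spammers = {"spammer1", "spammer2"}

print(spammer_positions(scores, spammers))     # [2, 4]
print(spammers_in_top_k(scores, spammers, 2))  # 1
```

Under this reading, a method is better when spammers sit further from the top, that is, when fewer of them appear in the top k for any given k.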
    • A Twitter dataset is needed! January to August 2009. Tweets: 27.9M English entries by 4.98M users. Graph: 1.8M users, 134M links.
    • What about the spammers? A simple method based on URL presence and keyword matching. Using this method, 9,369 users were marked as spammers. Another 22,290 users were marked as aggressive marketers (similar bios).
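The slide names only the two ingredients of the method, URL presence and keyword matching. A minimal sketch of such a heuristic might look as follows; the keyword list, the `min_fraction` threshold, and the function names are all hypothetical and are not the ones used in the study.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
# Hypothetical keyword list; the study's actual keywords are not given here.
SPAM_KEYWORDS = {"free", "followers", "money", "win"}


def looks_like_spam(tweet: str) -> bool:
    """Flag a tweet that contains a URL and at least one spam keyword."""
    if not URL_PATTERN.search(tweet):
        return False
    # Remove URLs so that keywords inside links are not counted.
    text = URL_PATTERN.sub(" ", tweet.lower())
    words = set(re.findall(r"[a-z']+", text))
    return bool(words & SPAM_KEYWORDS)


def flag_spammers(tweets_by_user, min_fraction=0.5):
    """Mark users whose share of spam-like tweets reaches min_fraction."""
    flagged = set()
    for user, tweets in tweets_by_user.items():
        if not tweets:
            continue
        hits = sum(looks_like_spam(t) for t in tweets)
        if hits / len(tweets) >= min_fraction:
            flagged.add(user)
    return flagged


sample = {
    "spammy": ["Get FREE followers now http://example.com",
               "win money http://example.com/x"],
    "normal": ["Reading a paper on PageRank http://arxiv.org/abs/1004.0816",
               "lunch time"],
}
print(flag_spammers(sample))  # {'spammy'}
```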
    • Results: HITS and TwitterRank underperform PageRank :( NodeRanking, “pruned” PageRank and PageRank are very similar :/ “Discounted” PageRank is inconclusive (elitism & “giant shoulders” effect) :S TunkRank is the better choice :)
    • Conclusions Rank prestige can be “gamed” in social networks. Ranking in itself shouldn’t be the point. TunkRank better choice to rank social graphs. Reciprocal links can help to find valuable users. An extended report is available at http://arxiv.org/abs/1004.0816