
A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis


Social networks such as Facebook, Twitter, Google+
and LinkedIn have millions of users. These networks are constantly
evolving and are a good source of information, both
explicit and implicit. The analysis of social networks mainly
focuses on the aspect of social networking, with an emphasis
on mapping relationships and patterns of interaction between users
and content information. One of the common research topics
focuses on centrality measures, where useful information about
the connected people in the social network is represented in
a graph. In this paper, we employed two link-based ranking
algorithms to analyze the ranking of the users: HITS (Hyperlink-Induced
Topic Search) and PageRank. We constructed a Twitter
user retweet-relationship graph using 21 days' worth of data.
Lastly, we compared the ranking sequence of the users, in addition
to their followers count against the average and also whether
they are verified Twitter accounts. From the results obtained,
both HITS and PageRank showed a similar trend and, more
importantly, highlighted the importance of the direction of the
edges in this work.
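
As a rough illustration of the approach described in the abstract, the following is a minimal sketch of PageRank and HITS on a directed retweet graph. It is not the authors' implementation: the user names and retweet records are made up, the edge direction (retweeter to original author) is an assumption, and the damping factor and iteration count are conventional defaults rather than values taken from the paper.

```python
from collections import defaultdict

# Hypothetical retweet records as (retweeter, original_author) pairs.
# These are illustrative only and do not come from the paper's dataset.
RETWEETS = [
    ("alice", "team"),
    ("bob", "team"),
    ("carol", "team"),
    ("erin", "team"),
    ("carol", "minister"),
    ("dave", "minister"),
    ("team", "minister"),
]


def build_graph(retweets):
    """Build a directed graph with an edge retweeter -> original author (assumed direction)."""
    out_links = defaultdict(set)
    nodes = set()
    for retweeter, author in retweets:
        out_links[retweeter].add(author)
        nodes.update((retweeter, author))
    return sorted(nodes), out_links


def pagerank(nodes, out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank; rank of dangling nodes is spread uniformly."""
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, ())
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


def hits(nodes, out_links, iterations=100):
    """Iterative HITS; returns (hub, authority) scores, each normalized to sum to 1."""
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes that point at it.
        auth = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in out_links.get(u, ()):
                auth[v] += hub[u]
        # Hub score: sum of authority scores of the nodes it points to.
        hub = {u: sum(auth[v] for v in out_links.get(u, ())) for u in nodes}
        a_total = sum(auth.values()) or 1.0
        h_total = sum(hub.values()) or 1.0
        auth = {u: s / a_total for u, s in auth.items()}
        hub = {u: s / h_total for u, s in hub.items()}
    return hub, auth


nodes, out_links = build_graph(RETWEETS)
pr = pagerank(nodes, out_links)
hub, auth = hits(nodes, out_links)

for name in sorted(nodes, key=pr.get, reverse=True):
    print(f"{name:10s} pagerank={pr[name]:.3f} authority={auth[name]:.3f} hub={hub[name]:.3f}")
```

In this toy graph the two algorithms disagree on the top user: PageRank favors `minister`, who is retweeted by the already well-ranked `team`, while the HITS authority score favors `team`, who is retweeted by more distinct users. Reversing the edge direction would swap which users accumulate score, which is one way to see why the edge direction matters.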

Published in: Data & Analytics


  1. A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis
     Ong Kok Chien, Poo Kuan Hoong and Chiung Ching Ho
     Faculty of Computing Informatics, Multimedia University, Cyberjaya.
  2. Outline
     • Introduction
     • Problem Statement
     • Objective
     • Methods
     • Results
     • Summary
  3. Introduction
     • Graph analysis algorithms on social networks.
     • Identify noteworthy Twitter users for a specific bag of topics.
  4. Twitter
     • Microblogging site with a maximum of 140 characters per post.
     • "A Tweet is an expression of a moment or idea. It can contain text, photos, and videos. Millions of Tweets are shared in real time, every day."
     • Reply, Retweet, Favorite, Hashtags
     https://about.twitter.com/what-is-twitter/story-of-a-tweet
  5. Problem Statement
     Why is it important to rank users?
     • By ranking users, we aim to differentiate relevant, important information sources from those provided by spam accounts.
  6. Objectives
     • To rank Twitter users using HITS and PageRank.
     • To identify the direction of edges for the graph.
  7. Methods
     • Link-based ranking algorithms (HITS & PageRank).
     • Twitter users as nodes.
     • Retweet relationships as edges.
     • Direction of graph.
  8. Example
     • PageRank (PR)
     • E.g.: backlinks in websites, referring back to the original content.
     Sergey Brin & Larry Page (1998). The anatomy of a large-scale hypertextual Web search engine.
     (Image extracted from Wikipedia)
  9. Example
     • Hyperlink-Induced Topic Search (HITS)
     • Hubs: catalogs of relevant content.
     • Authorities: the great content itself.
     Jon Kleinberg (1999). Authoritative sources in a hyperlinked environment.
     (Image extracted from cornell.edu)
  10. Example
      Khairykj (Minister of Youth & Sports), shatyrah2, AyenSanji
      https://twitter.com/Khairykj/status/410964119521460224
  11. Architecture
      (Diagram with four numbered stages; labels: Twitter Streaming API, Configure Keywords, JSON raw data, HiveQL, Unix Script)
  12. Keywords
      • TeamMsia
      • every1connects
      • tmrewards
      • yellowpages_my
      • tmsmebiz
      • MaxisComms
      • MaxisListens
      • DiGi_Telco
      • DiGi_Youths
      • helloUMobile
      • HyppTV
      • Streamyx
      • UMobile
      • Digi
      • Maxis
      • Yes4G
      • Celcom
      • xpaxsays
      • TMCorp
      • TMConnects
  13. Basic Statistics
      Raw dataset
      • Total tweets: 230,166
      • Total unique users: 121,461 (screen_name)
      • Total verified users: 113 (verified)
      • Average followers count: 983 (followers_count)
      Experiment dataset (retweets)
      • No. of tweets: 56,727
      • No. of unique users: 50,636
      9th Dec 2013 -> 29th Dec 2013
  14. Results
      Ranking  PageRank          HITS
      1        TeamMsia          TeamMsia
      2        ManOlimpik        Khairykj
      3        Khairykj          ManOlimpik
      4        OKS_HARIMAUMUDA   OKS_HARIMAUMUDA
      5        BrooksBeau        FIH_Hockey
      6        TMCorp            BB_Johor
      7        LawakLegend       AtletMalaysia
      8        WTFSG             TMCorp
      9        JanganPanas       Faif_D
      10       FIH_Hockey        BBST15
      • 60% of the top 10 were the same bag of users (70% for the top 20).
  15. Results
      User Screen Name   Verified   Follower Counts
      TeamMsia           False      98,469
      ManOlimpik         False      1,661
      Khairykj           True       432,259
      OKS_HARIMAUMUDA    False      35,058
      BrooksBeau         False      1,226,629
      TMCorp             False      13,767
      LawakLegend        False      49,593
      WTFSG              False      469,909
      JanganPanas        False      14,642
      FIH_Hockey         False      24,628
      • The number of followers of a Twitter user does not directly affect the number of retweets.
      • A verified account is not very important for getting retweeted.
  16. Results
      "Football at the 2013 Southeast Asian Games"
      9th Dec 2013 -> 29th Dec 2013
  17. Results
      A closer look at how TeamMsia was involved in the conversation.
  18. Summary
      • Link-based ranking algorithms such as PageRank and HITS promise some insights about the Twitter users concerned and their significance.
      • These insights can be useful for customer care and churn management.
  19. Future Work
      • Additional relationships to be considered (conversational reply, pure mentions).
      • Further validation of additional attributes (verified, tweet count, followers count, following count, etc.).
      • Extend deeper into tweet-level analysis.
  20. Questions?
      Contact: Ong Kok Chien, ahchienong@gmail.com
      http://qrs.ly/2t49r7l (vCard download link)
      Thanks.
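
The overlap figure on the Results slide can be checked directly from the two top-10 lists shown there. The sketch below uses the screen names exactly as listed on the slide; the helper name `top_k_overlap` is hypothetical. The top-20 lists are not given in the slides, so only the top-10 figure is reproduced.

```python
# Top-10 screen names as listed on the Results slide, in rank order.
PAGERANK_TOP10 = [
    "TeamMsia", "ManOlimpik", "Khairykj", "OKS_HARIMAUMUDA", "BrooksBeau",
    "TMCorp", "LawakLegend", "WTFSG", "JanganPanas", "FIH_Hockey",
]
HITS_TOP10 = [
    "TeamMsia", "Khairykj", "ManOlimpik", "OKS_HARIMAUMUDA", "FIH_Hockey",
    "BB_Johor", "AtletMalaysia", "TMCorp", "Faif_D", "BBST15",
]


def top_k_overlap(first, second, k):
    """Fraction of users that appear in the top k of both rankings (order ignored)."""
    return len(set(first[:k]) & set(second[:k])) / k


shared = sorted(set(PAGERANK_TOP10) & set(HITS_TOP10))
overlap = top_k_overlap(PAGERANK_TOP10, HITS_TOP10, 10)
print(f"shared users: {shared}")
print(f"top-10 overlap: {overlap:.0%}")
```

Six users appear in both lists, which matches the 60% reported on the slide. This measure ignores rank order, so it treats the swap of ManOlimpik and Khairykj between positions 2 and 3 as full agreement.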
