PCI13 Thessaloniki, 19 Sep 2013
Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena
Paper presentation in PCI 2013.

  1. PCI13 Thessaloniki, 19 Sep 2013
     Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena
     Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris
  2. Problem
     Online Social Networks (OSNs) are immense!
  3. Motivation
     • Social Networks
       – Used to be small (Grevy's zebra dataset)
       – Easy to organize
     • Online Social Networks (Twitter)
       – Have an immense amount of data
       – Incredibly difficult to organize and to extract useful information from
     • Ways to monitor activity in OSNs:
       – Keywords (produce too much information and fail when lexical variations are used)
       – Newshounds and persons of interest (may result in loss of information)
     • Proposal to leverage:
       – Time
       – Communities formed by users interested in a specific topic
       – The behavior of these communities over time
     • Provide the user with information regarding:
       – Temporal user activity per topic
       – Influential, stable and persistent communities
       – Users worth following (possibly new newshounds)
       – Content worth monitoring
  4. Framework overview
     • Input: Twitter data (mentions and hashtags in time)
     • Evolution Detection Process: Pre-processing (Information Extraction), Interaction Data Discretization, Temporal Adjacency Matrix Creation, Community Detection (Louvain), Community Evolution Detection
     • Ranking Process: Persistence, Stability, Centrality* (PageRank), Community Size, Feature Fusion
     • Output: most influential users and communities + popular hashtags; Evolution Heatmap
     * Ongoing work
  5. Interaction data discretization
     • Studying community evolution requires timeslot analysis
     • Tweeting activity shows whether the users are active and whether something interesting is happening (or has happened)
     • In this framework, the timeslots are created using the local minima of the overall activity
     • Peaks and positive slopes indicate that the users are interested in some phenomenon or are involved in a conversation
     • Minima and negative slopes indicate that the users' interest is diminishing
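The timeslot creation described in slide 5 can be illustrated with a minimal sketch. It assumes the overall activity has already been binned into equal-width counts (for example tweets per hour); the function name `split_at_local_minima` and the sample counts are hypothetical and are not taken from the authors' code.

```python
from typing import List


def split_at_local_minima(activity: List[int]) -> List[range]:
    """Split a binned activity series into timeslots at its local minima."""
    # An interior bin is a cut point when it is lower than the previous bin
    # and no higher than the next one (so a flat valley yields a single cut).
    cuts = [
        i
        for i in range(1, len(activity) - 1)
        if activity[i] < activity[i - 1] and activity[i] <= activity[i + 1]
    ]
    bounds = [0] + cuts + [len(activity)]
    # Each timeslot starts at a minimum and runs up to the next one.
    return [range(start, end) for start, end in zip(bounds, bounds[1:])]


# Hypothetical hourly tweet counts containing two bursts of activity.
counts = [2, 5, 9, 4, 1, 3, 8, 6, 2]
slots = split_at_local_minima(counts)
```

With this rule each burst (rise, peak and decline) ends up in its own timeslot. Real activity curves are noisy, so some smoothing before looking for minima would probably be needed in practice.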
  6. Interaction data discretization example
  7. Community detection & evolution
     [Figure: sequential adjacency matrices for Timeslot (n-2), Timeslot (n-1), Timeslot (n) and Timeslot (n+1); communities C1 to C6 detected in timeslots n-1, n and n+1 and linked into the evolving communities T1 to T5]
     • Louvain Community Detection Method (V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008 (12pp), 2008.)
     • Timeslots: [1,…,n-1,n,n+1,…]
     • Communities: C = {C1n, C2n, ..., Ckn}
     • Time-Evolving Communities: Ti
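As a rough illustration of the sequential adjacency matrices in slide 7, the sketch below builds one sparse, weighted adjacency structure per timeslot from mention records. The record layout `(author, mentioned user, timeslot index)`, the function name `build_adjacency` and the choice of undirected edges are assumptions made for this example; self-mentions are dropped, in line with the exclusion of self loops mentioned in the results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Mention = Tuple[str, str, int]  # (author, mentioned user, timeslot index)
Edge = Tuple[str, str]


def build_adjacency(mentions: Iterable[Mention]) -> Dict[int, Dict[Edge, int]]:
    """Return, per timeslot, a mapping from user pair to number of mentions."""
    slots: Dict[int, Dict[Edge, int]] = defaultdict(lambda: defaultdict(int))
    for author, target, slot in mentions:
        if author == target:
            continue  # skip self loops
        # Assumption: edges are undirected, so (a, b) and (b, a) are merged.
        edge = tuple(sorted((author, target)))
        slots[slot][edge] += 1
    return {slot: dict(edges) for slot, edges in slots.items()}


# Hypothetical mention records.
records = [("a", "b", 0), ("b", "a", 0), ("a", "c", 0), ("c", "c", 0), ("a", "b", 1)]
adjacency = build_adjacency(records)
```

Storing only the non-zero entries keeps memory use proportional to the number of edges, which matters for graphs with hundreds of thousands of users.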
  8. Louvain Community Detection
     A popular greedy modularity optimization approach.
     The following two steps are repeated iteratively until a maximum of modularity is attained and a hierarchy of communities is produced:
       a) Small community detection by local modularity optimization
       b) Aggregation of the nodes belonging to the same community and creation of a network with the communities as nodes
     It was selected for its:
     • Speed
     • Accuracy when dealing with ad-hoc networks
     • Hierarchical structure, which allows communities to be examined at different resolutions
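The quantity that the Louvain method in slide 8 greedily maximizes is modularity. The sketch below only evaluates modularity for a given partition of an unweighted, undirected graph without self loops; it is not an implementation of the Louvain method, and the function name and toy graph are illustrative assumptions. It uses the standard per-community form Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c and d_c the sum of the degrees of its nodes.

```python
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    community: Dict[Hashable, int],
) -> float:
    """Modularity of a partition of an unweighted, undirected, non-empty graph."""
    edge_list = list(edges)
    m = len(edge_list)
    internal: Dict[int, int] = {}  # L_c: edges with both ends in community c
    degree: Dict[int, int] = {}    # d_c: sum of node degrees in community c
    for u, v in edge_list:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
merged = {node: 0 for node in range(6)}
q_split = modularity(toy_edges, split)
q_merged = modularity(toy_edges, merged)
```

Placing each triangle in its own community scores higher than merging everything into one community, which is the kind of improvement step a) looks for when it moves nodes between communities.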
  9. Community evolution detection
     [Figure: grid of communities C11 to C95 (nine communities in each of five rows), with the rows of evolving-community labels T11 T21 T41 T61 T81 T91 / T11 T41 T52 T91 / T11 T21 T52 T81 T91 / T21 T52 T74 T91 / T41 T52 T74 T81 T91]
     • The communities in each row are compared with the communities in past rows using the Jaccard Index
     • Community similarity is decided according to:
       – Jaccard Index
       – Adaptive threshold
     • Adaptive threshold:
       – Relative to size
       – Range: [0.7,0.1]
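The matching step in slide 9 can be sketched as follows. The Jaccard index is standard (intersection size over union size). The slide only says that the threshold is relative to community size and lies in the range [0.7, 0.1], so the linear rule in `adaptive_threshold`, its direction (stricter for small communities) and its `small` and `large` parameters are purely hypothetical placeholders, as are all names in the example.

```python
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size: int, small: int = 3, large: int = 100) -> float:
    """HYPOTHETICAL rule: 0.7 at `small` members, falling linearly to 0.1 at `large`."""
    if size <= small:
        return 0.7
    if size >= large:
        return 0.1
    return 0.7 - 0.6 * (size - small) / (large - small)


def match_communities(
    current: Dict[str, Set[str]], past: Dict[str, Set[str]]
) -> List[Tuple[str, str, float]]:
    """Link each current community to its most similar past community, if similar enough."""
    matches = []
    for cur_id, members in current.items():
        best_id: Optional[str] = None
        best_sim = 0.0
        for past_id, past_members in past.items():
            sim = jaccard(members, past_members)
            if sim > best_sim:
                best_id, best_sim = past_id, sim
        if best_id is not None and best_sim >= adaptive_threshold(len(members)):
            matches.append((cur_id, best_id, best_sim))
    return matches


# Hypothetical communities from two consecutive timeslots.
past = {"C1": {"a", "b", "c", "d"}, "C2": {"x", "y", "z"}}
current = {"D1": {"a", "b", "c", "d", "e"}, "D2": {"x", "q", "r"}}
matches = match_communities(current, past)
```

The nested loop compares every current community with every past one, which is the kind of cost that the speedup of the similarity search listed under future work would address.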
  10. Single timeslot graph example
     Searching through a single timeslot (i.e. approximately 24 hours) can be time consuming. Imagine browsing through months of data! Indexing is clearly a necessity.
  11. Evolution features, fusion & ranking
     • Inputs to Dynamic Community Ranking: Community Evolution, Centrality, Persistence, Stability
     • Outputs to the User Interface:
       – Ranked Communities (All Users)
       – Ranked Users in Communities based on Centrality
       – Content (txt) from timeslots of interest
     • Persistence: overall appearances / total number of timeslots
     • Stability: overall consecutive appearances / total number of timeslots
     • PageRank Centrality: a rough estimate of how important a node is, obtained by counting the number and quality of its links
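The persistence and stability definitions in slide 11 translate into a few lines of code. In this sketch a dynamic community is represented by the list of timeslot indices in which it appears. The stability count used here (appearances that directly follow an appearance in the previous timeslot) is one reading of "consecutive appearances" and is an assumption; the exact counting convention is not stated in the slides, so this function need not reproduce the stability values reported in the results.

```python
from typing import Iterable


def persistence(appearances: Iterable[int], total_timeslots: int) -> float:
    """Overall appearances divided by the total number of timeslots."""
    return len(set(appearances)) / total_timeslots


def stability(appearances: Iterable[int], total_timeslots: int) -> float:
    """Consecutive appearances divided by the total number of timeslots.

    Assumption: an appearance counts as consecutive when the community
    also appeared in the immediately preceding timeslot.
    """
    slots = set(appearances)
    consecutive = sum(1 for t in slots if t - 1 in slots)
    return consecutive / total_timeslots


# Timeslot appearances of the top-ranked community in the results, over 28 timeslots.
top_community = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
p = persistence(top_community, 28)
s = stability(top_community, 28)
```

For this community the persistence is 11/28, which agrees with the 0.392857 listed for rank 1 in the results.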
  12. Pros and Cons: Dynamic Community and User Ranking
     • Advantages
       – Saves user time (manually searching for news is extremely time consuming)
       – Enables browsing through the most important information
       – Provides a sense of user importance over time (users worth following for future investigations)
     • Disadvantages
       – Community Detection and Community Evolution Detection are slow processes
       – No semantic ranking (lack of content consideration) renders the framework susceptible to error
  13. Framework application example
     Application on a dataset extracted from the Twitter OSN.
     • Dataset Characteristics:
       – Period: 32 days
       – Keywords: 40 (English and Greek)
       – Unique users: 857K
       – Messages: 880K
       – Edges: 1.07M
     • Query terms (table with Greek and Global columns):
       – Hashtags: #Xryshaygh, #nazi, #GoldenDawn, #extremeright, #Kasidiaris, #farright
       – Keywords: Michaloliakos, nazi, Kasidiaris, far right, golden dawn, extreme right, xrysh aygh, Hitler, illegal immigrants, Swastica
  14. Framework application example
     • Results
       – Total number of communities: 232K
       – Final number of communities (excluding self loops & communities<3): 89K
       – Total evolution steps: 7K
       – Total evolving communities: 1.1K
       – Number of Timeslots: 28
     • Heatmap legend
       – Light shades signify small communities
       – Dark shades signify large communities
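  15. Framework application example (results)
     • Rank 1: Community Id 1,122
       – Timeslot appearance: 1,2,3,4,5,6,7,8,9,11,13
       – Size/slot: 16,15,8,5,7,28,4,8,9,8,30
       – Persistence: 0.392857; Stability: 0.310344; Centrality: 0.635401
       – Popular tags (ranked): Indiebooks, bcn, madrid, andalucía, españa
       – Topic: Spanish book on Hitler: El Legado
     • Rank 2: Community Id 13,2044
       – Timeslot appearance: 13,15,16,17,18,19,20,22,23,25
       – Size/slot: 3,4,9,4,6,6,5,4,7,5
       – Persistence: 0.357142; Stability: 0.241379; Centrality: 0.801170
       – Popular tags (ranked): keepmovingforward
       – Topic: Pakistani person named Nazi
     • Rank 3: Community Id 10,404
       – Timeslot appearance: 10,11,12,15,16,17,18,19
       – Size/slot: 6,5,4,4,9,5,3,3
       – Persistence: 0.285714; Stability: 0.241379; Centrality: 0.817923
       – Popular tags (ranked): Israel, ashkenazi, ptsd, 2rrf
       – Topic: Israeli anti-nazi posts
     • Rank 4: Community Id 18,89
       – Timeslot appearance: 18,19,20,21,22,23,25
       – Size/slot: 36,137,323,281,64,146,139
       – Persistence: 0.25; Stability: 0.206896; Centrality: 0.820052
       – Popular tags (ranked): Jamaat, nazi, shahbag, taliban, sayeedi
       – Topic: Associating Jamaat (Bangladesh) to nazi
     • Rank 5: Community Id 22,2
       – Timeslot appearance: 22,23,24,25,26,27
       – Size/slot: 977,1129,942,946,1251,2054
       – Persistence: 0.214285; Stability: 0.206896; Centrality: 0.797400
       – Popular tags (ranked): 1,01,31,4,2
       – Topic: Videogame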
  16. Framework application example (Greek interest)
     • Group of interconnected foreign and Greek communities surrounded by an abundance of groups and single users.
     • A Greek community commenting on a poll that presented the GGD party as the most popular amongst unemployed citizens.
  17. Future Work
     • Enhance community similarity search (speedup)
     • Framework enrichment by incorporating retweets as a feature
     • Introduce to journalists for constructive criticism
       – Could they be used as a Ground Truth Set? Provide a baseline
       – Query Correction & Improvement via Relevance Feedback?
     [Figure: extended framework. Twitter data (retweets in time); Mention, Retweet & Timestamp Information Extraction; Community Detection; Community Evolution Detection; features: Community Size, Total # of Mentions, Degree of mentions, Persistence, Stability, Centrality; Fusion; most influential users and communities + popular hashtags]
  18. Conclusions
     • A framework for extracting information from evolving communities in dynamic social networks.
     • Significant information can be retrieved by studying the evolution of communities of OSNs (e.g. Twitter).
     • Existence of a large number of dynamic communities with various evolutionary characteristics.
  19. Thank you! Questions?
     Data and code are available at: https://github.com/socialsensor/community-evolution-analysis/
