Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena




Paper presentation in PCI 2013.




    Presentation Transcript

    • PCI13, Thessaloniki, 19 Sep 2013
      Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena
      Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris
    • #2 Problem
      Online Social Networks (OSNs) are immense!
    • #3 Motivation
      • Social networks
        – Used to be small (Grevy's zebra dataset)
        – Were easy to organize
      • Online Social Networks (Twitter)
        – Hold an immense amount of data
        – Are incredibly difficult to organize and to extract useful information from
      • Ways to monitor activity in OSNs:
        – Keywords (produce too much information and fail when lexical variations are used)
        – Newshounds and persons of interest (may result in loss of information)
      • Proposal to leverage:
        – Time
        – Communities formed by users interested in a specific topic
        – The behavior of these communities over time
      • Provide the user with information regarding:
        – Temporal user activity per topic
        – Influential, stable and persistent communities
        – Users worth following (possible new newshounds)
        – Content worth monitoring
    • #4 Framework overview
      • Input: Twitter data (mentions and hashtags in time)
      • Pre-processing (information extraction)
      • Evolution detection process:
        – Interaction data discretization
        – Temporal adjacency matrix creation
        – Community detection (Louvain)
        – Community evolution detection
      • Ranking process:
        – Persistence
        – Stability
        – Centrality* (PageRank)
        – Community Size Evolution Heatmap
        – Feature fusion
      • Output: most influential users and communities + popular hashtags
      * Ongoing work
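The pre-processing step extracts mentions and hashtags from each message. A minimal Python sketch of that idea follows; the regular expressions, the lower-casing, and the function name `extract_interactions` are assumptions made for illustration and are not taken from the authors' code.

```python
import re

# Assumed patterns for Twitter-style handles (@name) and hashtags (#tag).
# The lookbehind avoids matching the "@" inside e-mail addresses.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_interactions(author, text):
    """Return (mention edges, hashtags) found in one message.

    Each edge is an (author, mentioned_user) pair; names are lower-cased
    so that lexical variants of the same handle collapse together.
    """
    mentions = [m.lower() for m in MENTION_RE.findall(text)]
    hashtags = [h.lower() for h in HASHTAG_RE.findall(text)]
    edges = [(author.lower(), m) for m in mentions]
    return edges, hashtags


# Hypothetical message, not taken from the dataset.
edges, tags = extract_interactions(
    "Alice", "@bob have you seen this? #GoldenDawn cc @carol"
)
print(edges)
print(tags)
```

In Python 3, `\w` matches Unicode letters, so the same patterns would also pick up Greek-script hashtags.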
    • #5 Interaction data discretization
      • Studying community evolution requires timeslot analysis
      • Tweeting activity shows whether the users are active and whether something interesting is happening (or has happened)
      • In this framework, the timeslots are created using the local minima of the overall activity
      • Peaks and positive slopes indicate that the users are interested in some phenomenon or are involved in a conversation
      • Minima and negative slopes indicate that the users' interest is diminishing
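The idea of cutting the timeline at local minima of the overall activity can be sketched as below. This is only an illustration under stated assumptions: the activity is assumed to be already binned into equal intervals, no smoothing is applied (real activity curves would probably need it), and the function names are hypothetical.

```python
def local_minima(activity):
    """Indices i where activity[i] is lower than the previous bin
    and no higher than the next one."""
    return [
        i for i in range(1, len(activity) - 1)
        if activity[i - 1] > activity[i] <= activity[i + 1]
    ]


def split_into_timeslots(activity):
    """Cut the series at each local minimum.

    Returns (start, end) index pairs with the end index exclusive.
    """
    cuts = [0] + local_minima(activity) + [len(activity)]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]


# Hypothetical binned message counts containing two bursts of activity.
counts = [5, 9, 14, 8, 3, 6, 12, 20, 11, 4, 2, 7]
print(local_minima(counts))          # [4, 10]
print(split_into_timeslots(counts))  # [(0, 4), (4, 10), (10, 12)]
```

Each resulting slot starts at a minimum, so it covers one rise and fall of interest.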
    • #6 Interaction data discretization example
    • #7 Community detection & evolution
      [Figure: sequential adjacency matrices for Timeslot (n-2), Timeslot (n-1), Timeslot (n) and Timeslot (n+1), and the evolving communities linked across timeslots n-1, n, n+1]
      • Louvain Community Detection Method (V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008 (12pp), 2008.)
      • Timeslots: [1,…,n-1,n,n+1,…]
      • Communities: C = {C1n, C2n, ..., Ckn}
      • Time-evolving communities: Ti (T1 to T5 in the figure, each linking communities such as C1(n-1), C1n, C1(n+1))
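One way to picture the sequential adjacency matrices is as one weighted edge list per timeslot. The sketch below is an assumption-laden illustration: the `(timestamp, source, target)` input format, the `cut_points` parameter, and the sparse `Counter` representation are all hypothetical choices, not the authors' data structures.

```python
from bisect import bisect_right
from collections import Counter


def build_timeslot_graphs(mentions, cut_points):
    """Group mentions into one weighted edge Counter per timeslot.

    mentions   -- iterable of (timestamp, source, target) tuples
    cut_points -- sorted timestamps at which a new timeslot begins
    Each Counter maps (source, target) to the number of mentions,
    i.e. a sparse view of that timeslot's adjacency matrix.
    """
    graphs = [Counter() for _ in range(len(cut_points) + 1)]
    for timestamp, source, target in mentions:
        slot = bisect_right(cut_points, timestamp)
        graphs[slot][(source, target)] += 1
    return graphs


# Hypothetical mentions with integer timestamps.
mentions = [
    (1, "alice", "bob"),
    (2, "alice", "bob"),
    (3, "bob", "carol"),
    (12, "carol", "alice"),
    (25, "alice", "bob"),
]
graphs = build_timeslot_graphs(mentions, cut_points=[10, 20])
for n, graph in enumerate(graphs):
    print(n, dict(graph))
```

Community detection would then be run separately on each of these per-timeslot graphs.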
    • #8 Louvain Community Detection
      A popular greedy modularity optimization approach. The following two steps are repeated iteratively until modularity reaches a maximum, producing a hierarchy of communities:
      a) Detection of small communities by local modularity optimization
      b) Aggregation of the nodes of each community into a single node, creating a new network whose nodes are the communities
      It was selected for its:
      • Speed
      • Accuracy when dealing with ad-hoc networks
      • Hierarchical structure, which allows communities to be examined at different resolutions
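To make the optimization target concrete, the sketch below computes the modularity of a given partition. It is not the Louvain algorithm itself, only the quantity that Louvain greedily increases, and it assumes an undirected, unweighted graph given as an edge list; the function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}
print(modularity(edges, two_groups))
print(modularity(edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of improvement step (a) searches for locally.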
    • #9 Community evolution detection
      [Figure: grid of communities C11 to C95, one row per timeslot, with the time-evolving communities T they are assigned to]
      • The communities from each row are compared to communities from past rows using the Jaccard Index
      • Community similarity according to:
        – Jaccard Index
        – Adaptive threshold
      • Adaptive threshold:
        – Relative to size
        – Range: [0.7, 0.1]
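The comparison step can be sketched as follows. The Jaccard index is standard; the threshold rule is NOT given in the slides beyond "relative to size" and the range [0.7, 0.1], so `adaptive_threshold` below is a purely hypothetical linear rule (including its direction and the `small`/`large` size limits), used only to show how such a threshold would plug in.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections of user ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size, small=3, large=1000, high=0.7, low=0.1):
    """HYPOTHETICAL rule: small communities must match at `high`,
    large ones at `low`, with linear interpolation in between."""
    if size <= small:
        return high
    if size >= large:
        return low
    fraction = (size - small) / (large - small)
    return high - fraction * (high - low)


def match_community(current, past_communities):
    """Index of the most similar past community whose Jaccard score
    reaches the threshold for `current`, or None if there is none."""
    threshold = adaptive_threshold(len(current))
    best_index, best_score = None, 0.0
    for index, past in enumerate(past_communities):
        score = jaccard(current, past)
        if score >= threshold and score > best_score:
            best_index, best_score = index, score
    return best_index


# Hypothetical communities from an earlier timeslot.
past = [{"a", "b", "c"}, {"x", "y", "z"}]
print(match_community({"a", "b", "c", "d"}, past))  # 0
print(match_community({"p", "q", "r"}, past))       # None
```

A community that matches a past one extends that time-evolving community; one with no match would start a new one.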
    • #10 Single timeslot graph example Searching through a single timeslot (i.e. approximately 24 hours) can be time consuming. Imagine browsing through months of data! Indexing is clearly a necessity.
    • #11 Evolution features, fusion & ranking
      [Diagram elements: Community Evolution; Persistence, Stability, Centrality; Dynamic Community Ranking; Ranked Communities (All Users); Ranked Users in Communities based on Centrality; Content (txt) from timeslots of interest; User Interface]
      • Persistence: overall appearances / total number of timeslots
      • Stability: overall consecutive appearances / total number of timeslots
      • PageRank centrality: a rough estimate of how important a node is, obtained by counting the number and quality of its links
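The two ratio features can be sketched directly from their definitions. Persistence is unambiguous; for stability the slides do not spell out how "consecutive appearances" are counted, so the code assumes one count per pair of adjacent timeslots, and its values need not match those reported in the talk. Function names are hypothetical.

```python
def persistence(appearances, total_timeslots):
    """Number of timeslots in which the community appears,
    divided by the total number of timeslots."""
    return len(set(appearances)) / total_timeslots


def stability(appearances, total_timeslots):
    """ASSUMPTION: a consecutive appearance is a pair of adjacent
    timeslots (t, t + 1) that both contain the community."""
    slots = set(appearances)
    consecutive = sum(1 for t in slots if t + 1 in slots)
    return consecutive / total_timeslots


# Timeslot appearances of the top-ranked community in the talk's
# results, over the 28 timeslots of the dataset.
slots_seen = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
print(round(persistence(slots_seen, 28), 6))  # 0.392857
print(round(stability(slots_seen, 28), 6))
```

The persistence value agrees with the 0.392857 reported for that community (11 appearances out of 28 timeslots).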
    • #12 Pros and Cons: Dynamic Community and User Ranking
      • Advantages
        – Saves user time (manually searching for news is extremely time-consuming)
        – Enables browsing through the most important information
        – Provides a sense of user importance over time (users worth following for future investigations)
      • Disadvantages
        – Community Detection and Community Evolution Detection are slow processes
        – No semantic ranking (lack of content consideration) renders the framework susceptible to error
    • #13 Framework application example
      Application on a dataset extracted from the Twitter OSN.
      • Dataset characteristics:
        – Period: 32 days
        – Keywords: 40 (English and Greek)
        – Unique users: 857K
        – Messages: 880K
        – Edges: 1.07M
      • Greek hashtags: #Xryshaygh, #GoldenDawn, #Kasidiaris
      • Greek keywords: Michaloliakos, Kasidiaris, golden dawn, xrysh aygh, illegal immigrants
      • Global hashtags: #nazi, #extremeright, #farright
      • Global keywords: nazi, far right, extreme right, Hitler, Swastica
    • #14 Framework application example
      • Results
        – Total number of communities: 232K
        – Final number of communities (excluding self-loops and communities with fewer than 3 members): 89K
        – Total evolution steps: 7K
        – Total evolving communities: 1.1K
        – Number of timeslots: 28
      • Figure legend: light shades signify small communities; dark shades signify large communities
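    • #15 Framework application example (results)
      • Rank 1
        – Community Id: 1,122
        – Timeslot appearance: 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13
        – Size/slot: 16, 15, 8, 5, 7, 28, 4, 8, 9, 8, 30
        – Persistence: 0.392857
        – Stability: 0.310344
        – Centrality: 0.635401
        – Popular tags (ranked): Indiebooks, bcn, madrid, andalucía, españa
        – Topic: Spanish book on Hitler: El Legado
      • Rank 2
        – Community Id: 13,2044
        – Timeslot appearance: 13, 15, 16, 17, 18, 19, 20, 22, 23, 25
        – Size/slot: 3, 4, 9, 4, 6, 6, 5, 4, 7, 5
        – Persistence: 0.357142
        – Stability: 0.241379
        – Centrality: 0.801170
        – Popular tags (ranked): keepmovingforward
        – Topic: Pakistani person named Nazi
      • Rank 3
        – Community Id: 10,404
        – Timeslot appearance: 10, 11, 12, 15, 16, 17, 18, 19
        – Size/slot: 6, 5, 4, 4, 9, 5, 3, 3
        – Persistence: 0.285714
        – Stability: 0.241379
        – Centrality: 0.817923
        – Popular tags (ranked): Israel, ashkenazi, ptsd, 2rrf
        – Topic: Israeli anti-nazi posts
      • Rank 4
        – Community Id: 18,89
        – Timeslot appearance: 18, 19, 20, 21, 22, 23, 25
        – Size/slot: 36, 137, 323, 281, 64, 146, 139
        – Persistence: 0.25
        – Stability: 0.206896
        – Centrality: 0.820052
        – Popular tags (ranked): Jamaat, nazi, shahbag, taliban, sayeedi
        – Topic: Associating Jamaat (Bangladesh) to nazi
      • Rank 5
        – Community Id: 22,2
        – Timeslot appearance: 22, 23, 24, 25, 26, 27
        – Size/slot: 977, 1129, 942, 946, 1251, 2054
        – Persistence: 0.214285
        – Stability: 0.206896
        – Centrality: 0.797400
        – Popular tags (ranked): 1,01,31,4,2
        – Topic: Videogame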
    • #16 Framework application example (Greek interest)
      • Group of interconnected foreign and Greek communities surrounded by an abundance of groups and single users.
      • A Greek community commenting on a poll that presented the GGD party as the most popular amongst unemployed citizens
    • #17 Future Work
      • Enhance community similarity search (speedup)
      • Enrich the framework by incorporating retweets as a feature
      • Introduce the framework to journalists for constructive criticism
      [Diagram elements: Twitter Data (Retweets in time); Mention, Retweet & Timestamp Information Extraction; Community Detection; Community Evolution Detection; Community Size, Total # of Mentions, Degree of mentions, Persistence, Stability, Centrality; Fusion; Most influential users and communities + Popular hashtags. Notes on the diagram: "Could they be used as a Ground Truth Set?", "Provide a base line", "Query Correction & Improvement via Relevance Feedback?"]
    • #18 Conclusions
      • A framework for extracting information from evolving communities in dynamic social networks.
      • Significant information can be retrieved by studying the evolution of communities of OSNs (e.g. Twitter).
      • A large number of dynamic communities exist, with various evolutionary characteristics.
    • #19 Thank you! Questions?
      Data and code are available at: https://github.com/socialsensor/community-evolution-analysis/