QIN GAO, QU QU, XUHUI ZHANG
  INSTITUTE OF HUMAN FACTORS & ERGONOMICS
        DEPT. OF INDUSTRIAL ENGINEERING
       TSINGHUA UNIVERSITY, Beijing, China




MINING SOCIAL RELATIONSHIPS IN
     MICRO-BLOGGING SYSTEMS



                             HCI International 2011
                             9-14 July, Orlando, USA
CONTENT

• Motivation
• A graph-based approach to social relationship
  mining in micro-blogging systems
• Preliminary validation
• Future work




           Mining Social Relationships in Micro-blogging systems   2
WHY MINING SOCIAL RELATIONSHIPS IN
    MICRO-BLOGGING SYSTEMS?

Potential
• High popularity of micro-blogging systems
• Explicit indication of information dissemination
  directions by “following” relationships
• Networks in micro-blogging systems overlap heavily with
  social networks in real life (Java et al., 2007)

Challenge
• Most available methods emphasize structural analysis
  of the network
• Many do not take information flow directions into account
• Existing methods often have limitations in analyzing
  huge volumes of data


RELATED WORK

• Analysis of online social networks
  • Most influential method: SNA
    •   Useful measures: centrality, betweenness
    •   Used for structural analysis of blog and email networks
    •   Useful for structural analysis of the network
    •   Difficult to evaluate information dissemination between users
    •   Time consuming
  • Other methods: Matsumura, 2003; Kazienko & Musial, 2008
• Graph theory
  • Useful for modeling complex networks
    • E.g., Protein structure by Sadumrala, 1998
  • Many methods for mining frequent subgraph patterns
  • Use of graph theory in social network analysis (e.g., Cai, 2005)


A GENERAL
INFORMATION DIFFUSION MODEL




1. USER GROUPING BY INFORMATION
   DISSEMINATION RELATIONSHIPS
• Definition: A user group is a set of nodes within which any two
  nodes can transfer information bi-directionally, and any user in
  a group cannot transfer information bi-directionally with any
  other user outside of the group
• Developed the definition based on maximum strongly
  connected components
  • Given a directed graph G = (V, E), where V(G) is a finite set of nodes and
    E(G) is a finite set of edges (each edge has its endpoints in V(G))
  • For ∀a∈V, ∀b∈V, if there is at least one path from a to b, and at
    least one path from b to a, then G is a bi-directionally strongly
    connected component
  • G is a maximum bi-directionally strongly connected component
    (MBSCC) if G would not be a bi-directionally strongly connected
    component when any node or edge were added to G
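
The grouping step can be illustrated with a short Python sketch. The slides do not say which algorithm was used to find the components; the code below assumes Kosaraju's two-pass search, and the function name and the `(source, target)` edge format are illustrative choices, not taken from the slides. Node labels are assumed to be sortable (for example, user names).

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Group nodes so that any two nodes in a group can reach each other.

    `edges` is an iterable of (source, target) pairs meaning information
    can flow from source to target. Returns a list of sets, one per group.
    """
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for a, b in edges:
        graph[a].append(b)
        reverse[b].append(a)
        nodes.update((a, b))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order;
    # each sweep collects exactly one strongly connected component.
    groups, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            group.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        groups.append(group)
    return groups
```

For a hypothetical edge list in which a and b follow each other, c and d follow each other, and information only flows one way from b to c and from d to e, the function returns three groups: {a, b}, {c, d} and {e}.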

2. GROUP RANKING BY CONTRIBUTIONS IN
     INFORMATION DISSEMINATION

• Each group (MBSCC) is denoted as a node
• The network of a micro-blogging system is then
  condensed into a directed acyclic graph G’
  • Each node of G’ is an MBSCC
• Topological sorting algorithm
  • A node without any information outflow is deleted from
    G’ and put at the end of the ranking list.
  • This step is repeated until all nodes are deleted.




2. GROUP RANKING BY CONTRIBUTIONS IN
     INFORMATION DISSEMINATION

• Sorting algorithm
      P<Set<Node>>  Empty list that will contain sets of nodes in sequence
      N             Set of nodes with no outgoing information link

      Insert all nodes which have no outgoing link into N
      while N is non-empty do
        insert N at the front of P
        N' ← empty set
        for each node n in N do
          for each node m with a link e from m to n do
            remove e
            if m has no other outgoing link then insert m into N'
          remove n
        N ← N'


• In the final ranking list P, groups are listed in a
  descending order with regard to their contribution
  to information dissemination in the network
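
The ranking step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rank_groups` and the input format (a list of group identifiers plus `(source, target)` information-flow edges of the condensed graph G’) are hypothetical, and the graph is assumed to be acyclic.

```python
def rank_groups(nodes, edges):
    """Peel a DAG layer by layer, starting from nodes with no outflow.

    `nodes` is an iterable of group ids; `edges` is an iterable of
    (source, target) pairs of the condensed graph. Returns a list of
    sets, highest-contribution layer first.
    """
    successors = {n: set() for n in nodes}
    predecessors = {n: set() for n in nodes}
    for a, b in edges:
        if a != b:
            successors[a].add(b)
            predecessors[b].add(a)

    layers = []
    # Start with every group that passes information to no other group.
    current = {n for n in successors if not successors[n]}
    while current:
        layers.append(current)
        following = set()
        for n in current:
            for m in predecessors[n]:
                successors[m].discard(n)   # remove the link m -> n
                if not successors[m]:      # m has no outflow left
                    following.add(m)
        current = following

    # Groups without outflow were found first, so they belong at the end.
    layers.reverse()
    return layers
```

With hypothetical groups A, B, C, D and flows A→B, A→C, B→D, C→D, the result is [{A}, {B, C}, {D}]: D only receives information and is ranked last, while A feeds every other group and is ranked first.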

3. USER INFLUENCE EVALUATION BY THE
 PROBABILITY OF INFORMATION DISSEMINATION

• Term definition
  • Path distance: the number of nodes from the source node a to the target
    node b along a path
  • Distance between nodes: smallest path distance between the source node
    a and the target b
  • Width between nodes: the number of different paths connecting the
    source node a and the target node b
• Assuming the probability that any user retweets a certain received
  information is P, the probability that the target user can receive this
  information from the source user is:
                                $p = \sum_{i \in N} P^{d_i}$
  • N: the set of different paths from the source to the target
  • d_i: the distance of path i
• The shorter the distance and the larger the width, the more
  likely the information is to be transmitted.
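
To make the formula concrete, the sketch below sums P^d over every simple path from a source to a target by brute-force enumeration. It is an illustration only: the function name, the adjacency-dictionary format, and the `max_distance` cutoff (used here to keep the enumeration finite) are assumptions, and distance is counted in hops.

```python
def reach_probability(graph, source, target, p, max_distance):
    """Sum p**d over every simple path of d hops from source to target.

    `graph` maps each node to the nodes it can pass information to.
    Paths longer than `max_distance` hops are ignored.
    """
    total = 0.0
    # Each stack entry: (node, hops so far, nodes already on this path).
    stack = [(source, 0, {source})]
    while stack:
        node, hops, visited = stack.pop()
        if node == target and hops > 0:
            total += p ** hops        # one more path of length `hops`
            continue
        if hops == max_distance:
            continue                  # do not extend past the cutoff
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                stack.append((nxt, hops + 1, visited | {nxt}))
    return total


# Hypothetical toy network: one direct path and two two-hop paths.
toy = {"s": ["a", "b", "t"], "a": ["t"], "b": ["t"]}
print(reach_probability(toy, "s", "t", p=0.1, max_distance=5))
```

In the toy network the distance from s to t is 1 and the width is 3, so the sum is 0.1 + 0.01 + 0.01. Exhaustive enumeration grows quickly with network size, so this is only practical for small examples.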

3. USER INFLUENCE EVALUATION BY THE
 PROBABILITY OF INFORMATION DISSEMINATION

• The shortest path from the information source to the
  target makes the greatest contribution.
• According to observation, it is reasonable to
  assume P < .5
• To simplify the problem, we can set a threshold T
  • If d_i > T, p_i (the probability that information transmits via path
    i) is treated as 0




3. USER INFLUENCE EVALUATION BY THE
 PROBABILITY OF INFORMATION DISSEMINATION

• QIndex algorithm (inspired by Dijkstra)
  • For a graph G = (V, E), the information source node is labeled v_s
    (v_s ∈ V); the current node is denoted n_c; the distance and width
    values of a node are denoted d and w.
  1. Initialize: d_s = 0, w_s = 1; d = infinity and w = 0 for all other
     nodes; mark all nodes unvisited; set v_s as the current node n_c
  2. For each unvisited node n’ linked to n_c, the distance between n’
     and the source node via n_c is d_c + 1
       • If d_c + 1 < d’ and d_c + 1 < T, then d’ = d_c + 1 and w’ = w_c
       • If d_c + 1 ≥ d’ and d_c + 1 < T, then w’ = w_c + 1
  3. Mark the current node n_c as visited once all unvisited nodes
     directly linked to it have been processed
  4. Set the unvisited node with the smallest distance value as n_c,
     and repeat from step 2

3. USER INFLUENCE EVALUATION BY THE
 PROBABILITY OF INFORMATION DISSEMINATION

• QIndex algorithm
  • When no unvisited node remains at a distance less than T, the QIndex of
    each visited node is calculated as
                                QIndex = d/w
• The smaller the QIndex, the more likely the target
  node is to receive information from the source node
• Importance of the setting of T
  • In the worst case, the running time of the QIndex algorithm is
    O(|V|^2 + |E|); if T approaches 0, the time cost is close to O(|V| + |E|)
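
The four steps of the QIndex algorithm can be transcribed into Python roughly as follows. This is a sketch that follows the update rules as stated on the slides, including the width update w’ = w_c + 1; the function name, the adjacency-dictionary format, and the tie-breaking by node name are assumptions made for illustration.

```python
import math


def qindex(graph, source, threshold):
    """Follow the slide's four steps; returns {node: (d, w, d / w)}.

    `graph` maps each node to the nodes it can pass information to.
    Nodes that are not reached within the threshold are left out.
    """
    nodes = {source} | set(graph) | {m for ts in graph.values() for m in ts}
    # Step 1: initialise distance and width values.
    d = {n: math.inf for n in nodes}
    w = {n: 0 for n in nodes}
    d[source], w[source] = 0, 1
    unvisited = set(nodes)

    while unvisited:
        # Step 4: pick the unvisited node with the smallest distance value
        # (ties broken by name, an arbitrary choice).
        current = min(unvisited, key=lambda n: (d[n], str(n)))
        if d[current] == math.inf:
            break  # nothing left within reach of the threshold
        # Step 2: update every unvisited node linked from the current node.
        for nxt in graph.get(current, ()):
            if nxt not in unvisited:
                continue
            candidate = d[current] + 1
            if candidate >= threshold:
                continue
            if candidate < d[nxt]:
                d[nxt], w[nxt] = candidate, w[current]
            else:
                w[nxt] = w[current] + 1  # width update as written on the slide
        # Step 3: the current node is now visited.
        unvisited.remove(current)

    return {n: (d[n], w[n], d[n] / w[n])
            for n in nodes if n != source and w[n] > 0}


# Hypothetical toy network: two routes from s to c, then one link to e.
toy = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": ["e"]}
for user, values in sorted(qindex(toy, "s", threshold=5).items()):
    print(user, values)
```

Selecting the next node with a linear scan over the unvisited set is what gives the O(|V|^2 + |E|) worst case; a priority queue would be a natural variation, but it is not something the slides describe.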




VALIDATION

• Source: digu.com
  • A Chinese micro-blogging
    system since 2009
  • More than 2 million users
• Data collection
  • Snowball sampling
  • 20 users randomly chosen as
    “seeds”
  • Lasted 2 weeks
  • 332,122 users and 11,160,822
    following relationships
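
Snowball sampling from a set of seed users can be sketched as a breadth-first crawl. The slides give no details of the crawler, so everything here is an assumption for illustration: `get_followees` is a hypothetical callable standing in for whatever data access was used, and the `max_users` cap replaces the two-week time limit as a stopping rule.

```python
from collections import deque


def snowball_sample(seeds, get_followees, max_users):
    """Breadth-first crawl outward from the seed users.

    `get_followees(user)` is a hypothetical callable returning the accounts
    that `user` follows. Returns (users, follows), where `follows` is a
    list of (follower, followee) pairs among the collected users.
    """
    users = set(seeds)
    queue = deque(seeds)
    follows = []
    while queue:
        user = queue.popleft()
        for followee in get_followees(user):
            if followee not in users:
                if len(users) >= max_users:
                    continue  # sample is full; skip accounts not yet seen
                users.add(followee)
                queue.append(followee)
            follows.append((user, followee))
    return users, follows


# Hypothetical in-memory network used in place of a real service.
network = {"u1": ["u2", "u3"], "u2": ["u3"], "u3": ["u1", "u4"], "u4": []}
users, follows = snowball_sample(["u1"], lambda u: network.get(u, []), max_users=3)
print(sorted(users), follows)
```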


VALIDATION

• Data collection example:
            Item                        Value
            ID                          11528569
            User name                   ququjoy
            Nick name                   Qu
            Location                    Beijing
            Gender                      1 (1 = male, 2 = female, 3 = private)
            Self-introduction           From Chongqing
            Address                     http://pic.minicloud.com.cn/file/default/SIGN_24x24.png
            Homepage                    http://digu.com/ququjoy
            Information privacy         false (false = information disclosure, true = information protection)
            Number of followees         2
            Number of followers         2
            Number of updates           7
            Followee                    digu, robot
            Follower                    xabcdefg, flyinglin456




VALIDATION

• A sub-sample of 2,556
  users with 35,510 following
  relationships was used in
  validation
• Using MBSCC to find
  groups, the largest group
  contains 1,426 users
• The network pattern of the
  largest group is highly
  similar to the pattern of
  the whole network

VALIDATION

• Users most influenced by a chosen user
  yoohee1221_ (T = 5)
           Users                  Distance       Width         QIndex
           classyuan              1              1             1
           gambol                 1              1             1
           liuxinwu               2              2             1
           xujun99663             3              2             1.5
           dan123                 4              2             2
           chervun                4              2             2
           tuniu                  4              2             2
           harliger               4              2             2
           zxb888                 4              2             2
           topidea                4              2             2
           yuanjuan               4              2             2
           WDM123                 4              2             2
           shaun                  4              2             2


• Note that the influence on liuxinwu is as strong as
  that on the users directly connected to yoohee1221_

CONCLUSION

• Pros of the proposed method
  • Incorporates direction information into network analysis
  • Evaluates groups/users by their contribution to information
    dissemination
  • Capable of handling large amounts of data, and
    time-efficient
• Limitation of the proposed method
  • Useful for studying characteristics of the whole network, but
    not good for splitting the whole network into sub-networks
  • Vulnerable to spam following relationships in grouping
• Future work: revise the grouping algorithm

THANKS, AND QUESTIONS?





A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 

Recently uploaded (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

[HCII2011] Mining Social Relationships in Micro-blogging systems

  • 1. QIN GAO, QU QU, XUHUI ZHANG INSTITUTE OF HUMAN FACTORS & ERGONOMICS DEPT. OF INDUSTRIAL ENGINEERING TSINGHUA UNIVERSITY, Beijing, China MINING SOCIAL RELATIONSHIPS IN MICRO-BLOGGING SYSTEMS HCI International 2011 9-14 July, Orlando, USA
  • 2. CONTENT
      • Motivation
      • A graph-based approach to social relationship mining in micro-blogging systems
      • Preliminary validation
      • Future work
  • 3. WHY MINING SOCIAL RELATIONSHIPS IN MICRO-BLOGGING SYSTEMS?
      • Potential
        • High popularity of micro-blogging systems
        • Explicit indication of information dissemination directions by “following” relationships
        • Networks in micro-blogging systems overlap heavily with social networks in real life (Java, et al., 2007)
      • Challenge
        • Most available methods emphasize structural analysis of the network
        • Many do not take information flow directions into account
        • Existing methods are often limited in their ability to analyze very large data sets
  • 4. RELATED WORK
      • Analysis of online social networks
        • Most influential method: social network analysis (SNA)
          • Useful measures: centrality, betweenness
          • Used for structural analysis of blog and email networks
          • Useful for structural analysis of the network
          • Difficult to evaluate information dissemination between users
          • Time consuming
        • Other methods: Matsumura, 2003; Kazienko & Musial, 2008
      • Graph theory
        • Useful for modeling complex networks
          • E.g., Protein structure by Sadumrala, 1998
        • Many methods for mining frequent subgraph patterns
        • Use of graph theory in social network analysis (e.g., Cai, 2005)
  • 5. A GENERAL INFORMATION DIFFUSION MODEL
  • 6. 1. USER GROUPING BY INFORMATION DISSEMINATION RELATIONSHIPS
      • Definition: a user group is a set of nodes within which any two nodes can transfer information bi-directionally, and no user in the group can transfer information bi-directionally with any user outside the group
      • The definition is based on maximum strongly connected components
        • Given a graph G = (V, E), where V(G) is a finite set of nodes and E(G) is a finite set of edges (each edge has its endpoints in V(G))
        • If for ∀a∈V, ∀b∈V there is at least one path from a to b and at least one path from b to a, then G is a bi-directionally strongly connected component
        • G is a maximum bi-directionally strongly connected component (MBSCC) if G would no longer be a bi-directionally strongly connected component when any node or edge were added to G
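The grouping definition above corresponds to the strongly connected components of a directed graph. The following is a minimal sketch, not the authors' implementation: it uses Kosaraju's two-pass algorithm, which the slides do not name, and it assumes each edge is a `(sender, receiver)` pair pointing in the direction of information flow. The function name `user_groups` is hypothetical.

```python
from collections import defaultdict


def user_groups(edges):
    """Return user groups (strongly connected components) as frozensets.

    Assumption: each edge is a (sender, receiver) pair, i.e. it points in
    the direction in which information flows.
    """
    forward, backward, nodes = defaultdict(list), defaultdict(list), set()
    for a, b in edges:
        forward[a].append(b)
        backward[b].append(a)
        nodes.update((a, b))

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All successors explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    # Every sweep collects exactly one strongly connected component.
    groups, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        group, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    group.add(prev)
                    stack.append(prev)
        groups.append(frozenset(group))
    return groups
```

Both passes are iterative, so the sketch does not depend on Python's recursion limit when the follow graph is deep.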
  • 7. 2. GROUP RANKING BY CONTRIBUTIONS IN INFORMATION DISSEMINATION
      • Each group (MBSCC) is denoted as a node
      • The network of a micro-blogging system is then condensed into a directed acyclic graph G’
        • Each node of G’ is an MBSCC
      • Topological sorting algorithm
        • The node without any information outflow is deleted from G’ and put at the end of the ranking list
        • This step is repeated until all nodes are deleted
  • 8. 2. GROUP RANKING BY CONTRIBUTIONS IN INFORMATION DISSEMINATION
      • Sorting algorithm
          P<Set<Node>>: empty list that will contain sets of nodes in sequence
          N: set of nodes with no outside link
          insert all nodes which have no outside link into N
          while N is non-empty do
              insert N into P
              for each node n in N
                  remove n
                  for each node m with a link e from n to m do
                      remove e
      • In the final ranking list P, groups are listed in descending order of their contribution to information dissemination in the network
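The condensation and ranking steps can be sketched as follows. This is one possible reading of the slides, not the authors' code: edges are assumed to point in the direction of information flow, the set of groups without outflow is recomputed in every round (the pseudocode does not show how N is refreshed), and groups removed in the same round share a rank. The names `condense` and `rank_groups` are hypothetical.

```python
def condense(edges, groups):
    """Collapse each group into one node and return the group-level DAG.

    Assumption: edges are (sender, receiver) pairs; groups is a list of
    sets of users. The result maps a group index to the set of group
    indexes that receive information from it.
    """
    group_of = {user: i for i, group in enumerate(groups) for user in group}
    outflow = {i: set() for i in range(len(groups))}
    for sender, receiver in edges:
        a, b = group_of[sender], group_of[receiver]
        if a != b:  # links inside a group disappear in the condensed graph
            outflow[a].add(b)
    return outflow


def rank_groups(outflow):
    """Rank groups by repeatedly removing those with no information outflow.

    Returns a list of sets, highest contribution first. Groups removed in
    the same round are tied and share one set.
    """
    remaining = {g: set(targets) for g, targets in outflow.items()}
    for targets in outflow.values():
        for g in targets:
            remaining.setdefault(g, set())

    layers = []
    while remaining:
        sinks = {g for g, targets in remaining.items() if not targets}
        if not sinks:
            raise ValueError("graph has a cycle; condense it into groups first")
        layers.append(sinks)
        for g in sinks:
            del remaining[g]
        for targets in remaining.values():
            targets -= sinks  # drop links into the removed groups

    # Groups removed first sit at the end of the ranking list.
    layers.reverse()
    return layers
```

For example, with groups A, B and C where A sends to B and C, and B sends to C, `rank_groups({"A": {"B", "C"}, "B": {"C"}, "C": set()})` places A first and C last.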
  • 9. 3. USER INFLUENCE EVALUATION BY THE PROBABILITY OF INFORMATION DISSEMINATION
      • Term definition
        • Path distance: the number of nodes from the source node a to the target node b along a path
        • Distance between nodes: the smallest path distance between the source node a and the target node b
        • Width between nodes: the number of different paths connecting the source node a and the target node b
      • Assuming the probability that any user retweets a given piece of received information is P, the probability that the target user receives this information from the source user is $p = \sum_{i \in N} P^{d_i}$
        • N: the set of different paths from the source to the target
        • d_i: the distance of path i
      • The shorter the distance and the greater the width between two nodes, the more likely the information is to be transmitted
  • 10. 3. USER INFLUENCE EVALUATION BY THE PROBABILITY OF INFORMATION DISSEMINATION
      • The shortest path from the information source to the target makes the greatest contribution
      • According to observation, it is reasonable to assume P < .5
      • To simplify the problem, we can set a threshold T
        • If d_i > T, p_i (the probability that information is transmitted via path i) is treated as 0
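The sum over paths with a distance threshold can be illustrated with a brute-force sketch. It rests on several assumptions that the slides leave open: path distance is counted in hops (so a direct follower is at distance 1, as in the validation table), only simple paths are counted, and the graph is a dict mapping a user to the users who receive that user's posts. The function name `reach_probability` is hypothetical, and enumerating paths this way is only practical on small graphs.

```python
def reach_probability(graph, source, target, retweet_p, threshold):
    """Sum retweet_p ** d_i over all simple paths with d_i <= threshold.

    Assumptions: graph maps each user to the users who receive that
    user's posts, and d_i is the number of hops on path i.
    """
    total = 0.0

    def walk(node, hops, on_path):
        nonlocal total
        if node == target:
            total += retweet_p ** hops
            return
        if hops == threshold:
            return  # longer paths are treated as contributing nothing
        for nxt in graph.get(node, ()):
            if nxt not in on_path:  # keep paths simple (no repeated users)
                walk(nxt, hops + 1, on_path | {nxt})

    walk(source, 0, {source})
    return total
```

With P = 0.5 and two parallel two-hop paths from source to target, the sum is 0.25 + 0.25 = 0.5, the same value a single direct link would give. This mirrors the point that width can compensate for distance.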
  • 11. 3. USER INFLUENCE EVALUATION BY THE PROBABILITY OF INFORMATION DISSEMINATION
      • QIndex algorithm (inspired by Dijkstra)
        • For a graph G = (V, E), the information source node is labeled v_s (v_s ∈ V), the current node is denoted n_c, and the distance and width values of a node are denoted d and w
          1. Initialization: d_s = 0, w_s = 1; d = infinity and w = 0 for all other nodes; mark all nodes unvisited; set v_s as the current node n_c
          2. For each unvisited node n’ linked to n_c, the distance between n’ and the source node via n_c is d_c + 1
             • If d_c + 1 < d’ and d_c + 1 < T, then d’ = d_c + 1 and w’ = w_c
             • If d_c + 1 ≥ d’ and d_c + 1 < T, then w’ = w_c + 1
          3. Mark the current node n_c as visited once all unvisited nodes directly linked to it have been processed
          4. Set the unvisited node with the smallest distance value as n_c, and repeat from step 2
  • 12. 3. USER INFLUENCE EVALUATION BY THE PROBABILITY OF INFORMATION DISSEMINATION
      • QIndex algorithm
        • When no unvisited node remains at a distance less than T, the QIndex of every visited node is calculated as QIndex = d/w
        • The smaller the QIndex, the more likely the target node is to receive information from the source node
      • Importance of the setting of T
        • In the worst case, the running time of the QIndex algorithm is O(|V|^2 + |E|); if T approaches 0, the time cost of QIndex is close to O(|V| + |E|)
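The QIndex steps above can be sketched as follows. This is a literal reading of the update rules as written on the slides (including w’ = w_c + 1 in the second rule), not a verified reproduction of the authors' implementation. The graph is assumed to map each user to the users who receive that user's posts, the next current node is found by a linear scan, ties are broken by name, and the function name `qindex` is hypothetical.

```python
import math


def qindex(graph, source, threshold):
    """Return {user: (distance, width, qindex)} for users reached from source.

    Assumption: graph maps each user to the users who receive that user's
    posts. Only users at a distance below the threshold T are reported.
    """
    nodes = set(graph) | {m for targets in graph.values() for m in targets}
    nodes.add(source)

    # Step 1: initialise distance and width, mark every node unvisited.
    dist = {n: math.inf for n in nodes}
    width = {n: 0 for n in nodes}
    dist[source], width[source] = 0, 1
    unvisited = set(nodes)

    while True:
        # Step 4: pick the unvisited node with the smallest distance below T.
        candidates = [n for n in unvisited if dist[n] < threshold]
        if not candidates:
            break
        current = min(candidates, key=lambda n: (dist[n], str(n)))

        # Step 2: update every unvisited node linked to the current node.
        for neighbor in graph.get(current, ()):
            if neighbor not in unvisited:
                continue
            via = dist[current] + 1
            if via >= threshold:
                continue
            if via < dist[neighbor]:
                dist[neighbor], width[neighbor] = via, width[current]
            else:
                width[neighbor] = width[current] + 1

        # Step 3: the current node is done.
        unvisited.remove(current)

    return {
        n: (dist[n], width[n], dist[n] / width[n])
        for n in nodes
        if n not in unvisited and n != source
    }
```

On a small graph where the source reaches c through both a and b, c ends up with distance 2 and width 2, so its QIndex equals that of a direct neighbor. Lowering T shrinks the set of visited nodes, which is where the running-time saving comes from.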
  • 13. VALIDATION
      • Source: digu.com
        • A Chinese micro-blogging system, in operation since 2009
        • More than 2 million users
      • Data collection
        • Snowball sampling
        • 20 users randomly chosen as “seeds”
        • Lasted 2 weeks
        • 332,122 users and 11,160,822 following relationships
  • 14. VALIDATION
      • Data collection example:
          Item: value
          ID: 11528569
          User name: ququjoy
          Nick name: Qu
          Location: Beijing
          Gender: 1 (1 = male, 2 = female, 3 = private)
          Self-introduction: From Chongqing
          Address: http://pic.minicloud.com.cn/file/default/SIGN_24x24.png
          Homepage: http://digu.com/ququjoy
          Information privacy: false (false = information disclosure, true = information protection)
          Number of followees: 2
          Number of followers: 2
          Number of updates: 7
          Followee: digu, robot
          Follower: xabcdefg, flyinglin456
  • 15. VALIDATION
      • A sub-sample of 2,556 users with 35,510 following relationships was used in validation
      • Using MBSCC to find groups, the largest group contains 1,426 users
      • The network pattern of the largest group is highly similar to the pattern of the whole network
  • 16. VALIDATION
      • Users most influenced by a chosen user, yoohee1221_ (T = 5)
          User         Distance   Width   QIndex
          classyuan    1          1       1
          gambol       1          1       1
          liuxinwu     2          2       1
          xujun99663   3          2       1.5
          dan123       4          2       2
          chervun      4          2       2
          tuniu        4          2       2
          harliger     4          2       2
          zxb888       4          2       2
          topidea      4          2       2
          yuanjuan     4          2       2
          WDM123       4          2       2
          shaun        4          2       2
      • Note that the influence on liuxinwu is as strong as the influence on the users directly connected to yoohee1221_
  • 17. CONCLUSION
      • Pros of the proposed method
        • Incorporates direction information into network analysis
        • Evaluates groups and users by their contribution to information dissemination
        • Capable of handling large amounts of data in a time-efficient manner
      • Limitations of the proposed method
        • Useful for studying characteristics of the whole network, but not good at splitting the whole network into sub-networks
        • Grouping is vulnerable to spam following relationships
      • Future work: revise the grouping algorithm
  • 18. THANKS, AND QUESTIONS?