This document describes an algorithm to automatically classify social media contacts into circles or lists. It first identifies leaders in an ego network by calculating metrics like betweenness centrality, clustering coefficient, and degree density for each node. Leaders are used as seeds to form circles by including their friend connections. The algorithm uses conductance scoring to iteratively add or remove nodes at the circle boundaries. The approach is evaluated on a Facebook dataset and achieves results comparable to previous methods, with balanced error rates and F1 scores used as metrics.
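The conductance-based boundary refinement described above can be sketched as follows. The helper names and the greedy toggle loop are illustrative assumptions, not the paper's exact procedure; the graph is a toy one of two triangles joined by an edge.

```python
# Sketch: grow/shrink a circle at its boundary while conductance improves.
# conductance = cut edges / min(volume inside, volume outside); lower is better.

def conductance(adj, circle):
    circle = set(circle)
    cut = sum(1 for u in circle for v in adj[u] if v not in circle)
    vol_in = sum(len(adj[u]) for u in circle)
    vol_out = sum(len(adj[u]) for u in adj if u not in circle)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 1.0

def improve_circle(adj, circle):
    """Greedily toggle boundary/member nodes while conductance decreases."""
    circle = set(circle)
    improved = True
    while improved:
        improved = False
        boundary = {v for u in circle for v in adj[u]} - circle
        for cand in list(boundary) + list(circle):
            trial = circle ^ {cand}          # toggle membership
            if trial and conductance(adj, trial) < conductance(adj, circle):
                circle, improved = trial, True
    return circle

# two triangles {0,1,2} and {3,4,5} bridged by the edge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Starting from a partial seed such as {0, 1}, the loop absorbs node 2 (completing the first triangle) and then stops, since adding the bridge node 3 would raise conductance again.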
Finding important nodes in social networks based on modified PageRank (csandit)
Important nodes are individuals who have great influence on a social network, and finding them is of great significance for research on the structure of social networks. Based on the core idea of PageRank, a new ranking method is proposed that considers the link similarity between nodes. The key concept of the method is a link vector that records the contact times between nodes. The link similarity is then computed from these vectors through a similarity function, and the proposed method incorporates it into the original PageRank. Experimental results show that the proposed method achieves better performance.
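A minimal sketch of the idea: weight each node's PageRank vote by the cosine similarity of the contact-time vectors. The cosine function, the toy contact vectors, and the normalization over out-links are assumptions; the paper's exact similarity function may differ.

```python
# Hedged sketch: folding a link-similarity weight into PageRank.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_pagerank(links, contact, d=0.85, iters=100):
    """links: node -> list of out-neighbours.
    contact: node -> contact-time vector used for link similarity."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            inflow = 0.0
            for u in nodes:
                if v in links[u]:
                    # u's vote is split among its out-links by similarity
                    sims = [cosine(contact[u], contact[w]) for w in links[u]]
                    total = sum(sims) or 1.0
                    inflow += rank[u] * cosine(contact[u], contact[v]) / total
            new[v] = (1 - d) / n + d * inflow
        rank = new
    return rank

ranks = similarity_pagerank(
    {"a": ["b", "c"], "b": ["c"], "c": ["a"]},
    {"a": [2, 1], "b": [2, 1], "c": [1, 2]})
```

In this toy run, node "c" ends up ranked highest: "a" splits its vote in favour of the similar node "b", and "b" in turn passes everything on to "c".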
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (IJwest)
The document presents a new model for intelligent social networks based on semantic tag ranking. It uses a multi-agent system approach with agents performing indexing and ranking. For indexing, it uses an enhanced Latent Dirichlet Allocation (E-LDA) model that optimizes LDA parameters. Tags above a threshold from E-LDA output are ranked using Tag Rank. Simulation results showed improvements in indexing and ranking over conventional methods. The model introduces semantics to social networks to improve search and link recommendation.
Link Prediction in (Partially) Aligned Heterogeneous Social Networks (Sina Sajadmanesh)
This document discusses link prediction in homogeneous and heterogeneous social networks. It begins by introducing the problem of link prediction and its applications. It then discusses various unsupervised and supervised methods for link prediction in homogeneous networks. Next, it covers relationship prediction and collective link prediction in heterogeneous networks. It also discusses link prediction in aligned heterogeneous networks using link transfer and anchor link inference. Finally, it outlines future work on this topic.
This document is a project report submitted by students Sai Akhil Reddy Gopidi, Nithin Kumar, and Roopesh Kumar Kotte for their B.Tech degree. The report discusses link prediction in heterogeneous networks, with a focus on applying various algorithms to a YouTube dataset. Commonly used link prediction algorithms are described, such as measuring graph distance, common neighbors, and local random walks. The students implemented these algorithms on the YouTube dataset and evaluated their accuracy using precision metrics to compare results to previous methods.
Maximizing PageRank via Outlinks (James Arnold)
This document analyzes strategies for maximizing the PageRank of a set of webpages I that a webmaster controls the outlinks for but not the inlinks. It provides the optimal outlink structure, which is for each page in I to link to a single external page. It also analyzes the optimal internal link structure, which is a forward chain of links with every possible backward link when there is a unique leaking node. The overall optimal structure combines these, with a unique outlink from the last node in the chain.
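The gap between the two structures can be seen on a toy four-node graph, computed with a plain power-iteration PageRank. The external node (3) and its backlink to page 0 are assumptions added to make the example closed; they are not from the paper.

```python
# Toy comparison: PageRank mass kept by the controlled set I = {0, 1, 2}.
def pagerank(out, d=0.85, iters=200):
    nodes = list(out)
    n = len(nodes)
    r = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        for u in nodes:
            share = d * r[u] / len(out[u])
            for v in out[u]:
                new[v] += share
        r = new
    return r

# Structure A (per the summary): forward chain 0->1->2 with all backward
# links, and a single external outlink from the last chain node.
chain = {0: [1], 1: [0, 2], 2: [0, 1, 3], 3: [0]}
# Structure B (for contrast): every page in I links out individually,
# plus an internal cycle.
star = {0: [1, 3], 1: [2, 3], 2: [0, 3], 3: [0]}

mass = lambda r: r[0] + r[1] + r[2]
ra, rb = pagerank(chain), pagerank(star)
```

The chain structure leaks rank through only one outlink, so the set I retains strictly more PageRank mass than when every page links out on its own.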
Techniques that Facebook Uses to Analyze and Query Social Graphs (Haneen Droubi)
Facebook uses various techniques to analyze and query social graphs. These include discovering cohesive subgraphs through approaches like finding large, dense subgraphs and top-k dense subgraphs. Memcache is used as a distributed memory caching system to speed up dynamic databases. Typeahead allows searching as a user types through prefix matching. Unicorn is Facebook's system for searching the social graph through an inverted index that represents it as adjacency lists. Systems like Hive, Scuba and Dremel are used to analyze data at large scales.
In this paper, we consider a criminal investigation into the collective guilt of some members of a working group. Assuming that the statistics we use are reliable, we present a PageRank model based on mutual information. First, we use the average mutual information between non-suspicious topics and suspicious topics to score the topics by degree of suspicion. Second, we build a correlation matrix based on the degree of suspicion and derive the corresponding Markov state transition matrix. Then, we set the initial value for each member of the working group based on the degree of suspicion. Finally, we calculate the suspected degree of each member of the working group. In the small 10-person case, we build the improved PageRank model; by computing the statistics of this case, we obtain a table ranking the members by suspected degree. Compared with the results given in the problem statement, the two results largely match, indicating that the model is feasible. In the current case, we first obtain a ranking of 15 topics in order of suspicion via the mutual-information PageRank model. Second, we obtain the fixed point of the Markov state transition matrix using the Markov chain. Then, we build the connection matrix based on the degree of suspicion and derive the corresponding Markov state transition matrix. Finally, we calculate the suspected degree of 83 candidates. From the result, we can see that the suspicious people are at the top of the ranking list while the innocent are at the bottom, again indicating that the model is feasible. When the suspicious topics and conspirators change, the model still produces relatively good results. In the current case, we have evidence to believe that Dolores and Jerome, the senior managers, are under significant suspicion, and we recommend that future attention be paid to them.
The mutual-information PageRank model takes full account of the information flow in a message distribution network. The model can not only handle the statistics used in the conspiracy case but can also be applied to detect infected cells in a biological network. Finally, we present the advantages and disadvantages of the model and directions for improvement.
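The core loop can be sketched as below on invented four-person data. The weighting of contacts by the receiver's topic-based suspicion prior, and the numbers themselves, are illustrative assumptions rather than the paper's exact construction.

```python
# Toy sketch: rank members by suspicion via a PageRank-style walk over a
# suspicion-weighted contact matrix.
def suspicion_rank(contacts, suspicion, d=0.85, iters=200):
    """contacts[i][j]: message count i -> j; suspicion[i]: topic-based prior."""
    n = len(suspicion)
    # weight each contact by the receiver's prior suspicion, then row-normalize
    w = [[contacts[i][j] * suspicion[j] for j in range(n)] for i in range(n)]
    rows = [sum(row) or 1.0 for row in w]
    p = [[w[i][j] / rows[i] for j in range(n)] for i in range(n)]
    total = sum(suspicion)
    r = [s / total for s in suspicion]        # start from the priors
    for _ in range(iters):
        r = [(1 - d) * suspicion[i] / total +
             d * sum(r[k] * p[k][i] for k in range(n)) for i in range(n)]
    return r

contacts = [[0, 5, 1, 4],
            [5, 0, 2, 1],
            [1, 2, 0, 1],
            [4, 1, 1, 0]]
suspicion = [0.9, 0.1, 0.1, 0.5]
scores = suspicion_rank(contacts, suspicion)
```

Member 0, who has the highest topic suspicion and heavy contact with the other suspicious member, ends up at the top of the ranking, mirroring the Dolores/Jerome style conclusion.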
Data Mining In Social Networks Using K-Means Clustering Algorithm (nishant24894)
This presentation deals with the K-Means clustering algorithm, which is used to categorize a data set into clusters based on similarities such as common interests, organizations, or colleges. It categorizes the data into clusters on the basis of mutual friendship.
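One way to realize this is to represent each user by their row of the friendship adjacency matrix and run plain k-means on those rows. The deterministic seeding and the toy two-clique network below are assumptions for the sketch, not the presentation's setup.

```python
# Minimal k-means over adjacency rows: users with the same friends cluster together.
def kmeans(points, k, iters=20):
    cents = points[::max(1, len(points) // k)][:k]   # deterministic seeds
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, cents[c])))
            groups[j].append(p)
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else cents[j]
                 for j, g in enumerate(groups)]
    return groups

# adjacency rows for two friend cliques: users 0-2 and users 3-5
rows = [[0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0]]
groups = kmeans(rows, 2)
```

With k = 2 the two cliques are recovered exactly, since members of the same clique share near-identical adjacency rows.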
Who to follow and why: link prediction with explanations (Nicola Barbieri)
Presentation @KDD 2014.
In this paper we study link prediction with explanations for user recommendation in social networks. For this problem we propose WTFW (“Who to Follow and Why”), a stochastic topic model for link prediction over directed and node-attributed graphs. Our model not only predicts links, but for each predicted link it decides whether it is a “topical” or a “social” link, and depending on this decision it produces a different type of explanation.
User Behaviour Modeling on Financial Message Boards (prithan)
Online social communities like discussion boards and message boards are fast evolving in their usage, bringing people with similar interests together. From a social and anthropological standpoint, they are among the most interesting to study compared to online social networks, because they connect people (most often with no offline links) from different backgrounds and histories. Various theories exist in sociology about the intended behavior of users in online forums. In this paper, we study the applicability of one such theory, “Participation Inequality”, on financial message boards. We consider users' activity, their network interaction structure, and the content of their postings, and employ machine learning techniques to identify, cluster, and infer the roles of users exhibiting similar behavior.
This document describes a genetic algorithm approach for recommending whom a user should follow on Twitter. The algorithm analyzes a user's sub-graph consisting of the user, their followers, and followers-of-followers within 3 degrees of separation. Only users within 2 degrees are recommended. The algorithm calculates three indexes - common followers, connection density of common followers, and connection density of combined followers. A fitness function combines the indexes with weights optimized by a genetic algorithm over iterations to rank and recommend the top users to follow. The approach was tested on a dataset of 11,459 users connected to a single user, narrowing 59 potential recommendations to a final list of 56 users.
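The fitness computation can be sketched from the three indexes the summary names. The exact index definitions, the undirected toy graph, and the unit weights below are assumptions; in the actual approach the weights are what the genetic algorithm evolves.

```python
# Sketch of the weighted fitness used to rank follow candidates.
def connection_density(followers, members):
    """Fraction of possible ties present among a set of users."""
    members = set(members)
    if len(members) < 2:
        return 0.0
    edges = sum(len(followers[u] & members) for u in members) / 2
    possible = len(members) * (len(members) - 1) / 2
    return edges / possible

def fitness(followers, user, cand, w):
    common = followers[user] & followers[cand]
    return (w[0] * len(common)                                   # index 1
            + w[1] * connection_density(followers, common)       # index 2
            + w[2] * connection_density(followers,               # index 3
                                        followers[user] | followers[cand]))

followers = {"u": {"a", "b", "c"}, "x": {"a", "b"}, "y": {"c"},
             "a": {"u", "x", "b"}, "b": {"u", "x", "a"}, "c": {"u", "y"}}
fx = fitness(followers, "u", "x", (1, 1, 1))
fy = fitness(followers, "u", "y", (1, 1, 1))
```

Candidate "x", who shares a densely connected pair of followers with "u", scores higher than "y", who shares only one, which is the ordering the recommender would surface.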
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti... (Markus Luczak-Rösch)
Invited talk given at the QUEST (Qualitative Expertise at Southampton, http://www.quest.soton.ac.uk/) group event (http://www.quest.soton.ac.uk/training/) on Qualitative Methods and Big Data.
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKS (ijcsit)
This paper introduces an evolutionary approach to enhance the process of finding central nodes in mobile networks, which can provide essential information for important applications in mobile and social networks. The evolutionary approach accounts for the dynamics of the network and takes into consideration the central nodes from previous time slots. We also study the applicability of maximal clique algorithms in mobile social networks and how they can be used to find central nodes based on the discovered maximal cliques. The experimental results are promising and show a significant enhancement in finding the central nodes.
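The clique-based building block can be illustrated with the classic Bron-Kerbosch enumeration of maximal cliques (shown here without pivoting, on a toy graph); counting how many maximal cliques each node belongs to is one plausible clique-based centrality, though the paper's exact measure may differ.

```python
# Bron-Kerbosch maximal clique enumeration (no pivoting; fine for toy graphs).
def maximal_cliques(adj):
    out = []
    def bk(r, p, x):
        if not p and not x:
            out.append(r)          # r cannot be extended: it is maximal
            return
        for v in list(p):
            bk(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    bk(set(), set(adj), set())
    return out

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = maximal_cliques(adj)
# a simple clique-participation count per node
participation = {v: sum(v in c for c in cliques) for v in adj}
```

Node 2 sits in both maximal cliques of this toy graph, so it comes out as the most central under the participation count.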
Prediction of links in graphs based on content and information propagation.
Lecture for the M. Sc. Data Science, Sapienza University of Rome, Spring 2016.
This document describes using social network analysis techniques to analyze power in organizations using an Enron email dataset. It discusses parsing emails to extract sender and recipient addresses to build a graph model. Various centrality measures like degree, closeness, and betweenness are calculated to identify influential individuals. The data is stored in a hashmap for serialization. The analysis finds that degree centrality, measured by outbound connections, best identifies powerful individuals in the organization.
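The parse-then-rank pipeline reduces to a few lines; the toy email data below stands in for the Enron corpus, and out-degree is measured as distinct outbound contacts, one plausible reading of "outbound connections".

```python
# Sketch of the pipeline: build sender -> recipients edges from parsed
# emails and rank people by out-degree centrality (toy data, not Enron).
from collections import defaultdict

emails = [
    ("kenneth", ["jeff", "andy"]),
    ("kenneth", ["louise"]),
    ("jeff", ["andy"]),
    ("andy", ["jeff"]),
]

out_edges = defaultdict(set)
for sender, recipients in emails:
    out_edges[sender].update(recipients)

# out-degree centrality: number of distinct outbound contacts per person
ranking = sorted(out_edges, key=lambda p: len(out_edges[p]), reverse=True)
```

In the toy data the sender with the widest distribution list tops the ranking, which is exactly the property the analysis uses to flag powerful individuals.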
I. The document discusses ego networks and how they can be used to study personal networks and relationships. Ego networks combine traditional survey data with network data by collecting information about respondents (egos) and their social ties (alters).
II. Ego network data can be used to examine the effects of network structure and alter characteristics on outcomes of interest. It can also provide insights into diffusion processes within personal networks.
III. While ego network data is useful for studying local network phenomena, global network data is needed to analyze higher-level structural effects, mechanisms of tie formation, and diffusion across an entire network. Statistical techniques like randomization and the Quadratic Assignment Procedure are used to analyze ego and global network data.
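A common summary statistic in ego network studies is the density of ties among the ego's alters, which can be sketched as below; the helper name and toy data are illustrative assumptions.

```python
# Density among an ego's alters: observed alter-alter ties over possible ties.
def ego_density(alter_ties, alters):
    n = len(alters)
    if n < 2:
        return 0.0
    ties = sum(1 for a, b in alter_ties if a in alters and b in alters)
    return ties / (n * (n - 1) / 2)

alters = {"a", "b", "c"}           # the ego's three named contacts
alter_ties = [("a", "b"), ("b", "c")]   # ties the ego reports among them
density = ego_density(alter_ties, alters)
```

Two of the three possible alter-alter ties are present, so the ego's personal network has density 2/3, a value analyses then relate to outcomes of interest.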
Results for Web Graph Mining Base Recommender System for Query, Image and Soc... (iosrjce)
1) The document discusses results from implementing a web graph mining based recommender system for queries, images, and social networks using query suggestion and heat diffusion algorithms.
2) It shows that the system effectively recommends queries and images that are literally and semantically related to test queries and images.
3) It also demonstrates recommending items to users in a social network based on trust values between users, combining user-user and user-item relationships to provide personalized recommendations.
Scalable recommendation with social contextual information (eSAT Journals)
Abstract: Recommender systems are used to achieve effective and useful results in social networks. Social recommendation exploits the social network structure, but it is challenging to fuse social contextual factors, derived from users' motivations for their social behavior, into social recommendation. Here we introduce two contextual factors in recommender systems: a) individual preference and b) interpersonal influence. Individual preference matches an item's content against a user's interests and keeps only the results recommended for that user. Interpersonal influence analyzes user-user interactions and their specific social relations. Beyond this, we propose a novel probabilistic matrix factorization method to fuse the two factors in a latent space. The scalable algorithm produces useful results by analyzing the ranking probability of each user's social contextual information, and it incrementally processes contextual data in large datasets. Keywords: social recommendation, individual preference, interpersonal influence, matrix factorization
Clustering in Aggregated User Profiles across Multiple Social Networks (IJECEIAES)
A social network is an abstraction of related groups interacting amongst themselves to develop relationships. To analyze these relationships and the psychology behind them, clustering plays a vital role: it enhances the predictability and discovery of like-mindedness amongst users. This article exploits ensemble K-means clustering to extract entities and their corresponding interests, by skills and location, by aggregating user profiles across multiple online social networks. The proposed ensemble clustering uses the well-known K-means algorithm to improve results for the aggregated user profiles. The approach produces an ensemble similarity measure and gives 70% better results than taking a fixed or guessed value of K, without altering the clustering method. The paper shows that good ensemble clusters can be spawned to envisage the discoverability of a user for a particular interest.
This paper analyzes networks of scientific collaborations and dolphin social relationships. It establishes co-author and dolphin networks from data, then analyzes properties like degree distribution, information entropy, and influential nodes. Four indicators of cooperation, degree, quadratic correlation, and betweenness are used to calculate a Marshall Entropy Index (MEI) for the co-author network. A modified PageRank algorithm applied to a quotation network identifies the most influential paper. Analysis finds the dolphin network exhibits small world properties more than scale-free properties, with the dolphin SN100 as most influential.
This document proposes a methodology for performing semantic analysis using MapReduce. It involves parsing XML files to create an in-memory tree structure representing relationships between entities. Mappers extract subject and object pairs from the XML and reducers calculate a semantic score by finding the shortest path between the objects in the tree. Experimental results show scores outputted in the format of user ID 1: user ID 2: score, with higher scores representing greater similarity between the users. The analysis concludes the scores accurately reflect the deeper level of matches between users' interests and this technique could be used to analyze user likeness at large scales.
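The reducer's scoring step can be sketched as a breadth-first shortest path between two interest nodes in the category tree, with the score an inverse function of the distance. The tree, the edge list, and the exact 1/(1+d) form are assumptions for illustration.

```python
# Semantic similarity of two interests from their distance in a category tree.
from collections import deque

def shortest_path(edges, a, b):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, q = {a}, deque([(a, 0)])
    while q:
        node, d = q.popleft()
        if node == b:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, d + 1))
    return None                     # no path: unrelated interests

def semantic_score(edges, a, b):
    d = shortest_path(edges, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

taxonomy = [("sports", "football"), ("sports", "tennis"), ("music", "jazz")]
```

Interests under the same parent score higher than interests in disconnected branches, giving the "deeper level of matches" behavior the analysis describes.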
Using content and interactions for discovering communities in... (moresmile)
1. The document proposes methods for discovering communities in social networks using content and interactions by modeling communities based on discussed topics and social connections between users. This allows discovering both user interests and popular topics within each community.
2. Bayesian models are used to extract latent communities from the network, assuming community relationships depend on user interests in topics and their links. Different models are proposed to handle different network structures like broadcast vs conversation networks.
3. The models aim to utilize both content and link information to discover communities in incomplete social networks with missing link information. A distance metric is learned using observed links and used for hierarchical clustering.
Scalable Local Community Detection with Mapreduce for Large Networks (IJDKP)
Community detection in complex information networks draws much attention from both academia and industry, since it has many real-world applications. However, the scalability of community detection algorithms over very large networks has been a major challenge: real-world graph structures are often complicated as well as extremely large. In this paper, we propose a MapReduce version, called 3MA, of a local community identification method that uses the M metric, and we adopt an iterative expansion approach to find all the communities in the graph. Empirical results show that for large networks on the order of millions of nodes, the parallel version of the algorithm outperforms the traditional sequential approach to detecting communities with the M-measure. The results show that for local community detection, when the data is too big for the original sequential M-metric iterative expansion approach to handle, our MapReduce version 3MA can finish in a reasonable time.
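The sequential baseline that 3MA parallelizes can be sketched as a greedy expansion that keeps adding the boundary node that most improves the M metric, taken here as the ratio of internal to external edges (one common definition; the paper's may differ in detail).

```python
# Sequential sketch of M-metric local community expansion.
def m_metric(adj, comm):
    comm = set(comm)
    internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
    external = sum(1 for u in comm for v in adj[u] if v not in comm)
    return internal / external if external else float("inf")

def expand(adj, seed):
    comm = {seed}
    while True:
        boundary = {v for u in comm for v in adj[u]} - comm
        best = max(boundary, key=lambda v: m_metric(adj, comm | {v}),
                   default=None)
        if best is None or m_metric(adj, comm | {best}) <= m_metric(adj, comm):
            return comm             # no boundary node improves M: stop
        comm.add(best)

# two triangles {0,1,2} and {3,4,5} bridged by the edge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = expand(adj, 0)
```

Seeded at node 0, the expansion absorbs its triangle and then stops, since crossing the bridge would lower M; the MapReduce version distributes exactly this candidate evaluation across mappers.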
This document describes a project analyzing ego networks on Facebook data. The project aims to help compare ego networks and suggest friends. It uses algorithms such as triangle enumeration and fast ego network construction, which the team implemented in Python using NetworkX and other tools. They demonstrated the analysis and plan to detect communities and important individuals next. The conclusion is that ego networks are clusterable and could help with tasks such as friend suggestion.
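The two kernels the report names can be sketched in pure Python (in NetworkX they correspond to nx.ego_graph and nx.triangles); the toy friendship graph below is an assumption for illustration.

```python
# Ego network construction and triangle enumeration on an undirected graph
# given as node -> set of neighbours.
def ego_network(adj, ego):
    """The ego, its friends, and the ties among them."""
    nodes = {ego} | adj[ego]
    return {u: adj[u] & nodes for u in nodes}

def triangles(adj):
    """All triangles, each reported once as an ordered triple."""
    tris = set()
    for u in adj:
        for v in adj[u]:
            for w in adj[v]:
                if w in adj[u] and u < v < w:
                    tris.add((u, v, w))
    return tris

adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1, 5}, 5: {4}}
ego1 = ego_network(adj, 1)
```

Node 1's ego network keeps only the ties among {1, 2, 3, 4}, and the single triangle (1, 2, 3) is the kind of dense patch that makes ego networks clusterable.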
A new approach to Erdős collaboration network using PageRank (Alexander Decker)
This document proposes a new approach to measuring academic influence using the PageRank algorithm on the Erdos Collaboration Network. It constructs the network from data on mathematician Paul Erdos and his co-authors. The document applies PageRank to identify the most influential mathematician as ALON, NOGA M. It also analyzes properties of the network like degree distribution and discusses limitations and possibilities for measuring instantaneous influence over time.
This document discusses predicting new friendships in social networks using temporal information. It describes research on predicting new links in social networks over time using supervised learning models trained on temporal features from past network interactions. The researchers used anonymized Facebook data over 28 months to train decision tree and neural network classifiers to predict new relationships, finding models using temporal information performed better than those without it.
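The temporal-feature idea reduces to measuring the same structural quantity in consecutive snapshots so the classifier sees growth, not just a static value. The feature choice (common neighbours plus its deltas) and the toy snapshots are assumptions; the actual study used richer features.

```python
# Temporal feature construction for a candidate pair (u, v) across snapshots.
def common_neighbours(adj, u, v):
    return len(adj.get(u, set()) & adj.get(v, set()))

def temporal_features(snapshots, u, v):
    """Per-snapshot counts plus their first differences (growth)."""
    counts = [common_neighbours(adj, u, v) for adj in snapshots]
    growth = [b - a for a, b in zip(counts, counts[1:])]
    return counts + growth

snapshots = [
    {"u": {"a"}, "v": {"a"}},                # month t
    {"u": {"a", "b"}, "v": {"a", "b"}},      # month t+1
]
feats = temporal_features(snapshots, "u", "v")
```

A pair whose common-neighbour count is rising is a stronger new-link candidate than one with the same static count, which is why the temporally informed classifiers outperformed the static ones.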
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING (dannyijwest)
Social networks have become one of the most popular platforms for users to communicate and share their interests without being in the same geographical location. The rapid growth of social media sites such as Facebook, LinkedIn, and Twitter generates huge amounts of user-generated content, so improving information quality and integrity becomes a great challenge for all social media sites that aim to give users the desired content, or link them to the best relation, through improved search and linking techniques. Introducing semantics to social networks widens the representation of the network. In this paper, a new model of social networks based on semantic tag ranking is introduced. The model is based on the concept of multi-agent systems, and it extends the representation of social links with the semantic relationships found in the vocabularies known as tags in most social networks. The proposed model for the social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as a semantic indexing algorithm, combined with TagRank as the social network ranking algorithm. The E-LDA phase improves on LDA by using optimal parameters, and a filter is introduced to enhance the final indexing output. In the ranking phase, using TagRank on top of the indexing phase improves the ranking output. Simulation results of the proposed model show improvements in indexing and ranking output.
The Web Science MacroScope: Mixed-methods Approach for Understanding Web Acti...Markus Luczak-Rösch
Invited talk given at the QUEST (Qualitative Experise at Southampton, http://www.quest.soton.ac.uk/) group event (http://www.quest.soton.ac.uk/training/) on Qualitative Methods and Big Data.
EVOLUTIONARY CENTRALITY AND MAXIMAL CLIQUES IN MOBILE SOCIAL NETWORKSijcsit
This paper introduces an evolutionary approach to enhance the process of finding central nodes in mobile networks. This can provide essential information and important applications in mobile and social networks. This evolutionary approach considers the dynamics of the network and takes into consideration the central nodes from previous time slots. We also study the applicability of maximal cliques algorithms in mobile social networks and how it can be used to find the central nodes based on the discovered maximal cliques. The experimental results are promising and show a significant enhancement in finding the central nodes.
Predictions of links in graphs based on content and information propagations.
Lecture for the M. Sc. Data Science, Sapienza University of Rome, Spring 2016.
This document describes using social network analysis techniques to analyze power in organizations using an Enron email dataset. It discusses parsing emails to extract sender and recipient addresses to build a graph model. Various centrality measures like degree, closeness, and betweenness are calculated to identify influential individuals. The data is stored in a hashmap for serialization. The analysis finds that degree centrality, measured by outbound connections, best identifies powerful individuals in the organization.
I. The document discusses ego networks and how they can be used to study personal networks and relationships. Ego networks combine traditional survey data with network data by collecting information about respondents (egos) and their social ties (alters).
II. Ego network data can be used to examine the effects of network structure and alter characteristics on outcomes of interest. It can also provide insights into diffusion processes within personal networks.
III. While ego network data is useful for studying local network phenomena, global network data is needed to analyze higher-level structural effects, mechanisms of tie formation and diffusion across an entire network. Statistical techniques like randomization and the Quadratic Assignment Procedure are used to analyze ego and global network data
Results for Web Graph Mining Base Recommender System for Query, Image and Soc... — iosrjce
1) The document discusses results from implementing a web graph mining based recommender system for queries, images, and social networks using query suggestion and heat diffusion algorithms.
2) It shows that the system effectively recommends queries and images that are literally and semantically related to test queries and images.
3) It also demonstrates recommending items to users in a social network based on trust values between users, combining user-user and user-item relationships to provide personalized recommendations.
Scalable recommendation with social contextual information — eSAT Journals
Abstract: Recommender systems are used to achieve effective and useful results in social networks. Social recommendation exploits the social network structure, but it is challenging to fuse into it the social contextual factors derived from users' motivations for social behavior. Here we introduce two contextual factors for recommender systems: a) individual preference and b) interpersonal influence. Individual preference analyzes the match between an item's content and a user's interests, keeping only the results the user would find relevant. Interpersonal influence analyzes user-user interactions and their specific social relations. Beyond this, we propose a novel probabilistic matrix factorization method to fuse the two factors in a latent space. The scalable algorithm produces useful results by analyzing the ranking probability of each user's social contextual information, and incrementally processes the contextual data in large datasets. Keywords: social recommendation, individual preference, interpersonal influence, matrix factorization
Clustering in Aggregated User Profiles across Multiple Social Networks — IJECEIAES
A social network is an abstraction of related groups interacting among themselves to develop relationships. To analyze any relationship and the psychology behind it, clustering plays a vital role: it enhances the predictability and discovery of like-mindedness among users. This article exploits ensemble K-means clustering to extract entities and their corresponding interests, by skill and location, from user profiles aggregated across multiple online social networks. The proposed ensemble clustering uses the well-known K-means algorithm to improve results for the aggregated user profiles. The approach produces an ensemble similarity measure and provides 70% better results than taking a fixed or guessed value of K, while not altering the clustering method. The paper shows that good ensemble clusters can be spawned to envisage the discoverability of a user for a particular interest.
This paper analyzes networks of scientific collaborations and dolphin social relationships. It establishes co-author and dolphin networks from data, then analyzes properties like degree distribution, information entropy, and influential nodes. Four indicators of cooperation, degree, quadratic correlation, and betweenness are used to calculate a Marshall Entropy Index (MEI) for the co-author network. A modified PageRank algorithm applied to a quotation network identifies the most influential paper. Analysis finds the dolphin network exhibits small world properties more than scale-free properties, with the dolphin SN100 as most influential.
This document proposes a methodology for performing semantic analysis using MapReduce. It involves parsing XML files to create an in-memory tree structure representing relationships between entities. Mappers extract subject and object pairs from the XML and reducers calculate a semantic score by finding the shortest path between the objects in the tree. Experimental results show scores outputted in the format of user ID 1: user ID 2: score, with higher scores representing greater similarity between the users. The analysis concludes the scores accurately reflect the deeper level of matches between users' interests and this technique could be used to analyze user likeness at large scales.
Using content and interactions for discovering communities in... — moresmile
1. The document proposes methods for discovering communities in social networks using content and interactions by modeling communities based on discussed topics and social connections between users. This allows discovering both user interests and popular topics within each community.
2. Bayesian models are used to extract latent communities from the network, assuming community relationships depend on user interests in topics and their links. Different models are proposed to handle different network structures like broadcast vs conversation networks.
3. The models aim to utilize both content and link information to discover communities in incomplete social networks with missing link information. A distance metric is learned using observed links and used for hierarchical clustering.
Scalable Local Community Detection with Mapreduce for Large Networks — IJDKP
Community detection in complex information networks draws much attention from both academia and industry, since it has many real-world applications. However, the scalability of community detection algorithms over very large networks has been a major challenge: real-world graph structures are often complicated and extremely large. In this paper, we propose a MapReduce version, called 3MA, that parallelizes a local community identification method based on the M metric. We then adopt an iterative expansion approach to find all the communities in the graph. Empirical results show that for large networks on the order of millions of nodes, the parallel version of the algorithm outperforms the traditional sequential approach to detecting communities with the M-measure. When the data is too big for the original sequential M-metric-based iterative expansion approach to handle, our MapReduce version 3MA can finish in a reasonable time.
This document describes a project analyzing ego networks on Facebook data. The project aims to help compare ego networks and suggest friends. It uses algorithms like triangle enumeration and fast ego network construction. The team implemented these algorithms in Python using NetworkX and other tools. They demonstrated the analysis and want to detect communities and important individuals next. The conclusion is that ego networks are clusterable and could help with tasks like friend suggestion.
A new approach to Erdős collaboration network using PageRank — Alexander Decker
This document proposes a new approach to measuring academic influence using the PageRank algorithm on the Erdos Collaboration Network. It constructs the network from data on mathematician Paul Erdos and his co-authors. The document applies PageRank to identify the most influential mathematician as ALON, NOGA M. It also analyzes properties of the network like degree distribution and discusses limitations and possibilities for measuring instantaneous influence over time.
This document discusses predicting new friendships in social networks using temporal information. It describes research on predicting new links in social networks over time using supervised learning models trained on temporal features from past network interactions. The researchers used anonymized Facebook data over 28 months to train decision tree and neural network classifiers to predict new relationships, finding models using temporal information performed better than those without it.
INTELLIGENT SOCIAL NETWORKS MODEL BASED ON SEMANTIC TAG RANKING — dannyijwest
Social networks have become one of the most popular platforms for users to communicate and share their interests without being in the same geographical location. The great and rapid growth of social media sites such as Facebook, LinkedIn, and Twitter produces a huge amount of user-generated content, so improving information quality and integrity becomes a great challenge for all social media sites, which must let users get the desired content or be linked to the best relation using improved search and linking techniques. Introducing semantics to social networks widens their representation. In this paper, a new model of social networks based on semantic tag ranking is introduced, built on the concept of multi-agent systems. In this model, the representation of social links is extended by the semantic relationships found in the vocabularies known as tags in most social networks. The proposed social media engine is based on enhanced Latent Dirichlet Allocation (E-LDA) as the semantic indexing algorithm, combined with Tag Rank as the social network ranking algorithm. The E-LDA phase improves on LDA by using optimal parameters, and a filter then enhances the final indexing output. In the ranking phase, applying Tag Rank to the indexing output improves the ranking. Simulation results for the proposed model show improvements in both indexing and ranking output.
The document proposes an improved clustering algorithm for social network analysis. It combines BSP (Business System Planning) clustering with Principal Component Analysis (PCA) to group social network objects into classes based on their links and attributes. Specifically, it applies PCA before BSP clustering to reduce the dimensionality of the social network data and retain only the most important variables for clustering. This improves the BSP clustering results by focusing on the key information in the social network.
Graph Based User Interest Modeling in Twitter — raghavr186
This document summarizes a research project on modeling user interests on Twitter using a graph-based approach. The project aims to predict a user's interest profile based on the interests of the users they follow on Twitter. Various graph features are explored as weighting schemes to calculate the influence of followers on a user's interests. Experimental results show that features based on retweets and mentions perform best at predicting interests, with F1 scores around 0.6. A composite model is also proposed that combines predictions from different weighting schemes using learned quality scores. Additionally, a machine learning model is trained to predict interests directly from graph features.
LCF is a temporal approach to link prediction in dynamic social networks. It proposes a new predictor called Latest Common Friend (LCF) that incorporates temporal aspects. Social networks are modeled as sequences of snapshots over time periods. Each edge is assigned a weight based on timestamp. LCF score for node pairs is the cumulative weight of their common friends, giving more weight to friends with later timestamps. LCF outperforms traditional predictors like Common Neighbor, Adamic-Adar and Jaccard coefficient on 8 real-world dynamic network datasets based on average AUC scores. Modeling networks temporally and weighting edges by timestamp allows LCF to better predict future links in dynamic social networks.
A Novel Target Marketing Approach based on Influence Maximization — Surendra Gadwal
This document proposes a new influence maximization approach called ANIM based on existing work. It uses a Yelp dataset to build a social network and calculate edge weights. ANIM is a greedy algorithm that iteratively selects nodes to maximize the difference in influence spread between the current set and adding another node. Experiments show ANIM has better influence spread and runtime than other algorithms like DegreeDiscount and NewGreedyIC. The goal is to identify influential customers for local businesses to target through online review sites.
This document proposes methods for discovering organizational structure in static and dynamic social networks. For static networks, it introduces an m-Score to represent member importance and builds a community tree to represent the organizational hierarchy. For dynamic networks, it develops a tree learning algorithm to reconstruct the evolving community tree based on scoring past and current community structures. Experiments on karate club and Enron email networks demonstrate the approach.
Graphs have become the dominant life-form of many tasks, as they provide a structure to represent those tasks and the corresponding relations. A powerful role of networks/graphs is to bridge local features that exist in vertices as they blossom into patterns that help explain how nodal relations and their edges impact a complex effect rippling through a graph. User clusters are formed as a result of interactions between entities, yet many users today can hardly categorize their contacts into groups such as "family", "friends", or "colleagues". Analyzing a user's social graph via implicit clusters therefore enables dynamism in contact management. This study seeks to implement that dynamism via a comparative study of a deep neural network and a friend-suggest algorithm. We analyze a user's implicit social graph and seek to automatically create custom contact groups, using metrics that classify contacts based on the user's affinity to them. Experimental results demonstrate the importance of both the implicit group relationships and the interaction-based affinity in suggesting friends.
IRJET - Link Prediction in Social Networks — IRJET Journal
1) The document discusses link prediction in social networks, which aims to predict missing or new links based on analyzing the evolution of social networks.
2) It reviews different approaches for link prediction, including supervised learning methods like decision trees and support vector machines, as well as unsupervised methods based on structural features.
3) Various metrics for analyzing social network structure are described, like centralization, adjacency, closeness, and clustering coefficient, which can be used as features for link prediction models.
Neo4J and MongoDB were evaluated for performance on social networking metrics. Network closure and assortativity had higher throughput than distance in Neo4J, since distance requires graph traversal while the others only require lookups. Embedded Java API outperformed other access methods due to eliminating network overhead. MongoDB was 10-100x slower than Neo4J RESTful and embedded versions for distance metric, which requires many index lookups that MongoDB handles less efficiently than Neo4J's relationship expander. Future work includes integrating social metrics into benchmark frameworks to better evaluate databases for complex graph operations.
IRJET - Twitter Spam Detection using Cobweb — IRJET Journal
This document discusses using Cobweb clustering and Gradient Boosting techniques to detect spam on Twitter. Cobweb clustering creates a classification tree to predict attributes of new objects by summarizing the attribute distributions of existing nodes. Gradient Boosting is an ensemble method that uses multiple weak learners (decision trees) to create a stronger predictive model. The paper aims to combine these techniques to create an enhanced spam detection system. It also reviews several existing approaches for Twitter spam detection using techniques like Hidden Markov Models, Random Forests, and asynchronous link-based algorithms.
The possibility of modeling and abstracting interaction has been the key driver in social network-based
research as it facilitates, among other things, the generation of recommendations which is vital for most
businesses. Being ubiquitous, learning activities also facilitate the formation of these networks. Thus, to
gain insights into the evolution of activities and interactions that occur during learning events, it is
important to understand these networks and their respective models. In this article, we present a survey of
the representative methods employed in modeling various interactions observed in learner-centered
networks. Finally, we comparatively analyze the respective models and identify which models perform
better in respective cases.
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear... — Xiaohan Zeng
This document provides an overview of social network analysis, including what social networks are, what can be learned from analyzing social networks, and how social network analysis can be performed. Some key findings that can be uncovered include the six degrees of separation principle, the 80-20 rule of social popularity where a minority of nodes have most connections, how to identify influential nodes, and how to group similar nodes into communities. Various metrics and models are described for analyzing features like path lengths, degree distributions, ranking nodes, measuring community structure, and more. Examples of social network analysis are also provided.
The document presents a new greedy incremental approach for community detection in social networks. It begins by calculating the degree of nodes and sorting them in descending order. Initial communities are formed with the highest degree nodes. Then nodes are incrementally added to communities if it increases the community density. The approach is tested on standard datasets and able to detect communities reasonably well in less dense graphs. However, there is scope to improve performance on very dense graphs such as implementing it in parallel processing.
Greedy Incremental approach for unfolding of communities in massive networks
cs224w-79-final
Classify My Social Contacts into Circles
Stanford University
CS224W Fall 2014
Amer Hammudi
(SUNet ID: ahammudi)
ahammudi@stanford.edu
Darren Koh
(SUNet: dtkoh)
dtkoh@stanford.edu
Jia Li
(SUNet: jli14)
jli14@stanford.edu
1 Abstract
Manually maintaining lists of family, friends, business contacts, and so on is the classical problem of contact-list management. The soaring popularity of social media and its growing user population have made managing a contact list a tedious and time-consuming task. In this paper, we propose a topology-based approach that identifies leaders and then builds social circles around those leaders. Our results are comparable to previous publications. This work could be applied to targeting content at a friends list or targeting ads at a business list.
2 Introduction
Using social media for personal and business purposes has become an essential part of daily life. In a professional network such as LinkedIn, you may want to group recruiters for job hunting, or group alumni to keep up with mutual interests. On the business side, imagine having to manually sort your shoppers into Electronics Shoppers, Apparel Shoppers, and so on, so that every week you can target them with promotions and sales in their category. A sophisticated way to group a contact list automatically would remove much of the time and effort of managing it.
In this paper, we use the Facebook dataset [1] from one of the largest social networking websites. We are interested in the network structure and user profiles within an ego network [2].
3 Previous Work
We reviewed some relevant literature by domain experts. Jure [2][3] demonstrates how to apply a probabilistic method that models the circles as latent variables.
The authors attempt to find social circles in the ego network using a probabilistic methodology that models the circles as latent variables. The model takes a graph G(V, E) as input, solves for a set of K circles, and maps each node in the ego network to one or more circles.
At input they know the properties of all the nodes in the ego network. They first encode a similarity vector φ(x, y) between any two nodes in the ego network. Each circle Ck, k ∈ [1, K], is associated with a vector θk that describes the circle itself; this vector is learned from the empirical data. The likelihood that a pair of contacts belongs to a circle is related to the inner product ⟨φ(x, y), θk⟩ of the pair's encoded vector and the circle's descriptor vector. An unsupervised machine learning model is used to optimize the log-likelihood of Equation 2 of [2], with θk and αk as the parameters of the model. They solve for the number of circles K and, for each circle Ck, k ∈ [1, K], its information vector θk and the list of contacts {x ∈ Ck} identified as belonging to the circle.
The similarity vector φ(x, y) can be built from a function σ(x, y), a 0-1 vector describing which common properties x and y share, or from functions based on the ego u as well, σ(x, u) and σ(y, u). They also propose compressed versions of σ(x, y), still producing a 0-1 vector of common properties but with fewer dimensions. The authors propose a tree structure to encode the features of nodes (x, y) into σ(x, y) in a clean way.
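As a rough sketch of the 0-1 common-property vector σ(x, y) described above (the feature names and profile dictionaries here are hypothetical illustrations, not the paper's actual encoding):

```python
# Hypothetical profile features; real datasets encode profiles as
# sparse binary feature vectors with many more dimensions.
FEATURES = ["school", "employer", "hometown", "language"]

def sigma(profile_x, profile_y):
    """Return a 0-1 vector with a 1 where x and y share the property."""
    return [int(profile_x.get(f) is not None and
                profile_x.get(f) == profile_y.get(f))
            for f in FEATURES]

x = {"school": "stanford", "employer": "acme", "hometown": "sf"}
y = {"school": "stanford", "employer": "globex", "hometown": "sf"}
print(sigma(x, y))  # [1, 0, 1, 0]
```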
They ran the prediction over three datasets from Facebook, Google+, and Twitter, for both the original and compressed versions of the similarity scores. They evaluate performance using the Balanced Error Rate and F1 score. For the Facebook data, using the original version of the similarity score greatly improves the metrics relative to the compressed version; however, it makes little difference for the other two datasets. One reason is that Facebook stores more user information than Google+ and Twitter. As for the number of circles K, although the algorithm is designed for an arbitrary number of circles, empirically the optimized K is usually no more than 10.
In another similar work, Yu and Lin [5] propose detecting social circles in ego networks using an edge-based clustering algorithm. Their approach can be summarized in three steps:
1. Construct missing edges: using a closeness measure, the authors predict and create missing edges.
2. Find the similarity of adjacent edges and cluster the edges using single-linkage hierarchical clustering (Jaccard similarity for topological vectors and cosine similarity for profile vectors).
3. Finally, each circle is given a label by abstracting the common properties of its members.
Comparing this algorithm's performance (accuracy and speed) against Link Community [6], Low-Rank Embedding [5], and the Probabilistic Model [1] on the dataset they used [4], they achieve a 1-BER score of 0.76 (26.7% and 20.6% higher than Link Community and Low-Rank Embedding) and an F-measure of 0.52 (205.9% and 188.9% higher than Link Community and Low-Rank Embedding). In terms of speed, they outperform the Probabilistic Model [1] by approximately an order of magnitude (1,000 vertices in 1,000 seconds vs. 10,000 seconds for the Probabilistic Model), but are slightly slower than Link Community and Low-Rank Embedding.
4 Data Collection and Processing
The goal of our project is to use network analysis techniques to group a person's contacts into different circles or lists, so it was important to find a social network dataset that includes ground-truth circles for comparison purposes. The dataset we used for this paper is an undirected Facebook graph provided by the Stanford Large Network Dataset Collection [1]. It provides the ground truth for 10 people's social networks (ego networks), with their circles and individual profiles. Some of the ego networks are {0, 107, 686, 1684, 3980, ...}. Table 1 shows the dataset statistics (computed using SNAP [4]).
Our algorithm and approach (Section 6) require the data to be split into two parts, network topology and node profiles (node features). First, we partition the graph into 10 ego networks, where each ego network contains n circles. Second, we collect all node features in adjacency matrices for the later part of the algorithm, which incorporates a similarity measure to further improve accuracy.
Table 1: Dataset Statistics
Ego Networks: 10
Total Circles: 193
Total Nodes: 4,039
Undirected Edges: 88,234
Zero-Degree Nodes: 0
Max WCC Size: 1 (4,039 nodes)
Max SCC Size: 1 (4,039 nodes)
Total Triads: 1,612,010
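Statistics like those in Table 1 can be reproduced with standard graph tooling. A minimal NetworkX sketch follows, with a toy graph standing in for the actual SNAP edge list (which would be loaded with nx.read_edgelist):

```python
import networkx as nx

# Toy stand-in for the facebook_combined edge list from SNAP.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (2, 3)])

stats = {
    "nodes": G.number_of_nodes(),
    "edges": G.number_of_edges(),
    "zero_deg_nodes": sum(1 for n in G if G.degree(n) == 0),
    # size of the largest connected component (the "Max WCC" row)
    "max_wcc_size": len(max(nx.connected_components(G), key=len)),
    # each triangle is counted once per member node, hence the // 3
    "triads": sum(nx.triangles(G).values()) // 3,
}
print(stats)
```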
5 Dataset Analysis
5.1 Graph Property
Our approach to constructing circles uses individuals in an ego network whom we consider leaders, and forms circles from these leaders' friends. The power-law degree distribution of an ego network shows that several individuals are, as we expect, the "popular ones" whom we can use to build social circles. See Figure 1 and Figure 2.
Figure 1: Ego 107 log-log out-degree distribution plot.
Figure 2: Ego 686 log-log out-degree distribution plot.
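The out-degree distributions behind Figures 1 and 2 reduce to a degree histogram. A sketch, with a small synthetic preferential-attachment graph standing in for Ego 107 (on the real ego network the counts would be plotted on log-log axes):

```python
from collections import Counter

import networkx as nx

# Synthetic scale-free-ish stand-in for an ego network.
G = nx.barabasi_albert_graph(200, 2, seed=42)

# (degree, number of nodes with that degree), sorted by degree.
degree_counts = Counter(dict(G.degree()).values())
distribution = sorted(degree_counts.items())
print(distribution[:5])
```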
6 Algorithm and Approach
Using the ground-truth circles, we extracted the leader of each community as the node with the highest score for the product du · cu, where du is the degree density of node u within the circle and cu is the clustering coefficient of node u restricted to the friends of u that belong to the relevant circle. Using the leaders, we formed new circles from the leaders' friends and compared our result to published results. Our analysis suggests that forming circles from leader friends produces results comparable to published ones [1][5]. Accordingly, our approach to forming social circles in an ego network is composed of:
1. Define a method to extract leaders within an ego network.
2. Use leader friends to form social circles.
6.1 Finding leaders in an ego network
To find leaders in an ego network we considered the following parameters:
• bu, the betweenness score of node u in the ego network.
• cu, the clustering coefficient of node u in the ego network.
• du, the degree density of node u in the ego network.
• su, the textual similarity between node u and the ego, using the ego's and node u's feature vectors.
6.1.1 Detailed explanation of parameters
• bu: The betweenness centrality is an indicator of a node's centrality in a network. High centrality indicates that the node is important and, consequently, a likely leader within its community.
• cu: The local clustering coefficient of node u, computed as follows. Let G be the subgraph induced by node u and its friends; then
cu = 2 · E(G) / (|G| · (|G| − 1))    (1)
where E(G) is the total number of edges in G and |G| is the number of nodes in G. A high clustering score is a measure of node u's importance, and perhaps an indication that node u facilitates connectivity between its friends.
• du: The degree density of node u, an indication of the popularity of node u in the ego network.
• su: A measure of how similar node u is to the ego. It is not a discriminating attribute of node u's importance within a community in the ego network, nor an indication of node u's leadership attributes. Accordingly, our method for finding network leaders focuses on the ego network topology. However, if we instead consider su = σ(u, v), a closeness measure between two nodes u and v, this measure can be used in the algorithm that forms the social circle from an identified leader and the leader's friends; more on that below.
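The topological parameters above map directly onto standard NetworkX calls. A minimal sketch, with the karate-club graph standing in for a real Facebook ego network:

```python
import networkx as nx

# Stand-in for an ego network.
G = nx.karate_club_graph()
n = G.number_of_nodes()

b = nx.betweenness_centrality(G)             # b_u, betweenness score
c = nx.clustering(G)                         # c_u, Equation (1)
d = {u: G.degree(u) / (n - 1) for u in G}    # d_u, degree density

# Composite leader score; b_u * c_u was our best combination.
score = {u: b[u] * c[u] for u in G}
best = max(score, key=score.get)
print(best)
```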
6.2 Identifying Leaders
To identify the leaders in an ego network, we considered several approaches, one of which is supervised machine learning. Labeling the nodes with the highest scores bu · cu · du as the leaders of the ground-truth circles and the rest of the nodes in the ego network as 0, we trained a logistic regression function and used it to identify the leaders in the ego network. For each node in the ego network we computed
1 / (1 + e^(−Xθ))    (2)
where θ is the vector of trained weights and X = {1, bu, cu, du}. After analyzing the BER and F1 scores of the circles formed using the top 10 nodes, we found that the results were not satisfactory. In an attempt to understand the reasons for the poor result, we found that the data is not separable; Figure 2 shows a plot of bu, cu, du. We further used a quadratic kernel for the supervised learning model and the results did not improve. We think that the number of identified leaders relative to the number of nodes is too low, and much more data would be needed to make the supervised learning model effective.
Figure 2: Plot of bu, cu, and du.
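As a rough illustration of this supervised attempt, the following NumPy-only sketch trains the logistic model of Equation (2) by gradient ascent. The synthetic features, learning rate, and iteration count are all invented for illustration (unlike our real data, this sample is separable, so the fit succeeds):

```python
import numpy as np

# Synthetic (b_u, c_u, d_u) samples: 20 leader-like nodes, 80 others.
rng = np.random.default_rng(0)
X_pos = rng.uniform(0.5, 1.0, size=(20, 3))
X_neg = rng.uniform(0.0, 0.5, size=(80, 3))
X = np.hstack([np.ones((100, 1)), np.vstack([X_pos, X_neg])])  # X = {1, b, c, d}
y = np.array([1.0] * 20 + [0.0] * 80)

theta = np.zeros(4)
for _ in range(2000):                       # plain gradient ascent
    p = 1.0 / (1.0 + np.exp(-X @ theta))    # Equation (2)
    theta += 0.5 * X.T @ (y - p) / len(y)

probs = 1.0 / (1.0 + np.exp(-X @ theta))
top10 = np.argsort(probs)[-10:]             # candidate leaders
print(sorted(top10.tolist()))
```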
In view of the machine learning results, we simplified our approach to finding leaders in an ego network by using the product of different combinations of the three features defined above:
• bu · cu · du
• bu · cu
• bu · du
• cu · du
The algorithm to find leaders is the following:
1. ∀u ∈ G, compute the score of u using the parameters bu, cu, du, where G is the ego network.
2. Sort the scores in descending order u0, ..., un, where u0 has the highest score.
3. Add u0 to the set L of leaders, and let C(u0) be the set of nodes containing u0 and u0's friends.
4. For ui, i ∈ [1, n], while |L| < 10, add node ui to L if
   |C(ui) ∩ C(uj)| / min(|C(ui)|, |C(uj)|) < ε, ∀uj ∈ L.
ε in the algorithm is a tuning parameter; based on our experiments, the best result is obtained with ε = 0.6. We decided to fix the number of circles to create from any ego network at 10; in future research the number of circles should vary based on some heuristic.
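A minimal Python sketch of this leader-selection procedure, assuming the ego network is given as a dict mapping each node to its set of friends (the names `find_leaders`, `scores`, and `friends` are ours, not from any library):

```python
def find_leaders(scores, friends, eps=0.6, max_leaders=10):
    """Greedy leader selection with an overlap threshold.

    scores:  dict node -> feature-product score (e.g. b_u * c_u)
    friends: dict node -> set of that node's friends in the ego network
    A candidate u is accepted only if, for every leader l already chosen,
    |C(u) & C(l)| / min(|C(u)|, |C(l)|) < eps, where C(u) = {u} | friends[u].
    """
    circle = {u: {u} | friends[u] for u in scores}
    leaders = []
    # visit nodes in descending score order
    for u in sorted(scores, key=scores.get, reverse=True):
        if len(leaders) >= max_leaders:
            break
        if all(len(circle[u] & circle[l]) / min(len(circle[u]), len(circle[l])) < eps
               for l in leaders):
            leaders.append(u)
    return leaders
```

On a toy ego network with a triangle {1, 2, 3} and a disjoint pair {4, 5}, the top-scoring member of each component is kept and the overlapping candidates are rejected.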
Our best result was achieved using the combination bu · cu; results and scores are discussed in the results section.
6.3 Use leader friends to form social
circles
To form a social circle from a leader's friends, our method uses the conductance score to trim nodes at the edges of the community. The algorithm takes the set Cu that contains the leader u and the leader's friends and orders the nodes by degree.
The pseudocode of the algorithm to form the circles is shown in Algorithm 1, where Cu \ v is the set Cu less the node v. In the Conductance function, the parameters are defined as
• cutSet is the set of nodes to be removed
• v_cutSet is Σ_{u∈cutSet} du, where du is the degree of node u
• keepSet is the set of nodes to keep
• v_keepSet is Σ_{u∈keepSet} du
• e_total is the total number of edges in the subgraph containing the leader and the leader's friends
• E(G(cutSet)) is the number of edges in the subgraph containing the nodes in cutSet
Algorithm 1 Forming Circles
1: set cutSet ← {}
2: set keepSet ← Cu
3: set e_total ← total # of edges in Cu
4: add node v with lowest degree to cutSet and remove v from keepSet
5: for node v ∈ keepSet do
6:   x ← Conductance(cutSet ∪ {v}, Volume(cutSet ∪ {v}), keepSet \ {v}, Volume(keepSet \ {v}), e_total)
7:   y ← Conductance(cutSet, Volume(cutSet), keepSet, Volume(keepSet), e_total)
8:   if x < y then
9:     cutSet ← cutSet ∪ {v}
10:    keepSet ← keepSet \ {v}
11:  end if
12: end for
13: function Conductance(cutSet, v_cutSet, keepSet, v_keepSet, e_total)
14:   subGraphCut ← E(G(cutSet))
15:   subGraphRemain ← E(G(keepSet))
16:   cut ← e_total − subGraphCut − subGraphRemain
17:   return cut ÷ min(v_cutSet, v_keepSet)
18: end function
19: function Volume(setA)
20:   vol ← 0
21:   for node u ∈ setA do
22:     vol ← vol + out-degree of u
23:   end for
24:   return vol
25: end function
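Algorithm 1 can be sketched in Python as follows, assuming the ego network's adjacency is a dict of neighbor sets with orderable node ids; degrees are taken over that dict, and degenerate cases (e.g. an emptied keepSet) are handled crudely:

```python
def edges_within(nodes, adj):
    """Number of edges with both endpoints inside `nodes` (E(G(nodes)))."""
    return sum(1 for u in nodes for v in adj[u] if v in nodes and u < v)

def volume(nodes, adj):
    """Sum of degrees of the nodes in the set."""
    return sum(len(adj[u]) for u in nodes)

def conductance(cut_set, keep_set, e_total, adj):
    """Edges crossing the cut, normalised by the smaller volume."""
    cut = e_total - edges_within(cut_set, adj) - edges_within(keep_set, adj)
    denom = min(volume(cut_set, adj), volume(keep_set, adj))
    return cut / denom if denom else float("inf")

def form_circle(leader_circle, adj):
    """Trim boundary nodes of the leader's circle while conductance drops."""
    keep = set(leader_circle)
    e_total = edges_within(keep, adj)
    # seed the cut with the lowest-degree node
    seed = min(keep, key=lambda u: len(adj[u]))
    cut = {seed}
    keep.remove(seed)
    # visit remaining nodes from lowest to highest degree
    for v in sorted(keep, key=lambda u: len(adj[u])):
        if conductance(cut | {v}, keep - {v}, e_total, adj) < conductance(cut, keep, e_total, adj):
            cut.add(v)
            keep.remove(v)
    return keep
```

For example, on a triangle {1, 2, 3} with a pendant node 4 attached to node 1, the initial cut is seeded with node 4 and the loop then moves node 1 (whose removal lowers conductance) into the cut as well.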
We considered using the measure σ(u, v), the cosine closeness between the textual attributes of node v and leader node u, as a criterion for adding node v to the cutSet. However, we found that using only the conductance score is efficient and produces good results. We think it is possible to improve the result by using the measure σ(u, v) and stopping the algorithm based on a second condition related to the cosine closeness between the node v under consideration and the leader node u.
7 Results and Analysis
We followed the above approach for Leader-
Identification and Circle-Forming. To evaluate
the results, we use two metrics: the Balanced
Error Rate (BER) and the F1-Score.
BER for a predicted circle C and ground-truth circle ¯C is defined as

BER(C, ¯C) = (1/2) ( |C \ ¯C| / |C| + |¬C \ ¬¯C| / |¬C| )   (3)
where ¬C is the complement of C in the vertex set V. The higher 1-BER is, the better the prediction. F1-Score is the harmonic mean of Precision and Recall,

F1 = 2 · Precision · Recall / (Precision + Recall).   (4)
In order to clarify the difference between 1-
BER and F1-scores, we rewrite them in terms
of prediction True Positive (TP), False Positive
(FP), True Negative (TN), False Negative (FN),
1 − BER = (1/2) ( TP / (TP + FP) + TN / (FN + TN) )   (5)

F1 = 2TP / (2TP + FP + FN)   (6)
We see that 1-BER takes True Negatives (TN) into account whilst the F1-score does not.
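Both metrics for a single circle can be computed directly from set operations, as in this sketch (the function name `circle_scores` is ours; edge cases such as an empty predicted circle are not handled):

```python
def circle_scores(pred, truth, universe):
    """1-BER (Eqs. 3/5) and F1 (Eqs. 4/6) for one predicted circle.

    pred, truth: sets of node ids; universe: the full vertex set V.
    """
    tp = len(pred & truth)            # nodes correctly placed in the circle
    fp = len(pred - truth)            # predicted but not in the ground truth
    fn = len(truth - pred)            # in the ground truth but missed
    tn = len(universe - pred - truth) # correctly left out
    one_minus_ber = 0.5 * (tp / (tp + fp) + tn / (fn + tn))
    f1 = 2 * tp / (2 * tp + fp + fn)
    return one_minus_ber, f1
```

With `pred = {1, 2, 3}`, `truth = {2, 3, 4}`, and `universe = {1, ..., 6}`, both metrics come out to 2/3, although they weight the True Negatives differently in general.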
We have yet to define how to match a set of
predicted circles C with the ground-truth set ¯C.
We find a one-to-one mapping f : C → ¯C such that f maximises

Σ_{C∈C} score(C, f(C))   (7)

where score is either 1-BER or F1. In the case that |C| ≥ | ¯C|, no two predicted circles will be mapped to the same ground-truth circle; whilst if |C| < | ¯C| we no longer enforce this constraint.
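For the circle counts used here (at most 10 per ego network), the maximising mapping of Eq. (7) can be found by brute force, as in this sketch (`match_circles` and its list-of-sets interface are ours; a Hungarian-algorithm solver such as scipy's `linear_sum_assignment` solves the same assignment problem in polynomial time and is preferable in general):

```python
from itertools import permutations

def match_circles(pred, truth, score):
    """Total score of the best mapping of predicted to ground-truth circles.

    pred, truth: lists of node-id sets; score: a function of (pred, truth).
    If there are at least as many predicted as ground-truth circles, each
    ground-truth circle gets a distinct predicted circle; otherwise each
    predicted circle is simply scored against its best ground-truth match.
    """
    if len(pred) >= len(truth):
        # try every assignment of distinct predicted circles to truth circles
        return max(
            sum(score(pred[p], t) for p, t in zip(perm, truth))
            for perm in permutations(range(len(pred)), len(truth)))
    return sum(max(score(p, t) for t in truth) for p in pred)
```

For example, with F1 as the score function, `match_circles([{1, 2}, {3, 4}], [{1, 2}, {3, 5}], f1)` pairs each predicted circle with its natural counterpart.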
We ran the algorithm in Section 6 and scored
the predicted circles using 1-BER and F1-scores
in Table 2.
Ego Id 1-BER F1
0 0.63 0.32
107 0.60 0.49
1684 0.68 0.29
1912 0.69 0.38
3437 0.55 0.57
348 0.74 0.38
3980 0.63 0.28
414 0.83 0.49
686 0.76 0.47
698 0.77 0.35
Avg 0.70 0.40
Table 2: Evaluation metrics of the Leader-based
Social Circles Identification Algorithm
As a comparison, [2] obtains a 1-BER score of 0.77 and an F1-score of 0.40 using Compressed Features (profile information collapsed by category), and a 1-BER score of 0.84 and an F1-score of 0.59 using Uncompressed Features (the original profile information of people). [5] obtains a 1-BER score of 0.76 and an F1-score of 0.52.
Our approach works well and is largely comparable to [2] using Compressed Features, or to [5].
Our approach that first identifies a leader, then
builds the circle of friends around the leader is
therefore firmly validated.
Given that [2] draws on the personal information attached to the network, while ours is based only on key topology indicators, and in the absence of runs on larger data sets, we could interpret this as follows: the personal information attached to nodes and edges is, at a coarse level, already reflected in the topology of the network. Only by looking into the personal information at a finer granularity would we be able to improve the prediction.
7.1 Ground Truth Circles vs Pre-
dicted Circles
To visualize how the circle prediction performs, Figure 4 shows how our algorithm accurately predicted the circles (2 samples) for the Ego-686 network, along with the “heat map” of the Clustering Coefficient (cu) and Betweenness Centrality (bu); the yellow overlay on the left two graphs marks correctly predicted nodes, red nodes are ground truth, and green nodes are predicted nodes. Figure 5, in turn, shows the weakness of our algorithm, which failed to predict most of the nodes in the circles (2 samples) for the Ego-3980 network.
Since our algorithm starts by picking leaders, the heat map highlights the algorithm's strengths and weaknesses. It is clear from the figures that for circles with low clustering and few nodes our algorithm fails to pick a leader to represent the community; accordingly, our score for such circles is low, while for dense circles that contain nodes with high centrality scores our algorithm performs much better.
The Ego-686 network is visually dense and has distinct connected components. Nodes in the core components have high betweenness centrality. Our algorithm successfully picks the nodes from these circles as leaders and forms the community using the leader's friends. On the other hand, the Ego-3980 network is sparser. The ground-truth circles lack nodes with high local clustering and centrality scores for our algorithm to pick as leaders; accordingly, our algorithm produces low scores and fails to predict such circles.
8 Conclusion and Future Work
Our approach of forming social circles in an ego network using leaders was validated by the results we achieved, which are comparable to published methods. It is possible to achieve better results, however, by improving on two weaknesses in our method:
1. Allowing the number of circles to vary instead of fixing it at 10 as our algorithm does. Our results were negatively affected by this limitation; for example, two of the ego networks in our data set contained fewer than 2 social circles while others contained more.
2. Incorporating a person's information features into the circle-forming algorithm, both to cut certain nodes and to add nodes in the ego network that belong to the community but are not friends with the leader found for this community.
We believe that, since our approach is easily scalable, a company like Facebook may find it useful: our method can be used to create initial circles to suggest to the ego as a starting point, after which the ego may refine them to his/her liking.
9 Contribution
All team members {Amer Hammudi, Darren Koh, Jia Li} were involved in and contributed equally to all parts of this project. The following are the contributions of individuals who were more deeply involved in certain areas.
• Amer Hammudi: Model and Algorithm
• Darren Koh: Data Processing/Analysis and
Graph Plotting
• Jia Li: Literature research and Result Scor-
ing
10 ACKNOWLEDGEMENTS
We would like to thank Himabindu (CS224W
Head TA), Vikesh Khanna (CS224W TA), and
Bruno Abrahao (Snap Group, Infolab @ Stan-
ford) for their advice and input.
References
[1] Jure Leskovec and Andrej Krevl, ”SNAP
Datasets: Stanford Large Network Dataset
Collection” Jun 2014
[2] Julian McAuley and Jure Leskovec, “Learn-
ing to discover social circles in ego net-
works”, Advances in Neural Information
Processing Systems 25 (NIPS 2012)
[3] Julian McAuley and Jure Leskovec, “Dis-
covering Social Circles in Ego Networks”,
ACM Transactions on Knowledge Discovery
from Data (TKDD), Vol 8, No 1, Feb 2014.
doi: 10.1145/2556612
[4] Jure Leskovec and Rok Sosi, “Snap.py:
SNAP for Python, a general purpose net-
work analysis and graph mining tool in
Python”
[5] Yu Wang and Lin Gao, “An Edge-based
Clustering Algorithm to Detect Social Cir-
cles in Ego Networks”, Journal of Comput-
ers, Vol 8, No 10 (2013), 2575-2582, Oct
2013. doi: 10.4304/jcp.8.10.2575-2582