Upcoming SlideShare
×

# Social network analysis

462 views

Published on

Published in: Technology
0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
462
On SlideShare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
0
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Social network analysis

1. 1. Betweenness an attribute of each edge: the number of pairs of nodes whose shortest path contain current edge. To find communities, we could do this: while exist betweenness > threshold a) cut the edge with largest betweenness; b) recalculate betweenness Community partition
2. 2. GN algorithm GN algorithm is used to calculate the betweenness of a graph. step1 step2 step3 Community partition
3. 3. Maximal clique, Complete bi-partite subgraph, and Frequent itemset Find a core of a graph, expand it by including more nodes whose number of edges to the core are large than a threshold. Here is a conclusion about complete bi-partite subgraph: assume the average degree of a graph with n nodes is d, then there must have at least one complete bi-partite subgraph Ks,t if below inequality is true. n(d/n)t ≥ s Community partition
4. 4. Normailized cuts Community partition Vol(S) is the number of edges with at least one end in S. Cut(S,T) is the number of edges that connect a node in S to a node in T. The small the normalized cut, the better, for it means we cut less edge and keep more edges in each subgraph.
5. 5. Eigenvalues and Eigenvectors of the Laplacian Matrix Laplacian Matrix = Degree Matrix - Adjacency Matrix Community partition Article in below link answered why the 2nd eigenvector of Laplacian Matrix give us a suggestion of how to partition. http://www.cnblogs.com/vivounicorn/archive/2012/02/10/2343377.html in this article, there are 3 partition methods based on Laplacian matrix: Minimum Cut, Ratio Cut and Normalized Cut
6. 6. Simrank approach simrank is used to measure the similarity between nodes. Node Similarity Random walks with restart. v' = βMv + (1-β)eN v : the column vector that reflects the probability the walker is at each of the nodes at previous round, we could init v as eN. v' : the result of current round. M : the transition matrix. N : the node we are dealing with. β: the probability that walker continue going to other nodes but node N. we say 'restart' based on the probability of (1-β) that walker would go back to N on each round. eN : a column vector that has 1 in the row for node N and 0’s elsewhere, thus (1-β)eN is the contribution to current result when walker 'restart'. transition matrix iterator result
7. 7. A scenario when dealing with Page Rank Add a probability factor to walk out circle No probability factor, trapped in the circle
8. 8. m Counting Triangles m Intuitively, in a social network, a friend of my friend stands a good chance to be my friend too, so counting the number of triangles helps us to measure the extent to which a graph looks like a social network. In a graph that has n nodes with m≥n edges, there must be no more than 2 nodes whose degree are larger than , we call these nodes the heavy hitter. We reduce the time complexity of counting triangles by dividing nodes into heavy hitter group and non-heavy hitter group.
9. 9. Neighborhood Properties of Graphs Neighborhood The neighborhood of radius d for a node v is the set of nodes u for which there is a path of length at most d from v to u, denoted by N(v,d). The diameter of a directed graph is the smallest integer d such that for every two nodes u and v there is a path of length d or less from u to v. For a undirected graph, we treat each edge as double-direction edge. Transitive Closure and Reachability The transitive closure of a graph is the set of pairs of nodes (u, v) such that there is a path from u to v of length zero or more. We say node u reaches node v if (u,v) is an item of transitive closure of the graph.
10. 10. Neighborhood Properties of Graphs Neighborhood The neighborhood of radius d for a node v is the set of nodes u for which there is a path of length at most d from v to u, denoted by N(v,d). The diameter of a directed graph is the smallest integer d such that for every two nodes u and v there is a path of length d or less from u to v. For a undirected graph, we treat each edge as double-direction edge. Transitive Closure and Reachability The transitive closure of a graph is the set of pairs of nodes (u, v) such that there is a path from u to v of length zero or more. We say node u reaches node v if (u,v) is an item of transitive closure of the graph.