Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

- Web scraping in python by Saurav Tomar 970 views
- Almost Scraping: Web Scraping witho... by Michelle Minkoff 27353 views
- Scraping data from the web and docu... by Tommy Tavenner 5625 views
- Social Media Mining - Chapter 1 (In... by SocialMediaMining 1204 views
- Web Scraping and Data Extraction Se... by PromptCloud 2413 views
- Web Scraping With Python by Robert Dempsey 10915 views

1,102 views

Published on

Free book and slides at http://socialmediamining.info/

Published in:
Education

No Downloads

Total views

1,102

On SlideShare

0

From Embeds

0

Number of Embeds

1

Shares

0

Downloads

46

Comments

0

Likes

3

No embeds

No notes for slide

2|E| = the total number of degrees,

Normally, \alpha =0.85 and \beta = 0.15

- 1. Network Measures SOCIAL MEDIA MINING
- 2. 2Social Media Mining Measures and Metrics 2Social Media Mining Network Measureshttp://socialmediamining.info/ Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate these slides into your presentations, please include the following note: R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining: An Introduction, Cambridge University Press, 2014. Free book and slides at http://socialmediamining.info/ or include a link to the website: http://socialmediamining.info/
- 3. 3Social Media Mining Measures and Metrics 3Social Media Mining Network Measureshttp://socialmediamining.info/ Klout It is difficult to measure influence!
- 4. 4Social Media Mining Measures and Metrics 4Social Media Mining Network Measureshttp://socialmediamining.info/ Why Do We Need Measures? β’ Who are the central figures (influential individuals) in the network? β Centrality β’ What interaction patterns are common in friends? β Reciprocity and Transitivity β Balance and Status β’ Who are the like-minded users and how can we find these similar individuals? β Similarity β’ To answer these and similar questions, one first needs to define measures for quantifying centrality, level of interactions, and similarity, among others.
- 5. 5Social Media Mining Measures and Metrics 5Social Media Mining Network Measureshttp://socialmediamining.info/ Centrality defines how important a node is within a network Centrality
- 6. 6Social Media Mining Measures and Metrics 6Social Media Mining Network Measureshttp://socialmediamining.info/ Centrality in terms of those who you are connected to
- 7. 7Social Media Mining Measures and Metrics 7Social Media Mining Network Measureshttp://socialmediamining.info/ Degree Centrality β’ Degree centrality: ranks nodes with more connections higher in terms of centrality β’ ππ is the degree (number of friends) for node π£π β i.e., the number of length-1 paths (can be generalized) In this graph, degree centrality for node π£1 is π1=8 and for all others is ππ = 1, π β 1
- 8. 8Social Media Mining Measures and Metrics 8Social Media Mining Network Measureshttp://socialmediamining.info/ Degree Centrality in Directed Graphs β’ In directed graphs, we can either use the in- degree, the out-degree, or the combination as the degree centrality value: β’ In practice, mostly in-degree is used. ππ ππ’π‘ is the number of outgoing links for node π£π
- 9. 9Social Media Mining Measures and Metrics 9Social Media Mining Network Measureshttp://socialmediamining.info/ Normalized Degree Centrality β’ Normalized by the maximum possible degree β’ Normalized by the maximum degree β’ Normalized by the degree sum
- 10. 10Social Media Mining Measures and Metrics 10Social Media Mining Network Measureshttp://socialmediamining.info/ Degree Centrality (Directed Graph)Example Normalized by the maximum possible degree E B A C F D G Node In-Degree Out-Degree Centrality Rank A 1 3 1/2 1 B 1 2 1/3 3 C 2 3 1/2 1 D 3 1 1/6 5 E 2 1 1/6 5 F 2 2 1/3 3 G 2 1 1/6 5
- 11. 11Social Media Mining Measures and Metrics 11Social Media Mining Network Measureshttp://socialmediamining.info/ Degree Centrality (undirected Graph) Example Node Degree Centrality Rank A 4 2/3 2 B 3 1/2 5 C 5 5/6 1 D 4 2/3 2 E 3 1/2 5 F 4 2/3 2 G 3 1/2 5 E B A C F D G
- 12. 12Social Media Mining Measures and Metrics 12Social Media Mining Network Measureshttp://socialmediamining.info/ Eigenvector Centrality β’ Having more friends does not by itself guarantee that someone is more important β Having more important friends provides a stronger signal Phillip Bonacich β’ Eigenvector centrality generalizes degree centrality by incorporating the importance of the neighbors (undirected) β’ For directed graphs, we can use incoming or outgoing edges
- 13. 13Social Media Mining Measures and Metrics 13Social Media Mining Network Measureshttp://socialmediamining.info/ Formulation β’ Letβs assume the eigenvector centrality of a node is π π π£π (unknown) β’ We would like π π π£π to be higher when important neighbors (node π£π with higher π π π£π ) point to us β Incoming or outgoing neighbors? β For incoming neighbors π΄π,π = 1 β’ We can assume that π£πβs centrality is the summation of its neighborsβ centralities β’ Is this summation bounded? β’ We have to normalize! ο¬: some fixed constant
- 14. 14Social Media Mining Measures and Metrics 14Social Media Mining Network Measureshttp://socialmediamining.info/ β’ Let ο¨ β’ This means that πͺ π is an eigenvector of adjacency matrix π΄ π (or π΄ when undirected) and ο¬ is the corresponding eigenvalue β’ Which eigenvalue-eigenvector pair should we choose? Eigenvector Centrality (Matrix Formulation)
- 15. 15Social Media Mining Measures and Metrics 15Social Media Mining Network Measureshttp://socialmediamining.info/ Finding the eigenvalue by finding a fixed pointβ¦ β’ Start from an initial guess πΆπ(0) (e.g., all centralities are 1) and iterative π‘ times β’ We can write πΆπ(0) as a linear combination of eigenvectors π£πβs of the π΄ π β’ Substituting this, we get π1 is the largest eigenvalue
- 16. 16Social Media Mining Measures and Metrics 16Social Media Mining Network Measureshttp://socialmediamining.info/ Finding the eigenvalue by finding a fixed pointβ¦ β’ As π‘ grows, we will have in the limit β’ Or equivalently β’ If we start with an all positive πΆπ(0) all πΆπ(π‘)βs will be positive (why?) β All the centrality values would be positive β We need an eigenvalue-eigenvector pair that guarantees all centralities have the same sign β’ E.g., for comparison purposes
- 17. 17Social Media Mining Measures and Metrics 17Social Media Mining Network Measureshttp://socialmediamining.info/ Eigenvector Centrality, cont. So, to compute eigenvector centrality of π΄, 1. We compute the eigenvalues of A 2. Select the largest eigenvalue ο¬ 3. The corresponding eigenvector of ο¬ is π π. 4. Based on the Perron-Frobenius theorem, all the components of π πwill be positive 5. The components of π π are the eigenvector centralities for the graph.
- 18. 18Social Media Mining Measures and Metrics 18Social Media Mining Network Measureshttp://socialmediamining.info/ Eigenvector Centrality: Example 1 Eigenvalues are Largest Eigenvalue Corresponding eigenvector (assuming π π has norm 1)
- 19. 19Social Media Mining Measures and Metrics 19Social Media Mining Network Measureshttp://socialmediamining.info/ Eigenvector Centrality: Example 2 ο¬ = (2.68, -1.74, -1.27, 0.33, 0.00) Eigenvalues Vector ο¬max = 2.68
- 20. 20Social Media Mining Measures and Metrics 20Social Media Mining Network Measureshttp://socialmediamining.info/ Katz Centrality β’ A major problem with eigenvector centrality arises when it deals with directed graphs β’ Centrality only passes over outgoing edges and in special cases such as when a node is in a directed acyclic graph centrality becomes zero β The node can have many edge connected to it Eigenvector Centrality Elihu Katz β’ To resolve this problem we add bias term ο’ to the centrality values for all nodes
- 21. 21Social Media Mining Measures and Metrics 21Social Media Mining Network Measureshttp://socialmediamining.info/ Katz Centrality, cont. Bias termControlling term Rewriting equation in a vector form vector of all 1βs Katzcentrality:
- 22. 22Social Media Mining Measures and Metrics 22Social Media Mining Network Measureshttp://socialmediamining.info/ Katz Centrality, cont. β’ When Ξ±=0, the eigenvector centrality is removed and all nodes get the same centrality value π½ β As πΌ gets larger the effect of π½ is reduced β’ For the matrix (πΌ β πΌπ΄ π) to be invertible, we must have β πππ‘ πΌ β πΌπ΄ π β 0 β By rearranging we get πππ‘ AT β πΌβ1 πΌ = 0 β This is basically the characteristic equation, β The characteristic equation first becomes zero when the largest eigenvalue equals Ξ±-1 The largest eigenvalue is easier to compute (power method) In practice we select πΆ < π/π, where π is the largest eigenvalue of π¨ π»
- 23. 23Social Media Mining Measures and Metrics 23Social Media Mining Network Measureshttp://socialmediamining.info/ β’ The Eigenvalues are -1.68, -1.0, -1.0, 0.35, 3.32 β’ We assume Ξ±=0.25 < 1/3.32 and π½ = 0.2 Katz Centrality Example Most important nodes!
- 24. 24Social Media Mining Measures and Metrics 24Social Media Mining Network Measureshttp://socialmediamining.info/ PageRank β’ Problem with Katz Centrality: β In directed graphs, once a node becomes an authority (high centrality), it passes all its centrality along all of its out-links β’ This is less desirable since not everyone known by a well-known person is well-known β’ Solution? β We can divide the value of passed centrality by the number of outgoing links, i.e., out-degree of that node β Each connected neighbor gets a fraction of the source nodeβs centrality
- 25. 25Social Media Mining Measures and Metrics 25Social Media Mining Network Measureshttp://socialmediamining.info/ PageRank, cont. What if the degree is zero? Similar to Katz Centrality, in practice, πΆ < π/π, where π is the largest eigenvalue of π΄ π π·β1 . In undirected graphs, the largest eigenvalue of π΄ π π·β1 is π = 1; therefore, πΆ < π.
- 26. 26Social Media Mining Measures and Metrics 26Social Media Mining Network Measureshttp://socialmediamining.info/ PageRank Example β’ We assume Ξ±=0.95 < 1 and and π½ = 0.1
- 27. 27Social Media Mining Measures and Metrics 27Social Media Mining Network Measureshttp://socialmediamining.info/ PageRank Example β Alternative Approach [Markov Chains] Step A B C D E F G 0 1/7 1/7 1/7 1/7 1/7 1/7 1/7 1 B/2 C/3 A/3 + G A/3 + C/3 + F/2 A/3 + D C/3 + B/2 F/2 + E 0.071 0.048 0.190 0.167 0.190 0.119 0.214 Using Power Method βYou don't understand anything until you learn it more than one wayβ πΌ=1 and π½ =0? Marvin Minsky (1927-2016)
- 28. 28Social Media Mining Measures and Metrics 28Social Media Mining Network Measureshttp://socialmediamining.info/ PageRank: Example Step A B C D E F G Sum 1 0.143 0.143 0.143 0.143 0.143 0.143 0.143 1.000 2 0.071 0.048 0.190 0.167 0.190 0.119 0.214 1.000 3 0.024 0.063 0.238 0.147 0.190 0.087 0.250 1.000 4 0.032 0.079 0.258 0.131 0.155 0.111 0.234 1.000 5 0.040 0.086 0.245 0.152 0.142 0.126 0.210 1.000 6 0.043 0.082 0.224 0.158 0.165 0.125 0.204 1.000 7 0.041 0.075 0.219 0.151 0.172 0.115 0.228 1.000 8 0.037 0.073 0.241 0.144 0.165 0.110 0.230 1.000 9 0.036 0.080 0.242 0.148 0.157 0.117 0.220 1.000 10 0.040 0.081 0.232 0.151 0.160 0.121 0.215 1.000 11 0.040 0.077 0.228 0.151 0.165 0.118 0.220 1.000 12 0.039 0.076 0.234 0.148 0.165 0.115 0.223 1.000 13 0.038 0.078 0.236 0.148 0.161 0.116 0.222 1.000 14 0.039 0.079 0.235 0.149 0.161 0.118 0.219 1.000 15 0.039 0.078 0.232 0.150 0.162 0.118 0.220 1.000 Rank 7 6 1 4 3 5 2
- 29. 29Social Media Mining Measures and Metrics 29Social Media Mining Network Measureshttp://socialmediamining.info/ Effect of PageRank PageRank Node Rank A 7 B 6 C 1 D 4 E 3 F 5 G 2
- 30. 30Social Media Mining Measures and Metrics 30Social Media Mining Network Measureshttp://socialmediamining.info/ Centrality in terms of how you connect others (information broker)
- 31. 31Social Media Mining Measures and Metrics 31Social Media Mining Network Measureshttp://socialmediamining.info/ Betweenness Centrality Another way of looking at centrality is by considering how important nodes are in connecting other nodes The number of shortest paths from π to π‘ that pass through π£π The number of shortest paths from vertex π to π‘ β a.k.a. information pathways Linton Freeman
- 32. 32Social Media Mining Measures and Metrics 32Social Media Mining Network Measureshttp://socialmediamining.info/ Normalizing Betweenness Centrality β’ In the best case, node π£π is on all shortest paths from π to π‘, hence, Therefore, the maximum value is (π β 1)(π β 2) Betweenness centrality:
- 33. 33Social Media Mining Measures and Metrics 33Social Media Mining Network Measureshttp://socialmediamining.info/ Betweenness Centrality: Example 1
- 34. 34Social Media Mining Measures and Metrics 34Social Media Mining Network Measureshttp://socialmediamining.info/ Betweenness Centrality: Example 2 Node Betweenness Centrality Rank A 16 + 1/2 + 1/2 1 B 7+5/2 3 C 0 7 D 5/2 5 E 1/2 + 1/2 6 F 15 + 2 1 G 0 7 H 0 7 I 7 4
- 35. 35Social Media Mining Measures and Metrics 35Social Media Mining Network Measureshttp://socialmediamining.info/ Computing Betweenness β’ In betweenness centrality, we compute shortest paths between all pairs of nodes to compute the betweenness value. β’ Trivial Solution: β Use Dijkstra and run it π(π) times β We get an π(π3 ) solution β’ Better Solution: β Brandes Algorithm: β’ π(ππ) for unweighted graphs β’ π(ππ + π2 log π) for weighted graphs
- 36. 36Social Media Mining Measures and Metrics 36Social Media Mining Network Measureshttp://socialmediamining.info/ Brandes Algorithm [2001] ππππ(π , π€) is the set of predecessors of π€ in the shortest paths from π to π€. β In the most basic scenario, π€ is the immediate child of π£π There exists a recurrence equation that can help us determine πΏπ (π£π)
- 37. 37Social Media Mining Measures and Metrics 37Social Media Mining Network Measureshttp://socialmediamining.info/ How to compute π ππ Source: Networks, Crowds, and Markets: Reasoning about a Highly Connected World. By David Easley and Jon Kleinberg Original Network Sum of Parents values BFS starting at A (i.e., π )
- 38. 38Social Media Mining Measures and Metrics 38Social Media Mining Network Measureshttp://socialmediamining.info/ How do you compute πΏπ (π£π) No shortest path starting from 1 passes through 9 2/2 (1+0) 1/1(3/2+1)+1/1(3/2+1)
- 39. 39Social Media Mining Measures and Metrics 39Social Media Mining Network Measureshttp://socialmediamining.info/ Centrality in terms of how fast you can reach others
- 40. 40Social Media Mining Measures and Metrics 40Social Media Mining Network Measureshttp://socialmediamining.info/ Closeness Centrality β’ The intuition is that influential/central nodes can quickly reach other nodes β’ These nodes should have a smaller average shortest path length to others Closeness centrality: Linton Freeman
- 41. 41Social Media Mining Measures and Metrics 41Social Media Mining Network Measureshttp://socialmediamining.info/ Closeness Centrality: Example 1
- 42. 42Social Media Mining Measures and Metrics 42Social Media Mining Network Measureshttp://socialmediamining.info/ Closeness Centrality: Example 2 (Undirected) Node A B C D E F G H I D_Avg Closeness Centrality Rank A 0 1 2 1 2 1 2 3 2 1.750 0.571 1 B 1 0 1 2 1 2 3 4 3 2.125 0.471 3 C 2 1 0 3 2 3 4 5 4 3.000 0.333 8 D 1 2 3 0 1 2 3 4 3 2.375 0.421 4 E 2 1 2 1 0 3 4 5 4 2.750 0.364 7 F 1 2 3 2 3 0 1 2 1 1.875 0.533 2 G 2 3 4 3 4 1 0 3 2 2.750 0.364 7 H 3 4 5 4 5 2 3 0 1 3.375 0.296 9 I 2 3 4 3 4 1 2 1 0 2.500 0.400 5
- 43. 43Social Media Mining Measures and Metrics 43Social Media Mining Network Measureshttp://socialmediamining.info/ Closeness Centrality: Example 3 (Directed) Node A B C D E F G H I D_Avg Closeness Centrality Rank A 0 1 2 3 2 2 1 3 3 2.125 0.471 1 B 3 0 1 2 1 4 4 2 3 2.500 0.400 2 C 4 5 0 7 6 3 5 1 2 4.125 0.242 9 D 1 2 3 0 3 3 2 4 5 2.875 0.348 3 E 2 3 4 1 0 4 3 5 5 3.375 0.296 6 F 1 2 3 4 3 0 2 4 4 2.875 0.348 4 G 2 3 4 5 4 1 0 5 2 3.250 0.308 5 H 4 4 5 6 5 2 4 0 1 3.875 0.258 8 I 2 3 4 5 4 1 4 5 0 3.500 0.286 7
- 44. 44Social Media Mining Measures and Metrics 44Social Media Mining Network Measureshttp://socialmediamining.info/ An Interesting Comparison! Comparing three centrality values β’ Generally, the 3 centrality types will be positively correlated β’ When they are not (or low correlation), it usually reveals interesting information Low Degree Low Closeness Low Betweenness High Degree Node is embedded in a community that is far from the rest of the network Ego's connections are redundant - communication bypasses the node High Closeness Key node connected to important/active alters Probably multiple paths in the network, ego is near many people, but so are many others High Betweenness Ego's few ties are crucial for network flow Very rare! Ego monopolizes the ties from a small number of people to many others. This slide is modified from a slide developed by James Moody
- 45. 45Social Media Mining Measures and Metrics 45Social Media Mining Network Measureshttp://socialmediamining.info/ Centrality for a group of nodes
- 46. 46Social Media Mining Measures and Metrics 46Social Media Mining Network Measureshttp://socialmediamining.info/ Group Centrality β’ All centrality measures defined so far measure centrality for a single node. These measures can be generalized for a group of nodes. β’ A simple approach is to replace all nodes in a group with a super node β The group structure is disregarded. β’ Let π denote the set of nodes in the group and π β π the set of outsiders
- 47. 47Social Media Mining Measures and Metrics 47Social Media Mining Network Measureshttp://socialmediamining.info/ I. Group Degree Centrality β Normalization: II. Group Betweenness Centrality β Normalization: Group Centrality divide by |π β π| divide by
- 48. 48Social Media Mining Measures and Metrics 48Social Media Mining Network Measureshttp://socialmediamining.info/ III. Group Closeness Centrality β It is the average distance from non-members to the group β’ One can also utilize the maximum distance or the average distance Group Centrality
- 49. 49Social Media Mining Measures and Metrics 49Social Media Mining Network Measureshttp://socialmediamining.info/ Group Centrality Example β’ Consider π = {π£2, π£3} β’ Group degree centrality = β’ Group betweenness centrality = β’ Group closeness centrality = 3 3 1
- 50. 50Social Media Mining Measures and Metrics 50Social Media Mining Network Measureshttp://socialmediamining.info/ β’ Transitivity/Reciprocity β’ Status/Balance Friendship Patterns
- 51. 51Social Media Mining Measures and Metrics 51Social Media Mining Network Measureshttp://socialmediamining.info/ I. Transitivity and Reciprocity
- 52. 52Social Media Mining Measures and Metrics 52Social Media Mining Network Measureshttp://socialmediamining.info/ Transitivity β’ Mathematic representation: β For a transitive relation π : β’ In a social network: β Transitivity is when a friend of my friend is my friend β Transitivity in a social network leads to a denser graph, which in turn is closer to a complete graph β We can determine how close graphs are to the complete graph by measuring transitivity ππΉπ or ππΉπ ?
- 53. 53Social Media Mining Measures and Metrics 53Social Media Mining Network Measureshttp://socialmediamining.info/ [Global] Clustering Coefficient β’ Clustering coefficient measures transitivity in undirected graphs β Count paths of length two and check whether the third edge exists When counting triangles, since every triangle has 6 closed paths of length 2
- 54. 54Social Media Mining Measures and Metrics 54Social Media Mining Network Measureshttp://socialmediamining.info/ Clustering Coefficient and Triples Or we can rewrite it as β’ Triple: an ordered set of three nodes, β connected by two (open triple) edges or β three edges (closed triple) β’ A triangle can miss any of its three edges β A triangle has 3 Triples π£π π£π π£ π and π£π π£ π π£πare different triples β’ The same members β’ First missing edge π(π£ π, π£π) and second missing π(π£π, π£π) π£π π£π π£ πand π£ π π£π π£πare the same triple
- 55. 55Social Media Mining Measures and Metrics 55Social Media Mining Network Measureshttp://socialmediamining.info/ [Global] Clustering Coefficient: Example
- 56. 56Social Media Mining Measures and Metrics 56Social Media Mining Network Measureshttp://socialmediamining.info/ Local Clustering Coefficient β’ Local clustering coefficient measures transitivity at the node level β Commonly employed for undirected graphs β Computes how strongly neighbors of a node π£ (nodes adjacent to π£) are themselves connected In an undirected graph, the denominator can be rewritten as: Provides a way to determine structural holes Structural Holes
- 57. 57Social Media Mining Measures and Metrics 57Social Media Mining Network Measureshttp://socialmediamining.info/ Local Clustering Coefficient: Example β’ Thin lines depict connections to neighbors β’ Dashed lines are the missing link among neighbors β’ Solid lines indicate connected neighbors β When none of neighbors are connected πΆ = 0 β When all neighbors are connected πΆ = 1
- 58. 58Social Media Mining Measures and Metrics 58Social Media Mining Network Measureshttp://socialmediamining.info/ Reciprocity If you become my friend, Iβll be yours β’ Reciprocity is simplified version of transitivity β It considers closed loops of length 2 β’ If node π£ is connected to node π’, β π’ by connecting to π£, exhibits reciprocity What about π = π ?
- 59. 59Social Media Mining Measures and Metrics 59Social Media Mining Network Measureshttp://socialmediamining.info/ Reciprocity: Example Reciprocal nodes: π£1, π£2
- 60. 60Social Media Mining Measures and Metrics 60Social Media Mining Network Measureshttp://socialmediamining.info/ β’ Measuring consistency in friendships II. Balance and Status
- 61. 61Social Media Mining Measures and Metrics 61Social Media Mining Network Measureshttp://socialmediamining.info/ Social Balance Theory Social balance theory β Consistency in friend/foe relationships among individuals β Informally, friend/foe relationships are consistent when β’ In the network β Positive edges demonstrate friendships (π€ππ = 1) β Negative edges demonstrate being enemies (π€ππ = β1) β’ Triangle of nodes π, π, and π, is balanced, if and only if β π€ππ denotes the value of the edge between nodes π and π
- 62. 62Social Media Mining Measures and Metrics 62Social Media Mining Network Measureshttp://socialmediamining.info/ Social Balance Theory: Possible Combinations For any cycle, if the multiplication of edge values become positive, then the cycle is socially balanced
- 63. 63Social Media Mining Measures and Metrics 63Social Media Mining Network Measureshttp://socialmediamining.info/ Social Status Theory β’ Status: how prestigious an individual is ranked within a society β’ Social status theory: β How consistent individuals are in assigning status to their neighbors β Informally,
- 64. 64Social Media Mining Measures and Metrics 64Social Media Mining Network Measureshttp://socialmediamining.info/ Social Status Theory: Example β’ A directed β+β edge from node π to node π shows that π has a higher status than π and a β-β one shows vice versa Unstable configuration Stable configuration
- 65. 65Social Media Mining Measures and Metrics 65Social Media Mining Network Measureshttp://socialmediamining.info/ β’ Structural Equivalence β’ Regular Equivalence Similarity How similar are two nodes in a network?
- 66. 66Social Media Mining Measures and Metrics 66Social Media Mining Network Measureshttp://socialmediamining.info/ Structural Equivalence β’ Structural Equivalence: β We look at the neighborhood shared by two nodes; β The size of this shared neighborhood defines how similar two nodes are. β’ Example: β Two brothers have in common β’ sisters, mother, father, grandparents, etc. β This shows that they are similar, β Two random male or female individuals do not have much in common and are dissimilar.
- 67. 67Social Media Mining Measures and Metrics 67Social Media Mining Network Measureshttp://socialmediamining.info/ β’ Vertex similarity: β’ The neighborhood π(π£) often excludes the node itself π£. β What can go wrong? β’ Connected nodes not sharing a neighbor will be assigned zero similarity β Solution: β’ We can assume nodes are included in their neighborhoods Structural Equivalence: Definitions Jaccard Similarity: Cosine Similarity: Normalize?
- 68. 68Social Media Mining Measures and Metrics 68Social Media Mining Network Measureshttp://socialmediamining.info/ Similarity: Example
- 69. 69Social Media Mining Measures and Metrics 69Social Media Mining Network Measureshttp://socialmediamining.info/ Similarity Significance Measuring Similarity Significance: compare the calculated similarity value with its expected value where vertices pick their neighbors at random β’ For vertices π£π and π£π with degrees ππ and ππ this expectation is ππ ππ/π β There is a ππ/π chance of becoming π£πβs neighbor β π£π selects ππ neighbors β’ We can rewrite neighborhood overlap as
- 70. 70Social Media Mining Measures and Metrics 70Social Media Mining Network Measureshttp://socialmediamining.info/ Normalized Similarity, cont. What is this?
- 71. 71Social Media Mining Measures and Metrics 71Social Media Mining Network Measureshttp://socialmediamining.info/ Normalized Similarity, cont. π times the Covariance between π¨π and π¨π Normalize covariance by the multiplication of Variances. We get Pearson correlation coefficient (range of ο³ ο [-1,1] )
- 72. 72Social Media Mining Measures and Metrics 72Social Media Mining Network Measureshttp://socialmediamining.info/ Regular Equivalence β’ In regular equivalence, β We do not look at neighborhoods shared between individuals, but β How neighborhoods themselves are similar β’ Example: β Athletes are similar not because they know each other in person, but since they know similar individuals, such as coaches, trainers, other players, etc.
- 73. 73Social Media Mining Measures and Metrics 73Social Media Mining Network Measureshttp://socialmediamining.info/ β’ π£π, π£π are similar when their neighbors π£ π and π£π are similar β’ The equation (left figure) is hard to solve since it is self referential so we relax our definition using the right figure Regular Equivalence
- 74. 74Social Media Mining Measures and Metrics 74Social Media Mining Network Measureshttp://socialmediamining.info/ Regular Equivalence β’ π£π and π£π are similar when π£π is similar to π£πβs neighbors π£ π β’ In vector format A vertex is highly similar to itself, we guarantee this by adding an identity matrix to the equation Wπ‘ππ§ πΌ < π/π πππ the matrix is invertible
- 75. 75Social Media Mining Measures and Metrics 75Social Media Mining Network Measureshttp://socialmediamining.info/ Regular Equivalence: Example β’ Any row/column of this matrix shows the similarity to other vertices β’ Vertex 1 is most similar (other than itself) to vertices 2 and 3 β’ Nodes 2 and 3 have the highest similarity (regular equivalence) The largest eigenvalue of π΄ is 2.43 Set πΌ = 0.3 < 1/2.43

No public clipboards found for this slide

Be the first to comment