Socialnetworkanalysis (Tin180 Com)


Published on - A wholesome culture and news site

Published in: Business, Technology, Education

    1. 1. Social Network Analysis <ul><li>Social Network Introduction </li></ul><ul><li>Statistics and Probability Theory </li></ul><ul><li>Models of Social Network Generation </li></ul><ul><li>Networks in Biological Systems </li></ul>
    2. 2. Society. Nodes: individuals. Links: social relationships (family, work, friendship, etc.). Social networks: many individuals with diverse social interactions between them. Referenced on the slide: S. Milgram (1967); John Guare, Six Degrees of Separation.
    3. 3. Communication networks. The Earth is developing an electronic nervous system: a network whose diverse nodes and links include computers, routers, satellites, phone lines, TV cables, and EM waves. Communication networks: many non-identical components with diverse connections between them.
    4. 4. New York Times. Complex systems: made of many non-identical elements connected by diverse interactions. NETWORK.
    5. 5. “Natural” Networks and Universality <ul><li>Consider many kinds of networks: </li></ul><ul><ul><li>social, technological, business, economic, content, … </li></ul></ul><ul><li>These networks tend to share certain informal properties: </li></ul><ul><ul><li>large scale; continual growth </li></ul></ul><ul><ul><li>distributed, organic growth: vertices “decide” who to link to </li></ul></ul><ul><ul><li>interaction restricted to links </li></ul></ul><ul><ul><li>mixture of local and long-distance connections </li></ul></ul><ul><ul><li>abstract notions of distance: geographical, content, social, … </li></ul></ul><ul><li>Do natural networks share more quantitative universals? </li></ul><ul><li>What would these “universals” be? </li></ul><ul><li>How can we make them precise and measure them? </li></ul><ul><li>How can we explain their universality? </li></ul><ul><li>This is the domain of social network theory </li></ul><ul><li>Sometimes also referred to as link analysis </li></ul>
    6. 6. Some Interesting Quantities <ul><li>Connected components: </li></ul><ul><ul><li>how many, and how large? </li></ul></ul><ul><li>Network diameter: </li></ul><ul><ul><li>maximum (worst-case) or average? </li></ul></ul><ul><ul><li>exclude infinite distances? (disconnected components) </li></ul></ul><ul><ul><li>the small-world phenomenon </li></ul></ul><ul><li>Clustering: </li></ul><ul><ul><li>to what extent do links tend to cluster “locally”? </li></ul></ul><ul><ul><li>what is the balance between local and long-distance connections? </li></ul></ul><ul><ul><li>what roles do the two types of links play? </li></ul></ul><ul><li>Degree distribution: </li></ul><ul><ul><li>what is the typical degree in the network? </li></ul></ul><ul><ul><li>what is the overall distribution? </li></ul></ul>
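The quantities on this slide can be computed directly on a small graph. The following is a minimal sketch in plain Python, assuming an undirected graph stored as a dictionary of neighbor sets; the toy graph and all function names are illustrative assumptions, not part of the slides. The diameter here is the worst-case finite distance, so infinite distances between components are excluded.

```python
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """List of vertex sets, one per connected component."""
    seen, components = set(), []
    for s in adj:
        if s not in seen:
            comp = set(bfs_distances(adj, s))
            seen |= comp
            components.append(comp)
    return components


def diameter(adj):
    """Largest finite shortest-path distance (infinite distances excluded)."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)


def degree_distribution(adj):
    """Map each degree value to the number of vertices with that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())


# Hypothetical toy graph: a path a-b-c-d plus a separate edge e-f.
adj = {
    "a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"},
    "e": {"f"}, "f": {"e"},
}
components = connected_components(adj)
graph_diameter = diameter(adj)
degrees = degree_distribution(adj)
```

Running one breadth-first search per vertex is fine for small graphs; for large networks the average distance is usually estimated from a sample of source vertices instead.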
    7. 7. A “Canonical” Natural Network has… <ul><li>Few connected components: </li></ul><ul><ul><li>often only 1 or a small number, independent of network size </li></ul></ul><ul><li>Small diameter: </li></ul><ul><ul><li>often a constant independent of network size (like 6) </li></ul></ul><ul><ul><li>or perhaps growing only logarithmically with network size, or even shrinking? </li></ul></ul><ul><ul><li>typically exclude infinite distances </li></ul></ul><ul><li>A high degree of clustering: </li></ul><ul><ul><li>considerably more so than for a random network </li></ul></ul><ul><ul><li>in tension with small diameter </li></ul></ul><ul><li>A heavy-tailed degree distribution: </li></ul><ul><ul><li>a small but reliable number of high-degree vertices </li></ul></ul><ul><ul><li>often of power law form </li></ul></ul>
    8. 8. Probabilistic Models of Networks <ul><li>All of the network generation models we will study are probabilistic or statistical in nature </li></ul><ul><li>They can generate networks of any size </li></ul><ul><li>They often have various parameters that can be set: </li></ul><ul><ul><li>size of network generated </li></ul></ul><ul><ul><li>average degree of a vertex </li></ul></ul><ul><ul><li>fraction of long-distance connections </li></ul></ul><ul><li>The models generate a distribution over networks </li></ul><ul><li>Statements are always statistical in nature: </li></ul><ul><ul><li>with high probability, the diameter is small </li></ul></ul><ul><ul><li>on average, the degree distribution has a heavy tail </li></ul></ul><ul><li>Thus, we’re going to need some basic statistics and probability theory </li></ul>
    9. 9. Zipf’s Law <ul><li>Look at the frequency of English words: </li></ul><ul><ul><li>“the” is the most common, followed by “of”, “to”, etc. </li></ul></ul><ul><ul><li>claim: frequency of the n-th most common word ~ 1/n (power law, α = 1) </li></ul></ul><ul><li>General theme: </li></ul><ul><ul><li>rank events by their frequency of occurrence </li></ul></ul><ul><ul><li>the resulting distribution is often a power law! </li></ul></ul><ul><li>Other examples: </li></ul><ul><ul><li>North American city sizes </li></ul></ul><ul><ul><li>personal income </li></ul></ul><ul><ul><li>file sizes </li></ul></ul><ul><ul><li>genus sizes (number of species) </li></ul></ul><ul><li>People dispute the exact form of these distributions (e.g. the value of α), but not that they have heavy tails </li></ul>
    10. 10. Zipf’s Law. The same data plotted on linear scales (both axes) and on logarithmic scales (both axes). Both plots show a Zipf distribution with 300 data points.
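On logarithmic axes a power law f(n) ~ 1/n^α becomes a straight line with slope -α, which is why the log-log plot is the usual way to inspect such data. The sketch below illustrates this with an idealized Zipf distribution over 300 ranks; the function names and the least-squares fit are assumptions made for illustration, not the method used to draw the slide's plots.

```python
import math


def zipf_frequencies(num_ranks, alpha=1.0):
    """Normalized frequencies proportional to 1 / rank**alpha."""
    weights = [1.0 / rank ** alpha for rank in range(1, num_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


freqs = zipf_frequencies(300)   # 300 data points, as on the slide
slope = loglog_slope(freqs)     # close to -alpha for an exact power law
```

Real rank-frequency data is noisy in the tail, so a fitted slope on measured data is only a rough estimate of α.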
    11. 11. Social Network Analysis <ul><li>Social Network Introduction </li></ul><ul><li>Statistics and Probability Theory </li></ul><ul><li>Models of Social Network Generation </li></ul><ul><li>Networks in Biological Systems </li></ul><ul><li>Summary </li></ul>
    12. 12. Some Models of Network Generation <ul><li>Random graphs (Erdős-Rényi models): </li></ul><ul><ul><li>give few components and small diameter </li></ul></ul><ul><ul><li>do not give high clustering or heavy-tailed degree distributions </li></ul></ul><ul><ul><li>are the most thoroughly studied and best understood models mathematically </li></ul></ul><ul><li>Watts-Strogatz models: </li></ul><ul><ul><li>give few components, small diameter and high clustering </li></ul></ul><ul><ul><li>do not give heavy-tailed degree distributions </li></ul></ul><ul><li>Scale-free networks: </li></ul><ul><ul><li>give few components, small diameter and a heavy-tailed degree distribution </li></ul></ul><ul><ul><li>do not give high clustering </li></ul></ul><ul><li>Hierarchical networks: </li></ul><ul><ul><li>few components, small diameter, high clustering, heavy-tailed </li></ul></ul><ul><li>Affiliation networks: </li></ul><ul><ul><li>model group-actor formation </li></ul></ul>
    13. 13. The Clustering Coefficient of a Network <ul><li>Let nbr(u) denote the set of neighbors of u in a graph </li></ul><ul><ul><li>all vertices v such that the edge (u,v) is in the graph </li></ul></ul><ul><li>The clustering coefficient of u: </li></ul><ul><ul><li>let k = |nbr(u)| (i.e., number of neighbors of u) </li></ul></ul><ul><ul><li>choose(k,2): max possible # of edges between vertices in nbr(u) </li></ul></ul><ul><ul><li>c(u) = (actual # of edges between vertices in nbr(u)) / choose(k,2) </li></ul></ul><ul><ul><li>0 <= c(u) <= 1; measure of cliquishness of u’s neighborhood </li></ul></ul><ul><li>Clustering coefficient of a graph: </li></ul><ul><ul><li>average of c(u) over all vertices u </li></ul></ul>Example from the slide figure: k = 4, choose(k,2) = 6, c(u) = 4/6 = 0.666…
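The definition of c(u) translates almost line for line into code. The sketch below assumes an undirected graph stored as a dictionary of neighbor sets; the example graph is a hypothetical one chosen to reproduce the slide's numbers (k = 4 neighbors with 4 edges among them), and treating vertices with fewer than two neighbors as c(u) = 0 is a convention assumed here, since the slide does not cover that case.

```python
from itertools import combinations


def clustering_coefficient(adj, u):
    """c(u) = (edges among neighbors of u) / choose(k, 2), with k = |nbr(u)|."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: choose(k, 2) is 0, so define c(u) as 0
    links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Clustering coefficient of the graph: mean of c(u) over all vertices."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)


# Hypothetical graph: u is joined to a, b, c, d, which form the cycle a-b-c-d-a.
adj = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "b", "d"},
    "b": {"u", "a", "c"},
    "c": {"u", "b", "d"},
    "d": {"u", "c", "a"},
}
c_u = clustering_coefficient(adj, "u")   # 4 of the 6 possible edges are present
c_graph = average_clustering(adj)
```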
    14. 14. The Clustering Coefficient of a Network. Clustering: my friends will likely know each other! The probability that two neighbors are connected is C ≫ p, where for a vertex with n neighbors C = (# of links between neighbors 1, 2, …, n) / (n(n-1)/2). Networks are clustered [large C(p)] but have a small characteristic path length [small L(p)].
    15. 15. Erdős-Rényi: Clustering Coefficient <ul><li>Generate a network G according to G(N,p) </li></ul><ul><li>Examine a “typical” vertex u in G </li></ul><ul><ul><li>choose u at random among all vertices in G </li></ul></ul><ul><ul><li>what do we expect c(u) to be? </li></ul></ul><ul><li>Answer: exactly p! </li></ul><ul><li>In G(N,m), expect c(u) to be 2m/(N(N-1)) </li></ul><ul><li>Both cases: c(u) entirely determined by overall density </li></ul><ul><li>Baseline for comparison with “more clustered” models </li></ul><ul><ul><li>Erdős-Rényi has no bias towards clustered or local edges </li></ul></ul>
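The claim that the expected clustering coefficient equals the edge density can be checked empirically. The following is a rough simulation sketch, assuming a G(N,p) sampler that flips an independent coin for every vertex pair; the parameter values, the seed, and the function names are arbitrary illustrative choices.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Sample G(n, p): each of the choose(n, 2) possible edges appears with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def average_clustering(adj):
    """Mean of c(u) over all vertices; vertices with fewer than 2 neighbors count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)


n, p = 200, 0.1
graph = erdos_renyi(n, p, random.Random(0))
measured = average_clustering(graph)                   # should land near p
num_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
density = 2 * num_edges / (n * (n - 1))                # the 2m/(N(N-1)) of G(N,m)
```

Both `measured` and `density` fluctuate around p from run to run, which is the sense in which Erdős-Rényi serves as a baseline with no bias towards local edges.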
    16. 16. Scale-free Networks <ul><li>The number of nodes (N) is not fixed </li></ul><ul><ul><li>Networks continuously expand through the addition of new nodes </li></ul></ul><ul><ul><ul><li>WWW: addition of new nodes </li></ul></ul></ul><ul><ul><ul><li>Citation: publication of new papers </li></ul></ul></ul><ul><li>The attachment is not uniform </li></ul><ul><ul><li>A new node is more likely to link to a node that already has a large number of links </li></ul></ul><ul><ul><ul><li>WWW: new documents link to well-known sites (CNN, Yahoo, Google) </li></ul></ul></ul><ul><ul><ul><li>Citation: well-cited papers are more likely to be cited again </li></ul></ul></ul>
    17. 17. Scale-Free Networks <ul><li>Start with (say) two vertices connected by an edge </li></ul><ul><li>For i = 3 to N: </li></ul><ul><ul><li>for each 1 <= j < i, d(j) = degree of vertex j so far </li></ul></ul><ul><ul><li>let Z = Σ d(j) (sum of all degrees so far) </li></ul></ul><ul><ul><li>add new vertex i with k edges back to {1, …, i-1}: </li></ul></ul><ul><ul><ul><li>i is connected back to j with probability d(j)/Z </li></ul></ul></ul><ul><li>Vertices j with high degree are likely to get more links! </li></ul><ul><li>“Rich get richer” </li></ul><ul><li>Natural model for many processes: </li></ul><ul><ul><li>hyperlinks on the web </li></ul></ul><ul><ul><li>new business and social contacts </li></ul></ul><ul><ul><li>transportation networks </li></ul></ul><ul><li>Generates a power law distribution of degrees </li></ul><ul><ul><li>in this basic model the exponent tends to 3 regardless of k; variants of the attachment rule change it </li></ul></ul>
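The growth procedure on this slide can be sketched as follows. This is one possible reading, not a definitive implementation: the slide does not say how repeated targets are handled, so the sketch assumes a new vertex redraws until it has min(k, i-1) distinct targets. The `endpoints` list is an implementation device assumed here: it holds every vertex once per incident edge, so a uniform draw from it selects vertex j with probability d(j)/Z.

```python
import random


def preferential_attachment(n, k, rng):
    """Grow a graph to n vertices; each new vertex links to k existing ones,
    choosing vertex j with probability proportional to its current degree d(j)."""
    adj = {1: {2}, 2: {1}}      # start with two vertices joined by an edge
    endpoints = [1, 2]          # each vertex appears d(j) times; len(endpoints) == Z
    for i in range(3, n + 1):
        targets = set()
        wanted = min(k, i - 1)  # cannot link to more vertices than exist
        while len(targets) < wanted:
            targets.add(rng.choice(endpoints))  # assumed: redraw on duplicates
        adj[i] = set()
        for j in targets:
            adj[i].add(j)
            adj[j].add(i)
            endpoints.extend((i, j))
    return adj


graph = preferential_attachment(2000, 2, random.Random(0))
degrees = sorted((len(nbrs) for nbrs in graph.values()), reverse=True)
average_degree = sum(degrees) / len(degrees)
largest_hub = degrees[0]        # far above the average: "rich get richer"
```

Comparing `largest_hub` with `average_degree` gives a quick feel for the heavy tail; a log-log histogram of `degrees` is the usual next step for inspecting the power law.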
    18. 18. Scale-Free Networks <ul><li>Preferential attachment explains </li></ul><ul><ul><li>heavy-tailed degree distributions </li></ul></ul><ul><ul><li>small diameter (~log(N), via “hubs”) </li></ul></ul><ul><li>Will not generate high clustering coefficient </li></ul><ul><ul><li>no bias towards local connectivity, but towards hubs </li></ul></ul>
    19. 19. Social Network Analysis <ul><li>Social Network Introduction </li></ul><ul><li>Statistics and Probability Theory </li></ul><ul><li>Models of Social Network Generation </li></ul><ul><li>Networks in Biological Systems </li></ul><ul><li>Mining on Social Network </li></ul><ul><li>Summary </li></ul>
    20. 20. Bio-Map. GENOME: protein-gene interactions. PROTEOME: protein-protein interactions. METABOLISM: bio-chemical reactions (figure: citrate cycle).
    21. 21. Metabolic Network. METABOLISM: bio-chemical reactions (figure: citrate cycle).
    22. 22. Boehringer Mannheim
    23. 23. Metabolic Network. Nodes: chemicals (substrates). Links: bio-chemical reactions.
    24. 24. Metabolic Network, P(k). Organisms from all three domains of life (Archaea, Bacteria, Eukaryotes) are scale-free networks! H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabási, Nature 407, 651 (2000).
    25. 25. Bio-Map. GENOME: protein-gene interactions. PROTEOME: protein-protein interactions. METABOLISM: bio-chemical reactions (figure: citrate cycle).
    26. 26. Protein Network. PROTEOME: protein-protein interactions.
    27. 27. Yeast Protein Network. Nodes: proteins. Links: physical interactions (binding). P. Uetz et al., Nature 403, 623-7 (2000).
    28. 28. Topology of the Protein Network, P(k). H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, Nature 411, 41-42 (2001).
    29. 29. p53 Network. … “One way to understand the p53 network is to compare it to the Internet. The cell, like the Internet, appears to be a ‘scale-free network’.” Nature 408, 307 (2000).
    30. 30. p53 Network (mammals), P(k).