Exploratory Social Network Analysis with Pajek




Published in: Education


  1. Exploratory Social Network Analysis with Pajek: Fundamentals in Social Network Analysis, by Wouter de Nooy, Andrej Mrvar and Vladimir Batagelj. Slides created by Thomas Plotkowiak, 26.08.2010.
  2. Agenda: (1) Fundamentals, (2) Attributes and Relations, (3) Cohesion, (4) Sentiments and Friendship, (5) Affiliations, (6) Core – Periphery
  3. 1 - Fundamentals
  4. Fundamentals: Sociometry studies interpersonal relations. Society is not an aggregate of individuals and their characteristics (as statisticians assume) but a structure of interpersonal ties. Therefore, the individual is not the basic social unit. The social atom consists of an individual and his or her social, economic, or cultural ties. Social atoms are linked into groups, and, ultimately, society consists of interrelated groups.
  5. Example of a Sociogram: Choices of twenty-six girls living in one dormitory at a New York State training school. The girls were asked to choose the girls they liked best as their dining-table partners.
  6. Exploratory Social Network Analysis: The main goal of social network analysis is detecting and interpreting patterns of social ties among actors. It consists of four parts: 1. the definition of a network, 2. network manipulation, 3. determination of structural features, 4. visual inspection.
  7. Network Definition: A graph is a set of vertices and a set of lines between pairs of vertices. A vertex is the smallest unit in a network; in SNA it represents an actor (girl, organization, country, …). A line is a tie between two vertices in a network; in SNA it can be any social relation. A loop is a special kind of line, namely a line that connects a vertex to itself.
  8. Network Definition II: A directed line is called an arc, whereas an undirected line is called an edge. A directed graph or digraph contains one or more arcs. An undirected graph contains no arcs (all of its lines are edges). A simple directed graph contains no multiple arcs. A simple undirected graph contains neither multiple edges nor loops.
  9. Network Definition III: A network consists of a graph and additional information on the vertices or the lines of the graph.
  10. Application: We use the computer program Pajek (Slovenian for spider) to analyze and draw social networks (get it from http://vlado.fmf.uni-lj.si/pub/networks/). (Figure: a Pajek network file, annotated with the number of vertices, a specific vertex and orientation, and the list of arcs.)
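Pajek reads networks from plain-text .net files. The sketch below writes and reads back a small network, assuming the simple layout of a `*Vertices n` header, numbered and quoted vertex labels, and `*Arcs` / `*Edges` sections of `from to [value]` lines. Coordinates, shapes, time information and other optional fields are not handled; `to_pajek_net`, `from_pajek_net` and the vertex names are hypothetical.

```python
def to_pajek_net(labels, arcs=(), edges=()):
    """Serialize a network as Pajek .net text (labels, arcs and edges only).

    Vertices are numbered 1..n in the order of `labels`; each arc or edge is a
    (from, to) or (from, to, value) tuple of vertex numbers.
    """
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{number} "{label}"' for number, label in enumerate(labels, start=1)]
    for header, section in (("*Arcs", arcs), ("*Edges", edges)):
        if section:  # write a section header only when it has lines
            lines.append(header)
            lines += [" ".join(str(x) for x in line) for line in section]
    return "\n".join(lines) + "\n"


def from_pajek_net(text):
    """Parse the simple subset of the .net format produced by to_pajek_net."""
    labels, arcs, edges = [], [], []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            section = line.split()[0].lower()
        elif section == "*vertices":
            _, label = line.split(" ", 1)  # '3 "Mary Ann"' -> 'Mary Ann'
            labels.append(label.strip().strip('"'))
        elif section in ("*arcs", "*edges"):
            fields = line.split()
            entry = (int(fields[0]), int(fields[1])) + tuple(float(x) for x in fields[2:3])
            (arcs if section == "*arcs" else edges).append(entry)
    return labels, arcs, edges


# Hypothetical example: Ann chooses Bob, Bob chooses Cy with value 2.
net = to_pajek_net(["Ann", "Bob", "Cy"], arcs=[(1, 2), (2, 3, 2)])
print(net)
```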
  11. Pajek Main Screen
  12. Manipulation: Suppose we want to change reciprocated choices in the dining-table partners network into edges.
  13. Calculation: Suppose we want to calculate the total number of lines.
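Both steps can be sketched outside Pajek. Assuming a simple directed network given as a list of (tail, head) pairs, each pair of reciprocated arcs becomes one edge, unreciprocated arcs stay arcs, and the number of lines is the number of remaining arcs plus edges. The function names and the example arcs are hypothetical.

```python
def reciprocated_to_edges(arcs):
    """Replace each reciprocated pair u->v, v->u by a single edge {u, v}.

    Assumes a simple directed network (no multiple arcs); loops stay arcs.
    """
    arc_set = set(arcs)
    edges = set()
    remaining = []
    for tail, head in arcs:
        if tail != head and (head, tail) in arc_set:
            edges.add(tuple(sorted((tail, head))))  # one edge per reciprocated pair
        else:
            remaining.append((tail, head))
    return remaining, sorted(edges)


def count_lines(arcs, edges):
    """Total number of lines = arcs + edges."""
    return len(arcs) + len(edges)


choices = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3)]  # hypothetical dining-table choices
arcs, edges = reciprocated_to_edges(choices)
print(arcs, edges, count_lines(arcs, edges))
```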
  14. Visualization
  15. Automatic Drawing • Layout by Energy: Move vertices to locations that minimize the variation in line length (imagine that the lines are springs pulling vertices together, though never too close). • Energy Layouts: – Kamada-Kawai (computationally expensive) – Fruchterman-Reingold (faster) • Draw by Hand
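The energy idea can be illustrated with a Kamada-Kawai-style stress function: the energy is the squared difference between each pair's drawn distance and its graph distance, weighted by 1/d². The sketch minimizes it with plain gradient descent, which is a simplification (Kamada and Kawai move one vertex at a time with Newton-Raphson steps) and not Pajek's implementation. It assumes a connected, undirected network; all names are hypothetical.

```python
import math
import random
from collections import deque


def graph_distances(adj, source):
    """Breadth-first search distances from source in an unweighted network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def stress(pos, dist):
    """Energy: sum over vertex pairs of (drawn length - graph distance)^2 / distance^2."""
    nodes = list(pos)
    total = 0.0
    for a in range(len(nodes)):
        for b in range(a + 1, len(nodes)):
            i, j = nodes[a], nodes[b]
            d = dist[i][j]
            total += (math.dist(pos[i], pos[j]) - d) ** 2 / d ** 2
    return total


def spring_layout(adj, iterations=300, step=0.02, seed=42):
    """Start from random positions and follow the negative gradient of the stress."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in adj}
    dist = {v: graph_distances(adj, v) for v in adj}
    for _ in range(iterations):
        grad = {v: [0.0, 0.0] for v in adj}
        for i in adj:
            for j in adj:
                if i == j:
                    continue
                d = dist[i][j]
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                drawn = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                coeff = 2.0 * (drawn - d) / (d * d * drawn)
                grad[i][0] += coeff * dx
                grad[i][1] += coeff * dy
        for v in adj:
            pos[v][0] -= step * grad[v][0]
            pos[v][1] -= step * grad[v][1]
    return pos, dist


# Hypothetical example: a ring of six vertices.
ring = {i: [i % 6 + 1, (i - 2) % 6 + 1] for i in range(1, 7)}
layout, distances = spring_layout(ring)
```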
  16. Exporting: VRML, MDL, Kinemages, Bitmap, SVG, EPS
  17. 2 - Attributes and Relations
  18. Example – The world system: In 1974, Immanuel Wallerstein introduced the concept of a capitalist world system, which came into existence in the sixteenth century. This system is characterized by a world economy that is stratified into a core, a semiperiphery, and a periphery. Countries owe their wealth or poverty to their position in the world economy. The core, Wallerstein argues, exists because it succeeds in exploiting the periphery and, to a lesser extent, the semiperiphery. The semiperiphery profits from being an intermediary between the core and the periphery. Which countries belong to the core, semiperiphery or periphery?
  19. The world system network • The network contains 80 countries with the attributes: • continent • world system position in 1980 • gross domestic product per capita in U.S. dollars in 1995 • The arcs represent imports (of metal) into one country from another.
  20. Partition: A partition of a network is a classification or clustering of the vertices in the network such that each vertex is assigned to exactly one class or cluster.
  21. Partition Load & Edit • File > Partition > Read (.clu file) • File > Partition > Edit
  22. Info on Partition Distribution • Info > Partition
  23. Application – Draw Partition • Draw > Draw Partition
  24. Reduction of a Network: To extract a subnetwork from a network, select a subset of its vertices and all lines that are only incident with the selected vertices. • Operations > Extract from Network (select class 6)
  25. Partition – Local View: 1. Partitions > Extract Second from First
  26. Global View: To shrink a network, replace a subset of its vertices by one new vertex that is incident to all lines that were incident with the vertices of the subset in the original network. 1. Operations > Shrink Network
  27. Contextual View: In a contextual view, all classes are shrunk except the one in which you are particularly interested. • Operations > Shrink Network (don't shrink class 6)
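The three reductions (local, global and contextual view) can be sketched as plain set operations on a partition, assuming the network is a list of arcs and the partition a dict from vertex to class number. In this sketch lines inside a shrunk class become loops on the new vertex, and multiple lines are counted; Pajek's own handling of these may differ. The functions and data are hypothetical.

```python
from collections import Counter


def extract(arcs, partition, keep_classes):
    """Local view: keep the vertices of the selected classes and the lines among them."""
    keep = {v for v, c in partition.items() if c in keep_classes}
    return keep, [(u, v) for u, v in arcs if u in keep and v in keep]


def shrink(arcs, partition, keep_class=None):
    """Global view: replace each class by one new vertex.

    If keep_class is given, its vertices are not shrunk (contextual view).
    Returns a Counter mapping (new tail, new head) to the number of original lines.
    """
    def new_id(v):
        c = partition[v]
        return ("vertex", v) if c == keep_class else ("class", c)

    return Counter((new_id(u), new_id(v)) for u, v in arcs)


partition = {1: 1, 2: 1, 3: 2, 4: 2, 5: 6}  # hypothetical classes
arcs = [(1, 2), (2, 3), (3, 5), (4, 5), (5, 1)]
print(extract(arcs, partition, {2, 6}))
print(shrink(arcs, partition))
print(shrink(arcs, partition, keep_class=6))
```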
  28. Vectors and Coordinates Load & Edit: A vector assigns a numerical value to each vertex in a network. • File > Vector > Read (.vec file) • File > Vector > Edit
  29. Info on a Vector • Info > Vector
  30. Vector Partition • Vector > Make Partition > by Truncating (Abs) • Vector > Make Partition > by Intervals > First Threshold and Step • Vector > Make Partition > by Intervals > Selected Thresholds
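The two interval options can be sketched as follows, assuming a vector is a dict from vertex to value and that a value equal to a threshold falls into the lower class (Pajek's exact boundary convention may differ). The truncation option is not covered, and the GDP figures are hypothetical.

```python
import bisect


def partition_by_thresholds(vector, thresholds):
    """Class 1 for values <= first threshold, class 2 up to the second, ...,
    and class len(thresholds) + 1 for values above the last threshold."""
    cuts = sorted(thresholds)
    return {v: bisect.bisect_left(cuts, x) + 1 for v, x in vector.items()}


def partition_by_step(vector, first, step, count):
    """Thresholds first, first + step, ..., using `count` thresholds."""
    return partition_by_thresholds(vector, [first + k * step for k in range(count)])


gdp = {"A": 500, "B": 2000, "C": 9000, "D": 25000}  # hypothetical GDP per capita
print(partition_by_thresholds(gdp, [1000, 10000]))
print(partition_by_step(gdp, 5000, 5000, 3))
```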
  31. Draw Vector & Partition
  32. Global View & Vectors 1. Vector > Shrink Vector (Sum)
  33. Network Analysis and Statistics • Example: Crosstabulation of two partitions and some measures of association between the classifications represented by two partitions. • Partition > Info > Cramer's V, Rajski • Cramér's V measures the statistical dependence between two classifications. • Rajski's indices measure the degree to which the information in one classification is preserved in the other classification.
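Both measures can be computed from the crosstabulation of two partitions. The sketch uses the textbook formulas: Cramér's V = sqrt(χ² / (n · (min(r, c) − 1))), and, for Rajski, the mutual information I divided by the joint entropy (symmetric) or by the entropy of one partition (asymmetric). These are assumptions about the definitions; Pajek's output format and labels may differ. Each partition needs at least two classes.

```python
import math
from collections import Counter


def crosstab(p1, p2):
    """Counts of vertices per (class in p1, class in p2)."""
    return Counter((p1[v], p2[v]) for v in p1)


def cramers_v(p1, p2):
    n = len(p1)
    table = crosstab(p1, p2)
    rows, cols = Counter(p1.values()), Counter(p2.values())
    chi2 = 0.0
    for r, nr in rows.items():
        for c, nc in cols.items():
            expected = nr * nc / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))


def entropy(counts, n):
    return -sum(k / n * math.log2(k / n) for k in counts if k)


def rajski(p1, p2):
    """Share of information preserved: symmetric, and relative to each partition."""
    n = len(p1)
    h1 = entropy(Counter(p1.values()).values(), n)
    h2 = entropy(Counter(p2.values()).values(), n)
    h12 = entropy(crosstab(p1, p2).values(), n)
    mutual = h1 + h2 - h12
    return {"symmetric": mutual / h12, "of_first": mutual / h1, "of_second": mutual / h2}


same = {1: "a", 2: "a", 3: "b", 4: "b"}   # hypothetical partitions
other = {1: "x", 2: "y", 3: "x", 4: "y"}
print(cramers_v(same, same), cramers_v(same, other))
```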
  34. END OF LESSON 1
  35. 3 - Cohesion
  36. Cohesive Subgroups: We hypothesize that cohesive subgroups are the basis for solidarity, shared norms, identity and collective behavior. Perceived similarity, for instance membership of a social group, is expected to promote interaction. We expect similar people to interact a lot, at least more often than with dissimilar people. This phenomenon is called homophily: "Birds of a feather flock together."
  37. Example – Families in Haciendas (1948): Each arc represents "frequent visits" from one family to another.
  38. Density & Degree I: Density is the number of lines in a simple network, expressed as a proportion of the maximum possible number of lines. A complete network is a network with maximum density. The degree of a vertex is the number of lines incident with it.
  39. Density & Degree II: Two vertices are adjacent if they are connected by a line. The indegree of a vertex is the number of arcs it receives. The outdegree is the number of arcs it sends. To symmetrize a directed network is to replace unilateral and bidirectional arcs by edges.
  40. Computing Density • Info > Network > General
  41. Computing Degree • Net > Transform > Arcs → Edges > All • Net > Partitions > Degree > {In, Out, All}
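The definitions of density, indegree, outdegree and symmetrizing translate directly into code. A minimal sketch, assuming a simple network without loops (so the maximum is g(g − 1) arcs or g(g − 1)/2 edges for g vertices); the example arcs are hypothetical.

```python
from collections import Counter


def density(vertices, lines, directed):
    """Lines present as a proportion of the maximum in a simple network without loops."""
    g = len(vertices)
    max_lines = g * (g - 1) if directed else g * (g - 1) / 2
    return len(lines) / max_lines


def in_out_degrees(vertices, arcs):
    """Indegree = arcs received, outdegree = arcs sent."""
    indeg = Counter(head for _, head in arcs)
    outdeg = Counter(tail for tail, _ in arcs)
    return {v: indeg[v] for v in vertices}, {v: outdeg[v] for v in vertices}


def symmetrize(arcs):
    """Replace unilateral and bidirectional arcs by (single) edges."""
    return {frozenset(arc) for arc in arcs if arc[0] != arc[1]}


def edge_degrees(vertices, edges):
    """Degree = number of edges incident with the vertex."""
    deg = Counter(v for edge in edges for v in edge)
    return {v: deg[v] for v in vertices}


vertices = [1, 2, 3, 4]
arcs = [(1, 2), (2, 1), (2, 3), (4, 3)]  # hypothetical "frequent visits"
edges = symmetrize(arcs)
print(density(vertices, arcs, True), density(vertices, edges, False))
```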
  42. Components: A semiwalk from vertex u to vertex v is a sequence of lines such that the end vertex of one line is the starting vertex of the next line and the sequence starts at vertex u and ends at vertex v. A walk is a semiwalk with the additional condition that none of its lines are an arc of which the end vertex is the arc's tail. Note that v5 v3 v4 v5 v3 is also a walk to v3.
  43. Paths: A semipath is a semiwalk in which no vertex in between the first and last vertex of the semiwalk occurs more than once. A path is a walk in which no vertex in between the first and last vertex of the walk occurs more than once.
  44. Connectedness: A network is (weakly) connected if each pair of vertices is connected by a semipath. A network is strongly connected if each pair of vertices is connected by a path. This network is not connected because v2 is isolated.
  45. Connected Components: A (weak) component is a maximal (weakly) connected subnetwork. A strong component is a maximal strongly connected subnetwork. v1, v3, v4, v5 are a weak component; v3, v4, v5 are a strong component.
  46. Example Strong Components: 1. Net > Components > {Strong, Weak}
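Weak components ignore arc direction (semipaths); strong components require paths in both directions. The sketch finds weak components by search on the undirected version and strong components with Kosaraju's two-pass algorithm, which is a standard technique and not necessarily the one Pajek uses. The example arcs are hypothetical, chosen only to reproduce the slide's description (v2 isolated, v1 pointing into the cycle v3 → v4 → v5 → v3).

```python
def weak_components(vertices, arcs):
    """Components of the network with arc directions ignored."""
    nbrs = {v: set() for v in vertices}
    for t, h in arcs:
        nbrs[t].add(h)
        nbrs[h].add(t)
    comps, seen = [], set()
    for root in vertices:
        if root in seen:
            continue
        comp, todo = set(), [root]
        seen.add(root)
        while todo:
            v = todo.pop()
            comp.add(v)
            for w in nbrs[v]:
                if w not in seen:
                    seen.add(w)
                    todo.append(w)
        comps.append(comp)
    return comps


def strong_components(vertices, arcs):
    """Kosaraju: order vertices by DFS finishing time, then search the reversed network."""
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for t, h in arcs:
        succ[t].append(h)
        pred[h].append(t)
    order, seen = [], set()
    for root in vertices:  # first pass: iterative depth-first search
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, it = stack[-1]
            nxt = next((w for w in it if w not in seen), None)
            if nxt is None:
                stack.pop()
                order.append(v)  # v is finished
            else:
                seen.add(nxt)
                stack.append((nxt, iter(succ[nxt])))
    comps, assigned = [], set()
    for root in reversed(order):  # second pass on reversed arcs
        if root in assigned:
            continue
        comp, todo = set(), [root]
        assigned.add(root)
        while todo:
            v = todo.pop()
            comp.add(v)
            for w in pred[v]:
                if w not in assigned:
                    assigned.add(w)
                    todo.append(w)
        comps.append(comp)
    return comps


vertices = [1, 2, 3, 4, 5]
arcs = [(1, 3), (3, 4), (4, 5), (5, 3)]  # hypothetical arcs
print(weak_components(vertices, arcs), strong_components(vertices, arcs))
```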
  47. Cliques and Complete Subnetworks: A clique is a maximal complete subnetwork containing three vertices or more (cliques can overlap). v2, v4, v5 is not a clique; v1, v6, v5 is a clique; v2, v3, v4, v5 is a clique.
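Maximal complete subnetworks can be enumerated with the Bron–Kerbosch algorithm (a standard method, not necessarily Pajek's), keeping only those with three or more vertices. The edge list is hypothetical, built to match the slide's statements: {1, 5, 6} and {2, 3, 4, 5} are cliques, while {2, 4, 5} is not maximal.

```python
def adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_complete_subnetworks(adj):
    """Bron-Kerbosch with pivoting: all maximal sets of mutually adjacent vertices."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(r)
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def cliques(adj):
    """Cliques in the slide's sense: maximal complete subnetworks of size >= 3."""
    return [c for c in maximal_complete_subnetworks(adj) if len(c) >= 3]


adj = adjacency([(1, 6), (1, 5), (6, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)])
print(sorted(sorted(c) for c in cliques(adj)))
```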
  48. n-Clique & n-Clan: An n-clique is a maximal subnetwork in which every pair of vertices is at distance at most n in the analyzed network as a whole. A clique is an n-clique with n = 1. An n-clan is an n-clique in which every pair of vertices is at distance at most n within the resulting subnetwork itself. (Figure: a 2-clique and a 2-clan.)
  49. n-Clans & n-Cliques (Figure: network on vertices 1 to 6.) 2-clans: 123, 234, 345, 456, 561, 612. 2-cliques: 123, 234, 345, 456, 561, 612 and 135, 246.
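The difference between n-cliques and n-clans lies in where distances are measured. The sketch checks both conditions, assuming the slide's figure is the six-cycle 1-2-3-4-5-6-1 (the listed 2-cliques and 2-clans are consistent with that). Since the distance condition also holds for every subset of a valid group, maximality can be checked by trying to add single vertices. Function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def distances_from(adj, source, allowed=None):
    """BFS distances; if `allowed` is given, paths may only pass through those vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist and (allowed is None or w in allowed):
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def within_distance(adj, group, n, inside_only):
    """True if every pair in `group` is at distance <= n (inside the group if inside_only)."""
    allowed = set(group) if inside_only else None
    return all(
        distances_from(adj, u, allowed).get(v, float("inf")) <= n
        for u, v in combinations(group, 2)
    )


def is_n_clique(adj, group, n):
    """Distances measured in the whole network; no further vertex can be added."""
    if not within_distance(adj, group, n, inside_only=False):
        return False
    return all(
        not within_distance(adj, set(group) | {w}, n, inside_only=False)
        for w in adj if w not in group
    )


def is_n_clan(adj, group, n):
    """An n-clique whose distances stay within n inside the subnetwork itself."""
    return is_n_clique(adj, group, n) and within_distance(adj, group, n, inside_only=True)


hexagon = {i: {i % 6 + 1, (i - 2) % 6 + 1} for i in range(1, 7)}  # assumed figure
print(is_n_clique(hexagon, {1, 3, 5}, 2), is_n_clan(hexagon, {1, 3, 5}, 2))
```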
  50. k-Plexes: A k-plex is a maximal subnetwork with g_s vertices in which each vertex is connected to at least g_s − k vertices of the subnetwork. (Figure: network on vertices 1 to 6.) 2-plexes: 1234, 2345, 3456, 4561, 5612, 6123. In general, k-plexes are more robust than cliques and clans.
  51. Overview Subgroups (Figure: small example networks on vertices 1 to 4, compared by their numbers of components, 2-clans, 2-cliques, 2-plexes and cliques.)
  52. Overview Group Concepts • 1-clique, 1-clan and 1-plex are identical. • An n-clan is always contained in an n-clique. (Figure: nested group concepts: component, 2-clique, 2-clan, 2-plex, clique.)
  53. Finding Cliques • Example: We are looking for occurrences of triads. • Nets > First Network, Second Network • Nets > Fragment (1 in 2) > Find. The figure shows the hierarchy for the example of overlapping complete triads. There are five complete triads; each of the triads is represented by a gray vertex. Each triad consists of three vertices.
  54. Finding Social Circles • Partitions > First, Second • Partitions > Extract Second from First. We have found three social circles.
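The idea behind the two slides can be sketched without Pajek's fragment search: list all complete triads, then merge triads that overlap into circles. Here two triads are assumed to belong to the same circle when they share a line (two vertices), applied transitively; this overlap rule is an assumption, not necessarily the rule behind Pajek's hierarchy. The example network is hypothetical.

```python
from itertools import combinations


def complete_triads(adj):
    """All sets of three mutually adjacent vertices, as sorted tuples."""
    triads = set()
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if w in adj[v]:
                triads.add(tuple(sorted((u, v, w))))
    return sorted(triads)


def social_circles(adj):
    """Union of complete triads that are linked through shared lines (assumed rule)."""
    triads = complete_triads(adj)
    parent = list(range(len(triads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in combinations(range(len(triads)), 2):
        if len(set(triads[a]) & set(triads[b])) >= 2:  # triads share a line
            parent[find(a)] = find(b)
    circles = {}
    for i, triad in enumerate(triads):
        circles.setdefault(find(i), set()).update(triad)
    return sorted(sorted(c) for c in circles.values())


edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (5, 6), (5, 7), (6, 7), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(complete_triads(adj), social_circles(adj))
```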
  55. k-Cores: A k-core is a maximal subnetwork in which each vertex has at least degree k within the subnetwork.
  56. k-Cores: k-cores are nested, which means that a vertex in a 3-core is also part of a 2-core, but not all members of a 2-core belong to a 3-core.
  57. k-Cores Application • k-cores help to detect cohesive subgroups by removing the lowest k-cores from the network until the network breaks up into relatively dense components. • Net > Partitions > Core > {Input, Output, All}
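Core numbers can be found by repeatedly removing a vertex of minimum remaining degree: a vertex's core number is the largest minimum degree seen up to its removal. The sketch uses undirected degrees (the "All" option); the example network (a complete subnetwork on 1 to 4 with a tail 4-5-6) is hypothetical.

```python
def core_numbers(adj):
    """Largest k such that each vertex belongs to the k-core (undirected degrees)."""
    degree = {v: len(adj[v]) for v in adj}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)  # vertex of minimum remaining degree
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core


def k_core(adj, k):
    """Vertices of the k-core; k-cores are nested, so this shrinks as k grows."""
    return {v for v, c in core_numbers(adj).items() if c >= k}


adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {1, 2, 3, 5}, 5: {4, 6}, 6: {5}}
print(core_numbers(adj))
```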
  58. 4 - Sentiments and Friendship
  59. Balance Theory: Fritz Heider (1940): A person (P) feels uncomfortable when he or she disagrees with his or her friend (O) on a topic (X). P feels an urge to change this imbalance. He can adjust his opinion, change his affection for O, or convince himself that O is not really opposed to X.
  60. Signed Graphs: A signed graph is a graph in which each line carries either a positive or a negative sign. {O, P, X} form a cycle. All balanced cycles contain an even number of negative lines or no negative lines at all.
  61. Signed Graphs with Arcs: A cycle is a closed path. A semicycle is a closed semipath. A (semi)cycle is balanced if it does not contain an odd number of negative arcs.
  62. Balanced Networks: A signed graph is balanced if all of its (semi)cycles are balanced. Equivalently, a signed graph is balanced if it can be partitioned into two clusters such that all positive ties are contained within the clusters and all negative ties are situated between the clusters.
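The two-cluster characterization gives a direct test: walk through the network, putting the endpoints of a positive line in the same cluster and those of a negative line in opposite clusters; the graph is balanced exactly when this never leads to a contradiction. Arc direction is ignored, so semicycles are covered too. The function name and the P-O-X examples are hypothetical.

```python
from collections import deque


def balanced_partition(vertices, signed_lines):
    """Return the two clusters if the signed graph is balanced, else None.

    signed_lines: (u, v, sign) with sign +1 or -1; directions are ignored.
    """
    nbrs = {v: [] for v in vertices}
    for u, v, s in signed_lines:
        nbrs[u].append((v, s))
        nbrs[v].append((u, s))
    side = {}
    for root in vertices:
        if root in side:
            continue
        side[root] = 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w, s in nbrs[u]:
                wanted = side[u] if s > 0 else 1 - side[u]  # same side if positive
                if w not in side:
                    side[w] = wanted
                    queue.append(w)
                elif side[w] != wanted:
                    return None  # a (semi)cycle with an odd number of negative lines
    return ({v for v in vertices if side[v] == 0}, {v for v in vertices if side[v] == 1})


# P likes O, both dislike X: balanced. P likes O and X, O dislikes X: not balanced.
print(balanced_partition(["P", "O", "X"], [("P", "O", 1), ("P", "X", -1), ("O", "X", -1)]))
print(balanced_partition(["P", "O", "X"], [("P", "O", 1), ("P", "X", 1), ("O", "X", -1)]))
```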
  63. Clusterability = Generalized Balance: A cycle or a semicycle is clusterable if it does not contain exactly one negative arc. A signed graph is clusterable if it can be partitioned into clusters such that all positive ties are contained within clusters and all negative ties are situated between clusters.
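Clusterability can be tested in a similar spirit: join the endpoints of all positive lines into groups; the graph is clusterable exactly when no negative line falls inside one of these groups (such a line would close a semicycle with exactly one negative line). The sketch ignores direction and uses union-find; names and examples are hypothetical.

```python
def clusters_if_clusterable(vertices, signed_lines):
    """Clusters = groups connected by positive lines; None if a negative line lies inside one."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, s in signed_lines:
        if s > 0:
            parent[find(u)] = find(v)
    for u, v, s in signed_lines:
        if s < 0 and find(u) == find(v):
            return None
    clusters = {}
    for v in vertices:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(clusters.values(), key=min)


# Three mutual enemies: not balanced (three negative lines), but clusterable.
print(clusters_if_clusterable(["A", "B", "C"], [("A", "B", -1), ("B", "C", -1), ("A", "C", -1)]))
```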
  64. Example – Community in a New England monastery • Options > Values of Lines > Similarities • Young Turks (1), Loyal Opposition (2), Outcasts (3), Interstitial Group (4)
  65. Issues on Clustering 1. An optimization may find several solutions that fit equally well. It is up to the researcher to select one or present all. 2. There is no guarantee that no better solution exists than the one found, unless it is known to be optimal. 3. Different starting options yield different results (it is hard to tell in advance which number of clusters will yield the lowest error score). 4. Negative arcs are often tolerated less within a cluster than positive arcs between clusters.
  66. Calculating Clustering 1. Partition > Create Random Partition (e.g., 3 clusters) 2. Operations > Balance (alpha 0.5)
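The procedure (random start, then optimization) can be sketched as a local search. The error score is assumed to be alpha × (negative lines within clusters) + (1 − alpha) × (positive lines between clusters), so alpha = 0.5 weighs both kinds of error equally. The search only moves single vertices, which is simpler than Pajek's Balance command and, as the previous slide warns, can stop in a local optimum. All names and data are hypothetical.

```python
import random


def error_score(partition, signed_arcs, alpha=0.5):
    """Weighted count of negative arcs within and positive arcs between clusters."""
    neg_within = sum(1 for u, v, s in signed_arcs if s < 0 and partition[u] == partition[v])
    pos_between = sum(1 for u, v, s in signed_arcs if s > 0 and partition[u] != partition[v])
    return alpha * neg_within + (1 - alpha) * pos_between


def random_partition(vertices, n_clusters, seed=0):
    rng = random.Random(seed)
    return {v: rng.randint(1, n_clusters) for v in vertices}


def relocate(partition, signed_arcs, n_clusters, alpha=0.5):
    """Move single vertices to other clusters as long as the error score decreases."""
    best = dict(partition)
    best_error = error_score(best, signed_arcs, alpha)
    improved = True
    while improved:
        improved = False
        for v in list(best):
            for c in range(1, n_clusters + 1):
                if c == best[v]:
                    continue
                trial = dict(best)
                trial[v] = c
                e = error_score(trial, signed_arcs, alpha)
                if e < best_error:
                    best, best_error, improved = trial, e, True
    return best, best_error


arcs = [("A", "B", 1), ("B", "A", 1), ("C", "D", 1), ("D", "C", 1), ("A", "C", -1), ("D", "B", -1)]
start = random_partition(["A", "B", "C", "D"], 2, seed=3)
result, err = relocate(start, arcs, 2)
print(result, err)
```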
  67. Development in Time: list of vertices with their presence; list of arcs with their presence.
  68. Loading and Drawing Networks in Time • Net > Transform > Generate in Time • Draw > {Previous, Next}. The network consists of 3 choices, hence the larger errors. We see a tendency towards clusterability.
  69. BREAK LESSON 2
  70. 5 - Affiliations
  71. Example – Corporate interlocks in Scotland at the beginning of the twentieth century (1904-5): A fragment of the Scottish directorates network. Companies are classified according to industry: oil & mining, railway, engineering, … Directors (grey) and firms (black).
  72. Two-Mode and One-Mode Networks: In a one-mode network, each vertex can be related to every other vertex. In a two-mode network, vertices are divided into two sets, and vertices can only be related to vertices in the other set. • The degree of a firm specifies the number of its multiple directors, also known as the size of an event. • The degree of a director equals the number of boards he sits on, also known as the rate of participation of an actor. • Also note that some measures must be computed differently for two-mode networks.
  73. Transforming two-mode networks into one-mode networks: Whenever two firms share a director in the two-mode network, there is a line between them in the one-mode network.
  74. Transforming two-mode networks into one-mode networks II: The events of the two-mode network are represented by lines and loops in the one-mode network of actors. J. S. Tait meets W. Sanderson in board meetings of two companies.
  75. Transforming two-mode networks into one-mode networks III • Net > Transform > 2-Mode to 1-Mode > {Rows, Columns} • Net > Transform > 2-Mode to 1-Mode > Include Loops, Multiple Lines • Info > Network > Line Values
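The transformation can be sketched for the director side of the network: two directors get a line for every board they share, so the line value is the number of shared boards, and each board a director sits on adds one to his loop. The affiliations below are hypothetical (only the Tait–Sanderson pattern of two shared boards mirrors the slide), and `project_to_actors` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations


def project_to_actors(affiliations, loops=True):
    """Two-mode (director -> set of firms) to one-mode network of directors.

    Returns a Counter: (a, b) -> number of boards shared by a and b;
    with loops=True, (a, a) -> number of boards a sits on.
    """
    members_by_firm = {}
    for director, firms in affiliations.items():
        for firm in firms:
            members_by_firm.setdefault(firm, set()).add(director)
    lines = Counter()
    for members in members_by_firm.values():
        for a, b in combinations(sorted(members), 2):
            lines[(a, b)] += 1  # one more shared board = one more line
        if loops:
            for a in members:
                lines[(a, a)] += 1
    return lines


affiliations = {
    "Tait": {"F1", "F2", "F3"},   # hypothetical firms
    "Sanderson": {"F1", "F2"},
    "Brown": {"F3"},
}
print(project_to_actors(affiliations))
```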
  76. m-Slices: An m-slice is a maximal subnetwork containing the lines with a multiplicity equal to or greater than m and the vertices incident with these lines.
  77. 2-Slice in the network of Scottish firms
  78. Computing m-Slices • Net > Partitions > Values Core {use max} • Net > Partitions > Values Core > First Threshold and Step • Net > Transform > Remove > lines with value > lower than 2 • Operations > Extract from Network > Partition • Net > Components > Weak (Figure: 3-slice.)
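Following the definition, an m-slice keeps the lines of multiplicity at least m plus their incident vertices, and the highest slice a vertex belongs to is the largest value among its lines (roughly what the "use max" option suggests). The sketch ignores loops, whose values count boards rather than shared boards; the line values are hypothetical.

```python
def m_slice(lines, m):
    """Lines with value >= m (loops ignored) and the vertices incident with them."""
    kept = {(u, v): value for (u, v), value in lines.items() if u != v and value >= m}
    vertices = {w for pair in kept for w in pair}
    return vertices, kept


def slice_numbers(lines):
    """Highest m such that each vertex belongs to the m-slice."""
    best = {}
    for (u, v), value in lines.items():
        if u == v:
            continue
        for w in (u, v):
            best[w] = max(best.get(w, 0), value)
    return best


lines = {("A", "B"): 2, ("B", "C"): 1, ("C", "D"): 3, ("A", "A"): 4}  # hypothetical values
print(m_slice(lines, 2), slice_numbers(lines))
```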
  79. m-Slices 3D • Layers > Type of Layout > 3D • Layers > In z direction • Options > Scroll Bar > On
  80. Drawing m-Slices 1. Layers > Type of Layout > 3D 2. Layers > In z direction 3. Options > Scroll Bar > On
  81. 6 – Center and Periphery
  82. Example – Communication ties within a sawmill: H – Hispanic, E – English, M – mill, P – planer section, Y – yard. Vertex labels indicate the ethnicity and the type of work of each employee; for example, HP-10 is a Hispanic (H) working in the planer section (P).
  83. Distance • The larger the number of sources accessible to a person, the easier it is to obtain information. Social ties constitute a social capital that may be used to mobilize social resources. • The simplest indicator of a vertex's centrality is the number of its neighbors (its degree in a simple undirected network).
  84. Degree Centrality I: The degree centrality of a vertex is its degree. Degree centralization of a network is the variation in the degrees of vertices divided by the maximum degree variation that is possible in a network of the same size.
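For a simple undirected network, the "variation" is commonly taken as the sum of differences between the highest degree and every vertex's degree, and its maximum, (g − 1)(g − 2), is reached by a star with g vertices (Freeman's definition; assumed here). A minimal sketch for networks with at least three vertices:

```python
def degree_centralization(adj):
    """Sum of (max degree - degree) divided by its maximum (g - 1)(g - 2), reached by a star."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    g = len(degrees)
    top = max(degrees)
    return sum(top - d for d in degrees) / ((g - 1) * (g - 2))


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
ring = {i: {(i + 1) % 5, (i - 1) % 5} for i in range(5)}
print(degree_centralization(star), degree_centralization(ring))
```

A star scores 1 (one vertex monopolizes all ties) and a ring scores 0 (all degrees equal).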
  85. Degree Centrality II
  86. Closeness Centrality 1. Closeness centrality: a person is central if he or she lies very close to all others with respect to the network relation. Such a central position increases the efficiency with which an actor can act in the network; such an actor can receive and spread information quickly.
      $$C_c(n_i) = \frac{g-1}{\sum_{j=1}^{g} d(n_i, n_j)}$$
      where g is the number of vertices and d(n_i, n_j) is the distance between n_i and n_j.
  87. Closeness Centrality (Figure: network with vertices 1 to 11.) Distances from vertex 3: d(n_3, n_1) = 1, d(n_3, n_2) = 1, d(n_3, n_4) = 1, d(n_3, n_5) = 1, d(n_3, n_6) = 2, d(n_3, n_7) = 2, d(n_3, n_8) = 3, d(n_3, n_9) = 4, d(n_3, n_10) = 5, d(n_3, n_11) = 5; sum = 25.
      $$C_c(n_3) = \frac{11-1}{25} = 0.4$$
      Closeness centrality of vertices 1 to 10: 0.27, 0.29, 0.40, 0.45, 0.45, 0.45, 0.45, 0.45, 0.37, 0.27.
      Note: only symmetric ties and only connected networks can be considered here.
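The formula can be computed with one breadth-first search per vertex. A minimal sketch that, like the note on the slide, assumes a connected network with symmetric ties; the path network in the example is hypothetical (the slide's own figure is not available).

```python
from collections import deque


def closeness(adj, v):
    """C_c(v) = (g - 1) / sum of distances from v; assumes a connected undirected network."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) != len(adj):
        raise ValueError("network is not connected")
    return (len(adj) - 1) / sum(dist.values())


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}  # hypothetical path 1-2-3-4-5
print(closeness(path, 1), closeness(path, 3))
```

For the end vertex of the path the distances sum to 1 + 2 + 3 + 4 = 10, giving 4/10 = 0.4; the middle vertex gets 4/6.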
  88. Centralization 1. Centralization ≠ centrality. 2. Centralization is a structural property of the group, not a relational property of individual actors. 3. Index of centralization: compute the differences between the centrality of the most central actor n* and the centrality of each of the others, then sum these differences over all other actors:
      $$C = \frac{\sum_{i=1}^{g} \left[ C(n^*) - C(n_i) \right]}{g-1}$$
  89. Centralization 1. This index only takes a high value when exactly one actor is central, not when several actors form a center. 2. Only comparing data from one group at several points in time allows meaningful interpretation.
  90. Betweenness Centrality 1. Betweenness centrality: persons (cutpoints) who connect two otherwise unconnected subpopulations are actors with a high betweenness centrality (assumption: only the shortest connections are used for communication). 2. For each pair of actors j, k ≠ i, determine the proportion of all shortest paths connecting j and k that run through actor i; these proportions are then averaged over all pairs j, k ≠ i:
      $$C_b(n_i) = \frac{\sum_{j \neq k;\ j, k \neq i} g_{jk}(n_i) / g_{jk}}{(g-1)(g-2)}$$
      where g_{jk} is the number of shortest paths between j and k and g_{jk}(n_i) is the number of those that pass through n_i.
  91. Betweenness Centrality 1. Note: it is possible that some actors cannot be reached but can themselves reach the others. (Figure: network with vertices 1 to 11.) Betweenness centrality of vertices 1 to 11: 0, 0, 0.37, 0.22, 0.22, 0.22, 0.22, 0.48, 0.37, 0, 0.
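Counting shortest paths pair by pair is slow; Brandes' algorithm (a standard technique, not necessarily Pajek's implementation) gets the same sums from one breadth-first search per vertex. The sketch assumes an unweighted, undirected network and normalizes as in the formula above: the sum over ordered pairs j ≠ k divided by (g − 1)(g − 2). The star example is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Betweenness for an unweighted undirected network, normalized by (g - 1)(g - 2)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors on them
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # accumulate dependencies from the farthest vertices back towards s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    g = len(adj)
    return {v: b / ((g - 1) * (g - 2)) for v, b in score.items()}


star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(betweenness(star))
```

Every shortest path between two leaves of the star runs through the center, so the center scores 1 and the leaves 0.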
  92. Degree Prestige 1. Prestige can be measured meaningfully as the relative indegree of an actor (degree prestige) [Wasserman and Faust 1994: 202]:
      $$P_D(n_j) = \frac{x_{+j}}{g-1}, \qquad x_{+j} = \sum_i x_{ij}$$
      where n_j is actor j, x_{ij} is the matrix entry in row i, column j, and g is the number of vertices in the network.
  93. Prestige Example (Figure: network with vertices 1 to 11.) Prestige of vertex 3: $P_D(n_3) = x_{+3}/(11-1) = 2/10 = 0.2$. In general, prestige is independent of group size and its value lies between 0 and 1 (1 for the center of a star).
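Degree prestige is a column sum of the adjacency matrix divided by g − 1. A minimal sketch, assuming a 0/1 matrix where entry [i][j] is 1 if actor i chooses actor j; the in-star example (everybody chooses actor 0) is hypothetical and shows the maximum value of 1.

```python
def degree_prestige(matrix):
    """P_D(n_j) = x_{+j} / (g - 1): column sums of the adjacency matrix divided by g - 1."""
    g = len(matrix)
    return [sum(matrix[i][j] for i in range(g)) / (g - 1) for j in range(g)]


in_star = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(degree_prestige(in_star))
```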
