Social network analysis intro part I


Slides giving a short introduction into social network analysis. A short course I gave here.

Published in: Technology

  1. Social Network Analysis 2012: Introduction to Social Network Analysis, Part I. Katarina Stanoevska-Slabeva, Miriam Meckel, Thomas Plotkowiak
  2. Agenda
      1. Introduction to networks (~1h)
         – Types
         – Research areas
      2. Introduction to network measures (~1h)
         – For whole networks
         – For actors
           • Centrality measures
      3. Workshop (~2h)
         – Import your Facebook data
         – Analyze your data
         – Export your data
      © Thomas Plotkowiak 2010
  3. 1. Introduction to Networks
  4. 1.1 Network Types
      Domain aspects:
      • Non-social networks
        – Computer networks
        – Power grid networks
        – Road networks
        – Neural networks …
      • Social networks
        – Real life
          • Friendship
          • Marriage
          • Sexual contact
        – Online
          • Mobile networks
          • Friendship in OSN (online social networks)
      General aspects:
      • Direct vs. indirect connection
        – One mode
        – Two mode
      • Temporal aspects
        – Changing in time
        – Static
      • Topological aspects
        – (Non)directed
        – (Non)valued
        – Shapes (ring, star, …)
  5. Non-Social Networks: Power grid, USA (NPR, 2009)
  6. Airline Networks. Source: Northwest Airlines WorldTraveler Magazine
  7. Railway Networks. Source: TRTA, March 2003, Tokyo rail map
  8. Biochemical Pathways: Biochemical pathways (Roche)
  9. Flavor Networks. A flavor network that captures the flavor compounds shared by culinary ingredients. Each node denotes an ingredient, the node color indicates food category, and node size reflects the ingredient's prevalence in recipes. Two ingredients are connected if they share a significant number of flavor compounds; link thickness represents the number of compounds shared between the two ingredients. (Barabasi et al 2012)
  10. Migration Networks
  11. Twitter News-Sharing Networks. News-sharing network of the NYT. Nodes are individuals who predominantly share news stories on the topics given by the legend. Links are "follow" relationships between individuals. Cosmopolitan, local scene, national liberal, national conservative, and national diverse are tightly connected groups. (Herdagdelen 2012)
  12. Political Blog Networks. Color corresponds to political orientation, size reflects the number of citations received from the top 40 blogs, and line thickness reflects the number of citations between two blogs. (Adamic 2004)
  13. (Artificial) Neural Networks
  14. Networks are everywhere
  15. Sexual Networks
  16. Romantic Relationships on Facebook
  17. Organisational Networks
  18. Two Mode Networks. A fragment of the Scottish directorates (1904-5) network: directors (grey) and firms (black). Data taken from The Anatomy of Scottish Capital (John Scott and Michael Hughes): 64 nonfinancial firms, 8 banks, 14 insurance companies and 22 investment companies.
  19. Scientific Knowledge Networks. Circles represent individual journals. The lines that connect journals are clicks from users. Colors correspond to the AAT classification of the journal. Labels have been assigned to local clusters of journals that correspond to particular scientific disciplines. (Bollen et al 2012)
  20. More networks on
  21. 1.2 Fundamentals of SNA
  22. Sociometry and Social Network Analysis. Sociometry studies interpersonal relations. Society is not an aggregate of individuals and their characteristics (as statisticians assume) but a structure of interpersonal ties. Therefore, the individual is not the basic social unit. The social atom consists of an individual and his or her social, economic, or cultural ties. Social atoms are linked into groups, and, ultimately, society consists of interrelated groups.
  23. Social Network Analysis: Where to put it?
  24. Practical applications
      • Businesses use SNA to analyze and improve communication flow in their organization, or with their networks of partners and customers
      • Law enforcement agencies (and the army) use SNA to identify criminal and terrorist networks from the traces of communication that they collect, and then to identify key players in these networks
      • Social network sites like Facebook use basic elements of SNA to identify and recommend potential friends based on friends-of-friends
      • Civil society organizations use SNA to uncover conflicts of interest in hidden connections between government bodies, lobbies and businesses
      • Network operators (telephony, cable, mobile) use SNA-like methods to optimize the structure and capacity of their networks
  25. Example of a Sociogram. Choices of twenty-six girls living in one dormitory at a New York state training school. The girls were asked to choose the girls they liked best as their dining-table partners.
  26. Different Levels of Analysis. Figure labels: global network, primary group, ego-net, best friend, dyad, 2-step partial network.
  27. Why should we make a distinction?
      1. Ego-network
         – Data on a respondent (ego) and the people they are connected to (alters)
         – May include estimates of connections among alters
      2. Partial network
         – Ego networks plus some amount of tracing to reach contacts of contacts
         – Something less than a full account of connections among all pairs of actors in the relevant population
      3. Complete or "global" data
         – Data on all actors within a particular (relevant) boundary
         – Never exactly complete (due to missing data), but boundaries are set
      → Different analysis methods and perspectives have emerged depending on the scope of the analyzed network.
  28. 1.3 Research Areas
  29. 1.3 Research Areas
      • Research on networks
        – What are their properties? What is their structure?
        – Does structure matter? For example, how stable are the networks?
        – Are all networks similar to each other (no matter what domain)?
      • Research on actors
        – What positions exist? What position do certain actors have?
        – Does position matter? Does a role matter?
      • Research on dynamics
        – How do actors act in networks? What typical behaviors can we find?
        – How do networks form? How do they evolve?
      • Research on diffusion
        – What flows on the edges in the network?
        – For example, how fast does information flow? Where does it flow to?
        – How can we influence it?
  30. Research on Network Structure. Example: What does the Internet look like? (Britt)
  31. Research on Actors. Example: Two-step flow model (Lazarsfeld)
  32. Research on Ties
  33. Research on Network Dynamics. Example: Friendship network formation (Snijders), shown at t=0, t=1, t=2, t=3
  35. Research on Diffusion. Figure panels: adopted 1Q post launch through adopted 8Q post launch (one panel per quarter).
  36. 2. Network Measures: 2.1 Metrics for Networks
  37. Metrics for whole networks
      • Density
      • Average degree
      • Average distance
      • Diameter
      • Number of components
      • …
      Next session: more advanced metrics for whole networks (degree distributions, clustering, hierarchy, etc.)
  38. Density
      • Density: the number of ties, expressed as a percentage of the number of possible ties, that is, of ordered pairs of nodes (directed network) or unordered pairs (undirected network)
      Figure examples: low density 25%, high density 39%.
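
A minimal sketch of the density definition, assuming a simple network without self-loops. The function and parameter names are illustrative and not taken from the slides.

```python
def density(n_nodes, n_ties, directed=False):
    """Share of possible ties that are actually present."""
    # Ordered pairs for a directed network: n * (n - 1)
    possible = n_nodes * (n_nodes - 1)
    if not directed:
        # Unordered pairs for an undirected network: n * (n - 1) / 2
        possible //= 2
    return n_ties / possible if possible else 0.0


# Four nodes joined by three ties
undirected_value = density(4, 3)               # 3 of 6 unordered pairs
directed_value = density(4, 3, directed=True)  # 3 of 12 ordered pairs
```

The same tie count gives half the density once direction matters, because each pair of nodes can then carry two ties.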
  39. Average Degree
      • Average number of links per person
      Figure examples: density 0.47 with average degree 4; density 0.14 with average degree 4.
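
The two figures share an average degree but differ in density. A short sketch, assuming an undirected simple network, shows why: density equals the average degree divided by n − 1, so it falls as the network grows. The node counts below are hypothetical and are not those of the slide's figures.

```python
def average_degree(n_nodes, n_ties):
    """Each undirected tie contributes to the degree of two nodes."""
    return 2 * n_ties / n_nodes


def density_from_average_degree(avg_degree, n_nodes):
    """Undirected density expressed through the average degree."""
    return avg_degree / (n_nodes - 1)


# Same average degree of 4, different network sizes (hypothetical sizes)
small = density_from_average_degree(average_degree(5, 10), 5)  # complete graph on 5 nodes
large = density_from_average_degree(average_degree(9, 18), 9)  # 9 nodes, 18 ties
```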
  40. Average Distance
      • Average geodesic distance between all pairs of nodes
      Figure examples: average distance 1.9 and average distance 2.4.
  41. Diameter
      • Maximum distance (= the length of the longest shortest path)
      Figure examples: diameter 3 and diameter 3.
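
Average distance and diameter can both be read off the same set of geodesic lengths. The sketch below assumes a connected, undirected, unweighted network stored as an adjacency dictionary; the four-node path used as input is a made-up example.

```python
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_distance_and_diameter(adj):
    """Mean and maximum geodesic length over all pairs of distinct nodes."""
    lengths = []
    for s in adj:
        dist = bfs_distances(adj, s)
        lengths.extend(d for t, d in dist.items() if t != s)
    return sum(lengths) / len(lengths), max(lengths)


# Hypothetical path network 0 - 1 - 2 - 3
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
avg_dist, diameter = average_distance_and_diameter(path)
```

Every pair is visited twice (once from each end), which leaves the mean and the maximum unchanged.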
  42. Number of Components
      • Component ratio (CR): number of components minus 1, divided by number of nodes minus 1
      CR is 1 when all nodes are isolates.
      CR is 0 when all nodes are in one component.
      Figure example: CR = (3-1)/(14-1) = 0.154
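
A sketch of the component ratio, assuming an undirected network with at least two nodes, given as an adjacency dictionary. The helper names are illustrative.

```python
def count_components(adj):
    """Count connected components with an iterative depth-first search."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


def component_ratio(adj):
    """(components - 1) / (nodes - 1); 0 for one component, 1 for all isolates."""
    return (count_components(adj) - 1) / (len(adj) - 1)


isolates = {0: [], 1: [], 2: []}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```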
  43. 2. Network Measures: 2.2 Metrics for Actors (and whole networks too): Centrality Measures
  44. Centrality Measures
      • Distance
      • Degree centrality
      • Degree prestige
      • Closeness centrality
      • Betweenness centrality
      • Eigenvector centrality & PageRank
  45. Example: Communication ties within a sawmill. H = Hispanic, E = English, M = mill, P = planer section, Y = yard. Vertex labels indicate the ethnicity and the type of work of each employee; for example, HP-10 is a Hispanic (H) employee working in the planer section (P).
  46. Distance
      • The larger the number of sources accessible to a person, the easier it is to obtain information. Social ties constitute a social capital that may be used to mobilize social resources.
      A geodesic is the shortest path between two vertices.
      The distance from vertex u to vertex v is the length of the geodesic from u to v.
  47. Degree Centrality
      • The simplest indicator of a node's centrality is the number of its neighbors (its degree in a simple undirected network). The degree centrality of a node is its degree.
      Figure examples: nodes with degree 4 and degree 3.
  48. Degree Centrality for whole networks. Degree centralization of a network is the variation in the degrees of vertices divided by the maximum degree variation that is possible in a network of the same size.
      Figure examples: degree centralization = 1 and degree centralization = 0.17.
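
One common way to make "variation divided by maximum possible variation" concrete is Freeman's degree centralization for a simple undirected network: sum the gaps between the largest degree and every node's degree, then divide by (n − 1)(n − 2), the value a star attains. The sketch below assumes that formula and at least three nodes; the star and ring inputs are made up.

```python
def degree_centralization(adj):
    """Freeman degree centralization for a simple undirected network."""
    n = len(adj)
    degrees = [len(neighbors) for neighbors in adj.values()]
    max_degree = max(degrees)
    # A star on n nodes reaches the maximum possible sum, (n - 1) * (n - 2)
    return sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
```

The star is maximally centralized and the ring, where every node has the same degree, is not centralized at all.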
  49. Prestige Centrality = Indegree
      • Prestige can be expressed as the relative indegree of an actor (degree prestige): $P_d(n_j) = \frac{x_{+j}}{g - 1}$
      Figure: example network with nodes 1 to 11.
      Prestige of node 3: $P_d(n_3) = \frac{2}{11 - 1} = 0.2$
      Notice: Prestige does not depend on the size of the group, and its value lies between 0 and 1 (star).
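
Degree prestige is a one-line computation once the indegrees are counted. The sketch assumes a directed network given as a list of (source, target) choices; the three edges below are hypothetical and only mirror the slide's arithmetic of 2 incoming ties among 11 actors.

```python
def degree_prestige(nodes, edges):
    """Relative indegree x_{+j} / (g - 1) for every actor."""
    indegree = {v: 0 for v in nodes}
    for _source, target in edges:
        indegree[target] += 1
    g = len(nodes)
    return {v: indegree[v] / (g - 1) for v in nodes}


nodes = list(range(1, 12))            # 11 actors
edges = [(1, 3), (2, 3), (3, 4)]      # hypothetical choices: node 3 is chosen twice
prestige = degree_prestige(nodes, edges)
```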
  50. Closeness Centrality
      • Closeness centrality: A person is central if, with respect to the network relation, that person is very close to all other persons. Such a central position improves the efficiency of an actor's communication: the actor is able to disseminate and receive information fast.
      $C_c(n_i) = \frac{g - 1}{\sum_{j=1}^{g} d(n_i, n_j)}$
  51. Closeness Centrality
      Figure: example network with nodes 1 to 11.
      Distances from node 3:
        n_i  n_j  d
        3    1    1
        3    2    1
        3    4    1
        3    5    1
        3    6    2
        3    7    2
        3    8    3
        3    9    4
        3    10   5
        3    11   5
        Sum: 23
      $C_c(n_3) = \frac{11 - 1}{23} = 0.43$
      Closeness by node:
        n    C_c
        1    0.27
        2    0.29
        3    0.43
        4    0.45
        5    0.45
        6    0.45
        7    0.45
        8    0.45
        9    0.37
        10   0.27
      Notice: We are only analyzing symmetrical relations and fully connected networks.
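
The closeness formula can be sketched with a breadth-first search per node. This assumes a connected, undirected, unweighted network; the five-node path is a made-up input and not the network from the slide.

```python
from collections import deque


def closeness_centrality(adj):
    """(g - 1) divided by the sum of geodesic distances to all other nodes."""
    g = len(adj)
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        result[source] = (g - 1) / sum(dist.values())
    return result


# Hypothetical path network 0 - 1 - 2 - 3 - 4
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
closeness = closeness_centrality(path5)
```

The middle node of the path has the smallest distance sum and therefore the highest closeness; the end nodes have the lowest.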
  52. Closeness Centrality for whole networks
      • Centralisation is a structural property of a group and not a relational attribute of individual actors.
      • The centralisation index is computed by summing the differences between the centrality of the most central actor and the centrality of every other actor, and dividing by the maximum possible value for such a network.
      $C_D = \frac{\sum_{i=1}^{g} [C_D(n^*) - C_D(i)]}{(N - 1)(N - 2)}$
  53. Centralisation II
      • Centralisation is always high when only one node has a high centrality and the remaining nodes are not central.
      • Notice: Only comparing a fixed group at different points in time gives interpretable results (analogous to network density).
      Figure examples: closeness centralization = 1 and closeness centralization = 0.43.
  54. Betweenness Centrality
      • Betweenness centrality: Persons (cutpoints) who connect two otherwise unconnected subpopulations are actors with a high betweenness centrality score.
      • Notice: We are assuming that information always travels on the shortest paths!
      $C_b(n_i) = \frac{\sum_{j \neq k;\ i \neq j,k} g_{jk}(n_i) / g_{jk}}{(g - 1)(g - 2)}$ *
      * $(g - 1)(g - 2)/2$ for undirected graphs
  55. Betweenness Centrality
      • Notice: In directed networks it is possible that some actors cannot be reached by others but can themselves reach other nodes.
      Figure: example network with nodes 1 to 11.
        Node         1  2  3     4     5     6     7     8     9     10  11
        Betweenness  0  0  0.37  0.22  0.22  0.22  0.22  0.48  0.37  0   0
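
Counting shortest paths pair by pair gets expensive quickly, so the sketch below swaps in Brandes' accumulation scheme, which the slides do not mention. It assumes an undirected, unweighted network with at least three nodes and normalizes as in the formula above; the five-node path is a made-up input.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized shortest-path betweenness (Brandes' algorithm)."""
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in nodes}   # predecessors on shortest paths from s
        sigma = {v: 0 for v in nodes}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):        # back-propagate pair dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    g = len(nodes)
    # Each unordered pair is seen from both endpoints, so dividing the
    # accumulated sum by (g - 1)(g - 2) matches the undirected normalization.
    return {v: score[v] / ((g - 1) * (g - 2)) for v in nodes}


# Hypothetical path network 0 - 1 - 2 - 3 - 4
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
betweenness = betweenness_centrality(path5)
```

On the path, the end nodes lie between no pair, while the middle node lies on four of the six pairs that exclude it.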
  56. Hue (from red = 0 to blue = max) shows the node betweenness.
  57. Eigenvector Centrality. Don Corleone did not have many strong ties. He was a man of few words, yet he could make an offer you can't refuse. Don Corleone surrounded himself with his sons and his trusted capos, who in turn handled the day-to-day management issues of the family.
  58. Eigenvector Centrality
      Make $x_i$ proportional to the average of the centralities of $i$'s network neighbors:
      $x_i = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ij} x_j$
      where $\lambda$ is a constant. In matrix-vector notation we can write
      $x = \frac{1}{\lambda} A x$
      The value $\lambda$ is an eigenvalue of matrix $A$ if there exists a non-zero vector $x$ such that $Ax = \lambda x$. Vector $x$ is an eigenvector of matrix $A$.
      The largest eigenvalue is called the principal eigenvalue.
      The corresponding eigenvector is the principal eigenvector.
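
A sketch of how the principal eigenvector can be approximated by repeated multiplication. It assumes a connected, undirected network as an adjacency dictionary and iterates with A + I rather than A, a shift that keeps the same eigenvectors but avoids oscillation on bipartite networks such as stars. The shift, the function name and the star input are choices made for this illustration, not part of the slides.

```python
import math


def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Power iteration on A + I, normalized to unit Euclidean length."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # (A + I) x: own score plus the scores of all neighbors
        new = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(value * value for value in new.values()))
        new = {v: value / norm for v, value in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical star: node 0 is tied to three leaves
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
centrality = eigenvector_centrality(star)
```

For this star the principal eigenvalue of A is √3, and the hub scores √3 times as much as each leaf.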
  59. Centralities in comparison
      • Degree: How many people can this person reach directly?
      • Betweenness: How likely is this person to be the most direct route between two people in the network?
      • Closeness: How fast can this person reach everyone in the network?
      • Eigenvector: How well is this person connected to other well-connected people?
  60. 3. Workshop: Exploratory Facebook Social Network Analysis
  61. Process
      1. Import data with Netvizz
      2. Process with Gephi
         1. Open
         2. Layout
         3. Ranking (Degree)
         4. Statistics
         5. Ranking (Betweenness)
         6. Layout (Size Adjust)
         7. Labels
         8. Community detection
         9. Filter
         10. Label Adjust
         11. Preview
      3. Export
  62. Netvizz
      1. Sign in to your Facebook account
      2. Search for the netvizz application
      3. Choose the parameters you would like to include in the data (e.g. gender, wall posts count, interface language)
      4. Analyze either
         – Your personal friend network → today
         – [OR] one of your groups listed at the bottom
      5. Wait for the application to create the .gdf file and download it (right click, save as)
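
To inspect the downloaded file outside Gephi, a small reader is enough. The sketch assumes the usual GDF layout: a `nodedef>` header line followed by comma-separated node rows, then an `edgedef>` header line followed by edge rows. The column names and the sample text below are made up; a real Netvizz export may contain different columns.

```python
import csv


def parse_gdf(text):
    """Split GDF text into node rows and edge rows (lists of dicts)."""
    nodes, edges = [], []
    columns, target = [], None
    for row in csv.reader(text.splitlines()):
        if not row:
            continue
        for prefix, bucket in (("nodedef>", nodes), ("edgedef>", edges)):
            if row[0].startswith(prefix):
                row[0] = row[0][len(prefix):]
                # Header cells look like "name VARCHAR"; keep only the name
                columns = [cell.split()[0] for cell in row]
                target = bucket
                break
        else:
            if target is not None:
                target.append(dict(zip(columns, row)))
    return nodes, edges


# Hypothetical file content
sample = """nodedef>name VARCHAR,label VARCHAR
1,Alice
2,Bob
edgedef>node1 VARCHAR,node2 VARCHAR
1,2
"""
sample_nodes, sample_edges = parse_gdf(sample)
```

The edge rows can then be turned into an adjacency dictionary for any of the measures from section 2.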
  63. Gephi
      • Gephi is an open-source network analysis and visualization software package.
      • Envisioned as providing "easy and broad access to network data", it is advertised as being "Like Photoshop for graphs."
      • Gephi has been used in a number of research projects in academia, journalism and elsewhere.
      • The Gephi team: Mathieu Bastian, Sebastien Heymann, Julian Bilcke, Mathieu Jacomy, Franck Ghitalla
  64. Gephi: 1. Open
      • From the File menu select Open, and then select the .gdf file you saved from Netvizz
      • At first it looks like a big hairball, so we'll change the layout to make some sense of the connections
  65. Gephi: 2. Layout
      • From the Layout module on the left side, choose Force Atlas* from the dropdown menu, then click Run
        – Force Atlas makes connected nodes attract each other, while unconnected nodes are pushed towards the periphery
      • Click Stop when the layout seems to have converged to a stable state
      * For graphs with a large number of nodes or edges, choose the Yifan Hu layout instead
  66. Gephi: 3. Ranking (Degree)
      1. Choose the Ranking-Nodes tab in the top left module and choose Degree from the dropdown menu
         – Degree = number of connections
      2. Hover your mouse over the gradient bar, then double-click each triangle to choose a color for each side of the range
         – Try to use bright colors for the highest degree and dark colors for the lowest
      3. Click Apply
  67. Gephi: 4. Statistics
      • Click the Statistics tab in the top right module
      • Click Run next to Average path length
        – Choose directed from the popup menu
      • Click Close when the graph report shows up
  68. Gephi: 5. Ranking (Betweenness)
      • Return to Ranking in the top left module and click "Choose a rank parameter" in the dropdown
        – Choose Betweenness Centrality from the dropdown menu
      • Click the icon for size, instead of color
        – Set min size to 10 and max size to 50 (experiment a little)
      • Click Apply
  69. Gephi: 6. Layout
      • To keep the larger nodes from overlapping smaller ones, go to the Layout tab and check the Adjust by sizes box
      • Click Run and then Stop
  70. Gephi: 7. Labels
      • Click the bold black T in the toolbar at the bottom of the window to turn labels on
      • Click the black letter A in the same toolbar to select the Size Mode for the labels, and choose the node size option
      • Use the slider on the right to adjust the size
      • You can also change the font style by clicking next to the slider
  71. Gephi: 8. Community Detection
      • Go back to the Statistics tab on the right and click Run next to Modularity
        – Check Randomize and click OK
      • Go to the Partition tab in the top left module and click the refresh arrow
      • Choose Modularity Class from the dropdown menu
        – Right-click to randomize colors
      • Click Apply
  72. Gephi: 9. Filter
      • Go to Filters in the top right module and open the Topology folder
        – Drag the Degree Range filter to the box below ("Drag filter here")
      • Click Degree Range to open the parameters
        – Click the "0" and change it to a slightly higher value
        – This removes the nodes that are not connected to many other nodes
      • Click Filter
  73. Gephi: 10. Label Adjust
      1. Go to the Layout module on the left
      2. Choose the Label Adjust layout to keep the labels from overlapping
      3. Click Run and then Stop
  74. Gephi: 11. Preview
      1. At the very top, click the Preview tab
      2. Under Node, check the box "Show Labels"
      3. Click Refresh at the bottom, and choose your label font
      4. Play around with the options until you like your graph (don't forget to click Refresh every time)
  75. Gephi: 12. Export
      • To export your graph for publication as SVG or PDF, click the Export button
      • Save
  76. Gephi: 13. Make sense out of it. Figure cluster labels: friends from swimming club; roommate & swimming club; friends from staying in Japan; friends from studies at the University of Mannheim; friends from studies at the University of Waterloo; joined me on the exchange to Canada; friends from school.
  77. Hungry? Need More Data?
      • Use NodeXL
      • Write your own crawlers (ask me)
      • Use existing archives – – http://vlado.fmf.uni- – http://vlado.fmf.uni- ucinet/ucidata.htm
      • Collect by surveys
  78. Time to read a book on SNA. But which?
  79. Interactive Summary
      • The biggest advantage I can gain by using SNA is …
      • The most important fact about SNA for me is …
      • The concept that made the most sense for me today was …
      • The biggest danger in using SNA is …
      • If I use SNA in the future, I will try to make sure that …
      • If I use SNA in my next project I will use it for …
      • I should change my perspective on networks by considering …
      • I have changed my opinion about SNA, finding out that …
      • I missed today that …
      • Before attending this seminar I didn't know that …
      • I wish we could have covered …
      • If I forget almost everything that I learned today, I will still remember …
      • The most important thing today for me was …
  80. Thanks for your attention! Questions & Discussion. Next date is: XY