Social network analysis

A high-level overview of social network analysis using Gephi with your exported Facebook friends network. See more network analysis at http://allthingsgraphed.com.

Published in: Technology

  1. SOCIAL NETWORK ANALYSIS
     Caleb Jones { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }
  2. Overview
     • Network Analysis – Crash Course
       • Degree
       • Components
       • Modularity
       • Ranking
       • Resiliency
     • Gephi – Intro
       • Loading data (Facebook)
       • Navigation
       • Statistics
       • Exporting
       • Filtering
       • Resiliency
  3. Resources
     • SNA Coursera Course (next being taught October 2013)
     • Linked by Albert-László Barabási
  4. Network Analysis – Crash Course
     • Degree (n): The number of connections a node has.
       • Node A has in-degree 3 and out-degree 1
       • Node B has degree 4
     (Figure: nodes A and B)
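
The in-degree and out-degree idea can be sketched in a few lines of Python. The edge list below is hypothetical (the slide's figure is not reproduced in this transcript); it is only chosen so that node A ends up with in-degree 3 and out-degree 1, as in the slide.

```python
from collections import Counter

# Hypothetical directed edges (source, target); NOT taken from the slide's figure.
directed_edges = [("P", "A"), ("Q", "A"), ("R", "A"), ("A", "S")]

# Out-degree counts edges leaving a node; in-degree counts edges arriving at it.
out_degree = Counter(src for src, _ in directed_edges)
in_degree = Counter(dst for _, dst in directed_edges)

print("A in-degree:", in_degree["A"], "out-degree:", out_degree["A"])
```

In an undirected network the two counts collapse into a single degree per node.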
  5. Network Analysis – Crash Course
     • Component (n): A maximally connected subgraph (undirected).
     • The giant component is the largest component
     (Figure: graph with nodes { A, B, C, X, Y, Z }, showing a component and the giant component)
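
A minimal sketch of finding components with a breadth-first search. The node names match the slide, but the edges are an assumption made for illustration, since the figure's edges are not part of this transcript.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return a list of node sets, one per connected component (undirected)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nb in adj[node] - seen:  # unvisited neighbours only
                seen.add(nb)
                queue.append(nb)
        components.append(comp)
    return components

nodes = ["A", "B", "C", "X", "Y", "Z"]
edges = [("A", "B"), ("B", "C"), ("C", "X"), ("Y", "Z")]  # hypothetical edges
components = connected_components(nodes, edges)
giant = max(components, key=len)  # the giant component is simply the largest one
print(len(components), sorted(giant))
```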
  6. Network Analysis – Crash Course
     • Modularity (n) ~ Division of a graph into communities (modules/classes/cliques) that are densely connected internally, with relatively sparse connections between communities.
     (Figure: graph with nodes { A, B, C, X, Y, Z } split into Community 1 and Community 2)
  7. Network Analysis – Crash Course
     • Ranking: A measure of a node’s “importance”
     • Many different methods for determining “importance”
       • Degree, Centrality, Closeness, Betweenness, Eigenvector, HITS, PageRank, Erdös Number
     • Which one to consider depends on the question being asked
     • Precursor to identifying network resilience, diffusion, and vulnerability
  8. Network Analysis – Crash Course
     • Degree ranking: Quantity over quality
     Node  Score
     A     3
     B     3
     C     1
     D     1
     X     1
     Y     1
     Z     3
     Q     1
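
The degree scores above can be reproduced with a short script. The edge list is an assumption: the slides never list the example graph's edges, so it is inferred here from the score tables (a tree in which A links to B, Z and Q, B links to C and D, and Z links to X and Y).

```python
from collections import Counter

# Assumed edge list, inferred from the slide's score tables (not given in the deck).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

# In an undirected graph each edge adds one to the degree of both endpoints.
degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Rank by degree (highest first), breaking ties alphabetically.
ranking = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
for node, score in ranking:
    print(node, score)
```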
  9. Network Analysis – Crash Course
     • Betweenness Ranking: How frequently a node appears on shortest paths.
     Node  Score
     A     15
     B     11
     C     0
     D     0
     X     0
     Y     0
     Z     11
     Q     0
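
One way to compute these scores is Brandes' algorithm, sketched below for unweighted, undirected graphs. This is not necessarily how Gephi computes them; the edge list is again an assumption inferred from the slide's tables.

```python
from collections import deque

# Assumed edge list, inferred from the slide's score tables (not given in the deck).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def betweenness(adj):
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: x / 2 for v, x in score.items()}

scores = betweenness(adj)
print({v: scores[v] for v in sorted(scores)})
```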
  10. Network Analysis – Crash Course
      • Closeness Ranking: Average number of hops from a node to rest of network.
      Node  Score
      A     1.571
      B     1.857
      C     2.714
      D     2.714
      X     2.714
      Y     2.714
      Z     1.857
      Q     2.429
      Note: Smaller is (usually) better
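
A minimal sketch of the closeness score as defined on the slide: the average hop count from a node to every other node, found by breadth-first search. The edge list is an assumption inferred from the slide's tables, and `hops_from` is a helper name introduced here for illustration.

```python
from collections import deque

# Assumed edge list, inferred from the slide's score tables (not given in the deck).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def hops_from(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

closeness = {}
for node in adj:
    dist = hops_from(adj, node)
    # Average over the other nodes; the source itself contributes 0 hops.
    closeness[node] = sum(dist.values()) / (len(dist) - 1)

for node in sorted(closeness):
    print(node, round(closeness[node], 3))
```

Many libraries report the reciprocal of this average instead, in which case larger is better.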
  11. Network Analysis – Crash Course
      • Eigenvector Ranking: A node’s “influence” on the network (accounts for who you know)
      Node  Score
      A     1
      B     0.836
      C     0.392
      D     0.392
      X     0.392
      Y     0.392
      Z     0.836
      Q     0.465
      Google’s PageRank is a variant of this
      Based on the eigenvector of the adjacency matrix
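
Eigenvector centrality can be approximated with power iteration, sketched below. Two assumptions apply: the edge list is inferred from the slide's tables, and each node's own score is added at every step (iterating with A + I rather than A), which keeps the iteration from oscillating on this tree-shaped graph without changing the eigenvector. Results should land close to the slide's values but need not match them exactly, since Gephi's iteration count and normalization may differ.

```python
# Assumed edge list, inferred from the slide's score tables (not given in the deck).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def eigenvector_centrality(adj, iterations=100):
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # x <- (A + I) x: a node's new score is its own plus its neighbours' scores.
        new = {v: score[v] + sum(score[w] for w in adj[v]) for v in adj}
        top = max(new.values())
        # Rescale so the most central node has score 1.
        score = {v: x / top for v, x in new.items()}
    return score

centrality = eigenvector_centrality(adj)
for node in sorted(centrality):
    print(node, round(centrality[node], 3))
```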
  12. Network Analysis – Crash Course
      • Erdös Ranking: Number of hops to specific node (degrees of separation).
      Node  Score
      A     0
      B     1
      C     2
      D     2
      X     2
      Y     2
      Z     1
      Q     1
      Note: Smaller is (usually) better
      What if “Erdös” is an influential CEO? What if “Erdös” has bird flu?
      (Figure: the “Erdös” node is the one with score 0, here A)
  13. Network Analysis – Crash Course
      • Erdös Ranking: Number of hops to specific node (degrees of separation).
      Node  Score
      A     2
      B     1
      C     2
      D     0
      X     4
      Y     4
      Z     3
      Q     3
      Note: Smaller is (usually) better
      What if “Erdös” is an influential CEO? What if “Erdös” has bird flu?
      (Figure: the “Erdös” node is the one with score 0, here D)
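
The two Erdös tables differ only in which node is chosen as “Erdös”. A sketch of that calculation, using a breadth-first search from the chosen node, follows. The edge list is an assumption inferred from the slide's tables, and `erdos_numbers` is a name introduced here for illustration.

```python
from collections import deque

# Assumed edge list, inferred from the slide's score tables (not given in the deck).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def erdos_numbers(adj, erdos):
    """Hop count from the chosen 'Erdös' node to every reachable node."""
    hops = {erdos: 0}
    queue = deque([erdos])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in hops:
                hops[w] = hops[v] + 1
                queue.append(w)
    return hops

from_a = erdos_numbers(adj, "A")  # central node: everyone is within 2 hops
from_d = erdos_numbers(adj, "D")  # peripheral node: distances stretch to 4 hops
print(max(from_a.values()), max(from_d.values()))
```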
  14. Network Analysis – Crash Course
      • Limitations:
        • Only considered undirected networks (directed is more complicated)
        • Treated all edges as equal. Many networks have a weight or cost associated with edges (e.g. distance)
        • Treated all nodes as equal. A node’s importance may be inherent, based on attributes separate from its position in the network (e.g. dating sites)
  15. Network Analysis – Crash Course
      • Resiliency (removing nodes/links):
        • Target nodes based on their “importance”
        • High-degree nodes are more likely to affect local communities
        • High betweenness/eigenvector nodes are more likely to fragment communities
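
A small sketch of the fragmentation idea: remove a node and count how many connected components remain. The edge list is an assumption inferred from the earlier score tables, and `count_components` is a helper introduced here for illustration.

```python
# Assumed edge list, inferred from the slide's score tables (not given in the deck).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

adj = {}
for u, v in EDGES:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def count_components(adj, removed=frozenset()):
    """Count connected components after deleting the nodes in `removed`."""
    seen = set(removed)  # removed nodes are treated as already visited
    count = 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return count

print(count_components(adj))                  # intact network
print(count_components(adj, removed={"A"}))   # highest-betweenness node removed
print(count_components(adj, removed={"C"}))   # a leaf node removed
```

On this toy graph, removing the highest-betweenness node splits the network into several pieces, while removing a leaf leaves it connected.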
  16. Gephi Introduction
      • Platform for visualizing and analyzing networks
      • https://gephi.org/
      • Cross-platform
      • Plugin model
  17. Facebook Dataset
      • Download your data (gml)
        • http://snacourse.com/getnet/
      • Import into Gephi
        • File -> Open -> Select downloaded .gml file
        • Choose “undirected” for “Graph Type”
  18. Layout
      Layout -> Fruchterman Reingold
  19. Partitioning Communities
      1. Statistics -> Modularity -> Run (use defaults)
      2. Partition -> Nodes (refresh) -> Modularity class -> Apply
  20. Degree Distribution
      1. Statistics -> Average Degree -> Run
      2. Partition -> Nodes (refresh) -> Modularity class -> Apply
      • Lots of nodes with few connections
      • Only a few with a large number of connections
      • Power law distribution?
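
Outside Gephi, a degree distribution is just a count of how many nodes have each degree. The sketch below uses the small example graph from the crash course (edge list assumed, inferred from the score tables) rather than real Facebook data, so it only illustrates the shape of the calculation.

```python
from collections import Counter

# Assumed edge list for the crash-course example graph (not given in the deck).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: degree value -> number of nodes with that degree.
distribution = Counter(degree.values())
average_degree = sum(degree.values()) / len(degree)

for k in sorted(distribution):
    print(f"degree {k}: {'#' * distribution[k]}")
print("average degree:", average_degree)
```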
  21. Node Ranking by Degree
      1. Ranking -> Nodes (refresh) -> Degree -> Apply (try tweaking min/max size and Spline for desired emphasis)
  22. Filtering Isolated Nodes (“noise”)
      1. Statistics -> Connected Components -> Run
      2. Filters -> Attributes -> Partition Count -> Component ID
      3. Drag “Component ID” down into “Queries” section
      4. Click on “Partition Count”, slide the settings bar, and click “Filter” – adjust to remove isolated nodes
      This can be an important step when dealing with very large data sets. Depending on the degree distribution, the filter can be set quite high.
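
The same filtering idea can be sketched outside Gephi: give every node a component ID, count the nodes per ID, and keep only nodes whose component reaches a minimum size. The graph, the `min_size` threshold, and the `component_ids` helper are all hypothetical, and this union-find approach is not a description of how Gephi implements its filter.

```python
from collections import Counter

def component_ids(nodes, edges):
    """Label each node with a representative of its component (union-find)."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v in edges:
        parent[find(u)] = find(v)
    return {n: find(n) for n in nodes}

# Hypothetical network: one triangle-sized group, one pair, one isolated node.
nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("B", "C"), ("D", "E")]

ids = component_ids(nodes, edges)
sizes = Counter(ids.values())   # the "partition count" per component ID
min_size = 3                    # assumed threshold, analogous to the filter slider
kept = [n for n in nodes if sizes[ids[n]] >= min_size]
print(kept)
```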
  23. Re-adjust after Filtering
      • Need to re-run previous steps to refresh calculated values now that filtering has been done.
      • Statistics -> Average degree, modularity, connected components
        • How did these numbers change?
      • Re-partition node color by modularity class now that modularity has been recalculated
      • Run Fruchterman Reingold layout again to fill space left over from filtered nodes
  24. Have you saved yet!?
  25. Node Ranking by Centrality
      1. Statistics -> Network Diameter -> Run
      2. Ranking -> Betweenness Centrality -> Apply
  26. Erdös Number
      • You may have noticed a key node that has both the highest degree and the highest betweenness ranking.
      • Click on the “Edit” button and select that node (note the name)
      • Statistics -> Erdös Number -> Select that name -> OK
      • What will happen if you select a less conspicuous node?
  27. Data Lab
      • Go to “Data Laboratory”
      • All node information as well as calculated statistics appear here in a spreadsheet.
      • Sort by “Erdös Number” (descending)
      • What is the largest Erdös Number? N degrees of ________ .
      • Try sorting by other values (degree, closeness, betweenness)
      (Callout: Max is 7 degrees of separation)
  28. Node Ranking by Eigenvector Centrality
      1. Statistics -> Eigenvector Centrality -> Run
      2. Ranking -> Eigenvector Centrality -> Apply
  29. Node Ranking by PageRank
      1. Statistics -> PageRank -> Run
      2. Ranking -> PageRank -> Apply
  30. Export to Image
      • Go to “Preview” mode
      • Click “Refresh” to see what you have now
      • Add node labels
        • “Node Labels” -> “Show Labels”
        • Adjust font size to avoid label overlapping
      • If Node Labels are overlapping, try expanding layout
        • Back to “Overview” -> Layout -> Fruchterman Reingold
        • Increase the “Area” parameter and re-run the layout
        • Then go back to “Preview” mode and click “Refresh”
        • May need to re-adjust Node Label text size
      • Experiment with “Curved” edges
  31. labels omitted in slidedeck for privacy
  32. Before we attack the network, save!
  33. Network Resiliency
      • How can we fragment the network or increase the separation between nodes?
      • Which nodes, if removed/influenced, would most greatly impact the network?
      • What information have we learned already that could be used?
  34. Network Resiliency
      • Go to “Data Laboratory” -> sort by “PageRank” descending
      • Select top 5 rows and delete them (did you save first!!!)
      • Note their names – Are these people influential in your life?
      (Figure callouts: sort, Top 5)
  35. Network Resiliency
      • Go back to statistics and note the following:
        • Average Degree, Network Diameter, Modularity, Connected Components, Average Path Length
      • Also note how the network has visually changed
      • Re-run the statistics above and note how the numbers changed
      • Did you successfully fragment the network (did # of connected components increase)? (disrupting communications)
      • How many nodes do you think you’d have to remove if you removed by lowest PageRank scores first? (robustness of network)
      • What if links represented load distributed across network? How would the network load change after removing these key nodes? (cascading failure)
  36. Review
      • Network Analysis – Crash Course
        • Degree
        • Components
        • Modularity
        • Ranking
        • Resiliency
      • Gephi – Intro
        • Loading data (Facebook)
        • Navigation
        • Statistics
        • Exporting
        • Filtering
        • Resiliency
  37. Questions?
