
- 1. SOCIAL NETWORK ANALYSIS Caleb Jones { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }
- 2. Overview • Network Analysis – Crash Course • Degree • Components • Modularity • Ranking • Resiliency • Gephi – Intro • Loading data (Facebook) • Navigation • Statistics • Exporting • Filtering • Resiliency
- 3. Resources • SNA Coursera Course (next being taught October 2013) • Linked by Albert-László Barabási
- 4. Network Analysis – Crash Course • Degree (n): The number of connections a node has. • Node A has in-degree 3 and out-degree 1 • Node B has degree 4
- 5. Network Analysis – Crash Course • Component (n): A maximal connected subgraph (undirected). • The giant component is the largest component. • [Figure: graph with nodes { A, B, C, X, Y, Z } split into a giant component and a smaller component]
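
As a minimal sketch of the definition above, components can be found with a breadth-first search from each unvisited node. The edge list here is hypothetical: the slide's figure is not part of this transcript, so only the node names come from the slide.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the components of an undirected graph, largest first."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), set()
        while queue:
            node = queue.popleft()
            comp.add(node)
            # adj[node] - seen builds a new set, so adding to seen is safe
            for nb in adj[node] - seen:
                seen.add(nb)
                queue.append(nb)
        components.append(comp)
    return sorted(components, key=len, reverse=True)

# Node names from the slide; the edges are an assumption for illustration.
nodes = ["A", "B", "C", "X", "Y", "Z"]
edges = [("A", "B"), ("B", "C"), ("C", "X"), ("Y", "Z")]
components = connected_components(nodes, edges)
giant = components[0]  # the giant component is simply the largest one
```

With these assumed edges the graph has two components, and `giant` contains A, B, C, and X.
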
- 6. Network Analysis – Crash Course • Modularity (n) ~ Division of a graph into communities (modules/classes/cliques) that are densely connected internally, with relatively sparse connections between communities. • [Figure: graph with nodes { A, B, C, X, Y, Z } divided into Community 1 and Community 2]
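
One way to make "dense inside, sparse between" concrete is the standard modularity score of a given partition: for each community, the fraction of edges inside it minus the fraction expected if edges were placed at random with the same degrees. The sketch below only scores a partition; it does not search for one, which is what Gephi's Modularity statistic does. The edges and community assignment are hypothetical, since the slide's figure is not in the transcript.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = {}    # number of edges inside each community
    degree_sum = {}  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Assumed example: two triangles joined by a single bridge edge C-X.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
two_communities = {"A": 1, "B": 1, "C": 1, "X": 2, "Y": 2, "Z": 2}
one_community = {n: 1 for n in two_communities}

q_split = modularity(edges, two_communities)  # 5/14, about 0.357
q_single = modularity(edges, one_community)   # 0.0
```

Putting every node in one community scores zero, while splitting at the bridge scores higher, which is the sense in which the split is the better division.
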
- 7. Network Analysis – Crash Course • Ranking: A measure of a node’s “importance” • Many different methods for determining “importance” • Degree, Centrality, Closeness, Betweenness, Eigenvector, HITS, PageRank, Erdös Number • Which one to consider depends on the question being asked • Precursor to identifying network resilience, diffusion, and vulnerability
- 8. Network Analysis – Crash Course • Degree ranking: Quantity over quality • Scores: A 3, B 3, C 1, D 1, X 1, Y 1, Z 3, Q 1
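
The degree scores above can be reproduced with a few lines of Python. The slide's figure is not in this transcript, so the edge list below is an assumption: it is a tree inferred from the score tables on this and the following slides, and it does reproduce the degree, betweenness, closeness, and Erdös tables.

```python
from collections import Counter

# Assumed edge list, inferred from the slide's score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1  # each undirected edge adds one to both endpoints
    degree[v] += 1

# Rank nodes from highest to lowest degree, breaking ties by name.
ranking = sorted(degree, key=lambda n: (-degree[n], n))
```

Here `ranking` starts with A, B, and Z (degree 3 each), followed by the five degree-1 nodes.
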
- 9. Network Analysis – Crash Course • Betweenness Ranking: How frequently a node appears on shortest paths. • Scores: A 15, B 11, C 0, D 0, X 0, Y 0, Z 11, Q 0
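
A sketch of how such scores can be computed, using Brandes' algorithm for unweighted graphs (Gephi's own implementation may differ in details). The edge list is an assumption inferred from the slide's score tables, since the figure is not in the transcript.

```python
from collections import deque

# Assumed edge list, inferred from the slide's score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: score[v] / 2 for v in adj}

scores = betweenness(build_adjacency(EDGES))
```

On the assumed graph this gives 15 for A, 11 for B and Z, and 0 for the leaves, matching the table above.
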
- 10. Network Analysis – Crash Course • Closeness Ranking: Average number of hops from a node to rest of network. • Scores: A 1.571, B 1.857, C 2.714, D 2.714, X 2.714, Y 2.714, Z 1.857, Q 2.429 • Note: Smaller is (usually) better
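
The "average number of hops" definition can be sketched with a breadth-first search from each node. The edge list is an assumption inferred from the slide's score tables (the figure is not in the transcript), and the sketch assumes a connected graph.

```python
from collections import deque

# Assumed edge list, inferred from the slide's score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def hop_counts(adj, source):
    """Number of hops from source to every reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def average_hops(adj, source):
    """Mean hop count from source to all other nodes (connected graph)."""
    dist = hop_counts(adj, source)
    return sum(dist.values()) / (len(dist) - 1)

adj = build_adjacency(EDGES)
closeness = {n: round(average_hops(adj, n), 3) for n in adj}
```

For example, A reaches three nodes in one hop and four in two hops, so its score is 11/7, which rounds to 1.571.
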
- 11. Network Analysis – Crash Course • Eigenvector Ranking: A node’s “influence” on the network (accounts for who you know) • Scores: A 1, B 0.836, C 0.392, D 0.392, X 0.392, Y 0.392, Z 0.836, Q 0.465 • Based on eigenvector of adjacency matrix • Google’s PageRank is a variant of this
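
A rough sketch of the idea, using power iteration to approximate the leading eigenvector of the adjacency matrix. The edge list is an assumption inferred from the slide's score tables. The iteration multiplies by A + I rather than A, a choice made here (not taken from the slides) so that it also converges on bipartite graphs such as this tree. The results agree with the table above to about two decimal places; the small differences presumably come from the tool's iteration settings.

```python
# Assumed edge list, inferred from the slide's score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I), scaled so the top node scores 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each node's new score is its own score plus its neighbors' scores.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(nxt.values())
        x = {v: val / top for v, val in nxt.items()}
    return x

centrality = eigenvector_centrality(build_adjacency(EDGES))
```

Note that Q scores higher than the other leaves even though all of them have degree 1: Q's single neighbor is A, the most central node.
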
- 12. Network Analysis – Crash Course • Erdös Ranking: Number of hops to specific node (degrees of separation). • Scores (Erdös = A): A 0, B 1, C 2, D 2, X 2, Y 2, Z 1, Q 1 • Note: Smaller is (usually) better • What if “Erdös” is an influential CEO? • What if “Erdös” has bird flu?
- 13. Network Analysis – Crash Course • Erdös Ranking: Number of hops to specific node (degrees of separation). • Scores (Erdös = D): A 2, B 1, C 2, D 0, X 4, Y 4, Z 3, Q 3 • Note: Smaller is (usually) better • What if “Erdös” is an influential CEO? • What if “Erdös” has bird flu?
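
An Erdös number is a breadth-first-search distance from the chosen node, so both tables above can be reproduced by changing only the starting node. The edge list is an assumption inferred from the slide's score tables, since the figure is not in the transcript.

```python
from collections import deque

# Assumed edge list, inferred from the slide's score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def erdos_numbers(edges, erdos):
    """Hops from the chosen 'Erdös' node to every reachable node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {erdos: 0}
    queue = deque([erdos])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

from_a = erdos_numbers(EDGES, "A")  # central node: nobody is more than 2 hops away
from_d = erdos_numbers(EDGES, "D")  # peripheral node: X and Y are 4 hops away
```

Choosing a peripheral node such as D doubles the maximum separation, which is why the choice of "Erdös" matters.
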
- 14. Network Analysis – Crash Course • Limitations: • Only considered undirected networks (directed is more complicated) • Treated all edges as equal. Many networks have a weight or cost associated to edges (e.g. distance) • Treated all nodes as equal. A node’s importance may be inherent based on attributes separate from its position in network (e.g. dating sites)
- 15. Network Analysis – Crash Course • Resiliency (removing nodes/links): • Target nodes based on their “importance” • High degree nodes more likely to affect local communities • High betweenness/Eigenvector nodes more likely to fragment communities
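
A small sketch of a targeted removal: delete one node and count the connected components that remain. The edge list is an assumption inferred from the score tables on the earlier slides, and the helper names are illustrative only.

```python
# Assumed edge list, inferred from the earlier slides' score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def remove_node(adj, target):
    """Copy of the graph without target and without its edges."""
    return {n: nbs - {target} for n, nbs in adj.items() if n != target}

def count_components(adj):
    """Number of connected components, found by depth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
    return count

adj = build_adjacency(EDGES)
before = count_components(adj)                        # 1
after_hub = count_components(remove_node(adj, "A"))   # 3
after_leaf = count_components(remove_node(adj, "C"))  # 1
```

Removing A, the node with the highest betweenness, splits the graph into three pieces, while removing the leaf C leaves it connected.
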
- 16. Gephi Introduction • Platform for visualizing and analyzing networks • https://gephi.org/ • Cross-platform • Plugin model
- 17. Facebook Dataset • Download your data (gml) • http://snacourse.com/getnet/ • Import into Gephi • File -> Open -> Select downloaded .gml file • Choose “undirected” for “Graph Type”
- 18. Layout Layout -> Fruchterman Reingold
- 19. Partitioning Communities 1. Statistics -> Modularity -> Run (use defaults) 2. Partition -> Nodes (refresh) -> Modularity class -> Apply
- 20. Degree Distribution 1. Statistics -> Average Degree -> Run 2. Partition -> Nodes (refresh) -> Modularity class -> Apply • Lots of nodes with few connections • Only a few with a large number of connections • Power law distribution?
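
Outside Gephi, the average degree and the degree distribution can be sketched as follows. The edge list is the small assumed example inferred from the crash-course score tables, not the Facebook data, so it only illustrates the computation; a graph this small says nothing about power laws.

```python
from collections import Counter

# Assumed edge list, inferred from the crash-course score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# How many nodes have each degree value.
distribution = Counter(degree.values())

# Average degree is 2 * edges / nodes for an undirected graph.
average_degree = sum(degree.values()) / len(degree)
```

Here five nodes have degree 1 and three have degree 3, giving an average degree of 1.75.
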
- 21. Node Ranking by Degree 1. Ranking -> Nodes (refresh) -> Degree -> Apply (try tweaking min/max size and Spline for desired emphasis)
- 22. Filtering Isolated Nodes (“noise”) 1. Statistics -> Connected Components -> Run 2. Filters -> Attributes -> Partition Count -> Component ID 3. Drag “Component ID” down into “Queries” section 4. Click on “Partition Count”, slide the settings bar, and click “Filter” – adjust to remove isolated nodes Can be important step when dealing with very large data sets. Depending on degree distribution, filter can be set quite high.
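
The same filter can be sketched in code: label each node with a component ID, count the nodes per component, and keep only components above a size threshold. The graph and the function names below are hypothetical and chosen for illustration.

```python
from collections import deque

def component_ids(adj):
    """Label every node with a component ID via breadth-first search."""
    ids = {}
    next_id = 0
    for start in adj:
        if start in ids:
            continue
        ids[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in ids:
                    ids[nb] = next_id
                    queue.append(nb)
        next_id += 1
    return ids

def filter_small_components(adj, min_size):
    """Keep only nodes whose component has at least min_size nodes."""
    ids = component_ids(adj)
    sizes = {}
    for cid in ids.values():
        sizes[cid] = sizes.get(cid, 0) + 1
    keep = {n for n, cid in ids.items() if sizes[cid] >= min_size}
    return {n: adj[n] & keep for n in keep}

# Hypothetical graph: a triangle, a detached pair, and an isolated node.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "P": {"Q"}, "Q": {"P"},
    "L": set(),
}
no_isolated = filter_small_components(adj, min_size=2)  # drops L only
core_only = filter_small_components(adj, min_size=3)    # keeps the triangle
```

Raising `min_size` plays the role of the settings bar in step 4: a threshold of 2 removes only isolated nodes, and higher values also remove small detached groups.
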
- 23. Re-adjust after Filtering • Need to re-run previous steps to refresh calculated values now that filtering has been done. • Statistics -> Average degree, modularity, connected components • How did these numbers change? • Re-partition node color by modularity class now that modularity has been recalculated • Run Fruchterman Reingold layout again to fill space left over from filtered nodes
- 24. Have you saved yet!?
- 25. Node Ranking by Centrality 1. Statistics -> Network Diameter -> Run 2. Ranking -> Betweenness Centrality -> Apply
- 26. Erdös Number • You may have noticed a key node that has both the highest degree and the highest betweenness ranking. • Click on the “Edit” button and select that node (note the name) • Statistics -> Erdös Number -> Select that name -> OK • What will happen if you select a less conspicuous node?
- 27. Data Lab • Go to “Data Laboratory” • All node information as well as calculated statistics appear here in a spreadsheet. • Sort by “Erdös Number” (descending) • What is the largest Erdös Number? N degrees of ________ . • Max is 7 degrees of separation • Try sorting by other values (degree, closeness, betweenness)
- 28. Node Ranking by Eigenvector Centrality 1. Statistics -> Eigenvector Centrality -> Run 2. Ranking -> Eigenvector Centrality -> Apply
- 29. Node Ranking by PageRank 1. Statistics -> PageRank -> Run 2. Ranking -> PageRank -> Apply
- 30. Export to Image • Go to “Preview” mode • Click “Refresh” to see what you have now • Add node labels • “Node Labels” -> “Show Labels” • Adjust font size to avoid label overlapping • If Node Labels are overlapping, try expanding layout • Back to “Overview” -> Layout -> Fruchterman Reingold • Increase the “Area” parameter and re-run the layout • Then go back to “Preview” mode and click “Refresh” • May need to re-adjust Node Label text size • Experiment with “Curved” edges
- 31. Labels omitted in slide deck for privacy
- 32. Before we attack the network, save!
- 33. Network Resiliency • How can we fragment the network or increase the separation between nodes? • Which nodes, if removed/influenced, would most greatly impact the network? • What information have we learned already that could be used?
- 34. Network Resiliency • Go to “Data Laboratory” -> sort by “PageRank” descending • Select top 5 rows and delete them (did you save first!!!) • Note their names – Are these people influential in your life?
- 35. Network Resiliency • Go back to statistics and note the following: • Average Degree, Network Diameter, Modularity, Connected Components, Average Path Length • Also note how the network visually has changed • Re-run the statistics above and note how the numbers changed • Did you successfully fragment the network (did # of connected components increase)? (disrupting communications) • How many nodes do you think you’d have to remove if you removed by lowest PageRank scores first? (robustness of network) • What if links represented load distributed across network? How would the network load change after removing these key nodes? (cascading failure)
- 36. Review • Network Analysis – Crash Course • Degree • Components • Modularity • Ranking • Resiliency • Gephi – Intro • Loading data (Facebook) • Navigation • Statistics • Exporting • Filtering • Resiliency
- 37. Questions?
