Published in:
Technology

- 1. SOCIAL NETWORK ANALYSIS Caleb Jones { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }
- 2. Overview • Network Analysis – Crash Course • Degree • Components • Modularity • Ranking • Resiliency • Gephi – Intro • Loading data (Facebook) • Navigation • Statistics • Exporting • Filtering • Resiliency
- 3. Resources • SNA Coursera Course (next being taught October 2013) • Linked by Albert-László Barabási
- 4. Network Analysis – Crash Course • Degree (n): The number of connections a node has. • Node A has in-degree 3 and out-degree 1 • Node B has degree 4 • [Figure: example nodes A and B]
- 5. Network Analysis – Crash Course • Component (n): A maximally connected subgraph (undirected). • The giant component is the largest component. • [Figure: graph with nodes { A, B, C, X, Y, Z } split into a giant component and a smaller component]
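The component definition above can be sketched with a breadth-first search. The slide does not list its edges, so the edge list below is hypothetical, chosen only so that the six nodes split into one larger and one smaller component.

```python
from collections import deque

# Hypothetical edges (the slide only names the nodes, not the links).
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"), ("Y", "Z")]

def build_adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        component = set()
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components

components = connected_components(build_adjacency(EDGES))
giant = max(components, key=len)  # the giant component is simply the largest
print(len(components), sorted(giant))
```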
- 6. Network Analysis – Crash Course • Modularity (n) ~ Division of a graph into communities (modules/classes/cliques) that are densely connected internally, with relatively sparse connections between communities. • [Figure: graph with nodes { A, B, C, X, Y, Z } divided into Community 1 and Community 2]
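One way to make "dense inside, sparse between" concrete is the standard modularity score Q, which for each community compares the fraction of edges inside it with the fraction expected from its total degree. The sketch below only scores a given partition; it does not search for communities the way a tool such as Gephi does. The edges and the community assignment are hypothetical (two triangles joined by one link).

```python
from collections import Counter

# Hypothetical graph: two triangles joined by the single edge C-X.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
COMMUNITY = {"A": 1, "B": 1, "C": 1, "X": 2, "Y": 2, "Z": 2}

def modularity(edges, community):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    internal = Counter()    # edges with both endpoints in the same community
    degree_sum = Counter()  # total degree of the nodes in each community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

q_split = modularity(EDGES, COMMUNITY)
q_single = modularity(EDGES, {node: 1 for node in COMMUNITY})
print(round(q_split, 3), q_single)
```

Putting every node in one community scores zero, so a positive Q for the two-community split indicates more internal links than the degrees alone would suggest.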
- 7. Network Analysis – Crash Course • Ranking: A measure of a node’s “importance” • Many different methods for determining “importance” • Degree, Centrality, Closeness, Betweenness, Eigenvector, HITS, PageRank, Erdös Number • Which one to consider depends on the question being asked • Precursor to identifying network resilience, diffusion, and vulnerability
- 8. Network Analysis – Crash Course • Degree ranking: Quantity over quality • Node scores: A = 3, B = 3, C = 1, D = 1, X = 1, Y = 1, Z = 3, Q = 1
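A minimal sketch of degree ranking. The slide shows the example graph only as a picture, so the edge list here is an assumption: it is inferred from the score tables in this deck (it reproduces their degree, betweenness, closeness, and hop counts), not taken from the author's data.

```python
from collections import Counter

# Edge list inferred from the slide's scores (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

# In an undirected graph each edge adds one to the degree of both endpoints.
degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Highest degree first, ties broken alphabetically.
for node, score in sorted(degree.items(), key=lambda kv: (-kv[1], kv[0])):
    print(node, score)
```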
- 9. Network Analysis – Crash Course • Betweenness Ranking: How frequently a node appears on shortest paths. • Node scores: A = 15, B = 11, C = 0, D = 0, X = 0, Y = 0, Z = 11, Q = 0
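Betweenness can be computed with Brandes' algorithm; the version below is a sketch for unweighted, undirected graphs and returns unnormalized counts. The edge list is an assumption inferred from the slide's score tables, since the graph itself appears only as a picture.

```python
from collections import deque

# Edge list inferred from the slide's scores (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Brandes' algorithm: shortest paths through each node (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths
        order = []                    # nodes in order of increasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: total / 2 for v, total in score.items()}

scores = betweenness(build_adjacency(EDGES))
print({node: scores[node] for node in sorted(scores)})
```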
- 10. Network Analysis – Crash Course • Closeness Ranking: Average number of hops from a node to rest of network. • Node scores: A = 1.571, B = 1.857, C = 2.714, D = 2.714, X = 2.714, Y = 2.714, Z = 1.857, Q = 2.429 • Note: Smaller is (usually) better
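The closeness score as defined on this slide (average hops to every other node) can be sketched with one breadth-first search per node. The edge list is an assumption inferred from the slide's score tables, and the code assumes the graph is connected.

```python
from collections import deque

# Edge list inferred from the slide's scores (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def hops_from(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

adj = build_adjacency(EDGES)
closeness = {}
for node in adj:
    dist = hops_from(adj, node)
    # Average over the other nodes; the source itself contributes 0 hops.
    closeness[node] = sum(dist.values()) / (len(adj) - 1)

for node in sorted(closeness):
    print(node, round(closeness[node], 3))
```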
- 11. Network Analysis – Crash Course • Eigenvector Ranking: A node’s “influence” on the network (accounts for who you know) • Node scores: A = 1, B = 0.836, C = 0.392, D = 0.392, X = 0.392, Y = 0.392, Z = 0.836, Q = 0.465 • Google’s PageRank is a variant of this • Based on eigenvector of adjacency matrix
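A rough sketch of eigenvector ranking by power iteration, scaled so the top node scores 1. The edge list is an assumption inferred from the slide's score tables. The iteration multiplies by the adjacency matrix plus the identity, which has the same eigenvectors but avoids the oscillation that plain power iteration shows on tree-like (bipartite) graphs. The results should land near the slide's values; small differences are expected because tools differ in iteration count and normalization.

```python
# Edge list inferred from the slide's scores (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I), rescaled so the largest score is 1."""
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # New score = own score (identity shift) + sum of neighbours' scores.
        nxt = {v: score[v] + sum(score[w] for w in adj[v]) for v in adj}
        top = max(nxt.values())
        score = {v: value / top for v, value in nxt.items()}
    return score

scores = eigenvector_centrality(build_adjacency(EDGES))
for node in sorted(scores):
    print(node, round(scores[node], 3))
```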
- 12. Network Analysis – Crash Course • Erdös Ranking: Number of hops to specific node (degrees of separation). • Node scores: A = 0, B = 1, C = 2, D = 2, X = 2, Y = 2, Z = 1, Q = 1 (here node A is “Erdös”) • Note: Smaller is (usually) better • What if “Erdös” is an influential CEO? • What if “Erdös” has bird flu?
- 13. Network Analysis – Crash Course • Erdös Ranking: Number of hops to specific node (degrees of separation). • Node scores: A = 2, B = 1, C = 2, D = 0, X = 4, Y = 4, Z = 3, Q = 3 (here node D is “Erdös”) • Note: Smaller is (usually) better • What if “Erdös” is an influential CEO? • What if “Erdös” has bird flu?
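The two Erdös tables differ only in which node plays “Erdös”. A breadth-first search from the chosen node gives the hop counts; the sketch below runs it from A and then from D. The edge list is an assumption inferred from the slide's score tables.

```python
from collections import deque

# Edge list inferred from the slide's scores (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def erdos_numbers(adj, erdos):
    """Hops from the chosen 'Erdös' node to every reachable node (BFS)."""
    dist = {erdos: 0}
    queue = deque([erdos])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

adj = build_adjacency(EDGES)
from_a = erdos_numbers(adj, "A")  # a central node as "Erdös"
from_d = erdos_numbers(adj, "D")  # a peripheral node as "Erdös"
print(max(from_a.values()), max(from_d.values()))
```

Choosing a peripheral node stretches the largest hop count, which is the effect the second table illustrates.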
- 14. Network Analysis – Crash Course • Limitations: • Only considered undirected networks (directed is more complicated) • Treated all edges as equal. Many networks have a weight or cost associated to edges (e.g. distance) • Treated all nodes as equal. A node’s importance may be inherent based on attributes separate from its position in network (e.g. dating sites)
- 15. Network Analysis – Crash Course • Resiliency (removing nodes/links): • Target nodes based on their “importance” • High degree nodes more likely to affect local communities • High betweenness/Eigenvector nodes more likely to fragment communities
- 16. Gephi Introduction • Platform for visualizing and analyzing networks • https://gephi.org/ • Cross-platform • Plugin model
- 17. Facebook Dataset • Download your data (gml) • http://snacourse.com/getnet/ • Import into Gephi • File -> Open -> Select downloaded .gml file • Choose “undirected” for “Graph Type”
- 18. Layout Layout -> Fruchterman Reingold
- 19. Partitioning Communities 1. Statistics -> Modularity -> Run (use defaults) 2. Partition -> Nodes (refresh) -> Modularity class -> Apply
- 20. Degree Distribution 1. Statistics -> Average Degree -> Run 2. Partition -> Nodes (refresh) -> Modularity class -> Apply • Lots of nodes with few connections • Only a few with a large number of connections • Power law distribution?
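Outside Gephi, the same two numbers (average degree and the degree distribution) can be sketched in a few lines. The small edge list is the example graph inferred from the crash-course score tables (an assumption); a real Facebook export would have far more nodes, and only then does the power-law question become meaningful.

```python
from collections import Counter

# Small example graph inferred from the crash-course tables (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

average_degree = sum(degree.values()) / len(degree)

# distribution[k] = number of nodes that have exactly k connections
distribution = Counter(degree.values())

print("average degree:", average_degree)
for k in sorted(distribution):
    print(k, "connections:", distribution[k], "nodes")
```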
- 21. Node Ranking by Degree 1. Ranking -> Nodes (refresh) -> Degree -> Apply (try tweaking min/max size and Spline for desired emphasis)
- 22. Filtering Isolated Nodes (“noise”) 1. Statistics -> Connected Components -> Run 2. Filters -> Attributes -> Partition Count -> Component ID 3. Drag “Component ID” down into “Queries” section 4. Click on “Partition Count”, slide the settings bar, and click “Filter” – adjust to remove isolated nodes • Can be an important step when dealing with very large data sets. Depending on the degree distribution, the filter can be set quite high.
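The filter in these steps keeps only nodes whose connected component is large enough. A sketch of the same idea in code follows; the adjacency data and the `min_size` parameter are hypothetical stand-ins for the Gephi slider.

```python
from collections import deque

# Hypothetical graph: a triangle, a pair, and one isolated node.
ADJ = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"},
    "X": {"Y"}, "Y": {"X"},
    "Z": set(),
}

def component_of(adj, start):
    """All nodes reachable from start (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def filter_small_components(adj, min_size):
    """Keep only nodes whose component has at least min_size nodes."""
    keep, visited = set(), set()
    for node in adj:
        if node in visited:
            continue
        component = component_of(adj, node)
        visited |= component
        if len(component) >= min_size:
            keep |= component
    return {v: adj[v] & keep for v in keep}

no_isolates = filter_small_components(ADJ, min_size=2)  # drops Z only
core_only = filter_small_components(ADJ, min_size=3)    # drops X, Y, Z
print(sorted(no_isolates), sorted(core_only))
```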
- 23. Re-adjust after Filtering • Need to re-run previous steps to refresh calculated values now that filtering has been done. • Statistics -> Average degree, modularity, connected components • How did these numbers change? • Re-partition node color by modularity class now that modularity has been recalculated • Run Fruchterman Reingold layout again to fill space left over from filtered nodes
- 24. Have you saved yet!?
- 25. Node Ranking by Centrality 1. Statistics -> Network Diameter -> Run 2. Ranking -> Betweenness Centrality -> Apply
- 26. Erdös Number • You may have noticed a key node that has both the highest degree and the highest betweenness ranking. • Click on the “Edit” button and select that node (note the name) • Statistics -> Erdös Number -> Select that name -> OK • What will happen if you select a less conspicuous node?
- 27. Data Lab • Go to “Data Laboratory” • All node information as well as calculated statistics appear here in a spreadsheet. • Sort by “Erdös Number” (descending) • What is the largest Erdös Number? N degrees of ________ . • Try sorting by other values (degree, closeness, betweenness) • Max is 7 degrees of separation
- 28. Node Ranking by Eigenvector Centrality 1. Statistics -> Eigenvector Centrality -> Run 2. Ranking -> Eigenvector Centrality -> Apply
- 29. Node Ranking by PageRank 1. Statistics -> PageRank -> Run 2. Ranking -> PageRank -> Apply
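For intuition about what the PageRank statistic computes, here is a minimal power-iteration sketch on an undirected graph. The edge list is the small example inferred from the crash-course tables (an assumption), the damping factor 0.85 is a commonly used default rather than something the slides specify, and the code assumes every node has at least one neighbour.

```python
# Small example graph inferred from the crash-course tables (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def pagerank(adj, damping=0.85, iterations=100):
    """Each node repeatedly shares its rank equally among its neighbours."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        rank = {
            v: (1 - damping) / n
               + damping * sum(rank[w] / len(adj[w]) for w in adj[v])
            for v in adj
        }
    return rank

rank = pagerank(build_adjacency(EDGES))
for node in sorted(rank, key=rank.get, reverse=True):
    print(node, round(rank[node], 4))
```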
- 30. Export to Image • Go to “Preview” mode • Click “Refresh” to see what you have now • Add node labels • “Node Labels” -> “Show Labels” • Adjust font size to avoid label overlapping • If Node Labels are overlapping, try expanding layout • Back to “Overview” -> Layout -> Fruchterman Reingold • Increase the “Area” parameter and re-run the layout • Then go back to “Preview” mode and click “Refresh” • May need to re-adjust Node Label text size • Experiment with “Curved” edges
- 31. labels omitted in slidedeck for privacy
- 32. Before we attack the network, save!
- 33. Network Resiliency • How can we fragment the network or increase the separation between nodes? • Which nodes, if removed/influenced, would most greatly impact the network? • What information have we learned already that could be used?
- 34. Network Resiliency • Go to “Data Laboratory” -> sort by “PageRank” descending • Select top 5 rows and delete them (did you save first!!!) • Note their names – Are these people influential in your life?
- 35. Network Resiliency • Go back to statistics and note the following: • Average Degree, Network Diameter, Modularity, Connected Components, Average Path Length • Also note how the network visually has changed • Re-run the statistics above and note how the numbers changed • Did you successfully fragment the network (did # of connected components increase)? (disrupting communications) • How many nodes do you think you’d have to remove if you removed by lowest PageRank scores first? (robustness of network) • What if links represented load distributed across network? How would the network load change after removing these key nodes? (cascading failure)
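The before/after comparison can be sketched in code: compute a few statistics, delete the top-ranked nodes, and compute them again. This is an illustration under assumptions. It uses the small example graph inferred from the crash-course tables, ranks by degree as a simple stand-in for PageRank, and removes the top two nodes rather than five because the graph has only eight.

```python
from collections import deque

# Small example graph inferred from the crash-course tables (an assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def count_components(adj):
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return count

def summary(adj):
    total_degree = sum(len(nbrs) for nbrs in adj.values())
    return {"nodes": len(adj),
            "average_degree": total_degree / len(adj),
            "components": count_components(adj)}

def remove_nodes(adj, doomed):
    """Copy of the graph without the doomed nodes or their links."""
    return {v: nbrs - doomed for v, nbrs in adj.items() if v not in doomed}

adj = build_adjacency(EDGES)
# Rank by degree (stand-in for PageRank), ties broken alphabetically.
top = sorted(adj, key=lambda v: (-len(adj[v]), v))[:2]
before = summary(adj)
after = summary(remove_nodes(adj, set(top)))
print(top, before, after)
```

A rise in the component count after removal is the fragmentation the slide asks about.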
- 36. Review • Network Analysis – Crash Course • Degree • Components • Modularity • Ranking • Resiliency • Gephi – Intro • Loading data (Facebook) • Navigation • Statistics • Exporting • Filtering • Resiliency
- 37. Questions?
