Caleb Jones

Jul. 26, 2013

A high-level overview of social network analysis using Gephi with your exported Facebook friends network. See more network analysis at http://allthingsgraphed.com.

- SOCIAL NETWORK ANALYSIS Caleb Jones { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }
- Overview • Network Analysis – Crash Course • Degree • Components • Modularity • Ranking • Resiliency • Gephi – Intro • Loading data (Facebook) • Navigation • Statistics • Exporting • Filtering • Resiliency
- Resources • SNA Coursera Course (next being taught October 2013) • Linked by Albert-László Barabási
- Network Analysis – Crash Course • Degree (n): The number of connections a node has. • Node A has in-degree 3 and out-degree 1 • Node B has degree 4 • (Figure: example nodes A and B)
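A minimal sketch of counting in-degree and out-degree from a directed edge list. The edges and the node names P, Q, R, S are hypothetical, chosen only so that node A ends up with in-degree 3 and out-degree 1 as on the slide; they are not the slide's actual graph.

```python
from collections import Counter

# Hypothetical directed edges as (source, target) pairs.
directed_edges = [("P", "A"), ("Q", "A"), ("R", "A"), ("A", "S")]

# Out-degree counts how often a node is a source, in-degree how often it is a target.
out_degree = Counter(source for source, _ in directed_edges)
in_degree = Counter(target for _, target in directed_edges)

print("A in-degree:", in_degree["A"])
print("A out-degree:", out_degree["A"])
```

In an undirected graph the two counts collapse into a single degree.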
- Network Analysis – Crash Course • Component (n): A maximal connected subgraph (undirected). • The giant component is the largest component. • (Figure: graph with nodes { A, B, C, X, Y, Z }, with its components labeled and the largest marked as the giant component)
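Components can be found with a breadth-first search from each unvisited node. The sketch below reuses the slide's node names, but the edges are hypothetical because the slide's figure is not available here.

```python
from collections import deque

def connected_components(nodes, edges):
    """Return the components of an undirected graph as sorted lists of nodes."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components

nodes = ["A", "B", "C", "X", "Y", "Z"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("X", "Y")]  # hypothetical edges

components = connected_components(nodes, edges)
giant = max(components, key=len)  # the giant component is simply the largest one
print(components)
print("giant component:", giant)
```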
- Network Analysis – Crash Course • Modularity (n) ~ Division of a graph into communities (modules/classes/cliques) that are densely connected internally, with relatively sparse connections between communities. • (Figure: graph with nodes { A, B, C, X, Y, Z } split into Community 1 and Community 2)
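One common way to score how well a given partition fits this description is Newman's modularity Q: for each community, take the fraction of edges that fall inside it and subtract the fraction expected from its total degree. The sketch below only scores a partition that is handed to it; it does not search for communities the way a tool such as Gephi does. The two-triangle graph and its community labels are hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity Q of an undirected, unweighted graph for a given partition."""
    m = len(edges)
    internal_edges = {}  # edges with both ends in the same community
    degree_sum = {}      # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal_edges[cu] = internal_edges.get(cu, 0) + 1
    return sum(
        internal_edges.get(c, 0) / m - (total / (2 * m)) ** 2
        for c, total in degree_sum.items()
    )

# Hypothetical graph: two triangles joined by the single edge C-X.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
two_communities = {"A": 1, "B": 1, "C": 1, "X": 2, "Y": 2, "Z": 2}
one_community = {node: 1 for node in two_communities}

q_split = modularity(edges, two_communities)
q_single = modularity(edges, one_community)
print(q_split, q_single)
```

Putting every node in one community scores zero, while the two-triangle split scores higher, which is the sense in which it is the "better" division.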
- Network Analysis – Crash Course • Ranking: A measure of a node’s “importance” • Many different methods for determining “importance” • Degree, Centrality, Closeness, Betweenness, Eigenvector, HITS, PageRank, Erdös Number • Which one to consider depends on the question being asked • Precursor to identifying network resilience, diffusion, and vulnerability
- Network Analysis – Crash Course • Degree ranking: Quantity over quality • Node scores: A = 3, B = 3, C = 1, D = 1, X = 1, Y = 1, Z = 3, Q = 1
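The degree scores above can be reproduced with a short script. The edge list is an assumption: it is inferred from the score tables on these slides (it is a tree that matches all of them), not copied from the slide's figure.

```python
from collections import Counter

# Assumed edge list, inferred from the slides' score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Rank by degree (highest first), breaking ties alphabetically.
ranking = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
for node, score in ranking:
    print(node, score)
```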
- Network Analysis – Crash Course • Betweenness Ranking: How frequently a node appears on shortest paths. • Node scores: A = 15, B = 11, C = 0, D = 0, X = 0, Y = 0, Z = 11, Q = 0
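A sketch of betweenness using Brandes' algorithm, a standard technique that is not necessarily what produced the slide's numbers. The edge list is an assumption inferred from the score tables on these slides; scores are unnormalized counts of node pairs.

```python
from collections import deque

# Assumed edge list, inferred from the slides' score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def betweenness(edges):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    score = dict.fromkeys(adjacency, 0.0)
    for source in adjacency:
        # BFS from `source`, counting shortest paths and recording predecessors.
        distance = {source: 0}
        path_count = dict.fromkeys(adjacency, 0.0)
        path_count[source] = 1.0
        predecessors = {node: [] for node in adjacency}
        visit_order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Walk back from the farthest nodes, accumulating path dependencies.
        dependency = dict.fromkeys(adjacency, 0.0)
        for w in reversed(visit_order):
            for v in predecessors[w]:
                dependency[v] += path_count[v] / path_count[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]

    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

scores = betweenness(EDGES)
print(scores)
```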
- Network Analysis – Crash Course • Closeness Ranking: Average number of hops from a node to the rest of the network. • Node scores: A = 1.571, B = 1.857, C = 2.714, D = 2.714, X = 2.714, Y = 2.714, Z = 1.857, Q = 2.429 • Note: Smaller is (usually) better
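The closeness scores are average hop counts, which a breadth-first search from each node can compute. The edge list below is an assumption inferred from the score tables on these slides, and the sketch assumes a connected graph.

```python
from collections import deque

# Assumed edge list, inferred from the slides' score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def hop_counts(adjacency, source):
    """Breadth-first search: number of hops from `source` to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

def average_hops(edges):
    """Average hop count from each node to all other nodes (connected graph assumed)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    result = {}
    for node in adjacency:
        distance = hop_counts(adjacency, node)
        result[node] = sum(distance.values()) / (len(distance) - 1)
    return result

closeness = {node: round(value, 3) for node, value in average_hops(EDGES).items()}
print(closeness)
```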
- Network Analysis – Crash Course • Eigenvector Ranking: A node’s “influence” on the network (accounts for who you know) • Based on the eigenvector of the adjacency matrix • Google’s PageRank is a variant of this • Node scores: A = 1, B = 0.836, C = 0.392, D = 0.392, X = 0.392, Y = 0.392, Z = 0.836, Q = 0.465
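A sketch of eigenvector ranking by power iteration, scaled so the top node scores 1. The edge list is an assumption inferred from the score tables on these slides. Each step adds the node's own score to its neighbours' scores (iterating with A + I rather than A), which is a choice made here to keep the iteration from oscillating on a tree. The converged values land close to the table but not digit for digit, which may reflect a different iteration count or update rule in the tool that produced it.

```python
# Assumed edge list, inferred from the slides' score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def eigenvector_centrality(edges, iterations=200):
    """Power iteration on (A + I); has the same leading eigenvector as A."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    score = dict.fromkeys(adjacency, 1.0)
    for _ in range(iterations):
        # New score = own score + sum of neighbours' scores.
        updated = {
            node: score[node] + sum(score[nb] for nb in adjacency[node])
            for node in adjacency
        }
        # Rescale so the largest score is 1.
        top = max(updated.values())
        score = {node: value / top for node, value in updated.items()}
    return score

centrality = eigenvector_centrality(EDGES)
for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 3))
```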
- Network Analysis – Crash Course • Erdös Ranking: Number of hops to a specific node (degrees of separation). • Node scores (the node scoring 0 is “Erdös”): A = 0, B = 1, C = 2, D = 2, X = 2, Y = 2, Z = 1, Q = 1 • Note: Smaller is (usually) better • What if “Erdös” is an influential CEO? • What if “Erdös” has bird flu?
- Network Analysis – Crash Course • Erdös Ranking: Number of hops to a specific node (degrees of separation). • Node scores (the node scoring 0 is “Erdös”): A = 2, B = 1, C = 2, D = 0, X = 4, Y = 4, Z = 3, Q = 3 • Note: Smaller is (usually) better • What if “Erdös” is an influential CEO? • What if “Erdös” has bird flu?
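An Erdös number is a breadth-first hop count from the chosen node. The sketch below computes it for two choices of “Erdös”, A and D, which correspond to the two score tables above. The edge list is an assumption inferred from those tables.

```python
from collections import deque

# Assumed edge list, inferred from the slides' score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def erdos_numbers(edges, target):
    """Hops from every reachable node to `target` in an undirected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    hops = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return hops

from_a = erdos_numbers(EDGES, "A")
from_d = erdos_numbers(EDGES, "D")
print("Erdös = A:", from_a)
print("Erdös = D:", from_d)
```

Choosing a peripheral node such as D stretches the largest value from 2 hops to 4.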
- Network Analysis – Crash Course • Limitations: • Only undirected networks were considered (directed is more complicated) • All edges were treated as equal. Many networks have a weight or cost associated with edges (e.g. distance) • All nodes were treated as equal. A node’s importance may be inherent, based on attributes separate from its position in the network (e.g. dating sites)
- Network Analysis – Crash Course • Resiliency (removing nodes/links): • Target nodes based on their “importance” • High-degree nodes are more likely to affect local communities • High betweenness/eigenvector nodes are more likely to fragment communities
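A small sketch of the removal experiment: delete a node, then count how many connected components remain. The edge list is an assumption inferred from the score tables on the earlier slides, where A has the highest betweenness and C is a leaf.

```python
# Assumed edge list, inferred from the slides' score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def count_components(edges, removed=()):
    """Number of connected components left after deleting the `removed` nodes."""
    removed = set(removed)
    nodes = {node for edge in edges for node in edge} - removed
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        # Depth-first search marks everything reachable from `start`.
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components

intact = count_components(EDGES)
without_hub = count_components(EDGES, removed=["A"])
without_leaf = count_components(EDGES, removed=["C"])
print(intact, without_hub, without_leaf)
```

Removing the high-betweenness node splits this toy network into three pieces, while removing a leaf leaves it in one piece.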
- Gephi Introduction • Platform for visualizing and analyzing networks • https://gephi.org/ • Cross-platform • Plugin model
- Facebook Dataset • Download your data (gml) • http://snacourse.com/getnet/ • Import into Gephi • File -> Open -> Select downloaded .gml file • Choose “undirected” for “Graph Type”
- Layout • Layout -> Fruchterman Reingold
- Partitioning Communities 1. Statistics -> Modularity -> Run (use defaults) 2. Partition -> Nodes (refresh) -> Modularity class -> Apply
- Degree Distribution 1. Statistics -> Average Degree -> Run 2. Partition -> Nodes (refresh) -> Modularity class -> Apply • Lots of nodes with few connections • Only a few with a large number of connections • Power law distribution?
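Outside Gephi, a degree distribution is a tally of how many nodes have each degree. The sketch below uses a small toy edge list as a stand-in (an assumption, since the exported Facebook data is not part of this deck); a real friends network would need far more nodes before any power-law question could be judged.

```python
from collections import Counter

# Toy stand-in edge list (an assumption, not the Facebook export).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: degree -> number of nodes with that degree.
distribution = sorted(Counter(degree.values()).items())
average_degree = sum(degree.values()) / len(degree)

for k, count in distribution:
    print(f"degree {k}: {count} node(s)")
print("average degree:", average_degree)
```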
- Node Ranking by Degree 1. Ranking -> Nodes (refresh) -> Degree -> Apply (try tweaking min/max size and Spline for desired emphasis)
- Filtering Isolated Nodes (“noise”) 1. Statistics -> Connected Components -> Run 2. Filters -> Attributes -> Partition Count -> Component ID 3. Drag “Component ID” down into “Queries” section 4. Click on “Partition Count”, slide the settings bar, and click “Filter” – adjust to remove isolated nodes • Can be an important step when dealing with very large data sets. • Depending on degree distribution, the filter can be set quite high.
- Re-adjust after Filtering • Need to re-run previous steps to refresh calculated values now that filtering has been done. • Statistics -> Average degree, modularity, connected components • How did these numbers change? • Re-partition node color by modularity class now that modularity has been recalculated • Run Fruchterman Reingold layout again to fill space left over from filtered nodes
- Have you saved yet!?
- Node Ranking by Centrality 1. Statistics -> Network Diameter -> Run 2. Ranking -> Betweenness Centrality -> Apply
- Erdös Number • You may have noticed a key node that has both the highest degree and the highest betweenness ranking. • Click on the “Edit” button and select that node (note the name) • Statistics -> Erdös Number -> Select that name -> OK • What will happen if you select a less conspicuous node?
- Data Lab • Go to “Data Laboratory” • All node information as well as calculated statistics appear here in a spreadsheet. • Sort by “Erdös Number” (descending) • What is the largest Erdös Number? N degrees of ________ . • Try sorting by other values (degree, closeness, betweenness) • (Slide example: max is 7 degrees of separation)
- Node Ranking by Eigenvector Centrality 1. Statistics -> Eigenvector Centrality -> Run 2. Ranking -> Eigenvector Centrality -> Apply
- Node Ranking by PageRank 1. Statistics -> PageRank -> Run 2. Ranking -> PageRank -> Apply
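For intuition about what the PageRank statistic computes, here is a minimal power-iteration sketch on a toy undirected graph. The edge list, the damping factor of 0.85 (a common choice), and the fixed iteration count are all assumptions; this is not Gephi's implementation, and it skips the handling of nodes with no links that a general version would need.

```python
# Toy stand-in edge list (an assumption, not the Facebook export).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank for an undirected graph where every node has a link."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    n = len(adjacency)
    rank = dict.fromkeys(adjacency, 1.0 / n)
    for _ in range(iterations):
        # Each node splits its rank evenly among its neighbours; a (1 - damping)
        # share is spread uniformly over all nodes.
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[nb] / len(adjacency[nb]) for nb in adjacency[node])
            for node in adjacency
        }
    return rank

ranks = pagerank(EDGES)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```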
- Export to Image • Go to “Preview” mode • Click “Refresh” to see what you have now • Add node labels • “Node Labels” -> “Show Labels” • Adjust font size to avoid label overlapping • If Node Labels are overlapping, try expanding layout • Back to “Overview” -> Layout -> Fruchterman Reingold • Increase the “Area” parameter and re-run the layout • Then go back to “Preview” mode and click “Refresh” • May need to re-adjust Node Label text size • Experiment with “Curved” edges
- labels omitted in slidedeck for privacy
- Before we attack the network, save!
- Network Resiliency • How can we fragment the network or increase the separation between nodes? • Which nodes, if removed/influenced, would most greatly impact the network? • What information have we learned already that could be used?
- Network Resiliency • Go to “Data Laboratory” -> sort by “PageRank” descending • Select top 5 rows and delete them (did you save first!!!) • Note their names – Are these people influential in your life?
- Network Resiliency • Go back to statistics and note the following: • Average Degree, Network Diameter, Modularity, Connected Components, Average Path Length • Also note how the network visually has changed • Re-run the statistics above and note how the numbers changed • Did you successfully fragment the network (did # of connected components increase)? (disrupting communications) • How many nodes do you think you’d have to remove if you removed by lowest PageRank scores first? (robustness of network) • What if links represented load distributed across network? How would the network load change after removing these key nodes? (cascading failure)
- Review • Network Analysis – Crash Course • Degree • Components • Modularity • Ranking • Resiliency • Gephi – Intro • Loading data (Facebook) • Navigation • Statistics • Exporting • Filtering • Resiliency
- Questions?