
A high-level overview of social network analysis using Gephi with your exported Facebook friends network. See more network analysis at http://allthingsgraphed.com.

- 1. SOCIAL NETWORK ANALYSIS Caleb Jones { "email" : "calebjones@gmail.com", "website" : "http://calebjones.info", "twitter" : "@JonesWCaleb" }
- 2. Overview
  - Network Analysis: Crash Course
    - Degree
    - Components
    - Modularity
    - Ranking
    - Resiliency
  - Gephi: Intro
    - Loading data (Facebook)
    - Navigation
    - Statistics
    - Exporting
    - Filtering
    - Resiliency
- 3. Resources
  - SNA Coursera Course (next being taught October 2013)
  - Linked by Albert-László Barabási
- 4. Network Analysis: Crash Course
  - Degree (n): The number of connections a node has.
    - Node A has in-degree 3 and out-degree 1
    - Node B has degree 4
  - (Figure: nodes A and B.)
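To make in-degree and out-degree concrete, here is a minimal sketch in plain Python. The edge list is hypothetical (the slide's figure is not part of this transcript); it is chosen only so that node A ends up with in-degree 3 and out-degree 1, as on the slide.

```python
from collections import Counter

# Hypothetical directed edges (source, target); not taken from the slide's figure.
DIRECTED_EDGES = [("P", "A"), ("Q", "A"), ("R", "A"), ("A", "B")]

# In-degree counts edges pointing at a node, out-degree counts edges leaving it.
in_degree = Counter(target for _, target in DIRECTED_EDGES)
out_degree = Counter(source for source, _ in DIRECTED_EDGES)

print("A in-degree:", in_degree["A"], "out-degree:", out_degree["A"])
```

In an undirected network the two counts collapse into a single degree per node.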
- 5. Network Analysis: Crash Course
  - Component (n): A maximal connected subgraph (undirected).
  - The giant component is the largest component.
  - (Figure: graph with nodes { A, B, C, X, Y, Z }, showing the giant component and a smaller component.)
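Components can be found with a breadth-first search from every node that has not been visited yet. The sketch below uses the node names from the slide, but the edges are an assumption, since the figure is not reproduced in this transcript.

```python
from collections import deque

NODES = ["A", "B", "C", "X", "Y", "Z"]
# Hypothetical undirected edges; the slide's actual edges are not in the transcript.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"), ("Y", "Z")]

def connected_components(nodes, edges):
    """Return the components as sorted node lists, largest first."""
    adjacency = {n: [] for n in nodes}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)

components = connected_components(NODES, EDGES)
giant = components[0]  # the largest component
print(components)
```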
- 6. Network Analysis: Crash Course
  - Modularity (n): Roughly, a division of a graph into communities (modules/classes/cliques) that are densely interconnected internally, while the interconnection between communities is relatively sparse.
  - (Figure: graph with nodes { A, B, C, X, Y, Z } split into Community 1 and Community 2.)
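One common way to score such a division is the modularity value Q: for each community, the fraction of edges that fall inside it minus the fraction expected if edges were placed at random with the same degrees. The sketch below only scores a partition that is already given; it does not search for communities. The edges and the community assignment are hypothetical (two triangles joined by one bridge).

```python
# Hypothetical undirected edges: triangle A-B-C, triangle X-Y-Z, bridge C-X.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
COMMUNITY = {"A": 1, "B": 1, "C": 1, "X": 2, "Y": 2, "Z": 2}

def modularity(edges, community_of):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edges)
    internal = {}    # edges with both endpoints in the community
    degree_sum = {}  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

q_split = modularity(EDGES, COMMUNITY)
q_single = modularity(EDGES, {n: 0 for n in COMMUNITY})  # everything in one community
print(round(q_split, 3), round(q_single, 3))
```

Putting every node in one community scores zero, while the two-triangle split scores higher, which matches the intuition of dense groups with sparse links between them.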
- 7. Network Analysis: Crash Course
  - Ranking: A measure of a node's "importance"
  - Many different methods for determining "importance"
    - Degree, Centrality, Closeness, Betweenness, Eigenvector, HITS, PageRank, Erdös Number
  - Which one to consider depends on the question being asked
  - Precursor to identifying network resilience, diffusion, and vulnerability
- 8. Network Analysis: Crash Course
  - Degree ranking: Quantity over quality
  - Node scores: A 3, B 3, C 1, D 1, X 1, Y 1, Z 3, Q 1
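The degree scores above can be reproduced with a few lines of Python. The edge list is an assumption: the figure is not in this transcript, so the edges were inferred from the scores listed on the crash-course slides.

```python
from collections import Counter

# Assumed edge list, inferred from the slide scores (the figure itself is not available).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

# Each undirected edge adds one to the degree of both endpoints.
degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Highest degree first; ties broken alphabetically.
ranking = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
print(ranking)
```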
- 9. Network Analysis: Crash Course
  - Betweenness ranking: How frequently a node appears on shortest paths.
  - Node scores: A 15, B 11, C 0, D 0, X 0, Y 0, Z 11, Q 0
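A sketch of how betweenness can be computed for a small unweighted, undirected graph, using Brandes' accumulation scheme (one breadth-first search per source node). The edge list is an assumption inferred from the slide scores, and this is not necessarily how Gephi implements the statistic.

```python
from collections import deque

# Assumed edge list, inferred from the slide scores.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency

def betweenness(adjacency):
    """Unnormalized betweenness: number of shortest paths passing through each node."""
    score = {n: 0.0 for n in adjacency}
    for source in adjacency:
        order = []                             # nodes in order of discovery
        preds = {n: [] for n in adjacency}     # predecessors on shortest paths
        sigma = {n: 0 for n in adjacency}      # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adjacency}
        for w in reversed(order):              # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # In an undirected graph every pair was counted from both ends.
    return {n: s / 2 for n, s in score.items()}

scores = betweenness(build_adjacency(EDGES))
print(scores)
```

On the assumed edge list this gives 15 for A, 11 for B and Z, and 0 for the leaf nodes, matching the slide.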
- 10. Network Analysis: Crash Course
  - Closeness ranking: Average number of hops from a node to the rest of the network.
  - Node scores: A 1.571, B 1.857, C 2.714, D 2.714, X 2.714, Y 2.714, Z 1.857, Q 2.429
  - Note: Smaller is (usually) better
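The closeness scores above are average shortest-path lengths, which a breadth-first search per node can compute. The edge list below is an assumption inferred from the slide scores. Note that many libraries report the reciprocal of this value instead, so that larger is better.

```python
from collections import deque

# Assumed edge list, inferred from the slide scores.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency

def average_distance(adjacency, source):
    """Mean number of hops from source to every other reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sum(dist.values()) / (len(dist) - 1)

adjacency = build_adjacency(EDGES)
closeness = {n: round(average_distance(adjacency, n), 3) for n in adjacency}
print(closeness)
```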
- 11. Network Analysis: Crash Course
  - Eigenvector ranking: A node's "influence" on the network (accounts for who you know)
  - Node scores: A 1, B 0.836, C 0.392, D 0.392, X 0.392, Y 0.392, Z 0.836, Q 0.465
  - Google's PageRank is a variant of this
  - Based on eigenvector of adjacency matrix
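A rough sketch of eigenvector centrality by power iteration, with scores rescaled so the top node is 1. Two things here are assumptions: the edge list (inferred from the slide scores) and the self-term added in each step, which is a standard trick to keep the iteration from oscillating on tree-like graphs without changing the leading eigenvector. The result lands close to the slide's values but not exactly on them, presumably because the slide's tool used a different iteration scheme.

```python
# Assumed edge list, inferred from the slide scores.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency

def eigenvector_centrality(adjacency, iterations=200):
    """Power iteration on (adjacency matrix + identity), rescaled to a maximum of 1."""
    x = {n: 1.0 for n in adjacency}
    for _ in range(iterations):
        # A node's new score is its own score plus the scores of its neighbors.
        nxt = {n: x[n] + sum(x[m] for m in adjacency[n]) for n in adjacency}
        top = max(nxt.values())
        x = {n: value / top for n, value in nxt.items()}
    return x

centrality = eigenvector_centrality(build_adjacency(EDGES))
for node in sorted(centrality, key=centrality.get, reverse=True):
    print(node, round(centrality[node], 3))
```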
- 12. Network Analysis: Crash Course
  - Erdös ranking: Number of hops to a specific node (degrees of separation).
  - Node scores: A 0, B 1, C 2, D 2, X 2, Y 2, Z 1, Q 1
  - Note: Smaller is (usually) better
  - What if "Erdös" is an influential CEO?
  - What if "Erdös" has bird flu?
  - (Figure: node A is marked as "Erdös".)
- 13. Network Analysis: Crash Course
  - Erdös ranking: Number of hops to a specific node (degrees of separation).
  - Node scores: A 2, B 1, C 2, D 0, X 4, Y 4, Z 3, Q 3
  - Note: Smaller is (usually) better
  - What if "Erdös" is an influential CEO?
  - What if "Erdös" has bird flu?
  - (Figure: node D is marked as "Erdös".)
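An Erdös number is a breadth-first-search distance from one chosen node. The sketch below assumes an edge list inferred from the slide scores and reproduces both slides by changing only the chosen "Erdös" node.

```python
from collections import deque

# Assumed edge list, inferred from the slide scores.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def erdos_numbers(edges, erdos):
    """Number of hops from the chosen node to every node reachable from it."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    hops = {erdos: 0}
    queue = deque([erdos])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

from_a = erdos_numbers(EDGES, "A")  # central node as "Erdös"
from_d = erdos_numbers(EDGES, "D")  # peripheral node as "Erdös"
print(from_a)
print(from_d)
```

Choosing the peripheral node D doubles the maximum separation (4 hops instead of 2), which is the point of the two slides.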
- 14. Network Analysis: Crash Course
  - Limitations:
    - Only considered undirected networks (directed is more complicated)
    - Treated all edges as equal. Many networks have a weight or cost associated with edges (e.g. distance)
    - Treated all nodes as equal. A node's importance may be inherent, based on attributes separate from its position in the network (e.g. dating sites)
- 15. Network Analysis: Crash Course
  - Resiliency (removing nodes/links):
    - Target nodes based on their "importance"
    - High-degree nodes are more likely to affect local communities
    - High betweenness/eigenvector nodes are more likely to fragment communities
- 16. Gephi Introduction
  - Platform for visualizing and analyzing networks
  - https://gephi.org/
  - Cross-platform
  - Plugin model
- 17. Facebook Dataset
  - Download your data (gml)
    - http://snacourse.com/getnet/
  - Import into Gephi
    - File -> Open -> Select downloaded .gml file
    - Choose "undirected" for "Graph Type"
- 18. Layout
  - Layout -> Fruchterman Reingold
- 19. Partitioning Communities
  1. Statistics -> Modularity -> Run (use defaults)
  2. Partition -> Nodes (refresh) -> Modularity class -> Apply
- 20. Degree Distribution
  1. Statistics -> Average Degree -> Run
  2. Partition -> Nodes (refresh) -> Modularity class -> Apply
  - Lots of nodes with few connections
  - Only a few with a large number of connections
  - Power law distribution?
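Outside Gephi, the average degree and the degree distribution are simple counts. A minimal sketch on a small hypothetical edge list (a real friends network would be loaded from the exported file instead):

```python
from collections import Counter

# Hypothetical undirected edge list standing in for a friends network.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Every edge contributes to two nodes, so the average degree is 2 * edges / nodes.
average_degree = 2 * len(EDGES) / len(degree)

# Degree distribution: how many nodes have each degree value.
distribution = dict(sorted(Counter(degree.values()).items()))

print("average degree:", average_degree)
for k, count in distribution.items():
    print(f"degree {k}: {'#' * count} ({count} nodes)")
```

Even this toy graph shows the shape described above: most nodes have a single connection and only a few have several. Whether a real network follows a power law needs far more data than this.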
- 21. Node Ranking by Degree
  1. Ranking -> Nodes (refresh) -> Degree -> Apply (try tweaking min/max size and Spline for desired emphasis)
- 22. Filtering Isolated Nodes ("noise")
  1. Statistics -> Connected Components -> Run
  2. Filters -> Attributes -> Partition Count -> Component ID
  3. Drag "Component ID" down into the "Queries" section
  4. Click on "Partition Count", slide the settings bar, and click "Filter"; adjust to remove isolated nodes
  - Note: This can be an important step when dealing with very large data sets. Depending on the degree distribution, the filter can be set quite high.
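The filtering steps above amount to labelling each node with a component ID, counting nodes per component, and keeping only components above a size threshold. A minimal sketch of that idea follows; the node names, edges, and the `min_size` parameter are hypothetical, and this is not Gephi's own implementation.

```python
from collections import Counter, deque

# Hypothetical data: one larger component, one pair, and one isolated node.
NODES = ["A", "B", "C", "D", "E", "F", "G"]
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]

def component_ids(nodes, edges):
    """Assign every node the ID of the connected component it belongs to."""
    adjacency = {n: [] for n in nodes}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    ids, next_id = {}, 0
    for start in nodes:
        if start in ids:
            continue
        ids[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in ids:
                    ids[neighbor] = next_id
                    queue.append(neighbor)
        next_id += 1
    return ids

def filter_small_components(nodes, edges, min_size):
    """Keep only nodes (and their edges) in components with at least min_size nodes."""
    ids = component_ids(nodes, edges)
    sizes = Counter(ids.values())
    kept = [n for n in nodes if sizes[ids[n]] >= min_size]
    kept_set = set(kept)
    kept_edges = [(u, v) for u, v in edges if u in kept_set and v in kept_set]
    return kept, kept_edges

kept_nodes, kept_edges = filter_small_components(NODES, EDGES, min_size=3)
print(kept_nodes, kept_edges)
```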
- 23. Re-adjust after Filtering
  - Need to re-run previous steps to refresh calculated values now that filtering has been done.
    - Statistics -> Average degree, modularity, connected components
    - How did these numbers change?
  - Re-partition node color by modularity class now that modularity has been recalculated
  - Run Fruchterman Reingold layout again to fill space left over from filtered nodes
- 24. Have you saved yet!?
- 25. Node Ranking by Centrality
  1. Statistics -> Network Diameter -> Run
  2. Ranking -> Betweenness Centrality -> Apply
- 26. Erdös Number
  - You may have noticed a key node that has both the highest degree and the highest betweenness ranking.
  - Click on the "Edit" button and select that node (note the name)
  - Statistics -> Erdös Number -> Select that name -> OK
  - What will happen if you select a less conspicuous node?
- 27. Data Lab
  - Go to "Data Laboratory"
  - All node information as well as calculated statistics appear here in a spreadsheet.
  - Sort by "Erdös Number" (descending)
  - What is the largest Erdös Number? N degrees of ________ .
  - Try sorting by other values (degree, closeness, betweenness)
  - (Screenshot note: Max is 7 degrees of separation)
- 28. Node Ranking by Eigenvector Centrality
  1. Statistics -> Eigenvector Centrality -> Run
  2. Ranking -> Eigenvector Centrality -> Apply
- 29. Node Ranking by PageRank
  1. Statistics -> PageRank -> Run
  2. Ranking -> PageRank -> Apply
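For intuition about what the PageRank statistic measures, here is a simplified power-iteration sketch on an undirected graph, where each edge is treated as a link in both directions. The edge list is hypothetical, the damping factor of 0.85 is the conventional choice rather than a value taken from the slides, and the sketch assumes every node has at least one neighbor.

```python
# Hypothetical undirected edge list.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency

def pagerank(adjacency, damping=0.85, iterations=100):
    """Simplified PageRank; assumes no node is isolated."""
    n = len(adjacency)
    rank = {node: 1.0 / n for node in adjacency}
    for _ in range(iterations):
        # Each node splits its current rank evenly among its neighbors.
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[m] / len(adjacency[m]) for m in adjacency[node])
            for node in adjacency
        }
    return rank

ranks = pagerank(build_adjacency(EDGES))
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```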
- 30. Export to Image
  - Go to "Preview" mode
  - Click "Refresh" to see what you have now
  - Add node labels
    - "Node Labels" -> "Show Labels"
    - Adjust font size to avoid label overlapping
  - If node labels are overlapping, try expanding the layout
    - Back to "Overview" -> Layout -> Fruchterman Reingold
    - Increase the "Area" parameter and re-run the layout
    - Then go back to "Preview" mode and click "Refresh"
    - May need to re-adjust node label text size
  - Experiment with "Curved" edges
- 31. labels omitted in slidedeck for privacy
- 32. Before we attack the network, save!
- 33. Network Resiliency
  - How can we fragment the network or increase the separation between nodes?
  - Which nodes, if removed/influenced, would most greatly impact the network?
  - What information have we learned already that could be used?
- 34. Network Resiliency
  - Go to "Data Laboratory" -> sort by "PageRank" (descending)
  - Select the top 5 rows and delete them (did you save first!!!)
  - Note their names. Are these people influential in your life?
- 35. Network Resiliency
  - Go back to statistics and note the following:
    - Average Degree, Network Diameter, Modularity, Connected Components, Average Path Length
  - Also note how the network has changed visually
  - Re-run the statistics above and note how the numbers changed
  - Did you successfully fragment the network (did the number of connected components increase)? (disrupting communications)
  - How many nodes do you think you'd have to remove if you removed by lowest PageRank scores first? (robustness of network)
  - What if links represented load distributed across the network? How would the network load change after removing these key nodes? (cascading failure)
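The fragmentation question can also be checked programmatically: delete a set of nodes and count the connected components before and after. A minimal sketch on a hypothetical edge list, comparing removal of a central node with removal of a peripheral one:

```python
# Hypothetical undirected edge list.
NODES = ["A", "B", "C", "D", "X", "Y", "Z", "Q"]
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def count_components(nodes, edges):
    """Count connected components with an iterative depth-first search."""
    adjacency = {n: [] for n in nodes}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return count

def remove_nodes(nodes, edges, removed):
    """Return the graph without the removed nodes and their incident edges."""
    removed = set(removed)
    kept_nodes = [n for n in nodes if n not in removed]
    kept_edges = [(u, v) for u, v in edges if u not in removed and v not in removed]
    return kept_nodes, kept_edges

before = count_components(NODES, EDGES)
after_hub = count_components(*remove_nodes(NODES, EDGES, ["A"]))   # central node
after_leaf = count_components(*remove_nodes(NODES, EDGES, ["C"]))  # peripheral node
print(before, after_hub, after_leaf)
```

Removing the central node splits this toy network into three pieces, while removing a leaf leaves it connected, which mirrors the contrast between deleting high-ranked and low-ranked nodes.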
- 36. Review
  - Network Analysis: Crash Course
    - Degree
    - Components
    - Modularity
    - Ranking
    - Resiliency
  - Gephi: Intro
    - Loading data (Facebook)
    - Navigation
    - Statistics
    - Exporting
    - Filtering
    - Resiliency
- 37. Questions?