A high-level overview of social network analysis using Gephi with your exported Facebook friends network. See more network analysis at http://allthingsgraphed.com.
Network Analysis – Crash Course
• Degree (n): The number of connections a node has.
• Node A has in-degree 3 and out-degree 1
• Node B has degree 4
A
B
Network Analysis – Crash Course
• Component (n): A maximal connected subgraph
(undirected).
• The giant component is the largest component
Figure labels: component, (giant) component
Graph with nodes { A, B, C, X, Y, Z }
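The definition above can be sketched in a few lines of Python. The edge list is hypothetical (the slide's figure is not reproduced in the text); only the node names come from the slide. Each breadth-first search collects one component, and the largest one is the giant component.

```python
from collections import deque

# Hypothetical edges over the slide's node set {A, B, C, X, Y, Z}.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "X"), ("Y", "Z")]

def connected_components(edges):
    """Return the components of an undirected graph, largest first."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)

components = connected_components(EDGES)
giant = components[0]
print(len(components), sorted(giant))
```

With these assumed edges there are two components, and the giant component is {A, B, C, X}.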
Network Analysis – Crash Course
• Modularity (n) ~ Division of a graph into communities
(modules/classes/cliques) with dense interconnection within
each community and relatively sparse interconnection
between communities.
Community 1 Community 2
Graph with nodes { A, B, C, X, Y, Z }
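As a rough illustration of how a division into communities can be scored, the sketch below computes the standard modularity value Q for a given partition: for each community, the fraction of edges inside it minus the fraction expected from its total degree. The edge list and the community assignment are hypothetical (two triangles joined by one edge); they are not taken from the slide's figure.

```python
# Hypothetical graph: triangles A-B-C and X-Y-Z joined by the single edge C-X.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
COMMUNITY = {"A": 1, "B": 1, "C": 1, "X": 2, "Y": 2, "Z": 2}

def modularity(edges, community):
    """Q = sum over communities c of (L_c / m) - (d_c / 2m)^2.

    m   = total number of edges
    L_c = number of edges with both ends in c
    d_c = sum of the degrees of the nodes in c
    """
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

q_two = modularity(EDGES, COMMUNITY)
q_one = modularity(EDGES, {node: 1 for node in COMMUNITY})
print(round(q_two, 3), round(q_one, 3))
```

Splitting the assumed graph into its two triangles scores higher than lumping every node into one community, which scores zero.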
Network Analysis – Crash Course
• Ranking: A measure of a node’s
“importance”
• Many different methods for determining
“importance”
• Degree, Centrality, Closeness, Betweenness,
Eigenvector, HITS, PageRank, Erdös Number
• Which one to consider depends on the
question being asked
• Precursor to identifying network resilience,
diffusion, and vulnerability
Network Analysis – Crash Course
• Degree ranking: Quantity over quality
Node Score
A 3
B 3
C 1
D 1
X 1
Y 1
Z 3
Q 1
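The degree scores above can be reproduced with a short sketch. The edge list is an assumption: the slide's figure is not in the text, so the edges were reconstructed to be consistent with the score tables in this crash course.

```python
from collections import Counter

# Assumed edge list, reconstructed from the score tables (figure not available).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

# Each undirected edge adds one to the degree of both endpoints.
degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Rank by degree, highest first; ties are broken alphabetically.
for node, score in sorted(degree.items(), key=lambda kv: (-kv[1], kv[0])):
    print(node, score)
```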
Network Analysis – Crash Course
• Betweenness Ranking: How frequently a
node appears on shortest paths.
Node Score
A 15
B 11
C 0
D 0
X 0
Y 0
Z 11
Q 0
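A minimal sketch of the betweenness calculation follows, using the same assumed edge list (reconstructed from the score tables, since the figure is not in the text). It counts each unordered pair of nodes once and is not normalized, which is what the scores in the table correspond to. It is a simple all-pairs version for small graphs, not an optimized algorithm.

```python
from collections import deque
from itertools import combinations

# Assumed edge list, reconstructed from the score tables (figure not available).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def bfs_counts(adj, source):
    """Return hop distances and shortest-path counts from `source`."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are in different components
        for v in adj:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            # v lies on a shortest s-t path when the distances add up.
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

scores = betweenness(EDGES)
print({node: scores[node] for node in sorted(scores)})
```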
Network Analysis – Crash Course
• Closeness Ranking: Average number of
hops from a node to rest of network.
Node Score
A 1.571
B 1.857
C 2.714
D 2.714
X 2.714
Y 2.714
Z 1.857
Q 2.429
Note: Smaller is (usually) better
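The closeness scores above are average hop counts, which a breadth-first search gives directly. This sketch uses the same assumed edge list (reconstructed from the score tables) and assumes the graph is connected.

```python
from collections import deque

# Assumed edge list, reconstructed from the score tables (figure not available).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def hop_distances(adj, source):
    """Breadth-first search: number of hops from `source` to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def closeness(edges):
    """Average number of hops from each node to every other node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {}
    for node in adj:
        dist = hop_distances(adj, node)
        others = [d for n, d in dist.items() if n != node]
        scores[node] = sum(others) / len(others)
    return scores

scores = closeness(EDGES)
for node in sorted(scores, key=scores.get):
    print(node, round(scores[node], 3))
```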
Network Analysis – Crash Course
• Eigenvector Ranking: A node’s “influence”
on the network (accounts for who you know)
Node Score
A 1
B 0.836
C 0.392
D 0.392
X 0.392
Y 0.392
Z 0.836
Q 0.465
Google’s PageRank is a variant of this
Based on eigenvector of adjacency matrix
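One common way to approximate the leading eigenvector of the adjacency matrix is power iteration. The sketch below is an illustration under assumptions, not the method Gephi uses: the edge list is reconstructed from the score tables, and each node keeps its own score in every step (equivalent to iterating on the adjacency matrix plus the identity) so that the iteration settles on this tree-shaped graph. Scores are scaled so the top node has score 1. Small differences from the table in the third decimal place are expected, since the result depends on the iteration scheme and stopping rule.

```python
# Assumed edge list, reconstructed from the score tables (figure not available).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def eigenvector_centrality(edges, iterations=200):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    score = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # New score = own score + sum of neighbors' scores (matrix A + I).
        updated = {v: score[v] + sum(score[u] for u in adj[v]) for v in adj}
        top = max(updated.values())
        score = {v: s / top for v, s in updated.items()}  # rescale to max 1
    return score

scores = eigenvector_centrality(EDGES)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 3))
```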
Network Analysis – Crash Course
• Erdös Ranking: Number of hops to
specific node (degrees of separation).
Node Score
A 0
B 1
C 2
D 2
X 2
Y 2
Z 1
Q 1
Note: Smaller is (usually) better
What if “Erdös” is an influential CEO?
What if “Erdös” has bird flu?
Erdös = node A
Network Analysis – Crash Course
• Erdös Ranking: Number of hops to
specific node (degrees of separation).
Node Score
A 2
B 1
C 2
D 0
X 4
Y 4
Z 3
Q 3
Note: Smaller is (usually) better
What if “Erdös” is an influential CEO?
What if “Erdös” has bird flu?
Erdös = node D
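An Erdös number is simply the hop distance from one chosen node, so a single breadth-first search produces the whole table. The sketch uses the assumed edge list reconstructed from the score tables; choosing A or D as the reference node gives the two sets of scores shown on the Erdös Ranking slides.

```python
from collections import deque

# Assumed edge list, reconstructed from the score tables (figure not available).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def erdos_numbers(edges, erdos):
    """Hop distance from the chosen `erdos` node to every reachable node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {erdos: 0}
    queue = deque([erdos])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

from_a = erdos_numbers(EDGES, "A")
from_d = erdos_numbers(EDGES, "D")
print(sorted(from_a.items()))
print(sorted(from_d.items()))
```

Moving the reference node from the central A to the peripheral D raises the largest score from 2 to 4.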
Network Analysis – Crash Course
• Limitations:
• Only considered undirected networks (directed
is more complicated)
• Treated all edges as equal. Many networks
have a weight or cost associated to edges (e.g.
distance)
• Treated all nodes as equal. A node’s importance
may be inherent based on attributes separate
from its position in network (e.g. dating sites)
Network Analysis – Crash Course
• Resiliency (removing nodes/links):
• Target nodes based on their “importance”
• High degree nodes more likely to affect
local communities
• High betweenness/eigenvector nodes
more likely to fragment communities
Gephi Introduction
• Platform for visualizing and analyzing networks
• https://gephi.org/
• Cross-platform
• Plugin model
Facebook Dataset
• Download your data (gml)
• http://snacourse.com/getnet/
• Import into Gephi
• File -> Open -> Select downloaded
.gml file
• Choose “undirected”
for “Graph Type”
Degree Distribution
1. Statistics -> Average Degree -> Run
2. Partition -> Nodes (refresh) -> Modularity class -> Apply
Lots of nodes with
few connections
Only a few with a large
number of connections
Power law distribution?
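Outside Gephi, a degree distribution is just a count of how many nodes have each degree. The sketch below uses a small hypothetical friend network (the names and edges are made up for illustration) with one well-connected hub.

```python
from collections import Counter

# Hypothetical friend network: one hub plus a few extra friendships.
EDGES = [("hub", "n1"), ("hub", "n2"), ("hub", "n3"),
         ("hub", "n4"), ("hub", "n5"), ("hub", "n6"),
         ("n1", "n2"), ("n1", "n7")]

# Degree of each node.
degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: degree value -> number of nodes with that degree.
distribution = Counter(degree.values())
for k in sorted(distribution):
    print(f"degree {k}: {'#' * distribution[k]} ({distribution[k]} nodes)")
```

Even in this toy example most nodes have degree 1 and a single node has degree 6, the same shape the Gephi chart shows at a larger scale.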
Node Ranking by Degree
1. Ranking -> Nodes (refresh) -> Degree -> Apply
(try tweaking min/max size and Spline for desired emphasis)
Filtering Isolated Nodes (“noise”)
1. Statistics -> Connected
Components -> Run
2. Filters -> Attributes -> Partition
Count -> Component ID
3. Drag “Component ID” down into
“Queries” section
4. Click on “Partition Count”, slide the
settings bar, and click “Filter” –
adjust to remove isolated nodes
Can be an important step when dealing with very
large data sets. Depending on degree
distribution, filter can be set quite high.
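The filter built in the steps above keeps only nodes whose component is large enough. A rough Python equivalent is sketched below on a hypothetical graph; the function names and the `min_size` parameter are illustrative and are not Gephi settings.

```python
from collections import Counter, deque

# Hypothetical graph: one component of 4 nodes, one pair, one isolated node.
NODES = ["A", "B", "C", "D", "E", "F", "G"]
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("E", "F")]

def component_ids(nodes, edges):
    """Label every node with the ID of its connected component."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    comp, next_id = {}, 0
    for start in nodes:
        if start in comp:
            continue
        comp[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp[w] = next_id
                    queue.append(w)
        next_id += 1
    return comp

def filter_small_components(nodes, edges, min_size):
    """Drop nodes (and their edges) in components smaller than `min_size`."""
    comp = component_ids(nodes, edges)
    sizes = Counter(comp.values())
    kept = [n for n in nodes if sizes[comp[n]] >= min_size]
    kept_set = set(kept)
    kept_edges = [(u, v) for u, v in edges if u in kept_set and v in kept_set]
    return kept, kept_edges

kept_nodes, kept_edges = filter_small_components(NODES, EDGES, min_size=3)
print(kept_nodes, len(kept_edges))
```

Setting `min_size=2` would remove only the isolated node G; raising it to 3 also removes the E-F pair.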
Re-adjust after Filtering
• Need to re-run previous steps to refresh
calculated values now that filtering has been
done.
• Statistics -> Average degree, modularity,
connected components
• How did these numbers change?
• Re-partition node color by modularity class now
that modularity has been recalculated
• Run Fruchterman Reingold layout again to fill
space left over from filtered nodes
Node Ranking by Centrality
1. Statistics -> Network Diameter -> Run
2. Ranking -> Betweenness Centrality -> Apply
Erdös Number
• You may have noticed a key node that has both the
highest degree and the highest betweenness ranking.
• Click on the “Edit” button and select that node
(note the name)
• Statistics -> Erdös Number -> Select that name -> OK
• What will happen if you select a less conspicuous node?
Data Lab
• Go to “Data Laboratory”
• All node information as well as calculated statistics appear
here in a spreadsheet.
• Sort by “Erdös Number” (descending)
• What is the largest Erdös Number? N degrees of ________ .
• Try sorting by other values (degree, closeness, betweenness)
Max is 7 degrees
of separation
Node Ranking by Eigenvector Centrality
1. Statistics -> Eigenvector Centrality -> Run
2. Ranking -> Eigenvector Centrality -> Apply
Node Ranking by PageRank
1. Statistics -> PageRank -> Run
2. Ranking -> PageRank -> Apply
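To give a feel for what the PageRank statistic computes, here is a simplified power-iteration sketch. It assumes an undirected graph with no isolated nodes, uses the small edge list reconstructed from the crash-course tables rather than Facebook data, and takes a damping factor of 0.85 as an assumed conventional value. Gephi's own implementation and settings may differ.

```python
# Assumed edge list, reconstructed from the crash-course score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]

def pagerank(edges, damping=0.85, iterations=100):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        # Each node splits its rank evenly among its neighbors; a share of
        # (1 - damping) is spread uniformly over all nodes.
        rank = {v: (1 - damping) / n
                   + damping * sum(rank[u] / len(adj[u]) for u in adj[v])
                for v in adj}
    return rank

ranks = pagerank(EDGES)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

On this small graph B and Z come out slightly ahead of A, even though A leads the betweenness, closeness, and eigenvector rankings, which shows that the choice of measure matters.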
Export to Image
• Go to “Preview” mode
• Click “Refresh” to see what you have now
• Add node labels
• “Node Labels” -> “Show Labels”
• Adjust font size to avoid label overlapping
• If Node Labels are overlapping, try expanding layout
• Back to “Overview” -> Layout -> Fruchterman Reingold
• Increase the “Area” parameter and re-run the layout
• Then go back to “Preview” mode and click “Refresh”
• May need to re-adjust Node Label text size
• Experiment with “Curved” edges
Network Resiliency
• How can we fragment the network or increase the
separation between nodes?
• Which nodes, if removed/influenced, would most greatly
impact the network?
• What information have we learned already that could be
used?
Network Resiliency
• Go to “Data Laboratory” -> sort by “PageRank” descending
• Select top 5 rows and delete them (did you save first!!!)
• Note their names – Are these people influential in your life?
Top 5
Network Resiliency
• Go back to statistics and note the following:
• Average Degree, Network Diameter, Modularity, Connected
Components, Average Path Length
• Also note how the network visually has changed
• Re-run the statistics above and note how the numbers
changed
• Did you successfully fragment the network (did # of connected
components increase)? (disrupting communications)
• How many nodes do you think you’d have to remove if you
removed by lowest PageRank scores first? (robustness of network)
• What if links represented load distributed across network? How
would the network load change after removing these key nodes?
(cascading failure)
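The fragmentation question can also be explored in code: delete some nodes, then recount the connected components. This sketch uses the small edge list reconstructed from the crash-course tables as a stand-in for a real network; the function names are illustrative.

```python
from collections import deque

# Assumed edge list, reconstructed from the crash-course score tables.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"),
         ("Z", "X"), ("Z", "Y")]
NODES = sorted({n for edge in EDGES for n in edge})

def count_components(nodes, edges):
    """Number of connected components, counting isolated nodes."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return count

def components_after_removal(nodes, edges, removed):
    """Remove the given nodes and their edges, then count components."""
    removed = set(removed)
    kept_nodes = [n for n in nodes if n not in removed]
    kept_edges = [(u, v) for u, v in edges
                  if u not in removed and v not in removed]
    return count_components(kept_nodes, kept_edges)

print(components_after_removal(NODES, EDGES, []))          # intact network
print(components_after_removal(NODES, EDGES, ["C"]))       # low-ranked leaf
print(components_after_removal(NODES, EDGES, ["A"]))       # top-ranked node
print(components_after_removal(NODES, EDGES, ["A", "B"]))  # top two nodes
```

Removing a leaf such as C leaves the network in one piece, while removing the high-betweenness node A splits it into three components.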