A high-level overview of social network analysis using Gephi with your exported Facebook friends network. See more network analysis at http://allthingsgraphed.com.
1.
SOCIAL NETWORK
ANALYSIS
Caleb Jones
{
"email" : "calebjones@gmail.com",
"website" : "http://calebjones.info",
"twitter" : "@JonesWCaleb"
}
3.
Resources
SNA Coursera Course
(next being taught October 2013)
Linked (book) by Albert-László Barabási
4.
Network Analysis – Crash Course
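• Degree (n): The number of connections a node has. In a directed graph, in-degree counts incoming edges and out-degree counts outgoing edges.
• Node A has in-degree 3 and out-degree 1
• Node B has degree 4
[Figure: example graph containing nodes A and B]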
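A minimal Python sketch of counting in-degree and out-degree from a directed edge list. The edges are hypothetical (the slide's figure is not in the transcript), chosen only so that A has in-degree 3 and out-degree 1 and B has total degree 4, as the slide states.

```python
from collections import Counter

# Hypothetical directed edges (source, target), chosen so that A has
# in-degree 3 and out-degree 1, and B has total degree 4, as on the slide.
EDGES = [("B", "A"), ("C", "A"), ("D", "A"), ("A", "E"),
         ("B", "C"), ("B", "D"), ("E", "B")]


def degrees(edges):
    """Return (in_degree, out_degree) Counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


in_deg, out_deg = degrees(EDGES)
for node in sorted(set(in_deg) | set(out_deg)):
    # Total degree ignores direction: every edge touching the node counts once.
    print(node, in_deg[node], out_deg[node], in_deg[node] + out_deg[node])
```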
5.
Network Analysis – Crash Course
• Component (n): A maximally connected subgraph (undirected).
• The giant component is the largest component
[Figure: graph with nodes { A, B, C, X, Y, Z } split into components, the largest labeled (giant)]
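A sketch of finding components with breadth-first search and taking the giant component as the largest one. The edges are hypothetical, chosen only so that the slide's six nodes fall into three components; they are not the edges in the original figure.

```python
from collections import deque

# Hypothetical undirected graph on the slide's node set; the edges are
# illustrative only (the original figure is not in the transcript).
NODES = ["A", "B", "C", "X", "Y", "Z"]
EDGES = [("A", "B"), ("B", "C"), ("A", "C"), ("X", "Y")]


def adjacency(nodes, edges):
    """Undirected adjacency map {node: set of neighbours}, isolated nodes included."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def connected_components(adj):
    """Breadth-first search from each unvisited node; each search finds one component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(sorted(comp))
    return components


components = connected_components(adjacency(NODES, EDGES))
giant = max(components, key=len)  # the giant component is the largest one
print(components)  # [['A', 'B', 'C'], ['X', 'Y'], ['Z']]
print(giant)       # ['A', 'B', 'C']
```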
6.
Network Analysis – Crash Course
• Modularity (n): A measure of how well a graph divides into communities (modules/classes/cliques) that are densely interconnected internally, with relatively sparse interconnection between communities.
[Figure: graph with nodes { A, B, C, X, Y, Z } divided into Community 1 and Community 2]
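A sketch of scoring a given division into communities with Newman's modularity, Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where L_c is the number of edges inside c, d_c is the total degree of c's nodes, and m is the number of edges. The graph (two triangles joined by one bridge) and its partition are hypothetical. As far as we know, Gephi's Modularity statistic searches for a partition with high Q rather than scoring one you supply; this sketch only does the scoring.

```python
from collections import defaultdict

# Hypothetical graph: two triangles joined by a single bridge edge C-X.
EDGES = [("A", "B"), ("B", "C"), ("A", "C"),
         ("X", "Y"), ("Y", "Z"), ("X", "Z"),
         ("C", "X")]
# Hypothetical partition into the slide's two communities.
COMMUNITY = {"A": 1, "B": 1, "C": 1, "X": 2, "Y": 2, "Z": 2}


def modularity(edges, community):
    """Q = sum over communities c of L_c/m - (d_c / 2m)^2."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum = defaultdict(int)  # d_c: summed degree of nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


print(round(modularity(EDGES, COMMUNITY), 4))  # 0.3571
```

Putting every node in one community gives Q = 0, so a clearly positive score means the split captures more internal edges than a random arrangement would.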
7.
Network Analysis – Crash Course
• Ranking: A measure of a node’s “importance”
• Many different methods for determining “importance”
• Degree, Centrality, Closeness, Betweenness, Eigenvector, HITS, PageRank, Erdös Number
• Which one to consider depends on the question being asked
• Precursor to identifying network resilience, diffusion, and vulnerability
8.
Network Analysis – Crash Course
• Degree ranking: Quantity over quality
Node Score
A 3
B 3
C 1
D 1
X 1
Y 1
Z 3
Q 1
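A sketch of the degree ranking in Python. The edge list is our reconstruction of the example graph (the figure is not in the transcript): a tree with edges A-B, A-Z, A-Q, B-C, B-D, Z-X and Z-Y reproduces the degree, betweenness, closeness and Erdös tables on these slides.

```python
from collections import Counter

# Our reconstruction of the example graph behind the score tables
# (inferred from the tables; the original figure is not shown).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def degree(edges):
    """Undirected degree: every edge adds 1 to both of its endpoints."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


deg = degree(EDGES)
# Highest degree first, ties broken alphabetically.
for node, score in sorted(deg.items(), key=lambda kv: (-kv[1], kv[0])):
    print(node, score)
```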
9.
Network Analysis – Crash Course
• Betweenness Ranking: How often a node lies on the shortest paths between other pairs of nodes.
Node Score
A 15
B 11
C 0
D 0
X 0
Y 0
Z 11
Q 0
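A sketch of betweenness using Brandes' algorithm, a standard method for unweighted graphs; we do not claim it mirrors Gephi's internals. The edge list is our reconstruction of the example graph from the slide tables (assumption). Each unordered pair of other nodes is counted once, which matches the table: A separates {B, C, D}, {Z, X, Y} and {Q}, so 3x3 + 3x1 + 3x1 = 15 pairs route through it.

```python
from collections import deque

# Our reconstruction of the example graph behind the slide tables (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {n: 0.0 for n in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        order, preds = [], {n: [] for n in adj}
        sigma = {n: 0 for n in adj}
        dist = {n: -1 for n in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing each node's share of
        # shortest-path dependency to its predecessors.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair {s, t} was counted once from s and once from t.
    return {n: b / 2 for n, b in score.items()}


bc = betweenness(adjacency(EDGES))
for node in sorted(bc, key=lambda n: (-bc[n], n)):
    print(node, bc[node])
```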
10.
Network Analysis – Crash Course
• Closeness Ranking: Average number of hops from a node to the rest of the network.
Node Score
A 1.571
B 1.857
C 2.714
D 2.714
X 2.714
Y 2.714
Z 1.857
Q 2.429
Note: Smaller is (usually) better
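A sketch of this closeness measure: a breadth-first search from each node, then the mean hop count to every other node. It assumes a connected graph, and the edge list is our reconstruction of the slides' example graph (assumption). For A the hops are 1, 1, 1, 2, 2, 2, 2, so the score is 11/7, about 1.571.

```python
from collections import deque

# Our reconstruction of the example graph behind the slide tables (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def closeness(adj):
    """Average hop count from each node to all other nodes (connected graph assumed)."""
    result = {}
    for node in adj:
        dist = hop_distances(adj, node)
        # dist includes the node itself at distance 0, so divide by len - 1.
        result[node] = sum(dist.values()) / (len(dist) - 1)
    return result


cc = closeness(adjacency(EDGES))
for node, score in sorted(cc.items(), key=lambda kv: (kv[1], kv[0])):
    print(node, round(score, 3))
```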
11.
Network Analysis – Crash Course
• Eigenvector Ranking: A node’s “influence”
on the network (accounts for who you know)
Node Score
A 1
B 0.836
C 0.392
D 0.392
X 0.392
Y 0.392
Z 0.836
Q 0.465
Google’s PageRank is a variant of this
Based on the leading eigenvector of the adjacency matrix
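A sketch of eigenvector centrality by power iteration, scaled so the top node scores 1 as in the table. We iterate on A + I instead of A (same eigenvectors, eigenvalues shifted by 1) because plain iteration oscillates on bipartite graphs such as this tree. The edge list is our reconstruction of the example graph (assumption). This sketch gives B about 0.834, C about 0.390 and Q about 0.468, close to but not identical with the slide (0.836, 0.392, 0.465); we assume the gap comes from how Gephi iterates or normalizes, which we have not verified.

```python
# Our reconstruction of the example graph behind the slide tables (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration on (A + I), rescaled each step so the largest score is 1.

    Adding the identity keeps the same eigenvectors but stops the
    oscillation plain power iteration shows on bipartite graphs.
    """
    x = {n: 1.0 for n in adj}
    for _ in range(max_iter):
        # Each node's new score: its own score plus the sum of its neighbours'.
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        top = max(nxt.values())
        nxt = {n: v / top for n, v in nxt.items()}
        if max(abs(nxt[n] - x[n]) for n in adj) < tol:
            return nxt
        x = nxt
    return x


ev = eigenvector_centrality(adjacency(EDGES))
for node in sorted(ev, key=lambda n: (-ev[n], n)):
    print(node, round(ev[node], 3))
```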
12.
Network Analysis – Crash Course
• Erdös Ranking: Number of hops to a specific node (degrees of separation).
Node Score
A 0
B 1
C 2
D 2
X 2
Y 2
Z 1
Q 1
Note: Smaller is (usually) better
What if “Erdös” is an influential CEO?
What if “Erdös” has bird flu?
[Figure: example graph with node A marked “Erdös”]
13.
Network Analysis – Crash Course
• Erdös Ranking: Number of hops to a specific node (degrees of separation).
Node Score
A 2
B 1
C 2
D 0
X 4
Y 4
Z 3
Q 3
Note: Smaller is (usually) better
What if “Erdös” is an influential CEO?
What if “Erdös” has bird flu?
[Figure: example graph with node D marked “Erdös”]
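An Erdös-number ranking is a single breadth-first search from the chosen node. A sketch, using our reconstruction of the example graph (assumption), that reproduces both tables: one with A as “Erdös”, one with D.

```python
from collections import deque

# Our reconstruction of the example graph behind the slide tables (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def erdos_numbers(adj, source):
    """Hops from `source` to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


adj = adjacency(EDGES)
from_a = erdos_numbers(adj, "A")
from_d = erdos_numbers(adj, "D")
print(sorted(from_a.items()))
print(sorted(from_d.items()))
```

Nodes missing from the result are unreachable from the source, i.e. they sit in a different component.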
14.
Network Analysis – Crash Course
• Limitations:
• We only considered undirected networks (directed networks are more complicated)
• We treated all edges as equal. Many networks have a weight or cost associated with edges (e.g. distance)
• We treated all nodes as equal. A node’s importance may depend on attributes unrelated to its position in the network (e.g. dating sites)
15.
Network Analysis – Crash Course
• Resiliency (removing nodes/links):
• Target nodes based on their “importance”
• High-degree nodes are more likely to affect local communities
• High-betweenness/eigenvector nodes are more likely to fragment communities
16.
Gephi Introduction
• Platform for visualizing and analyzing networks
• https://gephi.org/
• Cross-platform
• Plugin model
17.
Facebook Dataset
• Download your data (gml)
• http://snacourse.com/getnet/
• Import into Gephi
• File -> Open -> Select the downloaded .gml file
• Choose “undirected” for “Graph Type”
20.
Degree Distribution
1. Statistics -> Average Degree -> Run
2. Partition -> Nodes (refresh) -> Modularity class -> Apply
Lots of nodes with few connections; only a few with a large number of connections.
Power law distribution?
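A sketch of the degree distribution behind this chart: count how many nodes have each degree. It runs on our reconstruction of the small example graph from the earlier slides (assumption), so it only shows the mechanics; on a Facebook export you would expect many low-degree nodes and a long tail. A roughly straight line on log-log axes is consistent with, though not proof of, a power law.

```python
from collections import Counter

# Our reconstruction of the small example graph from the earlier slides (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def degree_distribution(edges):
    """Map each degree k to the number of nodes with degree k."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())


dist = degree_distribution(EDGES)
for k in sorted(dist):
    print(k, dist[k])  # degree, number of nodes with that degree
```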
21.
Node Ranking by Degree
1. Ranking -> Nodes (refresh) -> Degree -> Apply
(try tweaking min/max size and Spline for desired emphasis)
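The ranking step scales each node's size between the minimum and maximum sizes you set. A sketch assuming a straight-line (linear) mapping, which as far as we know corresponds to the default spline; the sizes 10 and 50 are hypothetical values, and the graph is our reconstruction of the earlier example (assumption).

```python
from collections import Counter

# Our reconstruction of the small example graph (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def rank_sizes(scores, min_size=10.0, max_size=50.0):
    """Linearly map each score onto [min_size, max_size] (hypothetical defaults)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: avoid dividing by zero
        return {n: min_size for n in scores}
    return {n: min_size + (s - lo) / (hi - lo) * (max_size - min_size)
            for n, s in scores.items()}


degree = Counter()
for u, v in EDGES:
    degree[u] += 1
    degree[v] += 1

sizes = rank_sizes(degree)
print(sizes["A"], sizes["C"])  # 50.0 10.0
```

A curved spline would replace the linear fraction (s - lo) / (hi - lo) with a bent curve, exaggerating or compressing differences between high-degree nodes.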
22.
Filtering Isolated Nodes (“noise”)
1. Statistics -> Connected Components -> Run
2. Filters -> Attributes -> Partition Count -> Component ID
3. Drag “Component ID” down into the “Queries” section
4. Click on “Partition Count”, slide the settings bar, and click “Filter”; adjust to remove isolated nodes
This can be an important step when dealing with very large data sets. Depending on the degree distribution, the filter can be set quite high.
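As we understand it, the Partition Count filter keeps only the partitions (here, component IDs) whose node count falls within the chosen range. A sketch of the same idea with a minimum component size; the graph is our reconstruction of the earlier example (assumption) plus hypothetical “noise”: a disconnected pair R-S and an isolated node P.

```python
from collections import deque

# Reconstructed example graph (assumption) plus hypothetical noise:
# a disconnected pair R-S and an isolated node P with no edges.
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y"),
         ("R", "S")]
ISOLATED = ["P"]


def adjacency(edges, extra_nodes=()):
    """Undirected adjacency map; extra_nodes adds nodes that have no edges."""
    adj = {n: set() for n in extra_nodes}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(adj):
    """List of connected components (each a set of nodes), via breadth-first search."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        result.append(comp)
    return result


def filter_small_components(adj, min_size):
    """Keep only nodes whose component has at least min_size nodes."""
    keep = set()
    for comp in components(adj):
        if len(comp) >= min_size:
            keep |= comp
    return {n: adj[n] & keep for n in keep}


adj = adjacency(EDGES, ISOLATED)
filtered = filter_small_components(adj, min_size=3)
print(len(adj), "->", len(filtered))  # 11 -> 8
```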
23.
Re-adjust after Filtering
• Re-run the previous steps to refresh the calculated values now that filtering has been done.
• Statistics -> Average Degree, Modularity, Connected Components
• How did these numbers change?
• Re-partition node color by modularity class now that modularity has been recalculated
• Run the Fruchterman Reingold layout again to fill the space left over by the filtered nodes
25.
Node Ranking by Centrality
1. Statistics -> Network Diameter -> Run (this also computes betweenness and closeness centrality)
2. Ranking -> Betweenness Centrality -> Apply
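The Network Diameter statistic reports, among other things, the diameter (the longest shortest path) and the average path length. A sketch computing both with one breadth-first search per node, assuming a connected graph; the edge list is our reconstruction of the earlier example graph (assumption).

```python
from collections import deque

# Our reconstruction of the example graph from the earlier slides (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: number of hops from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def distance_summary(adj):
    """(diameter, average path length) over all ordered pairs of distinct nodes."""
    longest, total, pairs = 0, 0, 0
    for source in adj:
        for target, d in hop_distances(adj, source).items():
            if target != source:
                longest = max(longest, d)
                total += d
                pairs += 1
    return longest, total / pairs


diameter, avg_path = distance_summary(adjacency(EDGES))
print(diameter, round(avg_path, 3))  # 4 2.321
```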
26.
Erdös Number
• You may have noticed a key node that has both the highest degree and the highest betweenness ranking.
• Click on the “Edit” button and select that node
(note the name)
• Statistics -> Erdös Number -> Select that name -> OK
• What will happen if you select a less conspicuous node?
27.
Data Lab
• Go to “Data Laboratory”
• All node information as well as calculated statistics appear
here in a spreadsheet.
• Sort by “Erdös Number” (descending)
• What is the largest Erdös Number? N degrees of ________ .
• Try sorting by other values (degree, closeness, betweenness)
In this example, the max is 7 degrees of separation
28.
Node Ranking by Eigenvector Centrality
1. Statistics -> Eigenvector Centrality -> Run
2. Ranking -> Eigenvector Centrality -> Apply
29.
Node Ranking by PageRank
1. Statistics -> PageRank -> Run
2. Ranking -> PageRank -> Apply
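A sketch of PageRank by power iteration on an undirected graph (each edge counts as a link in both directions), using the common damping factor 0.85. It assumes every node has at least one edge (no dangling nodes), and the edge list is our reconstruction of the earlier example graph (assumption); it is not Gephi's implementation.

```python
# Our reconstruction of the example graph from the earlier slides (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pagerank(adj, damping=0.85, max_iter=200, tol=1e-12):
    """Power-iteration PageRank; assumes no node has degree 0."""
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(max_iter):
        # Each node keeps a (1 - damping)/n "teleport" share and receives
        # damping * rank[u] / degree(u) from every neighbour u.
        nxt = {v: (1 - damping) / n
                  + damping * sum(rank[u] / len(adj[u]) for u in adj[v])
               for v in adj}
        if sum(abs(nxt[v] - rank[v]) for v in adj) < tol:
            return nxt
        rank = nxt
    return rank


pr = pagerank(adjacency(EDGES))
for node in sorted(pr, key=lambda v: (-pr[v], v)):
    print(node, round(pr[node], 4))
```

In this sketch B and Z (about 0.207) edge out A (about 0.200) even though A tops betweenness: each of B's and Z's two leaf neighbours passes its entire score to them.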
30.
Export to Image
• Go to “Preview” mode
• Click “Refresh” to see what you have now
• Add node labels
• “Node Labels” -> “Show Labels”
• Adjust the font size to avoid overlapping labels
• If node labels still overlap, try expanding the layout
• Back to “Overview” -> Layout -> Fruchterman Reingold
• Increase the “Area” parameter and re-run the layout
• Then go back to “Preview” mode and click “Refresh”
• May need to re-adjust Node Label text size
• Experiment with “Curved” edges
33.
Network Resiliency
• How can we fragment the network or increase the
separation between nodes?
• Which nodes, if removed/influenced, would most greatly
impact the network?
• What information have we learned already that could be
used?
34.
Network Resiliency
• Go to “Data Laboratory” -> sort by “PageRank” (descending)
• Select the top 5 rows and delete them (save your project first!)
• Note their names: are these people influential in your life?
35.
Network Resiliency
• Go back to Statistics and note the following:
• Average Degree, Network Diameter, Modularity, Connected Components, Average Path Length
• Also note how the network has changed visually
• Re-run the statistics above and note how the numbers changed
• Did you successfully fragment the network (did the number of connected components increase)? (disrupting communications)
• How many nodes do you think you’d have to remove if you removed those with the lowest PageRank scores first? (robustness of network)
• What if links represented load distributed across the network? How would the network load change after removing these key nodes? (cascading failure)
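A sketch of the resiliency experiment: delete a set of nodes and count the connected components that remain. The edge list is our reconstruction of the earlier example graph (assumption). Removing the hub A splits the network into three pieces, while removing a leaf such as C leaves it in one piece.

```python
from collections import deque

# Our reconstruction of the example graph from the earlier slides (assumption).
EDGES = [("A", "B"), ("A", "Z"), ("A", "Q"),
         ("B", "C"), ("B", "D"), ("Z", "X"), ("Z", "Y")]


def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def remove_nodes(adj, removed):
    """Return a copy of the graph with the given nodes and their edges deleted."""
    gone = set(removed)
    return {n: nbrs - gone for n, nbrs in adj.items() if n not in gone}


def count_components(adj):
    """Number of connected components, found by breadth-first search."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return count


adj = adjacency(EDGES)
print(count_components(adj))                       # 1
print(count_components(remove_nodes(adj, ["A"])))  # 3
print(count_components(remove_nodes(adj, ["C"])))  # 1
```

Sorting nodes by any ranking (degree, betweenness, PageRank) and removing them in that order, recounting components after each removal, compares how quickly each ranking fragments the network.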