Network Analysis
Francisco Restivo
fjr@fe.up.pt
http://orcid.org/0000-0002-6173-082X
DIME 2014, FEUP, 5-12-2014
Agenda
• Networks are everywhere
• Social, biological, financial, etc
• Complex networks
• Influential nodes & communities
• Who’s who
• Software tools
• Some project ideas
Networks that changed the world
More…
Data, information and knowledge
http://www.knowledge-management-tools.net/knowledge-information-data.html
Retweet Network – #DE2012
raminetinati.wordpress.com
Protein Network
http://intbio.ncl.ac.uk/?people=dr-katherine-james
Network view of cross-border banking in 2007
www.fna.fi
Networks
• Vertices (nodes, actors)
– properties (attributes)
• Edges (relations)
– directed/undirected
– properties (attributes)
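• Density (fraction of possible edges that are present)
• Eccentricity (greatest distance from a node to any other node)
• Diameter (largest eccentricity in the network)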
[Figure: example network. Density: 9/21 = 0.43. Labels in the figure take the values 2 and 3 (eccentricities, with diameter 3).]
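The three measures above can be checked on a small graph. The sketch below uses a hypothetical 7-node, 9-edge city graph (the edge list is an assumption, chosen only to be consistent with the 9/21 density on this slide) and computes density, eccentricity, and diameter with breadth-first search.

```python
from collections import deque

# Hypothetical edge list (an assumption, not taken from the slides):
# 7 cities and 9 undirected edges, consistent with density 9/21.
EDGES = [
    ("Por", "Lis"), ("Por", "Mad"), ("Lis", "Mad"),
    ("Por", "Lon"), ("Mad", "Par"), ("Par", "Rom"),
    ("Par", "Ber"), ("Par", "Lon"), ("Lon", "Ber"),
]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def density(adj):
    """Edges present divided by the n(n-1)/2 edges that could exist."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return m / (n * (n - 1) / 2)


def eccentricities(adj):
    """Eccentricity of a node = its largest distance to any other node."""
    return {u: max(bfs_distances(adj, u).values()) for u in adj}


adj = build_adjacency(EDGES)
ecc = eccentricities(adj)
diameter = max(ecc.values())  # the diameter is the largest eccentricity

print("density:", round(density(adj), 2))
print("eccentricity:", ecc)
print("diameter:", diameter)
```

The code assumes a connected, undirected graph; on a disconnected graph the eccentricities would only cover each node's own component.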
Adjacency matrix
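V = {Po, Lx, Ma, Pa, Be}

$$A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 \\
1 & 1 & 1 & 1 & 0
\end{pmatrix}$$

Lower-triangular half of A (each undirected edge stored once):

$$\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 & 0
\end{pmatrix}$$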
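As a small illustration, the 5×5 matrix on this slide can be loaded and inspected in a few lines of Python. The helper names (`is_symmetric`, `lower_triangle`, `edge_list`) are made up for this sketch; the rows and vertex labels are the ones shown on the slide.

```python
# Vertex labels and matrix rows as they appear on the slide.
V = ["Po", "Lx", "Ma", "Pa", "Be"]
ROWS = ["01001", "10101", "01011", "00101", "11110"]
A = [[int(c) for c in row] for row in ROWS]


def is_symmetric(a):
    """An undirected graph has a symmetric adjacency matrix."""
    n = len(a)
    return all(a[i][j] == a[j][i] for i in range(n) for j in range(n))


def lower_triangle(a):
    """Keep only entries below the diagonal (one entry per undirected edge)."""
    n = len(a)
    return [[a[i][j] if j < i else 0 for j in range(n)] for i in range(n)]


def edge_list(a, labels):
    """Read the undirected edges off the lower triangle."""
    return [(labels[i], labels[j])
            for i in range(len(a)) for j in range(i) if a[i][j]]


# Degree of a vertex = sum of its row.
degrees = {V[i]: sum(A[i]) for i in range(len(V))}
edges = edge_list(A, V)

print("symmetric:", is_symmetric(A))
print("degrees:", degrees)
print("edges:", edges)
```

Storing only the lower triangle halves the memory for undirected graphs, at the cost of having to look up the pair (i, j) with the larger index first.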
Influential nodes
• Degree (in, out)
• Clustering coefficient
• Centrality (degree, closeness, betweenness)
• PageRank
• etc
http://drunksandlampposts.files.wordpress.com/2012/06/philprettyv4.png?
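Of the measures listed, PageRank is the least obvious to compute by hand, so here is a minimal power-iteration sketch. The edge list is a hypothetical 7-city graph (an assumption, not given on the slides), and the damping factor 0.85 and the 100 iterations are conventional choices rather than values stated in the talk.

```python
# Hypothetical undirected edge list (an assumption for illustration).
EDGES = [
    ("Por", "Lis"), ("Por", "Mad"), ("Lis", "Mad"),
    ("Por", "Lon"), ("Mad", "Par"), ("Par", "Rom"),
    ("Par", "Ber"), ("Par", "Lon"), ("Lon", "Ber"),
]


def build_adjacency(edges):
    """Return {node: set of neighbours} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration; assumes every node has at least one neighbour."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iterations):
        new_rank = {}
        for u in adj:
            # Each neighbour v passes an equal share of its rank to u.
            incoming = sum(rank[v] / len(adj[v]) for v in adj[u])
            new_rank[u] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


adj = build_adjacency(EDGES)
rank = pagerank(adj)

# Some tools report PageRank scaled so that the values sum to the node count.
scaled = {u: r * len(adj) for u, r in rank.items()}
for u in sorted(scaled, key=scaled.get, reverse=True):
    print(u, round(scaled[u], 3))
```

Scaled by the number of nodes, the ranks for this assumed graph come out close to the PageRank column of the node-metrics table, for example about 1.54 for Par and about 0.48 for Rom.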
Understanding node metrics
Label  Degree  Betweenness  Closeness   PageRank  Clustering
               centrality   centrality            coefficient
Por    3       2.000        0.100       1.129     0.333
Lis    2       0.000        0.083       0.791     1.000
Mad    3       3.500        0.111       1.132     0.333
Par    4       7.000        0.125       1.537     0.167
Ber    2       0.000        0.091       0.798     1.000
Rom    1       0.000        0.077       0.477     0.000
Lon    3       2.500        0.111       1.136     0.333

Shortest-path shares that add up to the betweenness of Mad (3.500):
1 – Lis – Par
1 – Lis – Rom
0.5 – Lis – Ber
0.5 – Por – Par
0.5 – Por – Rom
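The table can be reproduced with a short script. The slides do not list the edges, so the edge list below is an assumption: a 7-city graph reconstructed to be consistent with the degrees in the table. Closeness is taken as 1 / (sum of distances), which matches the magnitudes in the table, and betweenness uses Brandes' algorithm for unweighted graphs.

```python
from collections import deque
from itertools import combinations

# Hypothetical edge list (an assumption, inferred from the table's degrees).
EDGES = [
    ("Por", "Lis"), ("Por", "Mad"), ("Lis", "Mad"),
    ("Por", "Lon"), ("Mad", "Par"), ("Par", "Rom"),
    ("Par", "Ber"), ("Par", "Lon"), ("Lon", "Ber"),
]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj, source):
    """1 / (sum of hop distances from source to all other nodes)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return 1.0 / sum(dist.values())


def clustering(adj, u):
    """Fraction of pairs of u's neighbours that are themselves linked."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def betweenness(adj):
    """Brandes' algorithm (unweighted, undirected, not normalised)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                     # accumulate from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each pair was counted twice


adj = build_adjacency(EDGES)
bc = betweenness(adj)
for u in ["Por", "Lis", "Mad", "Par", "Ber", "Rom", "Lon"]:
    print(u, len(adj[u]), round(bc[u], 3),
          round(closeness(adj, u), 3), round(clustering(adj, u), 3))
```

For the assumed edge list, degree, betweenness, closeness, and clustering coefficient agree with the table to three decimals.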
Understanding networks
• Formation
• Modularity
• Communities
• Network dynamics
• etc
http://www.freerepublic.com/focus/news/1327834/posts
Degree distribution
[Figure: degree distribution histogram; horizontal axis: degree (0 to 10), vertical axis: fraction of nodes (0.00 to 0.20)]
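A degree distribution is simply the fraction of nodes that have each degree. As a minimal sketch, the function below computes it from a list of degrees; the sample values are the seven degrees from the node-metrics table, not the data behind the histogram on this slide.

```python
from collections import Counter


def degree_distribution(degrees):
    """Map each degree k to the fraction of nodes with that degree."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: counts[k] / n for k in sorted(counts)}


# Degrees of Por, Lis, Mad, Par, Ber, Rom, Lon from the node-metrics table.
degrees = [3, 2, 3, 4, 2, 1, 3]
dist = degree_distribution(degrees)

for k, p in dist.items():
    print(k, round(p, 3), "#" * round(p * 40))  # crude text histogram
```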
Community Detection
• Communities and clusters are different
• Network data is related to graph properties
• Real world means big data
Modularity
• Compares the number of edges inside communities with the number
expected in a random network with the same node degrees
• Maximizing Q is NP-hard
$$Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - P_{ij} \right) \delta(g_i, g_j), \qquad P_{ij} = \frac{k_i k_j}{2m}$$
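To make the definition concrete: Q sums, over all pairs of nodes in the same community, the difference between the actual adjacency A_ij and the expected value P_ij = k_i k_j / 2m, then divides by 2m (m is the number of edges, k_i the degree of node i, g_i its community). The sketch below evaluates this directly. The edge list and the two-community split are assumptions made up for illustration.

```python
# Hypothetical undirected edge list (an assumption for illustration).
EDGES = [
    ("Por", "Lis"), ("Por", "Mad"), ("Lis", "Mad"),
    ("Por", "Lon"), ("Mad", "Par"), ("Par", "Rom"),
    ("Par", "Ber"), ("Par", "Lon"), ("Lon", "Ber"),
]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def modularity(edges, community):
    """Q = (1/2m) * sum_ij (A_ij - k_i k_j / 2m) * delta(g_i, g_j).

    Assumes a simple undirected graph: no self-loops, no repeated edges.
    """
    adj = build_adjacency(edges)
    m = len(edges)
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue  # delta(g_i, g_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            p_ij = len(adj[i]) * len(adj[j]) / (2 * m)
            q += a_ij - p_ij
    return q / (2 * m)


# Hypothetical partition: an Iberian group and a northern/eastern group.
split = {"Por": 0, "Lis": 0, "Mad": 0, "Par": 1, "Ber": 1, "Rom": 1, "Lon": 1}
one_block = {u: 0 for u in split}

print("two communities:", round(modularity(EDGES, split), 4))
print("one community:  ", round(modularity(EDGES, one_block), 4))
```

Putting every node in a single community always gives Q = 0, which is why Q is read as "how much better than random" a partition is.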
Clauset-Newman-Moore
A hierarchical agglomeration algorithm for detecting community
structure which is faster than many competing algorithms.
Its running time on a network with n vertices and m edges is
O(md log n) where d is the depth of the dendrogram describing the
community structure.
NodeXL
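The idea of hierarchical agglomeration can be sketched naively: start with every node in its own community and repeatedly merge the pair of connected communities that raises modularity Q the most, stopping when no merge helps. This is not the Clauset-Newman-Moore implementation, which relies on more efficient data structures to reach the stated running time; the version below recomputes Q from scratch and is only meant for tiny graphs. The edge list is a hypothetical example.

```python
# Hypothetical undirected edge list (an assumption for illustration).
EDGES = [
    ("Por", "Lis"), ("Por", "Mad"), ("Lis", "Mad"),
    ("Por", "Lon"), ("Mad", "Par"), ("Par", "Rom"),
    ("Par", "Ber"), ("Par", "Lon"), ("Lon", "Ber"),
]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def modularity(adj, m, label):
    """Q = sum over communities of (internal edges / m) - (degree sum / 2m)^2."""
    internal, degree_sum = {}, {}
    for u in adj:
        c = label[u]
        degree_sum[c] = degree_sum.get(c, 0) + len(adj[u])
        # Counts every internal edge twice, once from each endpoint.
        internal[c] = internal.get(c, 0) + sum(1 for v in adj[u] if label[v] == c)
    return sum(internal[c] / (2 * m) - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


def greedy_agglomeration(edges):
    """Merge communities greedily while some merge still increases Q."""
    adj = build_adjacency(edges)
    m = len(edges)
    label = {u: u for u in adj}          # every node starts alone
    best_q = modularity(adj, m, label)
    while True:
        # Only communities joined by at least one edge are merge candidates.
        candidates = {tuple(sorted((label[u], label[v])))
                      for u, v in edges if label[u] != label[v]}
        best_merge = None
        for a, b in sorted(candidates):
            trial = {u: (a if c == b else c) for u, c in label.items()}
            q = modularity(adj, m, trial)
            if q > best_q + 1e-12:       # strict improvement only
                best_q, best_merge = q, trial
        if best_merge is None:
            return label, best_q
        label = best_merge


label, q = greedy_agglomeration(EDGES)
communities = {}
for node, c in label.items():
    communities.setdefault(c, set()).add(node)

print(sorted(sorted(c) for c in communities.values()))
print("Q =", round(q, 4))
```

The sequence of merges is what forms the dendrogram mentioned above; this sketch only keeps the level with the highest Q that the greedy path reaches.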
Wakita-Tsurumi
The CNM algorithm does not scale well, and its use is practically limited to
networks of up to 500,000 nodes.
A simple heuristic that attempts to merge community structures in a
balanced manner can dramatically improve community structure
analysis.
NodeXL
Girvan-Newman
Many networks show community structure: network nodes are joined
together in tightly knit groups, between which there are only looser
connections.
Girvan and Newman propose a method for detecting such communities, built
around the idea of using centrality indices to find community boundaries.
NodeXL
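One common way to turn "centrality indices find community boundaries" into code is to compute edge betweenness, delete the edge with the highest value, and repeat until the network falls apart. The sketch below stops at the first split. The edge list is a hypothetical example, the function names are made up for this sketch, and the graph is assumed to be connected with at least one edge.

```python
from collections import deque

# Hypothetical undirected edge list (an assumption for illustration).
EDGES = [
    ("Por", "Lis"), ("Por", "Mad"), ("Lis", "Mad"),
    ("Por", "Lon"), ("Mad", "Par"), ("Par", "Rom"),
    ("Par", "Ber"), ("Par", "Lon"), ("Lon", "Ber"),
]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    queue.append(v)
        seen |= comp
        comps.append(comp)
    return comps


def edge_betweenness(adj):
    """Shortest-path betweenness of every edge (Brandes-style accumulation)."""
    eb = {}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = tuple(sorted((v, w)))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    return {e: b / 2 for e, b in eb.items()}  # each pair was counted twice


def first_split(edges):
    """Remove highest-betweenness edges until the graph disconnects."""
    adj = build_adjacency(edges)
    removed = []
    while len(components(adj)) == 1:
        eb = edge_betweenness(adj)       # recomputed after every removal
        u, v = max(sorted(eb), key=eb.get)
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return components(adj), removed


groups, removed = first_split(EDGES)
print("removed edges:", removed)
print("communities:", sorted(sorted(g) for g in groups))
```

Recomputing betweenness after each removal is what makes the method slow on large networks, which is one reason faster alternatives are used in practice.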
Chinese Whispers [Biemann]
• A randomized graph-clustering algorithm that runs in time linear in the
number of edges.
It can be viewed as a simulation of an agent-based social network.
Gephi plugin
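The agent-based view can be sketched in a few lines: every node starts with its own label, and in each round the nodes, visited in random order, adopt the label that is most frequent among their neighbours, with ties broken at random. This is a minimal unweighted sketch, not the Gephi plugin's code; the edge list, the iteration count, and the seed are assumptions.

```python
import random
from collections import Counter

# Hypothetical undirected edge list (an assumption for illustration).
EDGES = [
    ("Por", "Lis"), ("Por", "Mad"), ("Lis", "Mad"),
    ("Por", "Lon"), ("Mad", "Par"), ("Par", "Rom"),
    ("Par", "Ber"), ("Par", "Lon"), ("Lon", "Ber"),
]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def chinese_whispers(adj, iterations=20, seed=0):
    """Label propagation in random node order; returns {node: label}."""
    rng = random.Random(seed)            # fixed seed for repeatable runs
    label = {u: u for u in adj}          # each node starts in its own class
    nodes = sorted(adj)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for u in nodes:
            if not adj[u]:
                continue                 # isolated nodes keep their label
            counts = Counter(label[v] for v in adj[u])
            top = max(counts.values())
            best = sorted(c for c, k in counts.items() if k == top)
            label[u] = rng.choice(best)  # random tie-break
    return label


adj = build_adjacency(EDGES)
label = chinese_whispers(adj)

clusters = {}
for node, c in label.items():
    clusters.setdefault(c, set()).add(node)
print(sorted(sorted(c) for c in clusters.values()))
```

Each round touches every edge a constant number of times, which is where the linear running time per iteration comes from. Because the procedure is randomized, different seeds can give different clusterings of the same graph.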
Figure 5. Map of science derived from clickstream data.
Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, et al. (2009) Clickstream Data Yields High-Resolution Maps of Science. PLoS ONE 4(3):
e4803. doi:10.1371/journal.pone.0004803
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0004803
How to generate journal insights
using visualization techniques
Software Tools
• NodeXL
• Gephi
• D3.js (JavaScript)
• NetworkX (Python)
• NetLogo
• etc
http://gephi.github.io/features/
Datasets
• netvizz
• I keep my collection here
https://sites.google.com/site/frestivo/networked-life/databases
• There is another on Quora:
"Where can I find large datasets open to the public?"
netvizz
netvizz > gephi > NodeXL
[Figure: degree histogram; horizontal axis: Degree, vertical axis: Frequency (0 to 250)]
Minimum Degree 0
Maximum Degree 237
Average Degree 20.012
Median Degree 14.000
Digital footprint…
Project approach
• Big data set
• Consider whether the communities make sense
• Compare different approaches
• Explain your findings
Thank you!
