This document analyzes and summarizes a network graph of 105 vertices and 441 edges representing the purchase of political books in the USA. Various centrality metrics are calculated and visualized including degree, betweenness, closeness, and eccentricity centrality. Community detection algorithms are applied including finding graph partitions, communities, and k-cliques. The document concludes by comparing Mathematica and Gephi for network analysis and visualization and discussing lessons learned.
2. Objective: To analyse a graphical representation of a network
Coral reef food web, Cuba
233 vertices
3,753 edges
Data: http://datadryad.org/resource/doi:10.5061/dryad.c213h
3. Objective: To analyse a graphical representation of a network
CMT111 Students, Cardiff
26 vertices
30 edges
4. Objective: To analyse a graphical representation of a network
Purchase of political books, USA
105 vertices
441 edges
Data courtesy of Valdis Krebs, available at: http://www-personal.umich.edu/~mejn/netdata/
5. Limitations
• Limited data: number of purchases, date of purchase and buyer data (e.g. how they voted) are missing from the database
• No weights: the number of buyers who purchased both books is not recorded
• Mathematica’s definitions (hubs, k-cliques)
6. Mathematica
• Importing data
• Will not read as wide a variety of CSV formats as Gephi
• Can read .gml, .gv, .dot, .graphml, .gxl, .col, .g6, .s6, .gw, .net, .tgf
• Use Map or create a rule to show links (->) from one column of a CSV onto
another
• Other attributes
• More difficult to show vertex/edge attributes than in Gephi, but still possible to
highlight using HighlightGraph[g, x]
• Built-in functions
• Very intuitive and well documented:
https://reference.wolfram.com/language/guide/GraphsAndNetworks.html
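The import routes listed above might look like the following. This is a minimal sketch: the file names polbooks.gml and edges.csv and the sample rows are assumptions for illustration, not taken from the slides.

```mathematica
(* Assumption: a local GML copy of the data; the file name is hypothetical *)
(* books = Import["polbooks.gml"]; *)

(* Hypothetical rows standing in for two columns read from a CSV file, *)
(* e.g. rows = Import["edges.csv"] *)
rows = {{"A", "B"}, {"B", "C"}, {"C", "A"}, {"C", "D"}};

(* Map each row onto an undirected link between its two columns *)
links = Map[UndirectedEdge @@ # &, rows];
g = Graph[links, VertexLabels -> "Name"];

(* Highlight a chosen vertex and a chosen edge *)
HighlightGraph[g, {"A", UndirectedEdge["C", "D"]}]
```

Using Rule (->) in place of UndirectedEdge would give directed links, so the undirected form is used here to match the books network.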
7. Node: book
Link: frequent purchase of both books (endpoints), from Amazon's ‘people who bought this also bought’
8. This graph is
unweighted: edges do not have associated weights
undirected: all edges travel in both directions
loop-free: no vertex is linked directly to itself
simple: undirected, unweighted, loop-free and lacks multiple edges
incomplete: each vertex is not connected to every other vertex
cyclic: contains at least one cycle
not bipartite: vertices cannot be divided into two disjoint sets such that every edge joins a vertex in one set to a vertex in the other
UndirectedGraphQ[books]
WeightedGraphQ[books]
CompleteGraphQ[books]
SimpleGraphQ[books]
BipartiteGraphQ[books]
LoopFreeGraphQ[books]
AcyclicGraphQ[books]
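The checks above can also be run in one pass. A minimal sketch, assuming a toy 5-cycle in place of books (the toy graph and the name checks are illustrative only); a 5-cycle happens to share the properties listed on this slide, so it makes a convenient stand-in.

```mathematica
(* Toy stand-in for books (an assumption): a cycle on 5 vertices *)
g = CycleGraph[5];

(* The same predicates as on the slide *)
checks = {UndirectedGraphQ, WeightedGraphQ, CompleteGraphQ, SimpleGraphQ,
   BipartiteGraphQ, LoopFreeGraphQ, AcyclicGraphQ};

(* Apply every predicate to the graph and tabulate name against result *)
TableForm[Transpose[{checks, Map[#[g] &, checks]}]]

(* Size of the graph, for comparison with the counts on the title slide *)
{VertexCount[g], EdgeCount[g]}
```

Replacing g with books gives the same table for the real data.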
22. Graph communities:
groups of nodes with many edges joining nodes within the same community
and relatively few edges joining nodes in other communities
HighlightGraph[books, FindGraphCommunities[books]]
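FindGraphCommunities returns a list of vertex lists, one per community, so the result can be inspected before plotting. A minimal sketch on a toy graph (an assumption, not the books data): two 4-vertex complete graphs joined by a single edge, where one would expect each complete graph to form its own community.

```mathematica
(* Toy graph (illustrative only): vertices 1-4 and 5-8 each form a complete graph *)
k4a = EdgeList[CompleteGraph[4]];
k4b = EdgeList[IndexGraph[CompleteGraph[4], 5]];
g = Graph[Join[k4a, k4b, {UndirectedEdge[4, 5]}]];

(* One list of vertices per community *)
communities = FindGraphCommunities[g];

(* Number of communities and the size of each *)
Length[communities]
Length /@ communities

(* Plot with vertices grouped by community *)
CommunityGraphPlot[g, communities]
```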
23. Centrality - showing bifurcation
partitioning based on edge centrality
CommunityGraphPlot[books, FindGraphCommunities[books, Method -> "Centrality"]]
24. Clique percolation - “Uncovering the overlapping community structure of complex networks in nature and society” [1]
Finds largest k-cliques first, then reduces k
CommunityGraphPlot[books, FindGraphCommunities[books, Method -> "CliquePercolation"]]
[1] Palla et al., Nature 435, 814-818 (2005)
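The methods on the last few slides can be compared side by side by counting the community sizes each one returns. A minimal sketch on a toy graph (an assumption, not the books data); "Modularity" is included as a further built-in method name and does not come from the slides.

```mathematica
(* Toy graph (illustrative only): two 4-vertex complete graphs joined by one edge *)
g = Graph[Join[
    EdgeList[CompleteGraph[4]],
    EdgeList[IndexGraph[CompleteGraph[4], 5]],
    {UndirectedEdge[4, 5]}]];

(* Method names to compare *)
methods = {"Modularity", "Centrality", "CliquePercolation"};

(* For each method, list the sizes of the communities it finds *)
AssociationMap[Length /@ FindGraphCommunities[g, Method -> #] &, methods]
```

On the books graph the methods need not agree, which is why the plots on these slides differ.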
25. Cliques
Largest set of vertices that are all directly connected to each other
HighlightGraph[books, Subgraph[books, FindClique[books]]]
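One way to confirm that a returned set is a clique is to test whether the subgraph on those vertices is complete. A minimal sketch, assuming a small toy graph (a triangle with a two-edge tail) rather than the books data.

```mathematica
(* Toy graph (illustrative only): triangle 1-2-3 plus a tail 3-4-5 *)
g = Graph[UndirectedEdge @@@ {{1, 2}, {1, 3}, {2, 3}, {3, 4}, {4, 5}}];

(* FindClique returns a list of cliques; take the first one *)
clique = First[FindClique[g]];

(* A clique induces a complete subgraph *)
CompleteGraphQ[Subgraph[g, clique]]

(* Highlight it as on the slide *)
HighlightGraph[g, Subgraph[g, clique]]
```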
26. Cliques
Largest set of connected vertices within 2 edges of each other
HighlightGraph[books, Subgraph[books, FindKClique[books, 2]]]
27. Cliques
Largest set of connected vertices within 3 edges of each other
HighlightGraph[books, Subgraph[books, FindKClique[books, 3]]]
28. Cliques
Largest set of connected vertices within 4 edges of each other
HighlightGraph[books, Subgraph[books, FindKClique[books, 4]]]
29. Cliques
Largest set of connected vertices within 5 edges of each other
HighlightGraph[books, Subgraph[books, FindKClique[books, 5]]]
30. Cliques
Largest set of connected vertices within 6 edges of each other
HighlightGraph[books, Subgraph[books, FindKClique[books, 6]]]
31. Cliques
Largest set of connected vertices within 7 (=diameter)
edges of each other
HighlightGraph[books, Subgraph[books, FindKClique[books, 7]]]
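The sequence of slides above, k = 2 up to the diameter, can be generated in a loop. A minimal sketch on a 6-vertex path graph (an assumption standing in for books; the helper names kclique and maxDistance are hypothetical). It checks that every pair of vertices in the k-clique is within k edges of each other, and that at k equal to the diameter of a connected graph the k-clique covers every vertex.

```mathematica
(* Toy graph (illustrative only): a path on 6 vertices *)
g = PathGraph[Range[6]];
d = GraphDiameter[g];

(* Hypothetical helper: first k-clique returned for a given k *)
kclique[k_] := First[FindKClique[g, k]];

(* Hypothetical helper: largest pairwise distance within a vertex set *)
maxDistance[vs_] := Max[Outer[GraphDistance[g, #1, #2] &, vs, vs]];

(* For each k: the size of the k-clique and whether all pairs are within k edges *)
Table[{k, Length[kclique[k]], maxDistance[kclique[k]] <= k}, {k, 1, d}]

(* At k = diameter the k-clique is the whole vertex set *)
Sort[kclique[d]] === Sort[VertexList[g]]
```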
32. Lessons & Conclusions
• Mathematica Vs Gephi on Data Visualization
• Gephi struggles with larger datasets, crashes on OS X, cannot ‘undo’
• Gephi good at pulling apart larger datasets for easier visualisation, takes a wider range of
input formats, can visualise ‘multiple graphs’ more easily
• All the other functions within Mathematica at your disposal to aid network analysis e.g. Plot
• Source of Data Sets
• Working with a dataset of sufficient size but not so big that it cannot be comprehended.
• Analysis of sub-networks