Maps of sparse memory networks reveal overlapping communities in network flows
1. Maps of sparse memory networks reveal
overlapping communities in network flows
Martin Rosvall
Christian Persson, Ludvig Bohlin, and Daniel Edler
Integrated Science Lab, Umeå University, Sweden
Hello. I will talk about how we make maps of flows through complex systems,
and three challenges that have kept us excited working on this.
2. I arrived here thanks to good maps, and I share with several of you the vision to create similarly powerful
maps of complex systems. Imagine Google Maps for networks. Let’s start with the first challenge.
Barabasi, A.L.
Maps of complex systems
3. This is network science until a few years ago: Take a complex system, abstract away all but the undirected,
unweighted links, and identify modules with your favorite community-detection algorithm.
[Slide diagram, conventional approach: complex system, raw data, conventional network representation (undirected, unweighted network), conventional community detection (two-level and non-overlapping modular representation). Challenge 1: Detectability limit.]
4. For some time, we forgot the underlying complex system, and tried to crank out as much as possible from the network.
Sure enough, there is a limit to how much structure we can detect.
[Slide diagram: the conventional approach from the previous slide, now marked "Limited detectability". Challenge 1: Detectability limit.]
5. But this is not the limit of how much we can say about the complex system. If we are interested in flows,
we can use higher-order flow modeling, and represent the flows with memory or multilayer networks.
[Slide diagram: the conventional approach (limited detectability) alongside a higher-order framework, in which higher-order flow modeling turns the raw data into a multilayer memory network. Challenge 1: Detectability limit.]
6. And with these representations, there is sufficient information about the flows through the complex system to reveal
multilevel, overlapping modules with a generalized community detection algorithm.
[Slide diagram: the conventional approach (limited detectability) alongside the higher-order framework, in which higher-order flow mapping turns the multilayer memory network into a multilevel and overlapping modular representation. Challenge 1: Detectability limit.]
7. Higher-order representations break the limit of what we can detect. This takes us to the second challenge:
It looks like we need one algorithm for each representation. And there are many representations.
[Slide diagram: the conventional approach (limited detectability) alongside the higher-order framework, now marked "Unlimited detectability". Challenge 1: Detectability limit. Solution: Higher-order network representations.]
8. Imagine that you are node i in this example, that the solid pathway to the left comes from a Facebook conversation
with your friends, and the dashed pathway to the right from an email conversation with your colleagues.
[Slide: Higher-order network flows. Challenge 2: Many representations.]
9. Memory and multilayer networks model these pathways in different ways. Look at the white links that correspond
to a message that you got from your friend j and forwarded to your friend k.
[Slide: Higher-order network flows, shown as a memory network and as a multilayer network. Challenge 2: Many representations.]
10. To capture the higher-order flows, memory networks use state nodes to capture where flows come from,
and multilayer networks use layers to capture the different data sources.
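As a rough illustration of the state-node idea (a sketch, not code from the talk), the snippet below builds second-order state nodes from a handful of pathways through node i. The pathways, the colleague names "a" and "b", and the helper names are hypothetical. Each state node is a pair (previous node, current node), so it records where the flow came from.

```python
from collections import defaultdict

# Hypothetical pathways through node "i" (illustration only).
paths = [
    ["j", "i", "k"],  # message from friend j forwarded to friend k
    ["k", "i", "j"],
    ["a", "i", "b"],  # email passed on between colleagues a and b
    ["b", "i", "a"],
]

def second_order_links(paths):
    """Count links between state nodes of the form (previous, current)."""
    links = defaultdict(float)
    for path in paths:
        for prev, cur, nxt in zip(path, path[1:], path[2:]):
            links[((prev, cur), (cur, nxt))] += 1.0
    return dict(links)

def states_at(links, node):
    """State nodes whose current physical node is `node`."""
    return {state for link in links for state in link if state[1] == node}

links = second_order_links(paths)
states_at_i = states_at(links, "i")

# A first-order network would have a single node i and mix all four
# neighbors; here flow arriving from j can only continue to k.
for (source, target), weight in sorted(links.items()):
    print(source, "->", target, weight)
```

In this toy data, node i is represented by four state nodes, one per incoming neighbor, which is what lets the two conversations stay separate.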
11. We can lump redundant state nodes in the memory network such that they correspond to nodes in layers. In this way,
both networks can be represented with a sparse memory network in which state nodes are free to represent anything.
[Slide: Higher-order network flows, shown as a memory network, a multilayer network, and a sparse memory network. Challenge 2: Many representations.]
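The lumping of redundant state nodes can be sketched as follows. This is a minimal illustration under simplifying assumptions: the transition probabilities are made up, targets are written as physical nodes rather than state nodes, and "redundant" is taken to mean identical outgoing distributions within the same physical node. The function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical outgoing transition probabilities of the state nodes in
# physical node "i". A state node is (previous node, current node).
transitions = {
    ("j", "i"): {"j": 0.5, "k": 0.5},  # friends
    ("k", "i"): {"j": 0.5, "k": 0.5},
    ("a", "i"): {"a": 0.5, "b": 0.5},  # colleagues
    ("b", "i"): {"a": 0.5, "b": 0.5},
}

def lump_redundant(transitions, ndigits=9):
    """Group state nodes of the same physical node that share the same
    outgoing distribution (after rounding to `ndigits` decimals)."""
    groups = defaultdict(list)
    for state, dist in transitions.items():
        physical = state[1]
        signature = tuple(sorted((t, round(p, ndigits)) for t, p in dist.items()))
        groups[(physical, signature)].append(state)
    return [sorted(members) for members in groups.values()]

lumped = lump_redundant(transitions)
for members in lumped:
    print(members)
```

With these toy numbers the four state nodes collapse into two, one for the friends and one for the colleagues, which mirrors how lumped state nodes can play the role of nodes in layers.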
12. Sparse memory networks solve the many-representations problem. As a result, we can cluster different networks
with a single algorithm. This takes us to the third challenge.
[Slide: Higher-order network flows, shown as a memory network, a multilayer network, and a sparse memory network. Challenge 2: Many representations. Solution: Sparse memory networks.]
13. For real systems, where state nodes are not completely redundant, which sparse memory network should we use?
Here is an example with citation flows from journals in two fields through multidisciplinary PNAS.
[Slide figure: citation flows from the microbiology journals (Mol Microbiol, J Bacteriol) and the plant science journals (Plant Cell, Plant Physiol) through PNAS, with modules Microbiology and Plant science. Information loss: 0.0 bits. Two-module compression: 0.43 bits. Challenge 3: Scale and model selection.]
14. We identify two fields overlapping in PNAS, because the flows stay within those fields. Lumping together the state nodes
for the microbiology journals destroys very little information. Still two fields.
[Slide figure: the citation network before and after lumping the PNAS state nodes for the microbiology journals. Information loss: 0.0 bits before, 0.003 bits after. Two-module compression: 0.43 bits in both cases. Challenge 3: Scale and model selection.]
15. In the same way, lumping together the state nodes for the plant science journals barely affects the flows. Still two fields.
Not so if we continue and lump together the remaining state nodes.
[Slide figure: the citation network after each lumping step. Information loss: 0.0 bits with no lumping, 0.003 bits with the microbiology state nodes lumped, 0.006 bits with the plant science state nodes lumped as well. Two-module compression: 0.43 bits in all three cases. Challenge 3: Scale and model selection.]
16. Only one field: we are underfitting. To balance under- and overfitting, we lump state nodes based on
minimal information loss and perform cross-validation on the clustering.
[Slide figure: the citation network at four levels of state lumping.]
Lumped PNAS state nodes          Information loss (bits)   Two-module compression (bits)
None                             0.0                       0.43
Microbiology                     0.003                     0.43
Microbiology and plant science   0.006                     0.43
All                              0.890                     -0.64
[Challenge 3: Scale and model selection. Solution: State lumping and cross-validation.]
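The slides do not spell out how the information loss is computed. The sketch below assumes it is the increase in the flow-weighted entropy of the outgoing transition distributions when two state nodes are lumped, and it greedily lumps the pair with the smallest loss. The flows and probabilities are hypothetical, and the cross-validation step is not shown.

```python
import math
from itertools import combinations

def entropy(dist):
    """Shannon entropy in bits of a transition distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def merge(a, b):
    """Lump two state nodes, each given as (flow, transition distribution)."""
    (flow_a, dist_a), (flow_b, dist_b) = a, b
    flow = flow_a + flow_b
    dist = {t: (flow_a * dist_a.get(t, 0.0) + flow_b * dist_b.get(t, 0.0)) / flow
            for t in set(dist_a) | set(dist_b)}
    return flow, dist

def loss(a, b):
    """Assumed information loss: entropy increase caused by lumping a and b."""
    flow, dist = merge(a, b)
    return flow * entropy(dist) - a[0] * entropy(a[1]) - b[0] * entropy(b[1])

def greedy_lump(states, n_target):
    """Repeatedly lump the pair with minimal loss until n_target states remain."""
    states = dict(states)
    history = []
    while len(states) > n_target:
        x, y = min(combinations(sorted(states), 2),
                   key=lambda pair: loss(states[pair[0]], states[pair[1]]))
        history.append((x, y, loss(states[x], states[y])))
        merged = merge(states.pop(x), states.pop(y))
        states[x + " + " + y] = merged
    return states, history

# Hypothetical PNAS state nodes, keyed by the journal the flow came from.
pnas_states = {
    "Mol Microbiol": (0.25, {"Mol Microbiol": 0.5, "J Bacteriol": 0.5}),
    "J Bacteriol": (0.25, {"Mol Microbiol": 0.45, "J Bacteriol": 0.55}),
    "Plant Cell": (0.25, {"Plant Cell": 0.5, "Plant Physiol": 0.5}),
    "Plant Physiol": (0.25, {"Plant Cell": 0.55, "Plant Physiol": 0.45}),
}

lumped, history = greedy_lump(pnas_states, 1)
for x, y, bits in history:
    print(f"lump {x!r} with {y!r}: loss {bits:.4f} bits")
```

With these toy numbers the two within-field lumpings cost almost nothing, while the final lumping across fields costs about one bit, the same qualitative pattern as in the table.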
17. Let me finish with a larger example, a map of multistep citation pathways modeled with the best sparse memory network
and clustered with Infomap. Green circles represent different fields, and red circles represent PNAS, which is clustered in multiple fields.
[Slide figure: map of citation flows. Field labels, in the order they appear on the slide:]
Life sciences: Molecular biology, Medicine, Neuroscience, Cardiology, Immunology, Clinical microbiology, Neurology, Plant science, Nutrition, Psychiatry
Physical sciences: Chemistry, Material physics, Condensed matter physics, Analytic chemistry, Chemical physics, Macro molecules, Environmental chemistry, Biomaterials
Astrophysics & particle physics: Astrophysics, Particle physics, Nuclear physics, Solar system, Space weather, Instrumentation, Geometric physics
Earth sciences & ecology: Ecology & evolution, Geophysics, Geology, Marine ecology, Geoscience, Systematics, Global change, Hydrology
Mathematics & computer science: Mathematical physics, Nonlinear science, Information theory, Applied analysis, Computer science, Operations research, Numerical methods, Pattern recognition
Social sciences: Economics, Management, Finance, Geography, Sociology, Political science, Marketing, Information science
Microbiology
[Legend: Flow volume in research field; Flow volume in PNAS]
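Infomap searches for the partition that minimizes the map equation, the description length of a random walk on the network. The sketch below only evaluates the two-level map equation for given partitions of a plain undirected, unweighted toy network; it does not search for partitions and does not handle state nodes. The toy network and function names are hypothetical, and reading "compression" as the one-module codelength minus the two-module codelength is an assumption.

```python
import math
from collections import defaultdict

def plogp(p):
    """p * log2(p), with the convention 0 * log2(0) = 0."""
    return p * math.log2(p) if p > 0 else 0.0

def map_equation(links, module_of):
    """Two-level map equation codelength in bits for an undirected,
    unweighted network and a fixed assignment of nodes to modules."""
    total = 2.0 * len(links)
    visit = defaultdict(float)      # node visit rates
    exit_rate = defaultdict(float)  # module exit rates
    for u, v in links:
        visit[u] += 1.0 / total
        visit[v] += 1.0 / total
        if module_of[u] != module_of[v]:
            exit_rate[module_of[u]] += 1.0 / total
            exit_rate[module_of[v]] += 1.0 / total
    modules = set(module_of.values())
    q = sum(exit_rate[m] for m in modules)
    # Index codebook: cost of describing movements between modules.
    codelength = plogp(q) - sum(plogp(exit_rate[m]) for m in modules)
    # Module codebooks: cost of describing movements within modules.
    for m in modules:
        rates = [p for node, p in visit.items() if module_of[node] == m]
        usage = exit_rate[m] + sum(rates)
        codelength += plogp(usage) - plogp(exit_rate[m]) - sum(plogp(p) for p in rates)
    return codelength

# Toy network: two triangles joined by a single link.
links = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
one_module = map_equation(links, {n: 0 for n in range(6)})
two_modules = map_equation(links, {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1})
compression = one_module - two_modules
print(f"one module: {one_module:.3f} bits, two modules: {two_modules:.3f} bits")
```

For this toy network the two-module description is shorter, so the compression is positive. A negative value, as in the fully lumped citation example, would mean that one module describes the flows better.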
18. That is, several fields overlap in PNAS, because the flows through PNAS in different fields look very different.
On the other hand, fields do not overlap in specialized journals, because a single field contains all flows.
[Slide figure: the same map of citation flows as on the previous slide, together with a cropped view of the life sciences and earth sciences & ecology fields. Legend: Flow volume in research field; Flow volume in PNAS.]
19. To conclude, I have shown how we model higher-order network flows with sparse memory networks,
and reveal multilevel, overlapping modules with Infomap.
www.mapequation.org
[Slide summary of challenges and solutions:]
Challenge 1, Detectability limit. Solution: Higher-order network representations.
Challenge 2, Many representations. Solution: Sparse memory networks.
Challenge 3, Scale and model selection. Solution: State lumping and cross-validation.
20. Learn more in "Maps of sparse Markov chains efficiently reveal community structure in network flows with memory"
and "Mapping higher-order network flows in memory and multilayer networks with Infomap".
www.mapequation.org
Thank you!