Maps of sparse memory networks reveal overlapping communities in network flows

Martin Rosvall
Christian Persson, Ludvig Bohlin, and Daniel Edler
Integrated Science Lab, Umeå University, Sweden
Hello. I will talk about how we make maps of flows through complex systems,
and three challenges that have kept us excited working on this.
I arrived here thanks to good maps, and I share with several of you the vision to create similarly powerful
maps of complex systems. Imagine Google Maps for networks. Let’s start with the first challenge.
Barabasi, A.L.
Maps of complex systems
This was network science until a few years ago: take a complex system, abstract away all but the undirected,
unweighted links, and identify modules with your favorite community-detection algorithm.
[Slide diagram, conventional approach: complex system; raw data; undirected, unweighted network (conventional network representation); two-level and non-overlapping modular representation (conventional community detection).]
1. Detectability limit
For some time, we forgot the underlying complex system, and tried to crank out as much as possible from the network.
Sure enough, there is a limit to how much structure we can detect.
[Slide diagram: the conventional approach, from complex system to two-level and non-overlapping modular representation, now marked "Limited detectability".]
1. Detectability limit
But this is not the limit of how much we can say about the complex system. If we are interested in flows,
we can use higher-order flow modeling, and represent the flows with memory or multilayer networks.
[Slide diagram: a higher-order framework is added next to the conventional approach. Higher-order flow modeling turns the raw data into a multilayer/memory network instead of an undirected, unweighted network.]
1. Detectability limit
And with these representations, there is sufficient information about the flows through the complex system to reveal
multilevel, overlapping modules with a generalized community detection algorithm.
[Slide diagram: in the higher-order framework, higher-order flow mapping turns the multilayer/memory network into a multilevel and overlapping modular representation. The conventional approach still gives a two-level and non-overlapping modular representation with limited detectability.]
1. Detectability limit
Higher-order representations break the limit of what we can detect. This takes us to the second challenge:
It looks like we need one algorithm for each representation. And there are many representations.
[Slide diagram: the conventional approach (limited detectability) next to the higher-order framework (unlimited detectability).]
1. Detectability limit
Solution: Higher-order network representations
Imagine that you are node i in this example, that the solid pathway to the left comes from a Facebook conversation
with your friends, and the dashed pathway to the right from an email conversation with your colleagues.
Higher-order network flows
2. Many representations
Memory and multilayer networks model these pathways in different ways. Look at the white links that correspond
to a message that you got from your friend j and forwarded to your friend k.
Memory network
Multilayer network
Higher-order network flows
2. Many representations
To capture the higher-order flows, memory networks use state nodes to capture where flows come from,
and multilayer networks use layers to capture the different data sources.
Memory network
Multilayer network
Higher-order network flows
2. Many representations
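To make the idea of state nodes concrete, here is a minimal sketch in Python. It assumes a second-order memory network in which a state node is the pair (previous node, current node); the function name and the example pathways are hypothetical and not taken from the talk.

```python
from collections import defaultdict

def second_order_links(pathways):
    """Count links between state nodes of a second-order memory network.

    A state node (prev, curr) means "currently at physical node curr,
    arrived from prev". Each three-step pathway a -> b -> c adds one
    link from state (a, b) to state (b, c).
    """
    links = defaultdict(int)
    for path in pathways:
        for a, b, c in zip(path, path[1:], path[2:]):
            links[((a, b), (b, c))] += 1
    return dict(links)

# Hypothetical pathways through physical node "i".
pathways = [
    ["j", "i", "k"],  # message from friend j, forwarded to friend k
    ["k", "i", "j"],
    ["l", "i", "m"],  # message from colleague l, forwarded to colleague m
    ["m", "i", "l"],
]
links = second_order_links(pathways)
for (src, dst), weight in sorted(links.items()):
    print(src, "->", dst, weight)
```

In this toy data, no link leads from a state node that arrived from a friend to a state node that continues to a colleague, which is the information a first-order network would lose.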
We can lump redundant state nodes in the memory network such that they correspond to nodes in layers. In this way,
both networks can be represented with a sparse memory network in which state nodes are free to represent anything.
Memory network
Multilayer network
Sparse memory network
Higher-order network flows
2. Many representations
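The lumping step can be sketched as follows, assuming that "redundant" means state nodes of the same physical node with identical outgoing transition probabilities. The data layout, the function name, and the numbers are illustrative assumptions, not the implementation behind the talk.

```python
def lump_redundant_states(transitions):
    """Merge state nodes of one physical node that have identical outgoing flows.

    transitions maps a state node (physical, previous) to a dict
    {next_physical: probability}. Returns a dict from each state node
    to a lumped state id (physical, index).
    """
    groups = {}      # (physical, outgoing signature) -> lumped state id
    assignment = {}
    for state, out in sorted(transitions.items()):
        physical = state[0]
        signature = tuple(sorted((t, round(p, 9)) for t, p in out.items()))
        key = (physical, signature)
        if key not in groups:
            index = sum(1 for k in groups if k[0] == physical)
            groups[key] = (physical, index)
        assignment[state] = groups[key]
    return assignment

# Hypothetical state nodes of physical node "i", keyed by where the flow came from.
transitions = {
    ("i", "j"): {"j": 0.5, "k": 0.5},  # arrived from friend j
    ("i", "k"): {"j": 0.5, "k": 0.5},  # arrived from friend k
    ("i", "l"): {"l": 0.5, "m": 0.5},  # arrived from colleague l
    ("i", "m"): {"l": 0.5, "m": 0.5},  # arrived from colleague m
}
assignment = lump_redundant_states(transitions)
print(assignment)
```

With these numbers, the four state nodes collapse into two lumped state nodes of i, one for the friends and one for the colleagues, which matches one node per layer in the multilayer picture.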
Sparse memory networks solve the many-representations problem. As a result, we can cluster different networks
with a single algorithm. This takes us to the third challenge.
Memory network
Multilayer network
Sparse memory network
Higher-order network flows
2. Many representations
Solution: Sparse memory networks
For real systems, where state nodes are not completely redundant, which sparse memory network should we use?
Here is an example with citation flows from journals in two fields through the multidisciplinary journal PNAS.
3. Scale and model selection
[Slide figure: citation flows through PNAS between the microbiology journals Mol Microbiol and J Bacteriol and the plant science journals Plant Cell and Plant Physiol. Information loss: 0.0 bits. Two-module compression: 0.43 bits.]
We identify two fields overlapping in PNAS, because the flows stay within those fields. Lumping together the state nodes
for the microbiology journals destroys very little information. Still two fields.
[Slide figure: the same network before and after lumping the PNAS state nodes for the microbiology journals. Information loss: 0.0 and 0.003 bits. Two-module compression: 0.43 bits in both cases.]
3. Scale and model selection
In the same way, lumping together the state nodes for the plant science journals barely affects the flows. Still two fields.
Not so if we continue and lump together the remaining state nodes.
[Slide figure: three versions of the network with successively more state nodes lumped. Information loss: 0.0, 0.003, and 0.006 bits. Two-module compression: 0.43 bits in all three cases.]
3. Scale and model selection
Only one field: we are underfitting. To balance under- and overfitting, we lump state nodes based on
minimal information loss and perform cross-validation on the clustering.
[Slide figure: four versions of the network, from no lumping to all PNAS state nodes lumped. Information loss: 0.0, 0.003, 0.006, and 0.890 bits. Two-module compression: 0.43, 0.43, 0.43, and -0.64 bits.]
3. Scale and model selection
Solution: State lumping and cross-validation
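The talk does not spell out how information loss is computed. One plausible reading, used here purely as an assumption, is the increase in entropy rate when two state nodes are merged: the flow-weighted entropy of the merged transition distribution minus the flow-weighted entropies of the two originals. The function names and the numbers below are hypothetical and do not reproduce the values on the slides.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict {target: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def lump(flow_a, dist_a, flow_b, dist_b):
    """Merge two state nodes: add their flows, average their transitions by flow."""
    total = flow_a + flow_b
    targets = set(dist_a) | set(dist_b)
    merged = {
        t: (flow_a * dist_a.get(t, 0.0) + flow_b * dist_b.get(t, 0.0)) / total
        for t in targets
    }
    return total, merged

def information_loss(flow_a, dist_a, flow_b, dist_b):
    """Increase in entropy rate (bits) caused by lumping the two state nodes."""
    total, merged = lump(flow_a, dist_a, flow_b, dist_b)
    return total * entropy(merged) - flow_a * entropy(dist_a) - flow_b * entropy(dist_b)

# Hypothetical state nodes of a multidisciplinary journal, each carrying half the flow.
micro_1 = {"Mol Microbiol": 0.5, "J Bacteriol": 0.5}
micro_2 = {"Mol Microbiol": 0.5, "J Bacteriol": 0.5}
plant = {"Plant Cell": 0.5, "Plant Physiol": 0.5}

loss_same_field = information_loss(0.5, micro_1, 0.5, micro_2)
loss_across_fields = information_loss(0.5, micro_1, 0.5, plant)
print(loss_same_field, loss_across_fields)
```

Under this assumption, lumping state nodes with the same outgoing flows costs nothing, while lumping state nodes from different fields costs a full bit in this toy case. A greedy procedure would merge the cheapest pair first and stop when cross-validation of the clustering starts to degrade.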
Let me finish with a larger example, a map of multistep citation pathways modeled with the best sparse memory network
and clustered with Infomap. Green circles for different fields and red circles for PNAS clustered in multiple fields.
[Slide figure: map of citation flows between research fields. Labels on the map: Life sciences, Molecular biology, Medicine, Neuroscience, Cardiology, Immunology, Clinical microbiology, Neurology, Plant science, Nutrition, Psychiatry, Microbiology; Physical sciences, Chemistry, Material physics, Condensed matter physics, Analytic chemistry, Chemical physics, Macro molecules, Environmental chemistry, Biomaterials; Astrophysics & particle physics, Astrophysics, Particle physics, Nuclear physics, Solar system, Space weather, Instrumentation, Geometric physics; Earth sciences & ecology, Ecology & evolution, Geophysics, Geology, Marine ecology, Geoscience, Systematics, Global change, Hydrology; Mathematics & computer science, Mathematical physics, Nonlinear science, Information theory, Applied analysis, Computer science, Operations research, Numerical methods, Pattern recognition; Social sciences, Economics, Management, Finance, Geography, Sociology, Political science, Marketing, Information science. Legend: flow volume in research field; flow volume in PNAS.]
That is, several fields overlap in PNAS, because the flows through PNAS in different fields look very different.
On the other hand, fields do not overlap in specialized journals, because a single field contains all flows.
[Slide figure: the same citation map and legend as on the previous slide, with an inset repeating the Life sciences and Earth sciences & ecology labels.]
To conclude, I have shown how we model higher-order network flows with sparse memory networks,
and reveal multilevel, overlapping modules with Infomap.
www.mapequation.org
Model higher-order network flows
with sparse memory networks,
and reveal multilevel, overlapping
modules with Infomap
Challenges and solutions
1. Detectability limit: Higher-order network representations
2. Many representations: Sparse memory networks
3. Scale and model selection: State lumping and cross-validation
Learn more in "Maps of sparse Markov chains efficiently reveal community structure in network flows with memory"
and "Mapping higher-order network flows in memory and multilayer networks with Infomap".
www.mapequation.org
Thank you!