Maps of sparse memory networks reveal overlapping communities in network flows

Lightning talk at NetSci 2017. Includes script.

Published in: Science

  1. [Slide: Maps of sparse memory networks reveal overlapping communities in network flows. Martin Rosvall; Christian Persson, Ludvig Bohlin, and Daniel Edler. Integrated Science Lab, Umeå University, Sweden.] Hello. I will talk about how we make maps of flows through complex systems, and about three challenges that have kept us excited about this work.
  2. I arrived here thanks to good maps, and I share with several of you the vision of creating similarly powerful maps of complex systems. Imagine Google Maps for networks. Let’s start with the first challenge. [Slide: Maps of complex systems. Barabasi, A.L.]
  3. This was network science until a few years ago: take a complex system, abstract away everything but the undirected, unweighted links, and identify modules with your favorite community-detection algorithm. [Slide: 1. Detectability limit. Diagram labels, conventional approach: Complex system; Raw data; Conventional network representation; Undirected, unweighted network; Conventional community detection; Two-level and non-overlapping modular representation.]
  4. For some time, we forgot the underlying complex system and tried to squeeze as much as possible out of the network. Sure enough, there is a limit to how much structure we can detect. [Slide: 1. Detectability limit. Same diagram as slide 3, with the added label: Limited detectability.]
  5. But this is not the limit of how much we can say about the complex system. If we are interested in flows, we can use higher-order flow modeling, and represent the flows with memory or multilayer networks. [Slide: 1. Detectability limit. Same diagram, with the added labels: Higher-order framework; Higher-order flow modeling; Multilayer memory network.]
  6. And with these representations, there is sufficient information about the flows through the complex system to reveal multilevel, overlapping modules with a generalized community detection algorithm. [Slide: 1. Detectability limit. Same diagram, with the added labels: Higher-order flow mapping; Multilevel and overlapping modular representation.]
  7. Higher-order representations break the limit of what we can detect. This takes us to the second challenge: It looks like we need one algorithm for each representation. And there are many representations. [Slide: 1. Detectability limit. Solution: Higher-order network representations. Same diagram, with the added label: Unlimited detectability.]
  8. Imagine that you are node i in this example, that the solid pathway to the left comes from a Facebook conversation with your friends, and the dashed pathway to the right from an email conversation with your colleagues. [Slide: 2. Many representations. Higher-order network flows.]
  9. Memory and multilayer networks model these pathways in different ways. Look at the white links that correspond to a message that you got from your friend j and forwarded to your friend k. [Slide: 2. Many representations. Higher-order network flows. Panels: Memory network; Multilayer network.]
  10. To capture the higher-order flows, memory networks use state nodes to capture where flows come from, and multilayer networks use layers to capture the different data sources. [Slide: 2. Many representations. Higher-order network flows. Panels: Memory network; Multilayer network.]
  11. We can lump redundant state nodes in the memory network such that they correspond to nodes in layers. In this way, both networks can be represented with a sparse memory network in which state nodes are free to represent anything. [Slide: 2. Many representations. Higher-order network flows. Panels: Memory network; Multilayer network; Sparse memory network.]
  12. Sparse memory networks solve the many-representations problem. As a result, we can cluster different networks with a single algorithm. This takes us to the third challenge. [Slide: 2. Many representations. Solution: Sparse memory networks. Higher-order network flows. Panels: Memory network; Multilayer network; Sparse memory network.]
  13. For real systems, where state nodes are not completely redundant, which sparse memory network should we use? Here is an example with citation flows from journals in two fields through the multidisciplinary journal PNAS. [Slide: 3. Scale and model selection. Figure: citation flows through PNAS for microbiology (Mol Microbiol, J Bacteriol) and plant science (Plant Cell, Plant Physiol). Information loss: 0.0 bits; two-module compression: 0.43 bits.]
  14. We identify two fields overlapping in PNAS, because the flows stay within those fields. Lumping together the state nodes for the microbiology journals destroys very little information. Still two fields. [Slide: 3. Scale and model selection. Same figure, with a second network added. Information loss / two-module compression (bits): 0.0 / 0.43; 0.003 / 0.43.]
  15. In the same way, lumping together the state nodes for the plant science journals barely affects the flows. Still two fields. Not so if we continue and lump together the remaining state nodes. [Slide: 3. Scale and model selection. Same figure, with a third network added. Information loss / two-module compression (bits): 0.0 / 0.43; 0.003 / 0.43; 0.006 / 0.43.]
  16. Only one field: we are underfitting. To balance under- and overfitting, we lump state nodes based on minimal information loss and perform cross-validation on the clustering. [Slide: 3. Scale and model selection. Solution: State lumping and cross-validation. Same figure, with a fourth network added. Information loss / two-module compression (bits): 0.0 / 0.43; 0.003 / 0.43; 0.006 / 0.43; 0.890 / -0.64.]
  17. Let me finish with a larger example, a map of multistep citation pathways modeled with the best sparse memory network and clustered with Infomap. Green circles show the different fields, and red circles show PNAS clustered in multiple fields. [Slide: map of research fields. Field labels: Life sciences, Molecular biology, Medicine, Neuroscience, Cardiology, Immunology, Clinical microbiology, Neurology, Plant science, Nutrition, Psychiatry, Physical sciences, Chemistry, Material physics, Condensed matter physics, Analytic chemistry, Chemical physics, Macro molecules, Environmental chemistry, Biomaterials, Astrophysics & particle physics, Astrophysics, Particle physics, Nuclear physics, Solar system, Space weather, Instrumentation, Geometric physics, Earth sciences & ecology, Ecology & evolution, Geophysics, Geology, Marine ecology, Geoscience, Systematics, Global change, Hydrology, Mathematics & computer science, Mathematical physics, Nonlinear science, Information theory, Applied analysis, Computer science, Operations research, Numerical methods, Pattern recognition, Social sciences, Economics, Management, Finance, Geography, Sociology, Political science, Marketing, Information science, Microbiology. Legend: Flow volume in research field; Flow volume in PNAS.]
  18. That is, several fields overlap in PNAS, because the flows through PNAS in different fields look very different. On the other hand, fields do not overlap in specialized journals, because a single field contains all flows. [Slide: the same map of research fields as on slide 17, with the Life sciences and Earth sciences & ecology labels repeated. Legend: Flow volume in research field; Flow volume in PNAS.]
  19. To conclude, I have shown how we model higher-order network flows with sparse memory networks, and reveal multilevel, overlapping modules with Infomap. [Slide: Challenges and solutions. 1. Detectability limit: Higher-order network representations. 2. Many representations: Sparse memory networks. 3. Scale and model selection: State lumping and cross-validation. www.mapequation.org]
  20. Thank you! [Slide: Learn more in "Maps of sparse Markov chains efficiently reveal community structure in network flows with memory" and "Mapping higher-order network flows in memory and multilayer networks with Infomap". www.mapequation.org. Model higher-order network flows with sparse memory networks, and reveal multilevel, overlapping modules with Infomap.]
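
The state lumping described on slides 13 to 16 can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes that the information lost by lumping two state nodes can be measured as the flow-weighted Jensen-Shannon divergence between their outgoing transition distributions, which equals the resulting increase in entropy rate when the targets are left unchanged. The function names (`entropy`, `lumping_loss`), the variable `pnas_states`, and all counts are hypothetical toy values, not data from the talk.

```python
from itertools import combinations
from math import log2


def entropy(counts):
    """Shannon entropy in bits of a distribution given as {target: count}."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c > 0)


def lumping_loss(a, b, total_flow):
    """Entropy-rate increase in bits from merging two state nodes.

    a and b map each target to the flow count leaving the state node.
    The loss is the Jensen-Shannon divergence of the two out-distributions,
    weighted by the share of total flow that passes through the pair.
    """
    merged = dict(a)
    for target, count in b.items():
        merged[target] = merged.get(target, 0) + count
    na, nb = sum(a.values()), sum(b.values())
    js = entropy(merged) - (na * entropy(a) + nb * entropy(b)) / (na + nb)
    return (na + nb) / total_flow * js


# Hypothetical out-flow counts for four state nodes of the physical node PNAS,
# keyed by the journal the flow arrived from (toy numbers, assumed for illustration).
pnas_states = {
    "Mol Microbiol": {"Mol Microbiol": 45, "J Bacteriol": 50, "Plant Cell": 3, "Plant Physiol": 2},
    "J Bacteriol": {"Mol Microbiol": 50, "J Bacteriol": 45, "Plant Cell": 2, "Plant Physiol": 3},
    "Plant Cell": {"Plant Cell": 45, "Plant Physiol": 50, "Mol Microbiol": 3, "J Bacteriol": 2},
    "Plant Physiol": {"Plant Cell": 50, "Plant Physiol": 45, "Mol Microbiol": 2, "J Bacteriol": 3},
}
total_flow = sum(sum(out.values()) for out in pnas_states.values())

# Loss for every candidate pair; a greedy lumping step would merge the cheapest pair.
losses = {
    (s, t): lumping_loss(pnas_states[s], pnas_states[t], total_flow)
    for s, t in combinations(pnas_states, 2)
}
best_pair = min(losses, key=losses.get)

for pair, loss in sorted(losses.items(), key=lambda item: item[1]):
    print(pair, round(loss, 4))
```

With these toy counts, lumping two state nodes from the same field costs far less than lumping across fields. That mirrors the pattern on slides 14 to 16, although the numbers are not those of the talk, and the cross-validation step is not shown here.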
