Information Content of Complex Networks


This short talk, given in Stockholm, Sweden, explains how algorithmic complexity measures, notably Kolmogorov complexity approximated both by lossless compression algorithms and by the Block Decomposition Method (BDM), can characterize graphs and networks by some of their group-theoretic and topological properties, in particular graph automorphism group size and the clustering coefficients of complex networks. The method distinguishes between network models such as regular, random, small-world and scale-free.

Published in: Education

  1. Information Content of Complex Networks. Hector Zenil. Based on the results reported in: H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Graph Automorphisms and Topological Characterization of Complex Networks by Algorithmic Information Content. April 28, 2013.
  2. A biological motivation: data and network correspondence. Figure: Biological networks. Figure: Which interaction network corresponds to the biological data? If data and the associated network can be derived from each other, then they should have about the same information content.
  3. Graph automorphism. Definition: An automorphism of a graph g is a permutation λ of the vertex set V such that the pair of vertices (i, j) forms an edge if and only if the pair (λ(i), λ(j)) also forms an edge. Figure: Example of a non-trivial graph automorphism (graphs and networks are synonyms in math).
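The definition on this slide can be checked mechanically. The following is a minimal sketch, assuming an undirected simple graph given as a list of vertex pairs and a permutation given as a dictionary; the function name `is_automorphism` and the example graph are illustrative and not taken from the talk.

```python
def is_automorphism(edges, perm):
    """Return True if `perm` (a dict vertex -> vertex) is an automorphism.

    Assumes an undirected simple graph whose vertices all appear as keys of `perm`.
    """
    # perm must be a permutation of the vertex set
    if set(perm.values()) != set(perm.keys()):
        return False
    edge_set = {frozenset(e) for e in edges}
    mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
    # A bijection maps distinct edges to distinct pairs, so set equality
    # is equivalent to the "if and only if" condition of the definition.
    return mapped == edge_set


# Hypothetical example: the 4-cycle 0-1-2-3-0
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
rotation = {0: 1, 1: 2, 2: 3, 3: 0}   # rotates the cycle by one step
swap_01 = {0: 1, 1: 0, 2: 2, 3: 3}    # swaps two adjacent vertices only

print(is_automorphism(cycle, rotation))  # True
print(is_automorphism(cycle, swap_01))   # False
```

The rotation preserves adjacency, while swapping only vertices 0 and 1 sends the edge (1, 2) to the non-edge (0, 2).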
  4. Automorphism group. The set of all automorphisms of an object forms a group, called the automorphism group. The size of the automorphism group A(g) provides an indication of a formal type of symmetry of a graph. Figure: Elements of a graph automorphism group.
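For very small graphs, the size of the automorphism group can be obtained by brute force over all vertex permutations. This is only a sketch for intuition, assuming an undirected simple graph on vertices 0..n-1; it takes factorial time and is not the method used in the talk. The name `automorphism_group_size` is illustrative.

```python
from itertools import permutations


def automorphism_group_size(n, edges):
    """Count the automorphisms of an undirected simple graph on vertices 0..n-1."""
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    count = 0
    for perm in permutations(range(n)):
        # perm[v] is the image of vertex v under this permutation
        mapped = {frozenset((perm[u], perm[v])) for u, v in edge_list}
        if mapped == edge_set:
            count += 1
    return count


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
path3 = [(0, 1), (1, 2)]
complete4 = [(i, j) for i in range(4) for j in range(i + 1, 4)]

print(automorphism_group_size(4, cycle4))     # 8
print(automorphism_group_size(3, path3))      # 2
print(automorphism_group_size(4, complete4))  # 24
```

The complete graph reaches the maximum possible value, n!, which is why highly symmetric graphs sit at the top of the A(g) axis.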
  5. Graph embeddings. Figure: (Petersen graph) Automorphisms are not ways to embed (plot) graphs. We are interested in topological properties (how nodes are connected), not geometrical ones (how nodes and links are distributed in a plane or space).
  6. Clustering coefficient. A clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together (for example, friends in social networks [2]). Definition: $C(v_i) = \frac{2\,|E(N_i)|}{n_i (n_i - 1)}$, where $N_i$ is the neighbourhood of $v_i$, $n_i = |N_i|$, and $E(N_i)$ denotes the set of edges with both nodes in $N_i$. Figure: Some topological properties of graphs.
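The local clustering coefficient defined on this slide can be computed directly from neighbour sets. A minimal sketch, assuming an undirected graph stored as a dictionary from each vertex to the set of its neighbours; returning 0.0 for vertices with fewer than two neighbours is a convention chosen here, not something stated in the talk.

```python
from itertools import combinations


def local_clustering(adj, v):
    """C(v) = 2 |E(N_v)| / (n_v (n_v - 1)) for an undirected graph.

    `adj` maps each vertex to the set of its neighbours.
    """
    neighbours = adj[v]
    n = len(neighbours)
    if n < 2:
        return 0.0  # convention assumed here for degree 0 or 1
    # Count edges whose two endpoints are both neighbours of v
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2 * links / (n * (n - 1))


# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 0
example = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}

print(local_clustering(example, 0))  # one edge among three neighbours: 1/3
print(local_clustering(example, 1))  # both neighbours are linked: 1.0
```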
  7. Adjacency matrix. A graph g = (V, E) consists of a set of vertices V (also called nodes) and a set of edges E. Two vertices, i and j, form an edge of the graph if (i, j) ∈ E. A graph can be represented by its adjacency matrix. Assuming that the vertices are indices from 1 to n, that is, that V = {1, 2, . . . , n}, then the adjacency matrix of g is an n × n matrix, with entries a_{i,j} = 1 if (i, j) ∈ E and 0 otherwise. Figure: A graph and its adjacency matrix.
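As a small illustration of this representation, the sketch below builds the adjacency matrix from an edge list with vertices numbered 1 to n, as on the slide. Treating the graph as undirected (so the matrix is symmetric) is an assumption made here; the function name is illustrative.

```python
def adjacency_matrix(n, edges):
    """Build the n x n adjacency matrix for vertices labelled 1..n."""
    matrix = [[0] * n for _ in range(n)]
    for i, j in edges:
        matrix[i - 1][j - 1] = 1
        matrix[j - 1][i - 1] = 1  # assumption: undirected graph, symmetric matrix
    return matrix


# Hypothetical example: the path 1-2-3
for row in adjacency_matrix(3, [(1, 2), (2, 3)]):
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```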
  8. Founders of Algorithmic Information Theory (60s). Figure: A. Kolmogorov, R. Solomonoff (here with C.S. Calude) and G. Chaitin (2007).
  9. Kolmogorov complexity. K(s) is the length of the shortest program p that outputs the string s when run on a universal Turing machine U. Formally [3, 1], Definition 4: $K_U(s) = \min\{|p| : U(p) = s\}$ (1). Figure: Giving directions in the most compact/easiest way (small K). Figure: Compression is another way to understand Kolmogorov complexity. If compressible then small K, otherwise random data.
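The compression view of K can be tried out with any lossless compressor. The sketch below uses `zlib` from the Python standard library as a stand-in; the talk does not say which compressor was used, so that choice is an assumption. The compressed length is only a computable upper-bound style proxy for K, not K itself.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data, a rough proxy for K."""
    return len(zlib.compress(data, 9))


regular = b"01" * 5000           # 10,000 bytes with an obvious short description
random_like = os.urandom(10000)  # 10,000 bytes from the OS entropy source

print(compressed_length(regular))      # small: the pattern compresses well
print(compressed_length(random_like))  # close to (or above) 10,000
```

The periodic string is highly compressible and therefore has small estimated K, while the random-looking bytes barely compress at all.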
  10. Turing machine. By the invariance theorem [4], K_U only depends on U up to a constant, so as is conventional, we drop the subscript and write only K. Figure: The simple concept of a “silly” machine started an entire field: Computer Science. This “silly” machine helped understand the seminal concept of Computation Universality. K as a measure is not computable: there is no algorithm (Turing machine) that, given a string, retrieves the shortest program that produces the string. Proven: only a non-computable function can be a universal complexity measure, so live/deal with it.
  11. Complexity and edge density. These observations show that our measure is behaving as expected from theory. Figure: Estimated (normalised) Kolmogorov complexity for increasing number of edges for random graphs of 50 nodes each. The minimum complexity (Left) and standard deviation (Right) are shown for 100 random permutations of 20 graphs in each group.
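A rough version of this experiment can be run with a general-purpose compressor. The sketch below is not the procedure behind the figure (no permutations, no normalisation, `zlib` instead of BDM); it only assumes that the empty and the complete graph should compress far better than a random graph at edge density 0.5. The function names and the seed are illustrative.

```python
import random
import zlib


def random_graph_bits(n, density, rng):
    """Upper-triangle adjacency bits of a random graph with the given edge density."""
    return "".join(
        "1" if rng.random() < density else "0"
        for i in range(n)
        for j in range(i + 1, n)
    )


def compressed_size(bits):
    """Bytes needed by zlib for the bit string, used as a crude complexity estimate."""
    return len(zlib.compress(bits.encode("ascii"), 9))


rng = random.Random(0)  # fixed seed so the run is repeatable
sizes = {
    d: compressed_size(random_graph_bits(50, d, rng))
    for d in (0.0, 0.25, 0.5, 0.75, 1.0)
}
for density, size in sizes.items():
    print(density, size)
```

With 50 nodes there are 1,225 possible edges; densities 0.0 and 1.0 give constant bit strings, which is why their compressed sizes are the smallest.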
  12. Graph duality. Figure: The dual graph of a plane graph G is a graph that has a vertex corresponding to each face of G. Even if very different, dual graphs should have the same information content (because there is a program of constant length that transforms any graph into its dual and vice versa).
  13. Complexity of graph duality. Figure: Graphs ranked by Kolmogorov complexity approximated by two different methods: (Top) lossless compression and (Bottom) BDM. These results are important because a network and its dual are shown to have about the same information content even when they may superficially look very different.
  14. Graph automorphisms and Kolmogorov complexity. Figure: Graph automorphism group size A(g) (y-axis) of connected regular graphs of size 20 versus K complexity (x-axis). A(g) decays with increasing K. Figure: For V(g) = 20, the complete, the (4,5)-lattice and the (20,46)-noncayley transitive graphs found in the boundaries of regular graphs.
  15. Figure: Plots of number of graph automorphisms normalised by maximum number of edges of g, A(g)/V(g)! (y-axis) versus (normalised) Kolmogorov complexity (x-axis) estimated by NBDM for connected regular graphs found in Mathematica (GraphData[]) with size V(g) = 20 to 36 nodes (only vertex sizes for which at least 20 graphs were found in the dataset were plotted). The decay can be witnessed even if noisy.
  16. Applying BDM to real-world networks. Figure: Real-world networks also display a connection between Kolmogorov complexity and automorphism group size, A(g). Networks with fewer symmetries have greater estimated Kolmogorov complexity. Automorphism count is normalised by network size.
  17. Information-content characterisation of topological properties of complex networks. Figure: Example of a Watts–Strogatz rewiring algorithm for n = 30-vertex graphs and rewiring probability p = 0, 0.1 and 1 starting from a 2n-regular graph.
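The rewiring procedure named on this slide can be sketched as follows. This is one common formulation of the Watts–Strogatz model, assuming a ring lattice in which every node is linked to its k nearest neighbours (k even, k < n); the parameter k, the seed and the function name are assumptions for illustration, not values from the talk.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice on n nodes (k nearest neighbours each), rewired with probability p."""
    rng = random.Random(seed)
    edges = set()
    # Regular ring lattice: node i is linked to the k/2 nodes on each side
    for i in range(n):
        for j in range(1, k // 2 + 1):
            edges.add(frozenset((i, (i + j) % n)))
    # Visit every lattice edge once and rewire its far end with probability p
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = frozenset((i, (i + j) % n))
            if old in edges and rng.random() < p:
                # New endpoints must avoid self-loops and duplicate edges
                candidates = [
                    w for w in range(n)
                    if w != i and frozenset((i, w)) not in edges
                ]
                if candidates:
                    edges.remove(old)
                    edges.add(frozenset((i, rng.choice(candidates))))
    return edges


for p in (0, 0.1, 1):
    graph = watts_strogatz(30, 4, p, seed=1)
    print(p, len(graph))  # the number of edges stays at 60 for every p
```

Because each rewiring removes one edge and adds one new edge, the numbers of nodes and links stay constant while p varies, which is the property the later complexity plot relies on.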
  18. A network is said to have the “small-world” property of complex networks if the average graph distance D grows no faster than the log of the number of nodes: D ∼ log(V(g)). Figure: The Barabási-Albert model is an algorithm for generating random scale-free networks using a preferential attachment mechanism (here for n = 30). A new vertex with s vertex is added at each step.
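Preferential attachment can be sketched with a list in which every node appears once per incident edge, so that a uniform draw from the list picks nodes in proportion to their degree. This is one standard way to implement the Barabási-Albert model; the number m of edges attached per new vertex, the choice of m isolated seed nodes and the function name are assumptions made here.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes, attaching each new node to m existing nodes."""
    rng = random.Random(seed)
    edges = []
    targets = set(range(m))  # the first new node links to the m seed nodes
    repeated_nodes = []      # each node appears once per incident edge
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        repeated_nodes.extend(targets)
        repeated_nodes.extend([source] * m)
        # Draw m distinct targets with probability proportional to degree
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(repeated_nodes))
    return edges


graph = barabasi_albert(30, 2, seed=1)
print(len(graph))  # (n - m) * m = 56 edges
```

High-degree nodes occupy more slots in `repeated_nodes`, so they are more likely to receive new links, which is what produces the heavy-tailed degree distribution.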
  19. Figure: Kolmogorov complexity of the Watts-Strogatz model as a function of the rewiring probability on a 1000-node network starting from a regular graph. Both the number of nodes and the number of links are kept constant, while p varies; Kolmogorov complexity increases with p. This demonstrates that information content is sensitive to topological properties of complex networks, because the size of the network stays the same in both nodes and links.
  20. Topological characterization of complex networks. Figure: Network topology characterisation by Kolmogorov complexity. Km as approximated by NBDM applied to 792 networks with V(g) = 20 nodes each: 198 connected regular graphs (e.g. Haars, circulants, noncayley transitives, snarks, cubics, lattices, books, Andrásfai, resistances, etc.), 198 random graphs with edge density 0.5, 198 Barabási-Albert networks and 198 Watts-Strogatz networks (with rewiring probability 0.5).
  21. Table: Random sample of 20 real-world networks from the 88 included in the study (and plotted in Fig. 18), sorted from smallest to largest estimated Kolmogorov complexity (NBDM). The (negative) correlation between Km and A(g) is stronger than between Km and V(g).

      Network description (g)                                 V(g)  Normalised Km(g)  A(g)
      Metabolic Network Actinobacillus Actinomycetemcomitans  993   0.00336           4.39648 × 10^77
      Metabolic Network Neisseria Meningitidis                981   0.00344           2.81375 × 10^79
      Perl Module Authors Network                             840   0.00350           3.89458 × 10^473
      Metabolic Network Campylobacter Jejuni                  946   0.00370           6.59472 × 10^77
      Metabolic Network Emericella Nidulans                   916   0.00378           3.14461 × 10^71
      Whole Network Pyrococcus Horikoshii                     953   0.00382           4.0251 × 10^73
      Whole Network Pyrococcus Furiosus                       931   0.00384           3.14461 × 10^71
      Metabolic Network Thermotoga Maritima                   830   0.00477           1.70606 × 10^67
      Whole Network Mycoplasma Genitalium                     878   0.00480           5.9279 × 10^95
      Whole Network Treponema Pallidum                        899   0.00499           2.44515 × 10^87
      Whole Network Chlamydia Trachomatis                     822   0.00511           1.42326 × 10^78
      Metabolic Network Pyrococcus Furiosus                   751   0.00511           2.15507 × 10^53
      Whole Network Rickettsia Prowazekii                     817   0.00523           1.13861 × 10^79
      Whole Network Arabidopsis Thaliana                      768   0.00535           1.48263 × 10^63
      Whole Network Oryza Sativa                              744   0.00569           2.57400 × 10^60
      Whole Network Chlamydia Pneumoniae                      744   0.00635           1.49306 × 10^73
      Metabolic Network Oryza Sativa                          665   0.00640           6.31369 × 10^50
      Metabolic Network Rickettsia Prowazekii                 456   0.01080           5.04691 × 10^36
      Metabolic Network Mycoplasma Pneumoniae                 411   0.01280           7.6059 × 10^30
      Metabolic Network Borrelia Burgdorferi                  409   0.01460           8.6134 × 10^38
  22. Conclusions and remarks.
      - K is applicable even if only semi-computable.
      - Information-theoretic measures are sensitive to graph and network group-theoretic and topological properties.
      - The adjacency matrix of a graph/network is a reasonable representation of information content of a graph.
      - Approaches to K for graph complexity yield results in agreement with theory and intuition.
      - The Block Decomposition Method (BDM) is complementary to lossless compression algorithms, is more accurate for non-random cases and less accurate for random ones.
      - BDM runs in linear time characterising network properties that require exponential time otherwise. (BDM requires CTM which runs in exponential time, but this latter calculation needs to be run only once, not every time that it is required).
      - The results can easily be applied to directed networks.
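The slides do not spell out the BDM formula. The sketch below assumes the commonly stated form, in which the adjacency matrix is cut into small square blocks and the estimate is the sum, over distinct blocks, of a precomputed CTM value plus the base-2 logarithm of how often the block occurs. The lookup table `toy_ctm` contains made-up placeholder numbers, not real CTM values, and dropping leftover rows and columns is just one possible boundary convention.

```python
from collections import Counter
from math import log2


def bdm(matrix, block_size, ctm):
    """Block-decomposition estimate: sum of ctm[block] + log2(multiplicity).

    `ctm` is a caller-supplied lookup table from blocks (tuples of row tuples)
    to precomputed complexity values; it is hypothetical in this sketch.
    Rows and columns that do not fill a whole block are ignored.
    """
    n = len(matrix)
    blocks = Counter()
    for r in range(0, n - block_size + 1, block_size):
        for c in range(0, n - block_size + 1, block_size):
            block = tuple(
                tuple(matrix[r + dr][c:c + block_size])
                for dr in range(block_size)
            )
            blocks[block] += 1
    return sum(ctm[b] + log2(count) for b, count in blocks.items())


zeros = ((0, 0), (0, 0))
ones = ((1, 1), (1, 1))
toy_ctm = {zeros: 3.0, ones: 3.0}  # placeholder values, NOT real CTM output

empty_graph = [[0] * 4 for _ in range(4)]
print(bdm(empty_graph, 2, toy_ctm))  # 3.0 + log2(4) = 5.0
```

Once the table exists, each block costs a single lookup, which is consistent with the remark that the expensive CTM computation is done only once.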
  23. References I.
      H. Zenil, F. Soler-Toscano, K. Dingle and A. Louis, Graph Automorphisms and Topological Characterization of Complex Networks by Algorithmic Information Content, 2013.
      H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility, 2013.
      F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures.
      F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Calculating Kolmogorov Complexity from the Frequency Output Distributions of Small Turing Machines.
  24. [1] G. J. Chaitin. On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the ACM, 16(1):145–159, 1969.
      [2] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.
      [3] A. N. Kolmogorov. Three approaches to the quantitative definition of information. Problems of Information Transmission, 1(1):1–7, 1965.
      [4] M. Li and P. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer, Heidelberg, 2008.