I present cutting-edge concepts and tools drawn from algorithmic information theory (AIT) for new-generation genetic sequencing, network biology and bioinformatics in general. AIT is the most advanced mathematical theory of information, formally characterising the concepts of simplicity, randomness and structure and the differences between them. Measures from AIT will empower computational medicine and systems biology to deal with big data and sophisticated analytics, and will provide a powerful new framework for understanding.

- 1. Algorithmic Information Theory and Computational Biology Hector Zenil Unit of Computational Medicine Karolinska Institutet Sweden Hector Zenil AIT Tools for Biology and Medicine
- 2. Complex Adaptive Systems (CAS)
- 3. Complexity is hard to quantify in biology: mapping quantitative stimuli to qualitative behaviour
- 4. Information Theory in Biology: sequence alignment, pattern recognition, sequence logos, binding site detection, motif detection, consensus sequences, biological significance [based on Claude Shannon's Information Theory, 1948]
- 5. Algorithmic Information Theory. Which sequence looks more random? (a) AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA (b) AGGTCGTGAAGTGCGATGGCCTTACGTAGC (c) GCGCGCGCGCGCGCGCGCGCGCGCGCGC Classical probability theory vs. Kolmogorov complexity. Definition: $K_U(s) = \min\{|p| : U(p) = s\}$ (1). Compressibility: a sequence $s$ is $c$-compressible if there is a program $p$ with $U(p) = s$ and $|p| + c \le |s|$; such a sequence has low Kolmogorov complexity. A sequence is random if $K(s) \approx |s|$. [Kolmogorov (1965); Chaitin (1966)]
- 6. Examples. Example 1: Sequences like (a) have low algorithmic complexity because they allow a short description, for example “20 times A”. No matter how long (a) grows, the description grows only by about $\log_2(k)$ bits (for “k times A”). Example 2: The sequence (b) is algorithmically random because it does not seem to allow a description (much) shorter than (b) itself. For a sequence such as (a), a proof of non-randomness amounts to exhibiting a short program. Compressibility is therefore a sufficient test of non-randomness.
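As a rough illustration of compressibility as a sufficient test of non-randomness, the sketch below uses Python's `zlib` (Deflate) as a stand-in for "a short description". A compressed length is only an upper bound on K, never its value. The helper name `compressed_size`, the 1000-character lengths and the seeded generator are assumptions made for this example, not part of the slides.

```python
import random
import zlib


def compressed_size(s: str) -> int:
    """Bytes needed by zlib (Deflate, level 9) to encode s: an upper bound on K(s)."""
    return len(zlib.compress(s.encode("ascii"), 9))


# Longer versions of the three kinds of sequence shown on slide 5.
seq_a = "A" * 1000      # like (a): a single repeated letter
seq_c = "GC" * 500      # like (c): a repeated two-letter motif

# Like (b): no obvious pattern. Caveat: a seeded pseudo-random generator is
# itself a short program, so this string is not algorithmically random;
# zlib simply cannot detect that regularity.
rng = random.Random(0)
seq_b = "".join(rng.choice("ACGT") for _ in range(1000))

for name, seq in (("a", seq_a), ("b", seq_b), ("c", seq_c)):
    print(name, len(seq), compressed_size(seq))
```

The patterned sequences compress to a small fraction of their length, while the pattern-free one stays far larger. Failing to compress proves nothing about randomness, which is why compressibility is a sufficient but not a necessary test.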
- 7. Example of an evaluation of K. The sequence (c) GCGCGC...GC is not algorithmically random (it has low K complexity) because it can be produced by the following program (take G=0 and C=1): Program A(i): 1: n := 0; 2: Print n mod 2; 3: n := n+1; 4: If n=i Goto 6; 5: Goto 2; 6: End. The length of A (in bits) is an upper bound on K(GCGCGC...GC).
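Program A can be transcribed almost line for line into Python. This is a minimal sketch: the function name `program_a` and the guard on `i` are additions for illustration, and the mapping 0 to G and 1 to C follows the slide.

```python
def program_a(i: int) -> str:
    """Return the first i letters of GCGC..., following Program A step by step."""
    if i < 1:
        # The original program prints before testing n = i, so it never halts for i < 1.
        raise ValueError("i must be at least 1")
    letters = []
    n = 0                              # 1: n := 0
    while True:
        letters.append("GC"[n % 2])    # 2: Print n mod 2 (0 -> G, 1 -> C)
        n += 1                         # 3: n := n + 1
        if n == i:                     # 4: If n = i Goto 6
            break
                                       # 5: Goto 2
    return "".join(letters)            # 6: End


print(program_a(6))
```

The program text has a fixed size and only the argument `i` grows, so the description length of the output increases by roughly $\log_2(i)$ bits as the sequence gets longer.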
- 8. The ultimate measure of pattern detection and optimal prediction. Kolmogorov and Chaitin, Schnorr, and Martin-Löf independently provided 3 different approaches to randomness (compression, predictability and typicality). They proved (for infinite sequences): incompressibility ⇐⇒ unpredictability ⇐⇒ typicality. When this happens in mathematics a concept (here, randomness) has been objectively captured. This is why prediction in biology is hard. AIT tells us that no effective statistical test will succeed in recognising all patterns and no computable technique can fully predict all outcomes. The problem is deeply connected to computability and algorithmic information theory. [Solomonoff (1964); Kolmogorov (1965); Chaitin (1969)]
- 9. Information distances and similarity metrics: measures waiting to be introduced in bioinformatics. Information Distance: $ID(x, y) = \max\{K(x|y), K(y|x)\}$. Universal Similarity Metric: $USM(x, y) = \max\{K(x|y), K(y|x)\} / \max\{K(x), K(y)\}$. Normalised Information Distance and NCD: $NCD(x, y) = (K(xy) - \min\{K(x), K(y)\}) / \max\{K(x), K(y)\}$. Normalised Compression Measure (NCM): $NC(s) = K(s)/|s|$ (asymptotic behaviour). Bennett's Logical Depth: $LD_d(s) = \min\{t(p) : (|p| - |p^*| < d) \text{ and } (U(p) = s)\}$ (for an example application see Zenil, Complexity 2011).
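The NCD formula becomes computable once K is replaced by the output length of a real compressor. The sketch below assumes `zlib` as that compressor. The names `c` and `ncd` and the random test sequences are illustrative choices, not taken from the talk.

```python
import random
import zlib


def c(data: bytes) -> int:
    """Compressed length in bytes, standing in for K in the NCD formula."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Two unrelated pseudo-random DNA-like sequences (hypothetical test data).
rng = random.Random(1)
x = "".join(rng.choice("ACGT") for _ in range(2000)).encode("ascii")
y = "".join(rng.choice("ACGT") for _ in range(2000)).encode("ascii")

print("NCD(x, x) =", ncd(x, x))   # close to 0: x explains itself
print("NCD(x, y) =", ncd(x, y))   # close to 1: little shared information
```

Real compressors make the values approximate: NCD(x, x) is small but not exactly 0, and results depend on the compressor's window size, so this is a sketch of the idea rather than a calibrated similarity measure.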
- 10. Non-systematic but successful attempts in biology. GenCompress is a compression algorithm for DNA sequences: $d(x, y) = 1 - (K(x) - K(x|y))/K(xy)$. NCD applied to genetic similarity. AIT looks at the genome as information, not as data (letters). Counting: traditional Shannon-entropy style sequencing. Interpreting: AIT. The full power of the theory hasn't yet been unleashed.
- 11. To be or not to be... Borel's “Infinite Monkey” theorem. [Figure labels: Input; 1, 0, 1024, π, Syntax error, √2, ∞, CH3, ∞; “To be or not to be, that is the question.”]
- 12. Algorithmic probability
- 13. Producing π. This C-language code produces the first 1000 digits of π (Gjerrit Meinsma): long k=4e3,p,a[337],q,t=1e3; main(j){for(;a[j=q=0]+=2,--k;) for(p=1+2*k;j<337;q=a[j]*k+q%p*t,a[j++]=q/p) k!=j>2?:printf("%.3d",a[j-2]%t+q/p/t);} Producing non-random sequences: if an object has low Kolmogorov complexity then it has a short description and a greater probability of being produced by a random program. The less random a string, the more likely it is to be produced by a short program.
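The link between short programs and high probability can be stated as a formula. The block below gives the standard definition of algorithmic probability and the coding theorem (Solomonoff 1964; Levin 1974, both in the reference list), assuming a prefix-free universal machine $U$. The symbol $m(s)$ is notation introduced here and does not appear on the slide.

```latex
% Algorithmic probability: the chance that a program p, drawn by fair coin
% flips and run on a prefix-free universal machine U, outputs s.
m(s) \;=\; \sum_{p \,:\, U(p) = s} 2^{-|p|}

% Coding theorem: high algorithmic probability is equivalent to low complexity.
K(s) \;=\; -\log_2 m(s) + O(1)
```

Each program of length $|p|$ contributes $2^{-|p|}$, so the shortest programs dominate the sum. This is why a string with a short description is more likely to be produced by a random program.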
- 14. Biological Big Data Analysis. The information bottleneck. Small Data matters: local measurements of information content are a good indication of the global information content of an object. Evidence: BDM image classification. Compression works at large scales, looking for long regularities, while BDM is very local. Yet both yield astonishingly similar results for these object sizes.
- 15. Complementary methods for different sequence lengths. The methods to approximate K coexist and complement each other for different sequence lengths. Lossless compression method: short strings (< 100 bits) no, long strings (> 100 bits) yes, scalability yes. Coding Theorem method: short strings yes, long strings no, scalability no. Block Decomposition method: short strings yes, long strings yes, scalability yes. [Zenil, Soler, Delahaye, Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility (2012)]
- 16. Coding Theorem method and lossless compression. The transition between one method and the other. What is complex for the Coding Theorem method is less compressible. [Soler, Zenil, Delahaye, Gauvrit, Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures (2012)]
- 17. Online Algorithmic Complexity Calculator. Provides: Shannon's entropy, lossless compression (Deflate) values, Kolmogorov complexity approximations and relative frequency order (algorithmic probability). A Mathematica API and an R module. Datasets available online at the Dataverse Network. Basic data analysis tool for short sequence comparison. [http://www.complexitycalculator.com]
- 18. Online Algorithmic Complexity Calculator 2 [http://www.complexitycalculator.com]
- 19. Simulation of natural systems with complex symbolic systems. An elementary cellular automaton (ECA) is defined by a local function $f : \{0, 1\}^3 \to \{0, 1\}$; $f$ maps the state of a cell and its two immediate neighbours (range = 1) to a new cell state: $f_t : (r_{-1}, r_0, r_{+1}) \to r_0$. Cells are updated synchronously according to $f$ over all cells in a row. [Wolfram, (1994)]
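The ECA definition translates directly into a few lines of code. The sketch below assumes Wolfram's rule numbering (bit $4 r_{-1} + 2 r_0 + r_{+1}$ of the rule number gives the new state) and periodic boundaries. The boundary choice and the names `eca_step` and `eca_run` are assumptions, since the slide does not specify them.

```python
from typing import List


def eca_step(cells: List[int], rule: int) -> List[int]:
    """Apply one synchronous update of an elementary CA to a row of 0/1 cells."""
    n = len(cells)
    new_row = []
    for i in range(n):
        # Neighbourhood (r_-1, r_0, r_+1) with periodic (wrap-around) boundaries.
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        # Bit `index` of the rule number is the value of f on this neighbourhood.
        new_row.append((rule >> index) & 1)
    return new_row


def eca_run(rule: int, width: int, steps: int) -> List[List[int]]:
    """Evolve a single black cell in the middle of a row for `steps` updates."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = eca_step(row, rule)
        rows.append(row)
    return rows


for r in eca_run(110, 31, 15):
    print("".join("#" if cell else "." for cell in r))
```

Changing the first argument of `eca_run` to another value between 0 and 255 selects a different rule, for example 4 for a rule that leaves a single cell unchanged.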
- 20. Behavioural classes of CA. Wolfram's classes of behaviour: Class I: Systems evolve into a stable state. Class II: Systems evolve in a periodic (e.g. fractal) state. Class III: Systems evolve into random-looking states. Class IV: Systems evolve into localised complex structures, e.g. Rule 110 or the Game of Life. [Wolfram, (1994)]
- 21. Block Decomposition method (BDM). The Block Decomposition method uses the Coding Theorem method. Formally, we will say that an object $c$ has complexity: $K^{d \times d}_{\log m,2D}(c) = \sum_{(r_u, n_u) \in c_{d \times d}} \left[ (n_u - 1) \log_2(K_{m,2D}(r_u)) + K_{m,2D}(r_u) \right]$ (2), where $c_{d \times d}$ represents the set with elements $(r_u, n_u)$, obtained from decomposing the object into blocks of $d \times d$ with boundary conditions. In each $(r_u, n_u)$ pair, $r_u$ is one such square and $n_u$ its multiplicity. [H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, (2012)]
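The structure of Equation (2) can be sketched in code under two loudly stated assumptions. First, the per-block complexity $K_{m,2D}$ is passed in as a hypothetical callable `block_k`, because the real values come from Coding Theorem method tables that are not reproduced here. Second, incomplete blocks at the right and bottom edges are simply dropped, which is only one possible boundary condition. The function names are illustrative.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Block = Tuple[Tuple[int, ...], ...]


def decompose(grid: List[List[int]], d: int) -> Counter:
    """Count non-overlapping d x d blocks; partial edge blocks are dropped (assumption)."""
    rows, cols = len(grid), len(grid[0])
    counts: Counter = Counter()
    for r in range(0, rows - d + 1, d):
        for col in range(0, cols - d + 1, d):
            block = tuple(tuple(grid[r + i][col:col + d]) for i in range(d))
            counts[block] += 1
    return counts


def bdm(grid: List[List[int]], d: int, block_k: Callable[[Block], float]) -> float:
    """Sum (n_u - 1) * log2(K(r_u)) + K(r_u) over distinct blocks r_u, as in Eq. (2).

    block_k is a hypothetical stand-in for a CTM lookup and must return values > 0.
    """
    total = 0.0
    for block, n_u in decompose(grid, d).items():
        k = block_k(block)
        total += (n_u - 1) * math.log2(k) + k
    return total


# Toy usage: every block is assigned a placeholder complexity of 2.0 bits.
uniform = [[0] * 4 for _ in range(4)]
print(bdm(uniform, 2, lambda block: 2.0))
```

Repeated blocks add only a logarithmic term rather than their full complexity, so a regular object scores much lower than one made of many distinct blocks.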
- 22. Classification of ECA by BDM versus lossless compression. Compressors have limitations (small sequences, time complexity). Applications to machine learning: problems of classification and clustering. BDM is computationally efficient (runs in $O(n^d)$ time, hence linear time ($d = 1$) for sequences). [H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, (2012)]
- 23. Asymptotic behaviour of complex systems [Zenil, Complex Systems (2010)]
- 24. Rule space of 3-symbol 1D CA [Zenil, Complex Systems (2011)]
- 25. Phase transition detection. Definition: $c_t^n = \dfrac{|C(M_t(i_1)) - C(M_t(i_2))| + \dots + |C(M_t(i_{n-1})) - C(M_t(i_n))|}{t(n-1)}$ [Zenil, Complex Systems (2011)]
- 26. A measure of programmability: $C_t^n(M) = \dfrac{\partial f(c_t^n)}{\partial t}$ (3) [Zenil, Complex Systems (2011)]
- 27. Examples. Figure: ECA Rule 4 has a low $C_t^n$ for randomly chosen $n$ and $t$ (it doesn't react much to external stimuli). $\lim_{n,t \to \infty} C_t^n(R4) = 0$ [H. Zenil, Philosophy & Technology, (2013)]
- 28. Examples (cont.) Figure: ECA R110 has a large $C_t^n$ coefficient for sensible choices of $t$ and $n$, which is compatible with the fact that it has been proven to be capable of universal computation (for particular semi-periodic initial configurations). $\lim_{n,t \to \infty} C_t^n(R110) = 1$
- 29. Classification of graphs [Zenil, Soler, Dingle, Graph Automorphism Estimation and Complex Network Topological Characterization by Algorithmic Randomness]
- 30. Characterisation of complex networks. Complex networks with preferential attachment algorithms preserve properties invariant under network size (connectedness, robustness) at a low cost (unlike random nets, which are costly in the number of links). [Zenil, Soler, Dingle, Graph Automorphism Estimation and Complex Network Topological Characterization by Algorithmic Randomness]
- 31. Biological case study: programmable porphyrin molecules. Much is known about the dynamics of these molecules, so one can perform Monte Carlo simulations based on these mathematical models and establish a correspondence between Wang tiles and simple molecules. [joint work with ICOS, U. of Nottingham] [G. Terrazas, H. Zenil and N. Krasnogor, Exploring Programmable Self-Assembly in Non DNA-based Molecular Computing]
- 32. Quantitative dynamics of living systems. Aggregations with similar Kolmogorov complexity cluster in similar configurations. [G. Terrazas, H. Zenil and N. Krasnogor, Exploring Programmable Self-Assembly in Non DNA-based Molecular Computing]
- 33. Mapping output behaviour to external stimuli: parameter discovery. Parameter Space P → Target Space T. Target space T: set a configuration from P that triggers the desired behaviour in T. To investigate: reduction of the parameter space; characterisation of the target space. [G. Terrazas, H. Zenil and N. Krasnogor, Exploring Programmable Self-Assembly in Non DNA-based Molecular Computing]
- 34. Robustness and pervasiveness. Concentration changes preserving behaviour: output parameters that have the highest impact can be tested in silico before experiments in materio. [G. Terrazas, H. Zenil and N. Krasnogor, Exploring Programmable Self-Assembly in Non DNA-based Molecular Computing]
- 35. Orthogonality. Specific concentrations producing certain behaviour, using the mathematical model to be tested against empirical data.
- 36. Highlights and goals. Ultimate goal (in a few years' time): an information-theoretical toolbox for systems and synthetic biology [Complex3D Proteins Database (graph representation) & Z Chen et al. Lung cancer pathways in response to treatments.] Pushing boundaries. A cutting-edge mathematical approach. Tools from complexity theory.
- 37. New Generation Sequence data analysis. Heavily driven by: explosion of experimental data; difficulties in data interpretation; new paradigms for knowledge extraction; data mining the behaviour of natural systems. Towards an AIT tool-kit for systems biology, a functional library of programmable biological modules with an SBML interface.
- 38. J.-P. Delahaye and H. Zenil, On the Kolmogorov-Chaitin complexity for short sequences, in Cristian Calude (ed.), Complexity and Randomness: From Leibniz to Chaitin, World Scientific, 2007. J.-P. Delahaye and H. Zenil, Numerical Evaluation of the Complexity of Short Strings, Applied Mathematics and Computation, 2011. H. Zenil, F. Soler-Toscano, J.-P. Delahaye and N. Gauvrit, Two-Dimensional Kolmogorov Complexity and Validation of the Coding Theorem Method by Compressibility, arXiv:1212.6745 [cs.CC]. F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Correspondence and Independence of Numerical Evaluations of Algorithmic Information Measures, Numerical Algorithms (in 2nd revision). F. Soler-Toscano, H. Zenil, J.-P. Delahaye and N. Gauvrit, Calculating Kolmogorov Complexity from the Frequency Output Distributions of Small Turing Machines, arXiv:1211.1302 [cs.IT]. H. Zenil, Compression-based Investigation of the Dynamical Properties of Cellular Automata and Other Systems, Complex Systems, Vol. 19, No. 1, pages 1-28, 2010.
- 39. H. Zenil and J.A.R. Marshall, Some Aspects of Computation Essential to Evolution and Life, Ubiquity, 2012. H. Zenil, What is Nature-like Computation? A Behavioural Approach and a Notion of Programmability, Philosophy & Technology (special issue on History and Philosophy of Computing), 2013. H. Zenil, On the Dynamic Qualitative Behavior of Universal Computation, Complex Systems, vol. 20, No. 3, pp. 265-278, 2012. H. Zenil, A Turing Test-Inspired Approach to Natural Computation, in G. Primiero and L. De Mol (eds.), Turing in Context II (Brussels, 10-12 October 2012), Historical and Contemporary Research in Logic, Computing Machinery and Artificial Intelligence, Proceedings published by the Royal Flemish Academy of Belgium for Science and Arts, 2013. G.J. Chaitin, A Theory of Program Size Formally Identical to Information Theory, J. Assoc. Comput. Mach. 22, 329-340, 1975. A. N. Kolmogorov, Three approaches to the quantitative definition of information, Problems of Information Transmission, 1(1):1–7, 1965.
- 40. L. Levin, Laws of information conservation (non-growth) and aspects of the foundation of probability theory, Problems of Information Transmission, 10(3):206–210, 1974. M. Li, P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 3rd ed., 2008. R.J. Solomonoff, A formal theory of inductive inference: Parts 1 and 2, Information and Control, 7:1–22 and 224–254, 1964. S. Wolfram, A New Kind of Science, Wolfram Media, 2002.
