
Towards a General Theory of Intelligence - April 2018

Towards AGI - overview, updates and developments

Published in: Technology
  • @Fabian J. G. Westerheide - Hi Fabian, would love to. Drop me a line at pmorgan@deeplp.com
  • Dear Peter, thank you for sharing those slides. I like the technical deep-dive, since many AGI talks are high-level. Would you enjoy talking about the roadmap towards AGI at our next Rise of AI conference May 2019 in Berlin? It is an annual gathering for 800 AI researchers, investors, entrepreneurs and companies in Berlin. www.riseof.ai
  • Errata: Slide 87, Deep RL Directions. Replace "Geometric Deep Learning http://geometricdeeplearning.com • Gary Marcus" with, "Deep Reinforcement Learning Symposium, NIPS 2017, https://sites.google.com/view/deeprl-symposium-nips2017/home".

Towards a General Theory of Intelligence - April 2018

  1. 1. London Deep Learning Lab Meetup – April 19, 2018 © Peter Morgan, April 2018 https://www.meetup.com/Deep-Learning-Lab/
  2. 2. Towards a General Theory of Intelligence Peter Morgan www.deeplp.com
  3. 3. Thanks to our Sponsors Wizebit © Peter Morgan, April 2018
  4. 4. Upcoming Conferences © Peter Morgan, April 2018
  5. 5. © Peter Morgan, April 2018
  6. 6. London 9-11 October © Peter Morgan, April 2018
  7. 7. © Peter Morgan, April 2018
  8. 8. Announcements • TensorFlow Dev Summit March 30, 2018 • Summary of TF developments over the last year • Held in Mountain View CA • https://www.youtube.com/watch?v=bUjMAzCgk2A&list=PLQY2H8rRoyvxjVx3zfw4vA4cvlKogyLNN • Coincided with release 1.7 • 11 million downloads so far • Many highlights – check it out. © Peter Morgan, April 2018
  9. 9. Announcements • HOUSE OF LORDS Select Committee on Artificial Intelligence releases AI Report on 16 April: “AI in the UK: ready, willing and able?” • https://www.parliament.uk/business/committees/committees-a-z/lords-select/ai-committee/news-parliament-2017/ai-report-published/ • The Select Committee on Artificial Intelligence was appointed by the House of Lords on 29 June 2017 “to consider the economic, ethical and social implications of advances in artificial intelligence” • “Our inquiry has concluded that the UK is in a strong position to be among the world leaders in the development of artificial intelligence during the twenty-first century”. © Peter Morgan, April 2018
  10. 10. Announcements (due to be published by end of April) © Peter Morgan, April 2018
  11. 11. Outline of Talk • Physical Systems • Biological • Non-biological • Deep Learning • Description • CNN, RNN, LSTM, GAN • Reinforcement Learning • Latest Research in DL • Other (Better) Theories? • Overview • Comparisons • AGI • Conclusions © Peter Morgan, April 2018
  12. 12. Motivation • Solve (general) intelligence • Use it to solve everything else • Medicine • Cancer • Brain disease (Alzheimer's, etc.) • Longevity • Physics • Maths • Materials science • Social © Peter Morgan, April 2018
  13. 13. The Big Picture - a ToE? Physics Computer Science Neuroscience © Peter Morgan, April 2018
  14. 14. Physical Systems • Biological • Plants, bacteria, insects, reptiles, mammals, biological brains • Non-biological • CPU - Intel Xeon SP, AMD Ryzen, Qualcomm, IBM PowerPC, ARM • GPU - Nvidia (Volta), AMD (Vega) • FPGA - Intel (Altera), Xilinx, etc. • ASIC - Google TPU, Graphcore IPU, Intel Nervana, Wave, … • Neuromorphic - Human Brain Project (SpiNNaker, BrainScaleS); IBM TrueNorth; Intel Loihi, … • Quantum • IBM, Microsoft, Intel, Google, D-Wave, Rigetti, … • Quantum biology? (photosynthesis, navigation, …) • QuantumML, Quantum Intelligence © Peter Morgan, April 2018
  15. 15. Types of Physical Computation Systems* *Can we find a theory that unifies them all (classical, quantum, biological, non-biological)? Digital Neuromorphic Quantum Biological © Peter Morgan, April 2018
  16. 16. Biology © Peter Morgan, April 2018
  17. 17. Biological Systems are Hierarchical © Peter Morgan, April 2018
  18. 18. Biological Neuron Microstructure © Peter Morgan, April 2018
  19. 19. Biological Neuron © Peter Morgan, April 2018
  20. 20. Hand-drawn neuron types, from “Structure of the Mammalian Retina”, c. 1900, by Santiago Ramón y Cajal © Peter Morgan, April 2018
  21. 21. Neuron - scanning electron microscope © Peter Morgan, April 2018
  22. 22. © Peter Morgan, April 2018
  23. 23. Cortical columns in the cortex © Peter Morgan, April 2018
  24. 24. © Peter Morgan, April 2018
  25. 25. Human Connectome © Peter Morgan, April 2018
  26. 26. Central Nervous System (CNS) © Peter Morgan, April 2018
  27. 27. Social Systems © Peter Morgan, April 2018
  28. 28. A Comparison of Neuron Models © Peter Morgan, April 2018
  29. 29. Non-biological Hardware • Digital • CPU • GPU • FPGA • ASIC • Neuromorphic • Various architectures • SpiNNaker, BrainScaleS, … • Quantum • Different qubits • Anyons, superconducting, photonic, … © Peter Morgan, April 2018
  30. 30. Digital Computing • Abacus • Charles Babbage • Ada Lovelace • Vacuum tubes (valves) • Turing • Von Neumann • ENIAC • Transistor (Bardeen, Brattain, Shockley, 1947) • Intel • ARM • Nvidia © Peter Morgan, April 2018
  31. 31. © Peter Morgan, April 2018
  32. 32. Cray-1 1976 160 MFlops © Peter Morgan, April 2018
  33. 33. CPU – Intel Xeon Up to 18 cores, ~1 TFlops © Peter Morgan, April 2018
  34. 34. GPU – Nvidia Volta V100 21 billion transistors, 120 TFlops © Peter Morgan, April 2018
  35. 35. DGX-2 - released 27 Mar 2018 16 V100s, 2 PFlops, 30 TB storage ($400k) © Peter Morgan, April 2018
  36. 36. ASIC - Google TPU v2 180 TFlops © Peter Morgan, April 2018
  37. 37. © Peter Morgan, April 2018
  38. 38. ASIC - Graphcore IPU © Peter Morgan, April 2018 >200 TFlops
  39. 39. Graph computations – Graphcore (ResNet-50) © Peter Morgan, April 2018
  40. 40. TPU Pod 64 2nd-gen TPUs, 11.5 PetaFlops, 4 Terabytes of memory, Cloud TPUs © Peter Morgan, April 2018
  41. 41. HPC – what’s next? Currently 100 PFlops; by 2020 - Exascale © Peter Morgan, April 2018
  42. 42. Processor Performance (MFlops) More specific → © Peter Morgan, April 2018
  43. 43. End to End Hardware Example © Peter Morgan, April 2018
  44. 44. Neuromorphic Computing • Biologically inspired • First proposed by Carver Mead, Caltech, 1980’s • Uses analogue signals – spiking neural networks (SNN) • SpiNNaker (Manchester, HBP, Furber) • BrainScaleS (Heidelberg, HBP, Schemmel) • TrueNorth (IBM, Modha) • Intel Loihi • Startups (Knowm, Spaun, etc.) • Up to 1 million cores, 1 billion “neurons” (mouse) • Need to scale 100X → human brain • Relatively low power • Available on the (HBP) cloud today © Peter Morgan, April 2018
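
A minimal sketch of the spiking-neuron abstraction these platforms implement: a leaky integrate-and-fire (LIF) neuron in plain Python/NumPy. The constants are illustrative defaults, not parameters of SpiNNaker, BrainScaleS or Loihi.

```python
import numpy as np

# Minimal leaky integrate-and-fire (LIF) neuron, the abstraction behind many
# spiking/neuromorphic systems. Parameter values are illustrative only.
dt, T = 1e-3, 0.5                       # time step (s), total duration (s)
tau_m, v_rest, v_thresh, v_reset = 20e-3, -70e-3, -54e-3, -70e-3
R_m, I_ext = 10e6, 1.8e-9               # membrane resistance (ohm), input current (A)

v = v_rest
spike_times = []
for step in range(int(T / dt)):
    v += (-(v - v_rest) + R_m * I_ext) / tau_m * dt   # leaky integration
    if v >= v_thresh:                                  # threshold crossing -> spike
        spike_times.append(step * dt)
        v = v_reset                                    # reset after the spike
print(len(spike_times), "spikes in", T, "s")
```
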
  45. 45. SpiNNaker Neuromorphic Computer © Peter Morgan, April 2018
  46. 46. Neuromorphic vs von Neumann © Peter Morgan, April 2018
  47. 47. TrueNorth Performance © Peter Morgan, April 2018
  48. 48. © Peter Morgan, April 2018
  49. 49. Neuromorphic v ASIC Analogue v Digital © Peter Morgan, April 2018
  50. 50. Quantum Computing • First proposed by Richard Feynman, Caltech, 1980’s • Qubits – basis states |0⟩ and |1⟩ plus their superpositions (QM) • (Nature is) fundamentally probabilistic at atomic scale • Have to be kept cold (millikelvin) to avoid noise/decoherence • Building is an engineering problem (theory is known) • Several approaches - superconductors, trapped ions, semiconductors, topological structures • Several initiatives (with access available) • Microsoft, IBM, Google, Intel, D-Wave, Rigetti, etc. • Can log in today • Many applications – optimization, cryptography, drug discovery, etc. © Peter Morgan, April 2018
  51. 51. IBM 50 Qubit Quantum Computer © Peter Morgan, April 2018
  52. 52. © Peter Morgan, April 2018
  53. 53. Quantum Logic Gates © Peter Morgan, April 2018
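
To make the gate picture concrete, here is a hedged NumPy sketch (not tied to any vendor's SDK) of a single quantum logic gate: a Hadamard applied to |0⟩, producing an equal superposition.

```python
import numpy as np

# A quantum logic gate is a unitary matrix acting on a qubit state vector.
# Sketch: apply a Hadamard gate to |0> to create an equal superposition.
H = np.array([[1, 1],
              [1, -1]]) / np.sqrt(2)    # Hadamard gate
ket0 = np.array([1.0, 0.0])             # |0> in the computational basis

psi = H @ ket0                           # |+> = (|0> + |1>) / sqrt(2)
probs = np.abs(psi) ** 2                 # Born rule: measurement probabilities
print(psi, probs)                        # ~[0.707 0.707], [0.5 0.5]
```
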
  54. 54. Summary – Now have three non-biological stacks Algorithms Distributed Layer OS Hardware Digital Neuromorphic Quantum © Peter Morgan, April 2018
  55. 55. Outline • Physical Systems • Biological • Non-biological • Deep Learning • Description • CNN, RNN, LSTM, GAN • Reinforcement Learning • Latest Research in DL • Other (Better) Theories? • Overview • Comparisons • AGI • Conclusions © Peter Morgan, April 2018
  56. 56. Deep Learning • Artificial Neural Networks (ANNs) • Universal Approximation Theorem • Computation graph • Hyperparameters • AutoML • Optimization • CNN • RNN (LSTM) © Peter Morgan, April 2018
  57. 57. Deep Learning (cont.) • GAN • Different Models • AlexNet, VGG, ResNet, Inception • SqueezeNet, MobileNet • DL Frameworks • TensorFlow • MXNet, CNTK, PyTorch • Training data sets • Text, speech, images, video, time series © Peter Morgan, April 2018
  58. 58. Early papers © Peter Morgan, April 2018
  59. 59. Nodes and Layers © Peter Morgan, April 2018
  60. 60. © Peter Morgan, April 2018
  61. 61. More Neural Networks (“Neural Network Zoo”) © Peter Morgan, April 2018
  62. 62. Computation in each node © Peter Morgan, April 2018
  63. 63. Universal Approximation Theorem • A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of Rn, under mild assumptions on the activation function • We can define an approximate realization of f(x) as F(x) = Σᵢ vᵢ φ(wᵢᵀx + bᵢ) • One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions • Kurt Hornik showed in 1991 that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture, which gives neural networks the potential of being universal approximators • Cybenko, G. (1989) "Approximations by superpositions of sigmoidal functions", Mathematics of Control, Signals, and Systems, 2 (4), 303-314 • Kurt Hornik (1991) "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, 4(2), 251–257 © Peter Morgan, April 2018
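
A small numerical illustration of the theorem (an addition, not from the slides): a single hidden layer of sigmoid units, F(x) = Σᵢ vᵢ φ(wᵢx + bᵢ), fit to sin(x). The hidden weights are random and only the output weights are solved for, which is enough to show the approximation at work, though it is not how such networks are normally trained.

```python
import numpy as np

# Single hidden layer of sigmoid units approximating f(x) = sin(x).
np.random.seed(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

n_hidden = 50
W = 2.0 * np.random.randn(1, n_hidden)          # hidden weights w_i
b = 2.0 * np.random.randn(n_hidden)             # hidden biases b_i
phi = 1.0 / (1.0 + np.exp(-(x @ W + b)))        # sigmoid activations

v, *_ = np.linalg.lstsq(phi, y, rcond=None)     # output weights v_i (least squares)
y_hat = phi @ v
print("max abs error:", np.max(np.abs(y_hat - y)))
```
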
  64. 64. Computation Graph https://www.tensorflow.org/programmers_guide/graph_viz© Peter Morgan, April 2018
  65. 65. Hyperparameters • Activation function • Loss (cost) function • Learning rate • Initialization • Batch normalization • Automation • Hyperparameter tuning • AutoML • https://research.googleblog.com/2018/03/using-machine-learning-to-discover.html © Peter Morgan, April 2018
  66. 66. Optimizations • Initializers Constant, Uniform, Gaussian, Glorot Uniform, Xavier, Kaiming, IdentityInit, Orthonormal • Optimizers Gradient Descent with Momentum, RMSProp, Adadelta, Adam, Adagrad, MultiOptimizer • Activations Rectified Linear, Softmax, Tanh, Logistic, Identity, ExpLin • Layers Linear, Convolution, Pooling, Deconvolution, Dropout, Recurrent, Long Short-Term Memory, Gated Recurrent Unit, BatchNorm, LookupTable, Local Response Normalization, Bidirectional-RNN, Bidirectional-LSTM • Cost functions Binary Cross Entropy, Multiclass Cross Entropy, Sum of Squares Error © Peter Morgan, April 2018
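
A hedged sketch, assuming the tf.keras API of the TF 1.x era (the slide's lists are framework-agnostic), showing where these choices plug in: initializer, activation, layers, optimizer with learning rate, and cost function.

```python
from tensorflow import keras

# Where the listed choices appear in a tf.keras model definition (illustrative).
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu",                 # activation
                       kernel_initializer="glorot_uniform",    # Glorot/Xavier init
                       input_shape=(784,)),
    keras.layers.Dropout(0.5),                                 # dropout layer
    keras.layers.Dense(10, activation="softmax"),              # output layer
])
model.compile(optimizer=keras.optimizers.Adam(lr=1e-3),        # optimizer + learning rate
              loss="categorical_crossentropy",                 # cost function
              metrics=["accuracy"])
model.summary()
```
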
  67. 67. Deep Learning Performance Image classification © Peter Morgan, April 2018
  68. 68. Deep Learning Performance ImageNet Error rate is now around 2.2%, less than half that of average humans © Peter Morgan, April 2018
  69. 69. Convolutional Neural Networks • First developed in the 1970s • Widely used for image recognition and classification • Inspired by biological processes, CNNs are a type of feed-forward ANN • The individual neurons are tiled in such a way that they respond to overlapping regions in the visual field • Yann LeCun – Bell Labs, 1990s © Peter Morgan, April 2018
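
A minimal tf.keras sketch of such a network, assuming 28x28 grayscale inputs (MNIST-sized); the layer sizes are illustrative, not taken from the talk.

```python
from tensorflow import keras

# Small CNN image classifier: convolution + pooling stacks feed a dense classifier.
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),            # pool over local image regions
    keras.layers.Conv2D(64, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"), # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```
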
  70. 70. Recurrent Neural Networks • First developed in the 1970s • RNNs are neural networks that are used to predict the next element in a sequence or time series • This could be, for example, words in a sentence or letters in a word • Applications include predicting or generating music, stories, news, code, financial instrument pricing, text, speech – in fact the next element in any event stream © Peter Morgan, April 2018
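
A hedged tf.keras sketch of the "predict the next element" setup: an LSTM reads a window of previous characters and emits a distribution over the next one. The vocabulary and window sizes here are made up for illustration.

```python
from tensorflow import keras

# Character-level next-element predictor (illustrative sizes).
vocab_size, seq_len = 128, 40
model = keras.Sequential([
    keras.layers.Embedding(vocab_size, 64, input_length=seq_len),
    keras.layers.LSTM(256),                                   # recurrent state over the sequence
    keras.layers.Dense(vocab_size, activation="softmax"),     # distribution over next character
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```
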
  71. 71. GANs • Generative Adversarial Networks – introduced by Ian Goodfellow et al in 2014 (see references) • A class of artificial intelligence algorithms used in unsupervised deep learning • A theory of adversarial examples, resembling what we have for normal supervised learning • Implemented by a system of two neural networks, a discriminator D and a generator G • D and G contest with each other in a zero-sum game framework • The generator generates candidates (e.g., synthetic images) and the discriminator evaluates them © Peter Morgan, April 2018
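
The zero-sum game can be written as the minimax value function from Goodfellow et al. (2014):

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\!\big(1 - D(G(z))\big)\big]
```

The discriminator D is trained to tell real data from generated samples, while the generator G is trained to make that discrimination as hard as possible.
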
  72. 72. Stacked Generative Adversarial Networks https://arxiv.org/abs/1612.04357v1© Peter Morgan, April 2018
  73. 73. Collection Style Transfer © Peter Morgan, April 2018
  74. 74. Season Transfer © Peter Morgan, April 2018
  75. 75. Models AlexNet (Toronto) VGG (Oxford) ResNet (Microsoft) Inception (Google) DenseNet (Cornell) SqueezeNet (Berkeley) MobileNet (Google) NASNet (Google) © Peter Morgan, April 2018
  76. 76. Deep Learning Frameworks © Peter Morgan, April 2018
  77. 77. Top 20 ML/DL Frameworks, KDnuggets, Feb 2018 https://www.kdnuggets.com/2018/02/top-20-python-ai-machine-learning-open-source-projects.html (chart legend: * Deep Learning, o Machine Learning; labels include MXNet, CNTK) © Peter Morgan, April 2018
  78. 78. TensorFlow • TensorFlow is the open sourced deep learning library from Google (Nov 2015) • It is their second generation system for the implementation and deployment of large-scale machine learning models • Written in C++ with a python interface, originated from research and deploying machine learning projects throughout a wide range of Google products and services • Initially TF ran only on a single node (your laptop, say), but now runs on distributed clusters • Available across all the major cloud providers (TFaaS) • Second most popular framework on GitHub • Close to 100,000 stars as of March 2018 • https://www.tensorflow.org/ © Peter Morgan, April 2018
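
For readers new to TensorFlow's dataflow-graph model, a minimal sketch in the TF 1.x style current at the time of the talk (in later 2.x releases this API lives under tf.compat.v1): build a computation graph, then execute it in a session.

```python
import tensorflow as tf

# TF 1.x style: define the graph first, run it in a session afterwards.
a = tf.placeholder(tf.float32, shape=(None, 3))   # graph input
w = tf.Variable(tf.ones((3, 1)))                  # trainable parameter node
y = tf.matmul(a, w)                               # op added to the default graph

with tf.Session() as sess:                        # execution happens here
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={a: [[1., 2., 3.]]}))   # -> [[6.]]
```
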
  79. 79. TensorFlow supports many platforms Raspberry Pi, Android, iOS, 1st-gen TPU, Cloud TPU, GPU, CPU © Peter Morgan, April 2018
  80. 80. Growth of Deep Learning at Google, and many more … (directories containing model description files) © Peter Morgan, April 2018
  81. 81. TensorFlow Popularity © Peter Morgan, April 2018
  82. 82. Other Frameworks • CNTK (Microsoft) • MXNet (Amazon) • Keras (Open source community) • PyTorch (Facebook) • Caffe (Berkeley) • Neon (Intel) • Chainer (Preferred Networks) © Peter Morgan, April 2018
  83. 83. Data Sets • Text, speech, images, video, time series • Examples of recorded data sets include MNIST and Labeled Faces in the Wild (LFW) © Peter Morgan, April 2018
  84. 84. Other Data Sets • Images: CIFAR-10, ImageNet, PASCAL VOC, Mini-Places2, Food 101 • Text: IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize • Video: UCF101, Kinetics, YouTube-8M, CMU mocap • Others: flickr8k, flickr30k, COCO • List of data sets for machine learning https://en.wikipedia.org/wiki/List_of_datasets_for_machine_learning_research © Peter Morgan, April 2018
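
As a concrete example of working with one of these sets, a hedged tf.keras snippet that downloads MNIST and rescales it; the other datasets listed above have similar loaders or are fetched from the linked sources.

```python
from tensorflow import keras

# Load MNIST (downloads on first use) and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
print(x_train.shape, y_train.shape)   # (60000, 28, 28) (60000,)
```
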
  85. 85. Open Source • ML Frameworks – open source (e.g., TensorFlow) • Operating systems – open source (Linux) • Hardware – open source (OCP = Open Compute Project) • Data sets – open source (see previous slide) • Research – open source (see arXiv) • The fourth industrial revolution will be open source © Peter Morgan, April 2018
  86. 86. Reinforcement Learning • TD Learning • DQN • Latest research • NIPS Workshop Dec 2017 • http://metalearning-symposium.ml © Peter Morgan, April 2018
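
A tiny sketch of the TD learning idea mentioned on this slide, using a classic 5-state random walk rather than anything from the talk; DQN layers function approximation and experience replay on top of exactly this kind of bootstrapped update.

```python
import numpy as np

# Tabular TD(0) on a 5-state random walk: states 1..5, terminals 0 and 6,
# reward 1 only on reaching the right-hand terminal.
n_states, alpha, gamma = 5, 0.1, 1.0
V = np.zeros(n_states + 2)                    # value estimates incl. terminals

for episode in range(500):
    s = 3                                     # start in the middle
    while s not in (0, n_states + 1):
        s_next = s + np.random.choice([-1, 1])
        r = 1.0 if s_next == n_states + 1 else 0.0
        # TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next
print(np.round(V[1:-1], 2))                   # approaches [1/6 ... 5/6]
```
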
  87. 87. RL Research Directions • Graphcore https://www.graphcore.ai/posts/directions-of-ai-research • Bristol ASIC • Deep Reinforcement Learning Symposium, NIPS 2017 https://sites.google.com/view/deeprl-symposium-nips2017/home • Berkeley (BAIR) http://bair.berkeley.edu • Pieter Abbeel • Sergey Levine • Deepmind https://deepmind.com • IMPALA (DMLab) https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/ • OpenAI https://openai.com • Research white papers © Peter Morgan, April 2018
  88. 88. Outline • Physical Systems • Biological • Non-biological • Deep Learning • Description • CNN, RNN, LSTM, GAN • Reinforcement Learning • Latest Research in DL • Other (Better) Theories? • Overview • Comparisons • AGI • Conclusions © Peter Morgan, April 2018
  89. 89. Other Theories of Intelligence • What do we need? • Active Inference • Gauge theories • Other approaches • Applications • Building AGI © Peter Morgan, April 2018
  90. 90. What do we need to build AGI? A Principle of Principles? • Free Energy Principle • Systems act to minimize their expected free energy • Reduce uncertainty (or surprisal) • F = Complexity – Accuracy • Prediction error = expected outcome – actual outcome = surprise • Theory of Everything (ToE) • In physics - try to unify gravity and quantum mechanics → call this a ToE • But actually Active Inference is more encompassing than even this • It encompasses all interactions and dynamics (physical phenomena) • Over all time scales • Over all distance scales • Also see Constructor Theory • David Deutsch (Oxford) © Peter Morgan, April 2018
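
One standard way to write the slide's "F = Complexity – Accuracy" split (the usual variational free-energy bound, added here for reference rather than taken from the slides):

```latex
F = \underbrace{D_{\mathrm{KL}}\big[\,Q(s)\,\Vert\,P(s)\,\big]}_{\text{complexity}}
  \;-\; \underbrace{\mathbb{E}_{Q(s)}\big[\ln P(o \mid s)\big]}_{\text{accuracy}}
  \;\;\geq\;\; -\ln P(o) \;=\; \text{surprise}
```

Minimising F therefore minimises an upper bound on surprise, which is the sense in which systems "reduce uncertainty".
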
  91. 91. So what are the principles? Hint: we already have them Newtonian mechanics – three laws Special relativity – invariance of laws under a Lorentz transformation GR – Principle of Equivalence Electromagnetism – Maxwell’s equations Thermodynamics – three laws Quantum mechanics – uncertainty principle Relativistic QM – Dirac equation Dark energy/dark matter – we don’t know yet All of the above = Principle of Least Action © Peter Morgan, April 2018
  92. 92. Analogy – Einstein’s General Theory of Relativity • Made some very general (and insightful) assumptions about the laws of physics in a gravitational field (non-inertial frames) • Equivalence principle • Covariance of laws of physics • Generalised coordinate system – Riemannian geometry • Spacetime is curved • Standing on the shoulders of giants • After ten years of hard work he finally wrote down his now famous field equations © Peter Morgan, April 2018
  93. 93. All known physics – Field theoretic © Peter Morgan, April 2018
  94. 94. Active Inference – Information theoretic (uses generalised free energy) • Perceptual inference: Q(s) = argmin_Q F, with F(π,τ) = E_Q[ln Q(s_τ|π) − ln P(o_τ, s_τ|π)] (energy minus entropy) • Policy selection: π = argmin_π E_Q[Σ_τ G(π,τ)] • Expected free energy: G(π,τ) = E_Q[ln Q(o_τ, s_τ|π) − ln P(o_τ, s_τ)], which decomposes into expected cost D[Q(s_τ|π) || P(s_τ)] minus epistemic value (the mutual information between outcomes and hidden states under Q) • Generalised free energy – with some care: future outcomes (τ > t) are treated as latent variables under Q, past and present outcomes (τ ≤ t) as observed data © Peter Morgan, April 2018
  95. 95. Active Inference Karl Friston - UCL © Peter Morgan, April 2018
  96. 96. Expected surprise and free energy • Discrete formulation: π = argmin_π G(π), where G(π,τ) = E_Q[ln Q(s_τ|π) − ln P(o_τ, s_τ)] = D[Q(s_τ|π) || P(s_τ)] (expected cost) + E_Q[H[P(o_τ|s_τ)]] (expected ambiguity) • Dynamic formulation: action minimises expected free energy accumulated over time, with active states descending the free-energy gradient (a ∝ −∇_a F) and external and sensory states evolving as ψ̇ = f_ψ(·) + ω and s = f_s(·) + ω • Schematic maps these quantities onto prefrontal cortex, VTA/SN, motor cortex, occipital cortex, striatum and hippocampus © Peter Morgan, April 2018
  97. 97. What is free-energy? Free-energy is basically prediction error where small errors mean low surprise General Principle – Systems act to minimize uncertainty (their expected free energy) sensations – predictions = prediction error © Peter Morgan, April 2018
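
A toy numerical sketch of "reducing prediction error by changing predictions" (perception), assuming a made-up linear generative model g(μ) = 2μ and a single scalar sensation; nothing here comes from the slides beyond the idea itself.

```python
import numpy as np

# Gradient descent on squared prediction error: the internal estimate mu is
# updated until its prediction g(mu) = 2*mu matches the observed sensation.
obs = 3.0                # actual sensation
mu = 0.0                 # internal estimate of the hidden cause
lr = 0.05                # step size

for step in range(100):
    prediction = 2.0 * mu              # what the model expects to sense
    error = obs - prediction           # sensations - predictions
    mu += lr * 2.0 * error             # reduce error by changing predictions
print(mu, 2.0 * mu)                    # mu -> 1.5, prediction -> 3.0
```
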
  98. 98. The Markov blanket of cells to brains • External states: ψ̇ = f_ψ(s, a, ψ) + ω_ψ • Sensory states: ṡ = f_s(s, a, ψ) + ω_s • Active states: ȧ ≈ f_a(s, a, μ) • Internal states: μ̇ ≈ f_μ(s, a, μ) • The same Markov-blanket partition applies at every scale, from single cells to brains © Peter Morgan, April 2018
  99. 99. But what about the Markov blanket? • The blanket states are s̃ = (s, a, μ); internal states (perception) and active states (action) both perform a gradient flow on log model evidence ln p(s̃ | m) – equivalently, a gradient descent on free energy F • The same quantity appears under different names in different fields: Value – reinforcement learning, optimal control and expected utility theory (Pavlov); Surprise – infomax, minimum redundancy and the free-energy principle (Barlow); Entropy – self-organisation, synergetics and homoeostasis (Haken); Model evidence – Bayesian brain, evidence accumulation and predictive coding (Helmholtz) © Peter Morgan, April 2018
  100. 100. Application © Peter Morgan, April 2018
  101. 101. Summary • Biological agents resist the second law of thermodynamics • They must minimize their average surprise (entropy) • They minimize surprise by suppressing prediction error (free-energy) • Prediction error can be reduced by changing predictions (perception) • Prediction error can be reduced by changing sensations (action) • Perception entails recurrent message passing in the brain to optimise predictions • Action makes predictions come true (and minimises surprise) • Simulation examples: perception (birdsong and categorization, simulated lesions), action (active inference, goal-directed reaching), policies (control and attractors, the mountain-car problem) © Peter Morgan, April 2018
  102. 102. Techniques from Maths and Physics • We’ve already been here before • We use various mathematical techniques to describe physical phenomena • Maths: higher dimensions, group theory, transformations, symmetries, path integrals, variational calculus, gauge theories, topology, vector spaces, category theory, algebraic geometry, … • Physics: special relativity, general relativity, QM, QFT, QED, standard model, particle physics, statistical physics, information theory, classical physics, EM, gravitation, string theory, unification theory, … • Apply above tools to the brain – after all the brain is a (hierarchical) physical system • For example – mirror symmetry • Transform to another mathematical space where the calculation is more easily performed, then transform back (“duality”) © Peter Morgan, April 2018
  103. 103. Gauge Theories • Invariance of laws under transformations – Gauge theories • Give rise to conservation laws • Noether’s theorem: invariance under a transformation implies a conserved quantity – space → momentum; time → energy; rotation → angular momentum • Examples: • Neuronal gauge theory - many aspects of neurobiology can be seen as consequences of fundamental invariance properties • See references section © Peter Morgan, April 2018
  104. 104. Types of Intelligence © Peter Morgan, April 2018
  105. 105. Comparisons - ANN vs BNN • Neural circuits in the brain develop via synaptic pruning, a process by which connections are overproduced and then eliminated over time • In contrast, computer scientists typically design networks by starting with an initially sparse topology and gradually adding connections • AI (specific) vs AGI (general) • Yann LeCun – CNNs at Bell Labs in the 1980s/90s – “mathematical, not biological” • Gone as far as we can with ”just” mathematics • Now almost every researcher looking to biology for inspiration • Costa et al, 2018, etc. (see “Bio-plausible Deep Learning” in reference section) © Peter Morgan, April 2018
  106. 106. © Peter Morgan, April 2018
  107. 107. © Peter Morgan, April 2018
  108. 108. © Peter Morgan, April 2018
  109. 109. © Peter Morgan, April 2018
  110. 110. Approaches • Helmholtz (late 1800s) • Friston – Active Inference • Tishby – Information bottleneck • Bialek – Biophysics • Hutter – AIXI • Schmidhuber – Gödel Machine • Etc. © Peter Morgan, April 2018
  111. 111. Key Concepts • Bayesian inference • Predictive coding • Generative models • Cortical organization • Perception • Action • Learning • Decision making • Affect • Computational psychiatry © Peter Morgan, April 2018
  112. 112. Probabilistic Programming • A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models and then perform inference in those models • Define a probability model on a programme • Closely related to graphical models and Bayesian networks, but are more expressive and flexible. Probabilistic programming represents an attempt to unify general purpose programming with probabilistic modeling • Languages include Edward, Church, Anglican, Pyro, PyMC, MetaProb, Gen, Stan, Turing.jl, Infer.NET • Introducing TensorFlow Probability https://medium.com/tensorflow/introducing- tensorflow-probability-dca4c304e245 • Announced at TF Dev Summit, March 30, 2018 (see next slide) © Peter Morgan, April 2018
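
To illustrate what "define a probability model and perform inference" means in practice, here is a hand-rolled Metropolis sampler for a coin's bias; a real PPL (Edward, Pyro, PyMC, TensorFlow Probability, …) would let you declare the model and get the inference machinery for free. This is a sketch of the idea, not any particular PPL's API.

```python
import numpy as np

# Metropolis sampling of the posterior over a coin's bias given observed flips,
# with a flat prior on (0, 1). Toy data: 7 heads out of 10 flips.
np.random.seed(0)
flips = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 0])

def log_posterior(theta):
    if not 0.0 < theta < 1.0:
        return -np.inf                               # outside the prior's support
    return np.sum(flips * np.log(theta) + (1 - flips) * np.log(1 - theta))

samples, theta = [], 0.5
for _ in range(5000):
    proposal = theta + 0.1 * np.random.randn()       # random-walk proposal
    if np.log(np.random.rand()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal                             # accept the move
    samples.append(theta)
print("posterior mean bias ~", np.mean(samples[1000:]))   # close to 0.7
```
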
  113. 113. Tensorflow Probability © Peter Morgan, April 2018
  114. 114. Implementations & Applications • BNN Frameworks – SPM, PyNN, NEST, NEURON, Brian • Various open source frameworks on github • Hearing aids - GN Group (DK) • Order of Magnitude - Christian Kaiser (SV) © Peter Morgan, April 2018
  115. 115. Building AGI © Peter Morgan, April 2018
  116. 116. Building AGI © Peter Morgan, April 2018
  117. 117. Can we build general intelligence? • We have the theory – active inference • We have the algorithms/software • We have the hardware (ASIC, neuromorphic) • We have the data sets (Internet plus open data sets) • Need to build out libraries • A TensorFlow for general intelligence • Open source? (Open/closed) • Apollo Project of our time – “Fourth Revolution” • Human Brain Project • Deepmind • BRAIN project • Should we build AGI/ASI? – safety, ethics, singularity?© Peter Morgan, April 2018
  118. 118. Other AGI Projects • OpenCog – Ben Goertzel (US) • Numenta – Jeff Hawkins (US) • Curious AI – (Finland) • AGI Innovations – Peter Voss (US) • Eurisko – Doug Lenat (US) • GoodAI – Marek Rosa (Czech) • OpenAI – Sam Altman (US) • NNAIsense – Jurgen Schmidhuber (Swiss) • Deepmind – Demis Hassabis (UK) • Vicarious – Dileep George (US) • SOAR – CMU • ACT-R – CMU • Sigma – Paul Rosenbloom – USC • Plus many more
  119. 119. Conclusions • Deep Learning (ANN) is lacking many of the characteristics and attributes needed for a general theory of intelligence • Active inference is such a theory (A ToE* which includes AGI) • ANN research groups are now (finally) turning to biology for inspiration • Bioplausible models are starting to appear • Some groups are starting to look at active inference • AGI in five years? Ten years? • Still have to wait for hardware to mature • Neuromorphic might be the platform that gets us *there* * ToE = Theory of Everything © Peter Morgan, April 2018
  120. 120. References © Peter Morgan, April 2018
  121. 121. Neuroscience - Books • Saxe, G. et al, Brain entropy and human intelligence: A resting-state fMRI study, PLOS One, Feb 12, 2018 • Sterling, P. and Laughlin, S., Principles of Neural Design, MIT Press, 2017 • Slotnick, S., Cognitive Neuroscience of Memory, Cambridge Univ Press, 2017 • Engel, Friston, Kragic, Eds, The Pragmatic Turn - Toward Action-Oriented Views in Cognitive Science, MIT Press, 2016 • Gerstner, W. et al, Neuronal Dynamics, Cambridge Univ Press, 2014 • Kandel, E., Principles of Neural Science, 5th ed, McGraw-Hill, 2012 • Rabinovich, Friston and Varona, Eds, Principles of Brain Dynamics, MIT Press, 2012 • Jones, E. G. Thalamus, Cambridge Univ. Press, 2007 • Dayan, P. and L. Abbott, Theoretical Neuroscience, MIT Press, 2005 © Peter Morgan, April 2018
  122. 122. Neuroscience - Papers • Crick, F., The recent excitement about neural networks, Nature 337, 129–132, 1989 • Rao RP and DH Ballard, Predictive coding in the visual cortex, Nature Neuroscience 2:79–87, 1999 • Izhikevich, E. M., Solving the distal reward problem through linkage of STDP and dopamine signalling, Cereb. Cortex 17, 2443–2452, 2007 • How the brain constructs the world, 2018 https://medicalxpress.com/news/2018-02-brain-world.html • Lamme, V. A. F. & Roelfsema, P. R., The distinct modes of vision offered by feedforward and recurrent processing, Trends Neurosci. 23, 571–579, 2000 • Sherman, S. M., Thalamus plays a central role in ongoing cortical functioning, Nat. Neurosci. 16, 533–541, 2016 • Harris, K. D. & Shepherd, G. M. G., The neocortical circuit: themes and variations, Nat. Neurosci. 18, 170–181, 2015 • van Kerkoerle, T. et al, Effects of attention and working memory in the different layers of monkey primary visual cortex, Nat. Commun. 8, 13804, 2017 • Roelfsema, P.R. and A. Holtmaat, Control of synaptic plasticity in deep cortical networks, Nature Reviews Neuroscience, 19, pages 166–180, 2018 © Peter Morgan, April 2018
  123. 123. Hardware • Lacey, G. et al, Deep Learning on FPGAs: Past, Present, and Future, Feb 2016 https://arxiv.org/abs/1602.04283 • AI ASICs https://www.nanalyze.com/2017/05/12-ai-hardware-startups-new-ai-chips/ • Suri, M. Advances in Neuromorphic Hardware, Springer, 2017 • Human Brain Project, Silicon Brains https://www.humanbrainproject.eu/en/silicon- brains/ • Artificial Brains http://www.artificialbrains.com • The Future is Quantum https://www.microsoft.com/en-us/research/blog/future-is- quantum-with-dr-krysta-svore/?OCID=MSR_podcast_ksvore_fb • Wang, Z. et al, Fully memristive neural networks for pattern classification with unsupervised learning, Nature Electronics, 8 Feb, 2018 © Peter Morgan, April 2018
  124. 124. Classical Deep Learning • Schmidhuber, Jurgen, Deep learning in neural networks: An overview, Neural Networks, 61:85–117, 2015 • Goodfellow, I., Bengio, Y. and Courville, A., Deep Learning, MIT Press, 2016 • LeCun, Y., Bengio, Y., and Hinton, G., Deep Learning, Nature, v.521, p.436–444, May 2015 http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html • Britz, D. et al, Massive Exploration of Neural Machine Translation Architectures, Mar 2017 https://arxiv.org/abs/1703.03906 • Liu H. et al, Hierarchical representations for efficient architecture search, 2017 https://arxiv.org/abs/1711.00436 • NIPS 2017 Proceedings https://papers.nips.cc/book/advances-in-neural-information-processing-systems-30-2017 • Deepmind papers https://deepmind.com/blog/deepmind-papers-nips-2017/ • Jeff Dean, Building Intelligent Systems with Large Scale Deep Learning, TensorFlow slides, Google Brain, 2017 • Rawat, W. and Z. Wang, Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review, Neural Computation, 29(9), Sept 2017 © Peter Morgan, April 2018
  125. 125. New Ideas in Deep Learning • Sabour, S. et al, Dynamic Routing Between Capsules, Nov 2017, https://arxiv.org/abs/1710.09829 • Chaudhari, P. and S. Soatto, Stochastic gradient descent performs variational inference, Jan 2018, https://arxiv.org/abs/1710.11029 • Vidal, R. et al, The Mathematics of Deep Learning, Dec 2017, https://arxiv.org/abs/1712.04741 • Chaudhari, P. and S. Soatto, On the energy landscape of deep networks, Apr 2017, https://arxiv.org/abs/1511.06485 • Pearl, Judea, Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution, Jan 2018, https://arxiv.org/abs/1801.04016 • Marcus, Gary, Deep Learning: A Critical Appraisal, Jan 2018, https://arxiv.org/abs/1801.00631 • Scellier, B. and Y. Bengio, Equilibrium propagation: bridging the gap between energy-based models and backpropagation, Front. Comput. Neurosci. 11, 24, 2017 • Pham H. et al, Efficient Neural Architecture Search via Parameter Sharing, Feb 2018, https://arxiv.org/abs/1802.03268 • Jaderberg, M. et al, Population Based Training of Neural Networks, 28 Nov, 2017, https://arxiv.org/abs/1711.09846 © Peter Morgan, April 2018
  126. 126. Bio-plausible Deep Learning • Hassabis, D. et al, Neuroscience-Inspired Artificial Intelligence, Neuron, 95(2), July 2017 • Marblestone, A.H. et al, Toward an Integration of Deep Learning and Neuroscience, Front Comput Neurosci., 14 Sept, 2016 • Costa, R.P. et al, Cortical microcircuits as gated-recurrent neural networks, Jan 2018 https://arxiv.org/abs/1711.02448 • Lillicrap T.P. et al, Random synaptic feedback weights support error backpropagation for deep learning, Nature Communications 7:13276, 2016 • Sacramento, J. et al, Dendritic error backpropagation in deep cortical microcircuits, Dec 2017, https://arxiv.org/abs/1801.00062 • Guerguiev, J. et al, Towards deep learning with segregated dendrites, eLife Neuroscience, 5 Dec, 2017 • Webb, S., Deep learning for biology, Nature, 20 Feb, 2018, https://www.nature.com/articles/d41586-018-02174-z © Peter Morgan, April 2018
  127. 127. Cognitive Science • Barbey, A., Network Neuroscience Theory of Human Intelligence, Trends in Cognitive Sciences, 22(1), Jan 2018 • Navlakha, B. et al, Network Design and the Brain, Trends in Cognitive Sciences, 22 (1), Jan 2018 • Lake, B. et al, Building Machines That Learn and Think Like People, 2016 https://arxiv.org/abs/1604.00289 • Lake, B., et al, Human-level concept learning through probabilistic program induction, Science, 350(6266) Dec 2015 • Tenenbaum, J.B. et al, How to Grow a Mind: Statistics, Structure, and Abstraction, Science, 331(1279) March 2011 • Trends in Cognitive Sciences, Special Issue: The Genetics of Cognition 15 (9), Sept 2011 • William Bialek, Princeton https://www.princeton.edu/~wbialek/categories.html • Dissecting artificial intelligence to better understand the human brain https://medicalxpress.com/news/2018-03-artificial-intelligence-human-brain.html © Peter Morgan, April 2018
  128. 128. Active Inference • Friston K., The free-energy principle: a unified brain theory? Nature Reviews Neuroscience, 11(2), 2010 • Friston, K., Life as we know it, Journal of the Royal Society Interface, 3 July, 2013 • Friston, K. et al, Active Inference: A Process Theory, Neural Computation, 29(1), Jan 2017 • Friston, K., Consciousness is not a thing, but a process of inference, Aeon, 18 May, 2017 • Kirchoff, M. et al, The Markov blankets of life, Journal of the Royal Society Interface, 17 Jan, 2018 © Peter Morgan, April 2018
  129. 129. Gauge Theories and Beyond • Sengupta et al, Towards a Neuronal Gauge Theory, PLOS Biology, Mar 8, 2016 • Information geometry https://en.wikipedia.org/wiki/Information_geometry • Algebraic geometry – HBP https://www.wired.com/story/the-mind-boggling-math-that- maybe-mapped-the-brain-in-11-dimensions/ • Guss, W.H., Deep function machines: Generalized neural networks for topological layer expression, 2016 https://arxiv.org/abs/1612.04799 • Guss, W.H. and R. Salakhutdinov, On Characterizing the Capacity of Neural Networks using Algebraic Topology, 2018 https://arxiv.org/abs/1802.04443 • Fok, R. et al, Spontaneous Symmetry Breaking in Deep Neural Networks ICLR Conference Submission, Feb 2018 • Bronstein, M.M. et al, Geometric deep learning: going beyond Euclidean data, May 2017, https://arxiv.org/abs/1611.08097 © Peter Morgan, April 2018
  130. 130. AGI • Veness, J. et al, A Monte Carlo AIXI Approximation, Dec 2010, https://arxiv.org/abs/0909.0801 • Schmidhuber, J., Goedel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements, Dec 2006, https://arxiv.org/abs/cs/0309048 • Hutter, M., One Decade of Universal Artificial Intelligence, Feb 2012, https://arxiv.org/abs/1202.6153 • Sunehag, P. and M. Hutter, Principles of Solomonoff Induction and AIXI, Nov 2011, https://arxiv.org/abs/1111.6117 • Silver, D. et al, Mastering the game of Go without human knowledge, Nature, Vol 550, 19 Oct, 2017 • Wolpert, D., Physical limits of inference, Oct 2008, https://arxiv.org/abs/0708.1362 • Goertzel, B., Toward a Formal Model of Cognitive Synergy, Mar 2017, https://arxiv.org/abs/1703.04361 • Hauser, Hermann, Are Machines Better than Humans? Evening lecture on machine intelligence at SCI, London, 25 October 2017 https://www.youtube.com/watch?v=SVOMyEeXUow © Peter Morgan, April 2018
  131. 131. Information Theory • Chaitin, G.J., From Philosophy to Program Size, Mar 2013, https://arxiv.org/abs/math/0303352 • Solomonoff, R.J., Machine Learning — Past and Future, Revision of lecture given at AI@50, The Dartmouth Artificial Intelligence Conference, July 13-15, 2006 • Publications of A. N. Kolmogorov, Annals of Probability, 17(3), July 1989 • Levin, L. A., Universal Sequential Search Problems, Problems of Information Transmission, 9(3), 1973 • Shannon, C.E., A Mathematical Theory of Communication, Bell System Technical Journal, 27 (3):379–423, July 1948 • Tishby, N. & R. Schwartz-Ziv, Opening the Black Box of Deep Neural Networks via Information, Apr 29 2017, https://arxiv.org/abs/1703.00810 • AIT https://en.m.wikipedia.org/wiki/Algorithmic_information_theory © Peter Morgan, April 2018
  132. 132. Classic Papers • Turing, A.M., Computing Machinery and Intelligence, Mind 49:433-460, 1950 • Schrodinger, E., What is Life? Based on lectures delivered at Trinity College, Dublin, Feb 1943 http://www.whatislife.ie/downloads/What-is-Life.pdf • Deutsch, David, The Constructor Theory of Life, Journal of the Royal Society Interface, 12(104), 2016 • Rumelhart DE, Hinton GE, Williams RJ, Learning representations by back-propagating errors, Nature 323:533–536, 1986 • Crick F., The recent excitement about neural networks, Nature 337:129–132, 1989 • Kolmogorov, A., On Analytical Methods in the Theory of Probability, Mathematische Annalen, 104(1) 1931 • Solomonoff, R.J., A Formal Theory of Inductive Inference, Part 1, Information and Control, 7(1), Mar, 1964, http://world.std.com/~rjs/1964pt1.ps • McCulloch, W.S. and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics, 5(4):115–133, 1943 © Peter Morgan, April 2018
  133. 133. Mathematical • Cybenko, George, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989 • Kurt Hornik (1991) "Approximation Capabilities of Multilayer Feedforward Networks", Neural Networks, 4(2), 251–257 • Stinchcombe, M.B., Neural network approximation of continuous functionals and continuous functions on compactifications, Neural Networks, 12(3):467–477, 1999 © Peter Morgan, April 2018
  134. 134. Books • Sutton, R. S. & A.G. Barto, Reinforcement Learning, 2nd ed., MIT Press, 2018 • Goodfellow, I. et al, Deep Learning, MIT Press, 2016 • Li, Ming and Paul Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, Springer-Verlag, N.Y., 2008 • Hutter M., Universal Artificial Intelligence, Springer–Verlag, 2004 • Wolfram, S., A New Kind of Science, Wolfram Media, 2002 • MacKay, David, Information theory, inference and learning algorithms, Cambridge University Press, 2003 • Hebb, D. O. The Organization of Behavior. A Neuropsychological Theory, John Wiley & Sons, 1949 © Peter Morgan, April 2018
  135. 135. Final Word … https://www.youtube.com/watch?v=7ottuFZYflg © Peter Morgan, April 2018
  136. 136. Questions © Peter Morgan, April 2018
