Deep learning has recently reached the heights that its pioneers wished for, serving as the driving force behind recent breakthroughs in AI, which have arguably surpassed the Turing test. In this tutorial, we will provide an overview of the fundamental principles of deep learning and explore the latest advances in the field, including Foundation Models. We will also examine the powers and limitations of deep learning, exploring how reasoning may emerge from carefully crafted neural networks and massively pre-trained models.
AI, as a general-purpose technology akin to steam engines and electricity, holds the potential for profound global socio-economic change. In this talk, we delve into a new form of disruptive AI known as Generative AI (GenAI) and its revolutionary impact on how we live, work, and interact with our environment. This discussion will cover GenAI's arrival, capabilities, and impact. We will also discuss the challenges and opportunities that GenAI presents to industry leaders and practitioners, including the defence sector. We'll explore its potential to reshape industries, push creative boundaries, and expand consolidated knowledge -- GenAI has become the cornerstone upon which new platforms, companies, and industries are built.
Deep Learning has taken the digital world by storm. As a general-purpose technology, it is now present in all walks of life. Although fundamental methodological developments have slowed in the past few years, applications are flourishing, with major breakthroughs in Computer Vision, NLP, and the Biomedical Sciences. The primary successes can be attributed to the availability of large labelled datasets, powerful GPU servers and programming frameworks, and advances in neural architecture engineering. This combination enables rapid construction of large, efficient neural networks that scale to the real world. But the fundamental questions of unsupervised learning, deep reasoning, and rapid contextual adaptation remain unsolved. We shall call what we currently have Deep Learning 1.0, and the next possible breakthroughs Deep Learning 2.0.
This is part 1 of the Tutorial delivered at IEEE SSCI 2020, Canberra, December 1st (Virtual).
Artificial intelligence in the post-deep learning era (Deakin University)
Deep learning has recently reached the heights that pioneers in the field had aspired to, serving as the driving force behind recent breakthroughs in AI, which have arguably surpassed the Turing test. At present, the spotlight is on scaling Transformers and diffusion models on Internet-scale data. In this talk, I will provide an overview of the fundamental principles of deep learning, its powers, and limitations, and explore the new era of post-deep learning. This new era encompasses novel objectives, dynamic architectures, abstract reasoning, neurosymbolic hybrid systems, and LLM-based agent systems.
Deep learning, enabled by powerful compute and fuelled by massive data, has delivered unprecedented data analytics capabilities. However, major limitations remain. Chief among these is that deep neural networks tend to exploit surface statistics in the data, creating shortcuts from input to output without deeply understanding the data. As a result, these networks fail to generalize to novel combinations. This is because the networks perform shallow pattern matching rather than deliberate reasoning, the capacity to deliberately deduce new knowledge from contextualized data. Second, machine learning models are often trained to do just one task at a time, making it impossible to re-define tasks on the fly as needed in a complex operating environment. This talk presents our recent developments to extend the capacity of neural networks and remove these limitations. Our main focus is on learning to reason from data, that is, learning to determine whether the data entails a conclusion. This capacity opens up new ways to generate insights from data through arbitrary querying in natural language, without the need to predefine a narrow set of tasks.
In this talk we will summarise some of the detectable trends in AI beyond deep learning. We will focus on the current transition from deep learning to deep semantics, describing the enabling infrastructures, challenges, and opportunities in the construction of next-generation AI systems. The talk will focus on Natural Language Processing (NLP) as an AI sub-domain and will link to the research at the AI Systems Lab at the University of Manchester.
TL;DR: This tutorial was delivered at KDD 2021. Here we review recent developments to extend the capacity of neural networks to “learning to reason” from data, where the task is to determine if the data entails a conclusion.
The rise of big data and big compute has brought modern neural networks to many walks of digital life, thanks to the relative ease of constructing large models that scale to the real world. Current successes of Transformers and self-supervised pretraining on massive data have led some to believe that deep neural networks will be able to do almost everything, given enough data and computational resources. However, this might not be the case. While neural networks are quick to exploit surface statistics, they fail to generalize to novel combinations. Current neural networks do not perform deliberate reasoning, the capacity to deliberately deduce new knowledge from contextualized data. This tutorial reviews recent developments to extend the capacity of neural networks to "learning to reason" from data, where the task is to determine whether the data entails a conclusion. This capacity opens up new ways to generate insights from data through arbitrary querying in natural language, without the need to predefine a narrow set of tasks.
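To make the "data entails a conclusion" task concrete, here is its classical symbolic ancestor: propositional entailment checked by brute-force model enumeration, where KB entails q exactly when q holds in every model of the KB. This is a baseline illustration written for this summary, not one of the neural approaches the tutorial covers:

```python
from itertools import product

def entails(kb, query, symbols):
    """Check KB |= query by enumerating all truth assignments.

    kb and query are functions mapping an assignment dict to bool.
    KB entails query iff query holds in every model of the KB.
    """
    for values in product([False, True], repeat=len(symbols)):
        model = dict(zip(symbols, values))
        if kb(model) and not query(model):
            return False  # found a model of KB where query fails
    return True

# Example: from "rain -> wet" and "rain", conclude "wet" (modus ponens).
kb = lambda m: (not m["rain"] or m["wet"]) and m["rain"]
assert entails(kb, lambda m: m["wet"], ["rain", "wet"])
# But the KB does not entail "not wet".
assert not entails(kb, lambda m: not m["wet"], ["rain", "wet"])
```

"Learning to reason" replaces this hand-written logic with a network trained to make the same entailment judgement directly from data.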
Keynote given at the workshop for Artificial Intelligence meets the Web of Data on Pragmatic Semantics.
In this keynote I argue that the Web of Data is a Complex System or Marketplace of Ideas rather than a classical Database, that the model theory on which classical semantics is based is not appropriate in all situations, and propose an alternative "Pragmatic Semantics" based on optimisation over possible interpretations.
Generative AI represents a pivotal moment in computing history, opening up new opportunities for scientific discovery. By harnessing extensive and diverse datasets, we can construct general-purpose Foundation Models that can be fine-tuned for specific prediction and exploration tasks. This talk introduces our research program, which focuses on leveraging the power of Generative AI for materials discovery. Generative AI facilitates rapid exploration of vast materials design spaces, enabling the identification of new compounds and combinations. However, this field also presents significant challenges, such as effectively representing crystals in a compact manner and striking the right balance between exploiting known structural regions and venturing into unexplored territories. Our research delves into the development of a new kind of generative model specifically designed to search for diverse molecular/crystal regions that yield high returns, as defined by domain experts. In addition, our toolset includes Large Language Models that have been fine-tuned on materials literature and scientific knowledge. These models can comprehend extensive volumes of materials literature, encompassing molecular string representations, mathematical equations in LaTeX, and codebases. We explore the open challenges, including effectively representing deep domain knowledge and implementing efficient querying techniques to address materials discovery problems.
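The balance between exploiting known structural regions and exploring unexplored ones is the classic exploration-exploitation trade-off, commonly handled with bandit-style acquisition rules such as UCB1. The sketch below is a generic illustration, not the talk's actual search method; the three candidate "regions" and their reward values are made up:

```python
import math, random

def ucb1_select(counts, means, c=2.0):
    """Pick the arm (candidate region) maximizing mean + exploration bonus."""
    total = sum(counts)
    best, best_score = 0, float("-inf")
    for i, (n, mu) in enumerate(zip(counts, means)):
        if n == 0:
            return i  # try every region at least once
        score = mu + math.sqrt(c * math.log(total) / n)
        if score > best_score:
            best, best_score = i, score
    return best

random.seed(0)
true_reward = [0.2, 0.5, 0.8]  # hypothetical expert-defined return per region
counts, means = [0, 0, 0], [0.0, 0.0, 0.0]
for _ in range(500):
    i = ucb1_select(counts, means)
    r = true_reward[i] + random.gauss(0, 0.1)  # noisy evaluation of region i
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]     # incremental running mean
# The highest-return region should dominate the sampling budget.
assert counts[2] > counts[0] and counts[2] > counts[1]
```

The exploration bonus shrinks as a region accumulates evaluations, so the search naturally shifts from broad coverage toward the high-return regions the experts care about.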
AI for automated materials discovery via learning to represent, predict, gene... (Deakin University)
A brief overview of how our AI can help automate the materials discovery process, covering a wide range of problems, from drug design to crystal plasticity.
Robust Feature Learning with Deep Neural Networks
http://snu-primo.hosted.exlibrisgroup.com/primo_library/libweb/action/display.do?tabs=viewOnlineTab&doc=82SNU_INST21557911060002591
Introduction to Topological Data Analysis (Mason Porter)
Here are slides for my 3/14/21 talk on an introduction to topological data analysis.
This is the first talk in our Short Course on topological data analysis at the 2021 American Physical Society (APS) March Meeting: https://march.aps.org/program/dsoft/gsnp-short-course-introduction-to-topological-data-analysis/
Where are all the Semantic Web agents? There are billions of "machine readable" open facts on the Semantic Web, i.e. Linked Open Data (LOD); isn't that enough? It appears not. We're still far from seeing Lucy's and Pete's agents brilliantly solving their tasks with the help of other Semantic Web agents they can trust (Tim Berners-Lee et al., The Semantic Web, Scientific American, 2001). Despite its technological impact on many applications and areas, the Semantic Web promised a breakthrough that we have yet to experience. One issue is that LOD ontologies are not as linked as they should be. Another is that formalising semi-structured Web pages or databases alone is not enough to make agents operational: they also need to reason with commonsense knowledge, the encoding of which is a long-standing challenge in Artificial Intelligence. A third consideration is that most existing commonsense knowledge bases lack formal semantics and situational constraints. In this talk I advocate the role of the Semantic Web as a provider of a commonsense knowledge graph for Artificial Intelligence, and discuss ways and obstacles towards achieving this goal.
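The "machine readable" facts in question are subject-predicate-object triples, and the basic operation an agent performs over them, matching a pattern with variables against a triple store, fits in a few lines. A toy SPARQL-flavoured sketch; the commonsense triples are made up for illustration:

```python
def match(triples, pattern):
    """Return variable bindings for an (s, p, o) pattern; '?x' marks variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break  # constant term does not match this triple
        else:
            results.append(binding)
    return results

# Hypothetical commonsense triples.
kb = [
    ("cup", "usedFor", "drinking"),
    ("cup", "madeOf", "ceramic"),
    ("knife", "usedFor", "cutting"),
]
assert match(kb, ("?x", "usedFor", "drinking")) == [{"?x": "cup"}]
assert len(match(kb, ("cup", "?p", "?o"))) == 2
```

The talk's point is precisely that this mechanical matching is not enough: the agent also needs commonsense triples with formal semantics and situational constraints before such answers become reliable.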
Concurrent Inference of Topic Models and Distributed Vector Representations (Parang Saraf)
Abstract: Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network architecture that produces distributed representations of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representations of words and documents, which train directly on neighboring words, we leverage the outcome of a deep neural network to estimate the topic label of each document. The networks for topic modeling and for generating distributed representations are trained concurrently in a cascaded style, achieving better runtime without sacrificing topic quality. Empirical studies reported in the paper show that the distributed representations of topics capture intuitive themes using fewer dimensions than conventional topic modeling approaches.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
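The end product, a distributed topic representation, can be pictured in its simplest form as a probability-weighted average of word vectors, with a topic being a distribution over words as in classical LDA. This is a simplified illustration, not the paper's cascaded architecture, and the word vectors here are random stand-ins for learned embeddings:

```python
import random

random.seed(1)
DIM = 8
# Hypothetical word embeddings (random stand-ins for learned vectors).
vocab = ["gene", "protein", "cell", "stock", "market", "trade"]
word_vec = {w: [random.gauss(0, 1) for _ in range(DIM)] for w in vocab}

# A topic as a probability distribution over words (as in classical LDA).
bio_topic = {"gene": 0.4, "protein": 0.35, "cell": 0.25}
fin_topic = {"stock": 0.4, "market": 0.35, "trade": 0.25}

def topic_embedding(topic):
    """Distributed topic representation: probability-weighted word vectors."""
    vec = [0.0] * DIM
    for word, prob in topic.items():
        for k in range(DIM):
            vec[k] += prob * word_vec[word][k]
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

bio, fin = topic_embedding(bio_topic), topic_embedding(fin_topic)
# A topic is closest to itself in the embedding space.
assert cosine(bio, bio) > cosine(bio, fin)
```

Note the dimensionality: the topic lives in the same 8-dimensional space as the words, rather than in a vocabulary-sized probability vector, which is the compactness the abstract claims.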
The current deep learning revolution has brought unprecedented changes to how we live, learn, interact with the digital and physical worlds, run businesses, and conduct science. These are made possible thanks to the relative ease of constructing massive neural networks that are flexible to train and scale up to the real world. But this flexibility is hitting its limits due to the excessive demand for labelled data, the narrowness of the tasks, the failure to generalize beyond surface statistics to novel combinations, and the lack of the key mental faculty of deliberate reasoning. In this talk, I will present a multi-year research program to push deep learning beyond these limitations. We aim to build dynamic neural networks that can train themselves with little labelled data, compress on-the-fly in response to resource constraints, and respond to arbitrary queries about a context. The networks are equipped with the capability to make use of external knowledge and to operate at the high level of objects and relations. The long-term goal is to build persistent digital companions that co-live with us and other AI entities, understand our needs and intentions, and share our human values and norms. They will be capable of having natural conversations, remembering lifelong events, and learning in an open-ended fashion.
Tutorial delivered at ECML-PKDD 2021.
TL;DR: This tutorial reviews recent developments on drug discovery using machine learning methods.
Powered by neural networks, modern machine learning has enjoyed great successes in data-intensive domains such as computer vision and language, where humans naturally perform well. Machine learning equipped with reasoning is now accelerating fields that traditionally require deep expertise, such as physics, chemistry, and biomedicine. This tutorial provides an overview of how machine learning and reasoning are speeding up and lowering the cost of drug discovery. This includes how machine learning can help in a wide range of areas such as novel molecule identification, protein representation, drug-target binding, drug re-purposing, generative drug design, chemical reaction prediction, retrosynthesis planning, drug-drug interaction, and safety assessment. We will also discuss relevant machine learning models for graph classification, molecular graph transformation, drug generation using deep generative models and reinforcement learning, and chemical reasoning.
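A recurring primitive behind several of these tasks (molecule identification, re-purposing) is molecular similarity, classically measured as the Tanimoto coefficient between substructure fingerprints. The sketch below is a crude stand-in that uses character n-grams of SMILES strings instead of a real cheminformatics toolkit, purely to show the mechanics:

```python
def ngram_fingerprint(smiles, n=2):
    """Toy fingerprint: the set of character n-grams of a SMILES string."""
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two fingerprint sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

ethanol = ngram_fingerprint("CCO")
propanol = ngram_fingerprint("CCCO")
benzene = ngram_fingerprint("c1ccccc1")
# Structurally closer molecules should score higher.
assert tanimoto(ethanol, propanol) > tanimoto(ethanol, benzene)
assert tanimoto(ethanol, ethanol) == 1.0
```

Real pipelines replace the n-gram set with circular fingerprints computed from the molecular graph, but the similarity search on top looks the same.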
Deep Learning has taken the digital world by storm. As a general-purpose technology, it is now present in all walks of life. Although fundamental methodological developments have slowed in the past few years, applications are flourishing, with major breakthroughs in Computer Vision, NLP, and the Biomedical Sciences. The primary successes can be attributed to the availability of large labelled datasets, powerful GPU servers and programming frameworks, and advances in neural architecture engineering. This combination enables rapid construction of large, efficient neural networks that scale to the real world. But the fundamental questions of unsupervised learning, deep reasoning, and rapid contextual adaptation remain unsolved. We shall call what we currently have Deep Learning 1.0, and the next possible breakthroughs Deep Learning 2.0.
This is part 2 of the Tutorial delivered at IEEE SSCI 2020, Canberra, December 1st (Virtual).
This is the talk given at the Faculty of Information Technology, Monash University on 19/08/2020. It covers our recent research on the topics of learning to reason, including dual-process theory, visual reasoning and neural memories.
A discussion of the nature of AI/ML as an empirical science, covering concepts in the field, how to position ourselves, how to plan research, what empirical methods in AI/ML are, and how to build up a theory of AI.
Introducing research works in the area of machine reasoning at our Applied AI Institute, Deakin University, Australia. Covering visual & social reasoning, neural Turing machine and System 2.
Describing the latest research in visual reasoning, in particular visual question answering, covering both images and videos. A dual-process-theories approach; relational memory.
Full day lectures @International University, HCM City, Vietnam, May 2019. Review of AI in 2019; outlook into the future; empirical research in AI; introduction to AI research at Deakin University
Key Trends Shaping the Future of Infrastructure (Cheryl Hung)
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud, and open source: exploring how these areas are likely to mature and develop over the short and long term, and considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 4 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 4. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
5. 3/07/2023
"[By 2023] … Emergence of the generally agreed upon 'next big thing' in AI beyond deep learning." — Rodney Brooks, rodneybrooks.com
"[…] general-purpose computer programs, built on top of far richer primitives than our current differentiable layers—[…] we will get to reasoning and abstraction, the fundamental weakness of current models." — Francois Chollet, blog.keras.io
"Software 2.0 is written in neural network weights" — Andrej Karpathy, medium.com/@karpathy
6. Why (still) DL in 2023?
Practical
• Generality: applicable to many domains.
• Competitive: DL is hard to beat as long as there is data to train on.
• Scalability: DL gets better with more data, and it is very scalable.
Theoretical
• Expressiveness: neural nets can approximate any function.
• Learnability: neural nets are trained easily.
• Generalisability: neural nets generalize surprisingly well to unseen data.
9. Machine learning in a nutshell: y = f(x; W)
• Most machine learning tasks reduce to estimating a mapping f from x to y.
• The estimation becomes more accurate with more experience, e.g., seeing more pairs (x, y) in the training data.
• The mapping f is often parameterized by W.
• When y is a token/scalar/vector/tensor → a prediction task.
• When y is a program → a translation/synthesis task.
• When y is an intermediate form → representation learning.
❖ Much of ML is in specifying x, a.k.a. feature engineering.
❖ Much of DL is in specifying the skeleton of W, a.k.a. architecture engineering.
❖ Much of LLM practice is in specifying x again, but with fixed W, a.k.a. prompt engineering.
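As a minimal sketch of the y = f(x; W) view (NumPy, with a linear f and noise-free data chosen purely for illustration), estimating W from observed (x, y) pairs can look like:

```python
import numpy as np

# Hypothetical linear mapping y = f(x; W) = x @ W, to be estimated from pairs.
rng = np.random.default_rng(0)
W_true = np.array([[2.0], [-1.0]])            # the unknown parameters
X = rng.normal(size=(100, 2))                 # 100 training inputs
y = X @ W_true                                # observed targets (noise-free here)

# Least-squares estimate of W: more (x, y) pairs -> a more accurate estimate.
W_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(W_hat, W_true, atol=1e-6))  # True
```

With noisy targets the same code would recover W only approximately, improving with more data, which is the "more experience" point above.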
10. 1980s: Parallel Distributed Processing
• Information is stored in many places (distributed).
• Activations are sparse (enabling selectivity and invariance).
• Factors of variation can be coded efficiently.
• Popular these days: word & doc embeddings (word2vec, GloVe, anything2vec).
Credit: Geoff Hinton
11. Symbolic vs. Distributed Representations
• Symbolic representation: one discrete symbol per entity (e.g., Megan_Rapinoe, Ian_McKellen, Play, Game).
• Distributed representation: each entity is a point in a shared vector space.
Slide credit: Pacheco & Goldwasser, 2021
12. Deep models via layer stacking
Theoretically powerful, but limited in practice.
• Integrate-and-fire neuron as the basic unit (image: andreykurenkov.com).
• Each layer acts as a feature detector; blocks are stacked into a deep representation.
14. Sequence model with recurrence
Assumes a stationary world.
Typical configurations: classification, image captioning, sentence classification, neural machine translation, sequence labelling.
Source: http://karpathy.github.io/assets/rnn/diags.jpeg
15. Spatial model with convolutions
Assumes filters/motifs are translation invariant.
• Learnable kernels act as feature detectors, often many per layer.
Sources: http://colah.github.io/posts/2015-09-NN-Types-FP/, andreykurenkov.com
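A minimal NumPy sketch (illustrative, not from the slides) of why convolution suits translation-invariant motifs: the same kernel slides over every position, so a shifted motif produces a correspondingly shifted response:

```python
import numpy as np

def conv1d(signal, kernel):
    """Valid 1D cross-correlation: the same kernel is applied at every position."""
    k = len(kernel)
    return np.array([signal[i:i + k] @ kernel
                     for i in range(len(signal) - k + 1)])

kernel = np.array([1.0, -1.0, 1.0])        # a hypothetical learned motif detector
x = np.zeros(10); x[2:5] = kernel          # the motif placed at position 2
x_shift = np.roll(x, 3)                    # the same motif, shifted by 3

r, r_shift = conv1d(x, kernel), conv1d(x_shift, kernel)
# The peak response shifts along with the input (translation equivariance).
print(np.argmax(r), np.argmax(r_shift))    # 2 5
```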
17. Operator on sets/bags: Attention
Not everything is created equal for a goal.
• Need an attention model to select or ignore certain computations or inputs.
• Can be "soft" (differentiable) or "hard" (requires RL).
• Attention provides a short-cut → long-term dependencies.
• Also encourages sparsity if done right!
http://distill.pub/2016/augmented-rnns/
18. Why attention?
• Visual attention in humans: we focus on specific parts of the visual input to compute the adequate response.
• Examples:
  • We focus on objects rather than the background of an image.
  • We skim text by looking at the important words.
• In neural computation, we need to select the most relevant pieces of information and ignore all the other parts.
Slide credit: Trang Pham. Photo: programmersought
20. Transformer: Key ideas
• Use self-similarity to refine each token's representation (embedding).
  • "June is happy" → June is represented as a person's name.
  • Hidden contexts are borrowed from other sentences that share tokens/motifs/patterns, e.g., "She is happy", "Her name is June", etc.
• Akin to retrieval: matching a query to keys.
• Context is simply the other tokens co-occurring in the same text segment.
  • Related to "co-location".
  • How big is the context? → A small window, a sentence, a paragraph, the whole doc.
  • What about relative position? → Positional coding.
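The query-key retrieval idea can be sketched in NumPy as a single attention head with no learned projections (queries, keys and values are the raw embeddings; purely illustrative):

```python
import numpy as np

def self_attention(X):
    """Each token's new representation is a similarity-weighted mix of all
    tokens in the context (softmax over scaled dot-product scores)."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                    # query-key matching
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over the context
    return weights @ X                               # retrieve weighted values

# Tokens 0 and 1 are similar, token 2 is different; similar tokens will
# attend to each other and their representations move closer together.
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
out = self_attention(X)
```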
21. Positional Encoding
• The Transformer relaxes the sequentiality of the data.
• Positional encoding embeds the sequential order in the model.
Slide credit: Adham Beykikhoshk
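The standard sinusoidal positional encoding from "Attention Is All You Need" can be sketched as follows (NumPy):

```python
import numpy as np

def positional_encoding(n_pos, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same angle).
    Each position gets a unique, smoothly varying code to add to its token
    embedding, restoring order information the attention layers discard."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(50, 16)
```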
22. Theory: Transformers are (new) Hopfield nets
Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).
23. Speed up: Vanilla Transformers are not efficient
Slide credit: Hung Le
24. Speed up: Efficient Transformers
Tay, Yi, et al. "Efficient transformers: A survey." arXiv preprint arXiv:2009.06732 (2020).
25. Speed up: Kernelization and associative tricks
Same index, reusable sum → reduced complexity.
The idea links back to "Efficient Attention: Attention with Linear Complexities" by Shen et al., 2018.
Slide credit: Hung Le
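The associative trick can be sketched numerically (NumPy; the positive feature map φ(x) = elu(x) + 1 is one common choice, as in Katharopoulos et al., 2020, used here for illustration): regrouping (φ(Q)φ(K)ᵀ)V as φ(Q)(φ(K)ᵀV) replaces the n×n attention matrix with a small reusable d×d sum:

```python
import numpy as np

def phi(x):
    """A positive feature map, elu(x) + 1."""
    return np.where(x > 0, x + 1.0, np.exp(x))

rng = np.random.default_rng(0)
n, d = 128, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

# Quadratic form: build the full n x n attention matrix, then apply it to V.
A = phi(Q) @ phi(K).T                          # (n, n)
quad = (A / A.sum(axis=1, keepdims=True)) @ V

# Linear form: associativity lets us sum over the keys first (same index,
# reusable sum), a (d, d) quantity independent of sequence length.
S = phi(K).T @ V                               # (d, d) reusable sum
z = phi(K).sum(axis=0)                         # (d,) normaliser
lin = (phi(Q) @ S) / (phi(Q) @ z)[:, None]

print(np.allclose(quad, lin))                  # True
```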
27. Fast weights | HyperNet
The model world is recursive.
• Early ideas from the early 1990s by Juergen Schmidhuber and collaborators.
• Data-dependent weights | using a controller to generate the weights of the main net.
Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." arXiv preprint arXiv:1609.09106 (2016).
29. Module composition
The system is modular and composable.
Source: https://www.ruder.io/modular-deep-learning/
30. Neural architecture search
When design is cheap and non-creative.
• The search space is huge and discrete.
• Can be done through meta-heuristics (e.g., genetic algorithms) or reinforcement learning (e.g., one discrete change in the model structure is an action).
Bello, Irwan, et al. "Neural optimizer search with reinforcement learning." arXiv preprint arXiv:1709.07417 (2017).
31. Neural network design goals
• Capture long-term dependencies in time and space
• Capture invariances natively
• Capture equivariance
• Expressivity
• Scalability
• Reusability/modularity
• Compositionality
• Universality
32. Neural network design goals (2)
• Easy to train / learnability
• Use (almost) no labels ⇒ unsupervised learning
• Resource adaptive
• Ability to extrapolate ⇒ must go beyond surface statistics
• Support fast and slow learning (complementary learning)
• Support fast and slow inference (dual-system theory)
34. Graph structures in the real world – network science
Examples: the Internet, social networks, the World Wide Web, communication networks, citations, biological networks.
Credit: Jure Leskovec. Slide credit: Yao Ma, Wei Jin, Yiqi Wang, Jiliang Tang, Tyler Derr, AAAI21
35. Biology, pharmacy & chemistry, materials
• Molecule/crystal as a graph: atoms as nodes, chemical bonds as edges.
• Computing molecular properties.
• Chemical-chemical interaction.
• Chemical reaction.
#REF: Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X-ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90.
Gilmer, Justin, et al. "Neural message passing for quantum chemistry." arXiv preprint arXiv:1704.01212 (2017).
36. Scene graphs as intermediate representation for image captioning
Yao et al. Exploring Visual Relationship for Image Captioning, ECCV 2018
Slide credit: Fei-Fei Li, Ranjay Krishna, Danfei Xu
37. GNN in videos: Space-time region graphs (Abhinav Gupta et al., ECCV'18)
38. Transformer is a special type of GNN
Image credit: Chaitanya Joshi
40. Natural evolution of representing the world
• Vector → Embedding, MLP
• Sequence → RNN (LSTM, GRU)
• Grid → CNN (AlexNet, VGG, ResNet, EfficientNet, etc.)
• Set → word2vec, attention, Transformer
• Graph → GNN (node2vec, DeepWalk, GCN, Graph Attention Net, Column Net, MPNN, etc.)
• ResNet is a special case of a GNN on a grid!
• Transformer is a special case of a GNN on a fully connected graph.
41. GNN in research
• Graphs are pervasive in many scientific disciplines.
• The sub-area of graph representation has reached a certain maturity, with multiple reviews, workshops and papers at top AI/ML venues.
Source: https://github.com/EdisonLeeeee/ICLR2023-OpenReviewData
42. Deep Graph Learning: Foundations, Advances and Applications
Graph Neural Networks as a solution: a neural network model that can deal with graph data.
Pipeline: Graph Neural Network → graph/node representation → applications (node classification, link prediction, community detection, graph generation, …).
Yu Rong, Wenbing Huang, Tingyang Xu, Hong Cheng, Junzhou Huang, 2020
43. Two main operations in GNNs: graph filtering
Graph filtering refines the node features.
Slide credit: Yao Ma and Yiqi Wang, Tyler Derr, Lingfei Wu and Tengfei Ma
44. Two main operations in GNNs: graph pooling
Graph pooling generates a smaller graph.
Slide credit: Yao Ma and Yiqi Wang, Tyler Derr, Lingfei Wu and Tengfei Ma
45. General GNN framework
A stack of blocks B₁ … Bₙ, each consisting of a filtering layer, an activation, and an optional pooling layer.
Slide credit: Yao Ma and Yiqi Wang, Tyler Derr, Lingfei Wu and Tengfei Ma
46. Generalizing 2D convolutions to graph convolutions
• Graph convolutions involve similar local operations on nodes.
• Nodes are now object representations, not activations.
• The ordering of the neighbors should not matter.
• The number of neighbors should not matter.
• N(i) denotes the neighbors of node i.
• Attention can be employed for edge selection.
Kipf & Welling (ICLR 2017). Slide credit: Fei-Fei Li, Ranjay Krishna, Danfei Xu
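One graph-convolution layer in the style of Kipf & Welling (2017), H′ = σ(D̂^(-1/2) Â D̂^(-1/2) H W) with Â = A + I, can be sketched in NumPy (weights W are illustrative):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: aggregate degree-normalised neighbour features, then
    transform. The sum aggregation means neighbour ORDER does not matter,
    and normalisation handles varying neighbour COUNTS."""
    A_hat = A + np.eye(len(A))                   # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    H_agg = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H  # permutation-invariant aggregation
    return np.maximum(0.0, H_agg @ W)            # ReLU activation

A = np.array([[0, 1, 0],                         # a 3-node path graph: 0-1-2
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)                                    # one-hot node features
W = np.ones((3, 2))                              # hypothetical learned weights
H_new = gcn_layer(A, H, W)
```

Relabelling the nodes permutes the output rows identically (permutation equivariance), which is exactly the "ordering should not matter" property above.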
47. Generalizing GNNs through message passing
Generalized message passing over a relation graph.
#REF: Pham, Trang, et al. "Column Networks for Collective Classification." AAAI. 2017.
48. Message Passing Neural Net
Each node vᵢ carries a hidden state hᵢ and a label lᵢ; messages are passed along the edges, and node features are then updated.
• Two phases per step: message passing, then feature updating.
• Mₖ(·) and Uₖ(·) are the message and update functions to be designed.
Neural Message Passing for Quantum Chemistry. ICML 2017.
Slide credit: Yao Ma, Wei Jin, Yiqi Wang, Jiliang Tang, Tyler Derr, AAAI21
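One message-passing step can be sketched as follows (NumPy; sum aggregation with simple illustrative choices of M and U, not the exact functions from the paper):

```python
import numpy as np

def mpnn_step(A, H, W_msg, W_upd):
    """One round of message passing:
       m_i  = sum over neighbours j of M(h_j)   (message passing)
       h_i' = U(h_i, m_i)                       (feature updating)
    Here M(h) = h @ W_msg and U(h, m) = tanh(h @ W_upd + m), for illustration."""
    messages = A @ (H @ W_msg)            # sum the neighbours' messages per node
    return np.tanh(H @ W_upd + messages)

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)    # star graph centred on node 0
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
W_msg = W_upd = np.eye(2)
H1 = mpnn_step(A, H, W_msg, W_upd)
# Nodes 1 and 2 are structurally identical with identical features,
# so they end up with identical updated states.
```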
49. Neural graph morphism
• Input: a graph.
• Output: a new graph with the same nodes but different edges.
• Model: graph morphism.
• Method: Graph Transformation Policy Network (GTPN).
Kien Do, Truyen Tran, and Svetha Venkatesh. "Graph Transformation Policy Network for Chemical Reaction Prediction." KDD'19.
50. Neural graph recurrence
• Graphs that represent interactions between entities through time.
• Spatial edges capture node interactions at a time step.
• Temporal edges capture consistency relationships through time.
51. Challenges
• The addition of temporal edges makes the graphs bigger and more complex.
• Methods rely on context-specific constraints to reduce the complexity via approximations.
• Through time, the structure of the graph may change.
• This is hard to solve; most methods model short sequences to avoid it.
53. GraphRNN to generate graphs
• A case of graph dynamics: nodes and edges are added sequentially.
• Solves tractability using a BFS ordering.
You, Jiaxuan, et al. "GraphRNN: Generating realistic graphs with deep auto-regressive models." ICML (2018).
55. Representation learning: a bit of history
• "Representation is the use of signs that stand in for and take the place of something else."
• It has been a goal of neural networks since the 1980s, and of the current wave of deep learning (2005-present) → replacing feature engineering.
• Between 2006 and 2012, many unsupervised learning models appeared with varying degrees of success: RBM, DBN, DBM, DAE, DDAE, PSD.
• Between 2013 and 2018, most models were supervised, following AlexNet.
• Since 2018, unsupervised learning has become competitive (with contrastive learning, self-supervised learning, BERT)!
56. Criteria for a good representation
• Separates factors of variation (aka disentanglement), which are linearly correlated with the desired outputs of downstream tasks.
• Provides abstraction that is invariant against deformations and small variations.
• Is distributed (one concept is represented by multiple units), which is compact and good for interpolation.
• Optionally, offers dimensionality reduction.
• Optionally, is sparse, giving room for emerging symbols.
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013): 1798-1828.
57. Why neural unsupervised learning?
• Neural nets have representational richness:
  • FFNs are function approximators.
  • RNNs are program approximators: they can estimate a program's behaviour and generate a string.
  • CNNs provide translation invariance.
  • Transformers are powerful contextual encoders.
• Compactness: representations are (sparse and) distributed.
  • Essential for perception, compact storage and reasoning.
• Accounting for uncertainty: neural nets can be stochastic to model distributions.
• Symbolic representation: realised through sparse activations and gating mechanisms.
58. Generative models: discover the underlying process that generates data
Many applications:
• Text to speech
• Simulating data that are hard to obtain/share in real life (e.g., healthcare)
• Generating meaningful sentences conditioned on some input (foreign language, image, video)
• Semi-supervised learning
• Planning
59. Deep (Denoising) AutoEncoder: self-reconstruction of data
An encoder maps the raw data (optionally with added noise) to a representation; a decoder reconstructs the data from it. Intermediate layers act as feature detectors.
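A toy NumPy sketch of the denoising autoencoder idea (untrained, random weights, purely illustrative): corrupt the input, encode it through a bottleneck, decode it back, and measure the reconstruction error against the clean input, which is exactly the quantity training would minimise:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_enc):
    return np.tanh(x @ W_enc)                      # representation

def decode(h, W_dec):
    return h @ W_dec                               # reconstruction

x = rng.normal(size=(4, 8))                        # raw data: 4 samples, 8 dims
x_noisy = x + 0.1 * rng.normal(size=x.shape)       # corrupted input

W_enc = rng.normal(size=(8, 3)) * 0.1              # 8 -> 3 bottleneck
W_dec = rng.normal(size=(3, 8)) * 0.1              # 3 -> 8 back to data space

x_hat = decode(encode(x_noisy, W_enc), W_dec)
# Training the DAE = adjusting W_enc, W_dec to minimise this loss.
loss = np.mean((x_hat - x) ** 2)
```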
60. StableDiffusion (FSDL 2022)
• "Latent diffusion" model: diffuse in a lower-dimensional latent space, then decode back into pixel space.
• Frozen CLIP ViT-L/14 text encoder (123M parameters); trained 860M-parameter UNet.
• Trained on LAION-5B on 256 A100s for 24 days (~$600K).
• Fully open-source.
Slide credit: Karayev, 2022
62. GAN: Generative Adversarial Nets — matching data statistics
• Instead of modeling the entire distribution of the data, a GAN learns to map ANY random distribution into the region of the data, so that no discriminator can distinguish sampled data from real data.
• Components: a random distribution z in any space; a generator — a neural net that maps z → x; a binary discriminator, usually a neural classifier.
64. BERT: a Transformer that predicts its own masked parts
• BERT is like parallel approximate pseudo-likelihood:
  • ~ maximizing the conditional likelihood of some variables given the rest;
  • when the number of variables is large, this converges to the MLE (maximum likelihood estimate).
https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
65. Neural autoregressive models: predict the next step given the history
• The keys: (a) long-term dependencies, (b) ordering, and (c) parameter sharing.
• Can be realized using:
  • RNNs
  • CNNs: one-sided CNN, dilated CNN (e.g., WaveNet), PixelCNN
  • Transformers → the GPT-X family
  • Masked autoencoders → MADE
• Pros: general, good quality thus far.
• Cons: slow – needs better inductive biases for scalability.
lyusungwon.github.io/studies/2018/07/25/nade/
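The autoregressive factorization p(x) = ∏ₜ p(xₜ | x<ₜ) can be sketched with a toy character-level model (NumPy; a hypothetical fixed bigram table stands in for the learned network, and the SAME table at every step illustrates parameter sharing):

```python
import numpy as np

vocab = ['a', 'b']
# Hypothetical learned conditionals p(next | prev); each row sums to 1.
P = np.array([[0.9, 0.1],    # after 'a': p(a)=0.9, p(b)=0.1
              [0.4, 0.6]])   # after 'b': p(a)=0.4, p(b)=0.6

def log_likelihood(seq):
    """log p(x) = sum_t log p(x_t | x_{t-1}): each step conditions on the
    history (here just the previous token, for simplicity)."""
    idx = [vocab.index(c) for c in seq]
    return sum(np.log(P[i, j]) for i, j in zip(idx[:-1], idx[1:]))

ll = log_likelihood("aaab")
```

A real model (RNN, WaveNet, GPT) replaces the table lookup with a network conditioned on the full history, but the factorization and the log-likelihood sum are the same.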
66. GPT / GPT-2 (2019) — FSDL 2022
• Generative Pre-trained Transformer.
• Decoder-only (uses masked self-attention).
• Trained on 8M web pages; the largest model is 1.5B parameters.
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Slide credit: Karayev, 2022
68. CLIP: image-text pair vs the rest
• 400M image-text pairs crawled from the Internet.
• A Transformer encodes the text; a ResNet or Vision Transformer encodes the image.
• Contrastive training: maximize the cosine similarity of correct image-text pairs (32K pairs per batch).
https://arxiv.org/pdf/2103.00020.pdf
Slide credit: Karayev, 2022
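The CLIP-style contrastive objective can be sketched in NumPy (toy dimensions; random vectors stand in for the two encoders' outputs): within a batch, the i-th image should match the i-th text against all the others, in both directions:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, d = 4, 16
img = rng.normal(size=(batch, d))
txt = img + 0.01 * rng.normal(size=(batch, d))   # pretend the encoders agree

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine-similarity logits between every image and every text in the batch.
logits = l2norm(img) @ l2norm(txt).T / 0.07      # temperature 0.07

def cross_entropy(logits, targets):
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(targets)), targets]))

targets = np.arange(batch)                       # the diagonal pairs are correct
# Symmetric loss: classify the right text for each image, and vice versa.
loss = (cross_entropy(logits, targets) + cross_entropy(logits.T, targets)) / 2
```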
69. Unsupervised learning: a few more points
• No external labels, but rich training signals (thousands of bits per sample, as opposed to a few bits in supervised learning). A few techniques:
  • Compressing the data as much as possible with little loss.
  • Energy-based: pull down the energy of observed data, pull up everything else.
  • Filling in the missing slots (aka predictive learning, self-supervised learning).
• We have not covered unsupervised learning on graphs (e.g., DeepWalk, GPT-GNN), but the general principles should hold.
• Question: multiple objectives, or no objective at all?
• Question: emergence from many simple interacting elements?
Liu, Xiao, et al. "Self-supervised learning: Generative or contrastive." arXiv preprint arXiv:2006.08218 (2020).
Assran, Mahmoud, et al. "Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture." arXiv preprint arXiv:2301.08243 (2023).
70. A tipping point: Foundation Models
• A foundation model is a model trained at broad scale that can be adapted to a wide range of downstream tasks.
• Key ingredients: scale, and the ability to perform tasks beyond training.
Picture taken from (Bommasani et al., 2021). Slide credit: Samuel Albanie, 2022
72. Two key ideas underpin foundation models
Emergence
• System behaviour is implicitly induced rather than explicitly constructed.
• A cause of scientific excitement, and of anxiety about unanticipated consequences.
Homogenisation
• Consolidation of the methodology for building machine learning systems across many applications.
• Provides strong leverage for many tasks, but also creates single points of failure.
Slide credit: Samuel Albanie, 2022
73. Homogenisation
• Learning instead of algorithms: many applications can be powered by the same learning algorithm. ⇒ Feature engineering
• Deep architecture engineering: instead of hand-crafting features, the same architecture could be used widely. ⇒ Architecture engineering
• The modern Transformer is universal: same architecture, just different data! ⇒ Data & prompt engineering
Slide credit: Samuel Albanie, 2022
74. Homogenisation – Deepr
Concept: Stringify() – everything as a string. A medical record (visits/admissions separated by time gaps) is (1) turned into a sequence of words, (2) embedded as vectors, (3) passed through convolutions for motif detection, (4) max-pooled into a record vector, and (5) used for prediction at the prediction point.
Nguyen, Phuoc, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. "Deepr: a convolutional net for medical records." IEEE Journal of Biomedical and Health Informatics 21, no. 1 (2016): 22-30.
77. 1960s-1990s: hand-crafted rules, domain-specific, logic-based. High in reasoning, but can't scale and fails on unseen cases.
1990s-2020s: machine learning, general purpose, statistics-based. Low in reasoning, needs lots of data, less adaptive, little explanation.
2020s-2030s: learning + reasoning, general purpose, human-like. Has contextual and common-sense reasoning, requires less data, adapts to change, explainable.
Photo credit: DARPA
78. From ML to machine reasoning
Example: object detection (cubes, spheres, cylinders in cyan, brown, orange, red), followed by reasoning over the detected objects.
Slide credit: Tin Pham
79. What is missing in deep learning?
• Modern neural networks are good at interpolating:
  → data-hungry, needing to cover all variations and smooth local manifolds;
  → little systematic generalization (novel combinations).
• Lack of human-perceived reasoning capability.
• Lack of logical inference.
• Lack of a natural mechanism to incorporate prior knowledge, e.g., common sense.
• No built-in causal mechanisms.
80. Machine reasoning
"Reasoning is concerned with arriving at a deduction about a new combination of circumstances." — Leslie Valiant
"Reasoning is to deduce new knowledge from previously acquired knowledge in response to a query." — Leon Bottou
81. Machine reasoning
• A two-part process:
  • manipulate previously acquired knowledge
  • to draw novel inferences or answer new questions.
• Example:
  • Premises: A is to the left of B; B is to the left of C; D is in front of A; E is in front of C.
  • Conclusion: what is the relation between D and E?
Slide credit: Tin Pham
82. Geometry example
Premises (existing knowledge):
• AM = MN (1)
• BM = MC (2)
• ∠AMB = ∠NMC (3)
Conclusions to establish: AB = CN? AB // CN?
Solution:
From (1), (2), (3) ⇒ △AMB ≅ △NMC (4) ⇒ AB = CN.
From (1), (2) ⇒ ABNC is a parallelogram (5) → AB // CN.
Slide credit: Tin Pham
83. Is reasoning always formal/logical?
• "When we observe a visual scene, when we hear a complex sentence, we are able to explain in formal terms the relation of the objects in the scene, or the precise meaning of the sentence components.
• However, there is no evidence that such a formal analysis necessarily takes place: we see a scene, we hear a sentence, and we just know what they mean.
• This suggests the existence of a middle layer, already a form of reasoning, but not yet formal or logical." — Leon Bottou
Bottou, Léon. "From machine learning to machine reasoning." Machine Learning 94.2 (2014): 133-149.
84. Why not just neural reasoning?
Central to reasoning are composition rules that guide the combination of modules to address new tasks.
Bottou:
• Reasoning is not necessarily achieved by making logical inferences.
• There is a continuity between [algebraically rich inference] and [connecting together trainable learning systems].
→ Neural networks are a plausible candidate!
→ But they are still not a natural way to represent abstract discrete concepts and relations.
Hinton/Bengio/LeCun: neural networks can do everything!
The rest: not so fast! ⇒ Neurosymbolic systems!
Bottou, Léon. "From machine learning to machine reasoning." Machine Learning 94.2 (2014): 133-149.
85. Learning to reason
• Learning is to improve oneself through experience ~ acquiring knowledge & skills.
• Reasoning is to deduce knowledge from previously acquired knowledge in response to a query (or cues).
• Learning to reason is to improve the ability to decide whether a knowledge base entails a predicate.
  • E.g., given a video f, determine whether the person with the hat turns before singing.
• Hypotheses:
  • Reasoning is just-in-time program synthesis.
  • It employs conditional computation.
  • It minimises an energy function, or maximises the compatibility between input (prompt) and output.
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM (JACM) 44.5 (1997): 697-725. (Dan Roth; ACM Fellow; IJCAI John McCarthy Award)
86. Reasoning as a skill
• Reasoning as a prediction skill that can be learnt from data.
• Question answering as zero-shot learning.
• Neural network operations for learning to reason:
  • Attention & Transformers.
  • Dynamic neural networks, conditional computation & differentiable programming.
  • Reasoning as iterative representation refinement & query-driven program synthesis and execution.
  • Compositional attention networks.
  • Reasoning as neural module networks.
88. The two approaches to neural reasoning
• Implicit chaining of predicates through recurrence:
  • Step-wise, query-specific attention to relevant concepts & relations.
  • Iterative concept refinement & combination, e.g., through a working memory.
  • The answer is computed from the last memory state & the question embedding.
• Explicit program synthesis:
  • There is a set of modules, each performing a pre-defined operation.
  • The question is parsed into a symbolic program.
  • The program is implemented as a computational graph constructed by chaining separate modules.
  • The program is executed to compute an answer.
89. MACNet: Memory-Attention-Composition (reasoning by progressive refinement of selected data)
Hudson, Drew A., and Christopher D. Manning. "Compositional attention networks for machine reasoning." arXiv preprint arXiv:1803.03067 (2018).
90. LOGNet: relational object reasoning with language binding
• Key insight: reasoning is the chaining of relational predicates to arrive at a final conclusion.
→ Needs to uncover spatial relations, conditioned on the query.
→ Chaining is query-driven.
→ Objects and language need binding.
→ Object semantics is query-dependent.
→ Everything is end-to-end differentiable.
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, "Dynamic Language Binding in Relational Visual Reasoning", IJCAI'20.
91. LOGNet for VQA
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, "Dynamic Language Binding in Relational Visual Reasoning", IJCAI'20.
93. What about the Transformer?
• Reasoning as (free-)energy minimisation.
• The classic belief propagation algorithm is a minimization algorithm for the Bethe free energy!
• The Transformer's relational, iterative state refinement makes it a great candidate for implicit relational reasoning.
Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the Bethe free energy." Advances in Neural Information Processing Systems. 2003.
Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).
95. Module networks (reasoning by constructing and executing neural programs)
• Reasoning as laying out modules to reach an answer.
• Composable neural architecture → the question is parsed as a program (a layout of modules).
• A module is a function (x → y); it could be a sub-reasoning process ((x, q) → y).
https://bair.berkeley.edu/blog/2017/06/20/learning-to-reason-with-neural-module-networks/
96. Program execution
• Works on an object-based visual representation.
• An intermediate set of objects is represented by a vector: an attention mask over all the objects in the scene. For example, Filter(Green_cube) outputs a mask (0, 1, 0, 0).
• The output mask is fed into the next module (e.g., Relate).
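A toy sketch of mask-based program execution (Python; a hypothetical scene and modules, in the spirit of neural module networks but with hard 0/1 masks instead of soft attention, for clarity):

```python
import numpy as np

# A hypothetical scene of four objects with symbolic attributes.
scene = [{'shape': 'sphere', 'color': 'red'},
         {'shape': 'cube',   'color': 'green'},
         {'shape': 'cube',   'color': 'blue'},
         {'shape': 'sphere', 'color': 'green'}]

def filter_attr(mask, key, value):
    """Module Filter: keep only currently attended objects matching an attribute."""
    return np.array([m * (obj[key] == value) for m, obj in zip(mask, scene)])

def count(mask):
    """Module Count: an answer-producing module."""
    return int(mask.sum())

# "How many green cubes?" parsed as: Filter(green) -> Filter(cube) -> Count.
mask = np.ones(len(scene))                   # attend to everything initially
mask = filter_attr(mask, 'color', 'green')   # -> (0, 1, 0, 1)
mask = filter_attr(mask, 'shape', 'cube')    # -> (0, 1, 0, 0)
answer = count(mask)                         # 1
```

In the neural version each module is a small trainable network and the masks are soft attention weights, but the chaining of module outputs is the same.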
97. Source: @rao2z
What is about reasoning in LLMs?
• LLMs have HUGE associative memory.
• With “Let’s think step-by-step”?
• With “Chain of Thought”?
• Or just pattern recognition over chains of
reasoning?
• Finding short-cuts that approximate provably
correct reasoning procedures
• => Very poor OOD generalisation.
98. A general framework
Explicit Knowledge Graphs
+
Large Language Models
(implicit common sense knowledge,
associative database)
100. Learning a Turing machine
→ Can we learn a (neural)
program that learns to
program from data?
101. Memory networks • Input is a set → Load into memory,
which is NOT updated.
• State is an RNN with attention reading
from inputs
• Concepts: Query, key and content +
Content addressing.
• Deep models, but constant path length
from input to output.
• Equivalent to an RNN with a shared input set.
• => Seq2seq with attention is a Memory
Network (Memory = input seq).
• => Transformer is a kind of Memory
Network with Parallel Memory Update!
Sukhbaatar, Sainbayar, Jason Weston, and Rob
Fergus. "End-to-end memory networks." Advances in
neural information processing systems. 2015.
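The query/key/content addressing scheme above can be sketched as a soft attention read; the keys, values, and query below are toy numbers, not a trained model.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def memory_read(query, keys, values):
    # Content addressing: score the query against every key,
    # then return the attention-weighted sum of the stored contents.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    read = [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]
    return read, weights

keys   = [[1.0, 0.0], [0.0, 1.0]]    # memory keys (loaded once, not updated)
values = [[10.0, 0.0], [0.0, 10.0]]  # memory contents
read, weights = memory_read([5.0, 0.0], keys, values)  # query matches key 0
# weights concentrate on slot 0, so the read vector is close to values[0]
```

Seq2seq attention is exactly this read over the encoded input sequence; Transformers additionally update the memory slots in parallel at every layer.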
102. MANN: Memory-Augmented Neural Networks
(a constant path length)
• Long-term dependency
• E.g., outcome depends on the far past
• Memory is needed (e.g., as in LSTM)
• => This is what makes Transformers powerful!
• Complex program requires multiple computational steps
• Each step can be selective (attentive) to certain memory cell
• Operations: Encoding | Decoding | Retrieval
103. MANN: Neural Turing machine (NTM)
(simulating a differentiable Turing machine)
• A controller that takes
input/output and talks to an
external memory module.
• Memory has read/write
operations.
• The main issue is where to write,
and how to update the memory
state.
• All operations are differentiable.
Source: rylanschaeffer.github.io
104. NTM unrolled in time with LSTM as controller
#Ref: https://medium.com/snips-ai/ntm-lasagne-a-library-for-neural-turing-machines-in-lasagne-2cdce6837315
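The differentiable write step behind "where to write, and how to update" can be sketched as the standard erase-then-add update; the memory, weighting, erase and add vectors below are illustrative toy values.

```python
def ntm_write(memory, w, erase, add):
    # Differentiable write: each row i is first partially erased,
    # then blended with the add vector, both gated by the soft weight w[i]:
    #   M[i][d] <- M[i][d] * (1 - w[i] * erase[d]) + w[i] * add[d]
    return [[m * (1.0 - w[i] * e) + w[i] * a
             for m, e, a in zip(row, erase, add)]
            for i, row in enumerate(memory)]

M = [[1.0, 1.0],
     [1.0, 1.0]]
w = [1.0, 0.0]   # write weighting (hard here, for clarity; soft in an NTM)
M = ntm_write(M, w, erase=[1.0, 1.0], add=[5.0, 7.0])
# Row 0 is fully rewritten to [5.0, 7.0]; row 1 is untouched.
```

Because every operation is a smooth function of `w`, `erase` and `add`, gradients flow through the write and the controller can be trained end-to-end.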
105. MANN for reasoning
• Three steps:
• Store data into memory
• Read query, process sequentially, consult memory
• Output answer
• Behind the scene:
• Memory contains data & results of intermediate steps
• Drawbacks of current MANNs:
• No memory of controllers → Less modularity and
compositionality when query is complex
• No memory of relations → Much harder to chain predicates.
Source: rylanschaeffer.github.io
106. Failures of item-only MANNs for
reasoning
• Relational representation is NOT stored → Can’t reuse later in the
chain
• A single memory of items and relations → Can’t understand how
relational reasoning occurs
• The memory-memory relationship is coarse, since it is represented as
either a dot product or a weighted sum.
107. Self-attentive associative memories (SAM)
Learning relations automatically over time
Hung Le, Truyen Tran, Svetha Venkatesh, “Self-
attentive associative memory”, ICML'20.
109. Neural nets are
powerful but we still
want:
• Learning with less data, and zero-shot
learning;
• Generalization of the solutions to
unseen tasks and unforeseen data
distributions;
• Explainability by construction;
https://ibm.github.io/neuro-symbolic-ai/events/ns-
workshop2023
Self-Aware Learning
• Deeper learning for challenging tasks
• Integrating continuous and symbolic
representations
• Diversified learning modalities
Credit: Yolanda Gil, Bart Selman
AI to Understand Human
Intelligence
• 5 years: AI systems could be designed to
study psychological models of complex
intelligent phenomena that are based on
combinations of symbolic processing and
artificial neural networks.
110. Symbolic forms
• Words in Wordnet
• Syntax in NLP & Code
• Logic, propositional and first-order
• Variables, equations
• Knowledge structure: Semantic nets, knowledge graphs
• Graphical models: Bayesian networks, Markov random fields, Markov
logic networks.
• Function (names), indirection, pointer in C/C++.
111. Henry Kautz's taxonomy (1)
• Symbolic Neural symbolic—is the current approach of many neural models in
natural language processing, where words or subword tokens are both the
ultimate input and output of large language models. Examples include BERT,
RoBERTa, and GPT-3.
Kautz, H., 2022. The third AI summer: AAAI Robert S. Engelmore memorial lecture. AI
Magazine, 43(1), pp.105-125. https://en.wikipedia.org/wiki/Neuro-symbolic_AI
112. Representing Context and Structure
Known as contextualized language models
Devlin et-al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding” 2019
Slide credit: Pacheco & Goldwasser, 2021
113. What does BERT learn?
Emergent linguistic structure in artificial neural networks trained by self-supervision. Manning et al., PNAS 2020
Linguistic structure emerges without direct supervision
Slide credit: Pacheco & Goldwasser, 2021
114. Using BERT for Reasoning Tasks
• BERT-based near-human performance on Winograd Schema
WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale. Sakaguchi et-al,
AAAI’20
Can “thinking-slow” tasks be accomplished with “thinking-fast” systems?
Not a panacea (McCoy et al ACL’19, others), often relies on simple heuristics when
learning complex decisions
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference. McCoy et-al, ACL’19
World Knowledge and
Commonsense inferences
reflected in coref
decisions
Slide credit: Pacheco & Goldwasser, 2021
115. Henry Kautz's taxonomy (2)
• Symbolic[Neural]—is exemplified by
AlphaGo, where symbolic techniques are
used to call neural techniques. In this case,
the symbolic approach is Monte Carlo tree
search and the neural techniques learn
how to evaluate game positions.
116. Henry Kautz's taxonomy (3)
• Neural | Symbolic—uses a neural architecture to interpret perceptual data as
symbols and relationships that are reasoned about symbolically. The Neural-
Concept Learner is an example.
117. End-to-End Module Networks
• Construct the program internally
• The two parts are jointly learnable
End-to-End Module Networks, Hu et al., ICCV'17
Slide credit: Vuong Le
118. Henry Kautz's taxonomy (4)
• Neural: Symbolic → Neural—relies on symbolic reasoning to generate or label
training data that is subsequently learned by a deep learning model, e.g., to train
a neural model for symbolic computation by using a Macsyma-like symbolic
mathematics system to create or label examples.
Lample, Guillaume, and François Charton. 2020.
“Deep Learning For Symbolic Mathematics.”
In Proceedings of the International Conference on
Learning Representations.
119. Henry Kautz's taxonomy (5)
• Neural_{Symbolic}—uses a
neural net that is generated
from symbolic rules. An
example is the Neural
Theorem Prover, which
constructs a neural network
from an AND-OR proof tree
generated from knowledge
base rules and terms. Logic
Tensor Networks also fall
into this category.
120. Henry Kautz's taxonomy (6)
• Neural[Symbolic]—allows a
neural model to directly call a
symbolic reasoning engine, e.g.,
to perform an action or evaluate
a state. An example would be
ChatGPT using a plugin to query
Wolfram Alpha.
121. LLMs for
calling tools
• Information retriever
• Symbolic/math module & code interpreters
• Virtual agents
• Robotic arms. See https://palm-e.github.io/
Credit: Khattab et al
122. Symbols via Indirection
Z = X + Y (binding symbols with values: X ← 1, Y ← 2, hence Z ← 3)
Pointer in Computer Science
Information binding in the brain
https://www.linkedin.com/pulse/unsolved-problems-ai-part-2-binding-problem-eberhard-schoeneburg/
Indirection binds two objects together and uses one to refer to the other.
Slide credit: Kha Pham
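A minimal sketch of indirection as binding: the abstract "program" Z = X + Y reasons over symbols, while a binding table supplies the concrete values (the numbers are the slide's example).

```python
def evaluate(expr, bindings):
    # Reason over symbols; the binding table maps each symbol to a value.
    left, op, right = expr
    assert op == "+"
    return bindings[left] + bindings[right]

program = ("X", "+", "Y")                    # the abstract rule: Z = X + Y
z  = evaluate(program, {"X": 1, "Y": 2})     # bind X <- 1, Y <- 2, so Z = 3
z2 = evaluate(program, {"X": 10, "Y": 20})   # re-bind: same rule, new result
```

Because the rule never touches raw values directly, re-binding generalises it to new data without changing the rule itself, which is the point of indirection.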
123. Indirection is a key design principle in
software engineering
[Diagram: Client → Indirection Layer → Target]
https://medium.com/@nmckinnonblog/indirection-fba1857630e2
Indirection removes direct coupling
between units and promotes:
• Extensibility
• Control
• Evolvability
• Encapsulation of code and design
complexity
Every computer science
problem can be solved with a
higher level of indirection.
Andrew Koenig, Butler Lampson, David J. Wheeler
Slide credit: Kha Pham
124. Leveraging indirection to improve OOD
generalization
• Why indirection? Indirection binds concrete data to abstract
symbols, and reasoning on symbols is likely to improve generalization.
• What to bind? Concrete information of the data, e.g.,
representations, functional relations between data, etc.
(functional indirection vs. structural indirection).
• How to bind? During indirection, some concrete information of the
data is ignored, so we must decide what to maintain, i.e., the
invariances across data.
→ Indirection connects invariance and symbolic approaches.
Slide credit: Kha Pham
125. Structural Indirection: InLay
3/07/2023 125
• InLay simultaneously leverages indirection and data internal relationships to
construct indirection representations, which respect the similarities between
internal relationships.
• InLay connects invariance and symbolic approaches:
• InLay constructs indirection representations from a fixed set of symbolic
vectors.
• InLay assumes two invariances:
• The data internal relationships are invariant through indirection.
• The set of symbolic vectors to compute indirection representations is
invariant across train and test samples.
Slide credit: Kha Pham
Pham, K., Le, H., Ngo, M. and Tran, T., Improving Out-of-distribution
Generalization with Indirection Representations. In The Eleventh
International Conference on Learning Representations.
126. Structure-Mapping Theory (SMT)
• Improves on previous theories of analogy, e.g., Tversky's contrast
theory, which assumed that an analogy is stronger the more attributes
the base and target share.
• SMT [1] argued that it is not object attributes
which are mapped in an analogy, but relationships
between objects.
[Figure: Rutherford's analogy. X12 star system ↔ Solar system
(literal similarity); Solar system ↔ Hydrogen atom (analogy).]
                    No. attributes mapped   No. relations mapped
Literal similarity  Many                    Many
Analogy             Few                     Many
[1] Gentner, Dedre. "Structure-mapping: A theoretical framework for analogy." Cognitive science 7.2 (1983): 155-170.
Slide credit: Kha Pham
127. Structure-Mapping Theory (SMT) (cont.)
Which will be chosen to be mapped in an analogy?
Systematicity Principle: A predicate that belongs to a mappable system of mutually
interconnecting relationships is more likely to be imported into the target than is an isolated
predicate.
[Figure: the Solar system and the hydrogen atom each carry the
predicates Distance, Attractive force, Revolves around, Color, and
Temperature; the mutually interconnecting relations (distance,
attractive force, revolves around) are mapped, while the isolated
attributes (color, temperature) are not.]
Slide credit: Kha Pham
128. Model architecture
• Concrete data representation is viewed as a complete graph
with weighted edges.
• The indirection operator maps this graph to a symbolic graph with
the same edge weights; the vertices, however, are fixed and trainable.
• This symbolic graph is propagated, and the updated node
features are the indirection representations
Slide credit: Kha Pham
129. Experiments on IQ datasets – RAVEN dataset
An IQ problem in RAVEN [1] dataset
Average test accuracies (%) without/with InLay in
different OOD testing scenarios on RAVEN:
Model         Accuracy
LSTM          30.1 / 39.2
Transformers  15.1 / 42.5
RelationNet   12.5 / 46.4
PrediNet      13.8 / 15.6
[1] Zhang, Chi, et al. "Raven: A dataset for relational and analogical visual reasoning."
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
• The original RAVEN paper proposes different OOD testing
scenarios, in which models are trained on one configuration
and tested on another (but related) configuration.
Slide credit: Kha Pham
130. Experiments on OOD image classification tasks
[Figure: OOD image classification, in which test images are
distorted ("Dog" → "Dog?").]
• When test images are injected with kinds of distortion
other than those seen in training, deep neural networks
may fail drastically at image classification. [1]
[1] Robert Geirhos, Carlos RM Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and
Felix A Wichmann. Generalisation in humans and deep neural networks. Advances in neural
information processing systems, 31, 2018.
Average test accuracies (%) without/with InLay of Vision
Transformers (ViT) on different types of distortions:
Dataset   ViT accuracy
SVHN      65.9 / 68.8
CIFAR10   38.2 / 43.1
CIFAR100  17.1 / 20.4
Slide credit: Kha Pham
131. Here "physics" refers to empirical or theoretical laws
that exist in nature.
[Chart: #Papers on PIML / physics-informed NN per year, 2015-2022,
growing exponentially (log-scale axis from 1 to 100,000;
exponential fit R² = 0.989).]
133. Physics invariance
• Newton laws
• Symmetry
• Conservation laws
• Noether’s Theorem linking symmetry and
conservation.
First page of Emmy Noether's
article "Invariante
Variationsprobleme" (1918).
Source: Wikipedia
134. ML, data & physics
• Data collection/annotation for ML is expensive
• ML solutions don’t respect symmetries and conservation laws
• Physics laws are universal (up to scale) | ML only generalizes
in-distribution.
Karniadakis, George Em, et al. "Physics-informed machine learning." Nature Reviews Physics 3.6 (2021): 422-440.
135. Embedding physics into ML
https://medium.com/@zhaoshuai1989/why-do-we-need-physics-informed-machine-learning-piml-d11fe0c4436c
136. Physics guides neural architecture
• Physics-informed neural networks (PINN)
Figure from talk by Perdikaris & Wang, 2020.
137. Physics guides learning dynamics
• Physics-informed neural networks (PINN)
Figure from talk by Perdikaris & Wang, 2020.
138. Case study: Damped harmonic oscillation
Source: https://benmoseley.blog/my-research/so-what-is-a-physics-informed-neural-network/
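A minimal sketch of the physics term a PINN adds to its loss for this case study, u'' + 2·delta·u' + omega0²·u = 0. Instead of a trained network with autodiff, we plug in the known underdamped analytic solution and check its residual with finite differences; the constants are illustrative, not from the blog post.

```python
import math

delta, omega0 = 0.5, 2.0                     # damping and natural frequency
omega = math.sqrt(omega0 ** 2 - delta ** 2)  # underdamped oscillation freq

def u(t):
    # Known analytic solution of u'' + 2*delta*u' + omega0^2 * u = 0
    return math.exp(-delta * t) * math.cos(omega * t)

def residual(t, h=1e-3):
    # The ODE residual, with derivatives taken by central differences
    # (a real PINN would use autodiff on the network output instead).
    d1 = (u(t + h) - u(t - h)) / (2 * h)
    d2 = (u(t + h) - 2 * u(t) + u(t - h)) / (h * h)
    return d2 + 2 * delta * d1 + omega0 ** 2 * u(t)

# The physics loss is the mean squared residual over collocation points;
# it is ~0 for the true solution, which is what training drives towards.
phys_loss = sum(residual(t / 10) ** 2 for t in range(100)) / 100
```

During PINN training this term is added to the ordinary data-fit loss, so the network is pulled towards functions that satisfy the physics even where no data exists.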
139. Case study: COVID-19 in VN 2021
• Vietnam failed to contain the new exponential growth
due to the Delta variant.
• The cost: 20 thousand lives within 3 months!!
• At the peak, the daily mortality ~ Vietnam War’s
rate.
• What worked in 2020 didn’t in 2021.
140. SIR family for pandemics
• N = Population
• S = Susceptible
• I = Infectious
• R = Recovered
Source: Wikipedia
Basic reproduction number
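The SIR dynamics above can be sketched with a simple Euler integration; beta, gamma, and the initial condition are illustrative, not fitted to any real outbreak.

```python
def sir_step(s, i, r, beta, gamma, n, dt):
    ds = -beta * s * i / n               # susceptibles get infected
    di = beta * s * i / n - gamma * i    # infectious grow, then recover
    dr = gamma * i                       # recovered accumulate
    return s + ds * dt, i + di * dt, r + dr * dt

N = 1_000_000.0
s, i, r = N - 10.0, 10.0, 0.0            # seed with 10 cases
beta, gamma = 0.3, 0.1                   # basic reproduction number R0 = 3
for _ in range(2000):                    # 200 days with dt = 0.1
    s, i, r = sir_step(s, i, r, beta, gamma, N, dt=0.1)
# S + I + R stays equal to N throughout, and the epidemic burns out.
```

With R0 > 1 the infection grows until susceptibles are depleted below N/R0, then declines; interventions effectively change beta over time, which is why fixed-parameter SIR struggles in practice.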
141. Covid-19 infections
• SIR: closed-form solutions are hard to calculate
• Parameters change over time due to intervention → Need more flexible
framework.
• Solution: Richards equation → Richards curve | Gompertz curve
• Task: 10-20 data points → Extrapolate 150 more.
142. Model design
• Remember often we have only 20-30 highly correlated data points to
learn from!
• Model is sum of 2-3 “waves” – each is a 3-param Gompertz curve
• Height of the peak
• Location of the peak
• Scale of the wave (the effective width)
• The number of waves accounts for the observed waves, plus some
hypothetical future waves.
• The model can be thought of as a special neural network: each hidden
unit is a wave, but with a Gompertz-based kernel.
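The wave-sum model can be sketched directly; the two hypothetical waves below use made-up (height, peak location, scale) parameters, not the fitted values from the case study.

```python
import math

def gompertz(t, height, peak, scale):
    # Cumulative Gompertz curve: steepest growth (the daily-case peak)
    # occurs at t = peak, and the curve saturates at `height`.
    return height * math.exp(-math.exp(-(t - peak) / scale))

def model(t, waves):
    # Total cumulative cases = sum of the waves' cumulative curves.
    return sum(gompertz(t, *wave) for wave in waves)

waves = [(10_000, 30, 7),    # wave 1: (height, peak day, scale)
         (25_000, 80, 10)]   # wave 2
daily = [model(t + 1, waves) - model(t, waves) for t in range(150)]
total = model(150, waves)    # approaches the 35,000 combined height
```

Fitting the three parameters per wave to 20-30 early data points, with priors on height and scale, then extrapolating the curve forward, mirrors the forecasting task described above.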
143. Estimating the model priors
• Impossible to know without assumptions!
• Need priors on wave size & possibly, the scale (e.g., min-max)
• One solution:
• Look for other countries, with adjustment in population size.
• Hopefully the culture, economic structure & actions are similar.
• It depends on:
• The virus variant (original != Delta != Omicron)
• Health/border capacity (closed border + lockdown in the beginning)
• Vaccination coverage (80% tended to be the threshold for opening)
• Total cases/population.
144. Case of HCM City
[Chart: estimated Covid-19 deaths in Ho Chi Minh City, 3/07/21 to
30/10/21 (recorded daily deaths, estimated daily deaths, and actual
cumulative deaths). Annotations: prediction made on 11/8; peak in
daily deaths on 20-21/8; total cases marked through 16/10.]
147. In 2022-23, DL reached a new height: GPT-4,
PaLM-E, GATO, etc.
148. Major remaining problems of DL
• Massive associative machine
→ Lacks causality priors; prone to learning the wrong things, or
working for the wrong reasons
→ Overconfident for the wrong reasons (e.g., prone to adversarial attacks)
→ Exploits short-cuts => poor OOD generalisation
→ Sample inefficient
→ Approximates reasoning patterns, not from first principles
• Inference separated from learning
→ No built-in adaptation other than retraining
→ Catastrophic forgetting
• Limited theoretical understanding
149. Are limitations inherent?
• YES, statistical systems tend to memorize data and find short-cuts.
• We need lots of data to cover all possible variations, hence lots of compute.
• But aren’t we great copiers?
• NO, neural nets were founded on the basis of distributed
representation and parallel processing. These are robust, fast and
energy efficient.
• We still need to find “binding” tricks that do all sorts of things without relying
on statistical training signals + backprop.
150. Dimensions of progress
• Continuation of current works/paths
• Expansion/optimisation
• Industrialisation: Scale up & scale out
• Challenge fundamental assumptions
• DL as part of more holistic solution to Human-Level AI (HLAI)
• Dealing with the unexpected: Uncertainty, safety, security
151. Continuation
• Enabling techs: Data, compute, network
• Work with noisy quantum computing (which will take time to mature)
• DL fundamentals: Representation, learning & inference
• Rep = data rep + computational graph + symmetry
• Learning as pre-training to extract as much knowledge from data as possible
• Learning as on-the-fly inference (Bayesian, hypernetwork/fast weight)
• Extreme inference = dynamic computational graph on-the-fly.
153. Expansion/optimisation
• New inductive biases (for vision, NLP, living things, science, social AI,
ethical AI)
• Cutting the statistical/associative short-cuts
• Shifting from feature space to function space.
• Pushing for high-level analogy (rather than just feature-based
kernel/template matching)
• Binding, indirection, symbols
• Injection of knowledge into models.
154. Expansion (2)
• Expanding to classical AI areas (planning, reasoning, knowledge
representation, symbol manipulation).
• Needs to solve symbol grounding for that to happen.
• Physics-informed neural networks (e.g., my work in Covid-19
forecasting)
• Social dimensions, human-in-the-loop
155. Industrialisation: Scaling - the success formula thus far
Data + knowledge + compute + generic scalable algorithms
156. Scaling - Rich Sutton’s Bitter Lesson (2019)
“The biggest lesson that can be read from 70 years of AI research is
that general methods that leverage computation are ultimately the
most effective, and by a large margin. ”
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
“The two methods that seem to scale arbitrarily in this way
are search and learning.”
158. But …
• Scaling is like building a taller ladder to get to the Moon.
• We need rockets, and the science of escape velocity.
• The human brain is big (1e+14 synapses) but does exactly the
opposite: it maximizes entropy reduction using minimum energy
(think of the most efficient heat engine).
• Just 20W is enough for human-level intelligence!
• => It must use different principles, not just (sample-inefficient)
statistics!
• No need to take the computer's detour: analog -> digital/sequential
-> parallel analog simulation.
159. DL is part of Broad AI
Hochreiter, S., 2022.
Toward a broad AI.
Communications of
the ACM, 65(4), pp.56-
57.
160. DL is part of Integrated Intelligence
LeCun’s plan
https://ai.facebook.com/blog/yann-lecun-advances-in-ai-research/
Knowledge?
162. DL “accidental” history
Source: rikochet_band
1950s: Rosenblatt wired the first trainable perceptron, hyping AI up.
1970-1980s: Minsky and Papert almost killed it until Rumelhart et al. worked out high-school
math to train multi-layer perceptron.
1980-1990s: LeCun managed to get CNN work for something real.
1990s: RNN was proved to be Turing-equivalent. Schmidhuber got excited and bombarded the
field with lots of cool ideas.
1990s-2000s: But the models were shallow and hard to train. Almost no one worked on it for 2
decades until the Canadian mafia fought back with new tricks to train deeper models.
2010s: Accidentally, DL took off like a rocket, thanks to gamers.
2020s: Now DL works on everything, except for:
small data, shifted data, noisy data, artificially twisted data, deep stuffs,
exact stuffs, abstract stuffs, causal stuffs, symbolic stuffs, thinking stuffs, and
stuffs that no one knows how they work like consciousness.
2020s: DL believers got rich, and a new bunch of students got over trained.
164. Final words
• Deep neural networks are here to stay, maybe as part of the holistic solution to
human-level AI.
• Gradient-based learning is still without parallel.
• DL will be much more general/universal/versatile (e.g., dynamic architectures,
with Transformers as a relaxed approximation)
• Higher cognitive capabilities will be there, maybe with symbol manipulation
capacity.
• Better generalization capability (e.g., extreme)
• We have to deal with consequences of its own success.
• Negative effects; Jevons' paradox
• DL is now an industry, and is still going strong. But students may be over-fitted
to particular DL ways of thinking.
• The industry will need to keep the highly trained (overfitted) DL workforce busy!
165. Second
bitter lesson
Little priors (innateness?) + lots of
experiments > strong priors (theory of
intelligence) + trying to prove it.
=> Chomsky would disagree here.
Source: QuestionPro