Artificial intelligence in the post-deep learning era

AI in the post-deep learning era
Prof Truyen Tran
Head of AI, Health and Science
Applied AI Institute (A2I2), Deakin University
truyen.tran@deakin.edu.au
12/04/2024 1

“[By 2023] … Emergence of the generally agreed upon ‘next big thing’ in AI beyond deep learning.”
Rodney Brooks, Jan 2018

“Deep learning is going to be able to do everything”
(Geoff Hinton, Nov 2020)

The quest of modern AI: Learning a Turing machine
(Image: a mechanical Turing machine)
Can we design a (neural) program that learns to program from data?

Three kinds of AI
• Cognitive automation: encoding human abstractions → automating tasks normally performed by humans.
• The majority of current machine learning and symbolic AI falls into this category.
• Cognitive assistance: AI helps us make sense of the world (perceive, think, understand).
• This is where the true potential of AI lies.
• Some applications of ML fall into this category at present.
• Cognitive autonomy: artificial minds thrive independently of us and exist for their own sake.
• Science fiction!
François Chollet

12-year snapshot (timeline: 2012, 2016, Turing Award 2018, 2024)
Picture taken from Bommasani et al., 2021

Agenda
Part I: Deep learning
• Fundamentals
• Powers
• Limitations
Part II: AI post-DL
• System 1
• Towards System 2
• AI as a science

Depth refers to the number of steps between input and output.
(Figures: integrate-and-fire neuron, source: andreykurenkov.com; feature detector; block representation)

Inductive biases
• Neuron as trainable feature detector
• Depth + skip-connection
• Invariance/equivariance:
• Convolution (translation)
• Recurrence (time travel)
• Attention (permutation)
• Analogy:
• Kernel, case-based reasoning
• Attention, memory
(Figures: integrate-and-fire neuron, source: andreykurenkov.com; feature detector. Source: http://karpathy.github.io/assets/rnn/diags.jpeg)
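The permutation entry can be checked directly. Below is a minimal NumPy sketch, assuming single-head scaled dot-product self-attention with no positional encodings (the weight names `Wq`, `Wk`, `Wv` are illustrative). Shuffling the input rows shuffles the output rows in exactly the same way; in other words, self-attention is permutation-equivariant.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention, no positional encodings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ V

rng = np.random.default_rng(0)
n, d = 5, 4
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(n)
out = self_attention(X, Wq, Wk, Wv)
out_perm = self_attention(X[perm], Wq, Wk, Wv)

# Permuting the input rows permutes the output rows in exactly the same way.
print(np.allclose(out[perm], out_perm))  # True
```

Adding positional encodings breaks this symmetry, which is how Transformers recover word order.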

Neural networks as new electronic circuits
• Computational graph → Circuit
• Compositionality → Modular design
• Neuron as feature detector → SENSOR, FILTER
• Multiplicative gates → AND gate, Transistor, Resistor
• Attention mechanism → SWITCH gate
• Memory + forgetting → Capacitor + leakage
• Skip-connection → Short circuit
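To make the gate and capacitor analogies concrete, here is a minimal sketch of a gated memory cell in the spirit of an LSTM cell state. It is a simplification: the gates are fixed constants rather than functions of the input and state, and `leaky_memory` is a hypothetical name. The forget gate plays the role of leakage, and the input gate acts as a multiplicative switch on incoming charge.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def leaky_memory(inputs, forget_logit=0.0, input_logit=0.0):
    """Gated memory cell: c_t = f * c_{t-1} + i * x_t.

    f (forget gate) behaves like capacitor leakage; i (input gate) is a
    multiplicative switch on the incoming signal. Both are constants here;
    in an LSTM they are computed from the current input and state.
    """
    f, i = sigmoid(forget_logit), sigmoid(input_logit)
    c, trace = 0.0, []
    for x in inputs:
        c = f * c + i * x   # gate the input, leak the old charge, accumulate
        trace.append(c)
    return np.array(trace)

pulse = [1.0] + [0.0] * 5                       # one input pulse, then silence
print(leaky_memory(pulse))                      # decays by f = 0.5 every step
print(leaky_memory(pulse, forget_logit=10.0))   # f close to 1: almost no leakage
```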

What DL really means
• Functional view: Nested function composition, where the base functions are feature transformations.
• Depth is the number of transformation steps between raw data and output.
• State view: Layered data abstraction, distributed representation.
• Kernel view: Nested kernels, aka “glorified template matching”.
• Programming view: Differentiable programming, dynamic modular composition, trainable computational graphs.
• Memory view: An associative way to compress data/a world model into weights, and to decompress data when prompted.
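The functional view can be written down in a few lines. The NumPy sketch below assumes each base function is an affine map followed by a ReLU (`dense` and `compose` are illustrative helpers, not a library API). The network is their nested composition, and its depth is simply the number of composed transformations.

```python
from functools import reduce
import numpy as np

def dense(W, b):
    """One feature transformation: an affine map followed by a ReLU."""
    return lambda h: np.maximum(0.0, W @ h + b)

def compose(*layers):
    """Nested composition f_L(...f_2(f_1(x))), applied left to right."""
    return lambda x: reduce(lambda h, f: f(h), layers, x)

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 2]   # input dim, two hidden widths, output dim
layers = [dense(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

net = compose(*layers)
x = rng.normal(size=3)
y = net(x)
print("depth =", len(layers), "output shape =", y.shape)  # depth = 3 output shape = (2,)
```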

Advances in the past 10 years
• Architectures: CNN/RNN family, attention/Transformers, memory/differentiable programming, native data structures (sequence, tree, grid, graph, set), skip-connection, hypernetwork/fast weight.
• Training techniques (Param initialization, Adam, RMSProp, BERT, self-supervised learning, contrastive learning).
• Robustness (Dropout, normalization).
• Large models/compute (GPT-X, etc.).
• Deep generative models (VAE, GAN, Normalizing flows, Diffusion).
• New theoretical understanding (overparameterization, role of depth, nature of gradient learning).
• Hardware to support parallelization (GPU, TPU).

A tipping point: Foundation models
• A foundation model is a model trained at broad scale that can be adapted to a wide range of downstream tasks.
• Scale and the ability to perform tasks beyond training.
Picture taken from (Bommasani et al, 2021)
Slide credit: Samuel Albanie, 2022

Key concepts that make DL work
• Distributed representation
• Associative learning
• Layers + backprop
→ DL picks up contextual information easily, as long as there are signals (numerical or textual).
→ DL mimics training signals. At the extreme, it becomes indistinguishable from human expression.
→ Cross-modal association isn’t hard. Symbol grounding “appears” to be solved (it isn’t).
→ DL scales arbitrarily with data and compute (really key for modern AI).

DL works on almost all modalities
Signals, strings, tables

“Software 2.0 is written in neural network weights”
Andrej Karpathy, Nov 2017

Why is DL so powerful?
Practical
• Generality: Applicable to many domains.
• Competitive: DL is hard to beat as long as there is data to train on.
• Scalability: DL gets better with more data, and it is very scalable.
Theoretical
• Expressiveness: Neural nets can approximate any continuous function (on a bounded domain) to arbitrary accuracy.
• Learnability: Neural nets are trained easily.
• Generalisability: Neural nets generalize surprisingly well to unseen data.
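Expressiveness can be illustrated even without gradient training. The sketch below uses NumPy and a random-features network, a swapped-in simplification in which the hidden weights are random and frozen and only the output layer is fitted by least squares. With 100 tanh units it approximates sin(x) on [-π, π].

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)[:, None]   # 200 training inputs, shape (200, 1)
y = np.sin(x).ravel()

# One hidden layer of 100 tanh units with random, frozen input weights.
n_hidden = 100
W = rng.normal(scale=2.0, size=(1, n_hidden))
b = rng.uniform(-np.pi, np.pi, size=n_hidden)
H = np.tanh(x @ W + b)                          # hidden features, shape (200, 100)

# Only the output layer is fitted, by linear least squares.
w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
mse = np.mean((H @ w_out - y) ** 2)
print(f"training MSE: {mse:.2e}")
```

A good fit on the training interval says nothing, by itself, about behaviour outside that interval.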

Why are deep generative models (DGMs) so powerful?
DGMs are compression engines
Prompting is conditioning for the (preference-guided) decompression.
DGMs are approximate program databases
Prompting is retrieving an approximate program that takes input and delivers output.
DGMs are world models
We can live entirely in simulation!
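The compression and conditioning views correspond to two standard identities, sketched below for an autoregressive model p_θ over tokens x = (x_1, …, x_T). This is a generic formulation, not necessarily the one intended on the slide. By Shannon's source-coding argument, a model that assigns probability p_θ(x) to data x yields a code of about -log₂ p_θ(x) bits (arithmetic coding gets within about 2 bits of this), and prompting with a context c means decoding from the conditional distribution.

```latex
\ell_\theta(x) = -\log_2 p_\theta(x) = -\sum_{t=1}^{T} \log_2 p_\theta(x_t \mid x_{<t}) \ \text{bits},
\qquad
p_\theta(y \mid c) = \prod_{t=1}^{|y|} p_\theta(y_t \mid c, y_{<t}).
```

Higher likelihood means shorter codes, and a prompt c changes which continuations y are cheap to decode.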

The power comes from arbitrary scaling: Rich Sutton’s Bitter Lesson (2019)
“The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.”
“The two methods that seem to scale arbitrarily in this way are search and learning.”
http://www.incompleteideas.net/IncIdeas/BitterLesson.html

DL can learn everything … as long as we have the right architecture and clean data.

What are the limitations of deep learning?
• Modern neural networks are massive curve fitting.
• Good at interpolating
→ Data-hungry: they need data covering all variations to build smooth local manifolds.
→ Very sample/energy inefficient, low rate of data-knowledge conversion.
→ Little systematic generalization (novel combinations).
• Inference separated from learning
→ No built-in adaptation other than retraining.
→ Catastrophic forgetting.
• Lack of human-perceived reasoning capability
• Lack of logical inference
• Lack of a natural mechanism to incorporate prior knowledge, e.g., common sense
• No built-in causal mechanisms
• Limited theoretical understanding.
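The “good at interpolating” point can be seen with the simplest curve fitter. The sketch below uses nearest-neighbour regression, a swapped-in stand-in for a trained network that fits the kernel view of DL as “glorified template matching”. With dense coverage of [0, 1] it predicts f(x) = x² well inside the range, but outside the range it can only return the closest stored template.

```python
import numpy as np

x_train = np.linspace(0.0, 1.0, 101)   # dense coverage of [0, 1]
y_train = x_train ** 2                  # target function f(x) = x^2

def nearest_neighbour_predict(x):
    """Template matching: return the label of the closest training input."""
    idx = np.abs(x_train[None, :] - np.atleast_1d(x)[:, None]).argmin(axis=1)
    return y_train[idx]

x_in = np.array([0.123, 0.5, 0.871])    # inside the training range
x_out = np.array([2.0, 3.0])            # outside the training range

print(np.abs(nearest_neighbour_predict(x_in) - x_in ** 2).max())   # small error
print(nearest_neighbour_predict(x_out), x_out ** 2)                # [1. 1.] [4. 9.]
```

A neural network does not fail in exactly this way, but without a built-in notion of the underlying rule it likewise has no guarantee of behaving sensibly far from its training data.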

Are these limitations inherent?
• YES, statistical systems tend to memorize data and find shortcuts.
• We need lots of data to cover all possible variations, hence lots of compute.
• But aren’t we great copiers?
• NO, neural nets were founded on the basis of distributed representation and parallel processing. These are robust, fast and energy efficient.
• We still need to find “binding” tricks that do all sorts of things without relying on statistical training signals + backprop.

Dual system: A possible architecture
System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2: Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
(Figure components: perception; theory of mind; recursive reasoning; facts; semantics; events and relations; working space; memory)
Image credit: VectorStock | Wikimedia
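One way to read the diagram is as a dispatcher: try the fast System 1 first, and fall back to the slow, sequential System 2 when System 1 is not confident. The Python sketch below is purely illustrative. The class `DualSystem`, the confidence threshold and the toy integer-square-root task are hypothetical and not part of the slide. System 1 is a lookup of previously seen answers, and System 2 is an explicit step-by-step search.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class DualSystem:
    """Hypothetical dual-process controller (names are illustrative only)."""
    system1: Callable[[int], Tuple[Optional[int], float]]  # fast guess + confidence
    system2: Callable[[int], int]                          # slow, deliberate solver
    threshold: float = 0.9

    def answer(self, query: int) -> Tuple[int, str]:
        guess, confidence = self.system1(query)
        if guess is not None and confidence >= self.threshold:
            return guess, "system1"
        return self.system2(query), "system2"

# System 1: recall from a small cache of previously seen answers (pattern matching).
cache = {4: 2, 9: 3, 16: 4}
def intuitive(n: int) -> Tuple[Optional[int], float]:
    return (cache[n], 1.0) if n in cache else (None, 0.0)

# System 2: explicit step-by-step search for the integer square root.
def analytical(n: int) -> int:
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

agent = DualSystem(intuitive, analytical)
print(agent.answer(9))    # (3, 'system1')
print(agent.answer(50))   # (7, 'system2')
```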

Continuation of System 1
• Industry has invested heavily in DL
• → It needs to reap the benefits for years to come, on both the hardware and software sides.
• Enabling techs: Data, compute, networking
• → Scaling up (bigger) & scaling out (mixture)
• → One model for all
• DL fundamentals: Representation, learning & inference
• Rep = data rep + computational graph + symmetry
• Learning as pre-training to extract as much knowledge from data as possible
• Learning as on-the-fly inference (Bayesian, hypernetwork/fast weight)
• Extreme inference = dynamic computational graph on the fly.
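“Learning as on-the-fly inference” can be illustrated with a classic fast-weight memory. The NumPy sketch below rests on simplifying assumptions: a Hebbian outer-product rule with optional decay, and a hypothetical class `FastWeightMemory`. Each write is a rank-one update to a weight matrix made at inference time, with no backprop, and each read is a matrix-vector product.

```python
import numpy as np

class FastWeightMemory:
    """Outer-product (Hebbian) fast-weight memory: W <- decay * W + v k^T."""

    def __init__(self, key_dim, value_dim, decay=1.0):
        self.W = np.zeros((value_dim, key_dim))
        self.decay = decay

    def write(self, k, v):
        # Learning happens at inference time: a rank-one update, no backprop.
        self.W = self.decay * self.W + np.outer(v, k)

    def read(self, q):
        return self.W @ q

rng = np.random.default_rng(0)
keys, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # orthonormal columns as keys
values = rng.normal(size=(8, 3))                  # one 3-d value per key

mem = FastWeightMemory(key_dim=8, value_dim=3)
for i in range(3):
    mem.write(keys[:, i], values[i])

# With orthonormal keys, reading with a stored key returns its value exactly.
print(np.allclose(mem.read(keys[:, 1]), values[1]))  # True
```

With non-orthogonal keys, reads return a mixture of stored values (crosstalk), which is one reason practical fast-weight models learn their keys.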

DeepMind: Scale (up) is enough!

But …
• Scaling is like building a taller ladder to get to the Moon.
• We need rockets and the science of escape velocity.
• The human brain is big (1e+14 synapses) but does exactly the opposite: it maximizes entropy reduction using minimum energy (think of the most efficient heat engine).

One model for all: our early attempt
• Settings: (a) multi-label, (b) multi-view, (c) multi-view/multi-label and (d) multi-instance
• Columns are a generic message-passing scheme between entities
Pham, Trang, Truyen Tran, and Svetha Venkatesh. "One size fits many: Column bundle for multi-x learning." arXiv preprint arXiv:1702.07021 (2017).

Our early attempt (2): Deepr
(Figure: a medical record, i.e. visits/admissions separated by time gaps, is processed in five steps: (1) sequencing, with time gaps/transfers and phrases/admissions as words; (2) embedding into word vectors; (3) convolution for motif detection; (4) max-pooling into a record vector; (5) prediction output at the prediction point.)
Nguyen, Phuoc, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. "Deepr: a convolutional net for medical records." IEEE Journal of Biomedical and Health Informatics 21, no. 1 (2016): 22-30.
Concept: Stringify(), everything as a string
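A minimal NumPy sketch of the Stringify() idea and the convolution/max-pooling steps follows. It assumes a record is a list of (day, [diagnosis codes]) visits. The time-gap bucket edges, the token names, the toy embeddings and the functions `stringify` and `conv_maxpool` are all hypothetical and need not match Deepr's actual preprocessing.

```python
import numpy as np

def time_gap_token(days):
    """Map a gap between visits to a coarse interval word (bucket edges are hypothetical)."""
    for limit, token in [(30, "0-1m"), (90, "1-3m"), (180, "3-6m"), (365, "6-12m")]:
        if days < limit:
            return token
    return "12m+"

def stringify(visits):
    """Flatten a record [(day, [codes...]), ...] into one token sequence."""
    tokens, prev_day = [], None
    for day, codes in sorted(visits):
        if prev_day is not None:
            tokens.append(time_gap_token(day - prev_day))   # time gap becomes a word
        tokens.extend(sorted(codes))
        prev_day = day
    return tokens

def conv_maxpool(tokens, emb, filters):
    """1-D convolution over token embeddings, then max-pooling over positions."""
    X = np.stack([emb[t] for t in tokens])                  # (seq_len, emb_dim)
    width = filters.shape[1]
    windows = np.stack([X[i:i + width].ravel() for i in range(len(X) - width + 1)])
    motif_scores = np.maximum(0.0, windows @ filters.reshape(len(filters), -1).T)
    return motif_scores.max(axis=0)                         # record vector, (n_filters,)

record = [(0, ["I10", "E11"]), (20, ["N18"]), (200, ["I10"])]
tokens = stringify(record)
print(tokens)  # ['E11', 'I10', '0-1m', 'N18', '6-12m', 'I10']

rng = np.random.default_rng(0)
emb = {t: rng.normal(size=4) for t in sorted(set(tokens))}  # toy word vectors
filters = rng.normal(size=(5, 3, 4))                         # 5 filters of width 3
print(conv_maxpool(tokens, emb, filters).shape)              # (5,)
```

Turning time gaps into ordinary words is what lets one generic sequence model consume an irregular medical record.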

One model for all – the case of Gato

Why is one-model-for-all possible?
• The world is regular: rules, patterns, motifs, grammars, recurrence.
• World models are learnable from data!
• Advances in ML:
• Model flexibility
• Powerful training and inference machines
• Smart tricks
• The human brain gives an example
• One brain, yet capable of processing all modalities, doing plenty of tasks, and learning from different kinds of training signals.
• Thinking at a high level is independent of input modalities and task-specific skills.

RL Team: Reward is enough
Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. "Reward is enough." Artificial Intelligence 299 (2021): 103535.

Knowledge → Reward → Generation

Machine reasoning
“Reasoning is concerned with arriving at a deduction about a new combination of circumstances.” (Leslie Valiant)
“Reasoning is to deduce new knowledge from previously acquired knowledge in response to a query.” (Leon Bottou)

Hypotheses
• Reasoning as just-in-time program synthesis.
• It employs conditional computation.
• Reasoning is recursive, e.g., mental travel.

Neural reasoning: Two methods
• Implicit chaining of predicates through recurrence:
• Step-wise query-specific attention to relevant concepts & relations.
• Iterative concept refinement & combination, e.g., through a working memory.
• The answer is computed from the last memory state & question embedding.
• Explicit program synthesis:
• There is a set of modules, each performing a pre-defined operation.
• The question is parsed into a symbolic program.
• The program is implemented as a computational graph constructed by chaining separate modules.
• The program is executed to compute an answer.
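The explicit program-synthesis route can be sketched in plain Python. Everything here is a hypothetical, CLEVR-style toy: the module names, the scene format and the program are assumptions, and the parsing step is done by hand instead of by a learned parser. Modules are chained into a (here linear) computational graph, which is then executed to produce the answer.

```python
from typing import Callable, Dict, List, Tuple

Scene = List[Dict[str, str]]

# A small library of pre-defined modules (hypothetical, CLEVR-style operations).
MODULES: Dict[str, Callable] = {
    "scene":        lambda scene, _arg, _prev: scene,
    "filter_color": lambda _scene, arg, prev: [o for o in prev if o["color"] == arg],
    "filter_shape": lambda _scene, arg, prev: [o for o in prev if o["shape"] == arg],
    "count":        lambda _scene, _arg, prev: len(prev),
}

def execute(program: List[Tuple[str, str]], scene: Scene):
    """Chain modules into a computational graph (here a simple chain) and run it."""
    result = None
    for op, arg in program:
        result = MODULES[op](scene, arg, result)
    return result

scene = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]

# "How many red cubes are there?" parsed (by hand here) into a symbolic program.
program = [("scene", ""), ("filter_color", "red"), ("filter_shape", "cube"), ("count", "")]
print(execute(program, scene))  # 1
```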

Learning to reason: Reasoning as a skill
• Reasoning as a prediction skill that can be learnt from data.
• Question answering as zero-shot learning.
• Neural network operations for learning to reason:
• Attention & transformers.
• Dynamic neural networks, conditional computation & differentiable programming.
• Module networks.
• LLMs that generate programs on the fly + feedback.
(Dan Roth; ACM Fellow; IJCAI John McCarthy Award)

Example: LOGNet
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic Language
Binding in Relational Visual Reasoning”, IJCAI’20.

Deliberative reasoning implies memory
• Three steps:
• Store data/representations into memory
• Read query, process sequentially, consult/update memory
• Output answer
• But data memory isn’t enough:
• No memory of controllers → Less modularity and compositionality when the query is complex
• No memory of relations → Much harder to chain predicates.
• Still iterative refinement → Prone to curve fitting
Source: rylanschaeffer.github.io
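The three steps above can be sketched as a multi-hop memory read in the style of end-to-end memory networks. This NumPy version is a simplification: it uses one shared embedding for memory slots, a residual query update and a hypothetical function `multi_hop_read`. It keeps only data memory, so it also exhibits the limitation noted on the slide, since there is no memory of controllers or relations.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def multi_hop_read(memory, query, hops=2):
    """Read a query against a memory matrix several times, refining the query each hop.

    memory: (n_slots, d) stored representations (step 1: store)
    query:  (d,) embedded question (step 2: read query, consult memory)
    """
    q = query.copy()
    attention_trace = []
    for _ in range(hops):
        attn = softmax(memory @ q)   # which slots are relevant to the current query
        read = attn @ memory         # weighted sum of slots
        q = q + read                 # residual update of the controller state
        attention_trace.append(attn)
    return q, attention_trace        # step 3: an answer would be decoded from q

rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 4))
state, trace = multi_hop_read(memory, rng.normal(size=4), hops=3)
print(state.shape, [round(float(a.sum()), 6) for a in trace])  # (4,) [1.0, 1.0, 1.0]
```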

Relational memories → Relations discovery

Program memory → Program synthesis
• Neural stored-program memory (NSM) stores keys (the addresses) and values (the weights).
• The weight is selected and loaded into the controller of the NTM.
• The stored NTM weights and the weights of the NUTM are learnt end-to-end by backpropagation.
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory." In International Conference on Learning Representations. 2019.
Slide credit: Hung Le
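A minimal NumPy sketch of the idea follows: a query attends over stored keys, and the attention weights mix the stored weight matrices into the weights the controller runs with. It simplifies NSM considerably (a single step, a plain softmax with a temperature, a one-layer tanh controller), and `ProgramMemory` is a hypothetical name.

```python
import numpy as np

def softmax(z, temperature=1.0):
    z = z / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class ProgramMemory:
    """Stores (key, weight-matrix) pairs; a query softly selects which weights to load."""

    def __init__(self, keys, programs):
        self.keys = keys            # (n_programs, key_dim): the addresses
        self.programs = programs    # (n_programs, out_dim, in_dim): the stored weights

    def load(self, query, temperature=1.0):
        attn = softmax(self.keys @ query, temperature)
        # Weighted sum of stored weight matrices -> weights for the controller.
        return np.tensordot(attn, self.programs, axes=1), attn

rng = np.random.default_rng(0)
keys = np.eye(3)                          # three one-hot addresses
programs = rng.normal(size=(3, 2, 4))     # three 2x4 controller weight matrices
nsm = ProgramMemory(keys, programs)

# A query aligned with key 2 and a low temperature selects (almost) program 2 only.
W_controller, attn = nsm.load(np.array([0.0, 0.0, 1.0]), temperature=0.01)
h = np.tanh(W_controller @ rng.normal(size=4))   # one controller step with loaded weights
print(attn.round(3), np.allclose(W_controller, programs[2]))  # [0. 0. 1.] True
```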

Application of memory: Reasoning over IoT
Unified representation
• Higher-order, dynamic relationships
• Multiple sampling strategies
Reasoning mechanism
• Memory
• Multi-step reasoning
(Figure components: [Events/features streams] and [Sensory streams]; unified multimodal representation; cross-channel deliberative reasoning; [Memory]; LLMs; large vision-language models; decoder; [Downstream tasks]: anomaly detection, events detection, prediction/forecast, generation, Dist. events assoc., insights, dialog)

Symbolic processing is desirable in System 2
• Learning with less and zero-shot learning;
• Generalization of the solutions to unseen tasks and unforeseen data distributions;
• Explainability by construction.
Source: https://ibm.github.io/neuro-symbolic-ai/events/ns-workshop2023
Self-Aware Learning
• Deeper learning for challenging tasks
• Integrating continuous and symbolic representations
• Diversified learning modalities
Credit: Yolanda Gil, Bart Selman
AI to Understand Human Intelligence
• 5 years: AI systems could be designed to study psychological models of complex intelligent phenomena that are based on combinations of symbolic processing and artificial neural networks.

Henry Kautz's taxonomy (2)
• Symbolic[Neural]: exemplified by AlphaGo, where symbolic techniques are used to call neural techniques. In this case, the symbolic approach is Monte Carlo tree search and the neural techniques learn how to evaluate game positions.
Kautz, H., 2022. The third AI summer: AAAI Robert S. Engelmore memorial lecture. AI Magazine, 43(1), pp.105-125.
https://en.wikipedia.org/wiki/Neuro-symbolic_AI

• Neural | Symbolic: uses a neural architecture to interpret perceptual data as symbols and relationships that are reasoned about symbolically. The Neuro-Symbolic Concept Learner is an example.

• Neural[Symbolic]: allows a neural model to directly call a symbolic reasoning engine, e.g., to perform an action or evaluate a state. An example would be ChatGPT using a plugin to query Wolfram Alpha.

Symbols via Indirection
Z = X + Y, with Z = 3, X = 1, Y = 2: bind symbols with values (cf. pointers in computer science).
https://www.linkedin.com/pulse/unsolved-problems-ai-part-2-binding-problem-eberhard-schoeneburg/
Indirection binds two objects together and uses one to refer to the other.
“Every computer science problem can be solved with a higher level of indirection.” (Andrew Koenig, Butler Lampson, David J. Wheeler)
Slide credit: Kha Pham
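The Z = X + Y example can be reproduced with an environment, i.e. a table binding symbol names to values. The sketch below is a hypothetical toy interpreter (`evaluate` and the tuple expression format are assumptions). The rule is written once over symbols, and it is the indirection through the environment that binds those symbols to concrete values.

```python
# Symbols are names; an environment (a level of indirection) binds them to values.
env = {"X": 1, "Y": 2}

def evaluate(expr, env):
    """Evaluate a tiny expression tree: a symbol name, a number, or ("+", left, right)."""
    if isinstance(expr, str):
        return env[expr]                  # dereference: symbol -> bound value
    if isinstance(expr, tuple) and expr[0] == "+":
        return evaluate(expr[1], env) + evaluate(expr[2], env)
    return expr                           # a literal number

rule = ("+", "X", "Y")                    # Z = X + Y, independent of concrete values
env["Z"] = evaluate(rule, env)
print(env["Z"])                           # 3

# Rebinding the symbols reuses the same rule on new values.
print(evaluate(rule, {"X": 10, "Y": 20})) # 30
```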

InLay: Indirection layer
• The concrete data representation is viewed as a complete graph with weighted edges.
• The indirection operator maps this graph to a symbolic graph with the same edge weights, but whose vertices are fixed, trainable vectors.
• This symbolic graph is propagated, and the updated node features are the indirection representations.
Experiments on IQ datasets: RAVEN dataset
(Figure: an IQ problem in the RAVEN [1] dataset)
• The original paper of the RAVEN dataset proposes different OOD testing scenarios, in which models are trained on one configuration and tested on another (but related) configuration.
Average test accuracies (%) without/with InLay in different OOD testing scenarios on RAVEN:
Model          Without InLay   With InLay
LSTM           30.1            39.2
Transformers   15.1            42.5
RelationNet    12.5            46.4
PrediNet       13.8            15.6
[1] Zhang, Chi, et al. "Raven: A dataset for relational and analogical visual reasoning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

Experiments on OOD image classification tasks
(Figure: OOD image classification, in which test images are distorted: “Dog” vs. “Dog?”)
• When test images are injected with kinds of distortion different from those seen in training, deep neural networks may fail drastically in image classification tasks. [1]
Average test accuracies (%) without/with InLay of Vision Transformers (ViT) on different types of distortions:
Dataset    Without InLay   With InLay
SVHN       65.9            68.8
CIFAR10    38.2            43.1
CIFAR100   17.1            20.4
[1] Robert Geirhos, Carlos RM Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and Felix A Wichmann. Generalisation in humans and deep neural networks. Advances in neural information processing systems, 31, 2018.
Embedding symbolic physics into ML
Source: @zhaoshuai1989

Physics-informed neural networks
Figure from talk by Perdikaris & Wang, 2020.
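The core of a physics-informed neural network is its loss: a data term on the few observations plus a residual term that penalises violations of the governing equation on unlabelled collocation points. The NumPy sketch below illustrates only that loss, under stated assumptions. The ODE du/dt = -k u is a toy choice, derivatives use central finite differences (real PINNs use automatic differentiation), and the candidate solutions are plain functions rather than a trained network.

```python
import numpy as np

def physics_informed_loss(u, t_colloc, t_obs, u_obs, k=1.0, lam=1.0):
    """Data misfit + physics residual for the ODE du/dt = -k * u.

    u is a candidate solution; in a PINN it would be a neural network of t.
    Derivatives use central finite differences here; PINNs use autodiff.
    """
    eps = 1e-4
    du_dt = (u(t_colloc + eps) - u(t_colloc - eps)) / (2 * eps)
    residual = du_dt + k * u(t_colloc)             # zero wherever the ODE holds
    data_loss = np.mean((u(t_obs) - u_obs) ** 2)   # fit the few labelled points
    physics_loss = np.mean(residual ** 2)          # enforced on unlabelled points
    return data_loss + lam * physics_loss

t_colloc = np.linspace(0.0, 5.0, 50)   # collocation points: no labels needed
t_obs = np.array([0.0, 0.5])           # only two labelled observations
u_obs = np.exp(-t_obs)

def exact(s):
    return np.exp(-s)                  # satisfies both the data and the ODE

slope = (u_obs[1] - u_obs[0]) / (t_obs[1] - t_obs[0])
def line(s):
    return u_obs[0] + slope * (s - t_obs[0])   # fits both points, ignores the physics

loss_exact = physics_informed_loss(exact, t_colloc, t_obs, u_obs)
loss_line = physics_informed_loss(line, t_colloc, t_obs, u_obs)
print(loss_exact, loss_line)           # tiny vs. large
```

Both candidates fit the two observations; only the physics term tells them apart, which is how PINNs extrapolate from very little data.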

Case study: Covid-19 infections in Vietnam, 2021
• Classic SIR model: closed-form solutions are hard to calculate.
• Parameters change over time due to interventions → Need a more flexible framework.
• Solution: Richards equation → Mixture of Gompertz curves.
• Task: 10-20 data points → Extrapolate 150 more.
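A mixture of Gompertz curves is easy to write down. The NumPy sketch below only evaluates such a mixture. The parameters are made up, not fitted to the Vietnamese data, and fitting them to 10-20 points would need a nonlinear least-squares routine, which is omitted. Each component K·exp(-b·exp(-c·t)) rises towards its ceiling K, and the Gompertz curve arises as a limiting case of the Richards curve.

```python
import numpy as np

def gompertz(t, K, b, c):
    """Cumulative Gompertz curve: K * exp(-b * exp(-c * t)), rising towards K."""
    return K * np.exp(-b * np.exp(-c * t))

def gompertz_mixture(t, components):
    """Sum of Gompertz waves, e.g. one per phase of an outbreak or intervention."""
    return sum(gompertz(t, *params) for params in components)

# Hypothetical parameters (K, b, c) for two waves; not fitted to any real data.
waves = [(5_000.0, 8.0, 0.05), (12_000.0, 30.0, 0.04)]
t = np.arange(0, 171)               # ~20 observed days + 150 extrapolated days
cumulative = gompertz_mixture(t, waves)
daily = np.diff(cumulative)         # new cases (or deaths) per day

print(f"cumulative at day {t[-1]}: {cumulative[-1]:.0f}; daily peak on day {daily.argmax() + 1}")
```

Each wave can have its own timing and growth rate, which is what allows the fitted curve to absorb parameter changes caused by interventions.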

Case of HCM City
(Chart: Estimated number of Covid-19 deaths, Ho Chi Minh City, 3/07/21 to 30/10/21. Series: recorded deaths, estimated deaths, cumulative deaths (actual); two y-axes spanning 0-18,000 and 0-400. Annotations: “11/8: Predicting date”, “20-21/8: Peak”, “Total cases”, “16/10”.)

DL pushes changes in the practice of AI
• 2000s: Focus: Model. Flow: Data → Feature → Model → Deploy. Reception: Skeptical.
• 2010s: Focus: Data. Flow: Data → Model → Deploy. Reception: Accelerating.
• 2020s: Focus: Prompt. Flow: Prompt → Deploy. Reception: Responsible.

New behaviours
Emergence
• system behaviour is implicitly induced rather than explicitly constructed
• cause of scientific excitement and anxiety about unanticipated consequences
Homogenisation
• consolidation of methodology for building machine learning systems across many applications
• provides strong leverage for many tasks, but also creates single points of failure
Slide credit: Samuel Albanie, 2022

The shift towards science
• Engineering: Design man-made systems.
• AI: Discover emergent behaviours.
• Science: Discover laws in nature.

Example: Data → Prompt → Deploy
Long Dang, Thao Le, Vuong Le, Tu Minh Phuong, Truyen Tran, SADL: An Effective In-Context
Learning Method for Compositional Visual QA, 2023

Example: LLM agent for scientific discovery
(Figure: A user sends a request, “Design a material that: <Requirement 1>, <Requirement 2>, …”, to a Crystal LLM Agent. Its designed prompt contains a task description, a tools description, few-shot examples, … The agent breaks the request into high-level tasks (search for template, generate from template, evaluate requirement 1, evaluate requirement 2), selects tools from a tool set (Tool 1, Tool 2, Tool 3), executes them, reflects, applies corrections, and returns a final answer.)
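The execute, reflect, correct loop in the figure can be sketched without any LLM. In the Python sketch below, every tool, the property name `band_gap`, the template formula and the `reflect` heuristic are hypothetical stand-ins. In the actual agent an LLM chooses the tools and writes the reflection; here a fixed rule rescales the candidate's property until the requirement holds.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tools: stand-ins for structure search, generation and property evaluation.
def search_template(request: Dict) -> Dict:
    return {"formula": "ABX3", "band_gap": 5.0}   # made-up template and property value

def generate_from_template(candidate: Dict, scale: float) -> Dict:
    return {**candidate, "band_gap": candidate["band_gap"] * scale}

def evaluate_requirement(candidate: Dict, request: Dict) -> Optional[str]:
    lo, hi = request["band_gap_range"]
    gap = candidate["band_gap"]
    return None if lo <= gap <= hi else f"band gap {gap:.2f} outside [{lo}, {hi}]"

def reflect(candidate: Dict, request: Dict) -> float:
    """Stand-in for the LLM reflecting on a failed check and proposing a correction."""
    _, hi = request["band_gap_range"]
    return 0.8 if candidate["band_gap"] > hi else 1.25

def run_agent(request: Dict, max_rounds: int = 10) -> Tuple[Optional[Dict], List[str]]:
    """Search -> generate -> evaluate -> reflect/correct, until the requirement holds."""
    log: List[str] = []
    candidate, scale = search_template(request), 1.0
    for _ in range(max_rounds):
        candidate = generate_from_template(candidate, scale)
        failure = evaluate_requirement(candidate, request)
        if failure is None:
            log.append("final answer")
            return candidate, log
        log.append(failure)
        scale = reflect(candidate, request)
    return None, log

answer, log = run_agent({"band_gap_range": (2.0, 3.0)})
print(answer)
print(log)
```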

LLM social agents
• Extended action space:
• APIs
• RAG
• Architectures with LLM and external memory
• Long-term
• Short-term/sensory
• Working
(Figure: memory, LLM and world, interacting with other agents)
• Social interactions
• Working as a team in cooperative tasks
• Effective communication:
• When/Who/What to communicate?
• Via others’ actions and messages
• What is others’ knowledge or belief?
• Should others’ knowledge be corrected by communication?

Conclusion
• DL reached its peak in 2022 with ChatGPT. This changed AI practice dramatically.
• Deep neural networks are here to stay, maybe as part of a holistic solution to human-level AI.
• Gradient-based learning is still without parallel.
• DL will be much more general/universal/versatile
• Higher cognitive capabilities will be there, maybe with symbol manipulation capacity.
• Better generalization capability (e.g., extreme generalization)
• We have to deal with the consequences of its own success.
• The industry will need to keep the highly trained (overfitted) DL workforce busy!

Credit: AvePoint
