AI in the post-deep learning era
Prof Truyen Tran
Head of AI, Health and Science
Applied AI Institute (A2I2), Deakin University
truyen.tran@deakin.edu.au
12/04/2024 1
“[By 2023] … Emergence
of the generally agreed
upon "next big thing" in
AI beyond deep learning.”
Rodney Brooks, Jan 2018
“Deep learning is going to be able to do everything”
(Geoff Hinton, Nov 2020)
The quest of modern AI:
Learning a Turing machine
A mechanical Turing machine
Can we design a (neural)
program that learns to
program from data?
Three kinds of AI
• Cognitive automation: encoding human
abstractions → automate tasks normally performed
by humans.
• The majority of current machine learning & symbolic AI falls into this category.
• Cognitive assistance: AI helps us make sense of the
world (perceive, think, understand).
• This is where the true potential of AI lies.
• Some applications of ML fall into this category at present.
• Cognitive autonomy: Artificial minds thrive
independently of us, exist for their own sake.
• Science fiction!
François Chollet
12 years snapshot: 2012, 2016, Turing Awards 2018, 2024
Picture taken from Bommasani et al, 2021
Agenda
Part I: Deep learning
• Fundamentals
• Powers
• Limitations
Part II: AI post-DL
• System 1
• Towards System 2
• AI as a science
Depth refers to the number of steps between input and output
Integrate-and-fire neuron
andreykurenkov.com
Feature detector
Block representation
Integrate-and-fire neuron
andreykurenkov.com
Inductive biases
• Neuron as trainable feature
detector
• Depth + Skip-connection
• Invariance/equivariance:
• Convolution (Translation)
• Recurrence (Time travel)
• Attention (Permutation)
• Analogy
• Kernel, case-based reasoning,
• Attention, memory
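The invariance/equivariance bullets above can be checked numerically. The sketch below is illustrative only: it assumes dot-product self-attention with identity projections and a circular 1-D convolution (both simplifications, not taken from the slides), and verifies that permuting the inputs permutes the attention outputs and that shifting the input shifts the convolution output.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(rows):
    """Dot-product self-attention with identity projections (a simplifying assumption)."""
    dim = len(rows[0])
    out = []
    for q in rows:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in rows])
        out.append([sum(w * v[d] for w, v in zip(weights, rows)) for d in range(dim)])
    return out

def circular_conv(signal, kernel):
    """Circular 1-D cross-correlation: out[i] = sum_j kernel[j] * signal[(i + j) % n]."""
    n = len(signal)
    return [sum(k * signal[(i + j) % n] for j, k in enumerate(kernel)) for i in range(n)]

def shift(seq, k):
    """Cyclic shift to the right by k positions."""
    return seq[-k:] + seq[:-k]

def close(a, b, tol=1e-9):
    return all(abs(x - y) <= tol for x, y in zip(a, b))

random.seed(0)
x = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
perm = [3, 0, 4, 1, 2]

# Permutation equivariance: permuting the input rows permutes the output rows.
out = self_attention(x)
out_perm = self_attention([x[i] for i in perm])
attention_equivariant = all(close(out[i], out_perm[j]) for j, i in enumerate(perm))

# Translation equivariance: shifting the input shifts the output by the same amount.
signal = [random.gauss(0, 1) for _ in range(8)]
kernel = [1.0, -2.0, 1.0]
conv_equivariant = close(shift(circular_conv(signal, kernel), 3),
                         circular_conv(shift(signal, 3), kernel))
print(attention_equivariant, conv_equivariant)
```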
Feature detector
Source: http://karpathy.github.io/assets/rnn/diags.jpeg
Neural networks as new Electronic circuits
• Computational graph → Circuit
• Compositionality → Modular design
• Neuron as feature detector → SENSOR, FILTER
• Multiplicative gates → AND gate, Transistor,
Resistor
• Attention mechanism → SWITCH gate
• Memory + forgetting → Capacitor + leakage
• Skip-connection → Short circuit
What DL really means
• Functional view: Nested function composition. Base functions are feature transformations.
• Depth is the number of transformation steps between raw data and output.
• State view: Layered data abstraction, distributed representation.
• Kernel view: Nested kernels, aka “glorified template matching”.
• Programming view: Differentiable programming, dynamic modular
composition, trainable computational graphs.
• Memory view: An associative way to compress data/world model into
weights, and decompress data when prompted.
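The functional view can be made concrete with a few lines of code. This is a minimal sketch with hypothetical weights chosen only for illustration: each layer is a function, the network is their composition, and depth is the number of composed steps.

```python
from functools import reduce

def relu(v):
    return [max(0.0, a) for a in v]

def linear(weights, bias):
    """Return a layer function v -> W v + b."""
    def layer(v):
        return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]
    return layer

def compose(*layers):
    """Chain layers left to right: compose(f1, f2, f3)(x) == f3(f2(f1(x)))."""
    return lambda x: reduce(lambda h, f: f(h), layers, x)

# Hypothetical weights, chosen only for illustration.
f1 = linear([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])
f2 = linear([[2.0, 1.0]], [-1.0])
network = compose(f1, relu, f2)   # depth = 3 transformation steps

y = network([3.0, 1.0])
print(y)
```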
Advances in the past 10 years
• Architectures – CNN/RNN family, attention/Transformers, memory/differentiable programming,
native data structures (sequence, tree, grid, graph, set), skip-connection, hypernetwork/fast
weight.
• Training techniques (Param initialization, Adam, RMSProp, BERT, self-supervised learning,
contrastive learning).
• Robustness (Dropout, normalization).
• Large models/compute (GPT-X, etc).
• Deep generative models (VAE, GAN, Normalizing flows, Diffusion).
• New theoretical understanding (overparameterization, role of depth, nature of gradient learning).
• Hardware to support parallelization (GPU, TPU).
Picture taken from (Bommasani et al, 2021)
A tipping point: Foundation models
• A foundation model is a model trained at broad scale that can be adapted to a wide range of downstream tasks
• Scale and the ability to perform tasks beyond training
Slide credit: Samuel Albanie, 2022
Key concepts that make DL work
• Distributed representation
• Associative learning
• Layers + backprop
→ DL picks up contextual information easily, as long as there are signals (numerical or textual).
→ DL mimics training signals. At the extreme, it will be indistinguishable from human expression.
→ Cross-modal association isn’t hard. Symbol grounding “appears” to be solved (it isn’t).
→ DL scales arbitrarily with data and compute (really key for modern AI).
DL works on almost all modalities
SIGNALS STRINGS TABLES
“Software 2.0 is written in
neural network weights”
Andrej Karpathy, Nov
2017
Why is DL so powerful?
Practical
• Generality: Applicable to many
domains.
• Competitive: DL is hard to beat as
long as there are data to train.
• Scalability: DL is better with more
data, and it is very scalable.
Theoretical
• Expressiveness: Neural nets can approximate any continuous function on a bounded domain, given enough hidden units.
• Learnability: Neural nets are trained easily.
• Generalisability: Neural nets generalize surprisingly well to unseen data.
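The expressiveness claim can be illustrated constructively in one dimension. The sketch below is not a training procedure: it assumes a one-hidden-layer ReLU network whose weights are set in closed form so that the network matches the target at evenly spaced knots (`build_relu_net` and `max_error` are hypothetical helper names). Adding hidden units shrinks the worst-case error on the interval.

```python
import math

def relu(z):
    return max(0.0, z)

def build_relu_net(f, lo, hi, hidden_units):
    """Return a one-hidden-layer ReLU net that matches f at evenly spaced knots."""
    step = (hi - lo) / hidden_units
    knots = [lo + i * step for i in range(hidden_units + 1)]
    slopes = [(f(knots[i + 1]) - f(knots[i])) / step for i in range(hidden_units)]
    # The output weight of unit i is the change of slope at knot i.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, hidden_units)]
    bias = f(lo)

    def net(x):
        return bias + sum(w * relu(x - k) for w, k in zip(out_weights, knots))
    return net

def max_error(net, f, lo, hi, samples=1000):
    xs = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(abs(net(x) - f(x)) for x in xs)

# Worst-case error on [-pi, pi] for a small and a larger hidden layer.
errors = {n: max_error(build_relu_net(math.sin, -math.pi, math.pi, n),
                       math.sin, -math.pi, math.pi)
          for n in (5, 50)}
print(errors)
```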
Why are deep generative models (DGMs) so powerful?
• DGMs are compression engines: prompting is conditioning for the (preference-guided) decompression.
• DGMs are approximate program databases: prompting is retrieving an approximate program that takes input and delivers output.
• DGMs are world models: we can live entirely in simulation!
The power comes from arbitrary scaling - Rich Sutton’s Bitter Lesson (2019)
“The biggest lesson that can be read from 70 years of AI research is that
general methods that leverage computation are ultimately the most
effective, and by a large margin.”
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
“The two methods that seem to scale arbitrarily in this way
are search and learning.”
DL can learn everything
… as long as we have the right architecture and clean data
What are the limitations of deep learning?
• Modern neural networks are massive curve fitters
• Good at interpolating
→ Data hungry to cover all variations and smooth local manifolds
→ Very sample/energy inefficient, low rate of data-knowledge conversion.
→ Little systematic generalization (novel combinations)
• Inference separated from learning
→ No built-in adaptation other than retraining
→ Catastrophic forgetting
• Lack of human-perceived reasoning
capability
• Lack of logical inference
• Lack of natural mechanism to
incorporate prior knowledge, e.g.,
common sense
• No built-in causal mechanisms
• Limited theoretical understanding.
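The gap between interpolating and extrapolating can be shown with a toy curve fitter. The sketch below swaps the neural network for a piecewise-linear fit, which is an assumption made for brevity (a ReLU network is also piecewise linear and extends linearly outside its training range). The fit is accurate between training points and poor beyond them.

```python
import math

def fit_piecewise_linear(xs, ys):
    """Return a predictor that interpolates linearly and extends the end segments."""
    def predict(x):
        # Pick the segment containing x, or the nearest end segment when x is out of range.
        i = 0
        while i < len(xs) - 2 and x > xs[i + 1]:
            i += 1
        slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        return ys[i] + slope * (x - xs[i])
    return predict

n = 50
train_x = [2 * math.pi * i / n for i in range(n + 1)]
model = fit_piecewise_linear(train_x, [math.sin(x) for x in train_x])

def mean_abs_error(points):
    return sum(abs(model(x) - math.sin(x)) for x in points) / len(points)

# Test points between the training points (interpolation).
inside = [2 * math.pi * (i + 0.5) / n for i in range(n)]
# Test points beyond the training range (extrapolation).
outside = [2 * math.pi + 2 * math.pi * (i + 0.5) / n for i in range(n)]

in_range_error = mean_abs_error(inside)
out_of_range_error = mean_abs_error(outside)
print(in_range_error, out_of_range_error)
```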
Are these limitations inherent?
• YES, statistical systems tend to memorize data and find short-cuts.
• We need lots of data to cover all possible variations, hence lots of compute.
• But aren’t we great copiers?
• NO, neural nets were founded on the basis of distributed
representation and parallel processing. These are robust, fast and
energy efficient.
• We still need to find “binding” tricks that do all sorts of things without relying
on statistical training signals + backprop.
Dual system: A possible architecture
System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2: Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Image credit: VectorStock | Wikimedia
Diagram labels: Perception, Theory of mind, Recursive reasoning, Facts, Semantics, Events and relations, Working space, Memory
Continuation of System 1
• Industry has invested heavily in DL
→ It needs to reap the benefits for years to come, on both the hardware and software sides.
• Enabling techs: Data, compute, networking
→ Scaling up (bigger) & scaling out (mixture)
→ One model for all
• DL fundamentals: Representation, learning & inference
• Rep = data rep + computational graph + symmetry
• Learning as pre-training to extract as much knowledge from data as possible
• Learning as on-the-fly inference (Bayesian, hypernetwork/fast weight)
• Extreme inference = dynamic computational graph on-the-fly.
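The hypernetwork/fast-weight idea mentioned above can be sketched as follows. All names and numbers are hypothetical: a fixed generator matrix (the slow weights) maps a context vector to the weight matrix of a small linear layer (the fast weights), so the same input is processed differently depending on context, without retraining.

```python
def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def hypernetwork(context, generator, out_dim, in_dim):
    """Generate the weight matrix of a small linear layer from a context vector."""
    flat = matvec(generator, context)          # slow weights -> fast weights
    return [flat[r * in_dim:(r + 1) * in_dim] for r in range(out_dim)]

# Hypothetical slow weights: a (out_dim * in_dim) x context_dim generator matrix.
generator = [[1.0, 0.0],
             [0.0, 1.0],
             [1.0, 1.0],
             [1.0, -1.0]]
x = [2.0, 3.0]

fast_a = hypernetwork([1.0, 0.0], generator, out_dim=2, in_dim=2)
fast_b = hypernetwork([0.0, 1.0], generator, out_dim=2, in_dim=2)
y_a = matvec(fast_a, x)   # same input, weights adapted to context A
y_b = matvec(fast_b, x)   # same input, weights adapted to context B
print(y_a, y_b)
```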
DeepMind: Scale (up) is enough!
But …
• Scaling is like building a taller ladder to get to the Moon.
• We need a rocket and the science of escape velocity.
• The human brain is big (1e+14 synapses) but does exactly the opposite: it maximizes entropy reduction using minimum energy (think of the most efficient heat engine).
One model for all – our early attempt
• «(a) multi-label, (b) multi-view, (c) multi-view/multi-label and (d) multi-instance»
• Columns are a generic message-passing scheme between entities
Pham, Trang, Truyen Tran, and Svetha Venkatesh. "One size fits many: Column
bundle for multi-x learning." arXiv preprint arXiv:1702.07021 (2017).
Our early attempt (2): Deepr
Figure: Deepr pipeline on a medical record (visits/admissions separated by time gaps, with the output "?" at the prediction point): (1) sequencing into phrases/admissions and time gaps/transfers, (2) embedding into word vectors, (3) convolution for motif detection, (4) max-pooling into a record vector, (5) prediction.
Nguyen, Phuoc, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. "Deepr: a convolutional net for medical records." IEEE journal of biomedical and health informatics 21, no. 1 (2016): 22-30.
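The five stages named in the Deepr figure (sequencing, embedding, convolution, max-pooling, prediction) can be sketched as a forward pass. This is not the authors' implementation: the tokens, dimensions and random untrained weights are hypothetical, and a real model would learn them from data.

```python
import math
import random

random.seed(0)
EMBED_DIM, NUM_FILTERS, WINDOW = 4, 3, 3

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def deepr_forward(record, vocab, embeddings, filters, out_weights, out_bias):
    # (1) Sequencing: the record is already a flat list of codes and time-gap tokens.
    # (2) Embedding: map each token to a vector.
    vectors = [embeddings[vocab[token]] for token in record]
    # (3) Convolution: each filter scores every window of WINDOW consecutive tokens.
    responses = []
    for start in range(len(vectors) - WINDOW + 1):
        window = [v for vec in vectors[start:start + WINDOW] for v in vec]
        responses.append([max(0.0, sum(w * v for w, v in zip(f, window))) for f in filters])
    # (4) Max-pooling over positions gives a fixed-size record vector.
    record_vector = [max(r[k] for r in responses) for k in range(len(filters))]
    # (5) Prediction: logistic output on the record vector.
    logit = sum(w * r for w, r in zip(out_weights, record_vector)) + out_bias
    return record_vector, 1.0 / (1.0 + math.exp(-logit))

# Hypothetical tokens: diagnosis/procedure codes plus a coded time gap.
record = ["diag_A", "proc_B", "gap_1-3m", "diag_C", "diag_A"]
vocab = {token: i for i, token in enumerate(sorted(set(record)))}
embeddings = rand_matrix(len(vocab), EMBED_DIM)
filters = rand_matrix(NUM_FILTERS, WINDOW * EMBED_DIM)
out_weights = [random.uniform(-0.5, 0.5) for _ in range(NUM_FILTERS)]

record_vector, risk = deepr_forward(record, vocab, embeddings, filters, out_weights, 0.0)
print(record_vector, risk)
```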
Concept: Stringify() – everything as a string
One model for all – the case of Gato
Why is one-model-for-all possible?
• The world is regular: Rules, patterns, motifs, grammars, recurrence
• World models are learnable from data!
• Advances in ML:
• Model flexibility
• Powerful training and inference machines
• Smart tricks
• The human brain gives an example
• One brain, but capable of processing all modalities, doing plenty of tasks, and learning from different kinds of training signals.
• Thinking at high level is independent of input modalities and task-specific skills.
RL Team: Reward is enough
Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton.
"Reward is enough." Artificial Intelligence 299 (2021): 103535.
Knowledge → Reward
→ Generation
Machine reasoning
Reasoning is concerned with arriving at a deduction
about a new combination of circumstances.
Reasoning is to deduce new knowledge from
previously acquired knowledge in response to a
query.
Leslie Valiant
Leon Bottou
Hypotheses
• Reasoning as just-in-time program synthesis.
• It employs conditional computation.
• Reasoning is recursive, e.g., mental travel.
Neural reasoning: Two methods
• Implicit chaining of predicates through recurrence:
• Step-wise query-specific attention to relevant concepts & relations.
• Iterative concept refinement & combination, e.g., through a working memory.
• Answer is computed from the last memory state & question embedding.
• Explicit program synthesis:
• There is a set of modules, each performing a predefined operation.
• The question is parsed into a symbolic program.
• The program is implemented as a computational graph constructed by chaining
separate modules.
• The program is executed to compute an answer.
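The explicit program synthesis route can be sketched end to end on a toy scene. Everything here is a stand-in: the modules are plain Python functions rather than neural modules, and the `parse` function uses keyword matching where a real system would use a learned program generator.

```python
# Hypothetical scene: a list of objects with symbolic attributes.
scene = [
    {"shape": "cube", "color": "red", "size": "large"},
    {"shape": "sphere", "color": "red", "size": "small"},
    {"shape": "cube", "color": "blue", "size": "small"},
]

# A set of modules, each performing a predefined operation.
MODULES = {
    "filter_color": lambda objs, arg: [o for o in objs if o["color"] == arg],
    "filter_shape": lambda objs, arg: [o for o in objs if o["shape"] == arg],
    "count": lambda objs, arg: len(objs),
    "exists": lambda objs, arg: len(objs) > 0,
}

def parse(question):
    """Toy parser (an assumption): keyword matching instead of a learned program generator."""
    words = question.lower().rstrip("?").split()
    program = []
    for w in words:
        singular = w.rstrip("s")
        if w in {"red", "blue"}:
            program.append(("filter_color", w))
        elif singular in {"cube", "sphere"}:
            program.append(("filter_shape", singular))
    program.append(("count", None) if words[:2] == ["how", "many"] else ("exists", None))
    return program

def execute(program, objs):
    """Chain the modules into a computational graph and run it."""
    state = objs
    for name, arg in program:
        state = MODULES[name](state, arg)
    return state

program = parse("How many red cubes?")
answer = execute(program, scene)
print(program, answer)
```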
Learning to reason: Reasoning as a skill
• Reasoning as a prediction skill that can be learnt
from data.
• Question answering as zero-shot learning.
• Neural network operations for learning to reason:
• Attention & transformers.
• Dynamic neural networks, conditional computation &
differentiable programming.
• Module networks
• LLMs to generate programs on the fly + feedback
(Dan Roth; ACM Fellow; IJCAI
John McCarthy Award)
Example: LOGNet
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic Language
Binding in Relational Visual Reasoning”, IJCAI’20.
Deliberative reasoning implies memory
• Three steps:
• Store data/representations into memory
• Read query, process sequentially, consult/update memory
• Output answer
• But data memory isn’t enough:
• No memory of controllers → Less modularity and compositionality when
query is complex
• No memory of relations → Much harder to chain predicates.
• Still iterative refinement → Prone to curve fitting
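The three steps (store, read the query and consult memory sequentially, output) can be sketched with attention over a small data memory. The vectors are hypothetical, and this sketch only consults the memory without updating it; it also uses the iterative refinement that the slide flags as a weakness.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def reason(memory, query, steps):
    """Read the query, then consult memory for a fixed number of sequential steps."""
    state = list(query)
    trace = []
    for _ in range(steps):
        # Attend to the memory slots most relevant to the current state.
        weights = softmax([dot(slot, state) for slot in memory])
        # Weighted read from memory.
        read = [sum(w * slot[d] for w, slot in zip(weights, memory))
                for d in range(len(state))]
        # Refine the working state with what was read.
        state = [s + r for s, r in zip(state, read)]
        trace.append(weights)
    return state, trace

# Step 1: store (hypothetical) representations in memory.
memory = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
# Step 2: read the query and process it sequentially.
state, trace = reason(memory, query=[1.0, 0.0, 0.0], steps=2)
# Step 3: output an answer, here the index of the most attended slot.
answer = max(range(len(memory)), key=lambda i: trace[-1][i])
print(answer, trace[-1])
```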
Source: rylanschaeffer.github.io
Relational memories → Relations discovery
Program memory → Program synthesis
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory."
In International Conference on Learning Representations. 2019.
Slide credit: Hung Le
Neural stored-program memory (NSM) stores keys (the addresses) and values (the weights).
The weight is selected and loaded into the controller of the NTM.
The stored NTM weights and the weights of the NUTM are learnt end-to-end by backpropagation.
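The key/value lookup described above can be sketched as attention over stored weight matrices. This is a simplified illustration, not the paper's architecture: the keys and programs are fixed by hand rather than learnt by backpropagation, and the controller is a single tanh layer.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def load_program(query, keys, programs):
    """Attend over program keys and blend the stored weight matrices."""
    attention = softmax([dot(query, k) for k in keys])
    rows, cols = len(programs[0]), len(programs[0][0])
    weights = [[sum(a * p[r][c] for a, p in zip(attention, programs)) for c in range(cols)]
               for r in range(rows)]
    return weights, attention

def controller(weights, x):
    """Controller step using the loaded weights (a single tanh layer for brevity)."""
    return [math.tanh(dot(row, x)) for row in weights]

# Hypothetical program memory: two keys (addresses) and two weight matrices (values).
keys = [[5.0, 0.0], [0.0, 5.0]]
programs = [
    [[1.0, 0.0], [0.0, 1.0]],    # program 0: pass the input through
    [[0.0, 1.0], [1.0, 0.0]],    # program 1: swap the two inputs
]
x = [0.5, -0.5]

w0, attn0 = load_program([1.0, 0.0], keys, programs)
w1, attn1 = load_program([0.0, 1.0], keys, programs)
out0, out1 = controller(w0, x), controller(w1, x)
print(out0, out1)
```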
Application of memory: Reasoning over IoT
Unified representation
• Higher-order, dynamic relationships
• Multiple sampling strategies
Reasoning mechanism
• Memory
• Multi-step reasoning
Unified multimodal representation
[Events/features streams] [Sensory streams]
Cross-channel
deliberative reasoning
Anomaly detection
Events detection
Prediction/forecast
Generation
[Downstream tasks]
Dist. events assoc.
Decoder
[Memory]
LLMs
Large vision-language models
Insights, dialog
Symbolic processing is desirable in System 2
• Learning with less and zero-shot learning;
• Generalization of the solutions to unseen tasks and unforeseen data distributions;
• Explainability by construction;
https://ibm.github.io/neuro-symbolic-ai/events/ns-workshop2023
Self-Aware Learning
• Deeper learning for challenging tasks
• Integrating continuous and symbolic
representations
• Diversified learning modalities
Credit: Yolanda Gil, Bart Selman
AI to Understand Human
Intelligence
• 5 years: AI systems could be designed to
study psychological models of complex
intelligent phenomena that are based on
combinations of symbolic processing and
artificial neural networks.
Henry Kautz's taxonomy (2)
• Symbolic[Neural] is exemplified by AlphaGo, where symbolic techniques are used to call neural techniques. In this case, the symbolic approach is Monte Carlo tree search and the neural techniques learn how to evaluate game positions.
Kautz, H., 2022. The third AI summer: AAAI Robert S. Engelmore
memorial lecture. AI Magazine, 43(1), pp.105-125.
https://en.wikipedia.org/wiki/Neuro-symbolic_AI
Henry Kautz's taxonomy (3)
• Neural | Symbolic uses a neural architecture to interpret perceptual data as symbols and relationships that are reasoned about symbolically. The Neural-Concept Learner is an example.
Henry Kautz's taxonomy (6)
• Neural[Symbolic] allows a neural model to directly call a symbolic reasoning engine, e.g., to perform an action or evaluate a state. An example would be ChatGPT using a plugin to query Wolfram Alpha.
Symbols via Indirection
Z = X + Y, with bound values Z = 3, X = 1, Y = 2
Bind symbols with values
Pointer in Computer Science
https://www.linkedin.com/pulse/unsolved-problems-ai-part-2-binding-problem-eberhard-schoeneburg/
Indirection binds two objects together and uses one to refer to the other.
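A small sketch of the binding idea, using the Z = X + Y example from the slide. The binding table and the `evaluate` helper are hypothetical: symbols reach values only through the table, so the same symbolic rule applies unchanged when the bound values change.

```python
# Symbols refer to values only through a binding table (the level of indirection).
bindings = {"X": 1, "Y": 2}

def evaluate(expression, env):
    """Evaluate a sum of symbols such as "X + Y" by looking each symbol up in env."""
    return sum(env[symbol.strip()] for symbol in expression.split("+"))

bindings["Z"] = evaluate("X + Y", bindings)

# The same symbolic rule works for values never seen before: only the bindings change.
new_bindings = {"X": 40, "Y": 2}
new_bindings["Z"] = evaluate("X + Y", new_bindings)
print(bindings["Z"], new_bindings["Z"])
```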
Slide credit: Kha Pham
Every computer science
problem can be solved with a
higher level of indirection.
Andrew Koenig, Butler Lampson, David J. Wheeler
InLay: Indirection layer
• Concrete data representation is viewed as a complete graph with weighted edges.
• The indirection operator maps this graph to a symbolic graph with the same edge weights; however, its vertices are fixed and trainable.
• This symbolic graph is propagated, and the updated node features are the indirection representations.
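The three bullets above can be sketched in a few lines. This is only an illustration under stated assumptions, not the InLay implementation: edge weights are taken to be softmax-normalised dot products, propagation is a single weighted aggregation, and the symbol vectors are fixed by hand where a real layer would train them. Because the output depends on the data only through the edge weights, rotating the data leaves the representation unchanged.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def indirection_layer(data, symbols):
    # 1) View the concrete data as a complete weighted graph
    #    (edge weight = normalised similarity, an assumption).
    edges = [softmax([dot(a, b) for b in data]) for a in data]
    # 2) Keep the edge weights but swap the vertices for data-independent symbol vectors.
    # 3) Propagate once: each node aggregates its neighbours' symbol vectors.
    dim = len(symbols[0])
    return [[sum(w * s[d] for w, s in zip(row, symbols)) for d in range(dim)] for row in edges]

def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c * x - s * y, s * x + c * y] for x, y in points]

# Hypothetical symbol vectors, one per node; in training they would be learned parameters.
symbols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
data = [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.2]]

rep = indirection_layer(data, symbols)
rep_rotated = indirection_layer(rotate(data, 1.0), symbols)
print(rep)
```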
Slide credit: Kha Pham
Experiments on IQ datasets – RAVEN dataset
An IQ problem in RAVEN [1] dataset
• The original paper of the RAVEN dataset proposes different OOD testing scenarios, in which models are trained on one configuration and tested on another (but related) configuration.
Average test accuracies (%) without/with InLay in different OOD testing scenarios on RAVEN
Model          Without InLay   With InLay
LSTM           30.1            39.2
Transformers   15.1            42.5
RelationNet    12.5            46.4
PrediNet       13.8            15.6
[1] Zhang, Chi, et al. "Raven: A dataset for relational and analogical visual reasoning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
Slide credit: Kha Pham
Experiments on OOD image classification tasks
OOD image classification, in which test images are distorted.
Figure labels: Dog (original image), Dog? (distorted image)
• When test images are injected with different kinds of distortions other than ones in training, deep neural networks may fail drastically in image classification tasks. [1]
Average test accuracies (%) without/with InLay of Vision Transformers (ViT) on different types of distortions
Dataset    Without InLay   With InLay
SVHN       65.9            68.8
CIFAR10    38.2            43.1
CIFAR100   17.1            20.4
[1] Robert Geirhos, Carlos RM Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and Felix A Wichmann. Generalisation in humans and deep neural networks. Advances in neural information processing systems, 31, 2018.
Slide credit: Kha Pham
Embedding symbolic physics into ML
Source: @zhaoshuai1989
Physics-informed neural networks
Figure from talk by Perdikaris & Wang, 2020.
Case study: Covid-19 infections in VN 2021
• Classic SIR model: closed-form solutions are hard to calculate
• Parameters change over time due to interventions → Need a more flexible framework.
• Solution: Richards equation → Mixture of Gompertz curves
• Task: 10-20 data points → Extrapolate 150 more.
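The task setup (fit a growth curve on a handful of early points, then extrapolate far ahead) can be sketched with a single Gompertz curve. This is not the method used for the case study: the data are synthetic, the standard parametrisation K * exp(-b * exp(-c * t)) is assumed, and the fitting routine (a grid search over the plateau K combined with a linear regression) is an illustrative choice. A mixture is simply a sum of such curves, one per wave.

```python
import math

def gompertz(t, K, b, c):
    """Cumulative count at time t: K * exp(-b * exp(-c * t))."""
    return K * math.exp(-b * math.exp(-c * t))

def mixture(t, components):
    """Sum of Gompertz curves, one (K, b, c) triple per wave."""
    return sum(gompertz(t, *params) for params in components)

def fit_gompertz(ts, ys, K_grid):
    """Grid-search K; for each K, ln(ln(K / y)) = ln(b) - c * t is a straight line in t."""
    best = None
    n = len(ts)
    mean_t = sum(ts) / n
    y_max = max(ys)
    for K in K_grid:
        if K <= y_max:
            continue  # the plateau must lie above every observation
        zs = [math.log(math.log(K / y)) for y in ys]
        mean_z = sum(zs) / n
        slope = (sum((t - mean_t) * (z - mean_z) for t, z in zip(ts, zs))
                 / sum((t - mean_t) ** 2 for t in ts))
        b, c = math.exp(mean_z - slope * mean_t), -slope
        sse = sum((gompertz(t, K, b, c) - y) ** 2 for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, K, b, c)
    return best[1:]

# Synthetic data (not the HCM City series): 15 early points from a known curve.
ts = list(range(15))
ys = [gompertz(t, 1000.0, 5.0, 0.1) for t in ts]
K, b, c = fit_gompertz(ts, ys, K_grid=[200.0 + 10.0 * i for i in range(181)])
forecast = gompertz(150, K, b, c)   # extrapolate far beyond the observed window
print(K, b, c, forecast)
```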
Case of HCM City
Chart: Estimated number of Covid-19 deaths, HCM City (weekly dates from 3/07/21 to 30/10/21). Series: recorded deaths, estimated deaths, cumulative deaths (actual). Annotations: 11/8: predicting date; 20-21/8: peak; 16/10: total cases.
DL pushes changes in practice of AI
2000s
Focus: Model
Flow: Data → Feature → Model → Deploy
Reception: Skeptical
2010s
Focus: Data
Flow: Data → Model → Deploy
Reception: Accelerating
2020s
Focus: Prompt
Flow: Prompt → Deploy
Reception: Responsible
New behaviours
Emergence
• system behaviour is implicitly induced rather than explicitly constructed
• cause of scientific excitement and anxiety about unanticipated consequences
Homogenisation
• consolidation of methodology for building machine learning systems across many applications
• provides strong leverage for many tasks, but also creates single points of failure
Slide credit: Samuel Albanie, 2022
The shift towards science
Engineering
Design man-made systems
AI
Discover emergent behaviours
Science
Discover laws in nature.
Example: Data → Prompt → Deploy
Long Dang, Thao Le, Vuong Le, Tu Minh Phuong, Truyen Tran, SADL: An Effective In-Context
Learning Method for Compositional Visual QA, 2023
Example: LLM agent for scientific discovery
Request: Design a
material that:
- <Requirement 1>
- <Requirement 2>
- …
User
Crystal LLM Agent
Designed
Prompt
- Task description
- Tools description
- Few-shot examples
- …
High-level tasks
Search for template
Generate from template
Evaluate requirement 1
Evaluate requirement 2
Tools set
Tool 1 Tool 2 Tool 3
Selected tools
Tool 1
Tool 2
Tool 3
Tool 3
Execution
Reflect
Correction
Final answer
LLM social agents
• Extended action space:
• APIs
• RAG
• Architectures with LLM
and external memory
• Long-term
• Short-term/sensory
• Working
Memory LLM World
Other
Agent
Other
Agent
Other
Agent
• Social Interactions
• Working as a team in cooperative tasks
• Effective Communications:
• When/Who/What to communicate?
• Via other’s actions and messages
• What is others’ knowledge or belief?
• Should others’ knowledge be corrected by
communication?
Conclusion
• DL reached its peak in 2022 with ChatGPT. This changed AI practice dramatically.
• Deep neural networks are here to stay, maybe as a part of the holistic solution to human-level AI.
• Gradient-based learning is still without parallel.
• DL will be much more general/universal/versatile
• Higher cognitive capabilities will be there, maybe with symbol manipulation capacity.
• Better generalization capability (e.g., extreme)
• We have to deal with the consequences of its own success.
• The industry will need to keep the highly trained (overfitted) DL workforce busy!
Credit: AvePoint

More Related Content

Similar to Artificial intelligence in the post-deep learning era

Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning David Voyles
 
How data science works and how can customers help
How data science works and how can customers helpHow data science works and how can customers help
How data science works and how can customers helpDanko Nikolic
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesBalázs Kégl
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudOla Spjuth
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learningAmr Rashed
 
Agile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics ApplicationsAgile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics ApplicationsRussell Jurney
 
Deep learning health care
Deep learning health care  Deep learning health care
Deep learning health care Meenakshi Sood
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial IntelligenceZavain Dar
 
Global bigdata conf_01282013
Global bigdata conf_01282013Global bigdata conf_01282013
Global bigdata conf_01282013HPCC Systems
 
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...Brocade
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsRussell Jurney
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014The Hive
 
Lecture_1_Intro.pdf
Lecture_1_Intro.pdfLecture_1_Intro.pdf
Lecture_1_Intro.pdfpaijitk
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data ScienceSean Taylor
 
Large scale computing
Large scale computing Large scale computing
Large scale computing Bhupesh Bansal
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
Three Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceThree Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceAditya Parameswaran
 

Similar to Artificial intelligence in the post-deep learning era (20)

Intro to deep learning
Intro to deep learning Intro to deep learning
Intro to deep learning
 
How data science works and how can customers help
How data science works and how can customers helpHow data science works and how can customers help
How data science works and how can customers help
 
Deep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiativesDeep learning and the systemic challenges of data science initiatives
Deep learning and the systemic challenges of data science initiatives
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
Agile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics ApplicationsAgile Data Science: Hadoop Analytics Applications
Agile Data Science: Hadoop Analytics Applications
 
Deep learning health care
Deep learning health care  Deep learning health care
Deep learning health care
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial Intelligence
 
Global bigdata conf_01282013
Global bigdata conf_01282013Global bigdata conf_01282013
Global bigdata conf_01282013
 
1.Introduction to deep learning
1.Introduction to deep learning1.Introduction to deep learning
1.Introduction to deep learning
 
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
Recent Advances in Machine Learning: Bringing a New Level of Intelligence to ...
 
Agile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics ApplicationsAgile Data Science: Building Hadoop Analytics Applications
Agile Data Science: Building Hadoop Analytics Applications
 
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
Agile Data Science by Russell Jurney_ The Hive_Janruary 29 2014
 
Lecture_1_Intro.pdf
Lecture_1_Intro.pdfLecture_1_Intro.pdf
Lecture_1_Intro.pdf
 
2014 aus-agta
2014 aus-agta2014 aus-agta
2014 aus-agta
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Putting the Magic in Data Science
Putting the Magic in Data SciencePutting the Magic in Data Science
Putting the Magic in Data Science
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Three Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceThree Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data Science
 

More from Deakin University

Deep learning and reasoning: Recent advances
Deep learning and reasoning: Recent advancesDeep learning and reasoning: Recent advances
Deep learning and reasoning: Recent advancesDeakin University
 
AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...AI for automated materials discovery via learning to represent, predict, gene...
AI for automated materials discovery via learning to represent, predict, gene...Deakin University
 
Generative AI to Accelerate Discovery of Materials
Generative AI to Accelerate Discovery of MaterialsGenerative AI to Accelerate Discovery of Materials
Generative AI to Accelerate Discovery of MaterialsDeakin University
 
Generative AI: Shifting the AI Landscape
Generative AI: Shifting the AI LandscapeGenerative AI: Shifting the AI Landscape
Generative AI: Shifting the AI LandscapeDeakin University
 
Machine Learning and Reasoning for Drug Discovery
Machine Learning and Reasoning for Drug DiscoveryMachine Learning and Reasoning for Drug Discovery
Machine Learning and Reasoning for Drug DiscoveryDeakin University
 
Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deakin University
 
AI/ML as an empirical science
AI/ML as an empirical scienceAI/ML as an empirical science
AI/ML as an empirical scienceDeakin University
 
Machine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityMachine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityDeakin University
 
AI for tackling climate change
AI for tackling climate changeAI for tackling climate change
AI for tackling climate changeDeakin University
 
Deep learning and applications in non-cognitive domains I
Deep learning and applications in non-cognitive domains IDeep learning and applications in non-cognitive domains I
Deep learning and applications in non-cognitive domains IDeakin University
 
Deep learning and applications in non-cognitive domains II
Deep learning and applications in non-cognitive domains IIDeep learning and applications in non-cognitive domains II
Deep learning and applications in non-cognitive domains IIDeakin University
 
Deep learning and applications in non-cognitive domains III
Deep learning and applications in non-cognitive domains IIIDeep learning and applications in non-cognitive domains III
Deep learning and applications in non-cognitive domains IIIDeakin University
 
Deep learning for episodic interventional data
Deep learning for episodic interventional dataDeep learning for episodic interventional data
Deep learning for episodic interventional dataDeakin University
 
Deep learning for detecting anomalies and software vulnerabilities
Deep learning for detecting anomalies and software vulnerabilitiesDeep learning for detecting anomalies and software vulnerabilities
Deep learning for detecting anomalies and software vulnerabilitiesDeakin University
 
Deep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeakin University
 

Artificial intelligence in the post-deep learning era

  • 1. AI in the post-deep learning era Prof Truyen Tran Head of AI, Health and Science Applied AI Institute (A2I2), Deakin University truyen.tran@deakin.edu.au 12/04/2024 1
  • 2. “[By 2023] … Emergence of the generally agreed upon "next big thing" in AI beyond deep learning.” Rodney Brooks, Jan 2018
  • 3. “Deep learning is going to be able to do everything” (Geoff Hinton, Nov 2020)
  • 4. The quest of modern AI: Learning a Turing machine. [Figure: a mechanical Turing machine] Can we design a (neural) program that learns to program from data?
  • 5. Three kinds of AI (François Chollet) • Cognitive automation: encoding human abstractions → automate tasks normally performed by humans. • The majority of current machine learning & symbolic AI falls into this category. • Cognitive assistance: AI helps us make sense of the world (perceive, think, understand). • This is where the true potential of AI lies. • Some applications of ML fall into this category at present. • Cognitive autonomy: Artificial minds thrive independently of us, exist for their own sake. • Science fiction!
  • 6. 12-year snapshot: 2012, 2016, 2018 (Turing Awards), 2024. Picture taken from Bommasani et al., 2021.
  • 7. Agenda • Part I: Deep learning (Fundamentals, Powers, Limitations) • Part II: AI post-DL (System 1, Towards System 2, AI as a science)
  • 8. Depth refers to the number of steps between input and output. [Figure labels: integrate-and-fire neuron (andreykurenkov.com); feature detector; block representation]
  • 9. Inductive biases • Neuron as trainable feature detector • Depth + skip-connection • Invariance/equivariance: • Convolution (translation) • Recurrence (time travel) • Attention (permutation) • Analogy: • Kernel, case-based reasoning • Attention, memory [Figure labels: integrate-and-fire neuron (andreykurenkov.com); feature detector. Source: http://karpathy.github.io/assets/rnn/diags.jpeg]
  • 10. Neural networks as new electronic circuits • Computational graph → Circuit • Compositionality → Modular design • Neuron as feature detector → SENSOR, FILTER • Multiplicative gates → AND gate, Transistor, Resistor • Attention mechanism → SWITCH gate • Memory + forgetting → Capacitor + leakage • Skip-connection → Short circuit
  • 11. What DL really means • Functional view: Nested function composition. Base functions are feature transformations. • Depth is the number of transformation steps between raw data and output. • State view: Layered data abstraction, distributed representation. • Kernel view: Nested kernels, aka “glorified template matching”. • Programming view: Differentiable programming, dynamic modular composition, trainable computational graphs. • Memory view: An associative way to compress data/world model into weights, and decompress data when prompted.
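The functional view on slide 11 can be made concrete with a few lines of code. This is only an illustrative sketch, not material from the talk: the layer sizes, the ReLU base function and the helper names `layer` and `compose` are assumptions chosen for the example.

```python
import numpy as np


def layer(weight, bias):
    """Return one base function: an affine map followed by a ReLU."""
    def transform(x):
        return np.maximum(0.0, weight @ x + bias)
    return transform


def compose(*functions):
    """Compose left to right: compose(f, g)(x) == g(f(x))."""
    def composed(x):
        for function in functions:
            x = function(x)
        return x
    return composed


rng = np.random.default_rng(0)
sizes = [4, 8, 8, 2]  # raw input -> two hidden representations -> output (assumed sizes)
layers = [layer(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

network = compose(*layers)   # nested function composition
depth = len(layers)          # number of transformation steps between input and output
output = network(rng.normal(size=4))
```

Here `depth` counts exactly the transformation steps the slide refers to; training would adjust the weights inside each base function.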
  • 12. Advances in the past 10 years • Architectures – CNN/RNN family, attention/Transformers, memory/differentiable programming, native data structures (sequence, tree, grid, graph, set), skip-connection, hypernetwork/fast weight. • Training techniques (Param initialization, Adam, RMSProp, BERT, self-supervised learning, contrastive learning). • Robustness (Dropout, normalization). • Large models/compute (GPT-X, etc.). • Deep generative models (VAE, GAN, Normalizing flows, Diffusion). • New theoretical understanding (overparameterization, role of depth, nature of gradient learning). • Hardware to support parallelization (GPU, TPU).
  • 13. A tipping point: Foundation models • A foundation model is a model trained at broad scale that can be adapted to a wide range of downstream tasks. • Scale and the ability to perform tasks beyond training. Picture taken from Bommasani et al., 2021. Slide credit: Samuel Albanie, 2022
  • 14. Key concepts that make DL work • Distributed representation • Associative learning • Layers + backprop → DL picks up contextual information easily, as long as there are signals (numerical or textual). → DL mimics training signals. At the extreme, its output will be indistinguishable from human expression. → Cross-modal association isn’t hard. Symbol grounding “appears” to be solved (it isn’t). → DL scales arbitrarily with data and compute (really key for modern AI).
  • 15. DL works on almost all modalities: SIGNALS, STRINGS, TABLES
  • 16. “Software 2.0 is written in neural network weights” Andrej Karpathy, Nov 2017
  • 17. Why is DL so powerful? Practical: • Generality: Applicable to many domains. • Competitiveness: DL is hard to beat as long as there is data to train on. • Scalability: DL gets better with more data, and it is very scalable. Theoretical: • Expressiveness: Neural nets can approximate any continuous function on a bounded domain to arbitrary accuracy, given enough units. • Learnability: Neural nets are easy to train. • Generalisability: Neural nets generalize surprisingly well to unseen data.
  • 18. Why are deep generative models (DGMs) so powerful? • DGMs are compression engines: prompting is conditioning for the (preference-guided) decompression. • DGMs are approximate program databases: prompting retrieves an approximate program that takes input and delivers output. • DGMs are world models: we can live entirely in simulation!
  • 19. The power comes from arbitrary scaling: Rich Sutton’s Bitter Lesson (2019). “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.” “The two methods that seem to scale arbitrarily in this way are search and learning.” http://www.incompleteideas.net/IncIdeas/BitterLesson.html
  • 20. DL can learn everything … as long as we have the right architecture and clean data
  • 21. What are the limitations of deep learning? • Modern neural networks are massive curve fitting • Good at interpolating → Data hungry, to cover all variations and smooth local manifolds → Very sample/energy inefficient, low rate of data-to-knowledge conversion → Little systematic generalization (novel combinations) • Inference separated from learning → No built-in adaptation other than retraining → Catastrophic forgetting • Lack of human-perceived reasoning capability • Lack of logical inference • Lack of a natural mechanism to incorporate prior knowledge, e.g., common sense • No built-in causal mechanisms • Limited theoretical understanding.
  • 22. Are these limitations inherent? • YES, statistical systems tend to memorize data and find short-cuts. • We need lots of data to cover all possible variations, hence lots of compute. • But aren’t we great copiers? • NO, neural nets were founded on the basis of distributed representation and parallel processing. These are robust, fast and energy efficient. • We still need to find “binding” tricks that do all sorts of things without relying on statistical training signals + backprop.
  • 23. Agenda • Part I: Deep learning (Fundamentals, Powers, Limitations) • Part II: AI post-DL (System 1, Towards System 2, AI as a science)
  • 24. Dual system: A possible architecture • System 1: Intuitive (multiple instances) • Fast • Implicit/automatic • Pattern recognition • System 2: Analytical (single, sequential) • Slow • Deliberate/rational • Careful analysis [Figure labels: Perception; Theory of mind; Recursive reasoning; Facts; Semantics; Events and relations; Working space; Memory. Image credit: VectorStock | Wikimedia]
  • 25. Continuation of System 1 • Industry has invested heavily in DL • → It needs to reap the benefits for years to come, on both the hardware and software sides. • Enabling techs: Data, compute, networking • → Scaling up (bigger) & scaling out (mixture) • → One model for all • DL fundamentals: Representation, learning & inference • Rep = data rep + computational graph + symmetry • Learning as pre-training to extract as much knowledge from data as possible • Learning as on-the-fly inference (Bayesian, hypernetwork/fast weight) • Extreme inference = dynamic computational graph on-the-fly.
  • 26. DeepMind: Scale (up) is enough!
  • 27. But … • Scaling is like building a taller ladder to get to the Moon. • We need a rocket and the science of escape velocity. • The human brain is big (1e+14 synapses) but does exactly the opposite: it maximizes entropy reduction using minimum energy (think of the most efficient heat engine).
  • 28. One model for all – our early attempt • «(a) multi-label, (b) multi-view, (c) multi-view/multi-label and (d) multi-instance» • Columns are a generic message-passing scheme between entities. Pham, Trang, Truyen Tran, and Svetha Venkatesh. "One size fits many: Column bundle for multi-x learning." arXiv preprint arXiv:1702.07021 (2017).
  • 29. Our early attempt (2): Deepr. Concept: Stringify() – everything as a string. [Figure: medical record (visits/admissions, time gaps) → 1. sequencing (time gaps/transfer phrase/admission) → 2. embedding (word vector) → 3. convolution (motif detection) → 4. max-pooling (record vector) → 5. prediction (output at the prediction point)] Nguyen, Phuoc, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. "Deepr: a convolutional net for medical records." IEEE Journal of Biomedical and Health Informatics 21, no. 1 (2016): 22-30.
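The pipeline named on slide 29 (sequencing, embedding, convolution, max-pooling, prediction) can be sketched as a single forward pass. This is a toy illustration with random, untrained weights and is not the published Deepr implementation: the vocabulary, the time-gap token `GAP_1_3_MONTHS`, all sizes and the function name `deepr_like_forward` are assumptions made for the example.

```python
import numpy as np


def deepr_like_forward(tokens, embedding, filters, out_weights):
    """Embed -> 1-D convolution (motif detection) -> max-pool -> sigmoid."""
    vectors = embedding[tokens]                                   # (length, dim)
    width = filters.shape[1]
    # Slide a window of `width` tokens over the sequence and flatten each window.
    windows = np.stack([vectors[i:i + width].ravel()
                        for i in range(len(tokens) - width + 1)])  # (positions, width*dim)
    flat_filters = filters.reshape(filters.shape[0], -1)          # (n_filters, width*dim)
    motifs = np.maximum(0.0, windows @ flat_filters.T)            # (positions, n_filters)
    record_vector = motifs.max(axis=0)                            # max-pooling over positions
    logit = record_vector @ out_weights
    return 1.0 / (1.0 + np.exp(-logit))


# "Everything as a string": a record becomes a sequence of code and time-gap tokens.
vocab = {"I10": 0, "E11": 1, "GAP_1_3_MONTHS": 2, "N18": 3}  # hypothetical vocabulary
record = ["I10", "E11", "GAP_1_3_MONTHS", "E11", "N18"]
tokens = np.array([vocab[word] for word in record])

rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 6))   # word vectors
filters = rng.normal(size=(5, 3, 6))           # 5 motif detectors, each 3 tokens wide
out_weights = rng.normal(size=5)
risk = deepr_like_forward(tokens, embedding, filters, out_weights)
```

Because max-pooling collapses the position axis, records of different lengths map to a record vector of the same size, which is what lets one classifier serve all records.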
  • 30. One model for all – the case of Gato
  • 31. Why is one-model-for-all possible? • The world is regular: rules, patterns, motifs, grammars, recurrence • World models are learnable from data! • Advances in ML: • Model flexibility • Powerful training and inference machines • Smart tricks • The human brain gives an example • One brain, but capable of processing all modalities, doing plenty of tasks, and learning from different kinds of training signals. • Thinking at a high level is independent of input modalities and task-specific skills.
  • 32. RL Team: Reward is enough. Silver, David, Satinder Singh, Doina Precup, and Richard S. Sutton. "Reward is enough." Artificial Intelligence 299 (2021): 103535.
  • 33. Knowledge → Reward → Generation
  • 34. Agenda • Part I: Deep learning (Fundamentals, Powers, Limitations) • Part II: AI post-DL (System 1, Towards System 2, AI as a science)
  • 35. Machine reasoning • Reasoning is concerned with arriving at a deduction about a new combination of circumstances. • Reasoning is to deduce new knowledge from previously acquired knowledge in response to a query. Leslie Valiant, Leon Bottou
  • 36. Hypotheses • Reasoning as just-in-time program synthesis. • It employs conditional computation. • Reasoning is recursive, e.g., mental travel.
  • 37. Neural reasoning: Two methods • Implicit chaining of predicates through recurrence: • Step-wise query-specific attention to relevant concepts & relations. • Iterative concept refinement & combination, e.g., through a working memory. • The answer is computed from the last memory state & question embedding. • Explicit program synthesis: • There is a set of modules, each performing a pre-defined operation. • The question is parsed into a symbolic program. • The program is implemented as a computational graph constructed by chaining separate modules. • The program is executed to compute an answer.
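The second method on slide 37 (explicit program synthesis) can be illustrated with a toy executor. This is a sketch under strong simplifications: in neural module networks each module would be a small trainable network and the parser would be learnt, whereas here the modules are plain Python functions and the program is written by hand. The module names and the scene format are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Scene = List[Dict[str, str]]
Program = List[Tuple[str, tuple]]

# Hypothetical module library: each module performs one pre-defined operation.
MODULES: Dict[str, Callable] = {
    "filter_color": lambda objects, color: [o for o in objects if o["color"] == color],
    "filter_shape": lambda objects, shape: [o for o in objects if o["shape"] == shape],
    "count": lambda objects: len(objects),
    "exists": lambda objects: len(objects) > 0,
}


def execute(program: Program, scene: Scene):
    """Chain modules: each step consumes the previous step's output."""
    state = scene
    for name, args in program:
        state = MODULES[name](state, *args)
    return state


scene = [
    {"color": "red", "shape": "cube"},
    {"color": "red", "shape": "sphere"},
    {"color": "blue", "shape": "cube"},
]

# "How many red cubes are there?" parsed (by hand here) into a symbolic program.
program = [("filter_color", ("red",)), ("filter_shape", ("cube",)), ("count", ())]
answer = execute(program, scene)
```

The same module set answers new questions simply by chaining the modules differently, which is where the compositional generalization of this approach comes from.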
  • 38. Learning to reason: Reasoning as a skill • Reasoning as a prediction skill that can be learnt from data. • Question answering as zero-shot learning. • Neural network operations for learning to reason: • Attention & transformers. • Dynamic neural networks, conditional computation & differentiable programming. • Module networks. • LLMs to generate programs on the fly + feedback. (Dan Roth; ACM Fellow; IJCAI John McCarthy Award)
  • 39. Example: LOGNet. Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic Language Binding in Relational Visual Reasoning”, IJCAI’20.
  • 40. Deliberative reasoning implies memory • Three steps: • Store data/representations into memory • Read query, process sequentially, consult/update memory • Output answer • But data memory isn’t enough: • No memory of controllers → Less modularity and compositionality when query is complex • No memory of relations → Much harder to chain predicates. • Still iterative refinement → Prone to curve fitting. Source: rylanschaeffer.github.io
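The three steps on slide 40 (store, read/process/update, output) can be sketched with a soft, content-addressed memory. This is an assumed minimal form, not a specific published model: the dot-product addressing, the `tanh` controller update and the write rate are illustrative choices, and the weights are random rather than learnt.

```python
import numpy as np


def softmax(scores):
    shifted = scores - scores.max()
    weights = np.exp(shifted)
    return weights / weights.sum()


def read(memory, query):
    """Content-based read: attention-weighted average of memory slots."""
    weights = softmax(memory @ query)
    return weights @ memory, weights


def write(memory, weights, update, rate=0.5):
    """Soft write: move each slot towards `update` in proportion to its weight."""
    return memory + rate * weights[:, None] * (update - memory)


rng = np.random.default_rng(0)
memory = rng.normal(size=(6, 4))   # step 1: store 6 item representations
state = rng.normal(size=4)         # step 2: start from the query embedding

for _ in range(3):                 # sequential processing steps
    content, weights = read(memory, state)   # consult memory
    state = np.tanh(state + content)         # refine the controller state
    memory = write(memory, weights, state)   # update memory

answer = state                     # step 3: the answer is read off the final state
```

Note that this memory holds only data: the controller itself is fixed across steps, which is the limitation the slide points to with "no memory of controllers".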
  • 41. Relational memories → Relations discovery
  • 42. Program memory → Program synthesis • Neural stored-program memory (NSM) stores keys (the addresses) and values (the weights). • The weight is selected and loaded into the controller of the Neural Turing Machine (NTM). • The stored NTM weights and the weights of the NUTM (Neural Universal Turing Machine) are learnt end-to-end by backpropagation. Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory." In International Conference on Learning Representations. 2019. Slide credit: Hung Le
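The key/value idea on slide 42 can be sketched as attention over a table of stored weight matrices. This is a simplified illustration and not the NSM code from the paper: the softmax addressing, the single linear controller and the function name `load_program` are assumptions made for the example.

```python
import numpy as np


def softmax(scores):
    shifted = scores - scores.max()
    weights = np.exp(shifted)
    return weights / weights.sum()


def load_program(keys, programs, query):
    """Select controller weights by attending over program keys (addresses)."""
    attention = softmax(keys @ query)                    # one weight per stored program
    return np.tensordot(attention, programs, axes=1)     # blended weight matrix


rng = np.random.default_rng(0)
keys = np.eye(3)                          # 3 program addresses (assumed orthonormal)
programs = rng.normal(size=(3, 2, 4))     # 3 stored controller weight matrices (2x4)

query = 50.0 * keys[1]                    # a sharp query that addresses program 1
controller_weights = load_program(keys, programs, query)

x = rng.normal(size=4)
controller_output = controller_weights @ x   # the controller now runs the loaded program
```

Since the selection is a differentiable blend, both the keys and the stored weights can in principle be trained end-to-end by backpropagation, as the slide states.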
  • 43. Application of memory: Reasoning over IoT • Unified representation: • Higher-order, dynamic relationships • Multiple sampling strategies • Reasoning mechanism: • Memory • Multi-step reasoning [Figure labels: Sensory streams; Events/features streams; Unified multimodal representation; Cross-channel deliberative reasoning; Memory; Dist. events assoc.; Decoder; Downstream tasks: anomaly detection, events detection, prediction/forecast, generation; LLMs; Large vision-language models; Insights, dialog]
  • 44. Symbolic processing is desirable in System 2 • Learning with less and zero-shot learning; • Generalization of the solutions to unseen tasks and unforeseen data distributions; • Explainability by construction. (https://ibm.github.io/neuro-symbolic-ai/events/ns-workshop2023) • Self-Aware Learning: • Deeper learning for challenging tasks • Integrating continuous and symbolic representations • Diversified learning modalities • AI to Understand Human Intelligence: • 5 years: AI systems could be designed to study psychological models of complex intelligent phenomena that are based on combinations of symbolic processing and artificial neural networks. Credit: Yolanda Gil, Bart Selman
  • 45. Henry Kautz's taxonomy (2) • Symbolic[Neural]: exemplified by AlphaGo, where symbolic techniques are used to call neural techniques. In this case, the symbolic approach is Monte Carlo tree search and the neural techniques learn how to evaluate game positions. Kautz, H., 2022. The third AI summer: AAAI Robert S. Engelmore memorial lecture. AI Magazine, 43(1), pp.105-125. https://en.wikipedia.org/wiki/Neuro-symbolic_AI
  • 46. Henry Kautz's taxonomy (3) • Neural | Symbolic: uses a neural architecture to interpret perceptual data as symbols and relationships that are reasoned about symbolically. The Neural-Concept Learner is an example. Kautz, H., 2022. The third AI summer: AAAI Robert S. Engelmore memorial lecture. AI Magazine, 43(1), pp.105-125. https://en.wikipedia.org/wiki/Neuro-symbolic_AI
  • 47. Henry Kautz's taxonomy (6) • Neural[Symbolic]: allows a neural model to directly call a symbolic reasoning engine, e.g., to perform an action or evaluate a state. An example would be ChatGPT using a plugin to query Wolfram Alpha. Kautz, H., 2022. The third AI summer: AAAI Robert S. Engelmore memorial lecture. AI Magazine, 43(1), pp.105-125. https://en.wikipedia.org/wiki/Neuro-symbolic_AI
  • 48. Symbols via indirection • Bind symbols with values: Z = X + Y, with Z = 3, X = 1, Y = 2. • Pointers in computer science. • Indirection binds two objects together and uses one to refer to the other. • “Every computer science problem can be solved with a higher level of indirection.” (Andrew Koenig, Butler Lampson, David J. Wheeler) https://www.linkedin.com/pulse/unsolved-problems-ai-part-2-binding-problem-eberhard-schoeneburg/ Slide credit: Kha Pham
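The binding example on slide 48 (Z = X + Y with X = 1, Y = 2) can be written out directly. This is a plain-Python illustration of indirection, where a dictionary plays the role of the pointer table; the tuple encoding of the rule and the function name `evaluate` are assumptions made for the example.

```python
# Symbols are names; the environment holds the binding from each name to a value.
environment = {"X": 1, "Y": 2}


def evaluate(expression, bindings):
    """Evaluate ('add', left, right) by dereferencing symbols through `bindings`."""
    operator, left, right = expression
    if operator != "add":
        raise ValueError(f"unsupported operator: {operator}")
    return bindings[left] + bindings[right]


rule = ("add", "X", "Y")                         # Z = X + Y, stated over symbols only
environment["Z"] = evaluate(rule, environment)

# Rebinding the symbols reuses the same rule on values it has never seen.
other = {"X": 10, "Y": 20}
other["Z"] = evaluate(rule, other)
```

The rule never mentions a concrete value, so it transfers unchanged to any new binding; this separation of rule from value is the property the slide attributes to indirection.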
  • 49. InLay: Indirection layer • Concrete data representation is viewed as a complete graph with weighted edges. • The indirection operator maps this graph to a symbolic graph with the same edge weights, but with vertices that are fixed and trainable. • The symbolic graph is propagated, and the updated node features are the indirection representations. Slide credit: Kha Pham
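The three bullets on slide 49 can be read as the following computation. This is a loose sketch based only on the slide text, not the authors' implementation: the slide does not say how edge weights are computed or how propagation is done, so the scaled dot-product softmax, the linear propagation step and the function name `indirection_layer` are all assumptions.

```python
import numpy as np


def softmax_rows(scores):
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def indirection_layer(data, symbols, steps=1):
    """data: (n, d) concrete features; symbols: (n, k) data-independent vertices."""
    # Edge weights of the complete graph over the concrete data points (assumed form).
    edges = softmax_rows(data @ data.T / np.sqrt(data.shape[1]))
    # Reuse the same edge weights on the symbolic vertices and propagate.
    nodes = symbols
    for _ in range(steps):
        nodes = edges @ nodes
    return nodes, edges


rng = np.random.default_rng(0)
data = rng.normal(size=(5, 8))       # 5 concrete objects with 8 features each
symbols = rng.normal(size=(5, 3))    # 5 symbolic vertices (would be trained in practice)
representation, edges = indirection_layer(data, symbols)
```

In this reading the data influences the output only through the edge weights, so the representation depends on how objects relate to each other rather than on their raw feature values.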
  • 50. Experiments on IQ datasets – RAVEN dataset • An IQ problem in the RAVEN [1] dataset (figure). • The original RAVEN paper proposes different OOD testing scenarios, in which models are trained on one configuration and tested on another (but related) configuration. • Average test accuracies (%) without/with InLay in different OOD testing scenarios on RAVEN: LSTM 30.1/39.2; Transformers 15.1/42.5; RelationNet 12.5/46.4; PrediNet 13.8/15.6. [1] Zhang, Chi, et al. "Raven: A dataset for relational and analogical visual reasoning." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. Slide credit: Kha Pham
  • 51. Experiments on OOD image classification tasks • OOD image classification, in which test images are distorted (figure: “Dog” vs “Dog?”). • When test images are distorted in ways other than those seen in training, deep neural networks may fail drastically in image classification tasks [1]. • Average test accuracies (%) without/with InLay of Vision Transformers (ViT) on different types of distortions: SVHN 65.9/68.8; CIFAR10 38.2/43.1; CIFAR100 17.1/20.4. [1] Robert Geirhos, Carlos RM Temme, Jonas Rauber, Heiko H Schütt, Matthias Bethge, and Felix A Wichmann. Generalisation in humans and deep neural networks. Advances in Neural Information Processing Systems, 31, 2018. Slide credit: Kha Pham
  • 52. Embedding symbolic physics into ML. Source: @zhaoshuai1989
  • 53. Physics-informed neural networks. Figure from talk by Perdikaris & Wang, 2020.
  • 54. Case study: Covid-19 infections in Vietnam, 2021 • Classic SIR model: closed-form solutions are hard to calculate • Parameters change over time due to intervention → Need a more flexible framework. • Solution: Richards equation → Mixture of Gompertz curves • Task: 10-20 data points → Extrapolate 150 more.
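The "mixture of Gompertz curves" on slide 54 can be sketched as a sum of cumulative growth curves, each with its own size, onset and rate. This is an illustration only: the slide gives no parameterization or fitted values, so the functional form below is the standard Gompertz curve and the parameters are made up, not estimates from the Vietnam data.

```python
import numpy as np


def gompertz(t, size, displacement, rate):
    """Cumulative Gompertz curve: size * exp(-displacement * exp(-rate * t))."""
    return size * np.exp(-displacement * np.exp(-rate * t))


def mixture(t, components):
    """Sum of Gompertz curves, e.g. one per wave or intervention regime."""
    return sum(gompertz(t, *component) for component in components)


# 20 observed days plus 150 extrapolated days, matching the task size on the slide.
days = np.arange(0, 170)
components = [(10_000.0, 8.0, 0.08), (5_000.0, 40.0, 0.05)]  # made-up parameters
cumulative = mixture(days, components)
daily = np.diff(cumulative)        # implied daily counts
peak_day = int(np.argmax(daily))   # day with the largest implied daily count
```

In practice the parameters would be fitted to the first 10-20 observations (for example by least squares) and the fitted mixture evaluated on the later days; having only three parameters per component is what makes extrapolation from so few points feasible.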
  • 55. Case of HCM City. [Chart: “Estimated number of Covid-19 deaths, Ho Chi Minh City”, weekly ticks from 3/07/21 to 30/10/21; series: recorded deaths, estimated deaths, cumulative deaths (actual). Annotations: 11/8: predicting date; 20-21/8: peak; total cases 16/10]
  • 56. Agenda • Part I: Deep learning (Fundamentals, Powers, Limitations) • Part II: AI post-DL (System 1, Towards System 2, AI as a science)
  • 57. DL pushes changes in the practice of AI • 2000s: Focus: Model; Flow: Data → Feature → Model → Deploy; Reception: Skeptical • 2010s: Focus: Data; Flow: Data → Model → Deploy; Reception: Accelerating • 2020s: Focus: Prompt; Flow: Prompt → Deploy; Reception: Responsible
  • 58. Newbehaviours Emergence •system behaviour is implicitly induced rather than explicitly constructed •cause of scientific excitement and anxiety of unanticipated consequences Homogenisation •consolidation of methodology for building machine learning system across many applications •provides strong leverage for many tasks, but also creates single points of failure Slide credit: Samuel Albanie, 2022 12/04/2024 58
  • 59. The shift towards science • Engineering: design man-made systems. • AI: discover emergent behaviours. • Science: discover laws in nature.
  • 60. Example: Data → Prompt → Deploy. Long Dang, Thao Le, Vuong Le, Tu Minh Phuong, Truyen Tran, SADL: An Effective In-Context Learning Method for Compositional Visual QA, 2023
  • 61. Example: LLM agent for scientific discovery • User request: Design a material that: - <Requirement 1> - <Requirement 2> - … • Designed prompt: task description; tools description; few-shot examples; … • Crystal LLM Agent → high-level tasks: search for template; generate from template; evaluate requirement 1; evaluate requirement 2 • Tools set: Tool 1, Tool 2, Tool 3 • Selected tools: Tool 1, Tool 2, Tool 3, Tool 3 • Execution → Reflect → Correction → Final answer
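The control flow on slide 61 (plan high-level tasks, select tools, execute, reflect, correct, answer) can be sketched as a plain loop. Everything below is hypothetical: no LLM is called, the tools are stubs, and the property names, thresholds, and correction step are invented solely to make the loop runnable. In the slide's design, the plan and the corrections would come from the LLM via the designed prompt.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]

# --- Hypothetical stub tools (stand-ins for Tool 1..3 on the slide) ---
def search_template(state: State) -> State:
    # Placeholder for a template lookup: returns a fixed starting design.
    return {**state, "property_a": 0.5, "property_b": 2.0}

def generate_from_template(state: State) -> State:
    # Placeholder generator: shifts property_a by the current correction step.
    return {**state, "property_a": state["property_a"] + state.get("step", 0.0)}

def evaluate_requirements(state: State) -> List[str]:
    # Returns the violated requirements (thresholds are made up).
    failures = []
    if state["property_a"] < 1.0:
        failures.append("property_a >= 1.0")
    if state["property_b"] > 3.0:
        failures.append("property_b <= 3.0")
    return failures

TOOLS: Dict[str, Callable[[State], State]] = {
    "search_template": search_template,
    "generate_from_template": generate_from_template,
}

def run_agent(max_rounds: int = 5) -> Tuple[State, int]:
    # Plan: in the slide this list is produced by the LLM; here it is fixed.
    plan = ["search_template", "generate_from_template"]
    state: State = {}
    for task in plan:                       # execution of the selected tools
        state = TOOLS[task](state)
    for round_idx in range(max_rounds):
        failures = evaluate_requirements(state)   # reflect
        if not failures:
            return state, round_idx               # final answer
        state = {**state, "step": 0.25}           # correction (fixed rule here)
        state = TOOLS["generate_from_template"](state)
    raise RuntimeError("requirements not met within max_rounds")

final_state, rounds_used = run_agent()
print(final_state, rounds_used)
```

Bounding the loop with `max_rounds` is a design choice for the sketch: a reflect-and-correct cycle needs an explicit stopping rule when the requirements cannot be met.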
  • 62. LLM social agents • Extended action space: • APIs • RAG • Architectures with LLM and external memory: • Long-term • Short-term/sensory • Working memory • Figure labels: LLM, World, Other Agents • Social interactions • Working as a team in cooperative tasks • Effective communication: • When/who/what to communicate? • Via others' actions and messages • What is others' knowledge or belief? • Should others' knowledge be corrected by communication?
  • 63. Agenda • Part I: Deep learning (Fundamentals; Powers; Limitations) • Part II: AI post-DL (System 1; Towards System 2; AI as a science)
  • 64. Conclusion • DL reached its peak in 2022 with ChatGPT. This changed AI practice dramatically. • Deep neural networks are here to stay, perhaps as one part of a holistic solution to human-level AI. • Gradient-based learning is still without parallel. • DL will become much more general/universal/versatile. • Higher cognitive capabilities will arrive, perhaps with symbol-manipulation capacity. • Better generalization capability (e.g., extreme). • We have to deal with the consequences of DL's own success. • The industry will need to keep the highly trained (overfitted) DL workforce busy!