Deep Learning 2.0
A research program
Presented at FAIC, 08/2021
A/Prof Truyen Tran
Deakin University
@truyenoz
truyentran.github.io
truyen.tran@deakin.edu.au
letdataspeak.blogspot.com
goo.gl/3jJ1O0
>Understanding  >Accuracy  >Individualisation  >Human-machine teaming
Infer · Discover · Invent
Photo credit: discovermagazine.com
DL: 8 years snapshot. Making positive impact in AI.
Timeline: 2012 · 2016 · AusDM 2016 · Turing Award 2018 · GPT-3 2020
DL has been fantastic, but …
• It is great at interpolating
• → data hungry to cover all variations and smooth local manifolds
• → little systematic generalization (novel combinations)
• Lacks human-perceived reasoning capability
• Lacks a natural mechanism to incorporate prior knowledge, e.g., common sense
• No built-in causal mechanisms
• → Has trust issues!
• To be fair, many of these problems are common in statistical learning!
The next AI challenge
2020s-2030s
• Learning + reasoning, general purpose, human-like
• Has contextual and common-sense reasoning
• Requires less data
• Adapts to change
• Explainable
Photo credit: DARPA
Deep Learning 2.0 = Perception + Learning + Memory + Reasoning.
It solves problems in data-rich domains (health, software, drug & life sciences, materials science); these domains in turn inspire it.
DL 2.0 architecture
System 1: Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2:
Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Image credit: VectorStock | Wikimedia
Diagram components: Perception · Theory of mind · Recursive reasoning · Facts · Semantics · Events and relations · Working space · Memory
Core AI faculty:
Memory
Image credit: Iconscout
Trainable Turing machines
Turing machines are hypothetical devices with infinite memory that can compute all computable programs.
The quest: learn a Turing machine from data, then use it to solve everything else.
Image credit: arstechnica
Neural Turing machine (NTM)
(a differentiable analogue of the Turing machine)
• A controller that takes input/output and talks to an external memory module.
• Memory has read/write operations.
• The main issue is where to write, and how to update the memory state.
• All operations are differentiable (see the sketch below).
Source: rylanschaeffer.github.io
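Below is a minimal sketch of the read/write mechanics described above: content-based addressing over memory slots, a weighted-sum read, and an erase-then-add write, all differentiable. It is an illustration in the spirit of the NTM, not the published model; the fixed key, erase, and add vectors stand in for what a trained controller would emit.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

N, W = 8, 4                                # N memory slots, each of width W
M = np.random.randn(N, W) * 0.1

def address(M, key, beta=5.0):
    """Content-based addressing: cosine similarity sharpened by beta, then softmax."""
    sim = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sim)             # soft attention weights over slots

def read(M, w):
    return w @ M                           # weighted sum of slot contents

def write(M, w, erase, add):
    """Differentiable write: erase then add, both gated by the attention w."""
    M = M * (1 - np.outer(w, erase))       # partially erase attended slots
    return M + np.outer(w, add)            # then add new content

key = np.random.randn(W)
w = address(M, key)
M = write(M, w, erase=np.full(W, 0.5), add=key)
print("read-back:", read(M, address(M, key)))
```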
Failures of item-only memory for reasoning
• Relational representation is NOT stored → can't be reused later in the chain
• A single memory holds both items and relations → can't understand how relational reasoning occurs
• The memory-memory relationship is coarse, since it is represented as either a dot product or a weighted sum.
Self-attentive associative memories (SAM)
Learning relations automatically over time
Hung Le, Truyen Tran, Svetha Venkatesh, “Self-attentive associative
memory”, ICML'20.
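As a rough illustration of the idea of learning relations rather than storing only items, the toy below turns an item memory into a stack of pairwise-relation matrices via self-attention and outer products. It is a simplification for intuition only, not the SAM architecture from the paper; the projections are random and untrained.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n, d = 6, 8                                # n items of dimension d in the item memory
items = rng.normal(size=(n, d))

# Self-attention over items (single head, toy projections)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = items @ Wq, items @ Wk, items @ Wv
A = softmax(Q @ K.T / np.sqrt(d))          # (n, n) item-item attention

# Relational memory: for each item, an outer product of what it attends to
# and its own value, i.e. a stack of pairwise-relation matrices.
attended = A @ V                           # (n, d)
relational_memory = np.einsum('ni,nj->nij', attended, V)

print(relational_memory.shape)             # (6, 8, 8): relations, not just items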
NUTM = NTM + NSM
Hung Le, Truyen Tran, Svetha Venkatesh, “Neural
stored-program memory”, ICLR'20.
Computing devices vs neural counterparts
• Finite state machine, FSM (1943) ↔ RNNs (1982)
• Pushdown automaton, PDA (1954) ↔ Stack RNN (1993)
• Turing machine, TM (1936) ↔ NTM (2014)
• Universal Turing machine / von Neumann architecture, UTM/VNA (1936/1945) ↔ NUTM (2019)
Core AI faculty:
Reasoning
Why neural reasoning?
Reasoning is not necessarily achieved by making logical inferences.
There is a continuity between [algebraically rich inference] and [connecting together trainable learning systems].
Central to reasoning are composition rules to guide the combinations of modules to address new tasks.
“When we observe a visual scene, when we hear a complex sentence, we are able to explain in formal terms the relation of the objects in the scene, or the precise meaning of the sentence components. However, there is no evidence that such a formal analysis necessarily takes place: we see a scene, we hear a sentence, and we just know what they mean. This suggests the existence of a middle layer, already a form of reasoning, but not yet formal or logical.”
Bottou, Léon. "From machine learning to machine reasoning." Machine Learning 94.2 (2014): 133-149.
Learning to reason
• Learning is to improve oneself through experience ~ acquiring knowledge & skills
• Reasoning is to deduce knowledge from previously acquired knowledge in response to a query (or cues)
• Learning to reason is to improve the ability to decide whether a knowledge base entails a predicate.
• E.g., given a video, determine whether the person with the hat turns before singing.
• Hypotheses:
• Reasoning as just-in-time program synthesis.
• It employs conditional computation.
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM
(JACM) 44.5 (1997): 697-725.
(Dan Roth; ACM Fellow; IJCAI
John McCarthy Award)
Practical setting: (query,database,answer) triplets
• This is very general (see the sketch below):
• Classification: Query = what is this? Database = data.
• Regression: Query = how much? Database = data.
• QA: Query = NLP question. Database = context/image/text.
• Multi-task learning: Query = task ID. Database = data.
• Zero-shot learning: Query = task description. Database = data.
• Drug-protein binding: Query = drug. Database = protein.
• Recommender system: Query = user (or item). Database = inventory (or user base).
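The sketch below shows the common interface these tasks share; `answer` is a hypothetical placeholder for whatever reasoning model is learned, and the example databases are dummies.

```python
from typing import Any

def answer(query: Any, database: Any) -> Any:
    """A learned reasoning model would map (query, database) -> answer."""
    raise NotImplementedError  # placeholder: the model is what we learn

# The same triplet interface covers very different tasks:
tasks = [
    ("what is this?",       {"pixels": "..."}),                 # classification
    ("how much?",           {"features": "..."}),               # regression
    ("who wears the hat?",  {"image": "...", "text": "..."}),   # QA
    ("task-7",              {"data": "..."}),                   # multi-task (task ID)
    ("aspirin",             {"protein": "MKT..."}),             # drug-protein binding
]

for query, database in tasks:
    try:
        print(query, "->", answer(query, database))
    except NotImplementedError:
        print(query, "-> (to be answered by the learned model)")
```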
Visual reasoning
[Slide diagram] Testbed: Visual QA, at the intersection of Computer Vision, Natural Language Processing, and Machine Learning. Related topics: reasoning (qualitative spatial reasoning; relational, temporal inference; commonsense), object recognition, scene graphs, parsing, symbol binding, systematic generalisation, learning to classify entailment, unsupervised learning, reinforcement learning, program synthesis, action graphs, event detection, object discovery.
Language-binding Object Graph Network for VQA
Thao Minh Le, Vuong Le,
Svetha Venkatesh, and
Truyen Tran, “Dynamic
Language Binding in
Relational Visual
Reasoning”, IJCAI’20.
Searching for reasoning prior: Attention
Attention priors
Visual question answering in action
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic Language Binding in Relational Visual Reasoning”, IJCAI’20.
A general-purpose neural reasoning unit for spatio-temporal objects (OSTR)
OSTR in action – Video QA
Dang, Long Hoang, et al. "Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question
Answering." IJCAI (2021).
Source: religious studies project
Core AI faculty:
Theory of mind
Contextualized recursive reasoning
• Thus far, QA tasks are straightforward and objective:
• Questioner: I will ask about what I don’t know.
• Answerer: I will answer what I know.
• Real life can be tricky, more subjective:
• Questioner: I will ask only questions I think they can
answer.
• Answerer 1: This is what I think they want from an answer.
• Answerer 2: I will answer only what I think they think I can.
→ We need Theory of Mind to function socially.
Social dilemma: Stag Hunt games
• Difficult decision: individual outcomes (selfish) or group outcomes (cooperative).
• Both hunt stag (both cooperative): both have more meat.
• Both hunt hare (both selfish): both have less meat.
• One hunts stag (cooperative), the other hunts hare (selfish): only the hare hunter gets meat.
• Human evidence: self-interested but considerate of others (cultures vary).
• Idea: belief-based guilt aversion
• An agent experiences a loss if it lets the other down.
• Necessitates Theory of Mind: reasoning about the other's mind.
Theory of Mind Agent with Guilt Aversion (ToMAGA)
Update Theory of Mind
• Predict whether the other's behaviour is cooperative or uncooperative
• Update the zero-order belief (what the other will do)
• Update the first-order belief (what the other thinks about me)
Guilt Aversion (see the sketch below)
• Compute the expected material reward of the other, based on Theory of Mind
• Compute the psychological reward, i.e., "feeling guilty"
• Reward shaping: subtract the expected loss of the other.
Nguyen, Dung, et al. "Theory of Mind with Guilt Aversion Facilitates
Cooperative Reinforcement Learning." Asian Conference on Machine
Learning. PMLR, 2020.
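A minimal sketch of the guilt-averse reward shaping described above, on a Stag Hunt payoff matrix. The payoff numbers, the guilt weight, and the simplified one-step beliefs are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative Stag Hunt payoffs: payoff[my_action, other_action], 0 = stag, 1 = hare
payoff = np.array([[4.0, 0.0],    # I hunt stag: great together, nothing if deserted
                   [3.0, 2.0]])   # I hunt hare: safe but smaller payoff

def other_payoff(my_action, other_action):
    return payoff[other_action, my_action]   # symmetric game

def shaped_reward(my_action, other_action, p_they_expect_stag, guilt=0.5):
    """Material reward minus guilt: the other's expected shortfall that I caused.

    p_they_expect_stag: my belief about how strongly the other expects me to cooperate.
    """
    material = payoff[my_action, other_action]
    # What the other expected to earn, given their action and their belief about me
    expected_other = (p_they_expect_stag * payoff[other_action, 0]
                      + (1 - p_they_expect_stag) * payoff[other_action, 1])
    shortfall = max(0.0, expected_other - other_payoff(my_action, other_action))
    return material - guilt * shortfall      # reward shaping: subtract the other's expected loss

# Defecting (hare) on a cooperator who strongly expected cooperation is penalised:
print(shaped_reward(my_action=1, other_action=0, p_they_expect_stag=0.9))   # 3 - 0.5*3.6 = 1.2
```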
ToM
architecture
• The observer maintains a memory of previous episodes of the agent.
• It theorizes about the "traits" of the agent.
• Given the current episode, the observer tries to infer the goal, intention, action, etc. of the agent.
Making impact: Accelerating drug discovery, life sciences, and materials science
Image credit: Nature
Drug repurposing using relational reasoning over drug graphs
#REF: Do, Kien, et al. "Attentional Multilabel Learning over Graphs: A Message Passing Approach." Machine Learning, 2019.
We invented a scalable method to check whether an existing drug binds to any set of targets of interest. Our method takes target relationships into account.
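To make the message-passing idea concrete, here is a toy sketch: atom states on a small molecular graph are updated from their neighbours for a few rounds, pooled into a molecule vector, and scored against several targets with a multilabel sigmoid head. The dimensions, weights, and number of targets are arbitrary placeholders, not the model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy molecular graph: adjacency and initial atom features
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # 4 atoms
H = rng.normal(size=(4, 8))                # atom features

W_msg, W_upd = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

def mp_step(A, H):
    """One round of message passing: aggregate neighbour messages, then update."""
    msgs = A @ (H @ W_msg)                 # sum of messages from neighbours
    return np.tanh(H @ W_upd + msgs)

for _ in range(3):                         # a few rounds of message passing
    H = mp_step(A, H)

graph_vec = H.mean(axis=0)                 # pool atoms -> molecule embedding
W_out = rng.normal(size=(8, 5))            # 5 hypothetical protein targets
scores = 1 / (1 + np.exp(-(graph_vec @ W_out)))   # multilabel binding scores
print(np.round(scores, 3))
```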
Relational reasoning for drug-protein binding
We designed a model for detailed interaction between drug and protein residues. The architecture is a new graph-in-graph. This results in more accurate and precise prediction of binding site and strength.
Nguyen, T. M., Nguyen, T., Le, T. M., & Tran, T. (2021). "GEFA: Early Fusion Approach in Drug-Target Affinity Prediction". IEEE/ACM Transactions on Computational Biology and Bioinformatics.
More flexible drug-disease response with Relational Dynamic Memory
[Architecture diagram: a controller with read/write heads over a memory constructed from the graph; query in, output out]
Predicting in silico molecular interaction using a relational multi-memory system
[Architecture diagram: a controller with state h_t reads from memories M_1 … M_C (built from the graph) through read heads producing r_t^1 … r_t^K, combined into r_t^*, and writes back; query in, output out]
#REF: Pham, Trang, Truyen Tran, and Svetha Venkatesh. "Relational dynamic memory networks." arXiv preprint arXiv:1808.04247 (2018).
Image credit: khanacademy
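A stripped-down sketch of the controller-memory loop in the two diagrams above: the memory is seeded from graph node features, the controller repeatedly reads with content-based attention, updates its state, and softly writes back, and its final state feeds an output head. This is a simplified single-memory caricature for intuition, not the published RDM or multi-memory model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
d = 8
memory = rng.normal(size=(5, d))           # initialised from graph node features
query  = rng.normal(size=d)
h      = query.copy()                      # controller state
W = rng.normal(size=(2 * d, d)) * 0.1

for hop in range(3):                       # a few read-update-write hops
    attn = softmax(memory @ h)             # content-based read weights
    r = attn @ memory                      # read vector
    h = np.tanh(np.concatenate([h, r]) @ W)        # controller update
    memory = memory + np.outer(attn, h) * 0.1      # soft write-back

output = h                                  # fed to a task head (e.g., bioactivity)
print(output.round(3))
```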
Reinforcement learning + relational reasoning for chemical reaction prediction
The method takes graph morphism as the core. Reinforcement learning is employed to explore the graph morphism dynamics.
#REF: Do, K., Tran, T., & Venkatesh, S. (2019, July). Graph transformation policy network for chemical reaction prediction. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 750-760). ACM.
Image credit: britannica
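The toy sketch below shows what "graph morphism as the core" means operationally: a reaction is modelled as a short episode of bond edits on the reactant graph, with a terminal reward if the edited graph matches the known product. The random policy, tiny graphs, and reward here are illustrative stand-ins; in the paper the policy is a learned graph network trained with reinforcement learning.

```python
import numpy as np

rng = np.random.default_rng(2)
n_atoms = 4

def random_policy(adj):
    """Placeholder policy: pick an atom pair and a new bond order (0 = no bond)."""
    i, j = rng.choice(n_atoms, size=2, replace=False)
    return i, j, rng.integers(0, 3)

reactant = np.zeros((n_atoms, n_atoms), dtype=int)
reactant[0, 1] = reactant[1, 0] = 1
product = reactant.copy()
product[1, 2] = product[2, 1] = 2          # the "true" outcome of the reaction

state = reactant.copy()
for step in range(3):                       # an episode is a few bond edits
    i, j, bond = random_policy(state)
    state[i, j] = state[j, i] = bond        # apply the graph edit

reward = float(np.array_equal(state, product))   # terminal reward: did we reach the product?
print("reward:", reward)
```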
The road ahead
Image source: pobble365
Yet to be solved …
• Common-sense reasoning
• Reasoning as program synthesis with callable, reusable modules
• Systematicity, a.k.a. systematic generalization
• Knowledge-driven VQA, knowledge as semantic memory
• Fluent visual dialog
• Higher-order thought (e.g., self-awareness and consciousness)
• A better prior for reasoning
Towards a dual tri-process theory
• Stanovich, K. E. (2009). Distinguishing the reflective, algorithmic, and autonomous minds: Is it time for a tri-process theory? In Two Minds: Dual Processes and Beyond, 55-88.
Photo credit: mumsgrapevine
The reasoning team @
A/Prof Truyen Tran Dr Vuong Le
Dr Thao Le
Dr Hung Le Mr Long Dang Mr Hoang-Anh Pham
Mr Kha Pham
Thank you Truyen Tran
@truyenoz
truyentran.github.io
truyen.tran@deakin.edu.au
letdataspeak.blogspot.com
goo.gl/3jJ1O0
linkedin.com/in/truyen-tran