Machine Reasoning at A2I2, Deakin University

24/06/2020 1
Geelong, April 2020
A/Prof Truyen Tran
Vuong Le, Thao Le, Hung Le,
Dung Nguyen, Tin Pham
Deakin University
@truyenoz
truyentran.github.io
truyen.tran@deakin.edu.au
letdataspeak.blogspot.com
goo.gl/3jJ1O0
Machine reasoning @A2I2
linkedin.com/in/truyen-tran

24/06/2020 2
Dual-process theories
The problem
Neural memories
Agenda
Social reasoning
The road ahead
Visual reasoning

24/06/2020 4
Among the most challenging scientific questions of our
time are the corresponding analytic and synthetic
problems:
• How does the brain function?
• Can we design a machine which will simulate a brain?
-- Automata Studies, 1956.

What does brain do?
Perceive the world
Conceptualize
Reason about things
Imagine the future
Plan what to do
Act
Learn
Think about self
Have emotion
Be conscious
Love
Be happy and pursue
happiness
Be ethical
Socialize
Which ones are
Turing
computational?

24/06/2020 6
#REF: A Cognitive Architecture Approach to Interactive Task Learning, John E. Laird, University of Michigan

1960s-1990s
 Hand-crafting rules,
domain-specific, logic-
based
 High in reasoning
 Can’t scale.
 Fail on unseen cases.
24/06/2020 7
2020s-2030s
 Learning + reasoning, general
purpose, human-like
 Has contextual and common-
sense reasoning
 Requires less data
 Adapt to change
 Explainable
1990s-present
 Machine learning, general
purpose, statistics-based
 Low in reasoning
 Needs lots of data
 Less adaptive
 Little explanation
Photo credit: DARPA

Many facets of reasoning
Analogical
Relational
Inductive
Deductive
Abductive
Judgemental
Causal
Legal
Scientific
Moral
Social
Visual
Lingual
Medical
Musical
Problem solving
Theorem proving
One-shot learning
Zero-shot learning
Counterfactual

Learning vs Reasoning
Learning is to improve itself by experiencing ~ acquiring
knowledge & skills
Reasoning is to deduce knowledge from previously acquired
knowledge in response to a query (or a cues)
Early theories of intelligence (a) focuses solely on reasoning, (b)
learning can be added separately and later! (Khardon & Roth,
1997).
Learning precedes reasoning or the two interacting?
Can reasoning be learnt? Are learning and reasoning indeed
different facets of the same mental/computational process?
24/06/2020 9
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM
(JACM) 44.5 (1997): 697-725.
(Dan Roth; ACM
Fellow; IJCAI John
McCarthy Award)

Bottou’s vision
Is not necessarily achieved by making logical
inferences
Continuity between algebraically rich inference
and connecting together trainable learning
systems
Central to reasoning is composition rules to guide
the combinations of modules to address new
tasks
24/06/2020 10
“When we observe a visual scene, when we
hear a complex sentence, we are able to
explain in formal terms the relation of the
objects in the scene, or the precise meaning of
the sentence components. However, there is no
evidence that such a formal analysis
necessarily takes place: we see a scene, we
hear a sentence, and we just know what they
mean. This suggests the existence of a middle
layer, already a form of reasoning, but not
yet formal or logical.”
Bottou, Léon. "From machine learning to machine reasoning." Machine learning 94.2
(2014): 133-149.

Learning to reason, formal def
24/06/2020 11
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM
(JACM) 44.5 (1997): 697-725.
E.g., given a video f, determines if the person with the
hat turns before singing.

12
What color is the thing with the same
size as the blue cylinder?
blue
• The network guessed the
most common color in the
image.
• Linguistic bias.
• Requires multi-step
reasoning: find blue cylinder
➔ locate other object of the
same size ➔ determine its
color (green).
A testbed: Visual QA

13Adapted from (Aditya et al., 2019)
Events,
Activities
Objects, Regions,
Trajectories
Scene Elements: Volumes, 3D
Surfaces
Image Elements: Regions, Edges,
Textures
Knowledge about
shapes
Qualitative spatial
reasoning
Relations between
objects
Commonsense knowledge
Relations between objects
Why Visual QA?

24/06/2020 14
The problem
Neural memories
Agenda
Social reasoning
The road ahead
Visual reasoning

Dual-process system with memories
15
System 1:
Intuitive
System 1:
Intuitive
System 1:
Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2:
Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Facts
Semantics
Events and relational associations
Working space – temporal buffer
Single

24/06/2020 16
Evans, Jonathan St BT. "Dual-processing accounts of reasoning, judgment, and social
cognition." Annu. Rev. Psychol. 59 (2008): 255-278.

System 2
Holds hypothetical thought
Decoupling from representation
Working memory size is not essential. Its
attentional control is.
Computation selection?
24/06/2020 17

Better: A dual tri-process theory
Stanovich, K. E. (2009). Distinguishing the
reflective, algorithmic, and autonomous
minds: Is it time for a tri-process theory. In two
minds: Dual processes and beyond, 55-88.
24/06/2020 18
Photo credit: mumsgrapevine

24/06/2020 19
The problem
Neural memories
Agenda
Social reasoning
The road ahead
Visual reasoning

Visual recognition is a (solvable) special case
20
Image courtesy: https://dcist.com/
What is this?
a cat R-CNN,
YOLO,…
Object recognition Object detection
Where are objects, and what are they?

But Image QA is challenging
21
(CLERV, Johnson et al., 2017)
(Q) How many objects are either
small cylinders or metal things?
(Q) Are there an equal number of
large things and metal spheres?
(GQA, Hudson et al., 2019)
(Q) What is the brown animal sitting
inside of?
(Q) Is there a bag to the right of the
green door?

And Video QA is even more challenging
22(Data: TGIF-QA)

23
Reasoning
Computer Vision
Natural Language
Processing
Machine
learning
Visual Reasoning
Visual QA is a great
testbed

Reasoning over visual modality
• Context:
 Videos/Images: compositional data
 Hierarchy of visual information: objects, actions, attributes,
relations, commonsense knowledge.
 Relation of abstract objects: spatial, temporal, semantic aspects.
• Research questions:
 How to represent object/action relations in images and videos?
 How to reason with object/action relations?
24

A simple VQA framework that works surprisingly well
25
Visual
representation
Question
What is the brown
animal sitting inside of?
Textual
representation
Fusion
Predict
answer
Answer
box
CNN
Word embedding + LSTM

24/06/2020 26
A Prelim Dual-System Model
Le, Thao Minh, Vuong Le, Svetha Venkatesh, and Truyen Tran. "Neural
Reasoning, Fast and Slow, for Video Question Answering." IJCNN (2020).

Separate reasoning process from perception
27
 Video QA: inherent dynamic nature of visual content over time.
 Recent success in visual reasoning with multi-step inference and
handling of compositionality.
System 1: visual
representation
System 2: High-
level reasoning

24/06/2020 28
Separate reasoning process from perception (2)

System 1: Clip-based Relation Network
29
For 𝑘𝑘 = 2,3, … , 𝐾𝐾 where ℎ𝜙𝜙 and
𝑔𝑔𝜃𝜃 are linear transformations
with parameters 𝜙𝜙 and 𝜃𝜃,
respectively, for feature fusion.
Why temporal relations?
• Situate an event/action in relation to
events/actions in the past and
formulate hypotheses on future
events.
• Long-range sequential modeling.

System 2: MAC Net
24/06/2020 30
Hudson, Drew A., and Christopher D. Manning. "Compositional attention
networks for machine reasoning." ICLR 2018.

Results on SVQA dataset.
31
Model Exist Count
Integer
Comparison
Attribute Comparison Query
All
More Equal Less Color Size Type Dir Shape Color Size Type Dir Shape
SA(S) 51.7 36.3 72.7 54.8 58.6 52.2 53.6 52.7 53.0 52.3 29.0 54.0 55.7 38.1 46.3 43.1
TA-
GRU(T)
54.6 36.6 73.0 57.3 57.7 53.8 53.4 54.8 55.1 52.4 22.0 54.8 55.5 41.7 42.9 44.2
SA+TA 52.0 38.2 74.3 57.7 61.6 56.0 55.9 53.4 57.5 53.0 23.4 63.3 62.9 43.2 41.7 44.9
CRN+M
AC
72.8 56.7 84.5 71.7 75.9 70.5 76.2 90.7 75.9 57.2 76.1 92.8 91.0 87.4 85.4 75.8

Results on TGIF-QA dataset.
32
Model Action Trans. FrameQA Count
ST-TP 62.9 69.4 49.5 4.32
Co-Mem 68.2 74.3 51.5 4.10
PSAC 70.4 76.9 55.7 4.27
CRN+MAC 71.3 78.7 59.2 4.23

24/06/2020 33
A System 1 for Video
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Hierarchical
conditional relation networks for video question answering”, CVPR'20.
System 1: visual
representation
System 2: High-level
reasoning

Conditional Relation Network Unit
34
Problems:
• Lack of a generic mechanism for
modeling the interaction of
multimodal inputs.
• Flaws in temporal relations of
breaking local motion in videos.
Inputs:
• An array of 𝑛𝑛 objects
• Conditioning feature
Output:
• An array of 𝑚𝑚 objects

Hierarchical
Conditional Relation
Networks
35

Results
36
Model Action Trans. FrameQA Count
ST-TP 62.9 69.4 49.5 4.32
Co-Mem 68.2 74.3 51.5 4.10
PSAC 70.4 76.9 55.7 4.27
HME 73.9 77.8 53.8 4.02
HCRN 75.0 81.4 55.9 3.82
TGIF-QA dataset MSVD-QA and MSRVTT-QA datasets.

24/06/2020 37
A System 2 for Relational Reasoning
Thao Minh Le, Vuong Le, Svetha Venkatesh, and Truyen Tran, “Dynamic
Language Binding in Relational Visual Reasoning”, IJCAI’20.
System 1: visual
representation
System 2: High-level
reasoning

Reasoning with structured representation of
spatial relation
38
• Key insight: Reasoning is chaining of relational predicates
to arrive at a final conclusion
→ Needs to uncover spatial relations, conditioned on query
→ Chaining is query-driven
→ Objects/language needs binding
→ Object semantics is query-dependent
→ Very thing is end-to-end differentiable

39
Language-binding Object Graph Network for VQA

40
Language-binding Object Graph Unit (LOG)

43
Comparison with the on CLEVR dataset of
different data fractions.
Performance comparison on
VQA v2 subset of long questions.
Model Action
XNM(Objects) 43.4
MAC(ResNet) 40.7
MAC(Objects) 45.5
HCRN(Objects) 46.8

24/06/2020 45
The problem
Neural memories
Social reasoning
The road ahead
Visual reasoning
Agenda

“[…] one cannot hope to understand
reasoning without understanding the memory
processes […]”
(Thompson and Feeney, 2014)
24/06/2020 46

Dual-process system with memories
47
System 1:
Intuitive
System 1:
Intuitive
System 1:
Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2:
Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Facts
Semantics
Events and relational associations
Working space – temporal buffer
Single

From psychology & cognitive science
Reasoning and memory are studied separately, but …
 Analytic reasoning requires working memory
 Spatial reasoning requires place memory
 Inferential reasoning requires episodic memory
Dual-processes in memory:
 Fast retrieval/recognition (using similarity as criteria)
 Slow item recollection
Dual-processes in reasoning
 Induction as fast (using similarity as criteria)
 Deduction as slow

Reasoning as memory
Expert reasoning was enabled by a large long-term memory,
acquired through experience
Working memory for analytic reasoning
 WM capacity predicts IQ
 Higher order cognition = creating & manipulating relations  representation of
premises, temporarily stored in WM
 WM is a system to support information binding to a coordinate system
 Reasoning as deliberative hypothesis testing  memory-retrieval based hypothesis
generation
Knowledge structures support reasoning
Fuzzy-trace theory: gist ( intuition) and verbatim memory (
analytic processing)
Memory is critical for episodic future thinking (mental simulation)
24/06/2020 49

24/06/2020 50
#REF: A Cognitive Architecture Approach to Interactive Task Learning, John E. Laird, University of Michigan
Knowledge graphs
Planning, RL
NTM Attention, RL
RLUnsupervised
CNN
Episodic memory

24/06/2020 51
Learning a Turing machine
 Can we learn a (neural)
program that learns to
program from data?
Visual reasoning is a
specific program of two
inputs (visual, linguistic)

Neural Turing machine (NTM)
A memory-augmented neural network (MANN)
A controller that takes
input/output and talks to an
external memory module.
Memory has read/write
operations.
The main issue is where to write,
and how to update the memory
state.
All operations are differentiable.
Source: rylanschaeffer.github.io

MANN for reasoning
Three steps:
 Store data into memory
 Read query, process sequentially, consult memory
 Output answer
Behind the scene:
 Memory contains data & results of intermediate steps
 LOGNet does the same, memory consists of object representations
Drawbacks of current MANNs:
 No memory of controllers  Less modularity and compositionality when
query is complex
 No memory of relations  Much harder to chain predicates.
24/06/2020 53
Source: rylanschaeffer.github.io

24/06/2020 54
Universal Neural Turing Machines
Hung Le, Truyen Tran, Svetha Venkatesh, “Neural stored-program
memory”, ICLR'20.

Computing devices vs neural counterparts
FSM (1943) ↔ RNNs (1982)
PDA (1954) ↔ Stack RNN (1993)
TM (1936) ↔ NTM (2014)
The missing piece: A memory to store programs
Neural stored-program memory
UTM/VNA (1936/1945) ↔ NUTM--ours (2019)

Question answering (bAbI dataset)
Credit: hexahedria

Neural Stored-Program Architecture (NSA)
-- Work in progress
Flexible programs are essential for intelligence
Cognitive science: Meta-level process controls object-level process
Instance-specific program vs Task-specific program
In QA, program is query-specific
 On-demand program synthesis
Idea: A program to generate programs
Previous works: Fast weights, HyperNetworks, Neural Module
Networks, Adaptive Mixture of Experts
Current work: Programs are generated by combining basis sub-
programs
 The hope: Basis programs span the space of all possible programs
24/06/2020 58

NSA (cont.)
24/06/2020 59
The memory regularization

24/06/2020 60
Neural relational memory
Hung Le, Truyen Tran, Svetha Venkatesh, “Self-attentive associative
memory”, ICML'20.

Failures of current MANNs for reasoning
Relational representation is NOT stored  Can’t reuse later in the
chain
A single memory of items and relations  Can’t understand how
relational reasoning occurs
The memory-memory relationship is coarse since it is represented as
either dot product, or weighted sum.
24/06/2020 61

Self-attentive associative memories (SAM)
Learning relations automatically over time
24/06/2020 62

Multi-step reasoning over graphs
24/06/2020 63

Reasoning on text
24/06/2020 64

24/06/2020 65
The problem
Neural memories
Agenda
Social reasoning
The road ahead
Visual reasoning

Contextualized recursive reasoning
Thus far, QA tasks are straightforward and objective:
Questioner: I will ask about what I don’t know.
Answerer: I will answer what I know.
Real life can be tricky, more subjective:
Questioner: I will ask only questions I think they can answer.
Answerer 1: This is what I think they want from an answer.
Answerer 2: I will answer only what I think they think I can.
 We need Theory of Mind to function socially.
24/06/2020 66

24/06/2020 67
Theory of Mind is the ability to attribute mental
states -- beliefs, intents, desires, emotions,
knowledge, etc. - to oneself and to others.
Wikipedia, 09/04/2020
Source: religious studies project

24/06/2020 68
Theory of mind in guilt aversion MARL
Dung Nguyen, Truyen Tran, Svetha Venkatesh,
“Theory of Mind with Guilt Aversion Facilitates
Cooperative Reinforcement Learning”, ICLR’20
Workshop on Bridging AI and Cognitive Science.
Credit: Office of Public Affairs

Social dilemma: Stag Hunt games
Difficult decision: individual outcomes (selfish) or group
outcomes (cooperative).
 Together hunt Stag (both are cooperative): Both have more meat.
 Solely hunt Hare (both are selfish): Both have less meat.
 One hunts Stag (cooperative), other hunts Hare (selfish): Only one
hunts hare has meat.
Human evidence: Self-interested but considerate of others
(cultures vary).
Idea: Belief-based guilt-aversion
 One experiences loss if it lets other down.
 Necessitates Theory of Mind: reasoning about other’s mind.

Theory of Mind Agent with Guilt Aversion (ToMAGA)
Update Theory of Mind
 Predict whether other’s behaviour are
cooperative or uncooperative
 Updated the zero-order belief (what other
will do)
 Update the first-order belief (what other
think about me)
Guilt Aversion
 Compute the expected material reward of
other based on Theory of Mind
 Compute the psychological rewards, i.e.
“feeling guilty”
 Reward shaping: subtract the expected loss
of the other.

The benefit of ToM1 model in guilt aversion
● The color shows the probability of following cooperative behaviour after 500 interactions.
● The x-axis (y-axis) represents the probability that the first (second) player will follow cooperative
strategy at the beginning.
GA agents without
ToM
ToMAG
A

Deep RL ToMAGA in Grid-world Stag Hunt Games
Agents are initialised
nearby the Stag
→ Learning to hunt Stag
easier
Case 1
Agents are initialised
nearby the Hare
→ Learning to catch Hare
easier
Case 2

24/06/2020 74
The problem
Neural memories
Social reasoning
The road ahead
Visual reasoning
Agenda

Recall: Many facets of reasoning
Analogical
Relational
Inductive
Deductive
Abductive
Judgemental
Causal
Legal
Scientific
Moral
Social
Visual
Lingual
Medical
Musical
Problem solving
Theorem proving
One-shot learning
Zero-shot learning
Counterfactual

Yet to be solved …
Common-sense reasoning
Reasoning as program synthesis with callable,
reusable modules
Systematicity, aka systematic generalization
Knowledge-driven VQA, knowledge as semantic
memory
 Differentiable neural-symbolic systems for reasoning
Visual dialog
 Active question asking
Higher-order thought (e.g., self-awareness and
consciousness)
 Need a better prior for reasoning
24/06/2020 76

24/06/2020 77
Basic circuits
Better priors
Systematicity
Distal events
Multi-source
Graph reasoning
Meta-reasoning
Neural-symbolic
Clinical reasoning
Long-form video
Free-energy principle
Lifelong reasoning
Curriculum planning
Temporal reasoning
Social reasoning
Moral reasoning
Program synthesis
Memory
Recursion
Knowledge-driven
Emotion
Cognitive architecture
AGI
Visual Turing test

It is easy to get lost looking at the
current state of research…
24/06/2020 78
Vietnam News AAAI’20

Rich Sutton’s Bitter Lesson
24/06/2020 79
“The biggest lesson that can be read from 70 years of AI research is
that general methods that leverage computation are ultimately the
most effective, and by a large margin. ”
“The two methods that seem to
scale arbitrarily in this way
are search and learning.”
http://www.incompleteideas.net/IncIdeas/BitterLesson.html

In search for new basic operators
 Neuron as feature detector  SENSOR, FILTER
 Multiplicative gates  AND gate, Transistor,
Resistor
 Attention mechanism  SWITCH gate
 Memory + forgetting  Capacitor + leakage
 Skip-connection  Short circuit
 Computational graph  Circuit
 Compositionality  Modular design
 ..
 Reasoning  Programmable hardware (e.g.,
FPGA)?
24/06/2020 80

24/06/2020 81
Goertzel, B., Iklé, M., & Wigmore, J. (2012). The architecture of human-like general
intelligence. In Theoretical Foundations of Artificial General Intelligence (pp. 123-144).
Atlantis Press, Paris.l,’;[p/

AGI
24/06/2020 82
From: Alexey Potapov, "Cross-Paradigm AGI: How Cognitive, Deep, Probabilistic
and Universal AI can Contribute to Each Other“, AGI’17, Melbourne, Australia

24/06/2020 83
Thank you Truyen Tran
@truyenoz
truyentran.github.io
truyen.tran@deakin.edu.au
letdataspeak.blogspot.com
goo.gl/3jJ1O0
linkedin.com/in/truyen-tran
We’re recruiting for PhD.
Full scholarships available.
URL: truyentran.github.io/scholarship.html

Machine Reasoning at A2I2, Deakin University

More Related Content

What's hot

Similar to Machine Reasoning at A2I2, Deakin University

More from Deakin University

Recently uploaded

Machine Reasoning at A2I2, Deakin University