Deep analytics via learning to reason
A/Prof Truyen Tran
What we do @A2I2
[Diagram: AI/ML capabilities – memory, learning, reasoning, computer vision, human-AI teaming, optimisation – solving, and inspired by, industry problems in health, software, drug discovery, materials science, manufacturing, business processes, and energy]
Agenda
• Where we are
• Learning to reason
• Applications
7/12/2023 3
13 years ago
10 years ago
[Figure: Deepr pipeline – (1) a medical record is a sequence of visits/admissions separated by time gaps; (2) each word (code/phrase) is embedded as a vector; (3) convolution detects motifs; (4) max-pooling produces a record vector; (5) prediction at the prediction point]
Verbalize() – everything as a sentence
Nguyen, Phuoc, Truyen Tran, Nilmini Wickramasinghe, and Svetha Venkatesh. "Deepr: a convolutional net for medical records." IEEE Journal of Biomedical and Health Informatics 21, no. 1 (2016): 22-30.
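The Verbalize() idea above can be sketched in a few lines: a record of timed admissions is flattened into one "sentence" of diagnosis codes and discretised time-gap tokens. The codes and gap buckets below are invented for illustration, not Deepr's actual vocabulary.

```python
# Sketch of Deepr-style Verbalize(): a record of timed admissions becomes one
# "sentence" of codes with discretised time-gap tokens in between.
record = [(["E11", "I10"], 0),    # (codes, days since previous admission)
          (["I50"], 35),
          (["I50", "N17"], 200)]

def gap_token(days):
    """Map a time gap to a coarse token; buckets are illustrative."""
    if days == 0:
        return None
    return "0-1m" if days <= 30 else "1-3m" if days <= 90 else "3m+"

tokens = []
for codes, gap in record:
    g = gap_token(gap)
    if g:
        tokens.append(g)
    tokens.extend(codes)

print(" ".join(tokens))  # E11 I10 1-3m I50 3m+ I50 N17
```

The resulting token sequence can then be fed to an ordinary word-embedding + convolution stack, exactly as with text.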
10 years on
…
• Things have changed
• Better supply of data scientists,
esp. fuelled by deep learning
revolution
• Much better understanding of
values, processes, and toolsets
(MLOps, AutoML)
• We still need
• Data-driven cultures
https://hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century
2022:
Generalist
AI
Source: DeepMind
Cartoonist Zach Weinersmith, Science: Abridged Beyond the Point of Usefulness, 2017
Deep learning
[…] Now DL works on
everything, except for
small data, shifted data,
noisy data, artificially
twisted data, deep stuffs,
exact stuffs, abstract
stuffs, causal stuffs,
symbolic stuffs, thinking
stuffs, and stuffs that no
one knows how they work
like consciousness.
Rodney Brooks’
prediction (1/2018)
[In 2023, there is an] emergence of the
generally agreed upon "next big thing" in AI
beyond deep learning.
As of 11/2022, there is no news yet!
http://rodneybrooks.com/my-dated-predictions/
Agenda
• Where we are
• Learning to reason
• Applications
What is missing in deep
learning?
• Modern neural networks are good at interpolating
→ Data-hungry: must cover all variations and
smooth local manifolds
→ Little systematic generalization (novel
combinations)
• Lack of human-perceived reasoning capability
• Lack of logical inference
• Lack of a natural mechanism to incorporate prior
knowledge, e.g., common sense
• No built-in causal mechanisms
What is missing:
System 2 capability
• Decouples from perception/representation
(which deep learning does well)
• Holds hypothetical thought
• Enables mental travel & imagination.
• Slow. Deliberative. Conscious.
• Needs working memory.
• Size is not essential. Its attentional control
is.
Machine reasoning
Reasoning is concerned with arriving at a
deduction about a new combination of
circumstances.
Reasoning is to deduce new knowledge from
previously acquired knowledge in response to
a query.
Leslie Valiant
Leon Bottou
(Classical AI)
Can we induce from data a
deduction mechanism?
(Bayesian AI)
Can we infer an inference model
from data?
(Connectionist AI)
Can we design a neural network
that learns to reason from (little)
demonstration data?
Image credit: MV times
Hypotheses
Reasoning as just-in-time program synthesis.
It employs
conditional
computation.
Reasoning is
recursive, e.g.,
mental travel.
Figure credit: Jonathan Hui
1980s-2000s: Probabilistic reasoning with Belief
Propagation (BP)
• BP iteratively computes “beliefs” of
unobserved variables based on evidence
from observed variables.
• Known result in 2001-2003: stable fixed
points of BP are local minima of the
Bethe free energy.
• → Reasoning as belief refinement
• But requires full specification of
potential functions and network
structure!
Heskes, Tom. "Stable fixed points of loopy belief propagation are local minima of the Bethe free energy." Advances in Neural Information Processing Systems. 2003.
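The BP message-passing idea can be made concrete on a tiny chain of three binary variables; the potentials below are hypothetical (in classical BP they must be fully specified, which is exactly the limitation noted above), and on a chain a single forward-backward pass recovers the exact marginals:

```python
import numpy as np

# Minimal sum-product belief propagation on a 3-variable chain x1 - x2 - x3.
psi12 = np.array([[1.0, 0.5], [0.5, 2.0]])  # pairwise potential psi(x1, x2)
psi23 = np.array([[1.2, 0.3], [0.4, 1.0]])  # pairwise potential psi(x2, x3)
phi = {1: np.array([1.0, 1.0]),             # unary "evidence" potentials
       2: np.array([0.7, 1.3]),
       3: np.array([1.0, 1.0])}

# Forward and backward messages (one pass suffices on a tree/chain).
m12 = psi12.T @ phi[1]          # message x1 -> x2
m23 = psi23.T @ (phi[2] * m12)  # message x2 -> x3
m32 = psi23 @ phi[3]            # message x3 -> x2
m21 = psi12 @ (phi[2] * m32)    # message x2 -> x1

# Belief = normalised product of local evidence and incoming messages.
b2 = phi[2] * m12 * m32
b2 /= b2.sum()

# Check against the brute-force marginal of x2.
joint = np.einsum('i,j,k,ij,jk->ijk', phi[1], phi[2], phi[3], psi12, psi23)
b2_exact = joint.sum(axis=(0, 2))
b2_exact /= b2_exact.sum()
assert np.allclose(b2, b2_exact)
```

On loopy graphs the same updates are iterated to a fixed point, which is where the Bethe free-energy result applies.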
Why (not) neural reasoning?
Reasoning is not necessarily achieved by
making logical inferences
There is a continuity between [algebraically
rich inference] and [connecting together
trainable learning systems]
Central to reasoning is composition rules to
guide the combinations of modules to address
new tasks
→ Neural networks are a plausible candidate!
“When we observe a visual scene,
when we hear a complex sentence, we
are able to explain in formal terms the
relation of the objects in the scene, or
the precise meaning of the sentence
components. However, there is no
evidence that such a formal analysis
necessarily takes place: we see a
scene, we hear a sentence, and we just
know what they mean. This suggests
the existence of a middle layer,
already a form of reasoning, but not
yet formal or logical.”
Bottou, Léon. "From machine learning to machine
reasoning." Machine learning 94.2 (2014): 133-149.
Leon Bottou
Two approaches to neural reasoning
• Implicit chaining of predicates through recurrence:
• Step-wise query-specific attention to relevant concepts & relations.
• Iterative concept refinement & combination, e.g., through a working
memory.
• Answer is computed from the last memory state & question embedding.
• Explicit program synthesis:
• There is a set of modules, each performing a pre-defined operation.
• The question is parsed into a symbolic program.
• The program is implemented as a computational graph constructed by
chaining separate modules.
• The program is executed to compute an answer.
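The implicit approach can be sketched as a loop of query-specific attention and working-memory refinement; all names, dimensions, and update rules below are illustrative assumptions, not any specific published model:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy sketch: at each reasoning step the query attends over concept vectors,
# reads a summary, and refines a working-memory state.
rng = np.random.default_rng(0)
d, n_concepts, n_steps = 8, 5, 3
concepts = rng.normal(size=(n_concepts, d))     # knowledge-base items
q = rng.normal(size=d)                          # question embedding
W = rng.normal(size=(d, 2 * d)) * 0.1           # memory-update weights

m = np.zeros(d)                                 # working-memory state
for _ in range(n_steps):
    attn = softmax(concepts @ q)                # query-specific attention
    read = attn @ concepts                      # retrieved summary
    m = np.tanh(W @ np.concatenate([m, read]))  # iterative refinement

answer_logits = concepts @ np.tanh(m + q)       # answer from last state + query
```

The explicit alternative would instead parse the question into a discrete program and chain pre-defined modules; the loop above is the fully differentiable end of that spectrum.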
Learning to reason
• Learning is to improve oneself through experience,
i.e., by acquiring knowledge & skills.
• Learning to reason is to improve the ability to
decide if a knowledge base entails a predicate.
• E.g., given a video f, determine whether the person
with the hat turns before singing.
• Recall the hypotheses:
• Reasoning as just-in-time program synthesis.
• It employs conditional computation.
Khardon, Roni, and Dan Roth. "Learning to reason." Journal of the ACM
(JACM) 44.5 (1997): 697-725.
(Dan Roth; ACM Fellow; IJCAI
John McCarthy Award)
Practical setting:
(query,database,answer) triplets
• Classification: Query = what is this? Database = data.
• Regression: Query = how much? Database = data.
• QA: Query = NLP question. Database = context/image/text.
• Multi-task learning: Query = task ID. Database = data.
• Zero-shot learning: Query = task description. Database = data.
• Drug-protein binding: Query = drug. Database = protein.
• Recommender system: Query = user (or item). Database = inventory (or user base).
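A minimal sketch of this unifying view, with hypothetical field names and toy values (the SMILES string and protein fragment are illustrative):

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical unifying interface: every task in the list above becomes a
# (query, database, answer) triplet, so one reasoning engine can serve all.
@dataclass
class Triplet:
    query: Any      # e.g. "what is this?", a task ID, a drug SMILES string
    database: Any   # e.g. an image, a text context, a protein sequence
    answer: Any     # e.g. a class label, a number, a binding affinity

classification = Triplet(query="what is this?", database="image.png", answer="cat")
drug_binding = Triplet(query="CC(=O)Oc1ccccc1C(=O)O", database="MKT...", answer=7.2)
```

A single model then maps (query, database) to answer, and the task is carried entirely by the query.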
“[…] one cannot hope to
understand reasoning without
understanding the memory
processes […]”
(Thompson and Feeney, 2014)
Learning to remember
Memory is essential to computation and
intelligence
• Memory is the ability to
store, retain and recall
information
• Brain memory stores items,
events and high-level
structures
• Computer memory stores
data and temporary variables
• Reasoning capability is
correlated with memory
capacity
Slide credit: Hung Le
Heit, Evan, and Brett K. Hayes. "Predicting reasoning from memory." Journal of Experimental Psychology: General 140, no. 1 (2011): 76.
Memory taxonomy based on memory content
• Item memory: objects, events, items, variables, entities
• Relational memory: relationships, structures, graphs
• Program memory: programs, functions, procedures, how-to knowledge
Slide credit: Hung Le
Trainable Turing machines
Turing machines are hypothetical devices with infinite memory that can compute all computable programs.
The quest: Learn a Turing
machine from data, then
use it to solve everything
else.
Image credit: arstechnica
Neural Turing machine (NTM)
(simulating a differentiable
Turing machine)
• A controller that takes input/output
and talks to an external memory
module.
• Memory has read/write operations.
• The main issue is where to write,
and how to update the memory
state.
• All operations are differentiable.
Source: rylanschaeffer.github.io
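The NTM-style read/write can be sketched with content-based addressing; the memory size, key, and erase/add vectors below are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Sketch of differentiable memory access: content-based addressing,
# a soft read, and an erase/add write, all differentiable.
N, M = 4, 6                       # memory slots x slot width
memory = np.random.default_rng(1).normal(size=(N, M))

key = np.ones(M)                  # what the controller is looking for
beta = 2.0                        # key-strength (sharpness) parameter
cos = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
w = softmax(beta * cos)           # soft attention over all slots

read = w @ memory                 # read: convex combination of rows

erase = np.full(M, 0.5)           # erase and add vectors from the controller
add = np.full(M, 0.1)
memory = memory * (1 - np.outer(w, erase)) + np.outer(w, add)  # soft write
```

Because every slot is touched in proportion to its attention weight, gradients flow through "where to write" as well as "what to write".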
Failures of item-only memory for
reasoning
• Relational representation is NOT stored → Can’t reuse later
in the chain
• A single memory of items and relations → Can’t understand
how relational reasoning occurs
• The memory-memory relationship is coarse, since it is
represented as either a dot product or a weighted sum.
Self-attentive associative memories (SAM)
Learning relations automatically over time
Multi-step reasoning over graphs
Reasoning on text
NUTM: Learn to select program (neural weight)
via program attention
Le, Hung, Truyen Tran, and Svetha Venkatesh. "Neural Stored-program Memory."
In International Conference on Learning Representations. 2019.
Slide credit: Hung Le
Neural stored-program memory
(NSM) stores key (the address)
and values (the weight)
The weight is selected and
loaded to the controller of NTM
The stored NTM weights and the weights of the
NUTM are learnt end-to-end by backpropagation
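The key-value program selection can be sketched as attention over a bank of stored weight matrices; the sizes and query below are illustrative, not the paper's configuration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Sketch of a stored-program memory: keys act as addresses, values are whole
# weight matrices ("programs"); the controller's weight is a soft mixture.
rng = np.random.default_rng(2)
n_prog, d_key, d_in, d_out = 3, 4, 5, 5
keys = rng.normal(size=(n_prog, d_key))            # program addresses
programs = rng.normal(size=(n_prog, d_out, d_in))  # stored controller weights

query = rng.normal(size=d_key)                     # emitted by the controller
p = softmax(keys @ query)                          # program attention
W = np.tensordot(p, programs, axes=1)              # soft-selected weight
x = rng.normal(size=d_in)
h = np.tanh(W @ x)                                 # controller step with loaded weight
```

Because selection is a softmax, the whole pipeline (including which program to load) stays differentiable.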
System 1:
Intuitive
• Fast
• Implicit/automatic
• Pattern recognition
• Multiple
System 2:
Analytical
• Slow
• Deliberate/rational
• Careful analysis
• Single, sequential
Image credit: VectorStock | Wikimedia
[Diagram: perception feeding a memory of facts, semantics, and events and relations; a working space; theory of mind and recursive reasoning on top]
A next generation AI architecture @A2I2
Agenda
• Where we are
• Learning to reason
• Applications
Electronic medical records
Three interacting
processes:
Disease progression
Interventions & care
processes
Recording rules
Source: medicalbillingcodings.org
Hypothesis:
Healthcare is
Turing
computational
• Healthcare processes are executable
computer programs obeying hidden
“grammars”
• The “grammars” are
learnable through
observational data
https://www.pinterest.com.au/pin/14073817561987932/
http://workingmodeloftheworld.com/Turing-Machine
Towards a differentiable Turing
machine for health
#REF: Hung Le, Truyen Tran, and Svetha Venkatesh. “Dual Control Memory Augmented Neural Networks
for Treatment Recommendations”, PAKDD18.
Illness memory
Future trajectory
Past trajectory
Results: MIMIC-III data
Noisy, dynamic data: Problem
[Diagram: events/features streams and sensory streams → unified multimodal representation → memory → cross-channel deliberative reasoning (open problem, marked “?”) → decoder → downstream tasks: anomaly detection, event detection, prediction/forecast, generation, dist. events assoc.]
Noisy, dynamic data: System
[Diagram: events/features streams and sensory streams → unified multimodal representation → memory → cross-channel deliberative reasoning → decoder → downstream tasks: anomaly detection, event detection, prediction/forecast, generation, dist. events assoc.]
Visual reasoning
[Diagram: Visual QA as a testbed at the intersection of Reasoning (qualitative spatial reasoning; relational, temporal inference; commonsense), Computer Vision (object recognition, scene graphs, object discovery, event detection, action graphs), Natural Language Processing (parsing, symbol binding), and Machine Learning (unsupervised learning, reinforcement learning, program synthesis, systematic generalisation, learning to classify entailment)]
Language-binding Object Graph Network for VQA
Thao Minh Le, Vuong Le,
Svetha Venkatesh, and
Truyen Tran, “Dynamic
Language Binding in
Relational Visual
Reasoning”, IJCAI’20.
Visual QA in action
A general-purpose neural reasoning unit for spatio-temporal
objects (OSTR)
OSTR in action – Video QA
Dang, Long Hoang, et al. "Hierarchical Object-oriented Spatio-Temporal Reasoning for Video Question
Answering." IJCAI (2021).
Video Dialog:
Conversation Objects
in Space-Time
• Data = video + dialog + question
• Training signal = answer
• Model = recurrent reasoning over
space-time & dialog turn
Hoang-Anh Pham, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran. "Video Dialog as Conversation about Objects Living in Space-Time." ECCV’22.
Drug-target binding as QA
• Context: Binding targets (e.g.,
RNA/protein sequence, or 3D
structures), as a set, sequence, or
graph.
• Query: Drug (e.g., SMILES string, or
molecular graph)
• Answer: Affinity, binding sites,
modulating effects
Nguyen, Tri Minh, Thin Nguyen, Thao Minh Le, and Truyen Tran. "GEFA: early fusion approach in drug-target affinity prediction." IEEE/ACM Transactions on Computational Biology and Bioinformatics 19, no. 2 (2021): 718-728.
Chemical reaction prediction: Reinforcement learning +
relational reasoning
#REF: Do, K., Tran, T., & Venkatesh, S. (2019, July). Graph transformation policy network for
chemical reaction prediction. In Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining (pp. 750-760). ACM.
Image credit: britannica
The method takes graph morphism as the core.
Reinforcement learning is employed to explore the graph
morphism dynamics.
The road ahead
Image source: pobble365
Achieving systematic generalization
• Refers to the ability to work robustly with new combinations that have
zero probability in the training data.
• E.g., if we understand ‘John loves Mary’, we can also understand ‘Mary
loves John’; but a machine may fail on the latter, which has zero training
probability, unless designed properly.
• Current DL has a major problem with it.
• This is not new: Has been argued for 30+ years!
• Much research is needed on multiple fronts (e.g., syntax, indirection,
datasets, measuring)
Bahdanau, Dzmitry, et al. "Systematic generalization: what is required and can it be learned?." arXiv preprint arXiv:1811.12889 (2018).
Fodor, Jerry A., and Zenon W. Pylyshyn. "Connectionism and cognitive architecture: A critical analysis." Cognition 28.1-2 (1988): 3-71.
In search for basic neural operators for reasoning
• Basics:
• Neuron as feature detector → Sensor, filter
• Computational graph → Circuit
• Skip-connection → Short circuit
• Essentials
• Multiplicative gates → AND gate, Transistor,
Resistor
• Attention mechanism → SWITCH gate
• Memory + forgetting → Capacitor + leakage
• Compositionality → Modular design
• ..
Photo credit: Nicola Asuni
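Two of these correspondences can be made concrete in a few lines: a multiplicative gate as a transistor that scales (or cuts) a signal, and attention as a soft SWITCH over inputs. The functions are toy illustrations of the analogy, not operators from any specific architecture:

```python
import numpy as np

def gate(signal, control):
    """Multiplicative gate: control in [0, 1] scales the signal, like a transistor."""
    return control * signal

def switch(values, scores):
    """Attention as a soft SWITCH: high score routes that value to the output."""
    e = np.exp(scores - scores.max())
    w = e / e.sum()
    return w @ values

x = np.array([3.0, -1.0, 2.0])
assert gate(x, 0.0).sum() == 0.0             # gate fully closed: no signal passes
out = switch(x, np.array([10.0, 0.0, 0.0]))  # near-hard selection of x[0]
assert abs(out - 3.0) < 1e-3
```

Sharper scores make the switch harder; in the limit it becomes discrete routing, which is where conditional computation meets program synthesis.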
Searching for reasoning prior: Attention
Thao Le, Vuong Le, Sunil Gupta, Svetha Venkatesh, Truyen Tran. "Unsupervised
Attention Priors for Visual Question Answering." WACV’23.
Attention priors
Analogy operator: Structure-Mapping Theory
Systematicity Principle: A predicate that belongs to a mappable
system of mutually interconnecting relationships is more likely to
be imported into the target than is an isolated predicate.
[Diagram: structure mapping from the solar system to the hydrogen atom – relational predicates (distance, attractive force, revolves around) map across domains, while isolated attributes (color, temperature) do not]
Indirection Layer (InLay)
• The concrete data representation is viewed as a complete graph with
weighted edges.
• The indirection operator maps this graph to a symbolic graph with the
same edge weights; the vertices, however, are fixed and trainable.
• This symbolic graph is propagated, and the updated node features are
the indirection representations.
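A minimal sketch of the indirection operator under these assumptions (the shapes, normalisation, and propagation rule are illustrative):

```python
import numpy as np

# Sketch of indirection: edge weights come from the concrete input,
# but message passing runs over fixed, trainable symbolic vertices.
rng = np.random.default_rng(3)
n, d = 4, 6
x = rng.normal(size=(n, d))                  # concrete input representation

A = x @ x.T                                  # complete graph: pairwise similarities
A = np.exp(A - A.max(axis=1, keepdims=True)) # row-wise softmax normalisation
A = A / A.sum(axis=1, keepdims=True)

symbols = rng.normal(size=(n, d))            # fixed, trainable symbolic vertices
indirect = A @ symbols                       # propagate over the *same* edges
```

Because only the edge weights depend on the input, the symbolic vertices can be reused across inputs, which is what supports generalisation to novel combinations.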
Thank you Truyen Tran
@truyenoz
truyentran.github.io
truyen.tran@deakin.edu.au
letdataspeak.blogspot.com
goo.gl/3jJ1O0
linkedin.com/in/truyen-tran
We’re hiring
PhD & Postdocs
truyen.tran@deakin.edu.au
https://truyentran.github.io/scholarship.html
Tutorial
neuralreasoning.github.io
The reasoning team @A2I2
A/Prof Truyen Tran Dr Thao Le Dr Hung Le
Mr Long Dang Mr Hoang-Anh Pham
Mr Kha Pham
Dr Dung Nguyen
