16/11/2020 1
A/Prof Truyen Tran
With contribution from Vuong Le, Hung Le, Thao Le, Tin Pham & Dung Nguyen
Deakin University
December 2020
Deep learning 1.0 and Beyond
A tutorial
Part I
@truyenoz
truyentran.github.io
truyen.tran@deakin.edu.au
letdataspeak.blogspot.com
goo.gl/3jJ1O0
linkedin.com/in/truyen-tran
8-year snapshot: 2012 | 2016 | AusDM 2016 | Turing Award 2018 | GPT-3 2020
Why (still) DL?
Practical
Generality: Applicable to many domains.
Competitive: DL is hard to beat as long as there are data to train on.
Scalability: DL is better with more data, and it is very scalable.
Theoretical
Expressiveness: Neural nets can approximate any continuous function (universal approximation).
Learnability: Neural nets are easy to train in practice with gradient-based methods.
Generalisability: Neural nets generalize surprisingly well to unseen data.
It is easy to get lost in the current DL zoo
(Image sources: Vietnam News; AAAI’20; asimovinstitute.org/neural-network-zoo/)
Model design goals
Resource adaptive, compressible
Easy to train
Use (almost) no labels
Ability to extrapolate
Support both fast and slow learning
Support both fast and slow inference
Uniformity
Universality
Scalability
Reusability
Capture long-term dependencies in time and space
Capture invariances natively
Agenda
Deep learning 1.0
• Classic models
• Transformers
• Graph neural networks
• Unsupervised learning
Deep learning 2.0
• Neural memories
• Theory of mind
• Neural reasoning
• A system view
Deep models via layer stacking
Theoretically powerful, but limited in practice
Integrate-and-fire neuron
andreykurenkov.com
Feature detector
Block representation
http://torch.ch/blog/2016/02/04/resnets.html
Practice
Shorten path length with skip-connections
Easier information and gradient flows
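A minimal pure-Python sketch of a residual block, y = x + F(x), assuming F is a small two-layer ReLU transform with hypothetical hand-set weights. The identity term is the short path: even with F switched off, the input passes through unchanged, which is one way to see why deep stacks of such blocks keep information and gradients flowing.

```python
def dense(x, w, b):
    """Affine map w @ x + b, with w given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, t) for t in v]

def residual_block(x, w1, b1, w2, b2):
    """y = x + F(x), where F is a two-layer ReLU transform."""
    f = dense(relu(dense(x, w1, b1)), w2, b2)
    # The identity term is the short path: dy/dx = I + dF/dx.
    return [xi + fi for xi, fi in zip(x, f)]

# With F switched off (all-zero weights), 50 stacked blocks still pass x through.
zeros_w, zeros_b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
y = [1.0, -2.0]
for _ in range(50):
    y = residual_block(y, zeros_w, zeros_b, zeros_w, zeros_b)
print(y)  # [1.0, -2.0]
```

A plain stack of zero-weight layers without the skip connection would output zeros instead.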
http://qiita.com/supersaiakujin/items/935bbc9610d0f87607e8
Theory
Sequence model with recurrence
Assume a stationary world (the same transition is applied at every step)
Classification
Image captioning
Sentence classification
Neural machine translation
Sequence labelling
Source: http://karpathy.github.io/assets/rnn/diags.jpeg
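A minimal sketch of the recurrence, assuming a scalar Elman-style cell h_t = tanh(w_h·h_{t-1} + w_x·x_t + b) with hypothetical parameter values. The same parameters are reused at every step, which is what the stationarity assumption amounts to. Which states are read out decides the task shape in the diagram: only the last one for classification, every one for sequence labelling.

```python
import math

def rnn_states(xs, w_h, w_x, b, h0=0.0):
    """Run a scalar Elman RNN and return the hidden state after every step.

    The same (w_h, w_x, b) are applied at each time step: the model assumes
    the dynamics do not change over time.
    """
    h = h0
    states = []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x + b)
        states.append(h)
    return states

states = rnn_states([1.0, 0.0, -1.0], w_h=0.5, w_x=1.0, b=0.0)
sequence_labels = states        # one output per step (sequence labelling)
sequence_summary = states[-1]   # one output per sequence (classification)
```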
Spatial model with convolutions
Assume filters/motifs are translation invariant
http://colah.github.io/posts/2015-09-NN-Types-FP/
Learnable kernels
andreykurenkov.com
Feature detector, often many
Convolutional networks
Summarizing filter responses, destroying locations
adeshpande3.github.io
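A small sketch of why convolution suits translation-invariant motifs, assuming a 1-D signal and a hypothetical two-tap "step-up" kernel. Sliding the same kernel over every position makes the response map shift along with the input; a global max pool then keeps the strength of the best match and discards where it occurred.

```python
def conv1d(signal, kernel):
    """'Valid' 1-D cross-correlation (what deep learning libraries call convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def global_max_pool(responses):
    """Keep the strongest filter response, discarding its location."""
    return max(responses)

edge = [-1, 1]                # hypothetical 'step up' detector
a = [0, 0, 1, 1, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 0]     # same motif, shifted right by 2

ra, rb = conv1d(a, edge), conv1d(b, edge)
print(ra)  # [0, 1, 0, -1, 0, 0]
print(rb)  # [0, 0, 0, 1, 0, -1]
print(global_max_pool(ra), global_max_pool(rb))  # 1 1
```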
Operator on sets/bags: Attention
Not everything is created equal for a goal
We need an attention model to select or ignore certain computations or inputs
Can be “soft” (differentiable) or “hard” (requires RL)
Attention provides a short-cut to long-term dependencies
Also encourages sparsity if done right!
http://distill.pub/2016/augmented-rnns/
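A minimal sketch of soft (differentiable) attention, assuming dot-product scoring between one query and a set of keys; the names and toy vectors are illustrative. A "hard" variant would instead pick a single index (argmax or sampling), which is not differentiable, hence the need for RL-style training.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query, keys, values):
    """Return (weights, context): context is a weighted average of values,
    with weights given by a softmax over query-key dot products."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context

# The query matches the first key strongly, so the context is close to values[0].
weights, context = soft_attention([10.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                                  [[1.0, 2.0], [3.0, 4.0]])
```

Sharp score differences push the weights towards one-hot, which is the sparsity the slide mentions.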
Fast weights | HyperNet
The world is recursive
Early ideas date back to the early 1990s, by Juergen Schmidhuber and collaborators.
Data-dependent weights | Using a controller to generate the weights of the main net.
Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." arXiv preprint arXiv:1609.09106 (2016).
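A toy sketch of data-dependent weights, assuming a hypothetical linear controller that maps a small embedding z to the flattened weight matrix of a one-layer main net. Only the controller parameters theta would be trained; the HyperNetworks paper is considerably more elaborate than this.

```python
def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def controller(z, theta, rows, cols):
    """Hypothetical controller: a linear map from embedding z to rows*cols numbers,
    reshaped into the main net's weight matrix."""
    flat = matvec(theta, z)  # theta has rows*cols rows, each of length len(z)
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]

def main_net(x, z, theta):
    """Main net with no weights of its own: they are generated from z on the fly."""
    w = controller(z, theta, rows=2, cols=len(x))
    return matvec(w, x)

# Changing z changes the main net's weights, and hence its output.
theta = [[1.0], [0.0], [0.0], [1.0]]   # generates w = z * identity (2x2)
print(main_net([3.0, 4.0], [1.0], theta))  # [3.0, 4.0]
print(main_net([3.0, 4.0], [2.0], theta))  # [6.0, 8.0]
```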
Neural architecture search
When design is cheap and non-creative
The space is huge and discrete
Can be done through meta-heuristics (e.g., genetic algorithms) or reinforcement learning (e.g., one discrete change in model structure is an action).
Bello, Irwan, et al. "Neural optimizer search with reinforcement learning." arXiv preprint arXiv:1709.07417 (2017).
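A toy sketch of search over a discrete architecture space with a simple meta-heuristic: a mutation-only (1+1) evolutionary loop, not the RL controller of the cited paper. The search space and proxy_score are hypothetical; in real NAS, scoring a candidate means training it, which is the expensive step. Each mutation is one discrete structural change, the same unit an RL agent would treat as an action.

```python
import random

SEARCH_SPACE = {
    "depth": [2, 4, 8],
    "width": [32, 64, 128],
    "activation": ["relu", "tanh"],
}

def proxy_score(arch):
    """Hypothetical stand-in for 'train the model, measure validation accuracy'."""
    return (arch["depth"] == 4) + (arch["width"] == 128) + (arch["activation"] == "relu")

def mutate(arch, rng):
    """Make one discrete change to the architecture."""
    child = dict(arch)
    key = rng.choice(sorted(SEARCH_SPACE))
    child[key] = rng.choice(SEARCH_SPACE[key])
    return child

def evolutionary_search(steps=50, seed=0):
    rng = random.Random(seed)
    best = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
    best_score = proxy_score(best)
    for _ in range(steps):
        child = mutate(best, rng)
        score = proxy_score(child)
        if score >= best_score:  # keep the child if it is no worse
            best, best_score = child, score
    return best, best_score
```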
Agenda
Deep learning 1.0
• Classic models
• Transformers
• Graph neural networks
• Unsupervised learning
Deep learning 2.0
• Neural memories
• Theory of mind
• Neural reasoning
• A system view
Motivations
RNNs are theoretically powerful, but purely sequential, hence slow, and have limited effective memory at finite size.
• Augmenting them with external memories solves some problems, but they are still slow.
CNNs are feed-forward nets and can be parallelized, but are theoretically weaker: arbitrary long-range dependencies are hard to encode.
Prior to 2017, most architectures were mixtures of FNNs, RNNs and CNNs → non-uniform, hard to scale to a large number of tasks.
We need support for:
• Parallel computation
• Long-range dependency encoding (constant path length)
• Uniform construction (e.g., like the columnar structure of the neocortex)
Prelim: Memory networks
• Input is a set → loaded into memory, which is NOT updated.
• The state is an RNN with attention reading from the inputs.
• Concepts: query, key and content + content addressing.
• Deep models, but with a constant path length from input to output.
• Equivalent to an RNN with a shared input set.
Sukhbaatar, Sainbayar, Jason Weston, and Rob
Fergus. "End-to-end memory networks." Advances in
neural information processing systems. 2015.
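A minimal sketch of the read step in the spirit of end-to-end memory networks, assuming the inputs are already embedded into key (addressing) and content (output) vectors; the paper's embedding matrices are omitted. The memory is read but never written, and stacking hops adds depth while every hop still reads the inputs directly.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_hop(u, keys, contents):
    """One read from a fixed memory.

    keys[i] addresses slot i (content addressing by dot product with state u);
    contents[i] is what slot i returns. The memory is read, never written.
    """
    p = softmax([sum(a * b for a, b in zip(u, k)) for k in keys])
    o = [sum(pi * c[d] for pi, c in zip(p, contents)) for d in range(len(u))]
    return [ui + oi for ui, oi in zip(u, o)]  # state update: u + o

def answer(query, keys, contents, hops=3):
    """Stack hops: depth grows, yet each hop reads the inputs directly."""
    u = query
    for _ in range(hops):
        u = memory_hop(u, keys, contents)
    return u

state = answer([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```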
Transformers: The triumph of self-attention
Tay, Yi, et al. "Efficient transformers: A survey." arXiv
preprint arXiv:2009.06732 (2020).
(Diagram: state, key, query, memory)
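A minimal single-head sketch of scaled dot-product self-attention, softmax(QK^T/√d)V, in pure Python; the projection matrices are hypothetical and the identity is used below. Every input element gets its own output, and any two elements interact in a single step, giving the constant path length motivated earlier. Without positional encodings the operation treats its input as a set: permuting the rows just permutes the outputs.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention; x has one row per element."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(k[0])
    kt = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, kt)]
    weights = [softmax(row) for row in scores]  # one attention distribution per element
    return matmul(weights, v)

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
eye = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(x, eye, eye, eye)
```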
Transformers are (new) Hopfield nets
Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint
arXiv:2008.02217 (2020).
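For reference, the retrieval update of the modern continuous Hopfield network in Ramsauer et al., with stored patterns as the columns of X, state pattern ξ and inverse temperature β, next to transformer attention with queries Q, keys K, values V (tokens as rows) and key dimension d_k. With β = 1/√d_k and learned projections mapping the state to queries and the stored patterns to keys and values, one Hopfield update is one attention read.

```latex
\xi^{\text{new}} = X \operatorname{softmax}\!\left(\beta X^{\top} \xi\right)
\qquad
Z = \operatorname{softmax}\!\left(\beta\, Q K^{\top}\right) V, \quad \beta = \frac{1}{\sqrt{d_k}}
```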
Transformer vs. memory networks
Memory network:
• Attends to the input set.
• Updates one hidden state at a time.
• The final state integrates information from the set, conditioned on the query.
Transformer:
• Loads all inputs into working memory.
• Assigns one hidden state per input element.
• Uses all hidden states (including those from the query) to compute the answer.
Universal transformers
https://ai.googleblog.com/2018/08/moving-beyond-translation-with.html
Dehghani, Mostafa, et al. "Universal
Transformers." International Conference on
Learning Representations. 2018.
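A schematic sketch of the key difference, assuming a hypothetical toy transition in place of a real attention block: a standard transformer stacks L blocks with separate weights, whereas a universal transformer applies one shared block repeatedly, i.e., recurrence in depth. The real model also adds a per-step timestep signal and can halt adaptively per position; this sketch only passes the step index and runs a fixed number of steps.

```python
def universal_transformer(h, shared_block, steps):
    """Apply ONE shared transition block `steps` times (weights tied across depth)."""
    for t in range(steps):
        h = shared_block(h, t)  # t stands in for the timestep signal
    return h

def toy_block(h, t):
    """Hypothetical stand-in for self-attention + transition; ignores t here."""
    return [0.5 * x + 1.0 for x in h]

# Repeated application of the same block refines the state step by step
# (here it converges towards the fixed point 2.0).
print(universal_transformer([0.0], toy_block, steps=1))  # [1.0]
print(universal_transformer([0.0], toy_block, steps=2))  # [1.5]
```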
Efficient Transformers
Transformer is quadratic in time (and memory) with respect to input length
• Cannot deal with large sets (or long sequences)
Tay, Yi, et al. "Efficient transformers: A survey." arXiv
preprint arXiv:2009.06732 (2020).
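A back-of-the-envelope sketch of the quadratic cost, and of one remedy among the families the survey covers: restricting each position to a fixed local window. The window size is a hypothetical choice; the number of scored pairs drops from n² to roughly n(2w + 1).

```python
def full_attention_pairs(n):
    """Every element scores every element."""
    return n * n

def local_attention_pairs(n, window):
    """Element i only scores elements j with |i - j| <= window."""
    return sum(min(n - 1, i + window) - max(0, i - window) + 1 for i in range(n))

n = 4096
print(full_attention_pairs(n))       # 16777216
print(local_attention_pairs(n, 64))  # 524224
```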
Agenda
Deep learning 1.0
• Classic models
• Transformers
• Graph neural networks
• Unsupervised learning
Deep learning 2.0
• Neural memories
• Theory of mind
• Neural reasoning
• A system view
Why graphs?
Graphs are pervasive in many
scientific disciplines.
Deep learning needs to move beyond
vector, fixed-size data.
The sub-area of graph representation learning has reached a certain maturity, with multiple reviews, workshops and papers at top AI/ML venues.
NeurIPS 2020
System medicine
https://www.frontiersin.org/articles/10.3389/fphys.2015.00225/full
Biology, pharmacy & chemistry
Molecule as graph: atoms as nodes, chemical bonds as edges
Computing molecular properties
Chemical-chemical interaction
Chemical reaction
#REF: Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X-ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90.
Gilmer, Justin, et al. "Neural message passing for quantum
chemistry." arXiv preprint arXiv:1704.01212 (2017).
Materials science
Xie, Tian, and Jeffrey C. Grossman.
"Crystal Graph Convolutional Neural
Networks for an Accurate and
Interpretable Prediction of Material
Properties." Physical review
letters 120.14 (2018): 145301.
• Crystal properties
• Exploring/generating solid structures
• Inverse design
Videos as space-time region graphs
(Abhinav Gupta et al, ECCV’18)
Basic neural graph mechanism: Message passing
#REF: Pham, Trang, et al. "Column Networks
for Collective Classification." AAAI. 2017.
Relation graph
GCN update rule, vector form
GCN update rule, matrix form
Generalized message passing
Attention: Not all messages are created equal
(Do et al., arXiv’17; Veličković et al., ICLR’18)
Learning deep matrix representations, K Do, T Tran, S
Venkatesh, arXiv preprint arXiv:1703.01454
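A hedged sketch of one generic message-passing step, not the exact GCN rule on the slide: each node averages its neighbours' feature vectors (the messages) and combines them with its own state through two weight matrices and a ReLU. Mean aggregation is a simplification; GCN proper uses symmetric degree normalisation, and an attention variant would replace the uniform 1/|N(v)| weights with learned softmax weights.

```python
def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def message_passing_step(h, adj, w_self, w_nbr):
    """h[v] is node v's feature vector; adj[v] lists v's neighbours."""
    new_h = []
    for v, hv in enumerate(h):
        nbrs = adj[v]
        if nbrs:  # mean of neighbour features = aggregated message
            msg = [sum(h[u][d] for u in nbrs) / len(nbrs) for d in range(len(hv))]
        else:
            msg = [0.0] * len(hv)
        pre = [a + b for a, b in zip(matvec(w_self, hv), matvec(w_nbr, msg))]
        new_h.append([max(0.0, x) for x in pre])  # ReLU
    return new_h

# Path graph 0 - 1 - 2; only node 0 starts with a signal.
adj = {0: [1], 1: [0, 2], 2: [1]}
h = [[1.0], [0.0], [0.0]]
w_self, w_nbr = [[0.0]], [[1.0]]
h1 = message_passing_step(h, adj, w_self, w_nbr)   # [[0.0], [0.5], [0.0]]
h2 = message_passing_step(h1, adj, w_self, w_nbr)  # [[0.5], [0.0], [0.5]]
```

After two steps the signal from node 0 has reached node 2: k rounds of message passing cover k-hop neighbourhoods.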
Neural graph morphism
Input: Graph
Output: A new graph. Same nodes, different edges.
Model: Graph morphism
Method: Graph transformation policy network (GTPN)
Kien Do, Truyen Tran, and Svetha Venkatesh. "Graph Transformation Policy Network for Chemical
Reaction Prediction." KDD’19.
Neural graph recurrence
Graphs that represent interaction between entities through time
Spatial edges are node interactions at a time step
Temporal edges are consistency relationships through time
ASSIGN: Asynchronous, Sparse Interaction Graph Network
(Morais et al., 2021 @ A2I2, Deakin – Work in Progress)
Graph generation
No regular structures (e.g. grid, sequence,…)
Graphs are permutation invariant:
#permutations grows factorially (n!) with #nodes
The probability of a generated graph G needs to be marginalized over all possible permutations
Generating graphs with variable size
Aim for diversity of generated graphs
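A small sketch that makes the permutation issue concrete, using brute-force enumeration (feasible only for tiny graphs): n nodes admit n! orderings, and different orderings can give different adjacency matrices for the same graph, so a model that generates adjacency matrices must sum over them to obtain p(G).

```python
import math
from itertools import permutations

def count_orderings(n):
    """Number of node orderings of an n-node graph."""
    return math.factorial(n)

def distinct_adjacency_matrices(edges, n):
    """Relabel nodes under every ordering and count the distinct
    (undirected) adjacency structures that result."""
    seen = set()
    for perm in permutations(range(n)):
        pos = {node: i for i, node in enumerate(perm)}  # node -> new index
        seen.add(frozenset(tuple(sorted((pos[u], pos[v]))) for u, v in edges))
    return len(seen)

path = [(0, 1), (1, 2)]                  # 0 - 1 - 2
triangle = [(0, 1), (1, 2), (0, 2)]
print(count_orderings(10))                       # 3628800
print(distinct_adjacency_matrices(path, 3))      # 3
print(distinct_adjacency_matrices(triangle, 3))  # 1
```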
Generation methods
Classical random graph models, e.g., an exponential family of probability distributions for directed graphs (Holland and Leinhardt, 1981)
Deep generative models: GraphVAE, Graphite, Junction Tree VAE, GAN variants etc.
Sequence-based & RL methods
GraphRNN
A case of graph dynamics: nodes and edges are added sequentially.
Tractability is achieved using a BFS node ordering.
You, Jiaxuan, et al.
"GraphRNN: Generating
realistic graphs with deep
auto-regressive
models." ICML (2018).
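A sketch of why BFS helps, assuming the graph is given as an adjacency dict: under a BFS ordering, a new node can only link to nodes still in the BFS frontier, so the gap (in the ordering) back to its earliest neighbour stays bounded, and each step only needs to predict edges to a limited window of recent nodes. The helper max_back_edge_span is hypothetical, added only to show this effect.

```python
from collections import deque

def bfs_order(adj, start):
    """Breadth-first node ordering (neighbours visited in sorted order)."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in sorted(adj[v]):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order

def max_back_edge_span(adj, order):
    """Largest gap, in the ordering, between a node and an earlier neighbour."""
    pos = {v: i for i, v in enumerate(order)}
    return max((pos[v] - pos[u] for v in adj for u in adj[v] if pos[u] < pos[v]),
               default=0)

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(bfs_order(path, 0))                      # [0, 1, 2, 3]
print(max_back_edge_span(path, [0, 1, 2, 3]))  # 1
print(max_back_edge_span(path, [0, 3, 1, 2]))  # 2
```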
Step-wise graph construction using reinforcement learning
You, Jiaxuan, et al. "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation." NeurIPS (2018).
Graph rep (message passing) | graph validation (RL) | graph faithfulness (GAN)
Agenda
Deep learning 1.0
• Classic models
• Transformers
• Graph neural networks
• Unsupervised learning
Deep learning 2.0
• Neural memories
• Theory of mind
• Neural reasoning
• A system view
Unsupervised learning
Photo credit: Brandon/Flickr
Humans mainly learn by exploring without clear instructions and labelling
Representation learning, a bit of history
“Representation is the use of signs that stand in for
and take the place of something else”
It has been a goal of neural networks since the 1980s and of the current wave of deep learning (2005-present) → replacing feature engineering.
Between 2006 and 2012, many unsupervised learning models were proposed, with varying degrees of success: RBM, DBN, DBM, DAE, DDAE, PSD.
Between 2013 and 2018, most models were supervised, following AlexNet.
Since 2018, unsupervised learning has become competitive (with contrastive learning, self-supervised learning, BERT)!
Source: asimovinstitute.org/neural-network-zoo/
Criteria for a good representation
Separates factors of variation (aka disentanglement), which are
linearly correlated with desired outputs of downstream tasks.
Provides abstraction that is invariant against deformations and
small variations.
Is distributed (one concept is represented by multiple units), which
is compact and good for interpolation.
Optionally, offers dimensionality reduction.
Optionally, is sparse, giving room for emerging symbols.
Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new
perspectives." IEEE transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.
Why neural unsupervised learning?
Neural nets have representational richness:
• FFNs are function approximators.
• RNNs are program approximators: they can estimate a program's behaviour and generate a string.
• CNNs capture translation invariance.
• Transformers are powerful contextual encoders.
Compactness: Representations are (sparse and) distributed.
• Essential to perception, compact storage and reasoning.
Accounting for uncertainty: Neural nets can be stochastic to model distributions.
Symbolic representation: realised through sparse activations and gating mechanisms.
Neural autoregressive models:
Predict the next step given the history
The keys: (a) long-term dependencies, (b) ordering, & (c) parameter sharing.
Can be realized using:
• RNN
• CNN: one-sided CNN, dilated CNN (e.g., WaveNet), PixelCNN
• Transformers → GPT-X family
• Masked autoencoder → MADE
Pros: General, good quality thus far
Cons: Slow – needs better inductive biases for scalability
lyusungwon.github.io/studies/2018/07/25/nade/
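A minimal sketch of the autoregressive factorisation log p(x) = Σ_t log p(x_t | x_<t), assuming a hypothetical bigram table in place of a neural conditional; an RNN, causal CNN or GPT-style transformer would supply cond_prob instead. The greedy generator shows the sequential bottleneck: each new token has to wait for all previous ones.

```python
import math

def sequence_log_prob(seq, cond_prob):
    """Chain rule: log p(x_1..x_T) = sum_t log p(x_t | x_<t)."""
    return sum(math.log(cond_prob(seq[:t], seq[t])) for t in range(len(seq)))

def toy_bigram(history, token):
    """Hypothetical conditional that looks only at the previous token."""
    table = {None: {"a": 0.5, "b": 0.5},
             "a": {"a": 0.9, "b": 0.1},
             "b": {"a": 0.2, "b": 0.8}}
    prev = history[-1] if history else None
    return table[prev][token]

def generate_greedy(cond_prob, vocab, length):
    """Generation is sequential: each token conditions on all previous ones."""
    seq = ""
    for _ in range(length):
        seq += max(vocab, key=lambda tok: cond_prob(seq, tok))
    return seq

print(generate_greedy(toy_bigram, "ab", 3))  # aaa
```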
Generative models:
Discover the underlying process that generates data
Many applications:
• Text to speech
• Simulate data that are hard to obtain/share in real life (e.g., healthcare)
• Generate meaningful sentences conditioned on some input (foreign language, image, video)
• Semi-supervised learning
• Planning
Deep (Denoising) AutoEncoder:
Self-reconstruction of data
(Diagrams: an auto-encoder taking raw data, optionally with added noise, through a feature detector to a representation and a reconstruction; a deep auto-encoder with an encoder and a decoder)
Credit: kvfrans.com
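A minimal sketch of the denoising objective, assuming masking noise (each entry zeroed with a hypothetical probability drop_prob) and a squared-error loss; encoder and decoder are placeholders for any networks. The key detail is that the reconstruction is compared with the clean input, not the corrupted one.

```python
import random

def mask_noise(x, drop_prob, rng):
    """Corrupt x by zeroing each entry with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def mse(x, x_hat):
    """Mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def dae_loss(x, encoder, decoder, drop_prob=0.3, seed=0):
    """Encode the corrupted input, decode, and score against the CLEAN input."""
    rng = random.Random(seed)
    x_tilde = mask_noise(x, drop_prob, rng)
    return mse(x, decoder(encoder(x_tilde)))
```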
Variational Autoencoder
Approximating the posterior by a neural net
Two separate processes: generative (hidden → visible) versus recognition (visible → hidden)
(Diagram: Gaussian hidden variables, data, generative net, recognising net)
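A minimal sketch of two VAE-specific ingredients, assuming a diagonal Gaussian recognition net that outputs mu and log_var: the reparameterisation trick, which keeps sampling differentiable, and the closed-form KL term to a standard normal prior. The training loss (negative ELBO) is this KL plus a reconstruction term from the generative net, which is omitted here.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps, eps ~ N(0, 1): randomness is isolated in eps, so z is
    a differentiable function of the recognition net's outputs (mu, log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, -2.0], rng)
```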
GAN: Generative Adversarial nets
Matching data statistics
Yann LeCun: GAN is one of the best ideas in the past 10 years!
Instead of modeling the entire distribution of data, a GAN learns to map ANY random distribution into the region of data, so that no discriminator can distinguish sampled data from real data.
(Diagram: any random distribution in any space; a neural net that maps z → x; a binary discriminator, usually a neural classifier)
Generative adversarial networks
(Adapted from Goodfellow, NIPS 2014)
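A minimal sketch of the two GAN objectives computed from discriminator outputs (probabilities that a sample is real), assuming binary cross-entropy as in the original formulation and the non-saturating generator loss suggested there for practical training. The networks themselves are omitted. When D outputs 0.5 everywhere it cannot tell samples from data, and its loss equals 2 log 2.

```python
import math

EPS = 1e-12  # avoid log(0)

def discriminator_loss(d_real, d_fake):
    """Push D(x) -> 1 on real data and D(G(z)) -> 0 on generated samples."""
    real = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real + fake

def generator_loss(d_fake):
    """Non-saturating generator loss: maximise log D(G(z))."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)
```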
Progressive GAN: Generated images
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of gans for improved
quality, stability, and variation. arXiv preprint arXiv:1710.10196.
BERT
Transformer that predicts its own masked parts
BERT is like parallel approximate pseudo-likelihood
~ Maximizing the conditional likelihood of some variables given the rest.
When the number of variables is large, this converges to MLE (maximum likelihood estimate).
https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
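A minimal sketch of how masked training pairs can be built, following the recipe reported for BERT: roughly 15% of positions become prediction targets; of those, 80% are replaced by [MASK], 10% by a random token and 10% are left unchanged. Selecting each position independently is a simplification. The model is then trained to predict each target from the rest of the sequence, the pseudo-likelihood view above.

```python
import random

MASK = "[MASK]"

def bert_style_mask(tokens, vocab, rng, mask_rate=0.15):
    """Return (corrupted tokens, {position: original token}) for masked-LM training."""
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:  # select this position as a target
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = rng.choice(vocab)
            # else: leave the original token in place
    return corrupted, targets

rng = random.Random(0)
corrupted, targets = bert_style_mask("the cat sat on the mat".split(), ["dog", "ran", "red"], rng)
# Training signal: predict targets[i] at each selected position i from `corrupted`.
```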
Contrastive learning: Comparing samples
Le-Khac, Phuc H., Graham Healy, and
Alan F. Smeaton. "Contrastive
Representation Learning: A Framework
and Review." arXiv preprint
arXiv:2010.05113 (2020).
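A minimal sketch of an InfoNCE-style contrastive loss, one common way of comparing samples, assuming cosine similarity and a hypothetical temperature of 0.1. The anchor and positive would typically be two views of the same sample (e.g., two augmentations) and the negatives come from other samples; the loss is low only when the positive scores above all negatives.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Cross-entropy of picking the positive among {positive} + negatives,
    with similarity / temperature as logits."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_sum_exp - logits[0]

aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```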
Unsupervised learning: A few more points
No external labels, but rich training signals (thousands of bits per sample, as opposed to a few bits in supervised learning).
A few techniques:
• Compressing data as much as possible with little loss
• Energy-based, i.e., pull down the energy of observed data, pull up everything else
• Filling the missing slots (aka predictive learning, self-supervised learning)
We have not covered unsupervised learning on graphs (e.g., DeepWalk, GPT-GNN), but the general principles should hold.
Question: Multiple objectives, or no objective at all?
Question: Emergence from many simple interacting elements?
Liu, Xiao, et al. "Self-supervised learning: Generative or contrastive." arXiv preprint
arXiv:2006.08218 (2020).
End of part I
Generative AI: Shifting the AI LandscapeGenerative AI: Shifting the AI Landscape
Generative AI: Shifting the AI Landscape
 
AI in the Covid-19 pandemic
AI in the Covid-19 pandemicAI in the Covid-19 pandemic
AI in the Covid-19 pandemic
 
AI for tackling climate change
AI for tackling climate changeAI for tackling climate change
AI for tackling climate change
 
AI for drug discovery
AI for drug discoveryAI for drug discovery
AI for drug discovery
 
Deep learning for episodic interventional data
Deep learning for episodic interventional dataDeep learning for episodic interventional data
Deep learning for episodic interventional data
 
Deep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining IIDeep learning for biomedical discovery and data mining II
Deep learning for biomedical discovery and data mining II
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Deep Learning 1.0 and Beyond: A Tutorial on Classic and Emerging Models (Part I)

  • 1. 16/11/2020 1 A/Prof Truyen Tran With contribution from Vuong Le, Hung Le, Thao Le, Tin Pham & Dung Nguyen Deakin University December 2020 Deep learning 1.0 and Beyond A tutorial Part I @truyenoz truyentran.github.io truyen.tran@deakin.edu.au letdataspeak.blogspot.com goo.gl/3jJ1O0 linkedin.com/in/truyen-tran
  • 2. 8 years snapshot: 2012; 2016 (AusDM 2016); 2018 (Turing Awards); 2020 (GPT-3).
  • 3. Why (still) DL? Practical: Generality: applicable to many domains. Competitive: DL is hard to beat as long as there is data to train on. Scalability: DL gets better with more data, and it is very scalable. Theoretical: Expressiveness: neural nets can approximate any continuous function (universal approximation). Learnability: neural nets are easy to train. Generalisability: neural nets generalize surprisingly well to unseen data.
  • 4. It is easy to get lost in the current DL zoo (images: Vietnam News; AAAI’20).
  • 6. Model design goals Resource adaptive, compressible Easy to train Use (almost) no labels Ability to extrapolate Support both fast and slow learning Support both fast and slow inference 16/11/2020 6 Uniformity Universality Scalability Reusability Capture long-term dependencies in time and space Capture invariances natively
  • 7. Agenda. Deep learning 1.0: classic models; Transformers; graph neural networks; unsupervised learning. Deep learning 2.0: neural memories; theory of mind; neural reasoning; a system view.
  • 8. Deep models via layer stacking: theoretically powerful, but limited in practice. Diagram labels: integrate-and-fire neuron; feature detector; block representation (figure: andreykurenkov.com).
  • 9. Practice: shorten the path length with skip-connections, for easier information and gradient flows. Figures: http://torch.ch/blog/2016/02/04/resnets.html; http://qiita.com/supersaiakujin/items/935bbc9610d0f87607e8
  • 10. Sequence model with recurrence: assumes a stationary world (the same transition is applied at every time step). Applications: classification, image captioning, sentence classification, neural machine translation, sequence labelling. Source: http://karpathy.github.io/assets/rnn/diags.jpeg
  • 11. Spatial model with convolutions: assumes filters/motifs are translation invariant. Learnable kernels act as feature detectors, often many of them. Figures: http://colah.github.io/posts/2015-09-NN-Types-FP/; andreykurenkov.com
  • 12. Convolutional networks: summarizing filter responses destroys location information. Figure: adeshpande3.github.io
  • 13. Operator on sets/bags: attention. Not everything is equally important for a goal, so an attention model is needed to select or ignore certain computations or inputs. Attention can be “soft” (differentiable) or “hard” (requires RL). Attention provides a short-cut for long-term dependencies, and it also encourages sparsity if done right! http://distill.pub/2016/augmented-rnns/
  • 14. Fast weights | HyperNet: the world is recursive. Early ideas date back to the early 1990s (Juergen Schmidhuber and collaborators). Data-dependent weights: a controller generates the weights of the main net. Ha, David, Andrew Dai, and Quoc V. Le. "Hypernetworks." arXiv preprint arXiv:1609.09106 (2016).
  • 15. Neural architecture search: useful when design is cheap and non-creative. The search space is huge and discrete. The search can be done through meta-heuristics (e.g., genetic algorithms) or reinforcement learning (e.g., each discrete change in model structure is an action). Bello, Irwan, et al. "Neural optimizer search with reinforcement learning." arXiv preprint arXiv:1709.07417 (2017).
  • 16. Agenda. Deep learning 1.0: classic models; Transformers; graph neural networks; unsupervised learning. Deep learning 2.0: neural memories; theory of mind; neural reasoning; a system view.
  • 17. Motivations. RNNs are theoretically powerful, but purely sequential, hence slow, and they have limited effective memory at finite size. Augmenting them with external memories solves some problems, but they are still slow. CNNs are feed-forward nets and can be parallelized, but they are theoretically less powerful: arbitrary long-term dependencies are hard to encode. Prior to 2017, most architectures were mixtures of FNNs, RNNs and CNNs. This non-uniformity makes them hard to scale to a large number of tasks. We need support for: parallel computation; long-range dependency encoding (constant path length); uniform construction (e.g., like the columnar structure of the neocortex).
  • 18. Prelim: memory networks. The input is a set, loaded into a memory that is NOT updated. The state is an RNN with attention reading from the inputs. Concepts: query, key and content, plus content addressing. These are deep models, but with a constant path length from input to output. Equivalent to an RNN with a shared input set. Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems. 2015.
  • 19. Transformers: the triumph of self-attention. Diagram labels: state; key; query; memory. Tay, Yi, et al. "Efficient transformers: A survey." arXiv preprint arXiv:2009.06732 (2020).
  • 20. Transformers are (new) Hopfield nets. Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).
  • 21. Transformer vs. memory networks. Memory network: attends to the input set; updates one hidden state at a time; the final state integrates information from the set, conditioned on the query. Transformer: loads all inputs into working memory; assigns one hidden state per input element; uses all hidden states (including those from the query) to compute the answer.
  • 22. Universal transformers. https://ai.googleblog.com/2018/08/moving-beyond-translation-with.html Dehghani, Mostafa, et al. "Universal Transformers." International Conference on Learning Representations. 2018.
  • 23. Efficient Transformers: the Transformer is quadratic in time in the input length, so it cannot deal with large sets (or long sequences). Tay, Yi, et al. "Efficient transformers: A survey." arXiv preprint arXiv:2009.06732 (2020).
  • 24. Agenda. Deep learning 1.0: classic models; Transformers; graph neural networks; unsupervised learning. Deep learning 2.0: neural memories; theory of mind; neural reasoning; a system view.
  • 25. Why graphs? Graphs are pervasive in many scientific disciplines. Deep learning needs to move beyond fixed-size vector data. The sub-area of graph representation learning has reached a certain maturity, with multiple reviews, workshops and papers at top AI/ML venues (image: NeurIPS 2020).
  • 27. Biology, pharmacy & chemistry. Molecule as graph: atoms as nodes, chemical bonds as edges. Tasks: computing molecular properties; chemical-chemical interaction; chemical reaction. #REF: Penmatsa, Aravind, Kevin H. Wang, and Eric Gouaux. "X-ray structure of dopamine transporter elucidates antidepressant mechanism." Nature 503.7474 (2013): 85-90. Gilmer, Justin, et al. "Neural message passing for quantum chemistry." arXiv preprint arXiv:1704.01212 (2017).
  • 28. Materials science: crystal properties; exploring/generating solid structures; inverse design. Xie, Tian, and Jeffrey C. Grossman. "Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties." Physical Review Letters 120.14 (2018): 145301.
  • 29. Videos as space-time region graphs (Abhinav Gupta et al, ECCV’18)
  • 30. Basic neural graph mechanism: message passing. Panels: relation graph; GCN update rule, vector form; GCN update rule, matrix form; generalized message passing. #REF: Pham, Trang, et al. "Column Networks for Collective Classification." AAAI. 2017.
  • 31. Attention: not all messages are created equal (Do et al., arXiv’17; Veličković et al., ICLR’18). Learning deep matrix representations, K Do, T Tran, S Venkatesh, arXiv preprint arXiv:1703.01454
  • 32. Neural graph morphism. Input: a graph. Output: a new graph with the same nodes but different edges. Model: graph morphism. Method: Graph Transformation Policy Network (GTPN). Kien Do, Truyen Tran, and Svetha Venkatesh. "Graph Transformation Policy Network for Chemical Reaction Prediction." KDD’19.
  • 33. Neural graph recurrence: graphs that represent interactions between entities through time. Spatial edges are node interactions at a time step; temporal edges are consistency relationships through time.
  • 34. ASSIGN: Asynchronous, Sparse Interaction Graph Network (Morais et al., 2021 @ A2I2, Deakin; work in progress)
  • 35. Graph generation. Graphs have no regular structure (e.g., grid, sequence, …). Graphs are permutation invariant: the number of node permutations grows factorially (n! for n nodes), and the probability of a generated graph G needs to be marginalized over all possible permutations. Graphs of variable size must be generated. Generation should aim for diversity of the generated graphs.
  • 36. Generation methods. Classical random graph models, e.g., an exponential family of probability distributions for directed graphs (Holland and Leinhardt, 1981). Deep generative models: GraphVAE, Graphite, Junction Tree VAE, GAN variants, etc. Sequence-based and RL methods.
  • 37. GraphRNN: a case of graph dynamics, in which nodes and edges are added sequentially. Tractability is achieved with a BFS node ordering. You, Jiaxuan, et al. "GraphRNN: Generating realistic graphs with deep auto-regressive models." ICML (2018).
  • 38. Step-wise graph construction using reinforcement learning: graph representation (message passing) | graph validation (RL) | graph faithfulness (GAN). You, Jiaxuan, et al. "Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation." NeurIPS (2018).
  • 39. Agenda. Deep learning 1.0: classic models; Transformers; graph neural networks; unsupervised learning. Deep learning 2.0: neural memories; theory of mind; neural reasoning; a system view.
  • 40. Unsupervised learning: humans mainly learn by exploring, without clear instructions or labels. Photo credit: Brandon/Flickr
  • 41. Representation learning, a bit of history. “Representation is the use of signs that stand in for and take the place of something else.” It has been a goal of neural networks since the 1980s and of the current wave of deep learning (2005-present), replacing feature engineering. Between 2006 and 2012, many unsupervised learning models were proposed, with varying degrees of success: RBM, DBN, DBM, DAE, DDAE, PSD. Between 2013 and 2018, most models were supervised, following AlexNet. Since 2018, unsupervised learning has become competitive (with contrastive learning, self-supervised learning, BERT)!
  • 43. Criteria for a good representation: separates factors of variation (aka disentanglement), which are linearly correlated with the desired outputs of downstream tasks; provides abstraction that is invariant to deformations and small variations; is distributed (one concept is represented by multiple units), which is compact and good for interpolation; optionally, offers dimensionality reduction; optionally, is sparse, giving room for emerging symbols. Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.
  • 44. Why neural unsupervised learning? Representational richness: FFNs are function approximators; RNNs are program approximators that can estimate a program's behaviour and generate a string; CNNs capture translation invariance; Transformers are powerful contextual encoders. Compactness: representations are (sparse and) distributed, which is essential to perception, compact storage and reasoning. Accounting for uncertainty: neural nets can be stochastic to model distributions. Symbolic representation: realised through sparse activations and gating mechanisms.
  • 45. Neural autoregressive models: predict the next step given the history. The keys: (a) long-term dependencies, (b) ordering, and (c) parameter sharing. Can be realized using: RNNs; CNNs (one-sided CNN, dilated CNN such as WaveNet, PixelCNN); Transformers (GPT-X family); masked autoencoders (MADE). Pros: general, good quality thus far. Cons: slow; needs better inductive biases for scalability. Figure: lyusungwon.github.io/studies/2018/07/25/nade/
  • 46. Generative models: discover the underlying process that generates data. Many applications: text to speech; simulating data that are hard to obtain/share in real life (e.g., healthcare); generating meaningful sentences conditioned on some input (foreign language, image, video); semi-supervised learning; planning.
  • 47. Deep (denoising) autoencoder: self-reconstruction of data. Diagram: raw data (optionally with added noise) → encoder (feature detector) → representation → decoder → reconstruction; stacking such layers gives a deep autoencoder.
  • 48. Variational autoencoder: approximating the posterior by a neural net. Two separate processes: generative (hidden → visible) versus recognition (visible → hidden). Diagram labels: Gaussian hidden variables; data; generative net; recognising net. Credit: kvfrans.com
  • 49. GAN: generative adversarial nets, matching data statistics. Yann LeCun: GAN is one of the best ideas in the past 10 years! Instead of modeling the entire data distribution, a GAN learns to map ANY random distribution into the region of data, so that no discriminator can distinguish sampled data from real data. Components: any random distribution in any space; a neural net that maps z → x; a binary discriminator, usually a neural classifier.
  • 50. Generative adversarial networks (adapted from Goodfellow et al., NIPS 2014)
  • 51. Progressive GAN: generated images. Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
  • 52. BERT: a Transformer that predicts its own masked parts. BERT is like a parallel, approximate pseudo-likelihood: it maximizes the conditional likelihood of some variables given the rest. When the number of variables is large, this converges to the MLE (maximum likelihood estimate). https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
  • 53. Contrastive learning: comparing samples. Le-Khac, Phuc H., Graham Healy, and Alan F. Smeaton. "Contrastive Representation Learning: A Framework and Review." arXiv preprint arXiv:2010.05113 (2020).
  • 54. Unsupervised learning: a few more points. No external labels, but rich training signals (thousands of bits per sample, as opposed to a few bits in supervised learning). A few techniques: compressing data as much as possible with little loss; energy-based learning, i.e., pulling down the energy of observed data and pulling up everything else; filling in missing slots (aka predictive learning, self-supervised learning). We have not covered unsupervised learning on graphs (e.g., DeepWalk, GPT-GNN), but the general principles should hold. Question: multiple objectives, or no objective at all? Question: emergence from many simple interacting elements? Liu, Xiao, et al. "Self-supervised learning: Generative or contrastive." arXiv preprint arXiv:2006.08218 (2020).
  • 55. End of part I
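Slide 9 says skip-connections ease gradient flow. The chain rule shows why, using a minimal pure-Python sketch. It assumes a scalar input and a hypothetical branch F(x) = 0.1·tanh(x) that stands in for real conv layers. Each residual block contributes a factor of 1 + F'(x) to the input gradient. Plain stacking instead multiplies the F'(x) terms directly.

```python
import math
from typing import Tuple

def branch(x: float) -> float:
    # Hypothetical residual branch F(x); stands in for a stack of conv/BN/ReLU layers.
    return 0.1 * math.tanh(x)

def branch_grad(x: float) -> float:
    # Derivative F'(x) of the branch above.
    return 0.1 * (1.0 - math.tanh(x) ** 2)

def forward_and_grad(x: float, depth: int, residual: bool) -> Tuple[float, float]:
    """Run `depth` blocks; return (output, d output / d input) via the chain rule."""
    grad = 1.0
    for _ in range(depth):
        if residual:
            # y = x + F(x) gives dy/dx = 1 + F'(x): the identity path keeps each factor >= 1.
            grad *= 1.0 + branch_grad(x)
            x = x + branch(x)
        else:
            # Plain stacking y = F(x) gives dy/dx = F'(x), and |F'(x)| <= 0.1 here.
            grad *= branch_grad(x)
            x = branch(x)
    return x, grad

if __name__ == "__main__":
    print("plain   :", forward_and_grad(0.5, depth=20, residual=False))
    print("residual:", forward_and_grad(0.5, depth=20, residual=True))
```

At depth 20, the plain stack's input gradient falls below 1e-19, while the residual stack's gradient never drops below 1.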
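The "stationary world" assumption on slide 10 means that one set of recurrent weights is reused at every time step. Below is a small sketch of an Elman-style recurrence in plain Python. The toy weights are hypothetical, and the tanh cell is one common choice, not necessarily the cell behind the cited figure.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn(xs: List[Vector], W: Matrix, U: Matrix, b: Vector, h0: Vector) -> List[Vector]:
    """Elman-style recurrence h_t = tanh(W h_{t-1} + U x_t + b).

    The same parameters (W, U, b) are applied at every time step,
    which is the 'stationary world' assumption.
    """
    h, states = h0, []
    for x in xs:
        pre = [wh + ux + bi for wh, ux, bi in zip(matvec(W, h), matvec(U, x), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

# Hypothetical toy parameters: 2-d hidden state, 1-d input.
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[1.0], [-1.0]]
b = [0.0, 0.1]
xs = [[1.0], [0.0], [-1.0], [0.5]]

states = rnn(xs, W, U, b, h0=[0.0, 0.0])
last = states[-1]    # e.g. sentence classification reads only the final state
per_step = states    # e.g. sequence labelling reads one state per step
```

The applications listed on the slide differ mainly in which states they read. Classification reads the last state, while labelling reads every state.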
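Slides 11 and 12 can be illustrated together. A shared kernel responds to a motif wherever it appears, so its responses shift with the input. A global max-pool then summarizes those responses and discards the location. This sketch assumes a hand-set edge-detector kernel and "valid" cross-correlation, which is what deep-learning libraries usually call convolution. A CNN would learn its kernels rather than use fixed ones.

```python
from typing import List

def conv1d(signal: List[float], kernel: List[float]) -> List[float]:
    """'Valid' 1-D cross-correlation: one shared kernel slides over every position."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def global_max_pool(responses: List[float]) -> float:
    # Summarizes the responses into one number and discards *where* the max occurred.
    return max(responses)

edge_detector = [-1.0, 1.0]                           # hypothetical kernel; a CNN would learn it
x = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
x_shifted = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]  # same motif, shifted right by 2

r, r_shifted = conv1d(x, edge_detector), conv1d(x_shifted, edge_detector)
print(r)           # responses move with the motif
print(r_shifted)
print(global_max_pool(r), global_max_pool(r_shifted))  # pooled value is identical
```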
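Slide 13 describes soft attention as a differentiable selection over a set. The minimal sketch below assumes scaled dot-product scoring, which is one common choice among several. The read-out is a weighted sum, so reordering the (key, value) pairs leaves it unchanged. This order-independence is what makes attention an operator on sets/bags.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query: Vector, keys: List[Vector],
                   values: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product soft attention over a set of (key, value) pairs."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # differentiable, non-negative, sums to 1
    read = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return read, weights

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0], [20.0], [30.0]]
read, weights = soft_attention([2.0, 0.0], keys, values)
print(weights, read)
```

A hard variant would pick a single item (by argmax or by sampling) instead of averaging. That choice is not differentiable, which is why the slide pairs hard attention with RL.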
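Slide 14's HyperNet idea is a controller that generates the main net's weights. It can be sketched with a single linear controller. Everything here is a hypothetical toy: the sizes IN_DIM, OUT_DIM and Z_DIM, the random controller parameters H and b_h, and the context vector z. In Ha et al., z is a learned per-layer embedding. For data-dependent (fast) weights, z would instead be computed from the input or state.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

IN_DIM, OUT_DIM, Z_DIM = 3, 2, 4  # hypothetical sizes

rng = random.Random(0)
# Controller parameters (hypothetical): one row per generated main-net weight.
H = [[rng.gauss(0.0, 0.5) for _ in range(Z_DIM)] for _ in range(OUT_DIM * IN_DIM)]
b_h = [0.0] * (OUT_DIM * IN_DIM)

def generate_weights(z: Vector) -> Matrix:
    """Controller: linear map from z to a flattened weight vector, reshaped to OUT_DIM x IN_DIM."""
    flat = [sum(h * zj for h, zj in zip(row, z)) + b for row, b in zip(H, b_h)]
    return [flat[r * IN_DIM:(r + 1) * IN_DIM] for r in range(OUT_DIM)]

def main_net(x: Vector, W: Matrix) -> Vector:
    # The main net owns no parameters of its own; it uses whatever W it is given.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def hyper_forward(x: Vector, z: Vector) -> Vector:
    return main_net(x, generate_weights(z))

print(hyper_forward([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0]))
print(hyper_forward([1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 0.0]))  # same input, different weights
```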
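Slide 18 summarizes memory networks. The input set is loaded into a read-only memory, and a state is refined by attention over it. The sketch below shows multi-hop reading in the style of end-to-end memory networks. It assumes the input embeddings, with separate "key" and "content" vectors per slot, have already been computed. The embedding matrices and the final answer layer are omitted. Each hop is one attention pass, so the path from any memory slot to the output has the same length however many slots there are.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def memory_hops(u: Vector, keys: List[Vector], contents: List[Vector], hops: int) -> Vector:
    """Multi-hop read: keys/contents (the embedded input set) are read, never written."""
    for _ in range(hops):
        p = softmax([dot(u, m) for m in keys])                  # content addressing
        o = [sum(pi * c[j] for pi, c in zip(p, contents)) for j in range(len(u))]
        u = [ui + oi for ui, oi in zip(u, o)]                   # only the state changes
    return u

# Hypothetical pre-embedded memory with two slots, and an embedded query.
keys = [[1.0, 0.0], [0.0, 1.0]]
contents = [[0.5, 0.0], [0.0, 0.5]]
q = [1.0, 0.2]
print(memory_hops(q, keys, contents, hops=3))
```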
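Slides 19, 21 and 23 fit in one sketch. Single-head self-attention assigns one output state per input element. Every element scores every element, so n inputs produce n² attention scores: this is the quadratic cost behind efficient Transformers. This plain-Python sketch uses random, hypothetical projection matrices Wq, Wk and Wv. It omits the multi-head attention, residual connections, layer normalization and feed-forward sublayers of a full Transformer block.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X: List[Vector], Wq: Matrix, Wk: Matrix,
                   Wv: Matrix) -> Tuple[List[Vector], int]:
    """Single-head self-attention; also returns how many scores were computed."""
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    scale = math.sqrt(len(Q[0]))
    outputs, n_scores = [], 0
    for q in Q:                                   # one hidden state per input element
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        n_scores += len(scores)                   # every element scores every element
        w = softmax(scores)
        outputs.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return outputs, n_scores

rng = random.Random(0)
d = 4
Wq, Wk, Wv = [[[rng.gauss(0.0, 0.5) for _ in range(d)] for _ in range(d)] for _ in range(3)]
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(5)]
out, n_scores = self_attention(X, Wq, Wk, Wv)
print(n_scores)  # 5 inputs -> 25 scores
```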
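Slide 30 lists the GCN update rule in vector and matrix form. As an illustration only, here is one widely used form of the matrix rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), written out as explicit message passing. This is the Kipf-Welling normalization, which is an assumption here: it is not necessarily the exact rule on the slide or in Column Networks. The three-node graph and features are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def gcn_layer(A: Matrix, H: Matrix, W: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(A)
    # Add self-loops so each node keeps its own features.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Message passing: node i aggregates degree-normalized features from itself and neighbours.
    agg = [[sum(norm[i][k] * H[k][f] for k in range(n)) for f in range(len(H[0]))]
           for i in range(n)]
    # Shared linear transform W (in_features x out_features), then ReLU.
    return [[max(0.0, sum(agg[i][f] * W[f][o] for f in range(len(W))))
             for o in range(len(W[0]))] for i in range(n)]

A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]   # nodes 0 and 1 are linked; node 2 is isolated
H = [[1.0], [3.0], [2.0]]
W = [[1.0]]
print(gcn_layer(A, H, W))  # [[2.0], [2.0], [2.0]]
```

Nodes 0 and 1 average their features to 2.0, and the isolated node 2 keeps its own value through the self-loop.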
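Slides 35 and 37 explain why ordering matters. A graph with n nodes has up to n! node orderings, so GraphRNN fixes a BFS ordering and builds the graph step by step, emitting each new node together with its edges to earlier nodes. The sketch below shows that sequence view. It assumes an undirected graph stored as a hypothetical adjacency dict and a fixed start node. The RNNs that would model these steps are omitted.

```python
import math
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]

def bfs_order(adj: Graph, start: int) -> List[int]:
    """Breadth-first node ordering (neighbours visited in sorted order for determinism)."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in sorted(adj[v]):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return order

def edge_sequence(adj: Graph, order: List[int]) -> List[List[int]]:
    """Autoregressive view: step i emits node order[i] plus its edges to earlier nodes."""
    return [[1 if order[j] in adj[v] else 0 for j in range(i)]
            for i, v in enumerate(order)]

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}  # hypothetical 4-node graph
order = bfs_order(adj, start=0)
print(order)                        # [0, 1, 2, 3]
print(edge_sequence(adj, order))    # [[], [1], [1, 1], [0, 0, 1]]
print(math.factorial(10), 2 ** 10)  # orderings of 10 nodes vs. a merely exponential count
```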
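Slide 45 describes autoregressive models as predicting the next step from the history. The chain rule log p(x) = Σ_t log p(x_t | x_<t) can be sketched with a count-based bigram conditional standing in for the neural network. This is an assumption for brevity: an RNN, CNN or Transformer would condition on the whole history. Reusing one conditional at every step is the parameter sharing the slide mentions, and the fixed left-to-right order is the ordering choice.

```python
import math
from collections import Counter, defaultdict
from itertools import product
from typing import Callable, Dict, List

BOS = "<s>"  # start-of-sequence context

def fit_bigram(corpus: List[List[str]], vocab: List[str],
               alpha: float = 1.0) -> Callable[[str, str], float]:
    """Count-based stand-in for a neural conditional p(x_t | x_{t-1})."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in corpus:
        prev = BOS
        for tok in seq:
            counts[prev][tok] += 1
            prev = tok

    def cond_prob(tok: str, prev: str) -> float:
        c = counts[prev]
        # Add-alpha smoothing keeps every conditional a proper distribution over vocab.
        return (c[tok] + alpha) / (sum(c.values()) + alpha * len(vocab))

    return cond_prob

def log_prob(seq: List[str], cond_prob: Callable[[str, str], float]) -> float:
    """Chain rule: log p(x) = sum_t log p(x_t | history), here truncated to x_{t-1}."""
    lp, prev = 0.0, BOS
    for tok in seq:
        lp += math.log(cond_prob(tok, prev))
        prev = tok
    return lp

vocab = ["a", "b"]
p = fit_bigram([["a", "b", "a"], ["a", "a"]], vocab)
total = sum(math.exp(log_prob(list(s), p)) for s in product(vocab, repeat=3))
print(total)  # ~1.0: the factorization is normalized over all length-3 strings
```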
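Slide 48's VAE pairs a recognition net q(z|x) with a generative net p(x|z). Two pieces that make training work can be sketched without the networks themselves. The first is the reparameterization trick for sampling Gaussian latents. The second is the closed-form KL term of the loss against a standard-normal prior. The encoder outputs mu and log_var are assumed to be given, and the reconstruction log-likelihood is passed in as a plain number.

```python
import math
import random
from typing import List

Vector = List[float]

def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """z = mu + sigma * eps, eps ~ N(0, 1): noise is external, so z is differentiable in mu, log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]

def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def negative_elbo(recon_log_likelihood: float, mu: Vector, log_var: Vector) -> float:
    """Loss to minimize: -(E_q[log p(x|z)] - KL(q(z|x) || p(z)))."""
    return -recon_log_likelihood + kl_to_standard_normal(mu, log_var)

# Hypothetical encoder outputs for one data point.
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, log_var, random.Random(0))
print(z, kl_to_standard_normal(mu, log_var))
```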
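The adversarial game on slide 49 becomes concrete through its losses. This sketch follows the original GAN formulation and gives three pieces: the discriminator loss, the commonly used non-saturating generator loss, and the optimal discriminator for a fixed generator. D's outputs are assumed to be probabilities strictly between 0 and 1, and the batches are hypothetical numbers rather than network outputs.

```python
import math
from typing import List

def discriminator_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Negated GAN value for D: -(mean log D(x) + mean log(1 - D(G(z))))."""
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss_nonsaturating(d_fake: List[float]) -> float:
    """G maximizes log D(G(z)) instead of minimizing log(1 - D(G(z))) (stronger early gradients)."""
    return -sum(math.log(d) for d in d_fake) / len(d_fake)

def optimal_discriminator(p_data: float, p_gen: float) -> float:
    """For a fixed generator, the best D at a point x is p_data(x) / (p_data(x) + p_gen(x))."""
    return p_data / (p_data + p_gen)

# If the generator matches the data density everywhere, D* = 1/2: it cannot tell them apart.
print(optimal_discriminator(0.3, 0.3))        # 0.5
print(discriminator_loss([0.5, 0.5], [0.5]))  # 2 * log 2, about 1.386
```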
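Slide 52 describes BERT as predicting its own masked parts. The sketch below covers the corruption step. It assumes BERT's published scheme: 15% of positions are selected, and a selected token becomes [MASK] 80% of the time, a random token 10% of the time, or stays unchanged 10% of the time. Selecting each token independently is a simplification. The vocabulary and sentence are hypothetical. Training would then minimize -Σ log p(labels[i] | inputs) over the selected positions, which is the conditional likelihood of some variables given the rest.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"

def bert_mask(tokens: List[str], vocab: List[str], rng: random.Random,
              mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Corrupt a token sequence for masked-language-model training.

    Each position is selected independently with probability mask_prob (a simplification);
    a selected token becomes [MASK] 80% of the time, a random token 10%, unchanged 10%.
    labels[i] holds the original token only at selected positions.
    """
    inputs: List[str] = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)
            # else: keep the original token so the model cannot rely on seeing [MASK]
    return inputs, labels

toks = "the cat sat on the mat".split()
inp, lab = bert_mask(toks, sorted(set(toks)), random.Random(0), mask_prob=0.5)
print(inp, lab)
```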
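Slide 53 summarizes contrastive learning as comparing samples. One common objective is an InfoNCE-style loss, which asks that the anchor be more similar to its positive view than to the negatives. This sketch assumes cosine similarity, a temperature of 0.1 and hand-written embeddings. In practice, the embeddings come from an encoder applied to augmented views of the data.

```python
import math
from typing import List

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             temperature: float = 0.1) -> float:
    """-log softmax score of the positive among {positive} + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]

# Hypothetical embeddings: the first pair is well aligned, the second is not.
good = info_nce([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
bad = info_nce([1.0, 0.0], [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(good, bad)
```

Lowering the temperature sharpens the softmax, so hard negatives (those close to the anchor) dominate the loss more strongly.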