Strictly Confidential
15/11/2016
Applications of Deep
Neural Network
Dr. Z. Xing
Lead of Deep Learning Taskforce, Data Science & Analytics @ NIO USA, Inc
3200 N 1st St, San Jose, CA 95134
September 13th, 2017 @ Tsinghua University, Beijing, China
2
• The volume and heterogeneity of the data we deal with nowadays have reached a level of unprecedented complexity and subtlety
Data “science”
• At the end of the day, data “science” all comes down to understanding an intricate representation of the data, so-called representation learning
A typical example of industrial-scale data
• ~40 M clicks per second
• ~2.5 M servers
• ~5.7 terawatt-hours annually ≈ 68 M AC units
• ~10-15 exabytes (10^18 bytes) ≈ 30 M personal laptops
• ….......
3
• Computers/machines can help with the representation learning task, but only to a certain extent…...
• “Conventional” machine learning approaches are limited in various ways: learning becomes difficult when data is fed to the system in its raw format, which is exactly how the human learning system receives its data.
• Feature extraction is a hand-crafted process that needs careful engineering, domain expertise, etc. It is difficult to generalize and to scale up with increasing data and model size
“Conventional” machine learning
• hand-crafted features
• manually created representation of the data
(figure: classification example; for illustration purposes only)
4
• Artificial intelligence (AI): “The science and engineering of making intelligent machines” (John McCarthy, 1955)
Artificial intelligence
5
• The infant concept of neurons/circuits originates from neuroscience, biophysics, and computational physics
Concept of neural network
• The electrical signals (voltage spikes) that the brain processes do NOT directly represent the external world; how neurons decode such signals is a complicated process, in two respects: (a) time dependencies (transient neuron functions), (b) the electrical functionality of each cell (activations)
• ~10^11 neurons in the human brain, ~10^15 connections (connectionist….)
(figure: neuron with soma/body and synapses)
6
Concept of “deep” learning
• Representation learning is the key advantage: it allows processing data in its raw format and averts the need for hand-crafted features
• Multiple levels of representation of the data, multiple levels of abstraction. Accommodates a flexible rank of the latent space, which locally resembles a Euclidean space
• Each level is often a non-linear module; stacking multiple levels allows the system to learn complicated physics
• Higher/deeper levels amplify the components of the input that are relevant/crucial to the optimization goal while suppressing the less relevant parts
(figure: optimization goal)
7
“Deep” against “shallow”
• We want our system to be selective about things that are relevant or important, while being invariant to things that are not important, for example the orientation of the object, the background color, and so forth
• A “shallow” or even linear classifier can only carve the input space into over-simplified regions/hyper-planes
(figure: wolf vs. Samoyed example)
8
• Unsupervised learning, transfer learning (domain adaptation)
• Auto-encoder
• Variational auto-encoder (VAE)
• Restricted Boltzmann machine
Different learning mechanisms
• Supervised learning
• The objective function measures an error (𝛿) between the system output and the desired target; the internal weights keep getting tuned to minimize this 𝛿, guided by gradients (a minimal SGD sketch follows at the end of this slide)
• However, optimization happens at the level of an expected value over many training instances
• Also, the optimization goal is to match two patterns, without taking into account an overall strategic goal (winning a chess game, etc.)
• Stochastic gradient descent: Stochastic Gradient Descent Tricks (SGD, Bottou, 2007)
• Diederik P. Kingma, Auto-Encoding Variational Bayes, arXiv:1312.6114
(figure: encoder / decoder)
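To make the gradient-guided update concrete, here is a minimal NumPy sketch (my own illustration, not code from the talk): a single linear unit fitted by stochastic gradient descent on a squared-error objective, with made-up data and learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))              # toy inputs
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1   # toy targets

w = np.zeros(3)                            # internal weights to be tuned
lr = 0.01                                  # learning rate

for epoch in range(50):
    for i in rng.permutation(len(X)):      # stochastic: one example at a time
        err = X[i] @ w - y[i]              # delta between output and desired target
        grad = err * X[i]                  # gradient of 0.5 * err**2 w.r.t. w
        w -= lr * grad                     # follow the negative gradient

print(w)  # approximately recovers [1.0, -2.0, 0.5]
```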
9
Selectivity–invariance dilemma
• Symmetries in the data: many tasks are invariant to transformations of the data; for example, the recognition task is invariant to changes in pose, light, location…. (symmetries)
• The human brain can learn to recognize objects after seeing only a few examples (unsupervised), while most machine learning systems need huge amounts of labelled data (supervised)
• Factoring out the symmetries from the data, while retaining selectivity, is the key to building artificial intelligence that can compete with human intelligence
Classical learning theory focuses on supervised learning and
postulates that a suitable hypothesis space is given. In other
words, data representation and how to select and learn it, is
classically not considered to be part of the learning problem,
but rather as a prior information.
(figure: visual cortex)
• Fabio Anselmi, On Invariance and Selectivity in Representation Learning, arXiv:1503.05938
• Attempts at utilizing group theory and group averages have been made, on the theory side, to derive invariant representation learning
10
Local minima for large networks
• Numerical analyses in statistical physics, random matrix theory, and neural network theory show that local minima are rarely an issue for large networks
• The key difference is the dimensionality of the space; a proliferation of saddle points, rather than local minima, becomes the more relevant issue in solving high-dimensional problems
• Yann N. Dauphin, Yoshua Bengio et al. (2014), Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, arXiv:1406.2572
• Anna Choromanska et al., The Loss Surfaces of Multilayer Networks, arXiv:1412.0233
• Similar objective-function values at various saddle points
• Statistics of Critical Points of Gaussian Fields on Large-Dimensional Spaces, Bray and Dean (2007), Phys. Rev. Lett. 98, 150201
• Replica Symmetry Breaking Condition Exposed by Random Matrix Calculation of Landscape Complexity, Fyodorov and Williams (2007), etc.
Fully connected layer
• Makes no assumptions at all about the data features
• Does not preserve any invariance of the input feature map
• Expensive in terms of computation and memory consumption
• Multi-layer perceptron (MLP)
11
(figure: stacked fully connected layers with nonlinearities)
12
Backward propagation (BP)
• BP guides the computer to update its internal parameters by using the chain rule of derivatives (a minimal worked example follows at the end of this slide)
• The central problem that BP solves is to evaluate the influence of a parameter on a function whose computation involves multiple elementary steps (Lagrangian formalism)
(figure: Lagrange function = objective function + constraints (network dynamics); the Lagrange multipliers take the backward dynamics into account)
Z. Xing, Measurement of the semileptonic CP violating asymmetry a_sl in B_s decays and the D_s - D_s production asymmetry in 7 TeV pp collisions, CERN-THESIS-2013-078, https://inspirehep.net/record/1296591?ln=en
• Y. LeCun, A Theoretical Framework for Back-Propagation, Proceedings of the 1988 Connectionist Models Summer School, pp. 21-28, 1988
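As a concrete illustration of that chain rule, the following minimal NumPy sketch (my own, with assumed shapes and a tanh nonlinearity) runs a forward pass through a one-hidden-layer network and then propagates the squared-error gradient backward step by step.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=4)              # input
t = np.array([1.0])                 # desired target
W1 = rng.normal(size=(3, 4)) * 0.5  # first-layer weights
W2 = rng.normal(size=(1, 3)) * 0.5  # second-layer weights

# forward pass: the elementary steps whose influences BP must evaluate
a1 = W1 @ x
h = np.tanh(a1)
yhat = W2 @ h
loss = 0.5 * np.sum((yhat - t) ** 2)

# backward pass: the chain rule applied step by step
d_yhat = yhat - t                       # dL/dyhat
d_W2 = np.outer(d_yhat, h)              # dL/dW2
d_h = W2.T @ d_yhat                     # dL/dh
d_a1 = d_h * (1 - np.tanh(a1) ** 2)     # through the tanh nonlinearity
d_W1 = np.outer(d_a1, x)                # dL/dW1
```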
Convolutional layer
• Convolutional neural network
• 3-dimensional neurons
• local connectivity at each filter/kernel (local features of the data)
• weight sharing between all neurons in the same layer, usually packaged as a kernel/filter that is convolved with the input volume (invariance of the data)
• input 3D neurons of depth D_i, output 3D neurons whose depth equals the depth of the filter bank D_o
• number of learnable weights = D_o × D_i × F × F (F: kernel size)
• output spatial size: N_o = (N_i − F + 2×p) / s + 1 (p: padding, s: stride); see the helper sketch below
• the kernels/filters can essentially pick up latent features such as image brightness, contrast, RGB color, edges, etc.
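A small helper makes the two formulas above concrete. This is my own sketch; the function name and the example layer sizes are made up for illustration.

```python
def conv_layer_stats(N_i, F, p, s, D_i, D_o):
    """Spatial output size and learnable-weight count of one conv layer.

    N_i: input spatial size, F: kernel size, p: zero padding, s: stride,
    D_i / D_o: input depth / number of filters (output depth).
    """
    N_o = (N_i - F + 2 * p) // s + 1   # N_o = (N_i - F + 2p) / s + 1
    weights = D_o * D_i * F * F        # shared kernel weights (biases excluded)
    return N_o, weights

# e.g. a 224x224x3 input with 64 filters of size 7x7, stride 2, padding 3
print(conv_layer_stats(N_i=224, F=7, p=3, s=2, D_i=3, D_o=64))  # (112, 9408)
```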
13
Connection to neuro-science
• One route to developing your deep neural net architecture is inspiration from neuroscience, such as the human visual cortex
• Cross-channel information learning (cascaded 1x1 convolutions) is biologically inspired, because the human visual cortex has receptive fields (kernels) tuned to different orientations
- local groups of
values are often
highly correlated
- invariance to
location, weights
sharing
14
Charles F. Cadieu et. al., Deep Neural Networks Rival the Representation of Primate IT Cortex for Core Visual Object Recognition, PLOS
Computational Biology, December 2014, Volume 10, Issue 12
Translational invariance
• Convolutional layer relies on translational invariance
(convolution commutes with translation)
• local input regions
• only relative locations are taken into account
(figure: a layer computes y = f_ks(x), where k is the kernel size and s the stride; the form of f_ks determines the layer type: convolutional, max pooling, or activation function; it commutes with the translation operator)
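A quick numerical check of that commutation property, as a sketch of my own using circular 1-D convolution to avoid boundary effects: convolving a shifted signal equals shifting the convolved signal.

```python
import numpy as np

def circ_conv(x, k):
    """Circular 1-D convolution of signal x with kernel k."""
    n = len(x)
    return np.array([sum(x[(i - j) % n] * k[j] for j in range(len(k)))
                     for i in range(n)])

x = np.array([0., 1., 2., 3., 4., 5.])
k = np.array([1., -1., 0.5])

shift = lambda v, d: np.roll(v, d)   # translation operator

lhs = circ_conv(shift(x, 2), k)      # convolve the translated signal
rhs = shift(circ_conv(x, k), 2)      # translate the convolved signal
assert np.allclose(lhs, rhs)         # convolution commutes with translation
```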
15
Recurrent architecture
• Recurrent network structures can be used to
learn potential temporal correlations/structures
in the data
• Once “unrolled” or “unfolded”, all layers share the same weights, so the network can be viewed as a feedforward network and optimized using BP (BPTT, back-propagation through time)
• However, there is an exploding or vanishing gradient problem along the temporal axis
• Different formalisms and implementations of recurrent activations have been proposed (LSTM, fixed unit recurrent weights, GRU, etc.), as well as a gradient-clipping approach, to alleviate the issue
(figure: unrolled recurrence x_t → h_t → y_t for t = 0, 1, 2; repeated multiplication by a recurrent weight with w > 1 or w < 1 makes gradients explode or vanish)
all these recurrent edges share
the same synaptic weights
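A minimal sketch of this unrolled view (my own illustration with assumed sizes and a tanh activation): every time step reuses the same three weight matrices, which is what makes BPTT possible and also why gradients grow or shrink with powers of the recurrent weights.

```python
import numpy as np

rng = np.random.default_rng(2)
W_xh = rng.normal(size=(8, 4)) * 0.1   # input -> hidden, shared across time
W_hh = rng.normal(size=(8, 8)) * 0.1   # hidden -> hidden, shared across time
W_hy = rng.normal(size=(2, 8)) * 0.1   # hidden -> output, shared across time

xs = rng.normal(size=(5, 4))           # a sequence of 5 input vectors
h = np.zeros(8)                        # initial hidden state

ys = []
for x_t in xs:                         # "unrolling" the recurrence in time
    h = np.tanh(W_xh @ x_t + W_hh @ h)
    ys.append(W_hy @ h)
```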
16
LSTM – long short-term memory
• Special treatment: memory cells
• Novel inclusion of multiplicative nodes; all edges into or out of these nodes have a fixed unit weight, historically referred to as the “constant error carousel”
• A. Graves, Generating Sequences With Recurrent Neural Networks, arXiv:1308.0850
• A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Studies in Computational Intelligence. Springer, 2012
(figure: LSTM cell with its recurrence components; the fixed unit weight alleviates vanishing gradient problems; gates control error flow and memory flushing)
17
Sequence generation
• Recurrent network can be used for sequence generation
• a man riding a wave on top of a surfboard . (p=0.040413)
• a person riding a surf board on a wave (p=0.017452)
• a man riding a wave on a surfboard in the ocean (p=0.005743)
(figure: training vs. testing/inference)
18
Gated Recurrent Unit (GRU)
• GRU also utilizes a gating unit to regulate the temporal flow, but with a simple linear interpolation instead of a memory cell (a minimal sketch of one GRU step follows below)
• Kyunghyun Cho, On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, arXiv:1409.1259
(figure: LSTM vs. GRU cells; reset and update gates acting on the previous states)
• Both LSTM and GRU utilize an additive component when updating the state, which keeps partial influence from the previous timestamp
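Here is that gated interpolation written out as a minimal NumPy sketch of my own; the weight shapes and the h = z*h_prev + (1 - z)*h_tilde convention (following Cho et al.) are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h_prev, Wz, Uz, Wr, Ur, W, U):
    """One GRU step: gated linear interpolation between old and candidate state."""
    z = sigmoid(Wz @ x + Uz @ h_prev)             # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev)             # reset gate
    h_tilde = np.tanh(W @ x + U * 1.0 @ (r * h_prev))  # candidate state
    return z * h_prev + (1.0 - z) * h_tilde       # additive update keeps part of the past

rng = np.random.default_rng(3)
d_in, d_h = 4, 6
params = [rng.normal(size=(d_h, d_in)) if i % 2 == 0 else rng.normal(size=(d_h, d_h))
          for i in range(6)]                      # Wz, Uz, Wr, Ur, W, U
h = gru_cell(rng.normal(size=d_in), np.zeros(d_h), *params)
```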
19
Activation of hidden layers
• A neural network without any activation would simply be a linear regression model. Activation functions accommodate the sophisticated nonlinearities needed for data such as images, videos, speech, etc.
o Sigmoid function: saturation causes vanishing gradients, slow convergence, not zero-centered
o Tanh: still has the vanishing gradient problem
o ReLU: avoids and rectifies vanishing gradients, no need for input normalization, but can result in “dead” neurons
o “Leaky” ReLU, PReLU (parameterized ReLU)
o Human neuron activations can actually be a stochastic process
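For reference, minimal NumPy definitions of the activations listed above (a sketch of my own; the leaky slope of 0.01 is just a common default, not a value from the slides).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # saturates for large |z| -> vanishing gradients

def tanh(z):
    return np.tanh(z)                     # zero-centered, but still saturates

def relu(z):
    return np.maximum(0.0, z)             # no saturation for z > 0, but "dead" for z < 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small negative slope keeps units alive
```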
20
Normalization
• Local response normalization: normalize across neighboring kernels; lateral inhibition, competition for large activations across neurons computed by different kernels/filters
• Batch normalization: reduces internal covariate shift (ICS); a minimal sketch follows below
• “Whitening” the input feature map accelerates training speed and convergence
• But a simple normalization procedure may violate the identity transform, depending on the non-linear activation form (hence the learnable scale/shift parameters in batch norm)
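A minimal training-mode sketch of batch normalization (my own illustration; inference would instead use running estimates of the mean and variance, and gamma/beta are the learnable scale and shift).

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then rescale and shift.

    x: (batch, features); gamma/beta: learnable (features,) parameters that
    restore the representational power a plain whitening step would remove.
    """
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```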
21
Pooling
• Summarize across neighboring groups of neurons in the same kernel map to reduce computation and the feature-map size
• Less over-fitting
• Aggregates localized spatial information
Alternatives:
• Maximum
• Sum
• Average
• Weighted average with
distance from the center pixel
• Overlapped, non-overlapped
• …......
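A minimal sketch of 2-D max pooling over a single feature map (my own; non-overlapping when the window size equals the stride, overlapping otherwise).

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over a 2-D feature map with a square window."""
    H, W = x.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = patch.max()   # swap .max() for .sum() or .mean() for other variants
    return out
```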
22
Output layer
• Training a deep neural network is a highly non-convex optimization problem that we usually solve using convex methods
• “Softmax” function: the original motivation is to treat the outputs of the NN as probabilities conditioned on the inputs, normalized to unity (formula below; a small numerically stable sketch follows at the end of this slide)
p(y = j | z^(i)) = exp(z_j^(i)) / Σ_k exp(z_k^(i))
Anders Øland, Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting, arXiv:1707.04199
• What the output layer generates is actually not a probability distribution, as is commonly conjectured
• gradient boosting method: exponentiate the errors from the output layer, non-normalized
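The softmax formula above, written as a small numerically stable sketch (my own; subtracting the maximum logit does not change the result but avoids overflow).

```python
import numpy as np

def softmax(z):
    """Map raw network outputs (logits) to values that sum to one."""
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # approximately [0.659, 0.242, 0.099]
```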
23
Reduce overfitting
• Data augmentation…....
• “Drop-out” treatment: randomly drop neurons to prevent overfitting; “re-scaling” is needed when making inference
• Nitish Srivastava et. al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (2014) 1929-1958
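A minimal sketch of dropout (my own). Note that this is the common “inverted” variant, which moves the re-scaling mentioned above from inference time to training time so that inference needs no extra step.

```python
import numpy as np

def dropout(x, p_drop=0.5, training=True, rng=np.random.default_rng()):
    """Inverted dropout: scale at training time so inference uses the plain activations."""
    if not training:
        return x                            # inference: use the full network
    mask = rng.random(x.shape) >= p_drop    # randomly drop a fraction p_drop of the units
    return x * mask / (1.0 - p_drop)        # re-scale the surviving activations
```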
24
Image classification
• AlexNet, architecture contributions:
• ReLU, ~6 times faster than saturating approaches
• Local response normalization: ~2% increase in precision
• Reduce overfitting:
• data augmentation
• drop-out: a neuron cannot rely on the presence of particular other neurons and is thus forced to learn in a more robust manner
• A. Krizhevsky, ImageNet Classification with
Deep Convolutional Neural Networks, NIPS
2012
(figure: output layer is a 1000-way softmax; the last hidden layer provides an embedding vector)
25
Network in network
• GoogLeNet, the Inception net
• Key idea: use dense → sparse restructuring to improve computational efficiency
• Local sparsity by using network-in-network
• “1x1 convolution”: dimensionality reduction in the rank of the latent feature manifold (cross-channel pooling layer); see the weight-count sketch below
• Hebbian principle: neurons that fire together, wire together
• Christian Szegedy et. al. Going deeper with convolutions
https://arxiv.org/pdf/1409.4842.pdf
• https://arxiv.org/pdf/1512.00567.pdf
• arXiv:1312.4400v3
(figure: NIN / Inception module enhances representational power; parallel 1x1, 3x3, and 5x5 convolutions plus average pooling, with 1x1 convolutions as D_i → D_o dimensionality reductions)
N_o = (N_i − F + 2×p) / s + 1
9 × 9 = 81 > 5 × 5 + 3 × 3 + 1 × 1 = 35 (footprint of one large kernel vs. the parallel branches)
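A back-of-the-envelope sketch (my own, with hypothetical channel counts) of how a 1x1 reduction cuts the number of learnable weights before an expensive 5x5 convolution.

```python
def conv_weights(D_i, D_o, F):
    """Learnable weights of a conv layer: D_o * D_i * F * F (biases excluded)."""
    return D_o * D_i * F * F

# Hypothetical channel counts: 256 input channels, 64 output channels of 5x5.
direct = conv_weights(256, 64, 5)                                # 5x5 applied directly
bottleneck = conv_weights(256, 32, 1) + conv_weights(32, 64, 5)  # 1x1 reduce, then 5x5
print(direct, bottleneck)   # 409600 vs 59392 learnable weights
```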
26
Benchmark results
27
Object detection
• The object detection task imposes an additional requirement on a classifier: localization of one or multiple objects
• The sliding-window approach (DPM, deformable part model) is computationally too expensive
• The region-proposal method adds a prior hypothesis on regions that are promising; however, it may have multiple steps pipelined together (an RPN for the objectness score, a detection network, a classification network)
(figure: a re-purposed classifier)
28
You Only Look Once (YOLO)
• Labeling images for detection is far more expensive than labelling for classification or tagging
• Leveraging the classification data expands the scope of current detection systems (transfer learning)
• YOLO is NOT a repurposed classifier
29
YOLO v2 improvements
• Batch norm: removes the need for other regularization such as drop-out
• Anchor box concept; remove the fully connected layers
• Higher resolution for the classifier part to better adapt to detection
anchor boxes increase recall, with only a small change in mAP
30
YOLO v2 results
• Results from most recent YOLO
paper
31
Semantic segmentation
• Approaches such as dilated convolutions are utilized to take the context of the picture into account (multi-scale receptive fields)
• Enet https://arxiv.org/pdf/1606.02147.pdf
• SegNet https://arxiv.org/pdf/1511.00561.pdf
• A “dense” prediction problem, with per-pixel precision required
• A context model is crucial in this application
• Typical “encoder-decoder” architecture: the network gets deeper features while the feature map narrows down
32
Segmentation performance
• Metrics such as intersection over union (IoU) are used to measure the performance of segmentation (a minimal per-pixel sketch follows below)
• Image quality tends to influence the results significantly
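A minimal sketch of the IoU metric computed per pixel over boolean masks (my own illustration; detection benchmarks compute the same ratio over bounding boxes instead).

```python
import numpy as np

def mask_iou(pred, target):
    """Per-pixel intersection over union of two boolean segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

a = np.zeros((4, 4), dtype=bool); a[:2, :2] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:3] = True
print(mask_iou(a, b))   # 1 / 7 ≈ 0.143
```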
33
Three-dimensional data
• 3D segmentation: no “voxelization” or cross-sectional rendering needed, even on unstructured data
• Permutation invariance; learning a transformation matrix of the point cloud
https://arxiv.org/pdf/1704.03847.pdf
https://arxiv.org/pdf/1612.00593.pdf
combining the
global and local
per-point
embedding
34
Audio and natural language processing (NLP)
• Audio signals can be represented in a localized format, either in the temporal or the frequency/spectral domain
• Z. Xing et al., Big Data (Big Data), 2016 IEEE International Conference
• Z. Xing et al., https://arxiv.org/pdf/1705.05229.pdf
• Text/words can also be embedded, into so-called “word vectors”
35
music embedding
Generative adversarial network (GAN)
• While discriminative models have manifested great success, generative models have had less impact due to difficulties with intractable probabilistic computations (MLE). The “two-player” min-max approach sidesteps this problem (a sketch of the two loss terms follows below).
• Ian J. Goodfellow, et. al. Generative Adversarial Nets, arXiv:1406.2661v1
• Phillip Isola, Image-to-Image Translation with Conditional Adversarial Networks, arXiv:1611.07004v1
(figure: discriminator D and generator G play the min-max game; here z is some random noise)
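A minimal, non-batched sketch of the two sides of the value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))] (my own; D and G are stand-ins for the discriminator and generator networks).

```python
import numpy as np

def gan_losses(D, G, x_real, z):
    """Loss terms of the two-player game: D maximizes V, G minimizes it."""
    x_fake = G(z)
    d_loss = -(np.log(D(x_real)) + np.log(1.0 - D(x_fake)))   # discriminator loss = -V
    g_loss = np.log(1.0 - D(x_fake))                          # generator loss = its part of V
    return d_loss, g_loss

D = lambda x: 0.8   # stand-in "probability of being real"
G = lambda z: z     # stand-in generator
print(gan_losses(D, G, x_real=1.0, z=0.0))
```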
36
Reinforcement learning concept
• The environment-representation learning framework naturally follows human/animal learning processes (“agent”-“environment” nomenclature). The agent's actions depend on the state, and may or may not change the future environment
• Deep neural networks re-enable R.L. by learning complex data representations, without any hand-crafted feature extraction
• The agent's state-to-action mapping is depicted by a policy function, which can be stochastic as well
• Volodymyr Mnih et. al. Human-level control through deep reinforcement learning, nature14236, 2015
37
Formalisms
• Value-based approaches such as the Deep Q-Network (DQN):
• Learn the value function; the policy is implicit (ε-greedy); see the sketch below
• “Experience replay” is utilized to remove the correlations that cause divergence problems in R.L.
• Solely in the context of the MDP assumption
• Policy-based approaches such as the policy gradient method
• No value function; learn the policy directly
• High variance issue
• MDP not necessarily assumed
• Actor-Critic
(figure: agent-environment loop; state, action, rewards, policy)
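Two minimal helper sketches (my own, not from the talk): the bootstrapped target that a DQN regresses toward, and the ε-greedy rule that turns Q-values into an implicit policy.

```python
import numpy as np

def dqn_target(reward, q_next, done, gamma=0.99):
    """Bootstrapped target for Q(s, a): r + gamma * max_a' Q(s', a')."""
    return reward + (0.0 if done else gamma * np.max(q_next))

def epsilon_greedy(q_values, epsilon, rng=np.random.default_rng()):
    """Implicit policy derived from the value function."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))   # explore
    return int(np.argmax(q_values))               # exploit
```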
38
Applications
• Navigating through an intersection in complicated environments
• David Isele, Navigating Intersections with Autonomous Vehicles using Deep Reinforcement Learning, arXiv:1705.01196
• Motion negotiation between “agents” in a dynamically changing environment
• Shai Shalev-Shwartz et. al. Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving, arXiv:1610.03295v1
39
40
Summary and future research
• Unsupervised learning
• humans learn the world more naturally by discovering, rather than through supervision
• Convolutional networks combined with recurrence to take temporal correlations into account and thus make predictions in a dynamic fashion
• Reinforcement learning to pre-guide the learning into the “ROI” (region of interest)
(figure: from data representation learning to complex reasoning)
Backup slides
41
Thank You