[Figure: three classic activation functions plotted for x in [-4, 4]:
y = 1/(1 + exp(-x)) (sigmoid), y = tanh(x), and y = max(0, x) (ReLU)]
• ReLU is now the de facto standard activation function.
• Many variants have been proposed: Maxout1, LReLU2, PReLU3, ELU4, SELU5, etc.
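For reference, a minimal NumPy sketch of the three functions plotted above:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # saturates toward 0 and 1

def tanh(x):
    return np.tanh(x)                # saturates toward -1 and 1

def relu(x):
    return np.maximum(0.0, x)        # identity for x > 0, zero otherwise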
"core idea"
The core idea in deep learning is that we
assume that the data was generated by
the composition of factors or features,
potentially at multiple levels in a hierarchy.
— Ian Goodfellow, Yoshua Bengio, Aaron
Courville, "Deep Learning"
• Why does depth help? Theory suggests deep architectures can represent some functions exponentially more compactly than shallow ones.
• A classic circuit-complexity result: computing parity with a constant-depth circuit requires exponentially many gates8.
• Deep networks with piecewise linear activations (e.g., ReLU) carve the input space into many more linear response regions than shallow networks with the same number of units9.
Razvan Pascanu, Guido Montufar, and Yoshua Bengio, "On the number of response regions of deep feed forward
networks with piece-wise linear activations", NIPS (2014)
8
Merrick Furst, James B. Saxe, and Michael Sipser, "Parity, Circuits, and the Polynomial-Time Hierarchy", Mathematical Systems Theory (1984)
Stochastic gradient descent (SGD) and Momentum SGD15

[Figure: optimization trajectories of SGD without momentum vs. SGD with momentum]

• Plain SGD steps along the current (minibatch) gradient: θ ← θ − η∇L(θ).
• Momentum SGD accumulates a velocity, v ← μv − η∇L(θ), θ ← θ + v, which damps oscillations across the valley while accelerating progress along it; a sketch of both updates follows the citation below.
15
A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with
Deep Convolutional Neural Networks", NIPS (2012)
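A minimal NumPy sketch of the two update rules (learning rate and momentum values are illustrative, not from the slide):

import numpy as np

def sgd_update(param, grad, lr=0.01):
    param -= lr * grad          # step along the raw gradient

def momentum_sgd_update(param, grad, v, lr=0.01, momentum=0.9):
    v *= momentum               # decay the running velocity
    v -= lr * grad              # accumulate the new gradient
    param += v                  # step along the velocity instead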
Greedy layer-wise pre-training10

• Before ReLU and modern initializations, deep networks were typically trained by pre-training one layer at a time in an unsupervised way (e.g., as stacked autoencoders or RBMs) and then fine-tuning the whole network.

[Figure from Yoshua Bengio's ICML 2009 tutorial]
10
Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, "Greedy layer-wise
training of deep networks", NIPS (2006)
ReLU (rectified linear unit)11

• Sigmoid units saturate: for inputs of large magnitude the gradient is nearly 0, so error signals vanish as they propagate through deep networks.
• ReLU does not saturate for positive inputs (its gradient there is exactly 1) and outputs 0 for negative inputs, yielding sparse activations; a small numerical check follows the citation below.
11
Xavier Glorot, Antoine Bordes, and Yoshua Bengio, "Deep Sparse
Rectifier Neural Networks", NIPS Workshop (2010)
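A small numerical check of the saturation argument (values are illustrative):

import numpy as np

x = np.array([-10.0, -1.0, 0.5, 10.0])
sig = 1.0 / (1.0 + np.exp(-x))
print(sig * (1.0 - sig))       # sigmoid gradient: ~0.00005, 0.20, 0.24, ~0.00005
print((x > 0).astype(float))   # ReLU gradient: 0, 0, 1, 1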
Dropout12

• During training, each unit is dropped (set to 0) at random, so feature detectors cannot co-adapt; at test time the full network is used with weights scaled by the keep probability. This is a simple, strong regularizer against overfitting; a sketch follows the citation below.

[Figure 1 from [13]]13
13
N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R.
Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from
Overfitting", Journal of Machine Learning Research (2014)
12
G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R.
Salakhutdinov, "Improving neural networks by preventing co-adaptation
of feature detectors", arXiv (2012)
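A minimal sketch of dropout in its common "inverted" form, which rescales at training time so that no test-time scaling is needed (the paper instead scales weights at test time):

import numpy as np

def dropout(x, ratio=0.5, train=True):
    if not train:
        return x                              # test time: identity
    mask = np.random.rand(*x.shape) >= ratio  # keep each unit with prob 1 - ratio
    return x * mask / (1.0 - ratio)           # rescale to preserve expectations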
(3) Backpropagation through the define-by-run graph

• Each Function implements backward(), which turns gradients of its outputs into gradients of its inputs.
• Each Function holds references to its output Variables, and every Variable records the Function that created it, so the graph can be walked as output → creator → inputs.
• A Function instance is created per call, so simply re-running the forward code rebuilds a fresh graph:
x, W, b = Variable(init_x), Variable(init_W), Variable(init_b)
y = LinearFunction()(x, W, b)  # forward: records the graph
# ...backward, update ...
y = LinearFunction()(x, W, b)  # forward again: a fresh graph is recorded
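A self-contained sketch of this design with NumPy (minimal hypothetical versions of Variable, Function, and LinearFunction; a real framework also handles branching graphs and multiple outputs more carefully):

import numpy as np

class Variable:
    def __init__(self, data):
        self.data = data
        self.grad = None
        self.creator = None              # the Function that produced this Variable

    def backward(self):
        if self.grad is None:
            self.grad = np.ones_like(self.data)
        funcs = [self.creator]
        while funcs:
            f = funcs.pop()
            if f is None:                # reached a leaf Variable
                continue
            gxs = f.backward(*[o.grad for o in f.outputs])
            for x, gx in zip(f.inputs, gxs):
                x.grad = gx if x.grad is None else x.grad + gx
                funcs.append(x.creator)  # walk output -> creator -> inputs

class Function:
    def __call__(self, *inputs):
        self.inputs = inputs
        ys = self.forward(*[x.data for x in inputs])
        self.outputs = [Variable(y) for y in ys]
        for o in self.outputs:
            o.creator = self             # each output remembers its creator
        return self.outputs[0] if len(self.outputs) == 1 else self.outputs

class LinearFunction(Function):
    def forward(self, x, W, b):
        return (x @ W + b,)

    def backward(self, gy):
        x, W, _ = (v.data for v in self.inputs)
        return gy @ W.T, x.T @ gy, gy.sum(axis=0)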
• A convolutional neural network is built from two characteristic layer types:
• convolutional layers, which slide small learned filters across the input, and
• pooling layers, which downsample the resulting feature maps (a minimal Chainer sketch follows the citation below).

[Figure: A. Krizhevsky, 2016]15
15
A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with
Deep Convolutional Neural Networks", NIPS (2012)
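A minimal Chainer sketch of the two layer types (sizes are illustrative):

import chainer.functions as F
import chainer.links as L
import numpy as np

conv = L.Convolution2D(in_channels=3, out_channels=16, ksize=3, stride=1, pad=1)
x = np.random.randn(1, 3, 32, 32).astype(np.float32)  # one RGB image, NCHW
h = F.relu(conv(x))                # convolutional layer + nonlinearity
h = F.max_pooling_2d(h, ksize=2)   # pooling halves the spatial resolution
print(h.shape)                     # (1, 16, 16, 16)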
AlexNet15

• Winner of the 2012 ILSVRC classification challenge.
• Takes 224x224 input images; 5 convolutional layers followed by 3 fully connected layers.
• Uses LRN (local response normalization), ReLU activations, and max pooling.
• AlexNet trained on ImageNet is widely distributed as a pre-trained model.
• Such a pre-trained model is a standard starting point for transfer learning; a hedged Chainer sketch follows the citation below.
15
A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with
Deep Convolutional Neural Networks", NIPS (2012)
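A hedged sketch of the transfer learning recipe in Chainer. Chainer bundles VGG16Layers rather than AlexNet, so it stands in here; the idea (reuse pre-trained features, train only a new head) is the same:

import chainer
import chainer.links as L
import numpy as np

extractor = L.VGG16Layers()    # fetches ImageNet pre-trained weights
head = L.Linear(None, 10)      # new classifier for a 10-class target task

x = np.random.randn(1, 3, 224, 224).astype(np.float32)
with chainer.using_config('train', False):
    feat = extractor(x, layers=['fc7'])['fc7']  # frozen pre-trained features
y = head(feat)                 # only this layer would be trained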
Fully Convolutional Network26

• Starts from a classification network (reusing its pre-training) and replaces the fully connected layers with 1x1 convolutions, so the network outputs a spatial score map per class.
• The coarse score maps are upsampled back to input resolution with Deconvolution layers.
• Skip connections mix semantically strong deep features with low-level, high-resolution ones to sharpen the segmentation (a minimal sketch follows the citation below).
26
Jonathan Long, Evan Shelhamer, et al., "Fully Convolutional Networks for Semantic Segmentation", arXiv, Nov. 14, 2014
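A minimal sketch of the FCN head (channel counts and sizes are illustrative, not the paper's exact configuration):

import chainer.links as L
import numpy as np

n_class = 21
score = L.Convolution2D(512, n_class, ksize=1)           # 1x1 conv -> class scores
upsample = L.Deconvolution2D(n_class, n_class, ksize=64,
                             stride=32, pad=16)          # 32x upsampling

feat = np.random.randn(1, 512, 7, 7).astype(np.float32)  # coarse feature map
h = upsample(score(feat))
print(h.shape)                                           # (1, 21, 224, 224)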
Global Average Pooling (GAP)59

• Averages each feature map over all spatial positions, yielding one value per channel with no extra parameters.
• Even in very deep networks such as ResNet, the effective receptive field is smaller than the theoretical one, so local features can miss global context.
• GAP is a cheap way to add such global context and to replace large fully connected layers (a one-line sketch follows the citation below).
59
"Parsenet: Looking wider to see better", ICLR 2016
SegNet27

• An encoder-decoder convolutional network for semantic segmentation.
• Each max pooling layer in the encoder records the positions of the maxima it selected.
• The decoder unpools using those recorded positions, writing each value back to where its maximum came from and filling all other positions with 0.
27
Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla, "SegNet: A
Deep Convolutional Encoder-Decoder Architecture for Image
Segmentation." PAMI, (2017)
Part Affinity Fields33

• A two-branch CNN (building on the Convolutional Pose Machine): one branch predicts keypoint confidence maps, the other predicts Part Affinity Fields that link keypoints belonging to the same person, enabling realtime multi-person pose estimation.
• Released as OpenPose:
https://github.com/CMU-Perceptual-Computing-Lab/openpose
• Runs at roughly 9 fps on a GeForce GTX 1080.
33
Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh,
"Realtime Multi-Person Pose Estimation using Part Affinity Fields", CVPR
(2017)
Scene Graph Generation by Iterative Message Passing[37]

• Object candidates come from Faster R-CNN's Region Proposal Network (RPN).
• Starting from the RPN proposals, object (node) and relationship (edge) representations are refined by passing messages between them over several iterations.
• The output is a scene graph: the objects in the image and the relationships among them.

[37] D. Xu, Y. Zhu, C. B. Choy, L. Fei-Fei, "Scene Graph Generation by Iterative Message Passing", CVPR (2017)
ChainerCV

• Provides standard evaluation metrics such as mean Intersection over Union (mIoU) and mean Average Precision (mAP) as Chainer Trainer extensions:
# A Trainer extension that computes detection mAP over an iterator
evaluator = chainercv.extensions.DetectionVOCEvaluator(iterator, model)
# Calling the evaluator runs the evaluation loop and returns a dict of
# metrics, e.g., result['main/map']
result = evaluator()
DCGAN37

• A GAN whose Generator and Discriminator are both deep CNNs.
• The Generator maps a latent noise vector to an image through a stack of Deconvolution layers.
• The Discriminator is a CNN trained to tell the Generator's images from real ones.
• Trained adversarially, the Generator learns to produce natural-looking images from noise alone.
37
Alec Radford, Luke Metz, Soumith Chintala, "Unsupervised
Representation Learning with Deep Convolutional Generative Adversarial
Networks", ICLR (2016)
DCGAN37

• Architectural guidelines that make GAN (DCGAN) training stable:
• D downsamples with strided convolutions (stride=2) instead of pooling.
• D replaces fully connected layers with Global Average Pooling.
• D uses Leaky ReLU activations.
• Batch Normalization is applied in both G and D, except for G's output layer and D's input layer (a Generator sketch following these guidelines appears after the citation below).
37
Alec Radford, Luke Metz, Soumith Chintala, "Unsupervised
Representation Learning with Deep Convolutional Generative Adversarial
Networks", ICLR (2016)
Improved Techniques for Training GANs38

• A collection of techniques for stabilizing GAN training:
• Feature matching: instead of directly fooling D, G is trained so that the statistics of D's intermediate features match between fake and real data.
• Minibatch discrimination: to keep G from collapsing every input to the same output (mode collapse), features computed across the whole minibatch are concatenated to each example's features, so D can detect a lack of diversity.
• A sketch of the feature matching loss follows the citation below.
38
T. Salimans, I. Goodfellow, et. al., "Improved Techniques for Training
GANs", NIPS (2016)
Improved Techniques for Training GANs38

• Shows how the Generator can support semi-supervised learning, with the Discriminator acting as a classifier that has an extra "generated" class.
• Scales DCGAN-style training up to ImageNet images.
• Proposes the Inception score, which rates GAN samples with a pre-trained model and has become a standard evaluation metric.
38
T. Salimans, I. Goodfellow, et. al., "Improved Techniques for Training
GANs", NIPS (2016)
Wasserstein GAN (WGAN)39 (1)

• With an optimal Discriminator, the original GAN objective makes the Generator minimize the JS divergence to the data distribution; WGAN replaces it with the Wasserstein (Earth Mover's) distance.
• When the Generator and data distributions have (almost) disjoint supports, the JS divergence saturates and provides no useful gradient, whereas the Wasserstein distance still changes smoothly.
39
Martin Arjovsky, Soumith Chintala, Léon Bottou, "Wasserstein GAN", arXiv:1701.07875 (2017)
Wasserstein GAN (WGAN)39 (2)

• In WGAN the Discriminator (critic) is trained so that its value estimates the Wasserstein distance.
• For that estimate to be valid, the Discriminator (D) must be (approximately) 1-Lipschitz; WGAN enforces this by clipping D's weights into a small interval after every update (a sketch follows the citation below).
39
Martin Arjovsky, Soumith Chintala, Léon Bottou, "Wasserstein GAN", arXiv:1701.07875 (2017)
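A hedged sketch of the critic step in Chainer-style code (critic is assumed to be a Chain returning a scalar score per example):

import chainer.functions as F

def critic_loss(critic, x_real, x_fake):
    # The gap between the two expectations estimates the Wasserstein distance;
    # the critic maximizes it, so we minimize the negative.
    return -(F.average(critic(x_real)) - F.average(critic(x_fake)))

def clip_weights(critic, c=0.01):
    for p in critic.params():
        p.data = p.data.clip(-c, c)  # weight clipping keeps the critic Lipschitz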
WGAN with Gradient Penalty (WGAN-GP)40

• Replaces WGAN's weight clipping with a Gradient Penalty on the Discriminator: the norm of its gradient with respect to the input is pushed toward 1.
• Computing the penalty needs backpropagation through gradients (double backprop), which Chainer supports from v3 (a sketch follows the citation below).
40
Gulrajani, Ishaan, et al., "Improved Training of Wasserstein GANs", arXiv preprint arXiv:1704.00028 (2017)
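A hedged sketch of the penalty term using chainer.grad with enable_double_backprop=True (available from Chainer v3):

import numpy as np
import chainer
import chainer.functions as F

def gradient_penalty(critic, x_real, x_fake, lam=10.0):
    eps = np.random.rand(len(x_real), 1, 1, 1).astype(np.float32)
    x_hat = chainer.Variable(eps * x_real + (1 - eps) * x_fake)  # random interpolates
    g, = chainer.grad([F.sum(critic(x_hat))], [x_hat],
                      enable_double_backprop=True)
    norm = F.sqrt(F.sum(g * g, axis=(1, 2, 3)))
    return lam * F.average((norm - 1.0) ** 2)  # push gradient norms toward 1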
Temporal Generative Adversarial Nets (TGAN)41

• A WGAN-based model for video generation.
• Generation is split in two: a Video Generator turns one latent vector into a sequence of per-frame latent vectors, and an Image Generator renders each of those into a frame.
41
Masaki Saito, Eiichi Matsumoto, Shunta Saito, "Temporal Generative
Adversarial Nets with Singular Value Clipping", ICCV (2017)
Temporal Generative Adversarial Nets (TGAN)41

• Ordinary GAN/WGAN training is unstable on video; instead of weight clipping, the discriminator is kept 1-Lipschitz by singular value clipping, which clips each weight matrix's singular values to at most 1 (a NumPy sketch follows the citation below).
• Sample quality is evaluated with the Inception score.
41
Masaki Saito, Eiichi Matsumoto, Shunta Saito, "Temporal Generative
Adversarial Nets with Singular Value Clipping", ICCV (2017)
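A minimal NumPy sketch of singular value clipping for one weight matrix:

import numpy as np

def clip_singular_values(W, s_max=1.0):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.minimum(s, s_max)) @ Vt  # rebuild W with clipped spectrum

W = np.random.randn(64, 128)
W = clip_singular_values(W)
print(np.linalg.svd(W, compute_uv=False).max())  # <= 1.0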
SimGAN42

• Refines CG (simulated) images with a Refiner network so they look realistic while keeping their annotations usable.
• The Refiner is trained against a Discriminator with an adversarial loss plus a self-regularization loss that keeps each refined image close to its synthetic input (a sketch follows the citation below).
• Work by Apple, Inc., presented at CVPR 2017 and also described as "Improving the Realism of Synthetic Images".
42
A. Shrivastava, et. al. "Learning from Simulated and Unsupervised
Images through Adversarial Training", CVPR (2017)
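A minimal sketch of the Refiner's two-term loss (the non-saturating adversarial form and the L1 self-regularization weight lam are illustrative choices):

import chainer.functions as F

def refiner_loss(discriminator, x_syn, x_refined, lam=0.1):
    p_fake = discriminator(x_refined)               # D's logit for "real"
    adv = F.average(F.softplus(-p_fake))            # fool the Discriminator
    reg = F.average(F.absolute(x_refined - x_syn))  # stay close to the input
    return adv + lam * reg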
Standard language modeling datasets

1. Penn Treebank (ptb): about 900k training words with a 10k-word vocabulary; the Chainer examples include a ptb example:
https://github.com/chainer/chainer/tree/master/examples/ptb
2. One Billion Word: about 0.8 billion words with an 800k-word vocabulary.
3. Hutter Prize dataset: Wikipedia text split into 90MB/5MB/5MB for train/val/test.
Attention is all you need65

• Drops RNNs/CNNs entirely and builds the Transformer from Attention alone, achieving SOTA on machine translation.
• See also the Google Research blog post "Transformer: A Novel Neural Network Architecture for Language Understanding" (a NumPy sketch of scaled dot-product attention follows the citation below).
65
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion
Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, "Attention Is All
You Need", NIPS (2017)
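A minimal NumPy sketch of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # attention weights
    return w @ V                                  # weighted sum of values

Q = np.random.randn(5, 64)                        # 5 queries of width 64
K = np.random.randn(7, 64); V = np.random.randn(7, 64)
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 64)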
DDPG46

• Deep Deterministic Policy Gradient (DDPG) is an Actor-Critic method for continuous control.
• Like Deep Q-Network, it learns End-to-End from raw observations, using an experience replay buffer and slowly updated target networks for stability.
• Demo: Deep Reinforcement Learning with DDPG (a sketch of the soft target-network update follows the citation below).
46
Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement
learning." arXiv preprint arXiv:1509.02971 (2015)