
The Deep Learning Framework Chainer and Recent Technical Trends

Researcher at Preferred Networks, Inc.
Sep. 19, 2017

  1. Deep Learning Chainer 34 2017 Researcher @ Preferred Networks 1
  2. Agenda • • • • • • • Chainer • numpy Chainer • • • GAN • • 2
  3. • • • 2010 • 2012 • 2013 - 2014 UC Berkeley (Visiting Student Researcher) • 2015 • 2016 @ "Semantic Segmentation for Aerial Imagery with Convolutional Neural Network" • 2016.9 Facebook, Inc. • 2016.9 Preferred Networks, Inc. 3
  4. • Preferred Networks • PFN • 106 8 • • FANUC Toyota NTT • We are hiring! 4
  5. [Diagram: PFN application domains] Automotive, Humanoid Robot, Consumer, Industrial, Cloud, Device, Photo, Game, Text, Speech, Infrastructure, Factory Robot, Automotive, Healthcare, Smart City, Industry4.0, Industrial IoT 5
  6. • • Chainer Define-by-Run • Chainer • GAN • ChainerCV ChainerRL 6
  7. 7
  8. 8
  9. • • fully-connected layer 9
  10. [Plot: sigmoid, tanh, and ReLU activation curves] y = 1/(1 + exp(-x)) (sigmoid), y = tanh(x), y = max(0, x) (ReLU) • ReLU • activation function Maxout1 , LReLU2 , PReLU3 , ELU4 , SELU5 , etc. 10
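A minimal NumPy sketch of the three activation functions above (function names are mine, for illustration):

import numpy as np

def sigmoid(x):
    # y = 1 / (1 + exp(-x)): squashes inputs into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # y = tanh(x): squashes inputs into (-1, 1)
    return np.tanh(x)

def relu(x):
    # y = max(0, x): the rectified linear unit
    return np.maximum(0.0, x)

x = np.linspace(-4, 4, 9)
print(sigmoid(x))
print(tanh(x))
print(relu(x))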
  11. "core idea" The core idea in deep learning is that we assume that the data was generated by the composition of factors or features, potentially at multiple levels in a hierarchy. — Ian Goodfellow, Yoshua Bengio, Aaron Courville, "Deep Learning" 11
  12. representation 12
  13. h^{(1)} = a^{(1)}(W^{(1)} x + b^{(1)}), h^{(l)} = a^{(l)}(W^{(l)} h^{(l-1)} + b^{(l)}), o = a^{(o)}(W^{(o)} h^{(L)} + b^{(o)}) • • • hidden layer 1 1 1 Kurt Hornik, Maxwell Stinchcombe, Halbert White, "Multilayer feedforward networks are universal approximators", Neural Networks (1989) 13
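As a concrete reading of these equations, here is a minimal forward pass for one hidden layer, with tanh as the hidden activation and an identity output (shapes and activation choices are illustrative assumptions, not from the slides):

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(784)                                   # input vector

W1, b1 = 0.01 * rng.randn(100, 784), np.zeros(100)   # hidden layer parameters
Wo, bo = 0.01 * rng.randn(10, 100), np.zeros(10)     # output layer parameters

h1 = np.tanh(W1.dot(x) + b1)   # h^(1) = a^(1)(W^(1) x + b^(1))
o = Wo.dot(h1) + bo            # o = a^(o)(W^(o) h^(L) + b^(o)), identity a^(o)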
  14. “A visual proof that neural nets can compute any function” http://neuralnetworksanddeeplearning.com/chap4.html (1) • • 1 14
  15. “A visual proof that neural nets can compute any function” http://neuralnetworksanddeeplearning.com/chap4.html (2) • 2 • 15
  16. “A visual proof that neural nets can compute any function” http://neuralnetworksanddeeplearning.com/chap4.html (3) • • • 16
  17. “A visual proof that neural nets can compute any function” http://neuralnetworksanddeeplearning.com/chap4.html (4) • • 17
  18. “A visual proof that neural nets can compute any function” http://neuralnetworksanddeeplearning.com/chap4.html (5) • • • 4 18
  19. (6) • 6 • 2 6 “A visual proof that neural nets can compute any function” http://neuralnetworksanddeeplearning.com/chap4.html 19
  20. (7) • • 7 2 7 “A visual proof that neural nets can compute any function” http://neuralnetworksanddeeplearning.com/chap4.html 20
  21. so many nodes Replaceable? • 1 • 21
  22. • • … 8 • piecewise linear function 9 9 Razvan Pascanu, Guido Montufar, and Yoshua Bengio, "On the number of response regions of deep feed forward networks with piece-wise linear activations", NIPS (2014) 8 Merrick Furst, James B. Saxe, and Michael Sipser, "Parity, Circuits, and the Polynomial-Time Hierarchy", Mathematical systems theory (1984) 22
  23. • training data • • 23
  24. • • loss function • 24
  25. • • softmax • one-hot • cross entropy 25
  26. • gradient method gradient descent • • learning rate • 26
  27. backpropagation • • • 27
  28. • minibatch • An overview of gradient descent optimization algorithms 28
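A sketch of one gradient-descent step, assuming params and grads are dicts of NumPy arrays with matching keys (names are mine):

def sgd_step(params, grads, lr=0.01):
    # theta <- theta - lr * dL/dtheta, for every parameter
    for name in params:
        params[name] -= lr * grads[name]

In minibatch SGD, grads holds the gradient of the loss averaged over one minibatch, and this step is repeated for each minibatch.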
  29. [Figure: SGD without momentum vs. SGD with momentum] Stochastic gradient descent (SGD) Momentum SGD15 15 A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS (2012) 29
  30. Nesterov accelerated gradient (NAG)16 Momentum SGD RNN AdaGrad17 AdaDelta18 AdaGrad RMSProp19 , Adam20 , Eve21 … 30
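Momentum SGD adds a velocity term to the plain update; a sketch under the same dict-of-arrays assumption as above:

def momentum_sgd_step(params, grads, velocity, lr=0.01, mu=0.9):
    # v <- mu * v - lr * grad;  theta <- theta + v
    for name in params:
        velocity[name] = mu * velocity[name] - lr * grads[name]
        params[name] += velocity[name]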
  31. : sigmoid cross entropy • • 31
  32. : sigmoid cross entropy • exploding gradients … • vanishing gradient 32
  33. 33
  34. Greedy layer-wise pre-training10 • Yoshua Bengio ICML 2009 10 Y. Bengio, P. Lamblin, D. Popovici, H. Larochelle, "Greedy layer-wise training of deep networks", NIPS (2006) 34
  35. ReLU rectified linear unit 11 • Sigmoid 0 11 Xavier Glorot, Antoine Bordes, and Yoshua Bengio, "Deep Sparse Rectifier Neural Networks", NIPS Workshop (2010) 35
  36. Dropout12 • [13] Figure 1 13 13 N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", Journal of Machine Learning Research (2014) 12 G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors", On arxiv (2012) 36
  37. [14] Residual learning 14 • • 14 K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition", CVPR (2016) 37
  38. 38
  39. • 1. forward Dropout 2. backward 3. autograd 2. 4. optimizer 3. 39
  40. Chainer Preferred Networks Python https://github.com/chainer/chainer 40
  41. Popularity Growth of Chainer 41
  42. [Table: Define-and-Run vs. Define-by-Run] Chainer • • Define-and-Run • Define-by-Run • Define-and-Run 42
  43. Define-and-Run Define-by-Run 43
  44. Define-by-Run • • Python • Caffe prototxt • recurrent neural network; RNN for BPTT backpropagation through time 44
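A minimal sketch of the Define-by-Run style in Chainer: the graph is recorded while ordinary Python code runs, so the network can change shape per input (handy for RNNs and BPTT; the toy computation here is mine):

import numpy as np
import chainer.functions as F
from chainer import Variable

x = Variable(np.random.randn(1, 10).astype(np.float32))
h = x
# ordinary Python control flow builds the graph as it executes,
# so the depth can differ from run to run
for _ in range(np.random.randint(1, 4)):
    h = F.tanh(h)
loss = F.sum(h)
loss.backward()   # backprop through whatever graph was just built
print(x.grad)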
  45. numpy Chainer 45
  46. (1) [Diagram: a computational graph of Variables and Functions] Variable Function • 2 Variable Function • Variable Function directed acyclic graph; DAG • Variable Function rank 46
  47. (2) [Diagram: graph annotated with rank=0, rank=1, rank=2; each Function keeps its input Variables in inputs, and each Variable keeps the Function that produced it in creator] • backward • Function Variable inputs • Function Variable Function creator • Function Variable rank Function Variable Function 1 rank 47
  48. Chainer Chainer 48
  49. (1) [Diagram: backward pass; each Function's .backward(inputs, grad_outputs) fills grad on its input Variables, following .creator links, with the output grad initialized to 1] • Define-by-Run backward Variable • Variable backward() Function backward() • inputs • grad_outputs 49
  50. (2) • Variable Chainer • • grad 50
  51. (3) • Function backward() • Function outputs outputs→creator→outputs • Function Function
x, W, b = Variable(init_x), Variable(init_W), Variable(init_b)
y = LinearFunction()(x, W, b)  # forward
# ... backward, update ...
y = LinearFunction()(x, W, b)  # forward again
51
  52. Link • W, b Function • Link • params() 52
  53. Chain • • • Link params() Link 53
  54. Optimizer • Optimizer setup() Chain Link update() • Optimizer • Chainer state 54
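How Link, Chain, and Optimizer fit together, as a minimal Chainer sketch (layer sizes and the choice of SGD are illustrative assumptions):

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self):
        super(MLP, self).__init__(
            l1=L.Linear(784, 100),   # each Link owns its W and b parameters
            l2=L.Linear(100, 10))

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))

model = MLP()
optimizer = chainer.optimizers.SGD(lr=0.01)
optimizer.setup(model)               # register the Chain's parameters

x = np.random.randn(32, 784).astype(np.float32)
t = np.random.randint(0, 10, size=32).astype(np.int32)
loss = F.softmax_cross_entropy(model(x), t)
model.cleargrads()
loss.backward()
optimizer.update()                   # one update over all registered parameters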
  55. • Linear ReLUFunction • ReLU ReLUFunction • ReLU ( 55
  56. • 2 mean squared error; MSE • MSE ReLU 56
  57. • : MNIST • 100 1 forward • SGD Optimizer setup() 57
  58. • MNIST scikit-learn
import numpy
from sklearn.datasets import fetch_mldata

mnist = fetch_mldata('MNIST original', data_home='./')
x, t = mnist['data'] / 255, mnist['target']
t = numpy.array([t == i for i in range(10)]).T
train_x, train_t = x[:60000], t[:60000]
val_x, val_t = x[60000:], t[60000:]
• 1 • 150 58
  59. • 94% • http://bit.ly/mini_chainer_mnist • numpy Define-by-Run 59
  60. Chainer 60
  61. Trainer (1) Optimizer • Trainer • Chainer Chain • chainer.optimizers SGD, MomentumSGD, NesterovAG, RMSprop, RMSpropGraves, AdaGrad, AdaDelta, Adam, etc... 61
  62. Trainer (2) • Chainer • MNIST • Validation len(val) 62
  63. Trainer (3) • • Chainer chainer.functions chainer.links • softmax cross entropy 63
  64. Trainer • Trainer Optimizer • • Trainer Extension import import 64
  65. Trainer (1) • Extension • snapshot • LogReport, PrintReport • validation Evaluator • PlotReport • Graphviz dot dump_graph • ParameterStatistics • • Trainer extensions (https://docs.chainer.org/en/stable/reference/extensions.html) 65
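Putting the Trainer pieces together, a sketch of the whole pipeline on MNIST (reusing the MLP Chain from the earlier sketch; batch size and epoch count are arbitrary):

import chainer
import chainer.links as L
from chainer import training
from chainer.training import extensions

train, val = chainer.datasets.get_mnist()
model = L.Classifier(MLP())              # adds softmax cross entropy loss and accuracy
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

train_iter = chainer.iterators.SerialIterator(train, batch_size=100)
val_iter = chainer.iterators.SerialIterator(val, batch_size=100,
                                            repeat=False, shuffle=False)

updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (20, 'epoch'), out='result')
trainer.extend(extensions.Evaluator(val_iter, model))
trainer.extend(extensions.LogReport())
trainer.extend(extensions.PrintReport(
    ['epoch', 'main/loss', 'validation/main/loss']))
trainer.run()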
  66. Trainer (2) • • • Trainer • GPU ParallelUpdater, MultiprocessParallelUpdater MultiprocessIterator 66
  67. GPU • • Chainer CuPy GPGPU • CuPy NCCL NCCL2 GPU • cuDNN NVIDIA v7 CUDA v9 • fp16 67
  68. CuPy • CuPy NumPy NVIDIA CUDA GPU • NumPy API NumPy GPU • GPU • KMeans, Gaussian Mixture Model Example CuPy: https://github.com/cupy/cupy 68
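Because CuPy mirrors the NumPy API, array code often ports to the GPU by swapping the module; a small sketch (requires an NVIDIA GPU with CUDA):

import numpy as np
import cupy as cp

x_cpu = np.arange(6, dtype=np.float32).reshape(2, 3)
x_gpu = cp.asarray(x_cpu)           # host -> device copy

y_gpu = cp.tanh(x_gpu).sum(axis=1)  # same API as NumPy, runs on the GPU

y_cpu = cp.asnumpy(y_gpu)           # device -> host copy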
  69. 69
  70. • • • Semantic Segmentation • Instance-aware Segmentation 70
  71. horse : 94% dog : 3% pig : 2% cat : 1% . . . 71
  72. R G B 72
  73. • • convolutional layer • pooling [A. Krizhevsky, 2012]15 15 A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS (2012) 73
  74. • convolutional neural network; CNN • • 1 74
  75. • • • stride • padding CS231n Convolutional Neural Networks for Visual Recognition 75
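The output spatial size follows from input size W, filter size F, padding P, and stride S via the standard formula (W - F + 2P)/S + 1; a tiny helper to check a layer configuration (names are mine):

def conv_output_size(input_size, filter_size, padding, stride):
    # out = (W - F + 2P) / S + 1
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(7, 3, 0, 1))    # 7x7 input, 3x3 filter, stride 1 -> 5
print(conv_output_size(224, 3, 1, 1))  # "same" padding keeps 224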
  76. • • receptive field • CS231n Convolutional Neural Networks for Visual Recognition 76
  77. 77
  78. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [Chart: ILSVRC Object Classification Top-5 Error by year: 28.191, 25.77, 15.315, 11.743, 6.656, 5.1, 3.567, 2.991, 2.251] 2012 Toronto Geoffrey Hinton 2 10% • 1000 1 128 1000 • 2010 • 2011 localization • 2012 Fine-grained • 2013 bounding box • 2015 • 2016 78
  79. AlexNet15 • 2012 ILSVRC • 224x224 5 3 • LRN (local response normalization) ReLU max pooling • AlexNet ImageNet pre-trained model • AlexNet pre-trained model transfer learning 15 A. Krizhevsky, I. Sutskever, G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", NIPS (2012) 79
  80. GoogLeNet16 • 2014 ILSVRC Inception ResNet • Inception 22 • 1x1, 3x3, 5x5 concat • 1x1 16 C. Szegedy, et al., "Going Deeper with Convolutions", CVPR (2015) 80
  81. VGG17 • 2014 GoogLeNet ILSVRC 2 • 3x3 1 receptive field 3x3 2 5x5, 3 7x7 receptive field • ResNet pre-trained 17 K. Simonyan, A. Zisserman "Very Deep Convolutional Networks for Large-Scale Image Recognition" arXiv technical report, (2014) 81
  82. ResNet18 • ILSVRC 2015 GoogLeNet 22 152 CIFAR 1202 • • 18 Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, "Deep Residual Learning for Image Recognition." arXiv:1512.03385 (2015) 82
  83. Wide-ResNet19 • ResNet " " ResNet • Residual block Dropout ratio 30~40% 19 Sergey Zagoruyko, Nikos Komodakis, "Wide Residual Networks", arXiv: 1605.07146 (2016) 83
  84. DenseNet20 • CVPR 2017 Residual connection 2 ResBlock • • 20 Gao Huang, Zhuang Liu, Kilian Q. Weinberger, Laurens van der Maaten, "Densely Connected Convolutional Networks", CVPR (2017) 84
  85. ResNeXt21 • 2016 ILSVRC 2 • • cardinality • cardinality 21 Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He, "Aggregated Residual Transformations for Deep Neural Networks", CVPR (2017) 85
  86. Squeeze-and-Excitation22 • 2017 ILSVRC • Inception • • squeeze excitation 22 Jie Hu, Li Shen, Gang Sun, "Squeeze-and-Excitation Networks", arXiv pre-print: 1709.01507 (2017) 86
  87. - 2017 ILSVRC (Kaggle ImageNet Object Localization Challenge) - - MSCOCO Dataset - Places Challenge (ADE20K) - YouTube-8M Video Understanding Challenge - Cityscapes Dataset - Mapillary Vistas Dataset - VisualGenome 87
  88. 88
  89. Fast R-CNN Faster R-CNN Faster/Fast R-CNN/R-CNN23 • R-CNN: region proposals selective search CNN resize CNN SVM • Fast R-CNN: RoI Pooling bounding box bbox regression • Faster R-CNN: Region proposal network (RPN) CNN RPN / bbox 23 Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." NIPS (2015) 89
  90. You Only Look Once (YOLO)25 • bounding box • bounding box 1 end-to-end • FCN YOLO9000 CVPR2017 25 Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640 (2015) 90
  91. Single Shot Multibox Detector (SSD)24 • YOLO Faster R-CNN • • • End-to-end 24 Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." ECCV (2016) 91
  92. Semantic Segmentation 92
  93. Semantic Segmentation • • " " • Instance-aware 93
  94. (1) Dilated convolution51 • • 51 "Multi-Scale Context Aggregation by Dilated Convolutions", ICLR 2016 94
  95. (2) Multi-scale feature ensemble • • 52 52 "Hypercolumns for Object Segmentation and Fine-grained Localization", CVPR 2015 95
  96. (3) Conditional random field (CRF) • CNN refine (DeepLab53 ) • DeepLab refine End-to-End (DPN54 , CRF as RNN55 , Detections and Superpixels56 ) 56 "Higher order conditional random fields in deep neural networks", ECCV 2016 55 "Conditional random fields as recurrent neural networks", ICCV 2015 54 "Semantic image segmentation via deep parsing network", ICCV 2015 53 "Semantic image segmentation with deep convolutional nets and fully connected crfs", ICLR 2015 96
  97. (4) Global average pooling (GAP) • • ParseNet57 Global average pooling FCN 57 "Parsenet: Looking wider to see better", ICLR 2016 97
  98. (1) Mismatched relationship • • • • FCN 98
  99. (2) Confusing Classes • • ADE20K 17.6% 58 • FCN • 58 "Semantic understanding of scenes through the ADE20K dataset", CVPR 2017 99
  100. (3) Inconspicuous Classes • • FCN • sub-region 100
  101. • • • 101
  102. Fully Convolutional Network26 • Classification pre-training 1x1 • Deconvolution • semantic low level skip connection 26 Jonathan Long and Evan Shelhamer et al., "Fully Convolutional Networks for Semantic Segmentation", appeared in arxiv on Nov. 14, 2014 102
  103. Deconvolution • Deconvolution transposed convolution backward convolution • Convolution 1. stride 2. -1 3. padding 4. Convolution • stride Convolution arithmetic https://github.com/vdumoulin/conv_arithmetic 103
  104. Global Average Pooling (GAP)59 • • ResNet receptive field • GAP • 59 "Parsenet: Looking wider to see better", ICLR 2016 104
  105. SegNet27 • • Max pooling • 0 27 Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." PAMI, (2017) 105
  106. U-Net28 • • Max pooling Deconvolution • concat • "U" U- Net 28 “U-Net: Convolutional Networks for Biomedical Image Segmentation”, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 18 May 2015 106
  107. PSPNet60 Pyramid Pooling Module 60 Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia, "Pyramid Scene Parsing Network", CVPR (2017) 107
  108. Pose Estimation 108
  109. Pose Affinity Field33 • CNN Convolutional Pose Machine Part Affinity Field CNN • OpenPose https://github.com/CMU-Perceptual-Computing-Lab/openpose • Geforce GTX 1080 9fps 33 Zhe Cao and Tomas Simon and Shih-En Wei and Yaser Sheikh, "Realtime Multi-Person Pose Estimation using Part Affinity Fields", CVPR (2017) 109
  110. • → Faster R-CNN • → 110
  111. repeatability / reproducibility ACM: Repeatability (same team, same experimental setup), Replicability (different team, same experimental setup), Reproducibility (different team, different experimental setup) http://www.acm.org/publications/policies/artifact-review-badging 111
  112. ChainerCV https://github.com/chainer/chainercv ChainerCV • • • • pre-trained 112
  113. ChainerCV 113
  114. - Faster R-CNN (VGG16-based) - SSD300, SSD512 : - SegNet - PSPNet (coming soon) - PASCAL VOC (bounding box, segmentation) - Stanford Online Products (classification) - CamVid (segmentation) - Caltech-UCSD Birds-200 (classification, key-points) - Cityscapes (segmentation) 114
  115. : pretrained-model ChainerCV pretrained_model
model = FasterRCNNVGG16()  # no pretrained weights
# load weights pretrained on PASCAL VOC2007
model = FasterRCNNVGG16(pretrained_model='voc07')
model = SSD300(pretrained_model='voc0712')
model = SegNet(pretrained_model='camvid')
115
  116. predict() ChainerCV predict()
# detection models
bboxes, labels, scores = model.predict(imgs)
# semantic segmentation models
labels = model.predict(imgs)
116
  117. predict() 1. 2. forward 3. non-maximum suppression 117
  118. [37] D. Xu, Y. Zhu, C. B. Choy, L. Fei-Fei, “Scene Graph Generation by Iterative Message Passing”, CVPR (2017) • Faster R-CNN Region Proposal network (RPN) • RPN … • 118
  119. ChainerCV • Chainer • public •
from chainercv.datasets import VOCDetectionDataset

dataset = VOCDetectionDataset(split='trainval', year='2007')
# the 34th example of the "trainval" split
img, bbox, label = dataset[34]
119
  120. ChainerCV transforms • data augmentation • ChainerCV data augmentation • • center_crop, pca_lighting, random_crop, random_expand, random_flip, random_rotate, ten_crop, etc... 120
  121. ChainerCV transforms TransformDataset Chainer
from chainer.datasets import TransformDataset
from chainercv import transforms

def transform(in_data):
    img, bbox, label = in_data
    # flip the image at random, and flip the bounding box to match
    img, param = transforms.random_flip(img, x_flip=True, return_param=True)
    bbox = transforms.flip_bbox(bbox, x_flip=param['x_flip'])
    return img, bbox, label

dataset = TransformDataset(dataset, transform)
121
  122. bounding box ChainerCV • • • bounding box • matplotlib 122
  123. ChainerCV mean Intersection over Union (mIoU) mean Average Precision (mAP) Chainer Trainer Extension
# a Trainer Extension that computes mAP
evaluator = chainercv.extensions.DetectionVOCEvaluator(iterator, model)
# it can also be called directly; the result is a dict,
# e.g., result['main/map']
result = evaluator()
123
  124. • https://github.com/chainer/chainercv Faster R-CNN SegNet 124
  125. GAN 125
  126. • • • RBM restricted boltzmann machine 34 Variational Auto-Encoder (VAE)35 Generative Adversarial Nets (GAN) 35 Diederik P Kingma and Max Welling, "Auto-Encoding Variational Bayes", ICLR (2014) 34 Smolensky, Paul, "Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory", Parallel Distributed Processing: Explorations in the Microstructure of Cognition (1986) 126
  127. [Diagram: Generator and Discriminator; D judges whether a sample is real ("from dataset?") or generated] Generative Adversarial Nets36 • unsupervised learning • G D • D G • G D 36 I. J. Goodfellow, et al., "Generative Adversarial Nets", NIPS (2014) 127
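The objective from [36] is the minimax game min_G max_D E_x[log D(x)] + E_z[log(1 - D(G(z)))]. A sketch of the two resulting losses in Chainer, with toy linear networks standing in for G and D (all sizes and data here are illustrative):

import numpy as np
import chainer.functions as F
import chainer.links as L

gen = L.Linear(16, 2)    # toy generator: noise z -> sample
dis = L.Linear(2, 1)     # toy discriminator: sample -> logit

z = np.random.randn(64, 16).astype(np.float32)
x_real = np.random.randn(64, 2).astype(np.float32)   # stand-in for real data
ones = np.ones((64, 1), dtype=np.int32)
zeros = np.zeros((64, 1), dtype=np.int32)

# D is trained to label real data 1 and generated data 0
loss_dis = (F.sigmoid_cross_entropy(dis(x_real), ones) +
            F.sigmoid_cross_entropy(dis(gen(z)), zeros))

# G is trained to make D label its samples 1
loss_gen = F.sigmoid_cross_entropy(dis(gen(z)), ones)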
  128. GAN • • Discriminator • Generator Discriminator 128
  129. GAN • Generator • Generator 129
  130. DCGAN37 • GAN • Generator 1 Deconvolution • Discriminator Generator • • 37 Alec Radford, Luke Metz, Soumith Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR (2016) 130
  131. DCGAN37 • GAN DCGAN • D stride=2 • D Global Average Pooling • D Leaky ReLU • G D Batch Normalization G D 37 Alec Radford, Luke Metz, Soumith Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR (2016) 131
  132. Improved Techniques for Training GANs38 • GAN • Feature matching: D fake real • Minibatch discrimination: G D 1 mode collapse D concat • 38 T. Salimans, I. Goodfellow, et al., "Improved Techniques for Training GANs", NIPS (2016) 132
  133. Improved Techniques for Training GANs38 • Generator Semi-supervised learning • ImageNet DCGAN • Inception score GAN pre-trained model 38 T. Salimans, I. Goodfellow, et al., "Improved Techniques for Training GANs", NIPS (2016) 133
  134. Wasserstein GAN (WGAN)39 (1) • GAN Generator Wasserstein Earth Mover's Distance WGAN • Generator 2 Wasserstein 39 Martin Arjovsky, Soumith Chintala, Léon Bottou, "Wasserstein GAN", arXiv:1701.07875 (2017) 134
  135. Wasserstein GAN (WGAN)39 (2) • WGAN Discriminator Wasserstein • Discriminator(D) D Wasserstein • 39 Martin Arjovsky, Soumith Chintala, Léon Bottou, "Wasserstein GAN", arXiv:1701.07875 (2017) 135
  136. Wasserstein GAN (WGAN)39 (3) • WGAN Discriminator Wasserstein Wasserstein • Generator Wasserstein • Generator Wasserstein 39 Martin Arjovsky, Soumith Chintala, Léon Bottou, "Wasserstein GAN", arXiv:1701.07875 (2017) 136
  137. Wasserstein GAN (WGAN)39 (4) 1. Discriminator 2. 0.01 3. Generator 4. 39 Martin Arjovsky, Soumith Chintala, Léon Bottou, "Wasserstein GAN", arXiv:1701.07875 (2017) 137
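A sketch of that training loop, with the clipping constant 0.01 from step 2 and five critic updates per generator update (toy linear networks; RMSprop and the batch size are assumptions consistent with [39]):

import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

gen = L.Linear(16, 2)     # toy generator
critic = L.Linear(2, 1)   # toy critic: outputs a score, not a probability
opt_g = chainer.optimizers.RMSprop(lr=5e-5)
opt_g.setup(gen)
opt_c = chainer.optimizers.RMSprop(lr=5e-5)
opt_c.setup(critic)

def sample_real(n=64):
    return np.random.randn(n, 2).astype(np.float32)   # stand-in for real data

for step in range(1000):
    # 1-2. update the critic several times, clipping weights into [-0.01, 0.01]
    for _ in range(5):
        z = np.random.randn(64, 16).astype(np.float32)
        # maximize E[f(x)] - E[f(G(z))] by minimizing its negative
        loss_c = -(F.sum(critic(sample_real())) / 64 -
                   F.sum(critic(gen(z))) / 64)
        critic.cleargrads()
        loss_c.backward()
        opt_c.update()
        for p in critic.params():
            np.clip(p.data, -0.01, 0.01, out=p.data)
    # 3-4. update the generator to raise the critic's score on its samples
    z = np.random.randn(64, 16).astype(np.float32)
    loss_g = -F.sum(critic(gen(z))) / 64
    gen.cleargrads()
    loss_g.backward()
    opt_g.update()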
  138. WGAN with Gradient Penalty (WGAN-GP) 40 • WGAN Discriminator Gradient Penalty • Chainer v3 40 Gulrajani, Ishaan, et al. "Improved training of wasserstein gans." arXiv preprint arXiv:1704.00028 (2017). 138
  139. Temporal Generative Adversarial Nets (TGAN)41 • WGAN • Video Generator Image Generator 41 Masaki Saito, Eiichi Matsumoto, Shunta Saito, "Temporal Generative Adversarial Nets with Singular Value Clipping", ICCV (2017) 139
  140. Temporal Generative Adversarial Nets (TGAN)41 • GAN WGAN 1 singular value clipping • Inception score 41 Masaki Saito, Eiichi Matsumoto, Shunta Saito, "Temporal Generative Adversarial Nets with Singular Value Clipping", ICCV (2017) 140
  141. SimGAN42 • CG Refiner • Refiner Discriminator adversarial loss self-regularization • Apple, inc. CVPR 2017 "Improving the Realism of Synthetic Images" 42 A. Shrivastava, et. al. "Learning from Simulated and Unsupervised Images through Adversarial Training", CVPR (2017) 141
  142. Chainer-GAN-lib GAN Chainer Chainer Trainer GAN https://github.com/pfnet-research/chainer-gan-lib 142
  143. Adversarial examples70 • " " • NIPS 2017: Non-targeted Adversarial Attack Google Brain "Non-targeted" 70 Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy, "Explaining and Harnessing Adversarial Examples." ICLR (2015) 143
  144. 144
  145. NLP • • part-of-speech tagging • word segmentation • word sense disambiguation • named entity extraction • syntactic parsing • predicate-argument recognition 145
  146. NLP • • • 146
  147. 147
  148. RNN • RNN (recurrent neural networks) • RNN 148
  149. http://qiita.com/t_Signull/items/21b82be280b46f467d1b LSTM • • 1 3 • : • 1 • • 149
  150. 2.11 Gated recurrent unit (GRU) • LSTM • reset 1 • (update) 1 • GRU LSTM 150
  151. RNN one-hot RNN RNN 151
  152. s.t. perplexity; PPL 2 PPL PPL 152
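For reference, the standard base-2 definition of perplexity over N tokens (lower is better):

\mathrm{PPL} = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 p(w_i \mid w_1, \ldots, w_{i-1})}

i.e., 2 raised to the average per-token cross entropy of the language model.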
  153. 1. Penn Treebank ptb 90 1 Chainer example ptb example https://github.com/chainer/chainer/tree/master/examples/ptb 2. One Billion Word 8 80 3. Hutter 90MB/5MB/5MB train/val/test 153
  154. sequence-to-sequence • • RNN • • • 154
  155. 155
  156. greedy algorithm 156
  157. 157
  158. attention mechanism LSTM 158
  159. Attention is all you need65 RNN/CNN Attention Transformer SOTA Transformer: A Novel Neural Network Architecture for Language Understanding 65 Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, "Attention Is All You Need", (2017) 159
  160. 160
  161. policy 1. 2. 3. 4. 161
  162. • → return • discounted total reward • state value 162
  163. state value function optimal policy 163
  164. action value function 164
  165. greedy / ε-greedy: with probability ε take a random action, otherwise take the greedy action 165
  166. Q Q-learning Q Q 166
  167. Q (1) Q Q 167
  168. Q (2) Q Q →MSE SGD 168
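The tabular Q-learning update that DQN later approximates with a neural network, as a minimal sketch (state/action counts and hyperparameters are arbitrary):

import numpy as np

n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1

def epsilon_greedy(s):
    if np.random.rand() < eps:
        return np.random.randint(n_actions)   # explore
    return int(np.argmax(Q[s]))               # exploit (greedy action)

def q_update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])

Replacing the table Q with a network and minimizing the squared TD error by SGD gives the DQN setup on the following slides.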
  169. Deep Q-Network (DQN)43 • Q • Q NN DQN • Experience replay • Target Q-Network • clipping DQN SPACE INVADERS (uploaded by DeepMind) 43 Mnih, Volodymyr, et al. "Playing Atari with Deep Reinforcement Learning", NIPS (2013) 169
  170. DQN Experience replay - replay memory - Q Target Q-Network - Q - Q θ Q clipping - clip 170
  171. • Q • • 171
  172. • 172
  173. • 45 • 45 Pierre Andry, et al., "Learning invariant sensorimotor behaviors: A developmental approach to imitation mechanisms." Adaptive behavior (2004) 173
  174. Actor-Critic • Actor Critic • • 174
  175. Asynchronous Advantage Actor-Critic (A3C)44 • • Actor-Critic • Experience replay RNN • 44 V. Mnih, et al., "Asynchronous Methods for Deep Reinforcement Learning", ICML (2016) 175
  176. DDPG46 • Deep Deterministic Policy Gradient (DDPG) Actor-Critic • Deep Q-Network End-to-End • Deep Reinforcement Learning (DDPG) demonstration 46 Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015) 176
  177. ChainerRL • Chainer https://github.com/chainer/chainerrl 177
  178. ChainerRL : • ChainerRL OpenAI Gym Gym • reset step
env = YourEnv()
# reset returns the initial observation
obs = env.reset()
action = 0
# step returns 4 values: observation, reward, done flag, and info
obs, r, done, info = env.step(action)
178
  179. ChainerRL : (1) • • Chainer Q
class CustomDiscreteQFunction(chainer.Chain):
    def __init__(self):
        super().__init__(l1=L.Linear(100, 50),
                         l2=L.Linear(50, 4))

    def __call__(self, x, test=False):
        h = F.relu(self.l1(x))
        h = self.l2(h)
        return chainerrl.action_value.DiscreteActionValue(h)
179
  180. ChainerRL : (2)
class CustomGaussianPolicy(chainer.Chain):
    def __init__(self):
        super().__init__(l1=L.Linear(100, 50),
                         mean=L.Linear(50, 4),
                         var=L.Linear(50, 4))

    def __call__(self, x, test=False):
        h = F.relu(self.l1(x))
        mean = self.mean(h)
        var = self.var(h)
        return chainerrl.distribution.GaussianDistribution(mean, var)
180
  181. ChainerRL : Q Chainer Optimizer
q_func = CustomDiscreteQFunction()
optimizer = chainer.optimizers.Adam()
optimizer.setup(q_func)
agent = chainerrl.agents.DQN(q_func, optimizer, ...)  # remaining arguments elided
181
  182. ChainerRL : (1) ChainerRL • ChainerRL
chainerrl.experiments.train_agent_with_evaluation(
    agent, env, steps=100000, eval_frequency=10000,
    eval_n_runs=10, outdir='results')
182
  183. ChainerRL : (2)
obs = env.reset()
r = 0
done = False
for _ in range(10000):
    while not done:
        action = agent.act_and_train(obs, r)
        obs, r, done, info = env.step(action)
    agent.stop_episode_and_train(obs, r, done)
    obs = env.reset()
    r, done = 0, False
agent.save('final_agent')
183
  184. ChainerRL Quick Start Guide https://github.com/chainer/chainerrl/blob/master/examples/quickstart/quickstart.ipynb OpenAI Gym DQN 184
  185. Chainer 185