
Deep Learning and Bayesian Statistics

Published on 2017-08-05 @ Tokyo Web Mining


  1. 2017-08-05 @ Tokyo Web Mining
  2. Yuta Kashino: BakFoo, Inc. CEO; Astrophysics / Observational Cosmology; Zope / Python; Realtime Data Platform for Enterprise / Prototyping
  3. Yuta Kashino: arXiv stat.ML, stat.TH, cs.CV, cs.CL, cs.LG, math-ph, astro-ph; PyCon2016; @yutakashino; https://www.slideshare.net/yutakashino/pyconjp2016
  4.
  5. http://bayesiandeeplearning.org/
  6. Shakir Mohamed: http://blog.shakirm.com/wp-content/uploads/2015/11/CSML_BayesDeep.pdf
  7. History of Bayesian neural networks: Denker, Schwartz, Wittner, Solla, Howard, Jackel, and Hopfield (1987); Denker and LeCun (1991); MacKay (1992); Hinton and van Camp (1993); Neal (1995); Barber and Bishop (1998); Graves (2011); Blundell, Cornebise, Kavukcuoglu, and Wierstra (2015); Hernandez-Lobato and Adams (2015)
  8. Researchers: Yarin Gal, Zoubin Ghahramani (U of Cambridge); Shakir Mohamed; Dustin Tran, Rajesh Ranganath, David Blei (Columbia U); Ian Goodfellow
  9. Neural network as a probabilistic model:
     - Model: $y^{(n)} = \sum_j \theta^{(2)}_j \phi\big(\sum_i \theta^{(1)}_{ji} x^{(n)}_i\big) + \epsilon^{(n)}$
     - Likelihood: $p(y^{(n)} \mid x^{(n)}, \theta) = \phi\big(\sum_i \theta_i x^{(n)}_i\big)$
     - Parameters: $\theta = (\theta^{(1)}, \theta^{(2)})$
     - Data: $\mathcal{D} = \{x^{(n)}, y^{(n)}\}_{n=1}^{N} = (X, y)$
     - Training: SGD + BackProp (a point estimate of $\theta$)
  10. 2012 ILSVRC → 2015
  11. Techniques: ReLU, DropOut, Mini Batch, SGD (Adam), LSTM, ...; datasets: ImageNet, MS COCO, ...; compute: GPU; frameworks: Theano, Torch, Caffe, TensorFlow, Chainer, MXNet, PyTorch, ...
  12. https://lossfunctions.tumblr.com/
  13. Adversarial examples
  14.
  15. Recap of slide 9: the network model, its likelihood, parameters $\theta$, and data $\mathcal{D} = \{x^{(n)}, y^{(n)}\}_{n=1}^{N} = (X, y)$, trained with SGD + BackProp; so far $\theta$ is a single point estimate
  16. Bayes' rule (H: hypothesis, D: data): $P(H \mid D) = \frac{P(H)\,P(D \mid H)}{\sum_H P(H)\,P(D \mid H)}$; sum rule: $P(x) = \sum_y P(x, y)$; product rule: $P(x, y) = P(x)\,P(y \mid x)$
  17. Bayesian inference (m: model):
      - Posterior: $P(\theta \mid D, m) = \frac{P(D \mid \theta, m)\,P(\theta \mid m)}{P(D \mid m)}$ (posterior $\propto$ likelihood $\times$ prior)
      - Prediction: $P(x \mid D, m) = \int P(x \mid \theta, D, m)\,P(\theta \mid D, m)\,d\theta$
      - Model comparison: $P(m \mid D) = \frac{P(D \mid m)\,P(m)}{P(D)}$
  18. Bayesian neural network:
      - Data: $\mathcal{D} = \{x^{(n)}, y^{(n)}\}_{n=1}^{N} = (X, y)$
      - Prior over weights: $P(\theta \mid m)$ (m: model)
      - Posterior: $P(\theta \mid D, m) = \frac{P(D \mid \theta, m)\,P(\theta \mid m)}{P(D \mid m)}$
      - Prediction: $P(x \mid D, m) = \int P(x \mid \theta, D, m)\,P(\theta \mid D, m)\,d\theta$
  19. Approximating the posterior $P(\theta \mid D, m) = \frac{P(D \mid \theta, m)\,P(\theta \mid m)}{P(D \mid m)}$, whose evidence term $P(D \mid m)$ is intractable for neural networks:
      - Variational Bayes
      - MCMC
  20. Variational inference: approximate the posterior $p(\theta \mid D)$ with a tractable $q(\theta; \lambda)$ by minimizing the KL divergence, which is equivalent to maximizing the ELBO (derivation below):
      - $\lambda^* = \operatorname{argmin}_\lambda \mathrm{KL}\big(q(\theta; \lambda)\,\|\,p(\theta \mid D)\big) = \operatorname{argmin}_\lambda \mathbb{E}_{q(\theta;\lambda)}\big[\log q(\theta; \lambda) - \log p(\theta \mid D)\big]$
      - $\mathrm{ELBO}(\lambda) = \mathbb{E}_{q(\theta;\lambda)}\big[\log p(\theta, D) - \log q(\theta; \lambda)\big]$
      - $\lambda^* = \operatorname{argmax}_\lambda \mathrm{ELBO}(\lambda)$
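The equivalence claimed on this slide follows from a standard identity, spelled out here for completeness (not on the original deck):

```latex
% The log evidence decomposes, for any q(\theta;\lambda), as
\log p(D)
  = \mathbb{E}_{q(\theta;\lambda)}\!\left[\log\frac{p(\theta,D)}{q(\theta;\lambda)}\right]
  + \mathbb{E}_{q(\theta;\lambda)}\!\left[\log\frac{q(\theta;\lambda)}{p(\theta\mid D)}\right]
  = \mathrm{ELBO}(\lambda) + \mathrm{KL}\big(q(\theta;\lambda)\,\|\,p(\theta\mid D)\big)
```

Since $\log p(D)$ is constant in $\lambda$, minimizing the KL term and maximizing the ELBO pick out the same $\lambda^*$.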
  21. VI references: David MacKay, "Lecture 14" of the Cambridge course (http://www.inference.org.uk/itprnn_lectures/); PRML Chapter 10
  22. Minimizing the KL divergence = maximizing the ELBO: $\mathrm{ELBO}(\lambda) = \mathbb{E}_{q(\theta;\lambda)}\big[\log p(\theta, D) - \log q(\theta; \lambda)\big]$, $\lambda^* = \operatorname{argmax}_\lambda \mathrm{ELBO}(\lambda)$ (figure: successive iterates $q(\theta; \lambda_1), \ldots, q(\theta; \lambda_5)$ closing in on $p(\theta, D)$)
  23. Choosing the variational family $q$ and fitting it to $p$ automatically (sketch below):
      - ADVI: Automatic Differentiation Variational Inference (arXiv:1603.00788)
      - BBVI: Black Box Variational Inference (arXiv:1401.0118)
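To make the BBVI idea concrete, here is a minimal sketch of the score-function gradient estimator from arXiv:1401.0118 for a univariate Gaussian variational family; the function names and the choice $\lambda = (\mu, \log\sigma)$ are illustrative assumptions, not the paper's code:

```python
import numpy as np

def log_q(theta, mu, log_sigma):
    # log density of q(theta; lambda) = Normal(mu, sigma^2)
    s2 = np.exp(2.0 * log_sigma)
    return -0.5 * np.log(2.0 * np.pi * s2) - 0.5 * (theta - mu) ** 2 / s2

def grad_log_q(theta, mu, log_sigma):
    # score function: gradient of log q w.r.t. lambda = (mu, log_sigma)
    s2 = np.exp(2.0 * log_sigma)
    return np.array([(theta - mu) / s2,
                     (theta - mu) ** 2 / s2 - 1.0])

def bbvi_gradient(log_joint, mu, log_sigma, n_samples=100):
    # Monte Carlo estimate of
    #   grad ELBO = E_q[ grad_lambda log q(theta) * (log p(theta, D) - log q(theta)) ]
    thetas = mu + np.exp(log_sigma) * np.random.randn(n_samples)
    terms = [grad_log_q(t, mu, log_sigma) * (log_joint(t) - log_q(t, mu, log_sigma))
             for t in thetas]
    return np.mean(terms, axis=0)
```

In practice BBVI pairs this noisy estimator with variance reduction (Rao-Blackwellization, control variates) and a Robbins-Monro step size.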
  24. References: Zoubin Ghahramani, "History of Bayesian neural networks", NIPS 2016 Workshop on Bayesian Deep Learning; Yarin Gal, "Bayesian Deep Learning", O'Reilly Artificial Intelligence, New York, 2017
  25. Probabilistic programming libraries/languages: Stan, PyMC3, Anglican, Church, Venture, Figaro, WebPPL, Edward; this talk: Edward / PyMC3. Inference: variational inference (VI) and MCMC, e.g. Metropolis-Hastings, Hamiltonian Monte Carlo, Stochastic Gradient Langevin Dynamics, No-U-Turn Sampler, Black Box Variational Inference, Automatic Differentiation Variational Inference
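For comparison, a minimal sketch of that inference menu in PyMC3 on a toy Beta-Bernoulli model (the model and data here are illustrative choices; assumes the PyMC3 3.x API):

```python
import numpy as np
import pymc3 as pm

x_data = np.random.binomial(1, 0.7, size=50)  # toy data: 50 coin flips

with pm.Model():
    theta = pm.Beta("theta", alpha=1.0, beta=1.0)   # prior
    pm.Bernoulli("x", p=theta, observed=x_data)     # likelihood
    trace = pm.sample(1000)                         # MCMC (NUTS by default)
    approx = pm.fit(method="advi")                  # variational inference (ADVI)
```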
  26. Edward
  27. Edward: created by Dustin Tran (OpenAI) and the Blei Lab; a probabilistic programming library (PPL) in the lineage of Stan, PyMC3, Anglican, Church, Venture, Figaro, WebPPL; first released in February 2016; built on TensorFlow; named after George Edward Pelham Box (Box-Cox transformation, Box-Jenkins, Ljung-Box test; the box plot itself is Tukey's), a son-in-law of R.A. Fisher
  28. PPL Edward = TensorFlow (TF) + probabilistic programming (PPL). TF provides the computational graph; the PPL layer adds random variables + inference, composed with plain Python/NumPy
  29. 1. TF:
  30. 1. TF:
  31. 1. TF: GPU / TPU; Inception v3, Inception v4
  32. 1. TF: Keras, Slim; TensorBoard
  33. 2. Random variables in Edward: a random variable $x^*$ is distributed as $x^* \sim p(x \mid \alpha)$, e.g. $\theta^* \sim \mathrm{Beta}(\theta \mid 1, 1)$
  34. 2. A model in Edward is a joint distribution over data and parameters, e.g. $p(x, \theta) = \mathrm{Beta}(\theta \mid 1, 1) \prod_{n=1}^{50} \mathrm{Bernoulli}(x_n \mid \theta)$ (code sketch below)
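This joint is Edward's Beta-Bernoulli getting-started model; a minimal sketch of how it looks in code (assumes Edward 1.x on TensorFlow 1.x):

```python
import tensorflow as tf
from edward.models import Bernoulli, Beta

# p(x, theta) = Beta(theta | 1, 1) * prod_{n=1}^{50} Bernoulli(x_n | theta)
theta = Beta(1.0, 1.0)                    # prior on the coin bias
x = Bernoulli(probs=tf.ones(50) * theta)  # 50 conditionally i.i.d. flips
```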
  35. 2. Every Edward random variable exposes (demo below):
      - log_prob()
      - mean()
      - sample()
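A short sketch of those three methods in action (same Edward 1.x / TF 1.x assumption):

```python
import tensorflow as tf
from edward.models import Beta

theta = Beta(1.0, 1.0)  # a random variable node in the TF graph
with tf.Session() as sess:
    print(sess.run(theta.sample(5)))      # sample(): five draws from Beta(1, 1)
    print(sess.run(theta.mean()))         # mean(): analytic mean, here 0.5
    print(sess.run(theta.log_prob(0.3)))  # log_prob(): log density at 0.3 (0 for Beta(1, 1))
```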
  36. 3. Edward + TF
  37. 3. A network with 256 hidden units on 28×28 inputs
  38. 4. Inference: given observed data $X$ and latent variables $Z$, compute the posterior $p(z \mid x) = \frac{p(x, z)}{\int p(x, z)\,dz}$ via:
      - Variational Bayes
      - MCMC
  39. 4.
  40. 4. Variational inference: approximate $p(z \mid x)$ with $q(z)$ by minimizing the KL divergence / maximizing the ELBO
  41. 4. Variational inference in Edward: ed.KLqp (sketch below)
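A minimal end-to-end sketch of ed.KLqp on the Beta-Bernoulli model from slides 33-35 (Edward 1.x assumed; the softplus parameterization of q is an illustrative choice):

```python
import numpy as np
import tensorflow as tf
import edward as ed
from edward.models import Bernoulli, Beta

# Model: theta ~ Beta(1, 1); x_n ~ Bernoulli(theta), n = 1..50
theta = Beta(1.0, 1.0)
x = Bernoulli(probs=tf.ones(50) * theta)

# Variational family: q(theta) = Beta(a, b) with unconstrained parameters
qtheta = Beta(tf.nn.softplus(tf.Variable(1.0)),
              tf.nn.softplus(tf.Variable(1.0)))

x_data = np.random.binomial(1, 0.7, size=50).astype(np.int32)  # toy observations
inference = ed.KLqp({theta: qtheta}, data={x: x_data})
inference.run(n_iter=1000)  # stochastically maximizes the ELBO
```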
  42. 5. Box's loop (after George Edward Pelham Box; formalized in Blei 2014)
  43. 5. Box's loop
  44. Edward summary:
      - Edward = TensorFlow + random variables + inference
      - Inherits TF's strengths: GPU, TPU, TensorBoard, Keras
      - Supports Box's Loop
      - Plain Python
  45. References:
      - D. Tran, A. Kucukelbir, A.B. Dieng, M. Rudolph, D. Liang, and D.M. Blei. Edward: A library for probabilistic modeling, inference, and criticism. arXiv:1610.09787.
      - D. Tran, M.D. Hoffman, R.A. Saurous, E. Brevdo, K. Murphy, and D.M. Blei. Deep probabilistic programming. arXiv:1701.03757.
      - G.E.P. Box (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791–799.
      - D.M. Blei (2014). Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models. Annual Review of Statistics and Its Application, 1.
  46. Dropout as approximate Bayesian inference: Yarin Gal, "Uncertainty in Deep Learning"; keeping Dropout active at test time yields approximate posterior samples, and this extends to conv layers (LeNet with Dropout); see the sketch below. http://mlg.eng.cam.ac.uk/yarin/blog_2248.html
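A minimal sketch of the test-time (MC) dropout recipe; written against modern Keras rather than the talk's 2017 stack, and not Gal's own code:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Small classifier with dropout, trained the usual way (training code omitted).
model = models.Sequential([
    layers.Dense(256, activation="relu", input_shape=(784,)),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

def mc_dropout_predict(model, x, n_passes=50):
    # Keep dropout active at test time and average stochastic forward passes;
    # the spread across passes is the model's predictive uncertainty.
    preds = np.stack([model(x, training=True).numpy() for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)
```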
  47. Dropout on LeNet: a DNN with Dropout after the conv layers, evaluated on MNIST
  48. Dropout: uncertainty on the CO2 regression example
  49.
  50. Questions: kashino@bakfoo.com / @yutakashino
  51. BakFoo, Inc.: NHK NMAPS
  52. BakFoo, Inc.: PyConJP 2015, Python
  53. BakFoo, Inc.
  54. BakFoo, Inc.: SNS
