
Bayesian Deep Neural Networks


Published on 2017-08-26 @ Machine Learning 15 mins



  1. 2017-08-26 @ Machine Learning 15 mins
  2. Yuta Kashino: BakFoo, Inc. CEO. Astro Physics / Observational Cosmology. Zope / Python. Realtime Data Platform for Enterprise / Prototyping
  3. Yuta Kashino: arXiv (stat.ML, stat.TH, cs.CV, cs.CL, cs.LG, math-ph, astro-ph), PyCon2016, PyCon2017, DNN, PPL, Edward. @yutakashino https://www.slideshare.net/yutakashino/pyconjp2016
  4. - - Edward -
  5. http://bayesiandeeplearning.org/
  6. Shakir Mohamed: http://blog.shakirm.com/wp-content/uploads/2015/11/CSML_BayesDeep.pdf
  7. Denker, Schwartz, Wittner, Solla, Howard, Jackel, Hopfield (1987); Denker and LeCun (1991); MacKay (1992); Hinton and van Camp (1993); Neal (1995); Barber and Bishop (1998); Graves (2011); Blundell, Cornebise, Kavukcuoglu, and Wierstra (2015); Hernandez-Lobato and Adams (2015)
  8. Yarin Gal, Zoubin Ghahramani (U of Cambridge); Shakir Mohamed; Dustin Tran, Rajesh Ranganath, David Blei (Columbia U); Ian Goodfellow
  9. Neural network as a probabilistic model: data $\mathcal{D} = \{x^{(n)}, y^{(n)}\}_{n=1}^{N} = (X, y)$; regression $y^{(n)} = \sum_j \theta^{(2)}_j \phi\bigl(\sum_i \theta^{(1)}_{ji} x^{(n)}_i\bigr) + \epsilon^{(n)}$; likelihood $p(y^{(n)} \mid x^{(n)}, \theta) = \phi\bigl(\sum_i \theta_i x^{(n)}_i\bigr)$; parameters $\theta$ fitted with SGD + BackProp.
  10. 2012 ILSVRC → 2015
  11. ReLU, DropOut, Mini Batch, SGD (Adam), LSTM…; ImageNet, MSCoCo…; GPU; Theano, Torch, Caffe, TensorFlow, Chainer, MxNet, PyTorch…
  12. https://lossfunctions.tumblr.com/
  13. Adversarial examples
  14. - = = -
  15. Neural network as a probabilistic model (again): data $\mathcal{D} = \{x^{(n)}, y^{(n)}\}_{n=1}^{N} = (X, y)$; regression $y^{(n)} = \sum_j \theta^{(2)}_j \phi\bigl(\sum_i \theta^{(1)}_{ji} x^{(n)}_i\bigr) + \epsilon^{(n)}$; likelihood $p(y^{(n)} \mid x^{(n)}, \theta) = \phi\bigl(\sum_i \theta_i x^{(n)}_i\bigr)$; parameters $\theta$ fitted with SGD + BackProp.
  16. Data $D$, hypothesis $H$: $P(H \mid D) = \frac{P(H)\,P(D \mid H)}{\sum_H P(H)\,P(D \mid H)}$; sum rule $P(x) = \sum_y P(x, y)$; product rule $P(x, y) = P(x)\,P(y \mid x)$.
  17. Posterior = likelihood × prior / evidence: $P(\theta \mid D, m) = \frac{P(D \mid \theta, m)\,P(\theta \mid m)}{P(D \mid m)}$ ($m$: model); prediction $P(x \mid D, m) = \int P(x \mid \theta, D, m)\,P(\theta \mid D, m)\,d\theta$; model comparison $P(m \mid D) = \frac{P(D \mid m)\,P(m)}{P(D)}$.
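The posterior, prediction, and evidence computations can be checked numerically on a toy coin model; a minimal standard-library sketch (not Edward code), assuming a uniform prior on a discrete grid of parameter values:

```python
# Toy model: x_i ~ Bernoulli(theta), uniform prior over a grid of theta values.
thetas = [i / 100 for i in range(1, 100)]          # grid over (0, 1)
prior = [1 / len(thetas)] * len(thetas)            # P(theta | m), uniform

heads, tails = 7, 3                                # observed data D

# Likelihood P(D | theta, m)
lik = [t**heads * (1 - t)**tails for t in thetas]

# Evidence P(D | m) = sum_theta P(D | theta, m) P(theta | m)
evidence = sum(l * p for l, p in zip(lik, prior))

# Posterior P(theta | D, m) = P(D | theta, m) P(theta | m) / P(D | m)
post = [l * p / evidence for l, p in zip(lik, prior)]

# Posterior predictive P(x=1 | D, m) = sum_theta P(x=1 | theta) P(theta | D, m)
pred = sum(t * q for t, q in zip(thetas, post))

print(round(pred, 3))   # close to the conjugate answer (heads+1)/(n+2) = 8/12
```

The grid marginalization stands in for the integral over θ; for this Beta-Bernoulli case the exact posterior is Beta(8, 4), so the grid result can be sanity-checked against it.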
  18. Bayesian neural network: put a prior on the weights $\theta$ of the network $x \to y$; data $D = \{x^{(n)}, y^{(n)}\}_{n=1}^{N} = (X, y)$; posterior $P(\theta \mid D, m) = \frac{P(D \mid \theta, m)\,P(\theta \mid m)}{P(D \mid m)}$ ($m$: model); prediction $P(x \mid D, m) = \int P(x \mid \theta, D, m)\,P(\theta \mid D, m)\,d\theta$.
  19. The evidence $P(D \mid m) = \int P(D \mid \theta, m)\,P(\theta \mid m)\,d\theta$ is intractable; approximate the posterior with Variational Bayes or MCMC.
  20. Approximate $p(\theta \mid D)$ with $q(\theta)$ by minimizing KL, i.e. maximizing the ELBO: $\lambda^* = \arg\min_\lambda \mathrm{KL}\bigl(q(\theta;\lambda)\,\|\,p(\theta \mid D)\bigr) = \arg\min_\lambda \mathbb{E}_{q(\theta;\lambda)}[\log q(\theta;\lambda) - \log p(\theta \mid D)]$; $\mathrm{ELBO}(\lambda) = \mathbb{E}_{q(\theta;\lambda)}[\log p(\theta, D) - \log q(\theta;\lambda)]$; $\lambda^* = \arg\max_\lambda \mathrm{ELBO}(\lambda)$.
  21. Minimizing KL = maximizing ELBO: as $q(\theta;\lambda_1) \to q(\theta;\lambda_5)$ the approximation approaches $p(\theta, D)$; $\lambda^* = \arg\max_\lambda \mathrm{ELBO}(\lambda)$, $\mathrm{ELBO}(\lambda) = \mathbb{E}_{q(\theta;\lambda)}[\log p(\theta, D) - \log q(\theta;\lambda)]$.
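The equivalence of the two objectives follows from the identity $\log p(D) = \mathrm{ELBO}(\lambda) + \mathrm{KL}(q\,\|\,p(\theta \mid D))$: since $\log p(D)$ is constant in $\lambda$, maximizing the ELBO minimizes the KL. It can be verified exactly for a discrete toy model; a sketch with an arbitrary valid $q$:

```python
import math

# Discrete toy: theta in {0.2, 0.5, 0.8}, uniform prior, D = 3 heads out of 4.
thetas = [0.2, 0.5, 0.8]
prior = [1/3, 1/3, 1/3]
lik = [t**3 * (1 - t) for t in thetas]

joint = [l * p for l, p in zip(lik, prior)]        # p(theta, D)
evidence = sum(joint)                              # p(D)
post = [j / evidence for j in joint]               # p(theta | D)

q = [0.1, 0.3, 0.6]                                # any valid q(theta)

elbo = sum(qi * (math.log(ji) - math.log(qi)) for qi, ji in zip(q, joint))
kl = sum(qi * (math.log(qi) - math.log(pi)) for qi, pi in zip(q, post))

# The decomposition holds exactly, for any q.
print(abs((elbo + kl) - math.log(evidence)) < 1e-12)   # prints True
```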
  22. Choosing $q$ to approximate $p$ automatically: ADVI: Automatic Differentiation Variational Inference (arXiv:1603.00788); BBVI: Blackbox Variational Inference (arXiv:1401.0118).
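ADVI's central trick is the reparameterization gradient: write $\theta = m + s\,\epsilon$ with $\epsilon \sim \mathcal{N}(0,1)$ and differentiate the ELBO through the sample. A hand-derived sketch (not the actual ADVI implementation, and the model choice is mine) for a conjugate Gaussian model where the exact posterior is known:

```python
import math
import random

random.seed(0)

# Model: x_i ~ N(theta, 1), prior theta ~ N(0, 1); q(theta; m, s) = N(m, s^2).
data = [random.gauss(1.0, 1.0) for _ in range(20)]
n, sx = len(data), sum(data)

# Exact posterior for comparison: N(sx/(n+1), 1/(n+1))
post_mean, post_sd = sx / (n + 1), 1 / math.sqrt(n + 1)

m, s, lr, B = 0.0, 1.0, 0.01, 30
for _ in range(4000):
    gm = gs = 0.0
    for _ in range(B):                     # Monte Carlo gradient over a small batch
        eps = random.gauss(0, 1)
        theta = m + s * eps                # reparameterized sample
        dlogp = sx - (n + 1) * theta       # d/dtheta [log p(D|theta) + log p(theta)]
        gm += dlogp / B                    # chain rule: dtheta/dm = 1
        gs += eps * dlogp / B              # chain rule: dtheta/ds = eps
    gs += 1 / s                            # gradient of the entropy of q
    m += lr * gm
    s += lr * gs

print(round(m, 2), round(s, 2))  # close to the exact posterior mean and sd
```

Because the model is conjugate, convergence of (m, s) to the analytic posterior parameters is a direct check that the gradient estimator is unbiased.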
  23. Learning VI: David MacKay, “Lecture 14 of the Cambridge Course” (http://www.inference.org.uk/itprnn_lectures/); PRML Chapter 10.
  24. Reference: Zoubin Ghahramani, “History of Bayesian neural networks”, NIPS 2016 Workshop on Bayesian Deep Learning; Yarin Gal, “Bayesian Deep Learning”, O'Reilly Artificial Intelligence in New York, 2017.
  25. Edward
  26. Edward: Dustin Tran (OpenAI) and the Blei Lab; a probabilistic programming library (PPL) released in February 2016; built on TensorFlow; named after George Edward Pelham Box (Box-Cox transformation, Box-Jenkins, Ljung-Box test, the box plot via Tukey; studied under R.A. Fisher).
  27. The PPL Edward = TensorFlow (TF) + probabilistic programming language (PPL); TF supplies the computation; the PPL layer supplies probabilistic modeling, with a Python/Numpy interface.
  28. Probabilistic Programming Library/Language: Stan, PyMC3, Anglican, Church, Venture, Figaro, WebPPL, Edward; comparison: Edward / PyMC3. Inference methods: Metropolis-Hastings, Hamiltonian Monte Carlo, Stochastic Gradient Langevin Dynamics, No-U-Turn Sampler, Blackbox Variational Inference (VI), Automatic Differentiation Variational Inference.
  29. 1. TF:
  30. 1. TF:
  31. 1. TF: GPU / TPU; Inception v3, Inception v4
  32. 1. TF: Keras, Slim; TensorBoard
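TF's deferred-execution model (build a computation graph first, then run it with values fed in) can be illustrated in a few lines of plain Python; a conceptual sketch, not the TF API:

```python
# Each node records an operation and its inputs; nothing runs until run() is called.
class Node:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def run(self, feed=None):
        feed = feed or {}
        if self in feed:                  # placeholder: value supplied at run time
            return feed[self]
        if self.op == "const":
            return self.value
        args = [n.run(feed) for n in self.inputs]
        if self.op == "add":
            return args[0] + args[1]
        if self.op == "mul":
            return args[0] * args[1]
        raise ValueError(f"unknown or unfed op: {self.op}")

# Build the graph y = (x * 3) + 2 without computing anything yet.
x = Node("placeholder")
y = Node("add", (Node("mul", (x, Node("const", value=3))), Node("const", value=2)))

print(y.run({x: 4}))   # prints 14
```

The same graph can be re-run with different fed values, which is the property Edward relies on when it attaches random variables to TF graphs.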
  33. 2. Random variables in Edward: $x^* \sim P(x \mid \alpha)$; e.g. $\theta^* \sim \mathrm{Beta}(\theta \mid 1, 1)$.
  34. 2. The generative model (Beta-Bernoulli) in Edward: $p(x, \theta) = \mathrm{Beta}(\theta \mid 1, 1) \prod_{n=1}^{50} \mathrm{Bernoulli}(x_n \mid \theta)$.
  35. 2. Random variables have log_prob(), mean(), sample().
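A toy stand-in for that random-variable interface, using only the standard library (this mimics the shape of Edward's API, it is not Edward code):

```python
import math
import random

class Bernoulli:
    """Toy stand-in for a PPL random variable exposing log_prob/mean/sample."""
    def __init__(self, p):
        self.p = p

    def log_prob(self, x):                 # log P(X = x)
        return math.log(self.p if x == 1 else 1 - self.p)

    def mean(self):                        # E[X] = p
        return self.p

    def sample(self, rng=random):          # draw a single 0/1 outcome
        return 1 if rng.random() < self.p else 0

b = Bernoulli(0.3)
print(b.mean(), round(b.log_prob(1), 3))   # 0.3 and log(0.3)
```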
  36. 3. Edward TF
  37. 3. 256 28*28
  38. 3. http://edwardlib.org/tutorials/
  39. 3.
  40. 4. Inference: given data $X$ and latent variables $Z$, compute the posterior $p(z \mid x) = \frac{p(x, z)}{\int p(x, z)\,dz}$; the denominator is intractable, so use Variational Bayes or MCMC.
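The MCMC route sidesteps the intractable denominator because Metropolis-style acceptance ratios only need the unnormalized $p(x, z)$. A toy random-walk Metropolis sampler for the Beta-Bernoulli coin model (a sketch for intuition, not Edward's HMC/NUTS):

```python
import math
import random

random.seed(1)

heads, tails = 30, 20   # data; the exact posterior is Beta(1+heads, 1+tails)

def log_joint(theta):
    """Unnormalized log posterior: log-likelihood + log of the uniform prior."""
    if not 0 < theta < 1:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1 - theta)

theta, samples = 0.5, []
for _ in range(20000):
    prop = theta + random.gauss(0, 0.1)             # random-walk proposal
    # Accept with probability min(1, p(prop)/p(theta)); normalizer cancels.
    if math.log(random.random()) < log_joint(prop) - log_joint(theta):
        theta = prop
    samples.append(theta)

est = sum(samples[5000:]) / len(samples[5000:])     # discard burn-in
print(round(est, 2))   # near the exact posterior mean 31/52 ≈ 0.596
```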
  41. 4.
  42. 4. Approximate $p(z \mid x)$ with $q(z)$ by minimizing KL / maximizing the ELBO.
  43. 4. Edward: KLqp
  44. 5. Box's loop: George Edward Pelham Box; Blei 2014
  45. 5. Box's loop
  46. Edward summary: Edward = TensorFlow + PPL; runs on TensorFlow, so it inherits TF's GPU, TPU, TensorBoard, and Keras support; supports Box's Loop; all in Python.
  47. Reference: • D. Tran, A. Kucukelbir, A. Dieng, M. Rudolph, D. Liang, and D.M. Blei. Edward: A library for probabilistic modeling, inference, and criticism. arXiv:1610.09787. • D. Tran, M.D. Hoffman, R.A. Saurous, E. Brevdo, K. Murphy, and D.M. Blei. Deep probabilistic programming. arXiv:1701.03757. • G.E.P. Box (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791-799. • D.M. Blei. Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models. Annual Review of Statistics and Its Application, Volume 1, 2014.
  48. Dropout: Yarin Gal, “Uncertainty in Deep Learning”; Dropout as approximate Bayesian inference; applies to conv layers; LeNet with Dropout. http://mlg.eng.cam.ac.uk/yarin/blog_2248.html
  49. Dropout: LeNet DNN; conv layers with Dropout; MNIST
  50. Dropout: CO2 data
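Gal's recipe keeps dropout switched on at test time and averages many stochastic forward passes: the spread of the predictions is the model's uncertainty. A conceptual numpy sketch with a tiny one-hidden-layer regressor and made-up weights (not the LeNet from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fixed 1-hidden-layer net; the weights are arbitrary, for illustration only.
W1 = rng.normal(size=(1, 50))
W2 = rng.normal(size=(50, 1)) / 50

def forward(x, drop_rate=0.5):
    h = np.maximum(0, x @ W1)                 # ReLU hidden layer
    mask = rng.random(h.shape) > drop_rate    # dropout stays ON at test time
    h = h * mask / (1 - drop_rate)            # inverted-dropout scaling
    return h @ W2

x = np.array([[1.5]])

# T stochastic forward passes -> predictive mean and uncertainty estimate
T = 500
preds = np.array([forward(x)[0, 0] for _ in range(T)])
print(f"mean={preds.mean():.2f}  std={preds.std():.2f}")   # std > 0: uncertainty
```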
  51. - - Edward -
  52. Questions: kashino@bakfoo.com @yutakashino
  53. BakFoo, Inc. NHK NMAPS: +
  54. BakFoo, Inc. PyConJP 2015 Python
  55. BakFoo, Inc.
  56. BakFoo, Inc. : SNS +
