Quoc Le, slides MLconf 11/15/13

  1. Large Scale Deep Learning. Quoc V. Le, Google & CMU
  2. Deep Learning
     •  Google is using Machine Learning
     •  Machine Learning is difficult: it requires domain knowledge from human experts
     Deep Learning:
     •  Great performance on many problems
     •  Works well with large amounts of data
     •  Requires less domain knowledge
     Focus: scale deep learning to bigger models and bigger problems
  4. What is Deep Learning?
  5. What is Deep Learning? A stack of learned layers: input x (images, audio, text, etc.), first-layer features u = g(Ax), second-layer features v = g(Bu), and so on.
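
A minimal numpy sketch of the stacked layers on this slide, assuming g is an elementwise nonlinearity (tanh here; the slides leave g unspecified) and picking illustrative layer sizes:

```python
import numpy as np

def g(z):
    # Elementwise nonlinearity; tanh is an assumption, the slides leave g unspecified.
    return np.tanh(z)

rng = np.random.default_rng(0)
x = rng.normal(size=784)          # input x: e.g. a flattened 28x28 image
A = rng.normal(size=(256, 784))   # first-layer weight matrix
B = rng.normal(size=(128, 256))   # second-layer weight matrix

u = g(A @ x)   # first-layer features: u = g(Ax)
v = g(B @ u)   # second-layer features built on top of u: v = g(Bu)
```
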
  7. High-level features by Deep Learning: pixels → edge detectors → … → face detector, cat detector
  8. Google's DistBelief. Goal: train deep learning models on many machines. Model: a multi-layered architecture; a forward pass computes the features, a backward pass computes the gradient.
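
To make the forward/backward split concrete, here is a hedged sketch of one training step for a single layer; the squared-error loss, tanh nonlinearity, and shapes are illustrative assumptions, not details from the slides:

```python
import numpy as np

def forward(A, x):
    # Forward pass: compute the features u = g(Ax), with g = tanh.
    return np.tanh(A @ x)

def backward(x, u, target):
    # Backward pass: gradient of the loss 0.5 * ||u - target||^2 w.r.t. A.
    delta = (u - target) * (1.0 - u ** 2)   # chain rule through tanh
    return np.outer(delta, x)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 8), scale=0.1)
x, target = rng.normal(size=8), rng.normal(size=4)

u = forward(A, x)
A -= 0.01 * backward(x, u, target)          # one SGD step on this example
```
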
  9. Model partition with DistBelief: DistBelief distributes a model across multiple machines and multiple cores; each machine holds one model partition.
  11. Model partition with DistBelief: training uses Stochastic Gradient Descent (SGD); model parameters are partitioned across machines; can use up to 1000 cores.
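
A toy sketch of what partitioned model parameters can look like for one layer: the weight matrix is split row-wise and each partition computes only its slice of the output. In-process array slices stand in for DistBelief's machines and cores; the real system also exchanges activations across partition boundaries.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(512, 256))              # one layer's parameters
x = rng.normal(size=256)

# Partition the layer's parameters row-wise across two workers.
A_parts = np.split(A, 2, axis=0)

# Each worker computes only its own slice of the output.
u_parts = [np.tanh(part @ x) for part in A_parts]
u = np.concatenate(u_parts)

assert np.allclose(u, np.tanh(A @ x))        # matches the unpartitioned layer
```
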
  12. Model partition with DistBelief: but training is still slow on large data sets; can we add more parallelism? Idea: train multiple models on different partitions of the data, and merge them.
  13. Data partition with DistBelief: model workers each train on their own data shard and send updates ∆p to a parameter server, which applies p' = p + ∆p and serves the new parameters back.
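
A hedged sketch of the loop behind this diagram, with threads standing in for machines and a plain least-squares model standing in for the network: each worker pulls the current parameters p, computes an update ∆p on its own data shard, and the parameter server applies p' = p + ∆p. All class and function names are illustrative, not DistBelief's API.

```python
import threading
import numpy as np

class ParameterServer:
    """Holds the shared parameters p and applies worker updates ∆p."""
    def __init__(self, dim):
        self.p = np.zeros(dim)
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.p.copy()

    def push(self, delta_p):
        with self.lock:
            self.p += delta_p                    # p' = p + ∆p

def worker(server, X, y, lr=0.01, steps=200):
    # Each model worker trains on its own data shard.
    for _ in range(steps):
        p = server.pull()                        # fetch current parameters
        grad = X.T @ (X @ p - y) / len(y)        # least-squares gradient on this shard
        server.push(-lr * grad)                  # send the update back

rng = np.random.default_rng(0)
true_p = rng.normal(size=5)
server = ParameterServer(5)

threads = []
for _ in range(4):                               # four data shards, four workers
    X = rng.normal(size=(100, 5))
    threads.append(threading.Thread(target=worker, args=(server, X, X @ true_p)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.allclose(server.p, true_p, atol=0.1))   # workers jointly recover true_p
```

In DistBelief's actual scheme (Downpour SGD), pulls and pushes are asynchronous, so workers act on slightly stale parameters; the training tolerates that staleness in exchange for throughput.
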
  14. Parallelism in DistBelief: model parallelism via model partitioning; data parallelism via data partitioning and asynchronous communication. DistBelief can scale to billions of examples and use 100,000 cores or more. Thanks to its speed, DistBelief dramatically improves many applications.
  15. Applications: Voice Search, Photo Search, Text Understanding
  16. Voice Search: a classifier with hidden layers of 1000s of nodes maps each speech frame to a label.
  17. Voice Search
  18. Applications: Voice Search, Photo Search, Text Understanding
  19. Photo Search
  20. Cat detector: front page of The New York Times
  21. Seat-belt, Archery, Boston rocker, Shredder
  22. Face, Amusement Park, Hammock
  23. Google+ Photo Search
  24. Applications: Voice Search, Photo Search, Text Understanding
  25. Text understanding: very useful but also difficult. We should try to understand the meaning of words, and Deep Learning can learn the meaning of words.
  26. Text understanding: words are embedded in a ~100-D vector space, where related words lie close together (Clinton near Obama, whale near dolphin, Paris elsewhere).
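
A small sketch of what "meaning in a vector space" buys: related words end up close under cosine similarity. The 4-D vectors below are hand-made for illustration; a trained model would learn ~100 dimensions from data.

```python
import numpy as np

# Toy hand-made embeddings; real models learn ~100-D vectors from text.
emb = {
    "Clinton": np.array([0.9, 0.8, 0.1, 0.0]),
    "Obama":   np.array([0.8, 0.9, 0.0, 0.1]),
    "whale":   np.array([0.0, 0.1, 0.9, 0.8]),
    "dolphin": np.array([0.1, 0.0, 0.8, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["Clinton"], emb["Obama"]))    # high: related words
print(cosine(emb["Clinton"], emb["whale"]))    # low: unrelated words
```
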
  27. Predicting the next word in a sentence: the context words ("the cat sat on the") are each looked up in a word matrix E of dimension |Vocab| x d, the resulting vectors are fed through hidden layers, and a classifier predicts the next word.
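
A hedged sketch of the architecture on this slide: each context word indexes a row of the word matrix E (|Vocab| x d), the embeddings are concatenated, passed through a hidden layer, and a softmax classifier scores every vocabulary word as the next word. Weights are random and untrained, so only the shapes are meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
V, d, h = len(vocab), 16, 32                  # vocab size, embedding dim, hidden dim

E = rng.normal(size=(V, d), scale=0.1)        # word matrix: |Vocab| x d
W = rng.normal(size=(h, 4 * d), scale=0.1)    # hidden layer (4-word context)
U = rng.normal(size=(V, h), scale=0.1)        # classifier weights

def next_word_probs(context):                 # e.g. ["the", "cat", "sat", "on"]
    idx = [vocab.index(w) for w in context]
    features = np.concatenate([E[i] for i in idx])   # look up rows of E, concatenate
    hidden = np.tanh(W @ features)
    scores = U @ hidden
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                    # softmax over the whole vocabulary

print(next_word_probs(["the", "cat", "sat", "on"]))
```
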
  28. Visualizing the word vectors: example nearest neighbors from a model trained on Google News, for the words apple, Apple, and iPhone.
  29. Relation Extraction. Mikolov, Sutskever, Le. Learning the Meaning behind Words. Google Open Source Blog, 2013.
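
The cited work shows that a relation tends to appear as a near-constant offset between word vectors, so analogies reduce to vector arithmetic. A toy sketch of that test, with fabricated vectors:

```python
import numpy as np

# Toy hand-made vectors for illustration; real models learn these from text.
emb = {
    "Paris":   np.array([1.0, 0.0, 0.9]),
    "France":  np.array([1.0, 1.0, 0.9]),
    "Rome":    np.array([0.0, 0.0, 1.0]),
    "Italy":   np.array([0.0, 1.0, 1.0]),
    "Berlin":  np.array([0.5, 0.0, 0.5]),
    "Germany": np.array([0.5, 1.0, 0.5]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# The "capital of" relation appears as a roughly constant vector offset,
# so the analogy Paris : France :: Rome : ? becomes arithmetic on vectors.
query = emb["France"] - emb["Paris"] + emb["Rome"]
answer = max((w for w in emb if w not in {"France", "Paris", "Rome"}),
             key=lambda w: cosine(query, emb[w]))
print(answer)   # -> Italy
```
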
  30. Machine Translation
  31. Summary: model partition, data partition; applications to Voice Search, Photo Search, and Text Understanding.
  32. Joint work with: Kai Chen, Greg Corrado, Rajat Monga, Andrew Ng, Jeff Dean, Matthieu Devin, Paul Tucker, Ke Yang. Additional thanks: Samy Bengio, Tom Dean, Josh Levenberg, Geoff Hinton, Tomas Mikolov, Mark Mao, Patrick Nguyen, Marc'Aurelio Ranzato, Mark Segal, Jon Shlens, Ilya Sutskever, Vincent Vanhoucke.