Large Scale Deep Learning
Quoc V. Le	

Google & CMU
Quoc Le, slides, MLconf 11/15/13


4,960 views

Published on

Published in: Technology, Education

Quoc le, slides MLconf 11/15/13

Deep Learning
•  Google is using Machine Learning
•  Machine Learning is difficult
•  Requires domain knowledge from human experts
Deep Learning:
•  Great performance on many problems
•  Works well with a large amount of data
•  Requires less domain knowledge
Focus:
•  Scale deep learning to bigger models and bigger problems

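What is Deep Learning?
A stack of layers, each applying a weight matrix followed by a nonlinearity g:
•  Input: x (images, audio, texts, etc.)
•  First layer: $u = g(A x)$
•  Second layer: $v = g(B u)$
•  … further layers
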
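The two formulas above describe a forward pass through stacked layers. A minimal sketch, assuming tanh for the nonlinearity g and made-up toy values for A and B (the slide specifies neither):

```python
import math

def matvec(W, x):
    """Multiply a matrix W (a list of rows) by a vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def g(z):
    """Elementwise nonlinearity; tanh is an assumed choice, not from the slide."""
    return [math.tanh(z_i) for z_i in z]

def forward(x, A, B):
    u = g(matvec(A, x))  # first layer:  u = g(A x)
    v = g(matvec(B, u))  # second layer: v = g(B u)
    return u, v

# Hypothetical toy weights: 3-D input x, 2 units in u, 1 unit in v.
x = [1.0, 0.0, -1.0]
A = [[0.5, 0.0, -0.5],
     [0.0, 1.0, 0.0]]
B = [[1.0, 1.0]]
u, v = forward(x, A, B)
```

Deeper networks repeat the same pattern, feeding v into the next layer.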
High-level features by Deep Learning
(figure: feature hierarchy from bottom to top: Pixels → Edge detectors → … → Face detector, Cat detector)

Google’s DistBelief
Goal: Train deep learning models on many machines
Model: A multi-layered architecture
•  Forward pass to compute the features
•  Backward pass to compute the gradient
(figure: Model, Training Data)

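A minimal sketch of one forward pass and one backward pass, assuming a hypothetical two-layer network u = tanh(A x), y = B · u with a squared loss. It only illustrates the computation being distributed, not DistBelief's code:

```python
import math

def forward_backward(x, A, B, target):
    """Hypothetical network u = tanh(A x), y = B . u, loss L = 0.5 * (y - target)**2.
    Returns the loss and the gradients dL/dA and dL/dB."""
    # Forward pass: compute the features layer by layer.
    z = [sum(a * xj for a, xj in zip(row, x)) for row in A]
    u = [math.tanh(zi) for zi in z]
    y = sum(b * ui for b, ui in zip(B, u))
    loss = 0.5 * (y - target) ** 2
    # Backward pass: propagate dL/dy back through each layer (chain rule).
    dy = y - target
    dB = [dy * ui for ui in u]
    du = [dy * b for b in B]
    dz = [dui * (1.0 - ui * ui) for dui, ui in zip(du, u)]  # tanh'(z) = 1 - tanh(z)**2
    dA = [[dzi * xj for xj in x] for dzi in dz]
    return loss, dA, dB

loss, dA, dB = forward_backward([1.0, -2.0], [[0.3, -0.1], [0.2, 0.4]], [0.5, -1.0], 0.7)
```

Nudging a single weight and comparing the change in loss with the returned gradient is a quick way to check such code.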
Model partition with DistBelief
DistBelief distributes a model across multiple machines and multiple cores.
(figure: the Model split into blocks, one per Machine (Model Partition), fed by Training Data)

Model partition with DistBelief
•  Training with Stochastic Gradient Descent (SGD)
•  Model parameters are partitioned across machines
•  Can use up to 1,000 cores

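One way to picture model partitioning with SGD: a hypothetical linear model whose weight vector is split across `Partition` objects standing in for machines. Each partition computes its share of the prediction, the shares are summed, and each partition applies the SGD update to its own slice. An illustrative sketch only, not DistBelief's implementation:

```python
class Partition:
    """Hypothetical stand-in for one machine that owns a slice of the weights."""

    def __init__(self, size, start):
        self.w = [0.0] * size  # this machine's slice of the weight vector
        self.start = start     # where the slice begins in the full input

    def _inputs(self, x):
        return x[self.start:self.start + len(self.w)]

    def partial_dot(self, x):
        # Contribution of this slice to the prediction y = w . x
        return sum(w * xi for w, xi in zip(self.w, self._inputs(x)))

    def sgd_update(self, x, err, lr):
        # SGD on 0.5 * (y - t)**2: dL/dw_i = err * x_i, applied locally
        self.w = [w - lr * err * xi for w, xi in zip(self.w, self._inputs(x))]

def make_partitions(dim, n_parts):
    size = -(-dim // n_parts)  # ceiling division
    return [Partition(min(size, dim - s), s) for s in range(0, dim, size)]

def sgd_step(parts, x, target, lr=0.1):
    y = sum(p.partial_dot(x) for p in parts)  # only partial sums are combined
    err = y - target
    for p in parts:
        p.sgd_update(x, err, lr)
    return 0.5 * err ** 2

# Toy usage: learn y = 2*x0 - x3 with 4 weights split over 2 partitions.
parts = make_partitions(4, 2)
samples = [([1.0, 0.0, 0.0, 0.0], 2.0), ([0.0, 1.0, 0.0, 0.0], 0.0),
           ([0.0, 0.0, 1.0, 0.0], 0.0), ([0.0, 0.0, 0.0, 1.0], -1.0)]
for epoch in range(200):
    for x, t in samples:
        sgd_step(parts, x, t)
weights = [w for p in parts for w in p.w]
```

In this sketch only the partial sums and the scalar error cross partition boundaries; the parameters themselves never leave their partition.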
Model partition with DistBelief
But training is still slow on large data sets.
Can we add more parallelism?
Idea: Train multiple models on different partitions of the data, and merge them.

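Data partition with DistBelief
(figure: Model Workers, each training on its own Data Shard, send parameter updates $\Delta p$ to a Parameter Server, which applies $p' = p + \Delta p$ and returns the new parameters $p'$ to the workers)
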
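A toy, single-process sketch of the parameter-server pattern in the figure. Assumptions not on the slide: Python threads stand in for model workers, the model is a hypothetical linear fit y = w·x + b, and the data are noise-free and split into shards. Each worker fetches p, computes Δp from its own shard, and pushes it; the server applies p' = p + Δp without waiting for the other workers:

```python
import threading

class ParameterServer:
    """Holds the shared parameters p and applies updates p' = p + delta_p."""

    def __init__(self, p):
        self.p = list(p)
        self.lock = threading.Lock()

    def fetch(self):
        with self.lock:
            return list(self.p)

    def push(self, delta):
        with self.lock:
            self.p = [pi + di for pi, di in zip(self.p, delta)]

def worker(server, shard, steps, lr):
    """A model replica: fetch p, compute delta_p on its own shard, push it back."""
    for step in range(steps):
        w, b = server.fetch()                     # may be stale by the time we push
        x, y = shard[step % len(shard)]
        err = (w * x + b) - y                     # from 0.5 * (w*x + b - y)**2
        server.push([-lr * err * x, -lr * err])   # delta_p = -lr * gradient

def train(data, n_workers=4, steps=2000, lr=0.1):
    server = ParameterServer([0.0, 0.0])
    shards = [data[i::n_workers] for i in range(n_workers)]  # data shards
    threads = [threading.Thread(target=worker, args=(server, s, steps, lr))
               for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.fetch()

data = [(i / 20, 3.0 * (i / 20) - 1.0) for i in range(20)]  # y = 3x - 1
w, b = train(data)
```

Because workers fetch and push independently, a worker may compute Δp from parameters that other workers have already changed; this toy tolerates that because every shard shares the same optimum.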
Parallelism in DistBelief
•  Model parallelism via model partitioning
•  Data parallelism via data partitioning and asynchronous communication
DistBelief can scale to billions of examples and use 100,000 cores or more.
Thanks to its speed, DistBelief dramatically improves many applications.

Applications
•  Voice Search
•  Photo Search
•  Text Understanding

Voice Search
(figure: a speech frame is fed through hidden layers with 1000s of nodes to a classifier that outputs a label)

Photo Search

(figure: the Cat detector, as featured on the front page of the New York Times)

(figure: example image labels: Seat-belt, Archery, Boston rocker, Shredder)

(figure: example image labels: Face, Amusement park, Hammock)

Google+ Photo Search

Text understanding
•  Very useful but also difficult
•  We should try to understand the meaning of words
•  Deep Learning can learn the meaning of words

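Text understanding
(figure: words such as Clinton, Obama, Paris, whale and dolphin shown as points in a ~100-D vector space, with related words such as Clinton and Obama, or whale and dolphin, placed near each other)
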
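Predicting the next word in a sentence
(figure: the words the, cat, sat, on, the are each mapped through the Word Matrix E into Hidden Layers, topped by a Classifier that predicts the next word)
$E$ is a matrix of dimension $|\mathrm{Vocab}| \times d$, where $d$ is the dimension of the word vectors.
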
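A sketch of the next-word model in the figure, under assumptions that are not on the slide: a five-word toy vocabulary, d = 4, a four-word context whose rows of E are concatenated, one tanh hidden layer, and a softmax classifier. The weights are random and untrained, so the predicted word is arbitrary; training would adjust E together with the other weights:

```python
import math
import random

random.seed(0)
vocab = ["the", "cat", "sat", "on", "mat"]  # hypothetical toy vocabulary
word_id = {w: i for i, w in enumerate(vocab)}
d, h, context_len = 4, 8, 4                 # assumed sizes

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

E = rand_matrix(len(vocab), d)               # Word Matrix: |Vocab| x d
W_h = rand_matrix(h, context_len * d)        # hidden layer over concatenated word vectors
W_out = rand_matrix(len(vocab), h)           # classifier: one score per vocabulary word

def matvec(W, x):
    return [sum(a * b for a, b in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_probs(context):
    # Look up each context word's row of E and concatenate the rows.
    x = [v for w in context for v in E[word_id[w]]]
    hidden = [math.tanh(z) for z in matvec(W_h, x)]
    return softmax(matvec(W_out, hidden))

probs = next_word_probs(["the", "cat", "sat", "on"])
best = vocab[max(range(len(vocab)), key=probs.__getitem__)]
```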
Visualizing the word vectors
•  Example nearest neighbors, with word vectors trained on Google News
(figure: nearest-neighbor lists for the query words apple, Apple, iPhone)

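Nearest neighbors in a word-vector space are commonly found by cosine similarity. A minimal sketch with hypothetical 3-D vectors, made up for illustration rather than trained on Google News:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest_neighbors(query, vectors, k=2):
    """Return the k words whose vectors have the highest cosine similarity to query's."""
    q = vectors[query]
    scored = [(cosine(q, v), w) for w, v in vectors.items() if w != query]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]

# Hypothetical vectors, for illustration only.
vectors = {
    "apple":  [0.9, 0.1, 0.0],
    "pear":   [0.8, 0.2, 0.1],
    "banana": [0.7, 0.3, 0.0],
    "iPhone": [0.1, 0.9, 0.2],
    "iPad":   [0.0, 0.8, 0.3],
}
print(nearest_neighbors("iPhone", vectors, k=1))  # prints ['iPad']
```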
Relation Extraction
Mikolov, Sutskever, Le. “Learning the Meaning behind Words.” Google Open Source Blog, 2013.

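One common way to read a relation off word vectors, used in the word2vec line of work, is a vector offset: if capital - country is roughly the same vector for every pair, then Paris - France + Italy should land near Rome. A sketch with hypothetical, hand-made vectors chosen so the offset is exact:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]  # exclude the query words
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# Hypothetical vectors: each capital = its country + [0, 1, 0].
vectors = {
    "France":  [1.0, 0.0, 0.0],
    "Paris":   [1.0, 1.0, 0.0],
    "Italy":   [0.0, 0.0, 1.0],
    "Rome":    [0.0, 1.0, 1.0],
    "Germany": [0.5, 0.0, -0.5],
    "Berlin":  [0.5, 1.0, -0.5],
}
print(analogy("France", "Paris", "Italy", vectors))  # prints Rome
```

With trained vectors the offset is only approximate, which is why the nearest candidate by cosine similarity is taken rather than an exact match.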
Machine Translation

Summary
•  Model partition
•  Data partition
•  Voice Search
•  Photo Search
•  Text Understanding

Joint work with: Kai Chen, Greg Corrado, Rajat Monga, Andrew Ng, Jeff Dean, Matthieu Devin, Paul Tucker, Ke Yang
Additional thanks: Samy Bengio, Tom Dean, Josh Levenberg, Geoff Hinton, Tomas Mikolov, Mark Mao, Patrick Nguyen, Marc’Aurelio Ranzato, Mark Segal, Jon Shlens, Ilya Sutskever, Vincent Vanhoucke
