Quoc Le, slides, MLconf 11/15/13
Published in: Technology, Education

Transcript

  • 1. Large Scale Deep Learning. Quoc V. Le, Google & CMU
  • 2-3. Deep Learning. Google is using machine learning; machine learning is difficult; it requires domain knowledge from human experts. Deep learning: great performance on many problems; works well with large amounts of data; requires less domain knowledge. Focus: scale deep learning to bigger models and bigger problems.
  • 4. What is Deep Learning?
  • 5-6. What is Deep Learning? A stack of layers, read from the bottom up: input x (images, audio, text, etc.); first layer u = g(A x), with weights A; second layer v = g(B u), with weights B; further layers (…) above.
  • 7. High-level features by Deep Learning. From the lowest layer to the highest: pixels; edge detectors; …; face detector, cat detector.
  • 8. Google’s DistBelief. Goal: train deep learning on many machines. Model: a multi-layered architecture, with a forward pass to compute the features and a backward pass to compute the gradient. (Diagram labels: Model, Training Data.)
  • 9-10. Model partition with DistBelief. DistBelief distributes a model across multiple machines and multiple cores. (Diagram labels: Model, Machine (Model Partition), Core, Training Data.)
  • 11. Model partition with DistBelief. Stochastic Gradient Descent (SGD); model parameters are partitioned; can use up to 1000 cores. (Diagram labels: Model, Training Data.)
  • 12. Model partition with DistBelief. But training is still slow on large data sets. Can we add more parallelism? Idea: train multiple models on different partitions of the data, and merge them. (Diagram labels: Model, Training Data.)
  • 13. Data partition with DistBelief. Model workers, each reading its own data shard, send updates ∆p to the parameter server; the parameter server applies p’ = p + ∆p and sends the updated parameters p’ back to the workers.
  • 14. Parallelism in DistBelief. Model parallelism via model partitioning; data parallelism via data partitioning and asynchronous communication. DistBelief can scale to a billion examples and use 100,000 cores or more. Thanks to its speed, DistBelief dramatically improves many applications.
  • 15. Applications: Voice Search, Photo Search, Text Understanding
  • 16. Voice Search. A speech frame is fed through hidden layers with 1000s of nodes to a classifier, which outputs a label.
  • 17. Voice Search
  • 18. Applications: Voice Search, Photo Search, Text Understanding
  • 19. Photo Search
  • 20. Cat detector: front page of the New York Times
  • 21. Image labels: seat belt, archery, Boston rocker, shredder
  • 22. Image labels: face, amusement park, hammock
  • 23. Google+ Photo Search
  • 24. Applications: Voice Search, Photo Search, Text Understanding
  • 25. Text understanding. Very useful but also difficult. We should try to understand the meaning of words. Deep learning can learn the meaning of words.
  • 26. Text understanding. Words placed in a ~100-D vector space: Clinton, Paris, Obama, whale, dolphin.
  • 27. Predicting the next word in a sentence. Each input word (the, cat, sat, on, the) is looked up in the word matrix E, then passed through hidden layers to a classifier. E is a matrix of dimension ||Vocab|| x d.
  • 28. Visualizing the word vectors. Example nearest neighbors, trained on Google News, for: apple, Apple, iPhone.
  • 29. Relation Extraction. Mikolov, Sutskever, Le. Learning the Meaning behind Words. Google Open Source Blog, 2013.
  • 30. Machine Translation
  • 31. Summary: model partition, data partition; Voice Search, Photo Search, Text Understanding
  • 32. Joint work with Kai Chen, Greg Corrado, Rajat Monga, Andrew Ng, Jeff Dean, Matthieu Devin, Paul Tucker, Ke Yang. Additional thanks: Samy Bengio, Tom Dean, Josh Levenberg, Geoff Hinton, Tomas Mikolov, Mark Mao, Patrick Nguyen, Marc’Aurelio Ranzato, Mark Segal, Jon Shlens, Ilya Sutskever, Vincent Vanhoucke
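Note on slides 5 and 6: the layer equations u = g(A x) and v = g(B u) can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the talk. The slides do not say what g is; the logistic sigmoid used here is an assumption, and the helper names (`matvec`, `forward`) and the tiny weight matrices are made up for the example.

```python
import math

def g(z):
    """Elementwise nonlinearity. The slides only call it g; sigmoid is an assumption."""
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def forward(x, A, B):
    """Two stacked layers: u = g(A x), then v = g(B u)."""
    u = g(matvec(A, x))
    v = g(matvec(B, u))
    return u, v

# Hypothetical toy input and weights: 3 inputs -> 2 hidden units -> 1 output.
x = [0.5, -1.0, 2.0]
A = [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.0]]
B = [[1.0, -1.0]]
u, v = forward(x, A, B)
print(len(u), len(v))
```

Deeper models repeat the same pattern, feeding each layer's output into the next weight matrix.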
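Note on slides 9 to 11: one way to picture model partitioning is to split a layer's weight matrix by rows, let each "machine" compute its own slice of the output, and concatenate the slices. The sketch below simulates this in a single process. Row-wise splitting and the function names are assumptions made for illustration; the slides do not describe how DistBelief actually divides a model.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def partition_rows(M, n_parts):
    """Split the rows of M into at most n_parts contiguous blocks."""
    size = -(-len(M) // n_parts)  # ceiling division
    return [M[i:i + size] for i in range(0, len(M), size)]

def partitioned_matvec(M, x, n_parts):
    """Each block stands in for one machine that holds only its own rows."""
    out = []
    for block in partition_rows(M, n_parts):
        out.extend(matvec(block, x))  # a real system would run these in parallel
    return out

# Hypothetical 4x2 weight matrix and input vector.
M = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
full = matvec(M, x)
split = partitioned_matvec(M, x, n_parts=2)
print(full == split)
```

Because each output row depends only on its own weights, the partitioned result matches the unpartitioned one; in a real system the cost is in shipping x and the partial outputs between machines.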
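Note on slides 13 and 14: the parameter-server update p’ = p + ∆p can be simulated in one process. In this sketch each worker computes ∆p as a negative, learning-rate-scaled gradient on its own data shard; that choice, the one-parameter least-squares model, and the class and function names are all assumptions for illustration. Asynchrony is imitated by having every worker pull parameters before any worker pushes, so the later pushes are based on stale values.

```python
class ParameterServer:
    """Holds the parameters p and applies p' = p + delta."""
    def __init__(self, p):
        self.p = list(p)

    def pull(self):
        return list(self.p)

    def push(self, delta):
        self.p = [pi + di for pi, di in zip(self.p, delta)]

def worker_delta(p, shard, lr):
    """Gradient step for the toy model y = w * x with squared error (an assumption)."""
    w = p[0]
    grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
    return [-lr * grad]

def train(shards, rounds=50, lr=0.02):
    server = ParameterServer([0.0])
    for _ in range(rounds):
        # All workers pull first, so every push after the first one
        # was computed from parameters that are already out of date.
        stale = [server.pull() for _ in shards]
        for p, shard in zip(stale, shards):
            server.push(worker_delta(p, shard, lr))
    return server.p

# Hypothetical data generated by y = 3x, split into two shards.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = train(shards)[0]
print(round(w, 6))
```

With this small learning rate the stale updates still converge to the true slope; larger steps or more staleness can make such a scheme diverge.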
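Note on slide 27: the word matrix E has one row of d numbers per vocabulary word, so feeding a context into the network starts with a row lookup per word. The sketch below shows only that lookup step. The tiny vocabulary, the placeholder values in E, and the choice to concatenate the looked-up rows before the hidden layers are assumptions; the slide does not say how the rows are combined.

```python
# Hypothetical vocabulary and embedding size.
vocab = ["the", "cat", "sat", "on", "mat"]
d = 3

# E has dimension |Vocab| x d. Real values would be learned; these are placeholders.
E = [[float(i * d + j) for j in range(d)] for i in range(len(vocab))]
index = {word: i for i, word in enumerate(vocab)}

def embed(words):
    """Look up each word's row in E and concatenate the rows (an assumption)."""
    features = []
    for word in words:
        features.extend(E[index[word]])
    return features

context = ["the", "cat", "sat", "on", "the"]
features = embed(context)
print(len(features))
```

The resulting vector has length 5 x d here and would be the input to the hidden layers; both occurrences of "the" share the same row of E.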
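Note on slides 26 and 28: once words are vectors, "nearest neighbors" means ranking the other words by closeness to a query vector. The sketch below uses cosine similarity, which is an assumption since the slides do not name the distance measure. The 3-D vectors are invented for the example (the slide mentions roughly 100 dimensions) and are not taken from any trained model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word, vectors, k=1):
    """Return the k words whose vectors are most similar to the query word's."""
    query = vectors[word]
    scored = [(cosine(query, vec), other)
              for other, vec in vectors.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Invented toy vectors, chosen so that related words point in similar directions.
vectors = {
    "whale": [0.9, 0.1, 0.0],
    "dolphin": [0.8, 0.2, 0.1],
    "Clinton": [0.0, 0.9, 0.3],
    "Obama": [0.1, 0.8, 0.4],
    "Paris": [0.1, 0.2, 0.9],
}
print(nearest("whale", vectors), nearest("Obama", vectors))
```

With learned embeddings the same ranking step is what produces neighbor lists like the ones the slide shows for apple, Apple, and iPhone.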
