Deep Learning Cases: Text and Image Processing

@Founders & Developers meetup April 3 2016: Deep Learning Unicorns

Published in: Technology
Deep Learning Cases: Text and Image Processing

  1. Deep Learning Cases: Text and Image Processing
     Grigory Sapunov
     Founders & Developers: Deep Learning Unicorns
     Moscow, 03.04.2016
     gs@inten.to
  2. “Simple” Image & Video Processing
  3. Simple tasks: Classification and Detection
     http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
     Detection is harder than classification, but both are almost solved, and at better-than-human quality.
  4. Case #1: IJCNN 2011
     The German Traffic Sign Recognition Benchmark
     ● Classification, >40 classes
     ● >50,000 real-life images
     ● First Superhuman Visual Pattern Recognition
       ○ 2x better than humans
       ○ 3x better than the closest artificial competitor
       ○ 6x better than the best non-neural method

     Rank  Method             Correct   Error
     1     Committee of CNNs  99.46 %   0.54 %
     2     Human Performance  98.84 %   1.16 %
     3     Multi-Scale CNNs   98.31 %   1.69 %
     4     Random Forests     96.14 %   3.86 %

     http://people.idsia.ch/~juergen/superhumanpatternrecognition.html
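
The "2x / 3x / 6x better" bullets on the traffic sign slide are presumably ratios of error rates; that reading is an assumption, since the slide does not say how they were computed. A minimal Python sketch that recomputes the ratios from the table (the function name `error_ratios` is made up for illustration):

```python
# Error rates (%) copied from the results table on the slide.
error_pct = {
    "Committee of CNNs": 0.54,
    "Human Performance": 1.16,
    "Multi-Scale CNNs": 1.69,
    "Random Forests": 3.86,
}


def error_ratios(errors, winner="Committee of CNNs"):
    """Return how many times larger each method's error is than the winner's."""
    base = errors[winner]
    return {name: err / base for name, err in errors.items() if name != winner}


ratios = error_ratios(error_pct)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the winner's error")
```

Under this reading the human and Multi-Scale CNN figures come out slightly above 2x and 3x, and the Random Forests figure is closer to 7x than 6x, so the slide's numbers look like rounded-down approximations.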
  5. Case #2: ILSVRC 2010-2015
     Large Scale Visual Recognition Challenge (ILSVRC)
     ● Object detection (200 categories, ~0.5M images)
     ● Classification + localization (1000 categories, 1.2M images)
  6. Case #2: ILSVRC 2010-2015
     ● Blue: Traditional CV
     ● Purple: Deep Learning
     ● Red: Human
  7. Examples: Object Detection
  8. Example: Face Detection + Emotion Classification
  9. Example: Face Detection + Classification + Regression
  10. Examples: Food Recognition
  11. Examples: Computer Vision on the Road
  12. Examples: Pedestrian Detection
  13. Examples: Activity Recognition
  14. Examples: Road Sign Recognition (on mobile!)
  15. Deep Learning goes mobile!
      ● NVidia Jetson TK1/TX1
        ○ 192/256 CUDA cores
        ○ Quad-core ARM Cortex-A15 (32-bit) / Cortex-A57 (64-bit) CPU, 2/4 GB memory
      ● Raspberry Pi 3
        ○ 1.2 GHz 64-bit quad-core ARM Cortex-A53, 1 GB SDRAM, US$35
      ● Tablets, smartphones
      ● Google Project Tango
  16. ...even more mobile
      http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/
      This drone can automatically follow forest trails to track down lost hikers
  17. ...even homemade automobile
      Meet the 26-Year-Old Hacker Who Built a Self-Driving Car... in His Garage
      https://www.youtube.com/watch?v=KTrgRYa2wbI
  18. More complex Image & Video Processing
  19. Semantic Segmentation
      NYU Semantic Segmentation with a Convolutional Network (33 categories)
      https://www.youtube.com/watch?v=ZJMtDRbqH40
  20. Caption Generation
      http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
  21. Example: NeuralTalk and Walk
      Ingredients:
      ● https://github.com/karpathy/neuraltalk2
        Project for learning Multimodal Recurrent Neural Networks that describe images with sentences
      ● Webcam/notebook
      Result:
      ● https://vimeo.com/146492001
  22. More hacking: NeuralTalk and Walk
  23. Product of the near future: DenseCap and ?
      http://arxiv.org/abs/1511.07571 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
  24. Image Colorization
      http://richzhang.github.io/colorization/
  25. Visual Question Answering
      https://avisingh599.github.io/deeplearning/visual-qa/
  26. Reinforcement Learning
      Driving a simulated car from video input (2013)
      http://people.idsia.ch/~juergen/gecco2013torcs.pdf
      http://people.idsia.ch/~juergen/compressednetworksearch.html
  27. Reinforcement Learning
  28. Reinforcement Learning
      Human-level control through deep reinforcement learning (2015)
      http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html
      Playing Atari with Deep Reinforcement Learning (2013)
      http://arxiv.org/abs/1312.5602
  29. Reinforcement Learning
  30. Fun: Deep Dream
      http://blogs.wsj.com/digits/2016/02/29/googles-computers-paint-like-van-gogh-and-the-art-sells-for-thousands/
  31. More Fun: Neural Style
      http://www.dailymail.co.uk/sciencetech/article-3214634/The-algorithm-learn-copy-artist-Neural-network-recreate-snaps-style-Van-Gogh-Picasso.html
  32. More Fun: Neural Style
      http://www.boredpanda.com/inceptionism-neural-network-deep-dream-art/
  33. More Fun: Photo-realistic Synthesis
      http://arxiv.org/abs/1601.04589 Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis
  34. More Fun: Neural Doodle
      http://arxiv.org/abs/1603.01768 Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks
      (a) Original painting by Renoir, (b) semantic annotations, (c) desired layout, (d) generated output.
  35. Text Processing / NLP
  36. Deep Learning and NLP
      Variety of tasks:
      ● Finding synonyms
      ● Fact extraction: people and company names, geography, prices, dates, product names, …
      ● Classification: genre and topic detection, positive/negative sentiment analysis, authorship detection, …
      ● Machine translation
      ● Search (written and spoken)
      ● Question answering
      ● Dialog systems
      ● Language modeling, part-of-speech tagging
  37. Example: Semantic Spaces (word2vec, GloVe)
      https://code.google.com/archive/p/word2vec/
  38. Example: Semantic Spaces (word2vec, GloVe)
      http://nlp.stanford.edu/projects/glove/
  39. Encoding semantics
      Using word2vec vectors instead of word indexes lets you handle word meanings better (e.g. there is no need to enumerate all synonyms, because their vectors are already close to each other).
      But the naive way of working with word2vec vectors still gives you a “bag of words” model, in which the phrases “The man killed the tiger” and “The tiger killed the man” are identical.
      You need models that pay attention to word order: paragraph2vec, sentence embeddings (using RNN/LSTM), even World2Vec (LeCun @CVPR2015).
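
The bag-of-words problem on the "Encoding semantics" slide can be reproduced with a few lines of Python. This is only a sketch: the 3-dimensional vectors are hypothetical toy values, not real word2vec output, and the position-weighted sum is a deliberately crude stand-in for an order-aware model, not paragraph2vec or an RNN/LSTM.

```python
# Toy 3-dimensional "word vectors". The values are hypothetical and chosen
# only for illustration; they are not real word2vec output.
VECTORS = {
    "the":    [0.0, 0.0, 0.5],
    "man":    [1.0, 0.0, 0.0],
    "tiger":  [0.0, 1.0, 0.0],
    "killed": [0.5, 0.5, 1.0],
}
DIMS = 3


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def mean_vector(text):
    """Naive sentence embedding: average the word vectors (word order is lost)."""
    words = tokenize(text)
    total = [0.0] * DIMS
    for word in words:
        for i, value in enumerate(VECTORS[word]):
            total[i] += value
    return [value / len(words) for value in total]


def position_weighted_vector(text):
    """Toy order-aware embedding: weight each word vector by its 1-based position."""
    total = [0.0] * DIMS
    for position, word in enumerate(tokenize(text), start=1):
        for i, value in enumerate(VECTORS[word]):
            total[i] += position * value
    return total


a = "The man killed the tiger"
b = "The tiger killed the man"

same_under_averaging = mean_vector(a) == mean_vector(b)
same_with_positions = position_weighted_vector(a) == position_weighted_vector(b)
print("averaging cannot tell them apart:", same_under_averaging)
print("position weighting cannot tell them apart:", same_with_positions)
```

Averaging treats the two sentences as the same multiset of words, so their embeddings coincide; anything that depends on position separates them.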
  40. Multi-modal learning
      http://arxiv.org/abs/1411.2539 Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
  41. Example: More multi-modal learning
  42. Case: Sentiment analysis
      http://nlp.stanford.edu/sentiment/
      Can capture complex cases where bag-of-words models fail.
      “This movie was actually neither that funny, nor super witty.”
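
To see why a bag-of-words model fails on the example sentence from the sentiment slide, here is a minimal sketch of a lexicon-based scorer. The lexicon and the function name `bag_of_words_score` are hypothetical and exist only for this illustration; they are not taken from the Stanford sentiment project.

```python
import re

# Hypothetical toy sentiment lexicon, invented for this illustration.
LEXICON = {"funny": 1, "witty": 1, "boring": -1, "dull": -1}


def bag_of_words_score(text):
    """Sum per-word sentiment scores, ignoring word order and negation."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


sentence = "This movie was actually neither that funny, nor super witty."
score = bag_of_words_score(sentence)
print("bag-of-words score:", score)
```

The scorer only sees "funny" and "witty", so it rates a clearly negative sentence as positive; the "neither ... nor" construction is invisible to it.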
  43. Case: Machine Translation
      Sequence to Sequence Learning with Neural Networks, http://arxiv.org/abs/1409.3215
  44. Case: Automated Speech Translation
      Translating voice calls and video calls in 7 languages and instant messages in over 50.
      https://www.skype.com/en/features/skype-translator/
  45. Case: Baidu Automated Speech Recognition (ASR)
  46. More Fun: MtG cards
      http://www.escapistmagazine.com/articles/view/scienceandtech/14276-Magic-The-Gathering-Cards-Made-by-Artificial-Intelligence
  47. Case: Question Answering
      A Neural Network for Factoid Question Answering over Paragraphs, https://cs.umd.edu/~miyyer/qblearn/
  48. Case: Dialogue Systems
      A Neural Conversational Model, Oriol Vinyals, Quoc Le
      http://arxiv.org/abs/1506.05869
  49. What for: Conversational Commerce
      https://medium.com/chris-messina/2016-will-be-the-year-of-conversational-commerce-1586e85e3991
  50. What for: Conversational Commerce
  51. Summary
  52. Why is Deep Learning helpful? Or even a game-changer?
      ● Works on raw data (pixels, sound, text or chars), no need for feature engineering
        ○ Some features are really hard to develop (requiring years of work by a group of experts)
        ○ Some features are patented (e.g. SIFT, SURF for images)
      ● Allows end-to-end learning (pixels to category, sound to sentence, English sentence to Chinese sentence, etc.)
        ○ No need to do segmentation, etc. (a lot of manual labor)
      ⇒ You can iterate faster (and get superior quality at the same time!)
  53. Still some issues exist
      ● No dataset -- no deep learning
        There is a lot of data available (and it’s required for deep learning, otherwise simple models could be better)
        ○ But sometimes you have no dataset…
          ■ Nonetheless, some hacks are available: transfer learning, data augmentation, Mechanical Turk, …
      ● Requires a lot of computation. Without a cluster or GPU machines, much more time is required
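
Of the "hacks" listed on the issues slide, data augmentation is the easiest to sketch. The snippet below is a minimal, library-free illustration, assuming an image is just a list of pixel rows; the function names are made up for this example, and a real pipeline would normally use an image library and random rather than exhaustive crops.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def crops(image, size):
    """Yield every size x size crop of the image (all integer shifts)."""
    height, width = len(image), len(image[0])
    for top in range(height - size + 1):
        for left in range(width - size + 1):
            yield [row[left:left + size] for row in image[top:top + size]]


def augment(image, crop_size):
    """Return all crops of the image and of its mirror image."""
    augmented = []
    for variant in (image, horizontal_flip(image)):
        augmented.extend(crops(variant, crop_size))
    return augmented


# A tiny 4x4 "image" whose pixel values make the transformations easy to follow.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
augmented = augment(image, crop_size=3)
print("training examples from one image:", len(augmented))
```

One labeled image becomes eight training examples here. Whether a transformation keeps the label valid depends on the task: mirroring is usually harmless for food photos but can change the meaning of a road sign with an arrow.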
  54. So what to do next?
  55. Universal Libraries and Frameworks
      ● Torch7 (http://torch.ch/)
      ● TensorFlow (https://www.tensorflow.org/)
      ● Theano (http://deeplearning.net/software/theano/)
        ○ Keras (http://keras.io/)
        ○ Lasagne (https://github.com/Lasagne/Lasagne)
        ○ blocks (https://github.com/mila-udem/blocks)
        ○ pylearn2 (https://github.com/lisa-lab/pylearn2)
      ● CNTK (http://www.cntk.ai/)
      ● Neon (http://neon.nervanasys.com/)
      ● Deeplearning4j (http://deeplearning4j.org/)
      ● Google Prediction API (https://cloud.google.com/prediction/)
      ● …
      ● http://deeplearning.net/software_links/
  56. Libraries & Frameworks for image/video processing
      ● OpenCV (http://opencv.org/)
      ● Caffe (http://caffe.berkeleyvision.org/)
      ● Torch7 (http://torch.ch/)
      ● clarifai (http://clarif.ai/)
      ● Google Vision API (https://cloud.google.com/vision/)
      ● …
      ● + all universal libraries
  57. Libraries & Frameworks for speech
      ● CNTK (http://www.cntk.ai/)
      ● KALDI (http://kaldi-asr.org/)
      ● Google Speech API (https://cloud.google.com/)
      ● Yandex SpeechKit (https://tech.yandex.ru/speechkit/)
      ● Baidu Speech API (http://www.baidu.com/)
      ● wit.ai (https://wit.ai/)
      ● …
  58. Libraries & Frameworks for text processing
      ● Torch7 (http://torch.ch/)
      ● Theano/Keras/…
      ● TensorFlow (https://www.tensorflow.org/)
      ● MetaMind (https://www.metamind.io/)
      ● Google Translate API (https://cloud.google.com/translate/)
      ● …
      ● + all universal libraries
  59. What to read and where to study?
      - CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Stanford (http://vision.stanford.edu/teaching/cs231n/index.html)
      - CS224d: Deep Learning for Natural Language Processing, Richard Socher, Stanford (http://cs224d.stanford.edu/index.html)
      - Neural Networks for Machine Learning, Geoffrey Hinton (https://www.coursera.org/course/neuralnets)
      - Computer Vision course collection (http://eclass.cc/courselists/111_computer_vision_and_navigation)
      - Deep learning course collection (http://eclass.cc/courselists/117_deep_learning)
      - Book “Deep Learning”, Ian Goodfellow, Yoshua Bengio and Aaron Courville (http://www.deeplearningbook.org/)
  60. What to read and where to study?
      - Google+ Deep Learning community (https://plus.google.com/communities/112866381580457264725)
      - VK Deep Learning community (http://vk.com/deeplearning)
      - Quora (https://www.quora.com/topic/Deep-Learning)
      - FB Deep Learning Moscow (https://www.facebook.com/groups/1505369016451458/)
      - Twitter Deep Learning Hub (https://twitter.com/DeepLearningHub)
      - NVidia blog (https://devblogs.nvidia.com/parallelforall/tag/deep-learning/)
      - IEEE Spectrum blog (http://spectrum.ieee.org/blog/cars-that-think)
      - http://deeplearning.net/
      - Arxiv Sanity Preserver http://www.arxiv-sanity.com/
      - ...
  61. Whom to follow?
      - Jürgen Schmidhuber (http://people.idsia.ch/~juergen/)
      - Geoffrey E. Hinton (http://www.cs.toronto.edu/~hinton/)
      - Google DeepMind (http://deepmind.com/)
      - Yann LeCun (http://yann.lecun.com, https://www.facebook.com/yann.lecun)
      - Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy, https://www.quora.com/profile/Yoshua-Bengio)
      - Andrej Karpathy (http://karpathy.github.io/)
      - Andrew Ng (http://www.andrewng.org/)
      - ...
  62. Thanks!
      https://ru.linkedin.com/in/grigorysapunov
      gs@inten.to
