Deep Learning Cases: Text and Image Processing


  1. 1. Deep Learning Cases: Text and Image Processing Grigory Sapunov Founders & Developers: Deep Learning Unicorns Moscow 03.04.2016 gs@inten.to
  2. 2. “Simple” Image & Video Processing
  3. 3. Simple tasks: Classification and Detection http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf Detection is harder than classification, but both are close to solved, with better-than-human quality on some benchmarks.
  4. 4. Case #1: IJCNN 2011 The German Traffic Sign Recognition Benchmark ● Classification, >40 classes ● >50,000 real-life images ● First Superhuman Visual Pattern Recognition ○ 2x better than humans ○ 3x better than the closest artificial competitor ○ 6x better than the best non-neural method Results, listed as rank, method, correct (error): 1. Committee of CNNs, 99.46% (0.54%); 2. Human Performance, 98.84% (1.16%); 3. Multi-Scale CNNs, 98.31% (1.69%); 4. Random Forests, 96.14% (3.86%) http://people.idsia.ch/~juergen/superhumanpatternrecognition.html
  5. 5. Case #2: ILSVRC 2010-2015 Large Scale Visual Recognition Challenge (ILSVRC) ● Object detection (200 categories, ~0.5M images) ● Classification + localization (1000 categories, 1.2M images)
  6. 6. Case #2: ILSVRC 2010-2015 ● Blue: Traditional CV ● Purple: Deep Learning ● Red: Human
  7. 7. Examples: Object Detection
  8. 8. Example: Face Detection + Emotion Classification
  9. 9. Example: Face Detection + Classification + Regression
  10. 10. Examples: Food Recognition
  11. 11. Examples: Computer Vision on the Road
  12. 12. Examples: Pedestrian Detection
  13. 13. Examples: Activity Recognition
  14. 14. Examples: Road Sign Recognition (on mobile!)
  15. 15. Deep Learning goes mobile! ● NVIDIA Jetson TK1/TX1 ○ 192/256 CUDA cores ○ Quad-core ARM Cortex-A15 (32-bit) / Cortex-A57 (64-bit) CPU, 2/4 GB memory ● Raspberry Pi 3 ○ 1.2 GHz 64-bit quad-core ARM Cortex-A53, 1 GB SDRAM, US$35 ● Tablets, Smartphones ● Google Project Tango
  16. 16. ...even more mobile http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/ This drone can automatically follow forest trails to track down lost hikers
  17. 17. ...even homemade automobile Meet the 26-Year-Old Hacker Who Built a Self-Driving Car... in His Garage https://www.youtube.com/watch?v=KTrgRYa2wbI
  18. 18. More complex Image & Video Processing
  19. 19. https://www.youtube.com/watch?v=ZJMtDRbqH40 NYU Semantic Segmentation with a Convolutional Network (33 categories) Semantic Segmentation
  20. 20. Caption Generation http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
  21. 21. Example: NeuralTalk and Walk Ingredients: ● https://github.com/karpathy/neuraltalk2 Project for learning Multimodal Recurrent Neural Networks that describe images with sentences ● Webcam/notebook Result: ● https://vimeo.com/146492001
  22. 22. More hacking: NeuralTalk and Walk
  23. 23. Product of the near future: DenseCap and ? http://arxiv.org/abs/1511.07571 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
  24. 24. Image Colorization http://richzhang.github.io/colorization/
  25. 25. Visual Question Answering https://avisingh599.github.io/deeplearning/visual-qa/
  26. 26. Reinforcement Learning Controlling a simulated car from video input (2013) http://people.idsia.ch/~juergen/gecco2013torcs.pdf http://people.idsia.ch/~juergen/compressednetworksearch.html
  27. 27. Reinforcement Learning
  28. 28. Reinforcement Learning Human-level control through deep reinforcement learning (2015) http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html Playing Atari with Deep Reinforcement Learning (2013) http://arxiv.org/abs/1312.5602
  29. 29. Reinforcement Learning
  30. 30. Fun: Deep Dream http://blogs.wsj.com/digits/2016/02/29/googles-computers-paint-like-van-gogh-and-the-art-sells-for-thousands/
  31. 31. More Fun: Neural Style http://www.dailymail.co.uk/sciencetech/article-3214634/The-algorithm-learn-copy-artist-Neural-network-recreate-snaps-style-Van-Gogh-Picasso.html
  32. 32. More Fun: Neural Style http://www.boredpanda.com/inceptionism-neural-network-deep-dream-art/
  33. 33. More Fun: Photo-realistic Synthesis http://arxiv.org/abs/1601.04589 Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis
  34. 34. More Fun: Neural Doodle http://arxiv.org/abs/1603.01768 Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks (a) Original painting by Renoir, (b) semantic annotations, (c) desired layout, (d) generated output.
  35. 35. Text Processing / NLP
  36. 36. Deep Learning and NLP Variety of tasks: ● Finding synonyms ● Fact extraction: people and company names, geography, prices, dates, product names, … ● Classification: genre and topic detection, positive/negative sentiment analysis, authorship detection, … ● Machine translation ● Search (written and spoken) ● Question answering ● Dialog systems ● Language modeling, part-of-speech tagging
  37. 37. https://code.google.com/archive/p/word2vec/ Example: Semantic Spaces (word2vec, GloVe)
  38. 38. http://nlp.stanford.edu/projects/glove/ Example: Semantic Spaces (word2vec, GloVe)
  39. 39. Encoding semantics Using word2vec vectors instead of word indexes lets you handle word meaning better (e.g. there is no need to enumerate all synonyms, because their vectors are already close to each other). But the naive way of working with word2vec vectors still gives you a “bag of words” model, in which the phrases “The man killed the tiger” and “The tiger killed the man” get the same representation. We need models that pay attention to word order: paragraph2vec, sentence embeddings (using RNN/LSTM), even World2Vec (LeCun @CVPR2015).
  40. 40. Multi-modal learning http://arxiv.org/abs/1411.2539 Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
  41. 41. Example: More multi-modal learning
  42. 42. Case: Sentiment analysis http://nlp.stanford.edu/sentiment/ Can capture complex cases where bag-of-words models fail. “This movie was actually neither that funny, nor super witty.”
  43. 43. Case: Machine Translation Sequence to Sequence Learning with Neural Networks, http://arxiv.org/abs/1409.3215
  44. 44. Case: Automated Speech Translation Translating voice calls and video calls in 7 languages and instant messages in over 50. https://www.skype.com/en/features/skype-translator/
  45. 45. Case: Baidu Automated Speech Recognition (ASR)
  46. 46. More Fun: MtG cards http://www.escapistmagazine.com/articles/view/scienceandtech/14276-Magic-The-Gathering-Cards-Made-by-Artificial-Intelligence
  47. 47. Case: Question Answering A Neural Network for Factoid Question Answering over Paragraphs, https://cs.umd.edu/~miyyer/qblearn/
  48. 48. Case: Dialogue Systems A Neural Conversational Model, Oriol Vinyals, Quoc Le http://arxiv.org/abs/1506.05869
  49. 49. What for: Conversational Commerce https://medium.com/chris-messina/2016-will-be-the-year-of-conversational-commerce-1586e85e3991
  50. 50. What for: Conversational Commerce
  51. 51. Summary
  52. 52. Why is Deep Learning helpful? Or even a game-changer ● Works on raw data (pixels, sound, text or chars), no need for feature engineering ○ Some features are really hard to develop (they require years of work by a group of experts) ○ Some features are patented (e.g. SIFT, SURF for images) ● Allows end-to-end learning (pixels to category, sound to sentence, English sentence to Chinese sentence, etc.) ○ No need to do segmentation, etc. (a lot of manual labor) ⇒ You can iterate faster (and get superior quality at the same time!)
  53. 53. Some issues still exist ● No dataset, no deep learning. A lot of data is available (and deep learning requires it; otherwise simpler models may do better) ○ But sometimes you have no dataset… ■ Nonetheless, some workarounds are available: transfer learning, data augmentation, Mechanical Turk, … ● Requires a lot of computation. Without a cluster or GPU machines, training takes much longer
  54. 54. So what to do next?
  55. 55. Universal Libraries and Frameworks ● Torch7 (http://torch.ch/) ● TensorFlow (https://www.tensorflow.org/) ● Theano (http://deeplearning.net/software/theano/) ○ Keras (http://keras.io/) ○ Lasagne (https://github.com/Lasagne/Lasagne) ○ blocks (https://github.com/mila-udem/blocks) ○ pylearn2 (https://github.com/lisa-lab/pylearn2) ● CNTK (http://www.cntk.ai/) ● Neon (http://neon.nervanasys.com/) ● Deeplearning4j (http://deeplearning4j.org/) ● Google Prediction API (https://cloud.google.com/prediction/) ● … ● http://deeplearning.net/software_links/
  56. 56. Libraries & Frameworks for image/video processing ● OpenCV (http://opencv.org/) ● Caffe (http://caffe.berkeleyvision.org/) ● Torch7 (http://torch.ch/) ● clarifai (http://clarif.ai/) ● Google Vision API (https://cloud.google.com/vision/) ● … ● + all universal libraries
  57. 57. Libraries & Frameworks for speech ● CNTK (http://www.cntk.ai/) ● KALDI (http://kaldi-asr.org/) ● Google Speech API (https://cloud.google.com/) ● Yandex SpeechKit (https://tech.yandex.ru/speechkit/) ● Baidu Speech API (http://www.baidu.com/) ● wit.ai (https://wit.ai/) ● …
  58. 58. Libraries & Frameworks for text processing ● Torch7 (http://torch.ch/) ● Theano/Keras/… ● TensorFlow (https://www.tensorflow.org/) ● MetaMind (https://www.metamind.io/) ● Google Translate API (https://cloud.google.com/translate/) ● … ● + all universal libraries
  59. 59. What to read and where to study? - CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Stanford (http://vision.stanford.edu/teaching/cs231n/index.html) - CS224d: Deep Learning for Natural Language Processing, Richard Socher, Stanford (http://cs224d.stanford.edu/index.html) - Neural Networks for Machine Learning, Geoffrey Hinton (https://www.coursera.org/course/neuralnets) - Computer Vision course collection (http://eclass.cc/courselists/111_computer_vision_and_navigation) - Deep learning course collection (http://eclass.cc/courselists/117_deep_learning) - Book “Deep Learning”, Ian Goodfellow, Yoshua Bengio and Aaron Courville (http://www.deeplearningbook.org/)
  60. 60. What to read and where to study? - Google+ Deep Learning community (https://plus.google.com/communities/112866381580457264725) - VK Deep Learning community (http://vk.com/deeplearning) - Quora (https://www.quora.com/topic/Deep-Learning) - FB Deep Learning Moscow (https://www.facebook.com/groups/1505369016451458/) - Twitter Deep Learning Hub (https://twitter.com/DeepLearningHub) - NVIDIA blog (https://devblogs.nvidia.com/parallelforall/tag/deep-learning/) - IEEE Spectrum blog (http://spectrum.ieee.org/blog/cars-that-think) - http://deeplearning.net/ - Arxiv Sanity Preserver http://www.arxiv-sanity.com/ - ...
  61. 61. Whom to follow? - Jürgen Schmidhuber (http://people.idsia.ch/~juergen/) - Geoffrey E. Hinton (http://www.cs.toronto.edu/~hinton/) - Google DeepMind (http://deepmind.com/) - Yann LeCun (http://yann.lecun.com, https://www.facebook.com/yann.lecun) - Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy, https://www.quora.com/profile/Yoshua-Bengio) - Andrej Karpathy (http://karpathy.github.io/) - Andrew Ng (http://www.andrewng.org/) - ...
  62. 62. https://ru.linkedin.com/in/grigorysapunov gs@inten.to Thanks!
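
A minimal sketch of the point made on slide 39 ("Encoding semantics"): averaging word vectors ignores word order, so the two example phrases collapse to the same vector, while any encoder that reads words in sequence keeps them apart. Everything here is an assumption made for illustration: the 3-dimensional vectors in `TOY_VECTORS` are invented (they are not real word2vec output), the function names are hypothetical, and the `tanh` recurrence is a hand-rolled stand-in for an RNN/LSTM, not a trained model.

```python
import math

# Made-up 3-dimensional "embeddings" for illustration only;
# these are NOT real word2vec vectors.
TOY_VECTORS = {
    "the": [0.1, 0.0, 0.2],
    "man": [0.9, 0.3, 0.1],
    "killed": [0.2, 0.8, 0.5],
    "tiger": [0.4, 0.1, 0.9],
}


def tokenize(sentence):
    """Lowercase and split on whitespace (enough for the toy phrases)."""
    return sentence.lower().split()


def mean_vector(sentence):
    """Average the word vectors: an order-insensitive bag-of-words encoding."""
    vectors = [TOY_VECTORS[word] for word in tokenize(sentence)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def recurrent_vector(sentence, decay=0.5):
    """Fold words in left to right with h = tanh(decay * h + x).

    A toy stand-in for a recurrent encoder: the state depends on the
    order in which the words arrive.
    """
    state = [0.0, 0.0, 0.0]
    for word in tokenize(sentence):
        state = [math.tanh(decay * h + x)
                 for h, x in zip(state, TOY_VECTORS[word])]
    return state


def same(u, v):
    """Compare two vectors component-wise, tolerating float rounding."""
    return all(math.isclose(x, y) for x, y in zip(u, v))


a = "The man killed the tiger"
b = "The tiger killed the man"

same_bag = same(mean_vector(a), mean_vector(b))
same_recurrent = same(recurrent_vector(a), recurrent_vector(b))

print(same_bag)        # True: averaging cannot tell the phrases apart
print(same_recurrent)  # False: the sequential encoder can
```

With real embeddings the same effect holds for any sum or mean over word vectors, which is why the slide points to order-aware models such as paragraph2vec or RNN/LSTM sentence encoders.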
