Deep Learning: Application Landscape - March 2018

A presentation on the recent progress in Deep Learning. Collection of cases and applications. March 2018 version.

Published in: Technology

  1. 1. Deep Learning: Application landscape Grigory Sapunov Private Event / Mar 2018 gs@inten.to
  2. 2. The Context
  3. 3. AI/ML/DL ● Artificial Intelligence (AI) is a broad field of study dedicated to complex problem solving. ● Machine Learning (ML) is usually considered a subfield of AI. ML is a data-driven approach focused on creating algorithms that can learn from data without being explicitly programmed. ● Deep Learning (DL) is a subfield of ML focused on deep neural networks (NN) able to automatically learn hierarchical representations.
  4. 4. Different approaches to solving problems
  5. 5. Deep Learning approach
  6. 6. Deep Learning success: why now?
  7. 7. Recent progress
  8. 8. Typical image-related tasks https://research.facebook.com/blog/learning-to-segment/ Detection is a harder task than classification, but both are close to solved, with better-than-human quality.
  9. 9. Human quality is estimated as ~5.1% error rate on this dataset (0.051) From Lex Fridman slides: https://selfdrivingcars.mit.edu/ Image recognition quality on ImageNet dataset
  10. 10. Example: Object Detection
  11. 11. Example: Activity Recognition
  12. 12. Example: Semantic Segmentation
  13. 13. https://stanfordmlgroup.github.io/projects/chexnet/ Example: Radiologist-Level Pneumonia Detection
  14. 14. Example: Image Colorization Learning Representations for Automatic Colorization https://arxiv.org/abs/1603.06668
  15. 15. Example: Photo-realistic Style Transfer https://arxiv.org/abs/1703.07511 Deep Photo Style Transfer
  16. 16. Example: Background removal https://towardsdatascience.com/background-removal-with-deep-learning-c4f2104b3157
  17. 17. Example: Object removal http://hi.cs.waseda.ac.jp/~iizuka/projects/completion/en/
  18. 18. Example: Image completion http://hi.cs.waseda.ac.jp/~iizuka/projects/completion/en/
  19. 19. Example: Learning Lip Sync from Audio http://grail.cs.washington.edu/projects/AudioToObama/ https://www.youtube.com/watch?v=9Yq67CjDqvw
  20. 20. Example: DeepFakes, FakeApp https://thenextweb.com/artificial-intelligence/2018/02/21/deepfakes-algorithm-nails-donald-trump-in-most-convincing-fake-yet/
  21. 21. New kid on the block: GAN https://www.technologyreview.com/lists/technologies/2018/
  22. 22. Example: Generating images by GAN Progressive Growing of GANs for Improved Quality, Stability, and Variation, https://github.com/tkarras/progressive_growing_of_gans https://www.youtube.com/watch?v=XOxxPcy5Gr4
  23. 23. GAN rapid evolution The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation https://arxiv.org/abs/1802.07228
  24. 24. Example: Multi-Domain Image-to-Image Translation https://github.com/yunjey/StarGAN
  25. 25. Example: Unsupervised Image-to-Image Translation http://research.nvidia.com/publication/2017-12_Unsupervised-Image-to-Image-Translation https://www.youtube.com/watch?v=nlyXoX2aIek https://arxiv.org/abs/1703.00848
  26. 26. But...
  27. 27. What’s with the Big Picture? https://www.engadget.com/2018/01/23/photo-stitch-ai-fail-the-big-picture/
  28. 28. Still some issues exist: Reasoning Deep learning is mainly about perception, but there is a lot of inference involved in everyday human reasoning. ● Neural networks lack common sense ● Cannot find information by inference ● Cannot explain the answer ○ It could be a must-have requirement in some areas, e.g. law, medicine. ○ GDPR is coming The most fruitful approach is likely to be a hybrid neural-symbolic system. This is a topic of active research right now.
  29. 29. Adversarial Examples
  30. 30. Adversarial Examples https://spectrum.ieee.org/cars-that-think/transportation/sensors/slight-street-sign-modifications-can-fool-machine-learning-algorithms
  31. 31. Robust Adversarial Examples https://blog.openai.com/robust-adversarial-inputs/
  32. 32. Physical Adversarial Examples http://www.labsix.org/physical-objects-that-fool-neural-nets/
  33. 33. Adversarial Patch https://arxiv.org/abs/1712.09665
  34. 34. Computer & Human Adversarial Examples https://spectrum.ieee.org/the-human-os/robotics/artificial-intelligence/hacking-the-brain-with-adversarial-images
  35. 35. Text Processing / NLP
  36. 36. Deep Learning and NLP Variety of tasks: ● Classification: language detection, genre and topic detection, positive/negative sentiment analysis, authorship detection, … ● Fact extraction: people and company names, geography, prices, dates, product names, … ● Language modeling, part-of-speech tagging ● Key phrase extraction ● Finding synonyms ● Machine translation ● Search (written and spoken) ● Question answering ● Dialog systems
  37. 37. Example: Entity Extraction https://aws.amazon.com/blogs/aws/amazon-comprehend-continuously-trained-natural-language-processing/
  38. 38. Example: Neural Machine Translation vs. other https://research.googleblog.com/2016/09/a-neural-network-for-machine.html
  39. 39. Example: Machine Translation Quality Evolution https://bit.ly/mt_mar2018
  40. 40. Example: Legal document analysis / NDA https://www.prnewswire.com/news-releases/artificial-intelligence-more-accurate-than-lawyers-for-reviewing-contracts-new-study-reveals-300603781.html “The highest performing lawyer in the study achieved 94% accuracy - matching the AI - while the lowest performing lawyer achieved an average 67% accuracy. The challenge took the LawGeex AI 26 seconds to complete, compared to an average of 92 minutes for the lawyers. The longest time taken by a lawyer to complete the test was 156 minutes, and the shortest time was 51 minutes.”
  41. 41. Example: Legal document analysis / Privacy policies https://www.wired.com/story/polisis-ai-reads-privacy-policies-so-you-dont-have-to/ “In about 30 seconds, Polisis can read a privacy policy it's never seen before and extract a readable summary, displayed in a graphic flow chart, of what kind of data a service collects, where that data could be sent, and whether a user can opt out of that collection or sharing.”
  42. 42. https://research.googleblog.com/2017/05/efficient-smart-reply-now-for-gmail.html Example: Text generation / Smart Reply
  43. 43. https://arxiv.org/abs/1708.08151 Automated Crowdturfing Attacks and Defenses in Online Review Systems Example: Review generation (Human-like!)
  44. 44. Example: Seq2SQL https://arxiv.org/abs/1709.00103 Seq2SQL: Generating Structured Queries from Natural Language ...
  45. 45. Example: Question Answering SQuAD: 100,000+ Questions for Machine Comprehension of Text, https://arxiv.org/abs/1606.05250 https://rajpurkar.github.io/SQuAD-explorer/ http://u.cs.biu.ac.il/~yogo/squad-vs-human.pdf
  46. 46. https://blog.drift.com/chatbots-report/
  47. 47. Still many problems with chatbots http://www.eweek.com/big-data-and-analytics/state-of-chatbots-in-2018-rapidly-moving-into-the-mainstream Key PointSource findings include: ● When AI is present, about half (49 percent) of consumers are already willing to shop more frequently, 34 percent will spend more money and 38 percent will share their experiences with friends and family. ● 51 percent of consumers still anticipate frustrations around chatbots not understanding what they’re looking for; 44 percent question the accuracy of the information chatbots provide. ● More than half (54 percent) of consumers would still prefer to talk to a customer service representative. ● If a customer is on hold with a customer service rep, 34 percent of customers want to switch to a chatbot after 5 minutes have passed. However, 59 percent get frustrated if a chatbot doesn’t resolve their inquiry in that same time.
  48. 48. Text + Image / Multimodal learning
  49. 49. DL/Multi-modal Learning Deep Learning models are becoming multi-modal: they use 2+ modalities simultaneously, e.g.: ● Image caption generation: images + text ● Searching the Web by image: images + text ● Video description: the same, with an added time dimension ● Visual question answering: images + text ● Speech recognition: audio + video (lip motion) ● Image classification and navigation: RGB-D (color + depth) It will become possible to match different modalities easily.
  50. 50. Example: Caption Generation (text by image) http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
  51. 51. Example: NeuralTalk and Walk Ingredients: ● https://github.com/karpathy/neuraltalk2 Project for learning Multimodal Recurrent Neural Networks that describe images with sentences ● Webcam/notebook Result: ● https://vimeo.com/146492001
  52. 52. More hacking: NeuralTalk and Walk
  53. 53. Example: Video description (text by video) https://vsubhashini.github.io/s2vt.html
  54. 54. Example: Image generation by text AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, https://arxiv.org/abs/1711.10485
  55. 55. Example: Code generation by image pix2code: Generating Code from a Graphical User Interface Screenshot, https://arxiv.org/abs/1705.07962
  56. 56. SketchCode: Go from idea to HTML in 5 seconds Automated front-end development using deep learning https://blog.insightdatascience.com/automated-front-end-development-using-deep-learning-3169dd086e82
  57. 57. Speech
  58. 58. Speech Recognition: Word Error Rate (WER) [2017] Google: “Google’s speech recognition technology now has a 4.9% word error rate” https://venturebeat.com/2017/05/17/googles-speech-recognition-technology-now-has-a-4-9-word-error-rate/ Microsoft: “It can now transcribe human speech with a 5.1% error rate” http://uk.businessinsider.com/microsofts-speech-recognition-5-1-error-rate-human-level-accuracy-2017-8 IBM: “The company has reached a 5.5 percent word error rate that's nearly on par with humans.” https://www.engadget.com/2017/03/10/ibm-speech-recognition-accuracy-record/
  59. 59. Speech Recognition: Lip Reading “This lip reading performance beats a professional lip reader on videos from BBC television, and we also demonstrate that visual information helps to improve speech recognition performance even when the audio is available.” Lip Reading Sentences in the Wild, https://arxiv.org/abs/1611.05358 “To the best of our knowledge, LipNet is the first end-to-end sentence-level lipreading model that simultaneously learns spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 95.2% accuracy in sentence-level, overlapped speaker split task, outperforming experienced human lipreaders and the previous 86.4% word-level state-of-the-art accuracy.” LipNet: End-to-End Sentence-level Lipreading, https://arxiv.org/abs/1611.01599
  60. 60. Case: Amazon Echo Amazon Alexa is in more than 20 million devices. The vast majority of these are in the Amazon Echo portfolio. https://www.voicebot.ai/2017/10/27/bezos-says-20-million-amazon-alexa-devices-sold/
  61. 61. Case: Skype Live Translation Translating voice calls and video calls in 8 languages and instant messages in over 50. https://www.skype.com/en/features/skype-translator/
  62. 62. Case: Google Pixel Buds Google packed its headphones (in combination with the Pixel 2) with the power to translate between 40 languages, literally in real-time. The company has finally done what science fiction and countless Kickstarters have been promising us, but failing to deliver on, for years. This technology could fundamentally change how we communicate across the global community. https://www.engadget.com/2017/10/04/google-pixel-buds-translation-change-the-world/
  63. 63. ● “Our approach does not use complex linguistic and acoustic features as input. Instead, we generate human-like speech from text using neural networks trained using only speech examples and corresponding text transcripts.” Speech Synthesis: Tacotron 2 (Google, 2017) https://research.googleblog.com/2017/12/tacotron-2-generating-human-like-speech.html
  64. 64. ● “Deep Voice 3 introduces a completely novel neural network architecture for speech synthesis. This novel architecture trains an order of magnitude faster, allowing us to scale over 800 hours of training data and synthesize speech from over 2,400 voices, which is more than any other previously published text-to-speech model.” Speech Synthesis: Deep Voice 3 (Baidu, 2017) http://research.baidu.com/deep-voice-3-2000-speaker-neural-text-speech/
  65. 65. But the same problem with adversarial examples... Did you hear that? Adversarial Examples Against Automatic Speech Recognition https://arxiv.org/abs/1801.00554
  66. 66. Did you hear that? Adversarial Examples Against Automatic Speech Recognition https://arxiv.org/abs/1801.00554
  67. 67. [Robotic] Control
  68. 68. Drone control http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/ This drone can automatically follow forest trails to track down lost hikers
  69. 69. Car control Meet the 26-Year-Old Hacker Who Built a Self-Driving Car... in His Garage https://www.youtube.com/watch?v=KTrgRYa2wbI
  70. 70. Car driving https://www.youtube.com/watch?v=YuyT2SDcYrU “Actually a “Perception to Action” system. The visual perception and control system is a Deep learning architecture trained end to end to transform pixels from the cameras into steering angles. And this car uses regular color cameras, not LIDARS like the Google cars. It is watching the driver and learns.”
  71. 71. Example: Sensorimotor Deep Learning “In this project we aim to develop deep learning techniques that can be deployed on a robot to allow it to learn directly from trial-and-error, where the only information provided by the teacher is the degree to which it is succeeding at the current task.” http://rll.berkeley.edu/deeplearningrobotics/
  72. 72. Games https://blog.openai.com/dota-2/ https://blog.openai.com/more-on-dota-2/
  73. 73. AlphaGo Lee: Computer-Human 4:1
  74. 74. AlphaGo Zero
  75. 75. AlphaZero
  76. 76. Poker: Libratus http://www.dailymail.co.uk/sciencetech/article-4177262/AI-beats-professional-poker-players-Pittsburgh.html https://fr.pokernews.com/news/2017/01/ai-bot-libratus-poker-no-limit-wins-science-32312.htm “The research has implications for situations where information is incomplete and misinformation can be given, such as business negotiations, military strategy, cybersecurity and planning of medical treatments.”
  77. 77. ML for Systems
  78. 78. ML in datacenters “We’ve managed to reduce the amount of energy we use for cooling by up to 40 percent.” https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/
  79. 79. Device Placement with Reinforcement Learning Device Placement Optimization with Reinforcement Learning https://arxiv.org/abs/1706.04972
  80. 80. Neural Architecture Search Efficient Neural Architecture Search via Parameter Sharing https://arxiv.org/abs/1802.03268
  81. 81. Examples - Improving ML algorithms: Device placement, Architecture search, Optimizer search, Ensembling, ... - Optimizing indexes in DB (The Case for Learned Index Structures, https://arxiv.org/abs/1712.01208) - Improving datacenter efficiency: optimize cooling, optimize virtual machine placement, ... - … Computer systems are filled with heuristics that work well “in the general case”, but they generally don’t adapt to actual usage patterns and don’t take available context into account. We can use ML anywhere we’re using heuristics to make a decision! See Jeff Dean talk at NIPS 2017 http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
  82. 82. Examples Compilers: instruction scheduling, register allocation, loop nest parallelization strategies, … Networking: TCP window size decisions, backoff for retransmits, data compression, ... Operating systems: process scheduling, buffer cache insertion/replacement, file system prefetching, … Job scheduling systems: which tasks/VMs to co-locate on same machine, which tasks to pre-empt, ... ASIC design: physical circuit layout, test case selection, … See Jeff Dean talk at NIPS 2017 http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
  83. 83. See Jeff Dean talk at NIPS 2017 http://learningsys.org/nips17/assets/slides/dean-nips17.pdf
  84. 84. Data & Models
  85. 85. No dataset, no deep learning Deep learning requires a lot of data (otherwise simpler models could be better). But sometimes you have no dataset… Nonetheless, several approaches are available: ● Transfer learning ● Data augmentation ● Mechanical Turk ● Unsupervised pre-training ● Moving towards one-shot and zero-shot learning ● …
  86. 86. The data scale versus the model performance
  87. 87. http://www.spacemachine.net/views/2016/3/datasets-over-algorithms Importance of Datasets
  88. 88. Data & Models vs. Code Nearly the same state-of-the-art code is available to everyone in the market. Currently the real differentiator is data or trained models (which are derived from data). With publicly available code/algorithms and unique data, it is possible to create a better-quality model than with highly specialized code and public data. There is room for a new type of infrastructure ● Data and algorithm marketplaces ● Model marketplaces and model repositories ● AutoML (already appearing) ● Model management ● Model quality evaluation ● ...
  89. 89. Hardware
  90. 90. Still some issues exist: Computing power DL requires a lot of computation. Without a cluster or GPU machines, much more time is required. ● Currently GPUs (mostly NVIDIA) are the only choice ● FPGA/ASIC are coming into this field (Google TPU gen.2, Bitmain Sophon, Intel 2018+). The situation resembles the path of Bitcoin mining ● Neuromorphic computing is on the rise (IBM TrueNorth, Intel, memristors, etc.) ● Quantum computing can benefit machine learning as well (but it probably won’t be a desktop or in-house server solution)
  91. 91. NVIDIA slides: http://www.nvidia.com/content/events/geoInt2015/LBrown_DL.pdf
  92. 92. Computing power grows https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664
  93. 93. Distributed training is a commodity now Image from: https://github.com/uber/horovod
  94. 94. Case: AlphaGo Zero https://deepmind.com/blog/alphago-zero-learning-scratch/
  95. 95. Trends: Supercomputer performance (GFLOPS FP64) https://en.wikipedia.org/wiki/TOP500
  96. 96. Personal Supercomputers ● NVIDIA DGX-1 Server ($149,000) Performance: 1000 TFLOPS FP16, 125 TFLOPS FP32 * NVIDIA DGX-2 (16 TESLA V100, 2 PFLOPS FP16) has just been announced ● DeepLearning11 ($16,500, contains 10x NVIDIA GeForce GTX 1080 Ti) Performance: 100 TFLOPS FP32 ● NVIDIA Titan V gaming card ($3000) 6.9 TFLOPS FP64 (note: this is FP64, not the FP16 performance that is usually reported) ○ Corresponds to the best supercomputer in the world in 2001–2002 (IBM ASCI White with 7.226 TFLOPS peak speed) and to a supercomputer in 500th place (still a cool supercomputer) on the TOP500 list in November 2007 (the entry level to the list was 5.9 TFlop/s) ● For comparison: Huawei Mate 10 smartphone with Kirin 970 Neural Network Processing Unit, 1.92 TFLOPS FP16 ○ The top-performing supercomputer of 1997 had similar performance (but in FP64) https://blog.inten.to/hardware-for-deep-learning-part-3-gpu-8906c1644664
  97. 97. AI at the edge ● NVIDIA Jetson TK1/TX1/TX2 ○ 192/256/256 CUDA Cores ○ 64/64/128-bit 4/4/6-Core ARM CPU, 2/4/8 GB Mem ○ Xavier is coming ● Tablets, Smartphones ○ Qualcomm Snapdragon 845 ○ Apple A11 Bionic ○ Huawei Kirin 970 ● Raspberry Pi 3 (1.2 GHz 4-core) ● Movidius Neural Compute Stick
  98. 98. References: Hardware for Deep Learning series of posts: https://blog.inten.to/hardware-for-deep-learning-current-state-and-trends-51c01ebbb6dc ● Part 1: Introduction and Executive summary ● Part 2: CPU ● Part 3: GPU ● Part 4: FPGA ● Part 5: ASIC ● Part 6: Mobile AI ● Part 7: Neuromorphic computing ● Part 8: Quantum computing
  99. 99. Security
  100. 100. https://blog.openai.com/preparing-for-malicious-uses-of-ai/
  101. 101. AI changes the landscape of threats ● Expansion of existing threats ○ The costs of attacks are lowered ■ Set of actors who can carry out attacks expands ■ The rate and scale of attacks can increase ■ The set of potential targets can expand ● Introduction of new threats ○ AI systems can complete tasks that would be otherwise impractical for humans ○ Exploiting vulnerabilities of AI systems ● Change to the typical character of threats ○ Attacks can be especially effective ○ Finely targeted ○ Difficult to attribute
  102. 102. Many other issues exist as well ● Unintentional forms of AI misuse like algorithmic bias ● Indirect threats: mass unemployment, or other second- or third-order effects from the deployment of AI technology ● System-level threats that would come from the dynamic interaction between non-malicious actors, e.g. “race to the bottom” on AI safety ● Existential risks from human-level AI ● Unclear regulation
  103. 103. On the good side
  104. 104. https://ru.linkedin.com/in/grigorysapunov gs@inten.to Thanks!
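
Note on slide 58 (Word Error Rate): the 4.9%, 5.1% and 5.5% figures quoted there are word error rates. WER is commonly defined as the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the evaluation code used by Google, Microsoft or IBM; the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real evaluations usually normalize case and
    punctuation before scoring, which this sketch does not do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match if cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a 4.9% WER corresponds to roughly 5 word-level errors (substitutions, insertions or deletions) per 100 reference words.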
