
Deep learning cases - Founders Institute/Moscow - 2017.10.19

An updated version of the presentation on deep learning cases for image, video, and text processing, and for control.

A dedicated section covers hardware for modern and future AI.


  1. 1. Deep Learning Cases: Image, Text and Control Grigory Sapunov Founders Institute Moscow 19.10.2017 gs@inten.to
  2. 2. AI/ML/DL ● Artificial Intelligence (AI) is a broad field of study dedicated to complex problem solving. ● Machine Learning (ML) is usually considered a subfield of AI. ML is a data-driven approach focused on creating algorithms that can learn from data without being explicitly programmed. ● Deep Learning (DL) is a subfield of ML focused on deep neural networks (NNs) able to automatically learn hierarchical representations.
  3. 3. “Simple” Image & Video Processing
  4. 4. Typical tasks for CNNs https://research.facebook.com/blog/learning-to-segment/ The detection task is harder than classification, but both are largely solved, and with better-than-human quality.
  5. 5. Super-human recognition ● Blue: Traditional CV ● Purple: Deep Learning ● Red: Human
  6. 6. Case #1: IJCNN 2011 The German Traffic Sign Recognition Benchmark ● Classification, >40 classes ● >50,000 real-life images ● First superhuman visual pattern recognition ○ 2x better than humans ○ 3x better than the closest artificial competitor ○ 6x better than the best non-neural method Results (correct / error): 1) Committee of CNNs: 99.46% (0.54%); 2) Human performance: 98.84% (1.16%); 3) Multi-Scale CNNs: 98.31% (1.69%); 4) Random Forests: 96.14% (3.86%). http://people.idsia.ch/~juergen/superhumanpatternrecognition.html
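A minimal sketch of this kind of classifier, assuming Keras (listed later in this deck): a small CNN for 48x48 traffic-sign images (GTSRB has 43 classes). This is an illustrative toy, not the competition-winning committee of CNNs; all layer sizes are assumptions.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

NUM_CLASSES = 43  # GTSRB has 43 sign classes

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(48, 48, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_train, y_train, ...) once the GTSRB images are loaded as
# normalized 48x48 RGB arrays with one-hot labels.
model.summary()
```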
  7. 7. Case #2: ILSVRC 2010-2015 Large Scale Visual Recognition Challenge (ILSVRC) ● Object detection (200 categories, ~0.5M images) ● Classification + localization (1000 categories, 1.2M images)
  8. 8. Examples: Object Detection
  9. 9. Example: Face Detection + Emotion Classification
  10. 10. Example: Face Detection + Classification + Regression
  11. 11. Examples: Food Recognition
  12. 12. Examples: Computer Vision on the Road
  13. 13. Examples: Pedestrian Detection
  14. 14. Examples: Activity Recognition
  15. 15. Examples: Road Sign Recognition (on mobile!)
  16. 16. More complex Image & Video Processing
  17. 17. Semantic Segmentation NYU Semantic Segmentation with a Convolutional Network (33 categories) https://www.youtube.com/watch?v=ZJMtDRbqH40
  18. 18. Semantic Segmentation
  19. 19. Image Colorization http://richzhang.github.io/colorization/
  20. 20. Fun: Deep Dream http://blogs.wsj.com/digits/2016/02/29/googles-computers-paint-like-van-gogh-and-the-art-sells-for-thousands/
  21. 21. More Fun: Neural Style http://www.boredpanda.com/inceptionism-neural-network-deep-dream-art/
  22. 22. More Fun: Neural Doodle http://arxiv.org/abs/1603.01768 Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks (a) Original painting by Renoir, (b) semantic annotations, (c) desired layout, (d) generated output.
  23. 23. More Fun: Photo-realistic Style Transfer https://arxiv.org/abs/1703.07511 Deep Photo Style Transfer
  24. 24. More Fun: Photo-realistic Style Transfer https://arxiv.org/abs/1703.07511 Deep Photo Style Transfer
  25. 25. Generative Adversarial Networks (GANs) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks https://arxiv.org/abs/1511.06434
  26. 26. Generative Adversarial Networks (GANs) http://www.evolvingai.org/ppgn
  27. 27. Text Processing / NLP
  28. 28. Deep Learning and NLP Variety of tasks: ● Finding synonyms ● Fact extraction: people and company names, geography, prices, dates, product names, … ● Classification: genre and topic detection, positive/negative sentiment analysis, authorship detection, … ● Machine translation ● Search (written and spoken) ● Question answering ● Dialog systems ● Language modeling, part-of-speech tagging
  29. 29. Example: Semantic Spaces (word2vec, GloVe) vector('king') - vector('man') + vector('woman') = vector('queen') https://code.google.com/archive/p/word2vec/
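The analogy above can be reproduced with pretrained vectors via gensim; a minimal sketch (the vectors file name assumes the GoogleNews binary downloadable from the word2vec page above):

```python
from gensim.models import KeyedVectors

# Pretrained 300-dimensional GoogleNews vectors (path is an assumption).
vectors = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

# vector('king') - vector('man') + vector('woman') -> nearest word
print(vectors.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
# With the GoogleNews vectors this returns [('queen', ~0.71)]
```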
  30. 30. Example: Semantic Spaces (word2vec, GloVe) http://nlp.stanford.edu/projects/glove/
  31. 31. Encoding semantics Using word2vec instead of word indexes allows you to deal better with word meanings (e.g. no need to enumerate all synonyms, because their vectors are already close to each other). But the naive way of working with word2vec vectors still gives you a “bag of words” model, where the phrases “The man killed the tiger” and “The tiger killed the man” are equal. We need models that pay attention to word ordering: paragraph2vec, sentence embeddings (using RNNs/LSTMs), even World2Vec (LeCun @CVPR2015).
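A toy numpy illustration of the “bag of words” problem above: averaging word vectors produces identical representations for the two sentences, since averaging ignores order (random vectors stand in for real word2vec embeddings):

```python
import numpy as np

rng = np.random.RandomState(0)
emb = {w: rng.randn(8) for w in ['the', 'man', 'killed', 'tiger']}

def average_embedding(sentence):
    # Mean of the word vectors: a "bag of words" over embeddings.
    return np.mean([emb[w] for w in sentence.lower().split()], axis=0)

a = average_embedding('The man killed the tiger')
b = average_embedding('The tiger killed the man')
print(np.allclose(a, b))  # True -- word order information is lost
```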
  32. 32. Multi-modal learning http://arxiv.org/abs/1411.2539 Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
  33. 33. Example: More multi-modal learning
  34. 34. Caption Generation http://arxiv.org/abs/1411.4555 “Show and Tell: A Neural Image Caption Generator”
  35. 35. Example: NeuralTalk and Walk Ingredients: ● https://github.com/karpathy/neuraltalk2 Project for learning Multimodal Recurrent Neural Networks that describe images with sentences ● Webcam/notebook Result: ● https://vimeo.com/146492001
  36. 36. More hacking: NeuralTalk and Walk
  37. 37. Product of the near future: DenseCap and ? http://arxiv.org/abs/1511.07571 DenseCap: Fully Convolutional Localization Networks for Dense Captioning
  38. 38. Example: Image generation by text StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks, https://arxiv.org/abs/1612.03242
  39. 39. Visual Question Answering https://avisingh599.github.io/deeplearning/visual-qa/
  40. 40. Case: Sentiment analysis http://nlp.stanford.edu/sentiment/ Can capture complex cases where bag-of-words models fail. “This movie was actually neither that funny, nor super witty.”
  41. 41. Case: Sentiment analysis https://blog.openai.com/unsupervised-sentiment-neuron/ “Our research implies that simply training large unsupervised next-step-prediction models on large amounts of data may be a good approach to use when creating systems with good representation learning capabilities.”
  42. 42. Case: Machine Translation Sequence to Sequence Learning with Neural Networks, http://arxiv.org/abs/1409.3215
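A minimal Keras sketch of the encoder-decoder idea from the paper above: an LSTM encoder compresses the source sentence into its final states, and an LSTM decoder generates the target sentence conditioned on them. Vocabulary sizes and dimensions are illustrative assumptions; the paper itself used much deeper LSTMs.

```python
from keras.models import Model
from keras.layers import Input, LSTM, Dense, Embedding

SRC_VOCAB, TGT_VOCAB, DIM = 10000, 10000, 256  # illustrative sizes

# Encoder: keep only the final hidden and cell states.
enc_in = Input(shape=(None,))
enc_emb = Embedding(SRC_VOCAB, DIM)(enc_in)
_, state_h, state_c = LSTM(DIM, return_state=True)(enc_emb)

# Decoder: initialized with the encoder states, predicts the next target
# token at each step (trained with teacher forcing on shifted targets).
dec_in = Input(shape=(None,))
dec_emb = Embedding(TGT_VOCAB, DIM)(dec_in)
dec_seq = LSTM(DIM, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
out = Dense(TGT_VOCAB, activation='softmax')(dec_seq)

model = Model([enc_in, dec_in], out)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()
```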
  43. 43. Case: Automated Speech Translation Translating voice calls and video calls in 8 languages and instant messages in over 50. https://www.skype.com/en/features/skype-translator/
  44. 44. Speech Recognition: Word Error Rate (WER) “Google now has just an 8 percent error rate. Compare that to 23 percent in 2013” (2015) http://venturebeat.com/2015/05/28/google-says-its-speech-recognition-technology-now-has-only-an-8-word-error-rate/ IBM Watson. “The performance of our new system – an 8% word error rate – is 36% better than previously reported external results.” (2015) https://developer.ibm.com/watson/blog/2015/05/26/ibm-watson-announces-breakthrough-in-conversational-speech-transcr iption/ Baidu. “We are able to reduce error rates of our previous end-to-end system in English by up to 43%, and can also recognize Mandarin speech with high accuracy. Creating high-performing recognizers for two very different languages, English and Mandarin, required essentially no expert knowledge of the languages” (2015) http://arxiv.org/abs/1512.02595
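For reference, the WER behind the numbers above is the word-level edit distance (substitutions + insertions + deletions) between the recognizer's hypothesis and the reference transcript, divided by the number of reference words. A minimal standard-library sketch:

```python
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(r)][len(h)] / len(r)

print(wer('the cat sat on the mat', 'the cat sit on mat'))  # 2/6 ≈ 0.33
```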
  45. 45. Example: Baidu Deep Speech 2 (2015) ● “The Deep Speech 2 ASR pipeline approaches or exceeds the accuracy of Amazon Mechanical Turk human workers on several benchmarks, works in multiple languages with little modification, and is deployable in a production setting.” ● “Table 13 shows that the DS2 system outperforms humans in 3 out of the 4 test sets and is competitive on the fourth. Given this result, we suspect that there is little room for a generic speech system to further improve on clean read speech without further domain adaptation” Deep Speech 2: End-to-End Speech Recognition in English and Mandarin, http://arxiv.org/abs/1512.02595
  46. 46. Case: Baidu Automated Speech Recognition (ASR)
  47. 47. More Fun: MtG cards http://www.escapistmagazine.com/articles/view/scienceandtech/14276-Magic-The-Gathering-Cards-Made-by-Artificial-Intelligence
  48. 48. Case: Review generation Automated Crowdturfing Attacks and Defenses in Online Review Systems https://arxiv.org/abs/1708.08151
  49. 49. Case: Question Answering A Neural Network for Factoid Question Answering over Paragraphs, https://cs.umd.edu/~miyyer/qblearn/
  50. 50. Case: Dialogue Systems A Neural Conversational Model, Oriol Vinyals, Quoc Le http://arxiv.org/abs/1506.05869
  51. 51. What for: Conversational Commerce https://medium.com/chris-messina/2016-will-be-the-year-of-conversational-commerce-1586e85e3991
  52. 52. What for: Conversational Commerce
  53. 53. [Robotic] Control
  54. 54. Reinforcement Learning Simulated race car control (2013) http://people.idsia.ch/~juergen/gecco2013torcs.pdf http://people.idsia.ch/~juergen/compressednetworksearch.html
  55. 55. Reinforcement Learning
  56. 56. Reinforcement Learning Human-level control through deep reinforcement learning (2014) http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html Playing Atari with Deep Reinforcement Learning (2013) http://arxiv.org/abs/1312.5602
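The core of Q-learning, which DQN scales up by replacing the table with a deep network over raw pixels, fits in a few lines. A minimal tabular sketch on a toy 5-state chain (move right to reach the goal; all constants are illustrative assumptions, not from the papers above):

```python
import random

N_STATES = 5                               # states 0..4; state 4 is the goal
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]: 0=left, 1=right

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

def choose(s):
    # Epsilon-greedy with random tie-breaking.
    if random.random() < eps or Q[s][0] == Q[s][1]:
        return random.randint(0, 1)
    return 0 if Q[s][0] > Q[s][1] else 1

for episode in range(500):
    s, done = 0, False
    while not done:
        a = choose(s)
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])  # the TD update DQN also uses
        s = s2

print([round(max(q), 2) for q in Q])  # values grow toward the goal; terminal stays 0
```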
  57. 57. Reinforcement Learning
  58. 58. Game of Go: Computer-Human 4:1
  59. 59. AlphaGo in datacenters “We’ve managed to reduce the amount of energy we use for cooling by up to 40 percent.” https://deepmind.com/blog/deepmind-ai-reduces-google-data-centre-cooling-bill-40/
  60. 60. Drone control http://www.digitaltrends.com/cool-tech/swiss-drone-ai-follows-trails/ This drone can automatically follow forest trails to track down lost hikers
  61. 61. Car control Meet the 26-Year-Old Hacker Who Built a Self-Driving Car... in His Garage https://www.youtube.com/watch?v=KTrgRYa2wbI
  62. 62. Car driving https://www.youtube.com/watch?v=YuyT2SDcYrU “Actually a “Perception to Action” system. The visual perception and control system is a Deep learning architecture trained end to end to transform pixels from the cameras into steering angles. And this car uses regular color cameras, not LIDARS like the Google cars. It is watching the driver and learns.”
  63. 63. Example: Sensorimotor Deep Learning “In this project we aim to develop deep learning techniques that can be deployed on a robot to allow it to learn directly from trial-and-error, where the only information provided by the teacher is the degree to which it is succeeding at the current task.” http://rll.berkeley.edu/deeplearningrobotics/
  64. 64. Summary
  65. 65. DL/Multi-modal Learning Deep Learning models become multi-modal: they use 2+ modalities simultaneously, e.g.: ● Image caption generation: images + text ● Web search by an image: images + text ● Video description: the same, with an added time dimension ● Visual question answering: images + text ● Speech recognition: audio + video (lip motion) ● Image classification and navigation: RGB-D (color + depth) Where is this heading? ● A common metric space for every concept, a “thought vector”. It will become possible to match different modalities easily (see the toy sketch below).
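A toy numpy sketch of the “common metric space” idea: if image and text encoders map into the same space, cross-modal matching reduces to nearest-neighbor search by cosine similarity (random vectors stand in for real encoder outputs here):

```python
import numpy as np

rng = np.random.RandomState(1)
image_vecs = rng.randn(3, 128)                       # 3 encoded images
caption_vecs = image_vecs + 0.1 * rng.randn(3, 128)  # matching captions, slightly noisy

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

for i, img in enumerate(image_vecs):
    best = max(range(3), key=lambda j: cosine(img, caption_vecs[j]))
    print('image', i, '-> caption', best)  # each image retrieves its own caption
```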
  66. 66. DL/Transfer of Ideas Methods developed for one modality are successfully transferred to another: ● Convolutional Neural Networks, CNNs (originally developed for image recognition) work well on texts, speech and some time-series signals (e.g. ECG). ● Recurrent Neural Networks, RNNs (mostly used on language and other sequential data) seem to work on images too. If technologies successfully transfer from one modality to another (for example, from images to texts for CNNs), then the ideas that worked in one domain will probably work in another (e.g. style transfer for images could be transferred to texts).
  67. 67. Why is Deep Learning helpful? Or even a game-changer ● Works on raw data (pixels, sound, text or chars), no need for feature engineering ○ Some features are really hard to develop (they require years of work by groups of experts) ○ Some features are patented (e.g. SIFT, SURF for images) ● Allows end-to-end learning (pixels to category, sound to sentence, English sentence to Chinese sentence, etc.) ○ No need to do segmentation, etc. (a lot of manual labor) ⇒ You can iterate faster (and get superior quality at the same time!)
  68. 68. Still some issues exist: Datasets ● No dataset -- no deep learning. There is a lot of data available (and deep learning requires it; otherwise simpler models may work better) ○ But sometimes you have no dataset… ■ Nonetheless, some hacks are available: transfer learning (see the sketch below), data augmentation, Mechanical Turk, …
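A minimal sketch of the transfer-learning hack, assuming Keras and its bundled ImageNet-pretrained VGG16: freeze the pretrained convolutional base and train only a small head on your small dataset (the 10-class head is an assumption):

```python
from keras.applications import VGG16
from keras.models import Model
from keras.layers import GlobalAveragePooling2D, Dense

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False            # keep the pretrained features fixed

x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)  # 10 = your task's classes (assumption)

model = Model(base.input, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
# model.fit(x_small, y_small, ...) on your small labeled dataset.
```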
  69. 69. Still some issues exist: Datasets http://www.spacemachine.net/views/2016/3/datasets-over-algorithms
  70. 70. Still some issues exist: Computing power ● Requires a lot of computation. Without a cluster or GPU machines, much more time is required ● Currently GPUs (mostly NVIDIA) are the only practical choice ● Waiting for FPGAs/ASICs to come into this field (Google TPU gen. 2, Intel 2017+). The situation resembles the path of Bitcoin mining ● Neuromorphic computing is on the rise (IBM TrueNorth, memristors, etc.) ● Quantum computing can benefit machine learning as well (but it probably won't be a desktop or in-house server solution)
  71. 71. Datasets and computing power are growing
  72. 72. Computing power is growing ● Google TPU gen.2 ○ 180 TFLOPS? ● NVIDIA DGX-1 ($129,000) ○ 170 TFLOPS (FP16) ○ 85 TFLOPS (FP32) ● NVIDIA Tesla V100/P100 ○ 15/10.6 TFLOPS ○ 120 TFLOPS on V100 Tensor Core units ● NVIDIA GTX Titan X (Pascal [new] / Maxwell [old]) ($1000) ○ 11/6.1 TFLOPS (FP32) ● NVIDIA GTX 1080/1080 Ti ($700) ○ 8/11.3 TFLOPS (FP32) ● NVIDIA Drive PX-2 / PX ○ 8.0/2.3 TFLOPS
  73. 73. Deep Learning goes mobile! ● NVidia Jetson TK1/TX1/TX2 ○ 192/256/256 CUDA Cores ○ 64/64/128-bit 4/4/6-Core ARM CPU, 2/4/8 GB Mem ● Raspberry Pi 3 ○ 1.2 GHz 64-bit quad-core ARM Cortex-A53, 1 GB SDRAM, US$35 ● Tablets, Smartphones ○ Qualcomm Snapdragon 835, Apple A11 Bionic ● Google Project Tango
  74. 74. Still some issues exist: Reasoning Deep learning is mainly about perception, but a lot of inference is involved in everyday human reasoning. ● Neural networks lack common sense ● Cannot find information by inference ● Cannot explain the answer ○ This could be a must-have requirement in some areas, e.g. law or medicine.
  75. 75. Still some issues exist: Reasoning The most fruitful approach is likely to be a hybrid neural-symbolic system; this is a topic of active research right now. And it seems all major players are already going this way (Watson, Siri, Cyc, …). There is a lot of knowledge available (or extractable) in the world: large knowledge bases about the real world (Cyc/OpenCyc, Freebase, Wikipedia, schema.org, RDF, …, scientific journals + text mining, …)
  76. 76. So what to do next?
  77. 77. Universal Libraries and Frameworks ● Torch7, PyTorch (http://torch.ch/, http://pytorch.org) [Lua, Python] ● TensorFlow (https://www.tensorflow.org/) [Python, C++] ● Keras (http://keras.io/) [Python] ● Theano (http://deeplearning.net/software/theano/) [Python] ○ Lasagne (https://github.com/Lasagne/Lasagne) ○ blocks (https://github.com/mila-udem/blocks) ○ pylearn2 (https://github.com/lisa-lab/pylearn2) ● Microsoft Cognitive Toolkit (CNTK) (http://www.cntk.ai/) [Python, C++, C#, BrainScript] ● Neon (http://neon.nervanasys.com/) [Python] ● Deeplearning4j (http://deeplearning4j.org/) [Java] ● MXNet (http://mxnet.io/) [C++, Python, R, Scala, Julia, Matlab, Javascript] ● …
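All of these frameworks supply the same core ingredients: tensors, automatic differentiation and optimizers. A minimal sketch using PyTorch from the list above, fitting y = 2x + 1 by gradient descent (a toy example, not from the talk):

```python
import torch

x = torch.randn(100, 1)
y = 2 * x + 1 + 0.05 * torch.randn(100, 1)  # noisy line y = 2x + 1

model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                    # autograd computes the gradients
    opt.step()

print(model.weight.item(), model.bias.item())  # ≈ 2.0 and 1.0
```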
  78. 78. Libraries & Frameworks for image/video processing ● OpenCV (http://opencv.org/) ● Caffe/Caffe2 (http://caffe.berkeleyvision.org/, https://caffe2.ai/) ● Torch7 (http://torch.ch/) ● clarifai (http://clarif.ai/) ● Google Vision API (https://cloud.google.com/vision/) ● … ● + all universal libraries
  79. 79. Libraries & Frameworks for speech ● Microsoft Cognitive Toolkit (CNTK) (http://www.cntk.ai/) [Python, C++, C#, BrainScript] ● KALDI (http://kaldi-asr.org/) [C++] ● Google Speech API (https://cloud.google.com/) ● Yandex SpeechKit (https://tech.yandex.ru/speechkit/) ● Baidu Speech API (http://www.baidu.com/) ● wit.ai (https://wit.ai/) ● …
  80. 80. Libraries & Frameworks for text processing ● Torch7 (http://torch.ch/) ● Theano/Keras/… ● TensorFlow (https://www.tensorflow.org/) ● Google Translate API (https://cloud.google.com/translate/) ● Salesforce Einstein (https://www.salesforce.com/products/einstein/overview/) ● Machine Translation Benchmark (July 2017) (https://www.slideshare.net/KonstantinSavenkov/intento-machine-translation-benchmark-july-2017) ● Intent Detection Benchmark (August 2017) (https://www.slideshare.net/KonstantinSavenkov/nlu-intent-detection-benchmark-by-intento-august-2017) ● ...
  81. 81. What to read and where to study? - CS231n: Convolutional Neural Networks for Visual Recognition, Fei-Fei Li, Andrej Karpathy, Stanford (http://vision.stanford.edu/teaching/cs231n/index.html) - CS224d: Deep Learning for Natural Language Processing, Richard Socher, Stanford (http://cs224d.stanford.edu/index.html) - Neural Networks for Machine Learning, Geoffrey Hinton (https://www.coursera.org/course/neuralnets) - Computer Vision course collection (http://eclass.cc/courselists/111_computer_vision_and_navigation) - Deep learning course collection (http://eclass.cc/courselists/117_deep_learning) - Book “Deep Learning”, Ian Goodfellow, Yoshua Bengio and Aaron Courville (http://www.deeplearningbook.org/)
  82. 82. What to read and where to study? - Google+ Deep Learning community (https://plus.google.com/communities/112866381580457264725) - VK Deep Learning community (http://vk.com/deeplearning) - Quora (https://www.quora.com/topic/Deep-Learning) - FB Deep Learning Moscow (https://www.facebook.com/groups/1505369016451458/) - Twitter Deep Learning Hub (https://twitter.com/DeepLearningHub) - NVidia blog (https://devblogs.nvidia.com/parallelforall/tag/deep-learning/) - IEEE Spectrum blog (http://spectrum.ieee.org/blog/cars-that-think) - http://deeplearning.net/ - Arxiv Sanity Preserver http://www.arxiv-sanity.com/ - ...
  83. 83. Whom to follow? - Jürgen Schmidhuber (http://people.idsia.ch/~juergen/) - Geoffrey E. Hinton (http://www.cs.toronto.edu/~hinton/) - Google DeepMind (http://deepmind.com/) - Yann LeCun (http://yann.lecun.com, https://www.facebook.com/yann.lecun) - Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy, https://www.quora.com/profile/Yoshua-Bengio) - Andrej Karpathy (http://karpathy.github.io/) - Andrew Ng (http://www.andrewng.org/) - ...
  84. 84. [Bonus] Hardware
  85. 85. Hardware: Overview Serious problems with the current processors are: ● energy efficiency (DeepMind used 1,202 CPUs and 176 GPUs) ● architecture (not well suited for brain-like computations)
  86. 86. Computing power is growing ● Google TPU gen.2 ○ 180 TFLOPS? ● NVIDIA DGX-1 ($129,000) ○ 170 TFLOPS (FP16) ○ 85 TFLOPS (FP32) ● NVIDIA Tesla V100/P100 ○ 15/10.6 TFLOPS ○ 120 TFLOPS on V100 Tensor Core units ● NVIDIA GTX Titan X (Pascal [new] / Maxwell [old]) ($1000) ○ 11/6.1 TFLOPS (FP32) ● NVIDIA GTX 1080/1080 Ti ($700) ○ 8/11.3 TFLOPS (FP32) ● NVIDIA Drive PX-2 / PX ○ 8.0/2.3 TFLOPS
  87. 87. Mobile AI: Apple (Sep 23, 2017) Inside iPhone 8: Apple's A11 Bionic introduces 5 new custom silicon engines “Creating an entirely new GPU architecture "wasn't innovative enough," so A11 Bionic also features an entirely new Neural Engine within its Image Signal Processor, tuned to solve very specific problems such as matching, analyzing and calculating thousands of reference points within a flood of image data rushing from the camera sensor. Those tasks could be sent to the GPU, but having logic optimized specifically for matrix multiplications and floating-point processing allows the Neural Engine to excel at those tasks.” http://appleinsider.com/articles/17/09/23/inside-iphone-8-apples-a11-bionic-introduces-5-new-custom-silicon-engines
  88. 88. Mobile AI: Qualcomm (Aug 16, 2017) We are making on-device AI ubiquitous “In fact, the Hexagon DSP with Qualcomm Hexagon Vector eXtensions on Snapdragon 835 has been shown to offer a 25X improvement in energy efficiency and an 8X improvement in performance when compared against running the same workloads (GoogleNet Inception Network) on the Qualcomm Kryo CPU. We have introduced the Snapdragon Neural Processing Engine (NPE) Software Developer Kit (SDK). This features an accelerated runtime for on-device execution of convolutional neural networks (CNN) and recurrent neural networks (RNN) — which are great for tasks like image recognition and natural language processing, respectively” https://www.qualcomm.com/news/onq/2017/08/16/we-are-making-device-ai-ubiquitous
  89. 89. FPGA/ASIC ● FPGA (field-programmable gate array) is an integrated circuit designed to be configured by a customer or a designer after manufacturing ● ASIC (application-specific integrated circuit) is an integrated circuit customized for a particular use, rather than intended for general-purpose use. ● Both FPGAs and ASICs are usually much more energy-efficient than general-purpose processors (so more productive in terms of GFLOPS per Watt). ● OpenCL can be the development language for FPGAs, and more ML/DL libraries are adding OpenCL support (for example, Caffe). So an easy way to do ML on FPGAs should appear. ● Bitcoin mining is another heavy-lifting task which went from CPUs through GPUs to FPGAs and finally ASICs. History could repeat itself with deep learning.
  90. 90. FPGA/ASIC custom chips There is a lot of movement to FPGA/ASIC right now: ● Mobileye chips with specially developed ASIC cores are used in BMW, Tesla, Volvo, etc. ● Microsoft develops Project Catapult, which uses clusters of FPGAs https://blogs.msdn.microsoft.com/msr_er/2015/11/12/project-catapult-servers-available-to-academic-researchers/ ● Baidu tries to use FPGAs for DL http://www.hotchips.org/wp-content/uploads/hc_archives/hc26/HC26-12-day2-epub/HC26.12-5-FPGAs-epub/HC26.12.545-Soft-Def-Acc-Ouyang-baidu-v3--baidu-v4.pdf ● Altera (one of the FPGA giants) was acquired by Intel in 2015. Intel is working on a hybrid Xeon+FPGA chip http://www.nextplatform.com/2016/03/14/intel-marrying-fpga-beefy-broadwell-open-compute-future/ ● Nervana (acquired by Intel) plans to make a special chip to make machine learning faster http://www.eetimes.com/document.asp?doc_id=1328523& ● Movidius (acquired by Intel) Myriad X VPU - a dedicated hardware accelerator for deep neural network inference. https://www.movidius.com/myriadx
  91. 91. ASIC: Google TPU ● (May 18, 2016) Google announced Tensor Processing Unit (TPU) ○ a custom ASIC built specifically for machine learning — and tailored for TensorFlow ○ Has been running TPUs inside Google’s data centers for more than a year. ○ Server racks with TPUs used in the AlphaGo matches with Lee Sedol https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html https://cloudplatform.googleblog.com/2017/04/quantifying-the-performance-of-the-TPU-our-first-machine-learning-chip.html
  92. 92. ASIC: Google TPU gen.2 ● (May 17, 2017) Build and train machine learning models on our new Google Cloud TPUs ○ Second generation of a custom ASIC built specifically for machine learning ○ Now supports training, not only inference ○ Enormous up to 180 teraflops of floating-point performance https://blog.google/topics/google-cloud/google-cloud-offer-tpus-machine-learning/ https://cloud.google.com/tpu/
  93. 93. A “TPU pod” built with 64 second-generation TPUs delivers up to 11.5 petaflops of machine learning acceleration.
  94. 94. https://cloud.google.com/tpu/
  95. 95. FPGA: Intel DLIA (Nov 15, 2016) Intel Unveils FPGA to Accelerate Neural Networks The Intel Deep Learning Inference Accelerator (DLIA) combines traditional Intel CPUs with field programmable gate arrays (FPGAs), semiconductors that can be reprogrammed to perform specialized computing tasks. FPGAs allow users to tailor compute power to specific workloads or applications. http://datacenterfrontier.com/intel-unveils-fpga-to-accelerate-ai-neural-networks/
  96. 96. ASIC: Intel Knights Mill (Aug 24, 2017) Intel Spills Details on Knights Mill Processor Knights Mill, a Xeon Phi processor tweaked for machine learning applications. Knights Mill represents the chipmaker’s first Xeon Phi offering aimed exclusively at the machine learning market, specifically for the training of deep neural networks. For the inferencing side of deep learning, Intel points to its Altera-based FPGA products, which are being used extensively by Microsoft in its Azure cloud. Knights Mill is scheduled for launch in Q4 of this year. https://www.top500.org/news/intel-spills-details-on-knights-mill-processor/
  97. 97. ASIC: Intel Nervana NNP (Oct 17, 2017) Announcing Industry’s First Neural Network Processor Intel will ship the industry’s first silicon for neural network processing, the Intel® Nervana™ Neural Network Processor (NNP), before the end of this year (ex-Lake Crest processor). ● New memory architecture designed for maximizing utilization of silicon computation ● Massive bi-directional data transfer to achieve true model parallelism where neural network parameters are distributed across multiple chips. ● A new numeric format called Flexpoint https://newsroom.intel.com/editorials/intel-pioneers-new-technologies-advance-artificial-intelligence/
  98. 98. Neuromorphic chips ● DARPA SyNAPSE program (Systems of Neuromorphic Adaptive Plastic Scalable Electronics) ● IBM TrueNorth; Stanford Neurogrid; HRL neuromorphic chip; Human Brain Project SpiNNaker and HICANN; Qualcomm. https://www.technologyreview.com/s/526506/neuromorphic-chips/ http://www.eetimes.com/document.asp?doc_id=1327791
  99. 99. Neuromorphic chips: Snapdragon 820 Over the years, Qualcomm’s primary focus had been making mobile processors for smartphones and tablets. But the company is now trying to expand into other areas, including chips for automobiles and robots. The company is also marketing the Kryo as its neuromorphic, cognitive computing platform Zeroth. http://www.extremetech.com/computing/200090-qualcomms-cognitive-compute-processors-are-coming-to-snapdragon-820
  100. 100. Neuromorphic chips: IBM TrueNorth ● 1M neurons, 256M synapses, 4096 neurosynaptic cores on a chip, est. 46B synaptic ops per sec per W ● Uses 70mW; power density is 20 milliwatts per cm^2 -- almost 1/10,000th the power of most modern microprocessors ● “Our sights are now set high on the ambitious goal of integrating 4,096 chips in a single rack with 4B neurons and 1T synapses while consuming ~4kW of power”. ● Currently IBM is making plans to commercialize it. ● (2016) Lawrence Livermore National Lab got a cluster of 16 TrueNorth chips (16M neurons, 4B synapses; for context, the human brain has 86B neurons). When running flat out, the entire cluster will consume a grand total of 2.5 watts. http://spectrum.ieee.org/tech-talk/computing/hardware/ibms-braininspired-computer-chip-comes-from-the-future
  101. 101. Neuromorphic chips: IBM TrueNorth ● (03.2016) IBM Research demonstrated convolutional neural nets with close to state of the art performance: “Convolutional Networks for Fast, Energy-Efficient Neuromorphic Computing”, http://arxiv.org/abs/1603.08270
  102. 102. Neuromorphic chips: Intel Loihi (Sep 25, 2017) As part of an effort within Intel Labs, Intel has developed a first-of-its-kind self-learning neuromorphic chip – codenamed Loihi – that mimics how the brain functions by learning to operate based on various modes of feedback from the environment. This extremely energy-efficient chip, which uses the data to learn and make inferences, gets smarter over time and does not need to be trained in the traditional way. It takes a novel approach to computing via asynchronous spiking. It is up to 1,000 times more energy-efficient than general purpose computing required for typical training systems. In the first half of 2018, the Loihi test chip will be shared with leading university and research institutions with a focus on advancing AI. https://newsroom.intel.com/editorials/intels-new-self-learning-chip-promises-accelerate-artificial-intelligence/
  103. 103. Neuromorphic chips: Intel Loihi ● Fully asynchronous neuromorphic many core mesh that supports a wide range of sparse, hierarchical and recurrent neural network topologies ● Each neuromorphic core includes a learning engine that can be programmed to adapt network parameters during operation, supporting supervised, unsupervised, reinforcement and other learning paradigms. ● Fabrication on Intel’s 14 nm process technology. ● A total of 130,000 neurons and 130 million synapses. ● Development and testing of several algorithms with high algorithmic efficiency for problems including path planning, constraint satisfaction, sparse coding, dictionary learning, and dynamic pattern learning and adaptation. https://newsroom.intel.com/editorials/intels-new-self-learning-chip-promises-accelerate-artificial-intelligence/
  104. 104. Memristors ● Neuromorphic chips generally use the same silicon transistors and digital circuits that make up ordinary computer processors. There is another way to build brain-inspired chips. https://www.technologyreview.com/s/537211/a-better-way-to-build-brain-inspired-chips/ ● Memristors (memory resistors) are exotic electronic devices only confirmed to exist in 2008. A memristor's electrical resistance is not constant but depends on the history of current that has previously flowed through the device, i.e. the device remembers its history. An analog memory device. ● Some startups try to make special memristor chips for low-power machine learning, e.g. Knowm http://www.forbes.com/sites/alexknapp/2015/09/09/this-startup-has-a-brain-inspired-chip-for-machine-learning/#5007095d51a2 http://www.eetimes.com/document.asp?doc_id=1327068
  105. 105. https://www.technologyreview.com/s/603495/10-breakthrough-technologies-2017-practical-quantum-computers/
  106. 106. Quantum Computing: D-Wave ● May 2013: Google teamed with NASA and launched the Quantum AI Lab, equipped with a quantum computer from D-Wave Systems (D-Wave 2, 512 qubits). ● Aug 2015: D-Wave announced the D-Wave 2X (1000+ qubits) ● Note that D-Wave machines are quantum annealers, not universal quantum computers.
  107. 107. Quantum Computing: D-Wave ● (May 2013) “We’ve already developed some quantum machine learning algorithms. One produces very compact, efficient recognizers -- very useful when you’re short on power, as on a mobile device. Another can handle highly polluted training data, where a high percentage of the examples are mislabeled, as they often are in the real world. And we’ve learned some useful principles: e.g., you get the best results not with pure quantum computing, but by mixing quantum and classical computing.” https://research.googleblog.com/2013/05/launching-quantum-artificial.html
  108. 108. Quantum Computing: D-Wave ● (Jun 2014) Yet results on the D-Wave 2 computer seem controversial: “Using random spin glass instances as a benchmark, we find no evidence of quantum speedup when the entire data set is considered, and obtain inconclusive results when comparing subsets of instances on an instance-by-instance basis. Our results do not rule out the possibility of speedup for other classes of problems and illustrate the subtle nature of the quantum speedup question.” http://science.sciencemag.org/content/early/2014/06/18/science.1252319
  109. 109. Quantum Computing: D-Wave ● (Dec 2015) “We found that for problem instances involving nearly 1000 binary variables, quantum annealing significantly outperforms its classical counterpart, simulated annealing. It is more than 10^8 times faster than simulated annealing running on a single core. We also compared the quantum hardware to another algorithm called Quantum Monte Carlo. This is a method designed to emulate the behavior of quantum systems, but it runs on conventional processors. While the scaling with size between these two methods is comparable, they are again separated by a large factor sometimes as high as 10^8.” https://research.googleblog.com/2015/12/when-can-quantum-annealing-win.html
  110. 110. Quantum Computing: Google ● (Jul 2016) “We have performed the first completely scalable quantum simulation of a molecule … In our experiment, we focus on an approach known as the variational quantum eigensolver (VQE), which can be understood as a quantum analog of a neural network. The quantum advantage of VQE is that quantum bits can efficiently represent the molecular wavefunction whereas exponentially many classical bits would be required. Using VQE, we quantum computed the energy landscape of molecular hydrogen, H2.” https://research.googleblog.com/2016/07/towards-exact-quantum-description-of.html
  111. 111. Quantum Computing: Google (May 2017) Google Plans to Demonstrate the Supremacy of Quantum Computing “Google’s quantum computing chip is a 2-by-3 array of qubits. The company hopes to make a 7-by-7 array later this year. By the end of this year, the team aims to increase the number of superconducting qubits it builds on integrated circuits to create a 7-by-7 array. With this quantum IC, the Google researchers aim to perform operations at the edge of what’s possible with even the best supercomputers, and so demonstrate “quantum supremacy.”” https://spectrum.ieee.org/computing/hardware/google-plans-to-demonstrate-the-supremacy-of-quantum-computing
  112. 112. Quantum Computing: IBM (Sep 13, 2017) IBM Makes Breakthrough in Race to Commercialize Quantum Computers “IBM has been pushing to commercialize quantum computers and recently began allowing anyone to experiment with running calculations on a 16-qubit quantum computer it has built to demonstrate the technology.” https://www.bloomberg.com/news/articles/2017-09-13/ibm-makes-breakthrough-in-race-to-commercialize-quantum-computers “IBM announced on May 17, 2017 that it has successfully built and tested its most powerful universal quantum computing processors. Its upgraded 16 qubit processor (pictured) will be available for use by developers, researchers, and programmers to explore quantum computing using a real quantum processor at no cost via the IBM Cloud. IBM first opened public access to its quantum processors one year ago, to serve as an enablement tool for scientific research, a resource for university classrooms, and a catalyst of enthusiasm for the field. To date users have run more than 300,000 quantum experiments on the IBM Cloud” https://phys.org/news/2017-05-ibm-powerful-universal-quantum-processors.html
  113. 113. Quantum Computing: Intel (Oct 10, 2017) Quantum Inside: Intel Manufactures an Exotic New Chip “Intel’s quantum chip uses superconducting qubits. The approach builds on an existing electrical circuit design but uses a fundamentally different electronic phenomenon that only works at very low temperatures. The chip, which can handle 17 qubits, was developed over the past 18 months by researchers at a lab in Oregon and is being manufactured at an Intel facility in Arizona.” https://www.technologyreview.com/s/609094/quantum-inside-intel-manufactures-an-exotic-new-chip/ https://newsroom.intel.com/news/intel-delivers-17-qubit-superconducting-chip-advanced-packaging-qutech/
  114. 114. Quantum Computing ● Quantum computers can provide significant speedups for many problems in machine learning (training of classical Boltzmann machines, quantum Bayesian inference, SVM, PCA, linear algebra, etc.) and can enable fundamentally different types of learning. https://www.youtube.com/watch?v=ETJcALOplOA ● The three known types of quantum computing: ○ Universal Quantum: offers the potential to be exponentially faster than traditional computers for a number of important applications: machine learning, cryptography, materials science, etc. The hardest to build. Current estimates: >100,000 physical qubits. ○ Analog Quantum: will be able to simulate complex quantum interactions that are intractable for any known conventional machine: quantum chemistry, quantum dynamics, etc. Could happen within the next 5 years; it is conjectured to contain 50-100 physical qubits. ○ Quantum Annealer: a very specialized form of quantum computing, suited for optimization problems. The easiest to build, but it has no known advantages over conventional computing. http://www.research.ibm.com/quantum/expertise.html
  115. 115. Hardware: Summary ● Ordinary CPUs are general purpose and not as effective as they could be ● GPUs are becoming more and more powerful each year (but still consume a lot of power). ● ASICs/FPGAs are on the rise. We’ve already seen some interesting announcements and will probably see even more this year. ● Neuromorphic chips etc. are probably much farther from the market (3-5 years?) but already show interesting results. ● Memristors are probably even farther away, but keep an eye on them. ● Quantum computing: still unclear. It will probably come as cloud solutions, not desktop ones.
  116. 116. https://ru.linkedin.com/in/grigorysapunov gs@inten.to Thanks!
