
Scaling up Deep Learning by Scaling Down

Nick Pentreath
Principal Engineer
@MLnick

In the last few years, deep learning has achieved dramatic success in a wide range of domains, including computer vision, artificial intelligence, speech recognition, natural language processing and reinforcement learning.

Scaling up Deep Learning by Scaling Down

  1. Scaling up deep learning by scaling down — Nick Pentreath, Principal Engineer, @MLnick
  2. About – @MLnick on Twitter, Github, LinkedIn – Principal Engineer, IBM CODAIT (Center for Open-Source Data & AI Technologies) – Machine Learning & AI – Apache Spark committer & PMC – Author of Machine Learning with Spark – Various conferences & meetups
  3. Improving the Enterprise AI Lifecycle in Open Source – CODAIT (Center for Open Source Data & AI Technologies, codait.org) aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise. – We contribute to and advocate for the open-source technologies that are foundational to IBM’s AI offerings. – 30+ open-source developers!
  4. Agenda – Deep Learning overview & computational considerations – Evolving efficiency of model architectures – Model compression – Model distillation – Conclusion
  5. Machine Learning Workflow: Data → Analyze → Process → Train → Deploy → Predict & Maintain (compute-heavy)
  6. Deep Learning – Original theory from 1940s; computer models originated around 1960s; fell out of favor in 1980s/90s – Recent resurgence due to • Bigger (and better) data; standard datasets (e.g. ImageNet) • Better hardware (GPUs, TPUs) • Improvements to algorithms, architectures and optimization – Leading to new state-of-the-art results in computer vision (images and video); speech/text; language translation and more (Source: Wikipedia)
  7. Modern Neural Networks – Deep (multi-layer) networks – Computer vision • Convolutional neural networks (CNNs) • Image classification, object detection, segmentation – Sequences and time-series • Machine translation, text generation • Recurrent neural networks - LSTM, GRU – Natural language processing • Word embeddings • Transformers, attention mechanisms – Deep learning frameworks • Flexibility, computation graphs, auto-differentiation, GPUs (Source: Stanford CS231n)
  8. Evolution of Training Computation Requirements – The computational resources required for training AI models double every 3 to 4 months (Source)
  9. Example: Image Classification – Input image → Inference → Prediction: beagle 0.82, basset 0.09, bluetick 0.07, ... (Source)
  10. Example: Inception V3 – Effectively matrix multiplication – ~24 million parameters – 78.8% accuracy (ImageNet) (Source)
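For a concrete sense of the inference step, here is a minimal sketch using the pretrained Inception V3 that ships with tf.keras (assuming TensorFlow 2.x; the file name "dog.jpg" is a placeholder for any local image):

```python
import numpy as np
import tensorflow as tf

# Pretrained Inception V3 (~24M parameters) with ImageNet weights
model = tf.keras.applications.InceptionV3(weights="imagenet")

# "dog.jpg" is a placeholder path; Inception V3 expects 299x299 RGB input
img = tf.keras.preprocessing.image.load_img("dog.jpg", target_size=(299, 299))
x = tf.keras.applications.inception_v3.preprocess_input(
    np.expand_dims(tf.keras.preprocessing.image.img_to_array(img), axis=0))

preds = model.predict(x)
# Top-3 ImageNet classes, e.g. beagle / basset / bluetick for a dog photo
print(tf.keras.applications.inception_v3.decode_predictions(preds, top=3)[0])
```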
  11. Accuracy vs Computational Complexity (ImageNet) (Source: Paper, blog)
  12. Computational efficiency (ImageNet) (Source: Paper, blog)
  13. Deep Learning Deployment – Model training typically uses substantial hardware – GPU / multi-GPU – Cloud-based deployment scenarios
  14. Deep Learning Deployment – Edge devices have more limited resources • Memory • Compute (CPU, mobile GPU, edge TPU) • Network bandwidth – Also applies to low-latency applications
  15. How do we improve performance efficiency? – Architecture improvements – Model pruning – Quantization – Model distillation
  16. Architecture Improvements
  17. Specialized architectures for low-resource targets – Inception V3: standard convolution building block, ~24 million parameters, 78.8% accuracy (ImageNet) – MobileNet V1: depthwise convolution building block (~8x less computation), ~4 million parameters, 70.9% accuracy (Source)
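To make the "~8x less computation" comparison concrete, here is an illustrative Keras sketch (the 56x56x64 input and 128 output filters are arbitrary assumptions, not values from the slides) contrasting a standard 3x3 convolution block with a MobileNet-style depthwise-separable block:

```python
import tensorflow as tf

# Standard block: a single 3x3 convolution that mixes all input channels
standard = tf.keras.Sequential([
    tf.keras.layers.Conv2D(128, 3, padding="same", input_shape=(56, 56, 64)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])

# Depthwise-separable block: 3x3 depthwise conv, then 1x1 pointwise conv
separable = tf.keras.Sequential([
    tf.keras.layers.DepthwiseConv2D(3, padding="same", input_shape=(56, 56, 64)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Conv2D(128, 1),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
])

# Roughly an 8x gap in weights (and multiply-adds) for this layer shape
print(standard.count_params(), separable.count_params())
```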
  18. Trade off accuracy vs model size – Scale the width multiplier & resolution multiplier to target the available computation budget – Width multiplier = “thinner” models – Resolution multiplier scales the input image representation (Source)
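Both multipliers are exposed directly by the Keras MobileNet constructor. A small sketch (weights=None so that non-standard configurations are accepted) showing how alpha, the width multiplier, and a reduced input resolution shrink the model:

```python
import tensorflow as tf

# Baseline MobileNet V1: width multiplier 1.0, 224x224 input
full = tf.keras.applications.MobileNet(
    alpha=1.0, input_shape=(224, 224, 3), weights=None)

# "Thinner" variant: alpha=0.5 halves the filters in every layer,
# and the 160x160 input shrinks every intermediate feature map
small = tf.keras.applications.MobileNet(
    alpha=0.5, input_shape=(160, 160, 3), weights=None)

print(full.count_params(), small.count_params())
```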
  19. MobileNet V2 – Same depthwise convolution backbone – Add linear bottlenecks & shortcut connections – ~3.4 million parameters, 72% accuracy (Source)
  20. Accuracy vs Computation - Updated (ImageNet) (Source: Paper, blog)
  21. Computational efficiency - Updated (ImageNet) (Source: Paper, blog)
  22. EfficientNet – Neural Architecture Search to find the backbone – Optimize for accuracy & efficiency (FLOPS) – ~5.3 million parameters, 77.3% accuracy (smallest variant) – ~60 million parameters, 84.5% accuracy (largest variant) (Source: blog post, paper)
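As a rough way to inspect that trade-off locally (this assumes TensorFlow 2.3 or later, which ships the EfficientNet family in tf.keras.applications):

```python
import tensorflow as tf

# Smallest and largest members of the compound-scaled EfficientNet family
b0 = tf.keras.applications.EfficientNetB0(weights=None)
b7 = tf.keras.applications.EfficientNetB7(weights=None)

# Parameter counts differ by roughly an order of magnitude
print(f"B0: {b0.count_params():,} parameters")
print(f"B7: {b7.count_params():,} parameters")
```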
  23. MobileNet V3 – Hardware-aware Neural Architecture Search – ~5.4 million parameters, 75.2% accuracy (Source: GitHub, paper)
  24. One network to rule them all? – Once for All: Train One Network and Specialize it for Efficient Deployment – Manual design or NAS is hugely costly in terms of computation – Train one network, then “cherry-pick” the sub-net without additional training (Source: GitHub, paper)
  25. Model Pruning – Reduce # of model parameters – Effectively like L1 regularization – remove weights with small impact on prediction – Sparse weights → model compression & lower latency
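A minimal magnitude-pruning sketch with the TensorFlow Model Optimization toolkit; the 80% target sparsity, the step counts, and the commented-out train_ds dataset are illustrative placeholders rather than settings from the talk:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

base = tf.keras.applications.MobileNet(weights=None, input_shape=(224, 224, 3))

# Gradually zero out the smallest-magnitude weights until 80% are pruned
schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.8, begin_step=0, end_step=10_000)
pruned = tfmot.sparsity.keras.prune_low_magnitude(base, pruning_schedule=schedule)

pruned.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# pruned.fit(train_ds, epochs=2,
#            callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before export; the zeroed weights compress well
deployable = tfmot.sparsity.keras.strip_pruning(pruned)
```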
  26. Model Pruning – [Chart: Top-1 accuracy (%) vs model sparsity (0–100%), ImageNet classification, for Inception V3 and MobileNet V1 224] (Source)
  27. Model Pruning – [Chart: BLEU score vs model sparsity, language translation, English-German and German-English] (Source)
  28. Quantization
  29. Quantization – Most DL computation uses 32-bit (or even 64-bit) floating point – Quantization reduces the numerical precision of weights by binning values – Popular targets are 16-bit floating point and 8-bit integer encodings (Source)
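The "binning" is essentially an affine map from a floating-point range onto a small integer range. A toy NumPy sketch of asymmetric 8-bit quantization of one weight tensor (illustrative only, not any framework's exact scheme):

```python
import numpy as np

w = np.random.randn(4, 4).astype(np.float32)  # a toy weight tensor

# Map [w.min(), w.max()] onto the 256 uint8 bins
scale = (w.max() - w.min()) / 255.0
zero_point = np.round(-w.min() / scale)       # integer that represents float 0.0

q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
w_restored = (q.astype(np.float32) - zero_point) * scale

print("max reconstruction error:", np.abs(w - w_restored).max())
```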
  30. Quantization – Post-training quantization • Useful if you can’t (or don’t wish to) retrain a model • Gives up some accuracy • Various options: float16, dynamic, int8 – Training-aware quantization • Much more complex • Can provide large efficiency gains with minimal accuracy loss – ImageNet Top-1 accuracy (%): Inception V3 78 original / 77.2 post-training / 77.5 training-aware; MobileNet V2 224 71.9 original / 63.7 post-training / 70.9 training-aware (Source)
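The post-training options listed above map onto TensorFlow Lite converter settings. A hedged sketch, where the untrained MobileNet V2 simply stands in for any Keras model:

```python
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None, input_shape=(224, 224, 3))

# Dynamic-range quantization: weights stored as int8, activations stay float
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
dynamic_tflite = converter.convert()

# Float16 quantization: roughly halves the model size
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
fp16_tflite = converter.convert()

# Full int8 quantization additionally needs a representative_dataset
# generator to calibrate activation ranges (omitted here).
```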
  31. Quantization – Model size (% of original, ImageNet classification): Inception V3 25% after quantization, MobileNet V2 224 26% – Latency (% of original): Inception V3 75% post-training / 48% training-aware, MobileNet V2 224 110% post-training / 61% training-aware (Source)
  32. Quantization – TensorFlow Model Optimization – PyTorch – Distiller for PyTorch
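On the PyTorch side, the lowest-effort entry point is dynamic quantization of linear layers; a sketch with a toy model (the layer sizes are arbitrary):

```python
import torch

# Toy classifier; dynamic quantization targets the nn.Linear layers
model = torch.nn.Sequential(
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
)

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, int8 weights under the hood
```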
  33. Model Distillation – Large models may be over-parameterized – Use a large, complex model to teach a smaller, simpler model – Effectively distil the core knowledge of the large model
  34. Model Distillation (Source: Distiller docs, paper)
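The usual formulation (Hinton et al., "Distilling the Knowledge in a Neural Network") blends a hard-label loss with a soft-target loss computed at a raised temperature. A sketch in TensorFlow, where the temperature and alpha values are illustrative hyperparameters rather than anything from the slides:

```python
import tensorflow as tf

def distillation_loss(labels, teacher_logits, student_logits,
                      temperature=4.0, alpha=0.1):
    """Blend hard-label cross-entropy with soft-target cross-entropy."""
    # Soften both distributions with the temperature
    soft_teacher = tf.nn.softmax(teacher_logits / temperature)
    log_soft_student = tf.nn.log_softmax(student_logits / temperature)

    # Cross-entropy of the student against the teacher's soft targets
    soft_loss = -tf.reduce_sum(soft_teacher * log_soft_student, axis=-1)

    # Ordinary cross-entropy against the ground-truth labels
    hard_loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, student_logits, from_logits=True)

    # The T^2 factor keeps the soft-target gradients comparable in scale
    return alpha * hard_loss + (1.0 - alpha) * (temperature ** 2) * soft_loss
```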
  35. Model Distillation – BERT model distillations have been very successful – DistilBERT – TinyBERT – Others (see this blog post)
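For a sense of the size reduction, a quick comparison of parameter counts (this assumes the Hugging Face transformers package is installed and downloads the pretrained weights):

```python
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")
distilbert = AutoModel.from_pretrained("distilbert-base-uncased")

# DistilBERT keeps most of BERT's quality with roughly 40% fewer parameters
print(sum(p.numel() for p in bert.parameters()))
print(sum(p.numel() for p in distilbert.parameters()))
```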
  36. Conclusion – New efficient model architectures are rapidly evolving • If one fits your needs, use it! – Compression techniques can yield large efficiency gains • Now good support in DL frameworks / supporting libraries • Perhaps combining pruning & quantization (though trickier) – Model distillation is less popular but potentially compelling in NLP tasks – Area of rapid research evolution
  37. Thank you – codait.org – twitter.com/MLnick – github.com/MLnick – developer.ibm.com – Check out the Model Asset Exchange: https://ibm.biz/model-exchange – Sign up for IBM Cloud: https://ibm.biz/BdqdSi
  38. References: Efficient Inference in Deep Learning – Where is the Problem?; Analysis of deep neural networks; MobileNets; EfficientNet; Making Neural Nets Work With Low Precision; Speeding up BERT; Distilling the Knowledge in a Neural Network; Once for All: Train One Network and Specialize it for Efficient Deployment; Distiller (PyTorch); TensorFlow Model Optimization; Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
  39. IBM Developer / © 2020 IBM Corporation
