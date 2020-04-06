
Weekly #106: Deep Learning on Mobile

https://learn.xnextcon.com/event/eventdetails/W20040610
This talk explains how to bring the power of convolutional neural networks and deep learning to memory- and power-constrained devices such as smartphones. You will learn strategies to work around these constraints and build mobile-friendly, shallow CNN architectures that significantly reduce the memory footprint, making the models easier to store on a smartphone.

The talk also covers a family of model compression techniques that prune the network for live image processing, so you can build a version of a CNN optimized for inference on mobile devices. Along the way, you will learn practical strategies for preprocessing your data so that the models run more efficiently in the real world.

@PRACTICALDLBOOK Deep Learning On Mobile: A PRACTITIONER’S GUIDE
@SiddhaGanju @MeherKasam @AnirudhKoul
Why Deep Learning on Mobile? Privacy, reliability, cost, latency
Latency Is Expensive! Every 100 milliseconds of latency cost Amazon 1% in sales [Amazon 2008]
Latency Is Expensive! 53% of mobile site visits bounce when a page takes more than 3 seconds to load [Google Research, Webpagetest.org]
Power of 10 [Miller 1968; Card et al. 1991; Nielsen 1993]: 0.1 s feels seamless; 1 s keeps the user's flow of thought uninterrupted; 10 s is the limit of attention
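To check a model against these limits, measure its latency on the target device. Below is a minimal Python sketch. It assumes a hypothetical `run_inference` function that stands in for a real model call. The sketch reports the median over repeated runs, taken after a few warm-up calls that absorb one-time costs such as lazy initialization, and maps the result onto the three limits above.

```python
import statistics
import time
from typing import Callable

# Response-time limits from the "Power of 10" slide, in seconds.
LIMITS = [
    (0.1, "seamless"),
    (1.0, "uninterrupted flow of thought"),
    (10.0, "within the limit of attention"),
]


def median_latency(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> float:
    """Return the median wall-clock latency of fn() in seconds."""
    for _ in range(warmup):  # warm-up calls are excluded from the measurement
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def classify_latency(seconds: float) -> str:
    """Map a latency to the first response-time limit it fits under."""
    for limit, label in LIMITS:
        if seconds <= limit:
            return label
    return "beyond the limit of attention"


if __name__ == "__main__":
    def run_inference():  # hypothetical stand-in for a real model call
        return sum(i * i for i in range(10_000))

    lat = median_latency(run_inference)
    print(f"{lat * 1000:.2f} ms -> {classify_latency(lat)}")
```

The sketch uses the median rather than the mean so that a single slow run, such as one caused by a background task, does not dominate the estimate.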
Efficient Model + Efficient Mobile Inference Engine = DL App
How Do I Train My Model?
Learn to Play Melodica: 3 months
Already Play Piano?
FINE-TUNE your skills: 3 months → 1 week
Fine-tuning: assemble a dataset, find a pre-trained model, fine-tune the pre-trained model, and run it using existing frameworks. “Don’t Be A Hero” - Andrej Karpathy
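Fine-tuning reuses the features a pre-trained network has already learned and trains only a small new part on your own dataset. Below is a toy NumPy sketch of the first stage: training a new classification head on top of frozen features. Everything in it is illustrative. `frozen_backbone` is a fixed random projection that stands in for a real pre-trained feature extractor such as MobileNet, and `make_toy_data` stands in for an assembled dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained backbone: fixed weights that are never updated.
IN_DIM, FEAT_DIM = 20, 64
BACKBONE_W = rng.normal(size=(IN_DIM, FEAT_DIM))


def frozen_backbone(x: np.ndarray) -> np.ndarray:
    """Extract features with frozen weights (ReLU of a fixed, scaled projection)."""
    return np.maximum(x @ BACKBONE_W / np.sqrt(IN_DIM), 0.0)


def train_head(features, labels, num_classes, lr=0.02, epochs=300):
    """Train only a new softmax classification head with full-batch gradient descent."""
    n, d = features.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # d(mean cross-entropy) / d(logits)
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    return np.argmax(frozen_backbone(x) @ w + b, axis=1)


def make_toy_data(n_per_class, rng):
    """Toy dataset: two well-separated Gaussian clusters in IN_DIM dimensions."""
    x = np.vstack([rng.normal(loc=-1.0, size=(n_per_class, IN_DIM)),
                   rng.normal(loc=1.0, size=(n_per_class, IN_DIM))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return x, y
```

In a real project, the backbone would typically be a pre-trained network loaded from a framework's model zoo with its layers frozen. A common second stage unfreezes the top few layers and continues training with a small learning rate.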
CustomVision.ai. To assemble a dataset, use the Fatkun browser extension to download images from a search engine, or use the Bing Image Search API to programmatically download photos with the proper usage rights.
Demo
How Do I Run My Models?
Core ML, TF Lite, ML Kit
Apple Ecosystem: Metal (2014), BNNS + MPS (2016), Core ML (2017), Core ML 2 (2018), Core ML 3 (2019). Core ML 2 highlights: tiny models (~KB!), 1-bit model quantization support, a batch API for improved performance, conversion support for MXNet and ONNX, and tf-coreml.
Core ML 3 (2019): on-device training, personalization, Create ML UI
Core ML Benchmark [chart]: execution time (ms) of ResNet-50, MobileNet and SqueezeNet on Apple devices, from iPhone 5s (2013), iPhone 6 (2014), iPhone 6s (2015) and iPhone 7 (2016) to iPhone X (2017) and iPhone XS (2018), annotated “GPUs became a thing here!” Source: https://heartbeat.fritz.ai/ios-12-core-ml-benchmarks-b7a79811aac1
TensorFlow Ecosystem: TensorFlow (2015), TensorFlow Mobile (2016), TensorFlow Lite (2018). TensorFlow Lite is smaller and faster, has minimal dependencies, and allows running custom operators.
TensorFlow Lite is small: 75 KB core interpreter; 400 KB core interpreter + supported operations; vs. 1.5 MB for TensorFlow Mobile
TensorFlow Lite is fast: takes advantage of on-device hardware acceleration; FlatBuffers reduce code footprint and memory usage, reduce CPU cycles spent on serialization and deserialization, and improve startup time; pre-fused activations; combining a batch normalization layer with the preceding convolution; a static memory and static execution plan decrease load time
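Combining a batch normalization layer with the preceding layer is plain arithmetic. At inference time, BN computes gamma * (z - mean) / sqrt(var + eps) + beta on the layer output z = x W + b. The per-channel scale can therefore be absorbed into W and the shift into b. Below is a NumPy sketch for a dense layer; a 1x1 convolution folds the same way, per output channel. The function names are illustrative and are not TF Lite internals.

```python
import numpy as np


def batch_norm(z, gamma, beta, mean, var, eps=1e-3):
    """Inference-time batch normalization over the last (channel) axis."""
    return gamma * (z - mean) / np.sqrt(var + eps) + beta


def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-3):
    """Return (w', b') such that x @ w' + b' == batch_norm(x @ w + b).

    w has shape (in_features, out_features); BN parameters have shape (out_features,).
    """
    scale = gamma / np.sqrt(var + eps)  # one multiplier per output channel
    w_folded = w * scale  # broadcasts over the output (last) axis
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded
```

After folding, the network runs one matrix multiply and bias add where it previously ran a multiply, a bias add and a separate normalization pass.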
Converting a Keras model from the command line: $ tflite_convert --keras_model_file=keras_model.h5 --output_file=foo.tflite
TF Lite workflow: trained TensorFlow model → TF Lite Converter → .tflite model → Android app / iOS app
ML Kit: a simple abstraction over TensorFlow Lite; built-in APIs for image labeling, OCR, face detection, barcode scanning, landmark detection and smart reply; model management with Firebase (upload a model on the web interface to distribute it); A/B testing
How Do I Keep My IP Safe?
Fritz: full-fledged mobile lifecycle support (deployment, instrumentation, etc.) from Python
Recommendation for Product Development: train a model using the framework of your choice, convert it to TensorFlow Lite format, upload it to Firebase, and deploy it to iOS/Android apps with ML Kit
An Important Question: APP TOO BIG! WHAT DO? Apple does not allow apps over 200 MB to be downloaded over a cellular network. Instead, download the model on demand and interpret it on device.
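The download-on-demand pattern ships the app without the model and fetches it the first time it is needed. Below is a minimal Python sketch of the caching logic. It assumes a hypothetical model URL and a known SHA-256 checksum published alongside it, and `ensure_model` is an illustrative name. On iOS or Android, the same logic would live in app code or be handled by a managed model-hosting service.

```python
import hashlib
import os
import shutil
import tempfile
import urllib.request


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large models need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def ensure_model(url: str, cache_dir: str, expected_sha256: str) -> str:
    """Return a local path to the model, downloading it only if it is not already cached."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, os.path.basename(url))
    if os.path.exists(path) and sha256_of(path) == expected_sha256:
        return path  # cache hit: no network access needed
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    try:
        # Download to a temporary file first so a partial download is never used.
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
            shutil.copyfileobj(resp, out)
        if sha256_of(tmp_path) != expected_sha256:
            raise ValueError(f"checksum mismatch for {url}")
        os.replace(tmp_path, path)  # atomic rename into the cache
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return path
```

The checksum check guards against both corrupted downloads and a stale cached file left behind after the model has been updated on the server.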
What Effect Does Hardware Have on Performance?
Big Things Come In Small Packages
Effect of Hardware (L-R: iPhone XS, iPhone X, iPhone 5): https://twitter.com/matthieurouif/status/1126575118812110854?s=11
TensorFlow Lite Benchmarks: Alpha Lab releases by Numericcal, http://alpha.lab.numericcal.com/
TensorFlow Lite Benchmarks: crowdsourced with the AI Benchmark app by Andrey Ignatov from ETH Zurich, http://ai-benchmark.com/
Alchemy by Fritz (https://alchemy.fritz.ai/): a Python library to analyze and estimate mobile performance, with no need to deploy on mobile
User Experience Standpoint: TO GET 95%+ USER COVERAGE, SUPPORT PHONES RELEASED IN THE PAST 3.5 YEARS. IF NOT POSSIBLE, OFFER GRACEFUL DEGRADATION.
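One way to offer graceful degradation is to ship or download several model variants. The app then picks, per device, the most accurate variant that still fits a latency budget, and falls back to a reduced experience when none does. Below is a sketch of that selection step. The variant names, accuracies and latencies are hypothetical, and the per-device latencies are assumed to come from a benchmark such as a short measurement on first launch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelVariant:
    name: str  # hypothetical variant name, e.g. "large" or "small"
    accuracy: float  # offline validation accuracy


def choose_variant(
    variants: List[ModelVariant],
    measured_latency_ms: Dict[str, float],
    budget_ms: float,
) -> Optional[ModelVariant]:
    """Pick the most accurate variant whose measured latency fits the budget.

    Returns None when nothing fits, signalling the app to degrade gracefully
    (for example, run inference less often or hide the ML feature).
    """
    fitting = [
        v for v in variants
        if measured_latency_ms.get(v.name, float("inf")) <= budget_ms
    ]
    return max(fitting, key=lambda v: v.accuracy, default=None)
```

A variant with no measurement is treated as infinitely slow, so an unbenchmarked model is never selected by accident.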
Battery Standpoint: Won’t AI inference kill the battery quickly? Answer: you don’t usually run AI models constantly; you run them for a few seconds. On a modern flagship phone, running MobileNet continuously at 30 FPS would drain the battery in about 2-3 hours. The bigger question: do you really need to run it at 30 FPS, or could it run at 1 FPS?
Energy Reduction from 30 FPS to 1 FPS (iPad Pro 2017)
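Reducing the inference rate is often the simplest energy win. Below is a sketch of a time-based throttle that lets a camera pipeline keep delivering frames at 30 FPS while the model runs at most once per interval. The class and function names are hypothetical. Timestamps are passed in explicitly rather than read from a clock so that the logic is easy to test.

```python
class InferenceThrottle:
    """Allow at most one inference per `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self._last_run = None  # timestamp of the last frame sent to the model

    def should_run(self, timestamp_s: float) -> bool:
        if self._last_run is None or timestamp_s - self._last_run >= self.min_interval_s:
            self._last_run = timestamp_s
            return True
        return False


def count_inferences(fps: float, seconds: float, target_fps: float) -> int:
    """Simulate a camera feed and count how many frames would be sent to the model."""
    throttle = InferenceThrottle(1.0 / target_fps)
    frames = int(fps * seconds)
    return sum(throttle.should_run(i / fps) for i in range(frames))
```

For a 10-second feed at 30 FPS, `count_inferences(30, 10, 1)` sends 10 of the 300 frames to the model, a 30x reduction in inference work. Frames that are skipped can still be displayed with the most recent prediction.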
What Exciting Applications Can I Build?
Seeing AI, Audible Barcode Recognition. Aim: help blind users identify products using barcodes. Issue: blind users don’t know where the barcode is. Solution: guide the user toward the barcode with audio cues.
AR Hand Puppets, Hart Woolery from 2020CV: Object Detection (Hand) + Key Point Estimation [https://twitter.com/2020cv_inc/status/1093219359676280832]
Zero-Gravity Space, Takahiro Horikawa: Mask RCNN (segmentation) + PixMix (image in-painting) + Unity (physics)
HomeCourt.ai: Object Detection (Ball, Hoop, Player) + Body Pose + Perspective Transformation
Polarr: Machine-Guided Composition, automated cropping with the highest aesthetic score
Remove Objects, Brian Schulman, Adventurous Co.: Object Segmentation + Image In-Painting https://twitter.com/smashfactory/status/1139461813710442496
Magic Sudoku App: Edge Detection + Classification + ARKit https://twitter.com/braddwyer/status/910030265006923776
People Segmentation, ARKit, Abound Labs: https://www.aboundlabs.com/ https://twitter.com/nobbis/status/1135975245406515202
Snapchat: Face Swap, GANs
Can I Make My Model Even More Efficient?
How To Find Efficient Pre-Trained Models: Papers with Code (https://paperswithcode.com/sota), Model Zoo (https://modelzoo.co)
What you can afford vs. What you want
Model Pruning. Aim: remove all connections whose absolute weights are below a threshold. [Song Han, Jeff Pool, John Tran, William J. Dally, “Learning both Weights and Connections for Efficient Neural Networks”, 2015]
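The pruning rule above can be sketched in a few lines of NumPy. This is an illustration, not the paper's full procedure, which alternates pruning with retraining. The sketch sets the threshold at a percentile of the absolute weights, so that a target fraction of connections is zeroed. It also returns the binary mask that would keep pruned weights at zero during subsequent fine-tuning. `magnitude_prune` is an illustrative name.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, sparsity: float):
    """Zero out the `sparsity` fraction of weights with the smallest absolute values.

    Returns (pruned_weights, mask), where mask is 1 for kept connections and 0 for removed ones.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) >= threshold).astype(weights.dtype)  # keep |w| >= threshold
    return weights * mask, mask
```

During retraining, multiplying each gradient update by `mask` keeps the removed connections at zero. The sparse weights then compress well or can be stored in a sparse format.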
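Pruning in Keras, with each Dense layer wrapped by the magnitude-pruning wrapper from the TensorFlow Model Optimization toolkit: import tensorflow as tf; import tensorflow_model_optimization as tfmot; prune = tfmot.sparsity.keras.prune_low_magnitude. Before: model = tf.keras.models.Sequential([ tf.keras.layers.Flatten(), tf.keras.layers.Dense(512, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(10, activation=tf.nn.softmax) ]) After: model = tf.keras.models.Sequential([ tf.keras.layers.Flatten(), prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)), tf.keras.layers.Dropout(0.2), prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax)) ]). Train with callbacks=[tfmot.sparsity.keras.UpdatePruningStep()], then call tfmot.sparsity.keras.strip_pruning(model) before exporting.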
So many techniques, so little time! Quantization; weight sharing; channel pruning; filter pruning (ThiNet); better layers (Dilated Conv, HetConv, OctConv); knowledge distillation; binary networks (BNN, XNOR-Net); the Lottery Ticket Hypothesis; and many more...
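Quantization, the first technique on that list, stores values with fewer bits. Below is a NumPy sketch of 8-bit affine (scale and zero-point) quantization of a single tensor, using the mapping real ≈ scale * (q - zero_point). It is a simplification for illustration: real converters add refinements such as per-channel scales and calibration of activation ranges.

```python
import numpy as np


def quantize_uint8(x: np.ndarray):
    """Map float values to uint8 so that x ≈ scale * (q - zero_point)."""
    x_min = min(float(x.min()), 0.0)  # include 0 so it is exactly representable
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / 255.0 or 1.0  # avoid division by zero for all-zero tensors
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point


def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover approximate float values from the 8-bit codes."""
    return scale * (q.astype(np.float32) - zero_point)
```

Each float32 weight becomes one byte, a 4x reduction in storage. The reconstruction error stays within about one quantization step (`scale`).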
The one with the best thing ever
PocketFlow: 1 line to make a model efficient. Tencent AI Lab created an Automatic Model Compression (AutoMC) framework.
Can I design a better architecture myself? Maybe. But AI can do it much better!
AutoML: Let AI Design an Efficient Architecture. Neural Architecture Search (NAS) is an automated approach to designing models, for example with reinforcement learning, that maximizes accuracy. Hardware-aware NAS maximizes accuracy while minimizing run time on the device: it incorporates latency information into the reward (objective) function and measures real-world inference latency by executing candidate models on the target platform. Results: 1.5x faster than MobileNetV2 (MnasNet); ResNet-50 accuracy with 19x fewer parameters; SSD300 mAP with 35x fewer FLOPs.
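Latency can be built into the reward in a specific way. In the MnasNet paper (Tan et al., CVPR 2019), a candidate model m is scored as ACC(m) × [LAT(m)/T]^w, where T is the target latency. The exponent w is α when LAT(m) ≤ T and β otherwise, and the paper's soft-constraint setting uses α = β = -0.07. Below is a sketch of that reward function alone, without the search controller; `mnas_reward` is an illustrative name.

```python
def mnas_reward(accuracy: float, latency_ms: float, target_ms: float,
                alpha: float = -0.07, beta: float = -0.07) -> float:
    """Latency-aware NAS reward: ACC(m) * (LAT(m) / T) ** w.

    w = alpha when the model meets the latency target, beta otherwise.
    With alpha = beta = -0.07, faster models earn a small bonus and slower ones a penalty;
    alpha = 0, beta = -1 gives a hard constraint with no bonus for beating the target.
    """
    w = alpha if latency_ms <= target_ms else beta
    return accuracy * (latency_ms / target_ms) ** w
```

Because 2 ** -0.07 ≈ 0.953, doubling a model's latency costs it roughly 5% of its reward. This lets the search trade a few points of accuracy for a much faster model.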
Evolution of Mobile NAS Methods:
| Method | Top-1 Acc (%) | Pixel-1 Runtime (ms) | Search Cost (GPU hours) |
|---|---|---|---|
| MobileNetV1 | 70.6 | 113 | Manual |
| MobileNetV2 | 72.0 | 75 | Manual |
| MnasNet | 74.0 | 76 | 40,000 (4+ years) |
| ProxylessNAS-R | 74.6 | 78 | 200 |
| Single-Path NAS | 74.9 | 79.5 | 3.75 hours |
ProxylessNAS: Per-Hardware Tuned CNNs. Han Cai, Ligeng Zhu and Song Han, “ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware”, ICLR 2019
Can I Train on a Mobile Device?
On-Device Training in Core ML: let updateTask = try MLUpdateTask(forModelAt: modelUrl, trainingData: trainingData, configuration: configuration, completionHandler: { [weak self] context in self?.model = context.model; try? context.model.write(to: newModelUrl) }); updateTask.resume(). Core ML 3 introduced on-device learning. With MLUpdateTask, you never have to send training data to a server. Schedule training while the device is charging to save power.
Can I Train a Global Model Without Access to User Data?
FEDERATED LEARNING!!! https://federated.withgoogle.com/
TensorFlow Federated: train a global model across thousands of devices without access to their data, using encryption + a Secure Aggregation protocol. It can take a few days for aggregations to build up. https://github.com/tensorflow/federated
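The core of federated learning is a loop of local training on each device followed by aggregation on the server. Below is a NumPy sketch of federated averaging (FedAvg, McMahan et al., 2017) on a toy linear-regression task. The server averages client weights, weighted by each client's number of training examples. The sketch omits the encryption and Secure Aggregation steps mentioned above, and all function names are illustrative rather than TensorFlow Federated APIs.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Server step of FedAvg: average client weight vectors, weighted by local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack(client_weights)  # shape (num_clients, num_params)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()


def local_update(w, x, y, lr=0.1, epochs=5):
    """Client step: a few epochs of gradient descent on local data (linear regression, MSE)."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w


def run_rounds(clients, num_params, rounds=30):
    """Simulate federated training; only weights, never (x, y), reach the server step."""
    w_global = np.zeros(num_params)
    for _ in range(rounds):
        updates = [local_update(w_global, x, y) for x, y in clients]
        w_global = federated_average(updates, [len(y) for _, y in clients])
    return w_global
```

Weighting by dataset size keeps a client with only a few examples from pulling the global model as strongly as one with many.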
What We Learnt Today: why deep learning on mobile; building a model; running a model; hardware factors; benchmarking; state-of-the-art applications; making a model more efficient; federated learning
How to Access the Slides in 1 Second: HTTP://PRACTICALDL.AI @PRACTICALDLBOOK