Deep learning on mobile - 2019 Practitioner's Guide

Deep Learning Data Scientist at Microsoft AI & Research
Jun. 18, 2019

Deep learning on mobile - 2019 Practitioner's Guide

  1. @PRACTICALDLBOOK Deep Learning On Mobile: A Practitioner’s Guide
  2. @PRACTICALDLBOOK
  3. Deep Learning On Mobile: A Practitioner’s Guide
  4. @SiddhaGanju @MeherKasam @AnirudhKoul
  5. Why Deep Learning on Mobile? Privacy, Reliability, Cost, Latency
  6. https://media.giphy.com/media/fBzSGPMxD0isw/giphy.gif
  7. Latency Is Expensive! 100 milliseconds: 1% loss [Amazon 2008]
  8. Latency Is Expensive! >3 sec load time: 53% bounce (Mobile Site Visits) [Google Research, Webpagetest.org]
  9. Power of 10: 0.1s, seamless; 1s, uninterrupted flow of thought; 10s, limit of attention [Miller 1968; Card et al. 1991; Nielsen 1993]
  10. Efficient Mobile Inference Engine + Efficient Model = DL App
  11. How to Train My Model?
  12. Learn to Play Melodica: 3 Months
  13. Already Play Piano?
  14. FINE-TUNE your skills: 3 months, 1 week
  15. Fine-tuning: Assemble a dataset; Find a pre-trained model; Fine-tune a pre-trained model; Run using existing frameworks. “Don’t Be A Hero” - Andrej Karpathy
  16. CustomVision.ai: Use the Fatkun browser extension to download images from a search engine, or use the Bing Image Search API to programmatically download photos with proper rights
  17. @PRACTICALDLBOOK
  18. @PRACTICALDLBOOK
  19. Demo
  20. How Do I Run My Models?
  21. Core ML, TF Lite, ML Kit
  22. Apple Ecosystem: Metal (2014), BNNS + MPS (2016), Core ML (2017), Core ML 2 (2018), Core ML 3 (2019). Tiny models (~KB)! 1-bit model quantization support; Batch API for improved performance; Conversion support for MXNet, ONNX; tf-coreml
  23. Apple Ecosystem: Metal (2014), BNNS + MPS (2016), Core ML (2017), Core ML 2 (2018), Core ML 3 (2019). On-device training; Personalization; Create ML UI
  24. Core ML Benchmark [bar chart: execution time (ms) on Apple devices for ResNet-50, MobileNet, and SqueezeNet; series: iPhone 5s (2013), iPhone 6 (2014), iPhone 6s (2015), iPhone 7 (2016), iPhone X (2017), iPhone XS (2018); annotation: "GPUs became a thing here!"] https://heartbeat.fritz.ai/ios-12-core-ml-benchmarks-b7a79811aac1
  25. TensorFlow Ecosystem: TensorFlow (2015), TensorFlow Mobile (2016), TensorFlow Lite (2018). Smaller; Faster; Minimal dependencies; Allows running custom operators
  26. TensorFlow Lite is small: 75 KB, Core Interpreter; 1.5 MB, TensorFlow Mobile; 400 KB, Core Interpreter + Supported Operations
  27. TensorFlow Lite is Fast: Takes advantage of on-device hardware acceleration. FlatBuffers: reduces code footprint and memory usage; reduces CPU cycles on serialization and deserialization; improves startup time. Pre-fused activations: combining batch normalization layer with previous convolution. Static memory and static execution plan: decreases load time
  28. TensorFlow Ecosystem: TensorFlow (2015), TensorFlow Mobile (2016), TensorFlow Lite (2018). Smaller; Faster; Minimal dependencies; Allows running custom operators
  29. TensorFlow Ecosystem: TensorFlow (2015), TensorFlow Mobile (2016), TensorFlow Lite (2018). $ tflite_convert --keras_model_file=keras_model.h5 --output_file=foo.tflite
  30. TensorFlow Ecosystem: TensorFlow (2015), TensorFlow Mobile (2016), TensorFlow Lite (2018). Trained TensorFlow Model -> TF Lite Converter -> .tflite model -> Android App, iOS App
  31. ML Kit: Simple abstraction over TensorFlow Lite. Built-in APIs for image labeling, OCR, face detection, barcode scanning, landmark detection, smart reply. Model management with Firebase: upload model on web interface to distribute; A/B testing
  32. How Do I Keep My IP Safe?
  33. Fritz: Full-fledged mobile lifecycle support. Deployment, instrumentation, etc. from Python
  34. Recommendation for Product Development: Train a model using framework of choice; Convert to TensorFlow Lite format; Upload to Firebase; Deploy to iOS/Android apps with ML Kit
  35. An Important Question: APP TOO BIG! WHAT DO? Apple does not allow apps over 200 MB to be downloaded over cellular network. Download on demand, and interpret on device instead.
  36. What Effect Does Hardware Have on Performance?
  37. Big Things Come In Small Packages
  38. Effect of Hardware. L-R: iPhone XS, iPhone X, iPhone 5. https://twitter.com/matthieurouif/status/1126575118812110854?s=11
  39. TensorFlow Lite Benchmarks: Alpha Lab releases Numericcal: http://alpha.lab.numericcal.com/
  40. TensorFlow Lite Benchmarks: Crowdsourcing AI Benchmark App by Andrey Ignatov from ETH Zurich. http://ai-benchmark.com/
  41. Alchemy by Fritz https://alchemy.fritz.ai/ Python library to analyze and estimate mobile performance. No need to deploy on mobile.
  42. User Experience Standpoint: To get 95%+ user coverage, support phones released in the past 3.5 years. If not possible, offer graceful degradation.
  43. Battery Standpoint: Won’t AI inference kill the battery quickly? Answers: You don’t usually run AI models constantly; you run them for a few seconds. With a modern flagship phone, running MobileNet at 30 FPS should drain the battery in 2-3 hours. Bigger question: do you really need to run it at 30 FPS, or could it run at 1 FPS?
  44. Energy Reduction from 30 FPS to 1 FPS (iPad Pro 2017)
  45. What Exciting Applications Can I Build?
  46. Seeing AI: Audible barcode recognition. Aim: Help blind users identify products using barcode. Issue: Blind users don’t know where the barcode is. Solution: Guide user in finding a barcode with audio cues.
  47. AR Hand Puppets, Hart Woolery from 2020CV, Object Detection (Hand) + Key Point Estimation [https://twitter.com/2020cv_inc/status/1093219359676280832]
  48. Zero-Gravity Space, Takahiro Horikawa, Mask RCNN (segmentation) + PixMix (Image In-Painting) + Unity (Physics)
  49. [HomeCourt.ai] Object Detection (Ball, Hoop, Player) + Body Pose + Perspective Transformation
  50. Polarr, Machine Guided Composition, Automated cropping with highest aesthetic score
  51. Remove objects, Brian Schulman, Adventurous Co., Object Segmentation + Image In-Painting https://twitter.com/smashfactory/status/1139461813710442496
  52. Magic Sudoku App: Edge Detection + Classification + ARKit https://twitter.com/braddwyer/status/910030265006923776
  53. People Segmentation, ARKit, Abound Labs https://www.aboundlabs.com/ https://twitter.com/nobbis/status/1135975245406515202
  54. Snapchat: Face Swap, GANs
  55. Can I Make My Model Even More Efficient?
  56. How To Find Efficient Pre-Trained Models: Papers with Code (https://paperswithcode.com/sota), Model Zoo (https://modelzoo.co)
  57. What you can afford / What you want
  58. Model Pruning. Aim: Remove all connections with absolute weights below a threshold. Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
  59. Pruning in Keras. Before: model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), tf.keras.layers.Dense(512, activation=tf.nn.relu), tf.keras.layers.Dropout(0.2), tf.keras.layers.Dense(10, activation=tf.nn.softmax)]) After: model = tf.keras.models.Sequential([tf.keras.layers.Flatten(), prune.Prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)), tf.keras.layers.Dropout(0.2), prune.Prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax))])
  60. So many techniques - so little time! Quantization; Weight sharing; Channel pruning; Filter pruning (ThiNet); Better layers (Dilated Conv, HetConv, OctConv); Knowledge Distillation; Binary networks (BNN, XNOR-Net); Lottery Ticket Hypothesis; and many more ...
  61. The one with the best thing ever
  62. Pocket Flow – 1 Line to Make a Model Efficient: Tencent AI Labs created an Automatic Model Compression (AutoMC) framework
  63. Can I design a better architecture myself? Maybe? But AI can do it much better!
  64. AutoML – Let AI Design an Efficient Arch. Neural Architecture Search (NAS): an automated approach for designing models using reinforcement learning while maximizing accuracy. Hardware-Aware NAS: maximizes accuracy while minimizing run-time on device; incorporates latency information into the reward objective function; measures real-world inference latency by executing on a particular platform. 1.5x faster than MobileNetV2 (MnasNet); ResNet-50 accuracy with 19x fewer parameters; SSD300 mAP with 35x fewer FLOPs
  65. Evolution of Mobile NAS Methods
      Method            Top-1 Acc (%)   Pixel-1 Runtime   Search Cost (GPU Hours)
      MobileNetV1       70.6            113               Manual
      MobileNetV2       72.0            75                Manual
      MnasNet           74.0            76                40,000 (4 years+)
      ProxylessNas-R    74.6            78                200
      Single-Path NAS   74.9            79.5              3.75 hours
  66. ProxylessNAS – Per Hardware Tuned CNNs. Han Cai, Ligeng Zhu, and Song Han, "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware", ICLR 2019
  67. @PRACTICALDLBOOK
  68. Can I Train on a Mobile Device?
  69. On-Device Training in Core ML: let updateTask = try MLUpdateTask(forModelAt: modelUrl, trainingData: trainingData, configuration: configuration, completionHandler: { [weak self] context in self?.model = context.model; try? context.model.write(to: newModelUrl) }) - Core ML 3 introduced on-device learning - With MLUpdateTask, training data never has to be sent to the server - Schedule training when the device is charging to save power
  70. Can I Train a Global Model Without Access to User Data?
  71. FEDERATED LEARNING!!! https://federated.withgoogle.com/
  72. TensorFlow Federated: Train a global model using 1000s of devices without access to data. Encryption + Secure Aggregation Protocol. Can take a few days to wait for aggregations to build up. https://github.com/tensorflow/federated
  73. What We Learnt Today - Why deep learning on mobile? - Building a model - Running a model - Hardware factors - Benchmarking - State-of-the-art applications - Making a model more efficient - Federated Learning
  74. How to Access the Slides in 1 Second: HTTP://PRACTICALDL.AI @PRACTICALDLBOOK
  75. @SiddhaGanju @MeherKasam @AnirudhKoul
  76. @PRACTICALDLBOOK
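
Slide 64 says hardware-aware NAS folds measured latency into the reward. A minimal sketch of one such reward follows, assuming the soft-constraint form described in the MnasNet paper (accuracy multiplied by the latency ratio raised to a small negative exponent). The function name, the candidate names, and all the numbers are hypothetical and are not taken from the slides.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms, w=-0.07):
    """Reward = accuracy * (latency / target) ** w.

    With a negative exponent w, a model slower than the target is
    penalized and a faster one gets a small bonus. w = -0.07 is the
    value reported for MnasNet; treat it as a tunable assumption.
    """
    if latency_ms <= 0 or target_ms <= 0:
        raise ValueError("latencies must be positive")
    return accuracy * (latency_ms / target_ms) ** w


# Hypothetical candidates: name -> (top-1 accuracy, measured latency in ms).
candidates = {
    "a": (0.74, 76.0),
    "b": (0.75, 150.0),
    "c": (0.70, 40.0),
}
target_ms = 75.0

# Pick the candidate with the highest latency-aware reward.
best = max(candidates, key=lambda k: latency_aware_reward(*candidates[k], target_ms))
print(best)
```

Here the most accurate candidate ("b") loses because it takes twice the latency budget, which is the trade-off the slide describes.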

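Slides 70-72 describe training a global model from many devices without collecting their data. The aggregation step is often done with federated averaging, sketched below in plain Python. This is an illustration under stated assumptions, not the TensorFlow Federated API: the function name and the flat-list weight format are hypothetical, and encryption and secure aggregation are left out.

```python
def federated_average(client_updates):
    """Average client weights, weighted by each client's example count.

    client_updates: list of (weights, num_examples) pairs, where weights
    is a flat list of floats trained locally on one device.
    Only the weights and counts reach the server, never the raw data.
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("need at least one training example")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must share one model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * n / total
    return averaged


# Two hypothetical devices; the second trained on three times as much data.
global_weights = federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)])
print(global_weights)
```

Weighting by example count keeps a device with little data from pulling the global model as hard as one with a lot.
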
Editor's Notes

  1. [M] Let’s just say that in the 21st century, the human attention span is a bit low.
  2. Amazon published the results of an A/B test they did way back in 2008! They found that every 100 ms increase in latency correlated with a 1% decrease in profits. Imagine what a couple of seconds can do. Actually, we don’t need to imagine.
  3. We have the results from a Google study about a decade later, where they found that a loading time of 3 seconds or more on a mobile website resulted in a 53% probability of a user leaving the webpage.
  4. If you ever had to tell someone the Moore’s law equivalent for human attention span, you can refer them to these numbers, which have been found to be true as early as 1968 and as late as 1993. It’s very possible it’s gotten worse since push notifications became a thing. The studies found that: 0.1 second is about the limit where the user feels that the system is reacting instantly, like typing characters on the keyboard and having them appear on the screen. No special feedback is necessary except to display the result. 1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data. Or about the amount of time I spend thinking before buying the next Apple product. 10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
  5. At a high level, the recipe for a great deep learning app consists of two things- an efficient inference engine and an efficient model. So how do we train a model?
  6. You don't need Microsoft's or Google’s ocean-boiling GPU cluster.
  7. We’ve looked at how to do fine-tuning, but we figured it’d be cool to demo it live. They say doing a live demo is a bad idea, but what are we if not full of bad ideas? In the spirit of adventure, let’s do it.
  8. [S] We’ve been discussing fine-tuning in the context of images, but let’s try it out on a harder input – audio. In the interest of time, we won’t show the process of actually collecting the data, but all we did was collect or record some audio files for each class. We collected applause because we know there’s going to be a lot of it in this session.
  9. It’s the same revolution that desktop GPUs brought about. Core ML models run 38% faster on iOS 12 compared to iOS 11. We’re just at the beginning of an incredible wave of mobile experiences powered by on-device machine learning. Processors like the A12 are going to make it happen.
  10. Minimal dependencies -> Easier to package and deploy
  11. Just one line of code! On 2 billion devices. Google assistant is on 1 billion devices. Photos, Gboard, Gmail, Nest!
  12. Develop one TFLite model for both iOS and Android apps.
  13. [s] Even though an iPhone XS looks smaller than a MacBook Air, guess what, it's stronger.
  14. I could give you the numbers, but they say showing is better than telling. What's the result of all that powerful hardware?
  15. It also gives a layer-by-layer breakdown of how much processing power it needs.
  16. 2015 was the year of the GPU. Aim for 10 FPS on the oldest phones for real-time UX.
  17. [s]
  18. As we have all painfully experienced in real life, what you really want is not always what you can afford. It's the same in machine learning. We all know deep learning works if you have large GPU servers, but what about when you want to run it on a 3-year-old device? What's the number one limitation? It turns out to be memory. If you look at ImageNet models over the last couple of years, they started at 240 megabytes, and VGG was over half a gig. So the question we will solve now is how to get these neural networks to do these amazing things yet have a very small memory footprint.
  19. Pruning redundant, non-informative weights in a previously trained network reduces the size of the network at inference time. Take a network, prune it, and then retrain the remaining connections: train, prune, retrain, all in a loop.
  20. Art Vandelay here asks a very important question.
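
The pruning step from slide 58 and note 19 (drop connections whose absolute weight is below a threshold) can be sketched in plain Python as follows. The helper names, the example weights, and the threshold are hypothetical. The retraining half of the train, prune, retrain loop is not shown, since it depends on the training framework.

```python
def prune_by_magnitude(weights, threshold):
    """Zero out every weight whose absolute value is below the threshold.

    weights: a 2-D list of floats (one row per output unit).
    Returns a new matrix; the input is left untouched.
    """
    return [[0.0 if abs(w) < threshold else w for w in row] for row in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


# Hypothetical layer weights and threshold.
layer = [[0.5, -0.01, 0.2],
         [0.03, -0.8, 0.0]]
pruned = prune_by_magnitude(layer, threshold=0.05)
print(pruned, sparsity(pruned))
```

In practice the surviving weights are retrained after each pruning pass so the network can recover the accuracy lost when connections are removed.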