Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 Technical Sessions

See how Intel® Processor Graphics can accelerate machine learning and AI workloads to solve complex problems that were previously very difficult.

  1. SIGGRAPH 2019 | LOS ANGELES | 28 JULY - 1 AUGUST
  2. Accelerating Machine Learning with Intel® Processor Graphics. Hisham Chowdhury, Software Architect, Intel Corporation
  3. What Is Machine Learning?
     “Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.” *Source: expertsystem.com
     Diagram: training and inference
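The definition contrasts learning from data with explicit programming, and the slide splits the work into training and inference. As a minimal sketch (a toy linear model chosen only for illustration, not anything from the talk), training estimates the parameters of y = w·x + b from examples by gradient descent on the mean squared error, and inference applies the learned parameters to a new input:

```python
# Toy illustration of "learning from data": instead of hard-coding y = 2x + 1,
# the program estimates w and b from examples (training), then applies them
# to an unseen input (inference).

def train(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of (1/n) * sum((w*x + b - y)^2) with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(params, x):
    """Inference: apply the learned parameters to a new input."""
    w, b = params
    return w * x + b


# Training data produced by the rule the program is supposed to discover
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
params = train(xs, ys)
print(f"w={params[0]:.3f}, b={params[1]:.3f}, f(10)={infer(params, 10.0):.3f}")
```

The learning rate and step count are arbitrary choices that happen to converge for this small data set; real training on a GPU uses the same loop structure over far larger models and batches.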
  4. ML Usage *Source: Pixelmator Pro, Apple.com
  5. Popular CNN Architectures and Accuracy *Source: towardsdatascience.com
  6. Machine Learning on Intel® Processor Graphics
  7. End-to-End AI Compute
     • Data center: many-to-many, hyperscale, for stream and massive batch data processing
     • Gateway: 1-to-many, with the majority of data streaming from devices
     • Edge: 1-to-1 devices, with lower power and often UX requirements
     • Connectivity: Ethernet & wireless; wireless and non-IP wired protocols
     • ✓ Secure ✓ High throughput ✓ Real-time
     • Products: Intel® Xeon® Processors, Intel® Core™ & Atom™ Processors, Intel® FPGA, Intel® Xeon Phi™ Processors*, Crest Family (Nervana ASIC)*, Intel® Processor Graphics, Movidius Myriad (VPU) for vision, Intel® GNA (IP)* for speech
  8. Intel® Processor Graphics Inference Landscape
  9. Windows Machine Learning on Intel® Processor Graphics
  10. WinML
     • Load the model and the video/images
     • Bind input/output resources
     • Evaluate
     • Results: get probabilities and predictions, or transformed inputs (style transfer, denoising, etc.)
     • Supports CPU, GPU, and accelerators (VPU)
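The "get probability and prediction" step typically means turning a classifier's raw output scores into probabilities and picking the most likely classes. The sketch below assumes a model that emits unnormalized scores (logits) and applies softmax and top-k in plain Python; it does not use the WinML API, and the labels and scores are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most likely (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical output tensor of an image classifier and its class labels
labels = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.0, 0.1, -1.0]
for label, p in top_k(logits, labels, k=2):
    print(f"{label}: {p:.3f}")
```

For the "transform inputs" case (style transfer, denoising), the output is itself an image tensor, so this post-processing step is replaced by converting the tensor back to pixels.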
  11. DirectML
     • Low-level API for machine learning (ML)
     • Hardware-accelerated machine learning primitives (called operators) are the building blocks of DirectML
     • Can be integrated into D3D12 games and applications
     • Metacommands: DirectML provides the Direct3D 12 metacommands feature, which allows hardware vendors to provide the most efficient implementation of the primitives for the underlying hardware
     • Achieves high hardware efficiency on Intel® hardware using metacommands
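A metacommand lets the driver substitute a hardware-specific implementation for a DirectML operator. The pattern can be sketched as a dispatcher that prefers a vendor-registered kernel and otherwise falls back to a generic one. Everything here (`OperatorDispatcher`, `register_vendor_kernel`, the `"gemm"` key, `tuned_matmul`) is hypothetical and only illustrates the idea; it is not the DirectML or Direct3D 12 API.

```python
from typing import Callable, Dict, List

Matrix = List[List[float]]


def generic_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Portable reference implementation of a GEMM-style operator."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def tuned_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Stand-in for a hardware-specific kernel: same math, different loop order."""
    bt = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


class OperatorDispatcher:
    """Hypothetical dispatcher: use a vendor kernel if one is registered."""

    def __init__(self) -> None:
        self.generic: Dict[str, Callable] = {"gemm": generic_matmul}
        self.vendor: Dict[str, Callable] = {}

    def register_vendor_kernel(self, op: str, fn: Callable) -> None:
        # Plays the role a metacommand plays: a driver-supplied implementation
        self.vendor[op] = fn

    def run(self, op: str, *args):
        fn = self.vendor.get(op, self.generic[op])
        return fn(*args)


dispatcher = OperatorDispatcher()
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
reference = dispatcher.run("gemm", a, b)   # generic path
dispatcher.register_vendor_kernel("gemm", tuned_matmul)
optimized = dispatcher.run("gemm", a, b)   # vendor path, same result
print(reference == optimized)
```

The key property is that the application calls the same operator either way; only the implementation behind it changes with the hardware.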
  12. macOS Machine Learning on Intel® Processor Graphics
  13. *Source: apple.com
  14. Inference Workflow *Source: mitochrome.com
  15. Inference Architecture
     Diagram: Inference Applications 1 and 2 use Vision, Natural Language Processing, and GameplayKit, which sit on Core ML; Core ML is built on Accelerate and BNNS and on Metal Performance Shaders, which run on the CPU and the iGPU.
     • Core ML
        - Runs on CPU, GPU, and accelerators
        - Image analysis, natural language processing, audio to text, identifying sounds in audio
        - Built on top of low-level primitives such as Accelerate and BNNS, and Metal Performance Shaders (MPS)
     • Metal Performance Shaders (MPS)
        - GPU only
        - Low-level primitive API (the MPS Graph API is also supported) for ML, image processing, and ray tracing needs
        - Most efficient for the underlying Intel® architecture
        - Can be integrated into Metal games and applications and dispatched as part of the same GPU command buffer
  16. Bringing Machine Learning Training to the Edge
  17. Create ML
     • ML models can now be created directly on the macOS device using Create ML *Source: Apple.com
  18. macOS ML Architecture with Training
     Diagram: the inference stack (Inference Applications 1 and 2; Vision, Natural Language Processing, GameplayKit; Core ML; Accelerate and BNNS and Metal Performance Shaders; CPU and iGPU) alongside a training path in which Training Applications 1 and 2 use Turi Create and Create ML.
  19. Web Machine Learning
  20. Web Machine Learning: Proof of Concept
     Diagram (top to bottom):
     • Web app with ONNX models, TensorFlow models, and other models
     • JS ML frameworks
     • Web browser: WebML/NN (new), alongside the existing WebGL/WebGPU and WebAssembly
     • OS ML APIs: CoreML/BNNS/MPS (macOS/iOS), WinML/DirectML (Windows), TF-Lite/NN API (Android)
     • Hardware: CPU, GPU, accelerators
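The proof-of-concept layering can be summarized as: a JS framework calls WebML, and the browser maps that call onto the native ML API of the host OS, with WebGL/WebGPU or WebAssembly as the existing fallback paths. A hedged sketch of that selection logic follows; `select_backend`, its parameters, and the GPU-before-WASM fallback order are assumptions for illustration, not the WebML POC's actual code.

```python
# Native ML APIs per OS, as listed in the proof-of-concept diagram
NATIVE_ML_BACKENDS = {
    "macos": ["CoreML", "BNNS", "MPS"],
    "ios": ["CoreML", "BNNS", "MPS"],
    "windows": ["WinML", "DirectML"],
    "android": ["TF-Lite", "NN API"],
}


def select_backend(os_name: str, webml_available: bool,
                   gpu_available: bool = True) -> str:
    """Hypothetical choice of execution path for a JS ML framework."""
    if webml_available:
        apis = NATIVE_ML_BACKENDS.get(os_name.lower())
        if apis:
            # WebML delegates to the OS-provided ML stack
            return "WebML -> " + "/".join(apis)
    # Fall back to existing web technologies (assumed preference: GPU first)
    return "WebGL/WebGPU" if gpu_available else "WebAssembly"


for os_name in ("macOS", "Windows", "Android", "Linux"):
    print(os_name, "=>", select_backend(os_name, webml_available=True))
```

The benefit the slides measure comes from the first branch: the OS stack (for example MPS on macOS) can use hardware-tuned kernels that a WebGL shader path cannot reach.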
  21. Web Machine Learning with TensorFlow.js
     Inference time in ms (lower is better):

     | Platform                                                                  | TensorFlow.js (WebGL) | TensorFlow.js (WebML/MPS) | Speedup |
     |---------------------------------------------------------------------------|-----------------------|---------------------------|---------|
     | MBP 15" 2016, 2.7 GHz Intel Core i7 + Intel HD Graphics 530 1536 MB        | 130.810               | 18.371                    | 7.120x  |
     | MBP 15" 2016, 2.7 GHz Intel Core i7 + AMD Radeon Pro 455 1536 MB           | 46.756                | 19.362                    | 2.415x  |
     | MBP 13" 2017, 3.5 GHz Intel Core i7 + Intel Iris Plus Graphics 650 1536 MB | 66.479                | 19.885                    | 3.343x  |
     | MBP 13" 2016, 2.9 GHz Intel Core i5 + Intel Iris Graphics 550 1536 MB      | 71.128                | 18.904                    | 3.763x  |

     Disclaimer
     • Platforms used for these numbers: MacBook Pro 13" and 15" with Intel HD Graphics 530, Intel Iris Graphics 550, Intel Iris Plus Graphics 650, and AMD Radeon Pro 455, running macOS High Sierra (10.13.4).
     • All testing was performed at Intel. Numbers may differ based on the actual hardware used and/or on how the benchmark is written. Intel® makes no guarantee on the specific numbers; they are intended for reference only.
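The speedup column on this slide is the WebGL inference time divided by the WebML/MPS inference time on the same machine. A short check of that arithmetic, using the slide's numbers:

```python
# (platform, TensorFlow.js WebGL ms, TensorFlow.js WebML/MPS ms), from the slide
ROWS = [
    ('MBP 15" 2016, Intel HD Graphics 530', 130.810, 18.371),
    ('MBP 15" 2016, AMD Radeon Pro 455', 46.756, 19.362),
    ('MBP 13" 2017, Intel Iris Plus Graphics 650', 66.479, 19.885),
    ('MBP 13" 2016, Intel Iris Graphics 550', 71.128, 18.904),
]


def speedup(webgl_ms: float, webml_ms: float) -> float:
    """How many times faster the WebML/MPS path is (lower time is better)."""
    return webgl_ms / webml_ms


for name, webgl_ms, webml_ms in ROWS:
    print(f"{name}: {speedup(webgl_ms, webml_ms):.3f}x")
```

Note that the WebML/MPS times are nearly flat across machines (about 18 to 20 ms), so most of the variation in speedup comes from the WebGL baseline.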
  22. Performance State: Windows and macOS
  23. WebML Using Metal Performance Shaders (MPS) vs. WebGL and WASM (Legacy)
     Chart: inference time in ms (lower is better) for MobileNet, SqueezeNet, and TensorFlow.js in the WebML Chromium POC, comparing WASM, WebGL 2, and WebML with MPS.
     Disclaimer
     • Configurations used for test and performance data: MacBook Pro 13" with Intel Iris Graphics 550 and 530, some at a fixed 850 MHz frequency and some with dynamic frequency.
     • All testing was performed at Intel® Folsom.
     • Numbers may differ based on the actual hardware used and/or on how the benchmark is written. Intel® makes no guarantee on the specific numbers; they are intended for reference only.
  24. GEMM Efficiency on Intel® Gen9 Processor Graphics
     Charts (y-axis: GFLOPS; x-axis: matrix dimensions, from 256x256x256 to 4096x4096x4096): fp16 GEMM and fp32 GEMM, each comparing Intel-optimized GEMM against the hardware theoretical maximum and 80% of the hardware theoretical maximum.
     Disclaimer
     • Configurations used for test and performance data: MacBook Pro 13" with Intel Iris Graphics 550 and 530, some at a fixed 850 MHz frequency and some with dynamic frequency.
     • All testing was performed at Intel® Folsom.
     • Numbers may differ based on the actual hardware used and/or on how the benchmark is written. Intel® makes no guarantee on the specific numbers; they are intended for reference only.
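The GFLOPS on the y-axis follow from the usual GEMM operation count: multiplying an m×k matrix by a k×n matrix takes m·n·k multiplies and as many adds, i.e. 2·m·n·k FLOPs, while the theoretical maximum is EUs × FLOPs per clock per EU × clock. The sketch assumes 16 FP32 (32 FP16) FLOPs per clock per Gen9 EU and, for the example, a 48-EU part at a fixed 850 MHz; the EU count and clock are assumptions, not values stated on the slide.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOPs for C = A @ B with A (m x k), B (k x n): one multiply + one add per term."""
    return 2 * m * n * k


def achieved_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Measured throughput given the wall-clock time of one GEMM."""
    return gemm_flops(m, n, k) / seconds / 1e9


def peak_gflops(eus: int, flops_per_clock_per_eu: int, clock_ghz: float) -> float:
    """Theoretical maximum: every EU issuing its full FLOP rate every clock."""
    return eus * flops_per_clock_per_eu * clock_ghz


def efficiency(m: int, n: int, k: int, seconds: float, peak: float) -> float:
    """Fraction of the theoretical maximum actually achieved."""
    return achieved_gflops(m, n, k, seconds) / peak


# Assumptions: 48 EUs at a fixed 850 MHz; 16 FP32 / 32 FP16 FLOPs per clock per EU
for precision, per_eu in (("fp32", 16), ("fp16", 32)):
    peak = peak_gflops(48, per_eu, 0.85)
    print(f"{precision}: peak {peak:.1f} GFLOPS, 80% line {0.8 * peak:.1f} GFLOPS")
```

To place a measured kernel on the chart, time one m×n×k multiply and pass the elapsed seconds to `achieved_gflops`; `efficiency` then gives the fraction of the theoretical line reached.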
  25. macOS Mojave => macOS Catalina: % Improvements
     Charts (% improvement):
     • Core ML: VGG19, VGG16, InceptionV4, InceptionV3, ResNet50, InceptionV1, AlexNet, GoogleNetPlaces, MobileNet, SqueezeNet, Denoiser
     • Metal Performance Shaders: VGG19, VGG16, InceptionV3, ResNet50, InceptionV1, AlexNet, GoogleNetPlaces, SqueezeNet
     • Adobe Lightroom Enhance Details: Fuji 22 MP, Fuji 24 MP, Canon 22 MP, Canon 50 MP
     Disclaimer
     • Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration.
     • Configurations used for test and performance data: MacBook Pro 13" with Intel Iris Graphics 530, some with dynamic frequency. Mojave numbers are from macOS 10.14.5 and Catalina numbers are from macOS 10.15 beta.
     • All testing was performed at Intel® Folsom. Numbers may differ based on the actual hardware used and/or on how the benchmark is written. Intel® makes no guarantee on the specific numbers; they are intended for reference only.
  26. Windows October 2018 => Windows May 2019
     Chart: % improvement from Windows October 2018 to Windows May 2019 in Adobe Lightroom Enhance Details for Canon 22 MP, Canon 50 MP, and Fuji 24 MP images.
     Disclaimer
     • Configurations used for test and performance data: latest Windows OS and Intel® Kaby Lake graphics.
     • All testing was performed at Intel. Numbers may differ based on the actual hardware used and/or on how the benchmark is written. Intel® makes no guarantee on the specific numbers; they are intended for reference only.
  27. Use Cases Running on Intel® Processor Graphics
  28. Photo Enhancement: Pixelmator Pro
     • Intel GPU on macOS using the CoreML AI framework
     • Professionally enhance your photos without time-consuming manual trial and error
     • Images: original (nice, but overexposed) vs. after ML Enhance in Pixelmator Pro
  29. Enhance Details: Adobe Lightroom
     • Intel GPU on macOS using CoreML and on Windows using WinML AI frameworks
     • https://theblog.adobe.com/enhance-details/
  30. Smart Retail: Cashier-less Store (Intel GPU on Linux using the OpenVINO AI SDK)
     • Kiosk
     • Recognize who picks up what and how many, and add the goods to the user account's shopping cart for payment
     • Smart shelf with pressure sensor
     • Track stop positions and count people's gender and age to generate a heat map
     • Recognize goods, quantity, price, and payment
     • A camera on the shelf can also check whether goods are displayed in the right position
     • IA edge computing workstation
     • Smart weighing station
     • Identify the customer and associate them with an account
     • Recognize people's gender and age to push ads
  31. Reinforcement Learning for Developing Agents in Games
     Demonstrated on Intel graphics by Unity at the Game Developers Conference, March 2019.
     A real dog uses vision and other senses to orient itself and to decide where to go. Puppo follows the same methodology. It collects observations about the scene, such as its proximity to the target, the relative position between itself and the target, and the orientation of its own legs, so it can decide what action to take next. In Puppo's case, the action describes how to rotate the joint motors in order to move. After each action Puppo performs, we give the agent a reward, which is made up of several terms. The dog learned to walk rather quickly, in about 1 minute. Then, as training continued, it learned to run.
     • Demo (courtesy of Unity): https://blogs.unity3d.com/wp-content/uploads/2018/10/DogFetchTraining.mp4?_=1
     • Intel GPU on Windows using the DirectML AI framework
     • Save developer time delivering game agents; improve the game experience
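Puppo was trained with Unity's own tooling, which is not reproduced here. As a much smaller, hedged illustration of the observe, act, reward loop described above, the sketch uses tabular Q-learning (a swapped-in textbook algorithm, not necessarily what Unity used) on a hypothetical 1-D "fetch" task. The positions, rewards, and hyperparameters are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy task (not Unity ML-Agents): positions 0..4 on a line,
# the agent starts at 0 and the target sits at position 4.
N_STATES, TARGET = 5, 4
ACTIONS = (-1, +1)  # move left, move right


def step(state, action):
    """Environment: apply an action, return (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == TARGET:
        return nxt, 1.0, True   # reward for reaching the target
    return nxt, -0.01, False    # small per-step penalty rewards speed


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 50:
            # Observe the state, then explore (random) or exploit (best known)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state, steps = nxt, steps + 1
    return q


def greedy_policy(q):
    """Best learned action for every non-target position."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(N_STATES) if s != TARGET}


policy = greedy_policy(train())
print(policy)  # every position should map to +1 (move toward the target)
```

Puppo's observations (distances, relative positions, leg orientations) and actions (joint rotations) are continuous, so real agents replace the lookup table with a neural network, which is where GPU acceleration matters.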
  32. AWS DeepRacer: AI for Computer Vision and Reinforcement Learning on the Intel Atom® Processor
     • Intel GPU on Linux using the OpenVINO AI SDK
     • Applicable to teaching robots, from vacuum cleaners to strawberry pickers
  33. Style Transfer
     • Intel GPU on macOS using CoreML and on Windows using WinML AI frameworks
  34. PoseNet: Real-Time Human Pose Estimation in the Browser
     • Browser-based PoseNet using WebML on Intel GPU, with clDNN (Windows/Linux) and Metal Performance Shaders (macOS) backends
  35. AI-Based Denoising: Intel Open Image Denoise
     • Intel GPU on macOS using CoreML and on Windows using WinML AI frameworks
  36. Object Detection Using WebML
  37. Improvements with 11th Generation Intel® Processor Graphics (“Ice Lake”)
  38. Intel® Processor Graphics Gen11
     • 10 nm process
     • 64 execution units (EUs), which increases core compute capability by 2.67x over Gen9
     • Gen11 addresses the corresponding bandwidth needs by improving compression, increasing the L3 cache, and increasing peak memory bandwidth
     • ~1 TF FP32 performance; ~2 TF FP16 performance
     • Improved shared local memory (SLM) performance (~1/4 the latency of Gen9)
     Diagram: an Intel® Core processor SoC in which the CPU cores with their LLC cache slices, the System Agent (display controller, PCIe, memory controller), and Intel® Processor Graphics Gen11 share a ring interconnect. The graphics block contains eight subslices, each with 8 EUs, an instruction cache and thread dispatch, a sampler, a media sampler, a texture cache, SLM, and a dataport (LD/ST); a slice-common L3 cache; global assets (GTI, blitter, media fixed function); and geometry, raster, HiZ/depth, pixel dispatch, and pixel backend units.
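The 2.67x figure matches the EU-count ratio (64 Gen11 EUs vs. the 24-EU Gen9 configuration quoted in this deck's disclaimers), and the ~1 TF / ~2 TF figures are consistent with a simple peak estimate. The sketch assumes 16 FP32 and 32 FP16 FLOPs per clock per EU and a 1.0 GHz clock; the clock in particular is an assumed round number, not a value from the slide.

```python
GEN9_EUS = 24                      # Gen9 configuration quoted in the disclaimers
GEN11_EUS = 64
FP32_FLOPS_PER_CLOCK_PER_EU = 16   # assumption: 2 SIMD-4 FPUs x FMA (2 FLOPs)
FP16_FLOPS_PER_CLOCK_PER_EU = 32   # assumption: FP16 at twice the FP32 rate
CLOCK_GHZ = 1.0                    # assumption: round number, not a spec value


def peak_tflops(eus: int, flops_per_clock_per_eu: int, clock_ghz: float) -> float:
    """Peak TFLOPS = EUs x FLOPs/clock/EU x clock (GHz) / 1000."""
    return eus * flops_per_clock_per_eu * clock_ghz / 1000.0


print(f"EU ratio Gen11/Gen9: {GEN11_EUS / GEN9_EUS:.2f}x")
print(f"FP32 peak: {peak_tflops(GEN11_EUS, FP32_FLOPS_PER_CLOCK_PER_EU, CLOCK_GHZ):.2f} TF")
print(f"FP16 peak: {peak_tflops(GEN11_EUS, FP16_FLOPS_PER_CLOCK_PER_EU, CLOCK_GHZ):.2f} TF")
```

Under these assumptions the script prints 2.67x, 1.02 TF, and 2.05 TF. Raw compute scales with EU count, which is why the slide pairs it with the bandwidth improvements needed to keep those EUs fed.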
  39. ML Bench: Improvement, Gen9 vs. Gen11
     Chart: improvement factor (x) of Gen11 over Gen9 for VGG16, VGG19, InceptionV3, and ResNet50, each in b01, b04, and b16 configurations (e.g. VGG16_b01, VGG16_b04, VGG16_b16).
     Disclaimer
     • Configurations used for test and performance data: Intel Gen9 graphics (24 EUs) and Intel Gen11 graphics (64 EUs), some with fixed frequency and some with dynamic frequency.
     • All testing was performed at Intel® Folsom.
     • Numbers may differ based on the actual hardware used and/or on how the benchmark is written. Intel® makes no guarantee on the specific numbers; they are intended for reference only.
  40. ISV Application Improvements
     Chart: improvement factor (x) of Gen11 over Gen9 in Adobe Lightroom Enhance Details for Fuji 22 MP, Fuji 24 MP, Canon 22 MP, and Canon 50 MP images.
     Disclaimer
     • Configurations used for test and performance data: Intel Gen9 graphics (24 EUs) and Intel Gen11 graphics (64 EUs), some with fixed frequency and some with dynamic frequency.
     • All testing was performed at Intel® Folsom.
     • Numbers may differ based on the actual hardware used and/or on how the benchmark is written. Intel® makes no guarantee on the specific numbers; they are intended for reference only.
  41. AI/ML Possibilities
     • Stylize a 15-minute video with AI (1), CyberLink PowerDirector: 48 minutes on Gen9 vs. 30 minutes on Gen11
     • Enhance 250 images with ML (2), Adobe Lightroom Classic & CC: 1.1 hours on Gen9 vs. 42 minutes on Gen11
     • Performance: Gen9 1.0x; Gen11 1.7-2.7x
     Notes: (1) Stylize video using CyberLink PowerDirector style transfer leveraging Intel OpenVINO. (2) 250 22 MP images, using WinML, CoreML, and Adobe Lightroom Classic and CC.
     Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. Results have been simulated and are provided for informational purposes only. Results were derived using simulations run on an architecture simulator. Any difference in system hardware or software design or configuration may affect actual performance. System configurations: ICL media performance is based on projections and subject to change. Gen9 performance is based on a KBL-R U42 system.
  42. Summary
     • Machine learning is here on the edge!
     • Use Intel® Integrated Graphics for your machine learning acceleration
     • Ships with most Windows and Mac platforms
     • The Intel-optimized ML stack is enabled by default
     • Automatic improvements are delivered with OS and driver updates
     • Large improvement with 11th Gen Intel® Processor Graphics
     • Intel is continuously working with OSVs (Apple, Microsoft), ISVs, the open source community, and others to improve Intel® Graphics software and hardware for ML needs
  43. References
     • Intel® Processor Graphics Gen11 (aka “Ice Lake”)
     • Apple Machine Learning on Intel®
     • CreateML
     • CoreML
     • Metal Performance Shaders
     • Windows AI
     • WebML
     • Intel® Open Image Denoise
     • Windows May 2019 ML Improvements on Intel®
     • Adobe Enhance Details
     • Unity AI
     • WinML Get Started
     • DirectML
  44. Acknowledgements
     Aaftab Munshi, Joseph Van De Water, Sudhir Tonse, Ningxin Hu, Gokul N Tonpe, Insoo Woo, Ben Ashbaugh, Murali Ramadoss, Thanh-Kevin Dang, Jay Patel, Prashanth Palaniappan, Xiaoqing Wu, Sachin Sane, Katen Shah, Brian Jacobosky, Arzhange Safdarzadeh, Anthony Bernecky, Leland E Martin, Antal Tungler, Damien Triolet, Jacek Krol, Jacek Nowak, Kalyan Muthukumar
  45. Legal Disclaimer
     Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors and Intel Integrated GPU. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. All testing was performed at Intel® Folsom. Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © Intel Corporation.
  46. Questions
