
AIDC India - AI on IA

AIDC Summit Series - India 2019


  1. Intel Architecture for Artificial Intelligence. Austin Cherian, Head - High Performance Computing Business, India. austin.cherian@intel.com
  2. Bring Your AI Vision to Life Using Intel's Comprehensive Portfolio. Hardware: multi-purpose to purpose-built AI compute from device to cloud. Solutions: partner ecosystem to facilitate AI in finance, health, retail, industrial & more. Data: Intel analytics ecosystem to get your data ready. Future: driving AI forward through R&D, investments & policy. Tools: software to accelerate development & deployment of real solutions.
  3. Data-centric infrastructure: powering the future of compute & communications. Move faster: Intel® Silicon Photonics, Intel® Ethernet, Intel® Omni-Path Fabric. Store more. Process everything: CPU, GPU (integrated & discrete), FPGA, AI accelerators.
  4. Hardware: multi-purpose to purpose-built AI compute from cloud to device, covering deep learning training and inference across mainstream, intensive and most other AI workloads. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
  5. Hardware: multi-purpose to purpose-built AI compute from device to cloud. Datacenter: large-scale data centers such as public cloud or comms service providers, government & academia, and large-enterprise IT. Edge: small-scale data centers, small-business IT infrastructure, up to a few on-premise server racks & workstations. Endpoint: user-touch endpoint devices with lower power requirements, such as laptops, tablets, smart home devices and drones. Latency targets span the spectrum: varies to <1 ms, <5 ms, <10-40 ms, ~100 ms.
  6. Hardware: multi-purpose to purpose-built AI compute from device to cloud; one size does not fit all. Endpoint: IoT sensors (security, home, retail, industrial...) for display, video, AR/VR, gestures and speech; desktop & mobility for vision, inference and speech; self-driving vehicles for autonomous driving (special-purpose SoC). Edge: servers, appliances & gateways for latency-bound inference and for basic inference, media & vision. Datacenter: servers & appliances covering most use cases, flexible & memory-bandwidth-bound use cases, dedicated media & vision inference (M.2 card form factors) and the most intensive use cases (NNP-L). Latency targets range from <1 ms to ~100 ms across the spectrum. GNA = Gaussian Neural Accelerator. Images are examples of intended applications, not an exhaustive list. All products, dates, and figures are preliminary and subject to change without notice.
  7. Intel® Xeon® Scalable Processor Family: your foundation for AI. Now build the AI you want on the CPU you know. Get maximum utilization by running data center & AI workloads side-by-side. Break memory barriers in order to apply AI to large data sets & models. Train models at scale through efficient scaling to many nodes. Access optimized tools, including continuous performance gains for TensorFlow, MXNet & more. Run in the cloud, including AWS, Microsoft, Alibaba, Tencent, Google, Baidu & more. Source: Intel measured as of November 2016.
  8. Artificial intelligence with Intel® Xeon® Scalable processors: deep learning inference & training through generational performance improvements, continuous software optimizations, lower-precision integer ops, and scaling efficiency.
  9. Up to 65% performance boost with Intel® AVX-512 on the Intel® Xeon® Platinum 8180 processor. Convolution-layer performance on Caffe, relative to a 1.0 baseline (higher is better), with Intel® AVX-512 off vs. on: GoogLeNet v1, 1.00 vs. 1.37; AlexNet, 1.00 vs. 1.65. Batch sizes: AlexNet 256, GoogLeNet v1 96. Source: Intel measured as of June 2017; estimates were obtained prior to the software patches and firmware updates addressing the "Spectre" and "Meltdown" exploits. Configuration details on slide 24.
  10. Introducing 2nd Generation Intel® Xeon® Scalable processors: leadership workload performance, groundbreaking memory innovation, embedded artificial intelligence acceleration, enhanced agility & utilization, hardware-enhanced security, built-in value, uninterrupted.
  11. Intel® Deep Learning Boost (DL Boost), featuring Vector Neural Network Instructions (VNNI). Current AVX-512 code needs three instructions to perform an INT8 convolution step: vpmaddubsw (two INT8 inputs to an INT16 output), vpmaddwd (INT16 output with an INT16 constant to INT32), and vpaddd (accumulate into INT32). The new AVX-512 VNNI instruction vpdpbusd fuses the sequence: two INT8 inputs are multiplied and accumulated directly into an INT32 output in a single instruction.
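To make the dataflow concrete, here is a minimal NumPy sketch of what one 512-bit vpdpbusd operation computes; this emulates the instruction's semantics rather than invoking it (in C the instruction is exposed as the _mm512_dpbusd_epi32 intrinsic):

```python
import numpy as np

def vpdpbusd(acc, a_u8, b_s8):
    """Emulate one 512-bit AVX-512 VNNI vpdpbusd: multiply 64 unsigned
    INT8 values by 64 signed INT8 values, sum each group of 4 adjacent
    products, and accumulate the 16 sums into the INT32 accumulator."""
    prod = a_u8.astype(np.int32) * b_s8.astype(np.int32)          # 64 products
    return acc + prod.reshape(16, 4).sum(axis=1, dtype=np.int32)  # 16 dwords

rng = np.random.default_rng(0)
acts = rng.integers(0, 256, 64, dtype=np.uint8)    # unsigned INT8 activations
wts = rng.integers(-128, 128, 64, dtype=np.int8)   # signed INT8 weights
acc = np.zeros(16, dtype=np.int32)
acc = vpdpbusd(acc, acts, wts)
print(acc)
```

The legacy three-instruction path computes the same groups-of-four dot products, but materializes INT16 and INT32 intermediates along the way; fusing them is where the per-core throughput gain on the next slide comes from.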
  12. Increasing AI performance on Intel® Xeon® processors: Intel® Optimizations for Caffe ResNet-50 inference throughput. Relative to the baseline 2S Intel® Xeon® Platinum 8180 (28 cores/socket, 1st Generation, SKX launch July 2017): 5.7x on the same 2S Platinum 8180, 14x on the 2S Intel® Xeon® Platinum 8280 (28 cores/socket, 2nd Generation with Intel® DL Boost), and 30x on the 2S Intel® Xeon® Platinum 9282 (56 cores/socket).¹ Why DL Boost helps: 1st Gen Xeon-SP INT8 offers up to 1.3x theoretical per-core throughput over FP32 but needs three instructions per operation (VPMADDUBSW, VPMADDWD, VPADDD), giving faster but inefficient throughput; 2nd Gen Xeon-SP INT8 with Intel® DL Boost combines the three instructions into one (VPDPBUSD) for up to 3x. ¹Based on Intel internal testing of Intel® Optimization for Caffe ResNet-50 inference throughput as of 7/11/2017 (1x), 11/8/2018 (5.7x), 2/20/2019 (14x) and 2/26/2019 (30x); results may not reflect all publicly available security updates. No product can be absolutely secure. See configuration details, slide 22.
  13. Intel® Nervana™ Neural Network Processors (NNP)‡. NNP-L, dedicated DL training: fastest time-to-train with high-bandwidth AI server connections for the most persistent, intense usage. NNP-I, dedicated DL inference: highly efficient multi-model inferencing for cloud, data center & intense appliances. ‡The Intel® Nervana™ Neural Network Processor is a future product that is not broadly available today. All products, dates, and figures are preliminary and subject to change without notice.
  14. Intel® FPGA product portfolio. (Image slide.)
  15. (Image-only slide.)
  16. Intel® Movidius™ Vision Processing Unit (VPU): power-efficient image processing, computer vision & deep learning for devices. Surveillance: detection & classification, identification, multi-nodal systems, multi-modal sensing, video & image capture. Service robots: navigation, 3D volumetric mapping, multi-modal sensing. Wearables: detection, tracking, recognition, video/image/session capture. Drones: sense & avoid, GPS-denied hovering, pixel labeling, video & image capture. Smart home: detection, tracking, perimeter & presence monitoring, recognition, classification, multi-nodal systems, multi-modal sensing, video & image capture. AR-VR HMD: 6DOF pose, position & mapping, gaze & eye tracking, gesture tracking & recognition, see-through camera.
  17. (Image-only slide.)
  18. Intel integrated processor graphics: built-in deep learning inference acceleration. Ubiquity & scalability: shipped in more than 1 billion Intel SoCs, with a broad performance/power choice across Intel® Atom™, Intel® Core™ and Intel® Xeon® processors. Media leadership: Intel® Quick Sync Video fixed-function media blocks improve power and performance; the Intel® Media SDK is an API that provides access to hardware-accelerated codecs. Powerful & flexible architecture: rich data-type support (32-bit FP, 16-bit FP, 32-bit integer, 16-bit integer) with SIMD multiply-accumulate instructions. Memory architecture: on-die shared memory between CPU and GPU enables lower latency and power. Software support: macOS* (Core ML and MPS), Windows* (WinML), OpenVINO™ toolkit (Windows, Linux), clDNN.
  19. Intel® Gaussian Neural Accelerator (GNA): a DSP streaming co-processor IP block for low-power audio inference & more. Ample throughput for speech, language and other sensing inference; low power (<100 mW power consumption for always-on applications); flexibility (Gaussian mixture model (GMM) and neural network inference support). Try it today with the Intel® Speech Enabling Developer Kit: https://software.intel.com/en-us/iot/speech-enabling-dev-kit. Learn more: https://sigport.org/sites/default/files/docs/PosterFinal.pdf
  20. Goal: an efficient data-centric architecture. Tier data by access frequency, optimizing performance within cost and power budgets: hot data in DRAM (hot tier), warmer data on Intel® 3D NAND SSDs (warm tier), and cooler, less often accessed data on HDD/tape (cold tier). The latency/capacity hierarchy: CPU caches (L1/L2/LLC) at pico- to nanoseconds; the memory subsystem, 10s of GB at <100 ns; a "move data closer" tier, 100s of GB at <1 µs; a "maintain persistence" tier, 1s of TB at <10 µs; SSD storage, 10s of TB at <100 µs; network storage, 10s of TB at <100 ms.
  21. The best of both worlds with Intel® Optane™ DC Persistent Memory. Memory attributes: performance comparable to DRAM at low latencies¹. Storage attributes: data persistence with higher capacity than DRAM². ¹Intel persistent memory is expected to perform at latencies near DDR4 DRAM; data transferred across the memory bus sees latencies orders of magnitude lower than data transferred across PCIe or I/O buses to NAND or hard disks. ²Offered in three capacities (128 GB, 256 GB, 512 GB); individual DDR4 DRAM DIMMs max out at 256 GB. Performance results are based on testing as of February 22, 2019 and may not reflect all publicly available security updates; see slide 24 for details.
  22. Connectivity: high-speed connectivity for massively parallel & distributed AI. Intel® Silicon Photonics: connects memory and compute, integrating connectivity technologies onto a single die for affordable, scalable solutions. SmartNIC (Cascade Glacier, coming soon): enables optimized performance for Intel® Xeon® processor-based systems. Intel® Omni-Path Architecture: provides a low-latency interconnect that scales to hundreds of thousands of nodes without losing performance or reliability.
  23. Intel® Omni-Path Architecture: evolutionary approach, revolutionary features, end-to-end solution. HFI adapters: single-port x8 (58 Gb/s) and x16 (100 Gb/s). Edge switches: 1U form factor, 24-port and 48-port. Director switches: QSFP-based, 288-port (7U chassis) and 1,152-port (20U chassis), both with 48-port leaves. Cables: passive copper and active optical from third-party vendors. HFI and switch ASICs: switch silicon up to 48 ports (1,200 GB/s total bandwidth), HFI silicon up to 2 ports (50 GB/s total bandwidth), OEM custom designs, and "-F" processors with integrated HFI. Software: open-source host software and fabric manager.
  24. The Intel AI full-stack portfolio (ai.intel.com), spanning data center to edge device and platforms for finance, healthcare, energy, industrial, transport, retail, home & more. Solutions (for solution architects): the AI Solutions Catalog (public & internal). Toolkits (for app developers): deep learning deployment via the OpenVINO™ toolkit† (open visual inference & neural network optimization for inference deployment on CPU, processor graphics, FPGA & VPU, using TensorFlow*, Caffe* & MXNet*) and the Intel® Movidius™ SDK (optimized inference deployment for all Intel® Movidius™ VPUs using TensorFlow* & Caffe*); plus Intel® Deep Learning Studio‡, an open-source tool to compress the deep learning development cycle. Libraries (for data scientists): deep learning frameworks now optimized for CPU (TensorFlow*, MXNet*, Caffe*, BigDL/Spark*) with optimizations in progress (Caffe2*, PyTorch*, PaddlePaddle*); machine learning libraries in Python (scikit-learn, pandas, NumPy), R (Cart, Random Forest, e1071) and distributed form (MLlib on Spark, Mahout). Foundation (for library developers): the Intel® Distribution for Python (optimized for machine learning), Intel® DAAL (Data Analytics Acceleration Library, for machine learning), MKL-DNN and clDNN (open-source deep neural network functions for CPU and processor graphics), and the Intel® nGraph™ Compiler (alpha), an open-source compiler for deep learning model computations optimized for multiple devices (CPU, GPU, NNP) from multiple frameworks (TF, MXNet, ONNX). Hardware (for IT system architects): CPU, GPU, NNP L-1000 and other deep learning accelerators, for training and inference. †Formerly the Intel® Computer Vision SDK. *Other names and brands may be claimed as the property of others.
  25. Transform AI with Software. Akanksha Balani, Country Lead - Intel Software Tools - IAGS, Intel.
  26. The intersection of data & compute growth. Daily by 2020: 4 TB per autonomous vehicle, 5 TB per connected airplane, 1 PB per smart factory, 750 PB per cloud video provider, 1.5 GB per average internet user; driving business, operational and security insights. Source: amalgamation of analyst data and Intel analysis.
  27. AI will transform every industry (source: Intel forecast). Consumer: smart assistants, chatbots, search, personalization, augmented reality, robots. Health: enhanced diagnostics, drug discovery, patient care, research, sensory aids. Finance: algorithmic trading, fraud detection, research, personal finance, risk mitigation. Retail: support, experience, marketing, merchandising, loyalty, supply chain, security. Government: defense, data insights, safety & security, resident engagement, smarter cities. Energy: oil & gas exploration, smart grid, operational improvement, conservation. Transport: autonomous cars, automated trucking, aerospace, shipping, search & rescue. Industrial: factory automation, predictive maintenance, precision agriculture, field automation. Other: advertising, education, gaming, professional & IT services, telco/media, sports.
  28. Intel® AI tools: delivering robust toolsets & powerful resources, accelerating innovative AI solutions.
  29. (Repeat of slide 24: the Intel AI full-stack portfolio, from solutions and toolkits through libraries and foundation to hardware.)
  30. Intel® Xeon® processors: now optimized for deep learning. Hardware & software optimizations on the Intel® Xeon® Scalable family deliver significant AI performance: up to 241x¹ higher inference throughput (Intel-optimized Caffe GoogLeNet v1 with Intel® MKL) and up to 277x¹ higher training throughput (Intel-optimized Caffe AlexNet with Intel® MKL) on the Intel® Xeon® Platinum 8180 compared to the Intel® Xeon® E5-2699 v3 with BVLC Caffe. Inference and training throughput use FP32 instructions, combining optimized frameworks with optimized Intel® MKL libraries. ¹Benchmark results may need revision as additional testing is conducted; results depend on the specific platform configurations and workloads and may not be applicable to any particular user's components, system or workloads. Source: Intel measured as of June 2018; configurations on slide 4.
  31. Intel software: extract performance from edge to data center to cloud with optimization tools & SDKs. Media: build highly optimized media infrastructure, solutions & applications for fast, dense, high-quality transcoding. Computing and ML/DL (AI, HPC, enterprise): improve performance, scalability & reliability for technical & enterprise compute, HPC and AI applications and frameworks, including the Intel® Distribution of Python and Intel® DAAL. Systems: take advantage of deep system-wide insight & analysis for system & embedded apps in manufacturing, retail, drones, robots and more. Vision (AI & IoT): create solutions for smart cities, autonomous driving, gaming and more using computer vision (the OpenVINO™ toolkit), deep learning, graphics, libraries, media, OpenCL™ & more.
  32. Intel® Parallel Studio XE: power through performance bottlenecks; code new breakthroughs for AI.
  33. AI software optimization with Intel® Parallel Studio XE. Science & research: up to 35x faster application performance at NERSC (National Energy Research Scientific Computing Center); read the case study. Artificial intelligence: performance speedup of up to 23x with Intel-optimized scikit-learn vs. stock scikit-learn on Google Cloud Platform; read the blog. Life science: LAMMPS simulations (Sandia National Laboratories) ran up to 7.6x faster with 9x energy efficiency on the Intel® Xeon Phi™ processor; read the technology brief. For more success stories, review the Intel® Parallel Studio XE case studies. Performance results are based on tests from 2016-2017 and may not reflect all publicly available security updates; see the configuration disclosures in the individual case study links.
  34. Intel® Parallel Studio XE for AI: high-performance, scalable software across multiple industries, including artificial intelligence, energy, EDA, science & research, manufacturing, government, computer software, IT, healthcare, digital media and telecommunications. Reported customer speedups range from roughly 1.25x to 25x (examples include Kyoto University, the Walker Molecular Dynamics lab, and Google Cloud Platform at 23x). Performance results are based on tests from ~2015-2017 and may not reflect all publicly available security updates; see configurations in the Intel® Parallel Studio XE case studies deck and the individual case study links for more success stories.
  35. Intel® Distribution for Python*: supercharge Python apps; rethink high performance for AI.
  36. The Python* landscape. Adoption of Python continues to grow among domain experts & developers for its productivity benefits, placing it among the most popular coding languages of 2018. Challenge #1: domain experts are not professional software programmers. Challenge #2: Python performance limits migration to production systems. Intel's Python tools aim to accelerate Python performance, enable easy access, and empower the community.
  37. Accelerate Python* with the Intel® Distribution for Python*: high-performance Python for scientific computing, data analytics, machine & deep learning. Faster performance: prebuilt, accelerated packages for numerical computing, machine/deep learning, HPC & data analytics; NumPy/SciPy/scikit-learn accelerated with Intel® MKL¹ and Intel® DAAL²; data analytics, machine learning & deep learning with scikit-learn, pyDAAL, TensorFlow* & Caffe*; scaling with Numba* & Cython*; optimized mpi4py that works with Dask* & PySpark*; optimized for the latest Intel® architecture. Greater productivity: a drop-in replacement for existing Python, with no code changes required; Jupyter* notebooks and Matplotlib included; free download and free for all uses, including commercial deployment. Ecosystem compatibility: supports Python 2.7 & 3.6 via Conda & PIP; distribution & optimized packages available via Conda, PIP, APT-GET, YUM & DockerHub; numerical performance optimizations integrated in the Anaconda* distribution; optimizations upstreamed to the main Python trunk; priority support with Intel® Parallel Studio XE. Operating systems: Windows*, Linux*, macOS*, on Intel® architecture platforms. ¹Intel® Math Kernel Library. ²Intel® Data Analytics Acceleration Library. Learn more: software.intel.com/distribution-for-python
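Because the distribution is a drop-in replacement, existing NumPy/scikit-learn code runs unchanged; a minimal sketch (the data shape and model choice are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Ordinary scikit-learn code: under the Intel Distribution for Python,
# the NumPy math and the KMeans fit below dispatch to MKL/DAAL-optimized
# kernels with no source changes.
X = np.random.rand(100_000, 50)
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
print(km.inertia_, km.cluster_centers_.shape)
```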
  38. Intel® Performance Libraries: powerful, award-winning performance libraries to optimize code & accelerate development.¹ ¹Data from Evans Data software developer surveys, 2011-2016.
  39. Fast, scalable code with the Intel® Math Kernel Library (Intel® MKL). Speeds computations for scientific, engineering, financial and machine learning applications by providing highly optimized, threaded and vectorized math functions. Provides key functionality for dense and sparse linear algebra (BLAS, LAPACK, PARDISO), FFTs, vector math, vector RNGs, summary statistics, deep learning, splines, a fast Poisson solver and more. Dispatches optimized code for each processor automatically, without the need to branch code. Optimized for single-core vectorization and cache utilization, with automatic parallelism for multi-core and many-core, scaling from a core to clusters. Available at no cost and royalty-free: great performance with minimal effort.
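Since the distribution's NumPy is linked against Intel® MKL, ordinary array code already exercises these kernels; a small illustrative sketch:

```python
import numpy as np

# Shows which BLAS/LAPACK backend NumPy was built against; an MKL-linked
# build (e.g., the Intel Distribution for Python) lists the MKL libraries.
np.show_config()

# A large matrix multiply: this one call is dispatched to the backend's
# threaded, vectorized GEMM kernel.
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = a @ b

# A symmetric eigensolve routed to the backend's LAPACK implementation.
w = np.linalg.eigvalsh(a + a.T)
print(c.shape, w[:3])
```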
  40. Speed up analytics & machine learning with the Intel® Data Analytics Acceleration Library (Intel® DAAL). Highly tuned functions for classical machine learning & analytics performance, from data center to edge, running on Intel® processor-based devices. Simultaneously ingests data & computes results for highest throughput performance; supports batch, streaming & distributed usage models to meet a range of application needs; includes Python*, C++ and Java* APIs, plus connectors to popular data sources including Spark* & Hadoop*. The pipeline spans pre-processing (decompression, filtering, normalization), transformation (aggregation, dimension reduction), analysis (summary statistics, clustering, etc.), modeling (machine learning training, parameter estimation, simulation), validation (hypothesis testing, model errors) and decision making (forecasting, decision trees, etc.). New in the 2019 release: logistic regression, the most widely used classification algorithm; extended gradient boosting functionality, with inexact split calculations & user-defined callback canceling for greater flexibility; and a user-defined data-modification procedure supporting a wide range of feature extraction & transformation techniques. Learn more: software.intel.com/daal
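DAAL's Python bindings are exposed through packages such as daal4py; a minimal k-means sketch under that assumption (the cluster count and random data are illustrative):

```python
import numpy as np
import daal4py as d4p

X = np.random.rand(10_000, 10)

# k-means++ initialization followed by Lloyd iterations; both steps run
# DAAL's threaded, vectorized kernels.
init = d4p.kmeans_init(nClusters=4, method="plusPlusDense").compute(X)
result = d4p.kmeans(nClusters=4, maxIterations=50).compute(X, init.centroids)
print(result.centroids.shape)  # (4, 10)
```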
  41. BigDL: high-performance deep learning for Apache Spark* on CPU infrastructure. Designed & optimized for Intel® Xeon® processors and powered by Intel® MKL-DNN; sits alongside Spark Core components (DataFrames, ML Pipelines, SQL, SparkR, Streaming, MLlib, GraphX) with feature parity; efficient scale-out with lower TCO and improved ease of use. No need to deploy costly accelerators, duplicate data, or suffer through scaling headaches.
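As a rough illustration, here is a minimal training sketch assuming the classic BigDL 0.x Python API (the module paths, toy MLP and random data are assumptions for illustration, not the pipelines from the case studies that follow):

```python
import numpy as np
from pyspark import SparkContext
from bigdl.util.common import Sample, create_spark_conf, init_engine
from bigdl.nn.layer import Sequential, Linear, ReLU, LogSoftMax
from bigdl.nn.criterion import ClassNLLCriterion
from bigdl.optim.optimizer import Optimizer, SGD, MaxEpoch

sc = SparkContext(conf=create_spark_conf().setAppName("bigdl-sketch"))
init_engine()  # initializes BigDL's engine for the Spark executors

# Toy training data: an RDD of BigDL Samples (features, 1-based label).
train_rdd = sc.parallelize(range(1024)).map(
    lambda _: Sample.from_ndarray(
        np.random.rand(784),
        np.array([float(np.random.randint(1, 11))])))

# A small Torch-style MLP; layers run on CPU via Intel MKL/MKL-DNN kernels.
model = (Sequential()
         .add(Linear(784, 128)).add(ReLU())
         .add(Linear(128, 10)).add(LogSoftMax()))

optimizer = Optimizer(model=model,
                      training_rdd=train_rdd,
                      criterion=ClassNLLCriterion(),
                      optim_method=SGD(learningrate=0.01),
                      end_trigger=MaxEpoch(2),
                      batch_size=256)
trained_model = optimizer.optimize()  # distributed training on Spark workers
```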
  42. Use cases across health, finance, retail, manufacturing and infrastructure: consumer sentiment analysis, image similarity search, image transfer learning, image generation, 3D image support, fraud detection, anomaly detection, recommendation (NCF, Wide & Deep), object detection, TensorFlow support, low-latency serving.
  43. Case study: image recognition. Client: JD.com, the 2nd-largest online retailer in China, with ~25 M users. Challenge: building deep learning applications such as image similarity search without moving data. Solution: switched from a GPU cluster to a CPU cluster, using Apache Spark* with BigDL running on Intel® Xeon® processors. Result: a 4x gain with Intel® Xeon® CPU processing across ~380M images.
  44. Security for stadiums at World Cup 2018. The integrated surveillance system connected to cameras at stadiums and transmitted video data to an operational HQ in each city. The Intel® Distribution of OpenVINO™ toolkit allowed AxxonSoft to distribute the neural-network video analytics across all available Intel hardware, for zone-entry detection, abandoned-object detection and facial recognition. Result: 9,000+ surveillance cameras used to protect 2 million+ fans. See the case study for details.
  45. Case study: fraud detection. Client: China UnionPay*, which specializes in banking services and payment systems and operates the 3rd-largest payment network in the world. Challenge: detect fraudulent credit card transactions with more coverage and accuracy. Solution: Cloudera Enterprise (Hadoop cluster) and Apache Spark* with BigDL, running on Intel® Xeon® and 5th Gen Intel® Core™ processors; historical data stored in Apache Hive*, with data preprocessing done in Apache Spark SQL*. Result: a 60% increase in coverage rate and a 20% increase in accuracy rate, better than the traditional rule-based approach. "Performance of Intel® Xeon® processors and the sustained optimization of Apache Spark were key [to deploy] a single-platform that consolidates and analyzes all types of data, from any channel, within a highly secure environment." https://ai.intel.com/nervana/wp-content/uploads/sites/53/2018/06/Intel-White-Paper-Union-Pay_2_hir-res_Keep-the-Size-of-Figure-6.pdf | https://www.intel.com/content/www/us/en/financial-services-it/union-pay-case-study.html Intel does not control or audit third-party benchmark data or the web sites referenced in this document; you should visit the referenced web sites and confirm whether the referenced data are accurate. *Other names and brands may be claimed as the property of others.
  46. Case study: industrial defect detection. Client: Midea Group, a Chinese electrical appliance manufacturer with 21 manufacturing plants and 260 logistics centers across 200 countries. Challenge: eliminate defects from scratched surfaces, missing bolts and misaligned labeling on glass, polished metal and painted surfaces; human inspection could not meet target quality metrics or detection-rate requirements. Solution: an advanced defect-inspection system built on Analytics Zoo, which provides a unified analytics + AI platform that seamlessly unites Spark, BigDL and TensorFlow* programs into an integrated pipeline, running on Intel® Xeon® Scalable 6130/6148 servers and Core™ i7 edge devices. Result: working closely with Intel's Analytics Zoo team, Midea built a highly optimized defect-detection solution and chose Intel® Xeon® Scalable 6130/6148 over GPU-based servers because it met their latency requirements and integrated more easily into their existing infrastructure. "Analytics Zoo from Intel provides a great tool for developing the end-to-end AI solutions, building pipelines across cloud and edge computing, and optimizing the hardware resources." (Zheng Hu, Director of Computer Vision Research Institute, Midea.) https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics Intel does not control or audit third-party benchmark data or the web sites referenced in this document; you should visit the referenced web sites and confirm whether the referenced data are accurate. *Other names and brands may be claimed as the property of others.
  47. In-train vision platform: enables pedestrian & vehicle identification at crossroads plus on-train empty-seat detection. Result: the platform provides multiple functions, from onboard Wi-Fi to computer vision applications such as human/vehicle detection at crossroads, onboard empty-seat detection and intruder detection. OpenVINO™ provides a scalable, high-performance common platform across a variety of hardware for greater efficiency.
  48. Intel® Distribution of OpenVINO™ toolkit: computer vision & deep learning apps... now faster.
  49. OpenVINO™ toolkit (Visual Inferencing & Neural Network Optimization): deploy computer vision & deep learning capabilities to the edge. High performance and high efficiency for the edge; write once, then scale to diverse accelerators; broad framework support. (VPU = Vision Processing Unit, Movidius.)
  50. What's inside the OpenVINO™ toolkit. Intel® Deep Learning Deployment Toolkit: the Model Optimizer (converts & optimizes trained models into an Intermediate Representation, IR) and the Inference Engine (optimized inference from the IR), for Intel® CPUs and CPUs with integrated graphics, plus 20+ pre-trained models, computer vision algorithms and code samples. Traditional computer vision tools & libraries: OpenCV*, OpenVX* and optimized photography & vision libraries. Media, video & graphics acceleration: Intel® Media SDK (open-source version) and OpenCL™ drivers & runtimes, for CPUs with integrated graphics. Intel® FPGA support (Linux* only): the FPGA RunTime Environment (from the Intel® FPGA SDK for OpenCL™) and bitstreams. OS support: CentOS* 7.4, Ubuntu* 16.04.3 LTS, Microsoft Windows* 10, and Yocto Project* Poky Jethro v2.0.3 (all 64-bit), on Intel® architecture-based platforms. OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.; OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
  51. Increase deep learning workload performance on public models using the OpenVINO™ toolkit & Intel® architecture, and get an even bigger boost with Intel® FPGA. The chart compares frames per second relative to a standard Caffe* CPU baseline for GoogLeNet v1, VGG16* and SqueezeNet* 1.1 (batch size 32 where noted), across OpenCV on CPU, OpenVINO on CPU, OpenVINO on CPU + Intel® Processor Graphics (GPU, FP16¹) and OpenVINO on CPU + Intel® FPGA, with improvements up to 19.9x¹. ¹Depending on workload, quality/resolution for FP16 may be marginally impacted; a performance/quality tradeoff from FP32 to FP16 can affect accuracy, and customers are encouraged to experiment to find what works best for their situation. Configuration: testing by Intel as of June 13, 2018; Intel® Core™ i7-6700K CPU @ 2.90 GHz fixed, GPU GT2 @ 1.00 GHz fixed, Ubuntu* 16.04, OpenVINO 2018 RC4, Intel® Arria® 10 FPGA 1150GX. Results may not reflect all publicly available security updates; for more complete information visit www.intel.com/benchmarks.
  52. oneAPI: a single programming model to deliver cross-architecture performance. All information is subject to change without notice; contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
  53. The programming challenge: a diverse set of data-centric hardware (scalar CPUs, vector GPUs, matrix AI accelerators, spatial FPGAs: SVMS), with no common programming language or APIs, inconsistent tool support across platforms, and a unique software investment required for each platform.
  54. Diverse workloads require diverse architectures: the future is a diverse mix of scalar, vector, matrix & spatial architectures (SVMS) deployed in CPU, GPU, AI, FPGA & other accelerators.
  55. Intel's oneAPI core concept: Project oneAPI delivers a unified programming model to simplify development across diverse architectures, with a common developer experience across scalar, vector, matrix & spatial architectures (CPU, GPU, AI and FPGA), uncompromised native high-level language performance, and a basis in industry standards & open specifications. The stack: optimized applications, on optimized middleware/frameworks, on the oneAPI language, libraries & tools.
  56. oneAPI for cross-architecture performance: the oneAPI product combines direct programming (Data Parallel C++), API-based programming (libraries), and analysis & debug tools, beneath optimized middleware, frameworks and applications, targeting CPU, GPU, AI and FPGA. Some capabilities may differ per architecture.
  57. Data Parallel C++: a standards-based, cross-architecture language to deliver uncompromised parallel programming productivity and performance across CPUs and accelerators. Based on C++, with language enhancements being driven through a community project; an open, cross-industry alternative to single-architecture proprietary languages. There will still be a need to tune for each architecture.
  58. Get the most from your code today with Intel Tech.Decoded. Visit TechDecoded.intel.io, a video series where developers learn to put key optimization strategies into practice with Intel development tools. Watch big-picture videos: focused conversations where tech visionaries share key concepts on front-line topics, what you need to know and why it matters. Dig deeper with Essentials: webinars covering strategies, practices and tools that help you optimize application and solution performance. Get started with Quick Hits: short videos and articles that deliver the how-tos of specific programming tasks using Intel tools. Topics span visual code, systems & IoT, data science, data center & computing modernization, and cloud computing.
  59. Intel® Software and Intel® Developer Workshops: tools for C/C++/Python/Fortran developers across HPC, AI, IoT and cloud, with partner programs focused on developer enablement. 150K developers, customers & partners trained; 129 customers engaged; 67 programs; 62 partners. software.intel.com | Techdecoded.intel.io
  60. Let's accelerate the future together.
  61. Security barrier recognition model using the Intel® Deep Learning Deployment Toolkit.
  62. Demo flow: load input image(s); run inference 1 with model vehicle-license-plate-detection-barrier-0007 (detects vehicles); run inference 2 with model vehicle-attributes-recognition-barrier-0010 (classifies vehicle attributes); run inference 3 with model license-plate-recognition-barrier-0001 (detects license plates); display results. A sketch of the first stage follows below.
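A minimal sketch of stage 1 using the 2019-era Inference Engine Python API; the file paths, confidence threshold and SSD-style output layout are assumptions to check against your model version. Stages 2 and 3 follow the same load-and-infer pattern on the cropped vehicle and plate regions.

```python
import cv2
from openvino.inference_engine import IECore, IENetwork

# IR files produced by the Model Optimizer (paths are placeholders).
model_xml = "vehicle-license-plate-detection-barrier-0007.xml"
model_bin = "vehicle-license-plate-detection-barrier-0007.bin"

ie = IECore()
net = IENetwork(model=model_xml, weights=model_bin)
input_blob = next(iter(net.inputs))
out_blob = next(iter(net.outputs))
n, c, h, w = net.inputs[input_blob].shape
exec_net = ie.load_network(network=net, device_name="CPU")

# Resize to the network's input geometry and reorder HWC -> NCHW.
frame = cv2.imread("car.jpg")
blob = cv2.resize(frame, (w, h)).transpose((2, 0, 1)).reshape(n, c, h, w)

# SSD-style output rows: [image_id, label, conf, xmin, ymin, xmax, ymax],
# with box coordinates normalized to [0, 1].
res = exec_net.infer(inputs={input_blob: blob})
for det in res[out_blob][0][0]:
    if det[2] > 0.5:
        print("label %d conf %.2f box %s" % (int(det[1]), det[2], det[3:7]))
```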
  63. End-to-end vision workflow: decode (CPU/GPU, Intel® Media SDK) to pre-processing (CPU/GPU, OpenCV*) to inference (CPU/GPU/FPGA/VPU, Intel® Deep Learning Deployment Toolkit) to post-processing (OpenCV) to encode (CPU/GPU, Intel Media SDK), taking video input to video output with results annotated.
  64. Key vision solutions optimized by the Intel® Distribution of OpenVINO™ toolkit: Philips. Intel teamed with Philips to show that servers powered by Intel® Xeon® Scalable processors & the Intel® Distribution of OpenVINO™ toolkit can efficiently perform deep learning inference on patients' X-rays & computed tomography (CT) scans without the need for accelerators, achieving breakthrough performance for AI inferencing: a 188x increase in throughput (images/sec) on a bone-age prediction model¹ and a 38x increase in throughput (images/sec) on a lung segmentation model.¹ "Intel® Xeon® Scalable processors and OpenVINO toolkit appears to be the right solution for medical imaging AI workloads. Our customers can use their existing hardware to its maximum potential, without having to complicate their infrastructure, while still aiming to achieve quality output resolution at exceptional speeds." (Vijayananda J., chief architect and fellow, Data Science and AI, Philips HealthSuite Insights, India.) ¹See the white paper for performance details.
  65. Key vision solutions optimized by the Intel® Distribution of OpenVINO™ toolkit: GE Healthcare*. The Intel® Distribution of OpenVINO™ toolkit helped GE deliver optimized inferencing for its deep learning image-classification solution. By bringing AI to its clinical diagnostic scanning, GE no longer needed an expensive third-party accelerator board, achieving 5.9x inferencing performance above the target¹, 14x inferencing speed over the baseline solution¹, and improved image quality, diagnostic capabilities, and clinical workflows. "With the OpenVINO™ toolkit, we are now able to optimize inferencing across Intel® silicon, exceeding our throughput goals by almost 6x," said David Chevalier, Principal Engineer for GE Healthcare. "We want to not only keep deployment costs down for our customers, but also offer a flexible, high-performance solution for a new era of smarter medical imaging. Our partnership with Intel allows us to bring the power of AI to clinical diagnostic scanning and other healthcare workflows in a cost-effective manner." Reference: Intel-GE Healthcare, "Intel® Distribution of OpenVINO™ Optimizes Deep Learning Performance for Healthcare Imaging." ¹See the white paper for performance details.
  66. Demonstrated industry success: access the Developer Success Stories for details & more examples.
