Intel Architecture for Artificial Intelligence
Austin Cherian
Head - High Performance Computing Business, India
austin.cherian@intel.com
Bring Your AI Vision to Life Using Intel's Comprehensive Portfolio
• Hardware: multi-purpose to purpose-built AI compute from device to cloud
• Solutions: partner ecosystem to facilitate AI in finance, health, retail, industrial & more
• Data: Intel analytics ecosystem to get your data ready
• Tools: software to accelerate development & deployment of real solutions
• Future: driving AI forward through R&D, investments & policy
#IntelAIDC2019 | #AIonIntel | #IntelAI
Data-Centric Infrastructure: Move Faster · Store More · Process Everything
• Move faster: Intel® Silicon Photonics, Intel® Ethernet, Intel® Omni-Path Fabric
• Process everything: CPU, GPU (integrated & discrete), FPGA, AI accelerators
Powering the Future of Compute & Communications
HARDWARE: multi-purpose to purpose-built AI compute from device to cloud
[Diagram: positioning of Intel AI hardware across workloads, from mainstream AI to the most intensive deep learning training & inference]
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
HARDWARE: multi-purpose to purpose-built AI compute from device to cloud
• Data center: large-scale data centers such as public cloud or comms service providers, gov't & academia, large enterprise IT
• Edge: small-scale data centers, small-business IT infrastructure, down to a few on-premise server racks & workstations
• Endpoint: user-touch endpoint devices with lower power requirements such as laptops, tablets, smart home devices, drones
Latency targets range from ~100 ms in the data center through <10-40 ms and <5 ms toward the edge, down to <1 ms (or varying) at the endpoint.
HARDWARE: multi-purpose to purpose-built AI compute from device to cloud
• Endpoint
  – IoT sensors (security, home, retail, industrial…): dedicated media & vision inference — special-purpose silicon
  – Desktop & mobility (display, video, AR/VR, gestures, speech): basic inference, media & vision — SOC
  – Self-driving vehicles: autonomous driving — special-purpose silicon
• Edge — servers, appliances & gateways: vision & speech inference; latency-bound inference — M.2 card
• Data center — servers & appliances: most use cases; flexible & memory bandwidth-bound use cases; most intensive use cases — NNP-L
Latency targets range from ~100 ms in the data center to <1 ms at the endpoint.
¹GNA = Gaussian Neural Accelerator
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. Images are examples of intended applications but not an exhaustive list.
One size does not fit all
Intel® Xeon® Scalable Processor Family — your foundation for AI
Now build the AI you want on the CPU you know
• Get maximum utilization: run data center & AI workloads side-by-side
• Break memory barriers: apply AI to large data sets & models
• Train models at scale: through efficient scaling to many nodes
• Access optimized tools: including continuous performance gains for TensorFlow, MXNet & more
• Run in the cloud: including AWS, Microsoft, Alibaba, Tencent, Google, Baidu & more
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may
cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel
measured as of November 2016. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804
Intel® Xeon® Scalable Processor for AI
Artificial intelligence with Intel® Xeon® Scalable Processors: deep learning INFERENCE & TRAINING
• Generational performance improvements
• Continuous software optimizations
• Lower-precision integer ops
• Scaling efficiency
Up to 65% Performance Boost with Intel® AVX-512 on the Intel® Xeon® Platinum 8180 processor

Convolution layer performance on the Intel® Xeon® Platinum 8180 processor (measured in milliseconds, shown relative to a 1.0 baseline; higher is better):

  Caffe GoogLeNet v1 — Intel® AVX-512 OFF: 1.00 | Intel® AVX-512 ON: 1.37
  Caffe AlexNet      — Intel® AVX-512 OFF: 1.00 | Intel® AVX-512 ON: 1.65

Test results above quantify the value added by Intel® AVX-512 to convolution layer performance. All results were measured on the Intel® Xeon® Platinum 8180 processor running AI topologies on the Caffe framework with and without Intel® AVX-512 enabled.
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.
Batch sizes — AlexNet: 256, GoogLeNet v1: 96. Configuration details on slide 24.
Source: Intel measured as of June 2017.
Generational performance improvements: enhanced compute performance with Intel® AVX-512 on Intel® Xeon® Scalable Processors
Introducing 2nd Generation Intel® Xeon® Scalable Processors
• Leadership workload performance
• Groundbreaking memory innovation
• Embedded artificial intelligence acceleration
• Enhanced agility & utilization
• Hardware-enhanced security
Built-in value. Uninterrupted.
Intel® Deep Learning Boost (DL Boost) featuring Vector Neural Network Instructions (VNNI)

INT8 format: 8 bits (07-00) — sign bit plus mantissa.

Current AVX-512 instruction sequence to perform INT8 convolutions (3 instructions):
• vpmaddubsw: INT8 input × INT8 input → INT16 output
• vpmaddwd: INT16 output × INT16 constant → INT32 output
• vpaddd: INT32 output + INT32 accumulator → INT32 output

NEW AVX-512 (VNNI) instruction to accelerate INT8 convolutions (1 instruction):
• vpdpbusd: INT8 input × INT8 input + INT32 accumulator → INT32 output
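The fused multiply-accumulate that vpdpbusd performs can be sketched in plain Python — a behavioral model of one 32-bit lane, not the intrinsic itself: each lane accumulates four unsigned-8 × signed-8 products into a wrapping INT32 register.

```python
def vpdpbusd_lane(acc, u8x4, s8x4):
    """Behavioral sketch of one 32-bit lane of the AVX-512 VNNI
    vpdpbusd instruction: accumulate four u8*s8 products into INT32.
    acc is an int32 accumulator, u8x4 holds unsigned bytes (0..255),
    s8x4 holds signed bytes (-128..127)."""
    assert len(u8x4) == len(s8x4) == 4
    total = acc + sum(u * s for u, s in zip(u8x4, s8x4))
    # Wrap to signed 32-bit, as the hardware register would
    # (vpdpbusd wraps; the vpdpbusds variant saturates instead).
    total &= 0xFFFFFFFF
    return total - 0x100000000 if total >= 0x80000000 else total

# Four INT8 activation/weight pairs folded into one accumulator:
print(vpdpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8]))  # 10+5-12+21-32 = -8
```

Replacing the three-instruction vpmaddubsw/vpmaddwd/vpaddd sequence with this single fused step is where the per-operation throughput gain comes from.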
Increasing AI Performance on Intel® Xeon® Processors

Intel® Optimizations for Caffe ResNet-50 inference throughput performance:
• BASE (1x)¹: 2S Intel® Xeon® Platinum 8180 processor (28 cores/socket), 1st Generation Intel® Xeon® Scalable Processor — SKX launch, July 2017
• 5.7x vs. BASE¹: 1st Generation Intel® Xeon® Scalable Processor, with later software optimizations
• 14x vs. BASE¹: 2S Intel® Xeon® Platinum 8280 processor (28 cores/socket), 2nd Generation Intel® Xeon® Scalable Processor
• 30x vs. BASE¹: 2S Intel® Xeon® Platinum 9282 processor (56 cores/socket), 2nd Generation Intel® Xeon® Scalable Processor

Intel® DL Boost theoretical throughput per core over 1st Generation Intel® Xeon® Scalable Processors:
• 1st Gen Xeon-SP FP32 → 1st Gen Xeon-SP INT8: up to 1.3x. Faster throughput, but inefficient — uses 3 instructions per operation (VPMADDUBSW, VPMADDWD, VPADDD).
• 1st Gen Xeon-SP INT8 → 2nd Gen Xeon-SP INT8 with Intel® DL Boost: up to 3x. DL Boost fixes this by combining the 3 instructions into 1 (VPDPBUSD).

¹ Based on Intel internal testing: 1x, 5.7x, 14x and 30x performance improvement based on Intel® Optimization for Caffe ResNet-50 inference throughput performance on Intel® Xeon® Scalable Processors. Performance results are based on testing as of 7/11/2017 (1x), 11/8/2018 (5.7x), 2/20/2019 (14x) and 2/26/2019 (30x) and may not reflect all publicly available security updates. No product can be absolutely secure. See configuration details on slide 22.
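Running inference in INT8, as DL Boost encourages, requires quantizing FP32 tensors first. A minimal symmetric per-tensor quantization sketch in Python — illustrative only, not Intel's calibration tooling:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: map FP32 values to INT8
    using a single scale derived from the tensor's max magnitude."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.0, 0.25, 1.27], dtype=np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# Round-trip error is bounded by half a quantization step.
assert np.max(np.abs(x - x_hat)) <= s / 2 + 1e-7
```

The accuracy cost of this rounding is what calibration workflows manage; the payoff is that the INT8 codes can be fed to instructions like vpdpbusd at several times FP32 throughput.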
Intel® Nervana™ Neural Network Processors (NNP)‡
• NNP-L — dedicated DL training: fastest time-to-train, with high-bandwidth AI server connections for the most persistent, intense usage
• NNP-I — dedicated DL inference: highly efficient multi-model inferencing for cloud, data center & intense appliances
‡The Intel® Nervana™ Neural Network Processor is a future product that is not broadly available today.
Intel® FPGA Product Portfolio
Intel® Movidius™ Vision Processing Unit (VPU)
Power-Efficient Image Processing, Computer Vision & Deep Learning for Devices
• Surveillance: detection & classification, identification, multi-nodal systems, multi-modal sensing, video & image capture
• Service robots: navigation, 3D vol. mapping, multi-modal sensing
• Wearables: detection & tracking, recognition, video/image/session capture
• Drones: sense & avoid, GPS-denied hovering, pixel labeling, video & image capture
• Smart home: detection & tracking, perimeter & presence monitoring, recognition & classification, multi-nodal systems, multi-modal sensing, video & image capture
• AR/VR HMD: 6DOF pose/position/mapping, gaze & eye tracking, gesture tracking & recognition, see-through camera
Intel Integrated Processor Graphics
Built-in Deep Learning Inference Acceleration
• Ubiquity/scalability: shipped in >1 billion Intel SOCs; broad choice of performance/power offerings across Intel® Atom™, Intel® Core™ and Intel® Xeon® processors
• Media leadership: Intel® Quick Sync Video — fixed-function media blocks that improve power and performance; Intel® Media SDK — an API that provides access to hardware-accelerated codecs
• Powerful & flexible architecture: rich data-type support for 32-bit FP, 16-bit FP, 32-bit integer and 16-bit integer, with SIMD multiply-accumulate instructions
• Memory architecture: shared on-die memory architecture between CPU and GPU, enabling lower latency and power
• Software support: macOS* (Core ML and MPS), Windows* (WinML), OpenVINO™ toolkit (Windows, Linux), clDNN
Intel® Gaussian Neural Accelerator (GNA)
A streaming co-processor for low-power audio inference & more
• Ample throughput: for speech, language and other sensing inference
• Low power: <100 mW power consumption for always-on applications
• Flexibility: Gaussian mixture model (GMM) and neural network inference support
Try it today: Intel® Speech Enabling Developer Kit — https://software.intel.com/en-us/iot/speech-enabling-dev-kit
Learn more: https://sigport.org/sites/default/files/docs/PosterFinal.pdf
Goal: Efficient Data-Centric Architecture
Data-access frequency distribution: hot data is accessed most often, cooler data less often. Match each tier to its access pattern to optimize performance within a given cost and power budget:
• Hot tier — DRAM
• Warm tier — SSD (Intel® 3D NAND SSD)
• Cold tier — HDD / tape
Memory & storage hierarchy (typical capacity, latency):
• CPU caches (L1/L2/LLC, per core): picoseconds to nanoseconds
• Memory sub-system: 10s of GB, <100 nanoseconds
• Move data closer to compute: 100s of GB, <1 microsecond
• Maintain persistency: 1s of TB, ~10 microseconds
• SSD storage: 10s of TB, <100 microseconds
• Network storage: 10s of TB, <100 milliseconds
The best of both worlds with Intel® Optane™ DC Persistent Memory
• Memory attributes: performance comparable to DRAM at low latencies¹
• Storage attributes: data persistence with higher capacity than DRAM²
¹"Performance comparable to DRAM" — Intel persistent memory is expected to perform at latencies near DDR4 DRAM. "Low latencies" — data transferred across the memory bus incurs latencies orders of magnitude lower than transferring data across PCIe or I/O buses to NAND/hard disk. ²Intel persistent memory offers 3 capacities — 128 GB, 256 GB and 512 GB; individual DDR4 DRAM DIMMs max out at 256 GB. Performance results are based on testing as of February 22, 2019 and may not reflect all publicly available security updates. See slide 24 for details. No product or component can be absolutely secure.
Connectivity
High-speed connectivity for massively parallel & distributed AI
• Intel® Silicon Photonics: connects memory and compute, integrating connectivity technologies onto a single die for affordable, scalable solutions
• SmartNIC (Cascade Glacier) — coming soon: enables optimized performance for Intel® Xeon® processor-based systems
• Intel® Omni-Path Architecture: provides a low-latency interconnect that scales to hundreds of thousands of nodes without losing performance or reliability
Intel® Omni-Path Architecture: Evolutionary Approach, Revolutionary Features, End-to-End Solution
• HFI adapters — single port, x8 and x16: x8 adapter (58 Gb/s), x16 adapter (100 Gb/s)
• Edge switches — 1U form factor, 24 and 48 ports: 24-port edge switch, 48-port edge switch
• Director switches — QSFP-based, 288 and 1,152 ports: 288-port director switch (7U chassis, 48-port leaves), 1,152-port director switch (20U chassis, 48-port leaves)
• Cables — third-party vendors: passive copper, active optical
• Silicon — OEM custom designs, HFI and switch ASICs: switch silicon up to 48 ports (1,200 GB/s total bandwidth); HFI silicon up to 2 ports (50 GB/s total bandwidth); "-F" processors with integrated HFI
• Software — open-source host software and fabric manager
ARTIFICIAL INTELLIGENCE: AI Foundation
• Solutions (for solution architects): AI Solutions Catalog (public & internal) — platforms for finance, healthcare, energy, industrial, transport, retail, home & more, spanning data center, edge & device
• Toolkits (for app developers):
  – Deep learning deployment: OpenVINO™† — Open Visual Inference & Neural Network Optimization toolkit for inference deployment on CPU, processor graphics, FPGA & VPU using TensorFlow*, Caffe* & MXNet*; Intel® Movidius™ SDK — optimized inference deployment for all Intel® Movidius™ VPUs using TensorFlow* & Caffe*
  – Deep learning: Intel® Deep Learning Studio‡ — open-source tool to compress the deep learning development cycle
• Libraries (for data scientists):
  – Deep learning frameworks: now optimized for CPU — TensorFlow*, MXNet*, Caffe*, BigDL/Spark*; optimizations in progress — Caffe2*, PyTorch*, PaddlePaddle*
  – Machine learning libraries: Python (scikit-learn, pandas, NumPy), R (CART, Random Forest, e1071), distributed (MLlib on Spark, Mahout)
• Foundation (for library developers):
  – Analytics, machine & deep learning primitives: Python (Intel distribution optimized for machine learning), DAAL (Intel® Data Analytics Acceleration Library, for machine learning), MKL-DNN & clDNN (open-source deep neural network functions for CPU & processor graphics)
  – Deep learning graph compiler: Intel® nGraph™ Compiler (alpha) — open-source compiler for deep learning model computations, optimized for multiple devices (CPU, GPU, NNP) using multiple frameworks (TF, MXNet, ONNX)
• Hardware (for IT system architects): CPU, GPU, FPGA, NNP L-1000 & other deep learning accelerators, for inference from data center to edge to device
† Formerly the Intel® Computer Vision SDK
*Other names and brands may be claimed as the property of others.
ai.intel.com
Transform AI with Software
Akanksha Balani
Country Lead - Intel Software Tools – IAGS, Intel
Intersection of data & compute growth — data generated daily, by 2020:
• Autonomous vehicle: 4 TB
• Connected airplane: 5 TB
• Smart factory: 1 PB
• Cloud video provider: 750 PB
• Average internet user: 1.5 GB
Source: amalgamation of analyst data and Intel analysis.
Turning data into business insights, operational insights & security insights.
AI will transform:
• Consumer: smart assistants, chatbots, search, personalization, augmented reality, robots
• Health: enhanced diagnostics, drug discovery, patient care, research, sensory aids
• Finance: algorithmic trading, fraud detection, research, personal finance, risk mitigation
• Retail: support, experience, marketing, merchandising, loyalty, supply chain, security
• Government: defense, data insights, safety & security, resident engagement, smarter cities
• Energy: oil & gas exploration, smart grid, operational improvement, conservation
• Transport: autonomous cars, automated trucking, aerospace, shipping, search & rescue
• Industrial: factory automation, predictive maintenance, precision agriculture, field automation
• Other: advertising, education, gaming, professional & IT services, telco/media, sports
Source: Intel forecast
Intel® AI Tools
Delivering robust toolsets & powerful resources. Accelerating innovative AI solutions.
Intel® Xeon® Processors: Now Optimized for Deep Learning
Deliver significant AI performance with hardware & software optimizations on the Intel® Xeon® Scalable family — optimized frameworks plus optimized Intel® MKL libraries.
• Inference throughput: Intel® Xeon® Platinum 8180 processor — up to 241x¹ higher Intel-optimized Caffe GoogLeNet v1 inference throughput with Intel® MKL compared to the Intel® Xeon® processor E5-2699 v3 with BVLC-Caffe
• Training throughput: Intel® Xeon® Platinum 8180 processor — up to 277x¹ higher Intel-optimized Caffe AlexNet training throughput with Intel® MKL compared to the Intel® Xeon® processor E5-2699 v3 with BVLC-Caffe
Inference and training throughput use FP32 instructions.
¹ The benchmark results may need to be revised as additional testing is conducted. The results depend on the specific platform configurations and workloads utilized in the testing, and may not be applicable to any particular user's components, computer system or workloads. Source: Intel measured as of June 2018. Configurations: see slide 4.
Intel Software – Extract Performance
Optimization tools & SDKs, spanning edge to data center to cloud:
• Media: build highly optimized media infrastructure, solutions & applications — fast, dense, high-quality transcoding
• Technical & enterprise compute, HPC, AI: improve performance, scalability & reliability for applications and frameworks — computing and ML/DL (including the Intel® Distribution for Python* and Intel® DAAL)
• System & embedded (manufacturing, retail, drones, robots…): take advantage of deep system-wide insight & analysis for system & embedded apps
• AI & IoT (smart cities, autonomous driving, gaming…): create solutions using computer vision — the OpenVINO™ toolkit, deep learning, graphics, libraries, media, OpenCL™ & more
Intel® Parallel Studio XE
Power through performance bottlenecks. Code new breakthroughs for AI.
AI Software Optimization with Intel® Parallel Studio XE
• Science & research: up to 35x faster application performance — NERSC (National Energy Research Scientific Computing Center). Read the case study.
• Artificial intelligence: performance speedup of up to 23x with Intel-optimized scikit-learn vs. stock scikit-learn — Google Cloud Platform. Read the blog.
• Life science: simulations ran up to 7.6x faster with 9x energy efficiency** — LAMMPS code, Sandia National Laboratories. Read the technology brief.
For more success stories, review the Intel® Parallel Studio XE case studies.
**Intel® Xeon Phi™ Processor Software Ecosystem Momentum Guide
Performance results are based on tests from 2016-2017 and may not reflect all publicly available security updates. See configuration disclosures for details. No product can be absolutely secure.
Intel® Parallel Studio XE for AI: High-Performance, Scalable Software across Multiple Industries
Reported application speedups span 1.25x to 25x across artificial intelligence (Google Cloud Platform: 23x), science & research (Kyoto University; the Walker Molecular Dynamics lab), energy, EDA, manufacturing, government, computer software, IT, healthcare, digital media and telecommunications.
More success stories: the Intel® Parallel Studio XE case studies deck & the case studies site.
Performance results are based on tests from ~2015-2017 and may not reflect all publicly available security updates. See configuration disclosures for details. No product can be absolutely secure. For more complete information about performance and benchmark results, visit www.intel.com/benchmark. See configurations in the Intel® Parallel Studio XE case studies deck & individual case study links at this site.
Intel® Distribution for Python*
Supercharge Python apps. Rethink high performance for AI.
Python* Landscape
Adoption of Python continues to grow among domain experts & developers for its productivity benefits — it ranks among the most popular coding languages of 2018.
• Challenge #1: domain experts are not professional software programmers.
• Challenge #2: Python performance limits migration to production systems.
Intel's Python tools:
› Accelerate Python performance
› Enable easy access
› Empower the community
#IntelAIDC2019 | #AIonIntel | #IntelAI
38
1Available only in Intel® Parallel Studio Composer Edition.
EcosystemcompatibilityGreaterProductivityFasterPerformance
Supports Python 2.7 & 3.6, Conda & PIP
Operating System: Windows*, Linux*, MacOS1*
Intel® Architecture Platforms
Performance Libraries, Parallelism,
Multithreading, Language Extensions
› Accelerated NumPy/SciPy/scikit-learn
with Intel® MKL1 & Intel® DAAL2
› Data analytics, machine learning & deep
learning with scikit-learn, pyDAAL,
TensorFlow* & Caffe*
› Scale with Numba* & Cython*
› Includes optimized mpi4py, works with
Dask* & PySpark*
› Optimized for latest Intel® architecture
› Prebuilt & optimized packages for
numerical computing, machine/deep
learning, HPC, & data analytics
› Drop in replacement for existing Python-
No code changes required
› Jupyter* notebooks, Matplotlib included
› Free download & free for all uses
including commercial deployment
› Supports Python 2.7 & 3.6, optimizations
integrated in Anaconda* Distribution
› Distribution & optimized packages available
via Conda, PIP, APT GET, YUM, & DockerHub,
numerical performance optimizations
integrated in Anaconda Distribution
› Optimizations upstreamed to main Python
trunk
› Priority Support with Intel® Parallel Studio XE
1Intel® Math Kernel Library
2Intel® Data Analytics Acceleration Library
Prebuilt & Accelerated Packages
Accelerate Python* with Intel® Distribution for Python*
High Performance Python* for Scientific Computing, Data Analytics, Machine & Deep Learning
Learn More: software.intel.com/distribution-for-python
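As a rough illustration of why the distribution's accelerated NumPy stack matters, compare an interpreted Python loop against one vectorized call that drops into the native BLAS layer (MKL in the Intel build). This is a generic timing sketch in stock NumPy, not an Intel-specific API:

```python
import time
import numpy as np

x = np.linspace(0.0, 10.0, 200_000)

# Interpreted loop: every element passes through the Python bytecode VM
t0 = time.perf_counter()
loop_sum = 0.0
for v in x:
    loop_sum += v * v
t_loop = time.perf_counter() - t0

# Vectorized: one call into the native BLAS layer (MKL in the Intel build)
t0 = time.perf_counter()
vec_sum = float(x @ x)
t_vec = time.perf_counter() - t0

# Both paths compute the same sum of squares
assert abs(loop_sum - vec_sum) <= 1e-6 * vec_sum
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.5f}s")
```

On an MKL-backed build the vectorized path is typically orders of magnitude faster; the exact ratio depends on the machine and build.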
Intel® Performance Libraries
POWERFUL & AWARD-WINNING PERFORMANCE LIBRARIES
TO OPTIMIZE CODE & ACCELERATE DEVELOPMENT.
1Data from Evans Data Software Developer surveys, 2011-2016
Fast, Scalable Code with Intel® Math Kernel Library (Intel® MKL)
› Speeds computations for scientific, engineering, financial and
machine learning applications by providing highly optimized,
threaded, and vectorized math functions
› Provides key functionality for dense and sparse linear algebra
(BLAS, LAPACK, PARDISO), FFTs, vector math, summary
statistics, deep learning, splines and more
› Dispatches optimized code for each processor automatically
without the need to branch code
› Optimized for single core vectorization and cache utilization
› Automatic parallelism for multi-core and many-core
› Scales from core to clusters
› Available at no cost & royalty free
› Great performance with minimal effort!
1 Available only in Intel® Parallel Studio Composer Edition.
Intel® MKL Library Offers…
Dense & Sparse Linear Algebra
Fast Fourier Transforms
Vector Math
Vector RNGs
Fast Poisson Solver
& More!
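These MKL domains are reachable from Python via NumPy/SciPy; in the Intel distribution the calls below dispatch to MKL's BLAS, FFT, and vector-math kernels without any source change. The snippet itself is plain NumPy and runs on any build:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal((200, 200))
b = rng.standard_normal((200, 200))

# Dense linear algebra: matrix multiply lowers to a BLAS gemm call
c = a @ b

# Fast Fourier Transform: numpy.fft dispatches to the optimized backend
spectrum = np.fft.fft(a[0])
roundtrip = np.fft.ifft(spectrum).real

# Vector math: elementwise transcendental over a whole array at once
y = np.exp(a[0])

# Spot-check the BLAS result against a slow hand-rolled reference
ref = sum(a[0, k] * b[k, 0] for k in range(200))
assert np.isclose(c[0, 0], ref)
assert np.allclose(roundtrip, a[0])
```

The point of the dispatch model is exactly that this code needs no changes: swapping in the Intel build changes which kernels run underneath, not the NumPy calls.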
Speed up Analytics & Machine Learning with
Intel® Data Analytics Acceleration Library (Intel® DAAL)
› Highly tuned functions for classical machine learning & analytics
performance from datacenter to edge running on Intel®
processor-based devices
› Simultaneously ingests data & computes results for highest
throughput performance
› Supports batch, streaming & distributed usage models to meet a
range of application needs
› Includes Python*, C++, Java* APIs, & connectors to popular data
sources including Spark* & Hadoop*
Pre-processing → Transformation → Analysis → Modeling → Validation → Decision Making
› Pre-processing: Decompression, Filtering, Normalization
› Transformation: Aggregation, Dimension Reduction
› Analysis: Summary Statistics, Clustering, etc.
› Modeling: Machine Learning (Training), Parameter Estimation, Simulation
› Validation: Hypothesis Testing, Model Errors
› Decision Making: Forecasting, Decision Trees, etc.
What's New in the 2019 Release
New Algorithms
› Logistic Regression, most widely-used classification algorithm
› Extended Gradient Boosting Functionality for inexact split calculations &
user-defined callback canceling for greater flexibility
› User-defined Data Modification Procedure supports a wide range of
feature extraction & transformation techniques
Learn More: software.intel.com/daal
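The batch vs. streaming distinction can be sketched in plain NumPy: a streaming computation ingests chunks and keeps only partial results, then finalizes to the same answer as the one-shot batch computation. This mirrors the usage models conceptually; it is not DAAL's actual API:

```python
import numpy as np

def batch_stats(data):
    """Batch model: all data is available at once."""
    return data.mean(axis=0), data.var(axis=0)

class StreamingStats:
    """Streaming model: ingest chunks, keep only running partial sums."""
    def __init__(self, n_features):
        self.n = 0
        self.s1 = np.zeros(n_features)   # running sum
        self.s2 = np.zeros(n_features)   # running sum of squares
    def partial_fit(self, chunk):
        self.n += chunk.shape[0]
        self.s1 += chunk.sum(axis=0)
        self.s2 += (chunk ** 2).sum(axis=0)
    def finalize(self):
        mean = self.s1 / self.n
        var = self.s2 / self.n - mean ** 2
        return mean, var

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))

m_batch, v_batch = batch_stats(data)
ss = StreamingStats(3)
for chunk in np.array_split(data, 10):   # data arrives as 10 chunks
    ss.partial_fit(chunk)
m_stream, v_stream = ss.finalize()

assert np.allclose(m_batch, m_stream)
assert np.allclose(v_batch, v_stream)
```

The distributed model is the same idea one level up: each node keeps its own partial sums, and the finalize step merges them.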
Spark Core
› Feature Parity
› Efficient Scale-Out
› Lower TCO, improved ease of use
High Performance Deep Learning for Apache Spark* on CPU Infrastructure
No need to deploy costly accelerators, duplicate data,
or suffer through scaling headaches!
Designed & Optimized for Intel® Xeon® Processor
Powered by Intel® MKL-DNN
DataFrame
ML Pipelines
SQL SparkR Streaming MLlib GraphX BigDL
› Consumer Sentiment Analysis
› Image Similarity Search
› Image Transfer Learning
› Image Generation
› 3D Image Support
› Fraud Detection
› Anomaly Detection
› Recommendation (NCF, Wide & Deep)
› Object Detection
› TensorFlow Support
› Low-Latency Serving
Across Health, Finance, Retail, Manufacturing & Infrastructure
Case Study: Image Recognition
Client: JD.com, 2nd largest online retailer in China, ~25M users.
Challenge: Building deep learning applications such as image similarity search without moving data.
Solution: Switched from a GPU to a CPU cluster, using Apache Spark* with BigDL running on Intel® Xeon® processors.
Result: 4X gain on Intel® Xeon® CPUs, processing ~380M images.
The integrated surveillance system connected to cameras at stadiums, which transmitted video
data to operational HQ in each city.
Intel® Distribution of OpenVINO™ toolkit allowed Axxonsoft to distribute the neural network
video analytics of the video across all available Intel hardware, for zone entry detection,
abandoned objects detection, and facial recognition.
Security for Stadiums at World Cup 2018
Result: 9,000+ surveillance cameras used to protect 2 million+ fans
See case study for details.
Result
60% increase in coverage rate and 20% increase in accuracy rate, better
than the traditional rule-based approach
Intel does not control or audit third-party benchmark data or the web sites referenced in this document.
You should visit the referenced web site and confirm whether referenced data are accurate. *Other names and brands may be claimed as the property of others.
“Performance of Intel® Xeon® processors and the sustained
optimization of Apache Spark were key [to deploy] a single
platform that consolidates and analyzes all types of data, from
any channel, within a highly secure environment.”
https://ai.intel.com/nervana/wp-content/uploads/sites/53/2018/06/Intel-White-Paper-Union-Pay_2_hir-res_Keep-the-Size-of-Figure-6.pdf
https://www.intel.com/content/www/us/en/financial-services-it/union-pay-case-study.html
Client
China UnionPay*, which
specializes in banking
services and payment
systems. It is the 3rd largest
payment network in the world.
Challenge
Detect fraudulent credit card
transactions with more coverage and
accuracy.
Solution
Using Cloudera Enterprise (Hadoop Cluster),
Apache Spark* with BigDL, running on Intel®
Xeon® and 5th Gen Intel® Core™ Processors for
credit card fraud detection. Historical data is
stored on Apache Hive*. Data preprocessing
done with Apache Spark SQL*.
Result
Working closely with Intel’s Analytics Zoo team,
Midea built a highly-optimized defect detection
solution, and chose Intel® Xeon® Scalable
6130/6148 over GPU-based servers as it met
their latency requirements and more easily
integrated into their existing infrastructure.
https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics
“Analytics Zoo from Intel provides a great tool for developing
the end-to-end AI solutions, building pipelines across cloud
and edge computing, and optimizing the hardware resources.”
Zheng Hu, Director of Computer
Vision Research Institute, Midea
Public
Client
Midea Group is a Chinese
electrical appliance
manufacturer with 21
manufacturing plants and 260
logistics centers across 200
countries.
Challenge
Midea needed to eliminate defects
caused by scratched surfaces, missing
bolts, misaligned labeling on surfaces
(glass, polished metal, painted), and
human inspection was not able to meet
target quality metrics or detection rate
requirements.
Solution
An advanced defect inspection system built on
top of Analytics Zoo, which provides a unified
analytics + AI platform that seamlessly unites
Spark, BigDL and TensorFlow* programs into
an integrated pipeline. The system was based
on Intel® Xeon Scalable 6130/6148 servers
and Core i7 edge devices.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document.
You should visit the referenced web site and confirm whether referenced data are accurate. *Other names and brands may be claimed as the property of others.
Result
The platform provides multiple functions from onboard Wi-Fi to computer vision applications
such as human/vehicle detection at crossroads, onboard empty seat detection and intruder
detection.
OpenVINO™ provides a scalable, high performance common platform across a variety of
hardware for greater efficiencies.
In-train vision platform
Enables pedestrian & vehicle identification at crossroads +
on-train empty seat detection
Intel® Distribution of OpenVINO™ Toolkit
COMPUTER VISION & DEEP LEARNING APPS... NOW FASTER.
OpenVINO™ Software Toolkit: Visual Inferencing & Neural Network Optimization
DEPLOY COMPUTER
VISION & DEEP LEARNING
CAPABILITIES TO THE
EDGE
High Performance, High Efficiency for the Edge
Write Once + Scale to Diverse Accelerators
Broad Framework Support
Other names and brands may be claimed as the property of others
VPU = Vision Processing Unit (Movidius)
What's Inside the OpenVINO™ Toolkit
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
Intel® Architecture-Based Platforms Support
OS Support: CentOS* 7.4 (64 bit), Ubuntu* 16.04.3 LTS (64 bit), Microsoft Windows* 10 (64 bit), Yocto Project* version Poky Jethro v2.0.3 (64 bit)
Intel® Deep Learning Deployment Toolkit: Model Optimizer (Convert & Optimize) → IR (Intermediate Representation file) → Inference Engine (Optimized Inference); for Intel® CPU & CPU with integrated graphics. Includes 20+ pre-trained models, computer vision algorithms & code samples.
Traditional Computer Vision Tools & Libraries: OpenCV*, OpenVX*; optimized libraries for photography & vision.
Increase Media/Video/Graphics Performance: Intel® Media SDK (open source version); OpenCL™ drivers & runtimes for CPU with integrated graphics.
Optimize Intel® FPGA: FPGA RunTime Environment (from Intel® FPGA SDK for OpenCL™) & bitstreams; FPGA support is Linux* only.
[Chart: relative FPS improvement for public models — GoogLeNet v1, VGG16*, SqueezeNet* 1.1 (batch 1 & 32) — comparing Std. Caffe on CPU, OpenCV on CPU, and OpenVINO on CPU, GPU & FPGA]
Get an Even Bigger Performance Boost with Intel® FPGA
1Depending on workload, quality/resolution for FP16 may be marginally impacted. A performance/quality tradeoff from FP32 to FP16 can affect accuracy; customers are encouraged to experiment to find what works best
for their situation. Performance results are based on testing as of June 13, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. For
more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Configuration: Testing by Intel as of June 13, 2018. Intel® Core™ i7-6700K CPU @ 2.90GHz fixed, GPU GT2 @
1.00GHz fixed Internal ONLY testing, Test v3.15.21 – Ubuntu* 16.04, OpenVINO 2018 RC4, Intel® Arria® 10 FPGA 1150GX. Tests were based on various parameters such as model used (these are public), batch size, and
other factors. Different models can be accelerated with different Intel hardware solutions, yet use the same Intel software tools.
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and
other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended
for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice. Notice revision #20110804
Increase Deep Learning Workload Performance on Public Models using OpenVINO™ Toolkit & Intel® Architecture
[Chart: comparison of frames per second (FPS) — relative performance improvement over a Standard Caffe* baseline, by public model (batch size); up to 19.9x1 with OpenVINO on CPU + Intel® FPGA, alongside OpenVINO on CPU + Intel® Processor Graphics (GPU, FP16)]
oneAPI
Single Programming Model
to Deliver Cross-Architecture Performance
All information provided in this deck is subject to change without notice.
Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
Programming Challenge
Diverse set of data-centric hardware
No common programming language or APIs
Inconsistent tool support across platforms
Each platform requires unique
software investment
SVMS: Scalar (CPU) | Vector (GPU) | Matrix (AI) | Spatial (FPGA)
Optimization Notice
The future is a diverse mix of scalar,
vector, matrix, & spatial architectures
deployed in CPU, GPU, AI, FPGA & other
accelerators
Diverse Workloads Require Diverse Architectures
Project oneAPI delivers a unified
programming model to simplify
development across diverse
architectures
Common developer experience across
Scalar, Vector, Matrix & Spatial
architectures (CPU, GPU, AI and FPGA)
Uncompromised native high-level
language performance
Based on industry standards & open
specifications
Intel's oneAPI Core Concept
Optimized Applications → Optimized Middleware & Frameworks → oneAPI Language, Libraries & Tools → Scalar (CPU) | Vector (GPU) | Matrix (AI) | Spatial (FPGA)
Some capabilities may differ per architecture.
oneAPI for Cross-Architecture Performance
Optimized Applications → Optimized Middleware & Frameworks → oneAPI Product: Direct Programming (Data Parallel C++), API-Based Programming (Libraries), Analysis & Debug Tools → Scalar (CPU) | Vector (GPU) | Matrix (AI) | Spatial (FPGA)
Data Parallel C++: Standards-based, Cross-architecture Language
› Delivers uncompromised parallel programming productivity and performance across CPUs and accelerators
› Based on C++, with language enhancements driven through a community project
› Open, cross-industry alternative to single-architecture proprietary languages
There will still be a need to tune for each architecture.
Get the Most from Your Code Today with Intel Tech.Decoded
Visit TechDecoded.intel.io — a video series where developers learn to put into practice key optimization strategies with Intel Development tools.
› Watch Big Picture videos: focused conversations where tech visionaries share key concepts on front-line topics, what you need to know and why it matters.
› Dig deeper with Essentials: webinars covering strategies, practices and tools that help you optimize application and solution performance.
› Get started with Quick Hits: short videos and articles that deliver the how-to's of specific programming tasks using Intel tools.
Topics: Visual Computing, Code Modernization, Systems & IoT, Data Science, Data Center & Cloud Computing
Intel® Software | Intel® Developer Workshops
Tools for C/C++/Python/Fortran developers: HPC, AI, IoT, Cloud
Partner programs focused on developer enablement
150K developers, customers & partners trained | 129 customers engaged | 67 programs | 62 partners
software.intel.com | Techdecoded.intel.io
LET'S ACCELERATE THE FUTURE TOGETHER
Security Barrier Recognition Model Using Intel® Deep Learning Deployment Toolkit
Load Input Image(s)
Run Inference 1: model vehicle-license-plate-detection-barrier-0007 (detects vehicles)
Run Inference 2: model vehicle-attributes-recognition-barrier-0010 (classifies vehicle attributes)
Run Inference 3: model license-plate-recognition-barrier-0001 (detects license plates)
Display Results
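The cascade above can be sketched as a toy pipeline: detections from the first model are cropped out of the frame and routed to the second and third stages. The model names on the slide are real OpenVINO pre-trained models, but the functions below are placeholders standing in for Inference Engine calls, with hard-coded hypothetical outputs:

```python
import numpy as np

# Stand-ins for the three pre-trained models on the slide; a real app
# would load each as an OpenVINO IR and run it via the Inference Engine.
def detect_vehicles_and_plates(frame):
    # returns (label, bounding box) pairs; hard-coded for this sketch
    return [("vehicle", (10, 10, 200, 150)), ("plate", (60, 120, 160, 145))]

def classify_vehicle(roi):
    return {"type": "car", "color": "white"}   # hypothetical attributes

def recognize_plate(roi):
    return "KA01AB1234"                        # hypothetical plate string

def crop(frame, box):
    x0, y0, x1, y1 = box
    return frame[y0:y1, x0:x1]

frame = np.zeros((300, 400, 3), dtype=np.uint8)   # placeholder input image
results = []
for label, box in detect_vehicles_and_plates(frame):       # inference 1
    roi = crop(frame, box)
    if label == "vehicle":
        results.append(("vehicle", classify_vehicle(roi)))  # inference 2
    else:
        results.append(("plate", recognize_plate(roi)))     # inference 3
```

The structural point is the chaining: each downstream model consumes a crop produced by the upstream detector rather than the whole frame.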
End-to-End Vision Workflow
Video input → Decode (Intel® Media SDK: CPU, GPU) → Pre-Processing (OpenCV*: CPU, GPU) → Inference (Intel® Deep Learning Deployment Toolkit: CPU, GPU, FPGA, VPU) → Post-Processing (OpenCV*: CPU, GPU) → Encode (Intel® Media SDK: CPU, GPU) → Video output with results annotated
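A minimal sketch of the five-stage flow, with placeholder functions standing in for the real components (Media SDK decode/encode, OpenCV* pre/post-processing, Deep Learning Deployment Toolkit inference); the stage contract, not the stand-in bodies, is the point:

```python
import numpy as np

def decode(raw):                       # bytes -> image (Media SDK stage)
    return np.frombuffer(raw, dtype=np.uint8).reshape(4, 4)

def preprocess(img):                   # normalize to [0, 1] (OpenCV* stage)
    return img.astype(np.float32) / 255.0

def infer(tensor):                     # stand-in "model": mean activation
    return [("object", float(tensor.mean()))]

def postprocess(dets, threshold=0.0):  # confidence filter (OpenCV* stage)
    return [d for d in dets if d[1] > threshold]

def encode(img, dets):                 # annotate + package output
    return {"frame": img, "annotations": dets}

raw = bytes(range(16))                 # 16 placeholder pixel values 0..15
img = decode(raw)
tensor = preprocess(img)
dets = postprocess(infer(tensor))
out = encode(tensor, dets)
```

Because each stage only agrees on its input/output types, the slide's hardware mapping (CPU, GPU, FPGA, VPU per stage) can vary independently behind the same interfaces.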
Key Vision Solutions Optimized by Intel® Distribution of OpenVINO™ Toolkit
Intel teamed with Philips to show that servers powered by Intel® Xeon®
Scalable processors & Intel® Distribution of OpenVINO™ toolkit can efficiently
perform deep learning inference on patients’ X-rays & computed tomography
(CT) scans, without the need for accelerators. Achieved breakthrough
performance for AI inferencing:
▪ 188x increase in throughput (images/sec) on Bone-age prediction model.1
▪ 38x increase in throughput (images/sec) on Lung segmentation model.1
“Intel® Xeon® Scalable processors and OpenVINO toolkit appears to be the right solution for medical imaging AI
workloads. Our customers can use their existing hardware to its maximum potential, without having to complicate their
infrastructure, while still aiming to achieve quality output resolution at exceptional speeds."
— Vijayananda J., chief architect and fellow, Data Science and AI, Philips HealthSuite Insights, India
White Paper
1See white paper for performance details.
Philips
The Intel® Distribution of OpenVINO™ toolkit helped GE deliver
optimized inferencing to its deep learning image-classification solution.
By bringing AI to its clinical diagnostic scanning, GE no longer needed
an expensive 3rd party accelerator board, achieving:
▪ 5.9x inferencing performance above the target1
▪ 14x inferencing speed over the baseline solution1
▪ Improved image quality, diagnostic capabilities, and clinical workflows
“With the OpenVINO™ toolkit, we are now able to optimize inferencing across Intel® silicon, exceeding our throughput goals by almost 6x,”
said David Chevalier, Principal Engineer for GE Healthcare.
“We want to not only keep deployment costs down for our customers, but also offer a flexible, high-performance solution for a new era of
smarter medical imaging. Our partnership with Intel allows us to bring the power of AI to clinical diagnostic scanning and other healthcare
workflows in a cost-effective manner.”
GE Healthcare*
Intel-GE Healthcare, Intel® Distribution of OpenVINO™ Optimizes Deep Learning Performance for Healthcare Imaging
1See white paper for performance details.
Demonstrated Industry Success: Access Developer Success Stories for details & more examples
Abortion Clinic In Johannesburg ](+27832195400*)[ 🏥 Safe Abortion Pills in Jo...
 
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
Team Transformation Tactics for Holistic Testing and Quality (NewCrafts Paris...
 
Lessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdfLessons Learned from Building a Serverless Notifications System.pdf
Lessons Learned from Building a Serverless Notifications System.pdf
 
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
Abortion Clinic In Pretoria ](+27832195400*)[ 🏥 Safe Abortion Pills in Pretor...
 
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
Abortion Clinic Pretoria ](+27832195400*)[ Abortion Clinic Near Me ● Abortion...
 
A Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdfA Deep Dive into Secure Product Development Frameworks.pdf
A Deep Dive into Secure Product Development Frameworks.pdf
 
[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)[GRCPP] Introduction to concepts (C++20)
[GRCPP] Introduction to concepts (C++20)
 
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4jGraphSummit Milan - Visione e roadmap del prodotto Neo4j
GraphSummit Milan - Visione e roadmap del prodotto Neo4j
 
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024Automate your OpenSIPS config tests - OpenSIPS Summit 2024
Automate your OpenSIPS config tests - OpenSIPS Summit 2024
 

AIDC India - AI on IA

  • 1.
  • 2. Intel Architecture for Artificial Intelligence. Austin Cherian, Head, High Performance Computing Business, India. austin.cherian@intel.com
  • 3. Bring Your AI Vision to Life Using Intel’s Comprehensive Portfolio. Hardware: multi-purpose to purpose-built AI compute from device to cloud. Tools: software to accelerate development & deployment of real solutions. Data: Intel analytics ecosystem to get your data ready. Solutions: partner ecosystem to facilitate AI in finance, health, retail, industrial & more. Future: driving AI forward through R&D, investments & policy. #IntelAIDC2019 | #AIonIntel | #IntelAI
  • 4. Data-Centric Infrastructure: Powering the Future of Compute & Communications. Move Faster: Intel® Silicon Photonics, Intel® Ethernet, Intel® Omni-Path Fabric. Store More. Process Everything: CPU, GPU (integrated & discrete), FPGA, AI accelerators.
  • 5. Hardware: multi-purpose to purpose-built AI compute from cloud to device. (Chart: the AI workload spectrum, from mainstream to intensive deep learning training and inference, alongside most other AI workloads.) All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
  • 6. Hardware: multi-purpose to purpose-built AI compute from device to cloud. Datacenter: large-scale data centers such as public cloud or comms service providers, gov’t & academia, and large enterprise IT. Edge: small-scale data centers and small business IT infrastructure, down to a few on-premise server racks & workstations. Endpoint: user-touch end point devices with lower power requirements, such as laptops, tablets, smart home devices, and drones. Latency targets on the slide range from "varies to <1 ms" through <5 ms and 10-40 ms to ~100 ms across the datacenter-edge-endpoint spectrum.
  • 7. Hardware: multi-purpose to purpose-built AI compute from device to cloud; one size does not fit all. Endpoint: IoT sensors (security, home, retail, industrial...), desktop & mobility (display, video, AR/VR, gestures, speech), and self-driving vehicles (vision & inference, speech, autonomous driving), including special-purpose SoC and M.2 card options. Edge: servers, appliances & gateways for latency-bound inference and for basic inference, media & vision. Datacenter: servers & appliances for most use cases, flexible & memory bandwidth-bound use cases, dedicated media & vision inference, and the most intensive use cases (e.g. NNP-L). Latency targets: varies to <1 ms, <5 ms, 10-40 ms, ~100 ms. 1GNA = Gaussian Neural Accelerator. Images are examples of intended applications but not an exhaustive list.
  • 8. Intel® Xeon® Scalable Processor Family: your foundation for AI. Now build the AI you want on the CPU you know. Get maximum utilization running data center & AI workloads side-by-side. Break memory barriers in order to apply AI to large data sets & models. Train models at scale through efficient scaling to many nodes. Access optimized tools, including continuous performance gains for TensorFlow, MXNet, & more. Run in the cloud, including AWS, Microsoft, Alibaba, Tencent, Google, Baidu, & more. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit: http://www.intel.com/performance Source: Intel measured as of November 2016. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804
  • 9. Intel® Xeon® Scalable Processors for AI: deep learning inference & deep learning training. Generational performance improvements, continuous software optimizations, lower-precision integer ops, and scaling efficiency.
  • 10. Up to 65% performance boost with Intel® AVX-512 on the Intel® Xeon® Platinum 8180 processor; generational performance improvements: enhanced compute performance with Intel® AVX-512 on Intel® Xeon® Scalable Processors. Convolution layer performance (measured in milliseconds, shown relative to a 1.0 baseline; higher is better): Caffe GoogLeNet v1, 1.0 with Intel® AVX-512 off vs. 1.37 with it on; Caffe AlexNet, 1.0 off vs. 1.65 on. All results were measured on the Intel® Xeon® Platinum 8180 processor running AI topologies on the Caffe framework with and without Intel® AVX-512 enabled. Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown"; implementation of these updates may make these results inapplicable to your device or system. Batch sizes: AlexNet 256, GoogLeNet v1 96. Configuration details on slide 24. Source: Intel measured as of June 2017.
  • 11. Introducing 2nd Generation Intel® Xeon® Scalable Processors: leadership workload performance, groundbreaking memory innovation, embedded artificial intelligence acceleration, enhanced agility & utilization, hardware-enhanced security, built-in value, uninterrupted.
  • 12. Intel® Deep Learning Boost (DL Boost), featuring Vector Neural Network Instructions (VNNI). INT8 format: sign bit plus mantissa in bits 07-00. Current AVX-512 instruction sequence to perform INT8 convolutions: vpmaddubsw (two INT8 inputs to an INT16 output, with an INT16 constant), then vpmaddwd (INT16 to INT32), then vpaddd (accumulate with an INT32 constant into the INT32 output). New AVX-512 (VNNI) instruction to accelerate INT8 convolutions: vpdpbusd (two INT8 inputs accumulated directly with an INT32 constant into an INT32 output).
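A scalar sketch may help here. The following Python model (my own illustration, not Intel code or real intrinsics) follows one 32-bit lane: the legacy three-instruction path saturates intermediate pairwise sums to INT16, while vpdpbusd accumulates all four unsigned-8 x signed-8 products directly at INT32 precision in a single instruction.

```python
def saturate_i16(x):
    # vpmaddubsw saturates each pairwise sum to the signed 16-bit range
    return max(-32768, min(32767, x))

def legacy_int8_dot(acc, u8, s8):
    """Model of the vpmaddubsw + vpmaddwd + vpaddd sequence on one
    32-bit lane (4 byte pairs folded into one INT32 accumulator)."""
    # vpmaddubsw: u8*s8 products summed in pairs, saturated to INT16
    p0 = saturate_i16(u8[0] * s8[0] + u8[1] * s8[1])
    p1 = saturate_i16(u8[2] * s8[2] + u8[3] * s8[3])
    # vpmaddwd (against a constant of 1s): widen and sum the pair to INT32
    s = p0 * 1 + p1 * 1
    # vpaddd: add into the running INT32 accumulator
    return acc + s

def vnni_int8_dot(acc, u8, s8):
    """Model of vpdpbusd: the same 4-pair dot product in one instruction,
    accumulated at full INT32 precision (no INT16 saturation step)."""
    return acc + sum(u * s for u, s in zip(u8, s8))

u8 = [100, 200, 50, 10]   # unsigned 8-bit activations
s8 = [3, -2, 7, 1]        # signed 8-bit weights
print(legacy_int8_dot(0, u8, s8), vnni_int8_dot(0, u8, s8))  # both print 260
```

Besides using one instruction instead of three, the fused form avoids the INT16 clipping the legacy path can hit (e.g. two products of 255 x 127 sum to 64770, which saturates at 32767 in the legacy sequence).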
  • 13. Increasing AI performance on Intel® Xeon® processors: Intel® Optimizations for Caffe ResNet-50 inference throughput. Baseline: 2S Intel® Xeon® Platinum 8180 processor (28 cores/S) at Skylake-SP launch, July 2017. vs. baseline: 5.7x on the same 1st Generation 2S Intel® Xeon® Platinum 8180 processor with later software; 14x on the 2nd Generation 2S Intel® Xeon® Platinum 8280 processor (28 cores/S); 30x on the 2S Intel® Xeon® Platinum 9282 processor (56 cores/S). Intel® DL Boost theoretical throughput per core over 1st Generation Intel® Xeon® Scalable Processors: 1st Gen Xeon-SP INT8 reaches up to 1.3x over FP32 using 3 instructions (vpmaddubsw, vpmaddwd, vpaddd): faster throughput, but inefficient at 3 instructions per operation; 2nd Gen Xeon-SP INT8 with Intel® DL Boost reaches up to 3x using 1 instruction (vpdpbusd): DL Boost fixes this by combining the 3 instructions into 1. 1Based on Intel internal testing: 1x, 5.7x, 14x and 30x performance improvement based on Intel® Optimization for Caffe ResNet-50 inference throughput performance on Intel® Xeon® Scalable Processors; see configuration details, slide 22. Performance results are based on testing as of 7/11/2017 (1x), 11/8/2018 (5.7x), 2/20/2019 (14x) and 2/26/2019 (30x) and may not reflect all publicly available security updates. No product can be absolutely secure.
  • 14. Intel® Nervana™ Neural Network Processors (NNP)‡. NNP-L, dedicated DL training: fastest time-to-train with high-bandwidth AI server connections for the most persistent, intense usage. NNP-I, dedicated DL inference: highly efficient multi-model inferencing for cloud, data center & intense appliances. ‡The Intel® Nervana™ Neural Network Processor is a future product that is not broadly available today.
  • 15. Intel® FPGA product portfolio.
  • 16. (Image-only slide; footnotes repeated from earlier slides.)
  • 17. Intel® Movidius™ Vision Processing Unit (VPU): power-efficient image processing, computer vision & deep learning for devices. Surveillance: detection & classification, identification, multi-nodal systems, multi-modal sensing, video & image capture. Service robots: navigation, 3D volumetric mapping, multi-modal sensing. Wearables: detection & tracking, recognition, video/image/session capture. Drones: sense & avoid, GPS-denied hovering, pixel labeling, video & image capture. Smart home: detection & tracking, perimeter & presence monitoring, recognition & classification, multi-nodal systems, multi-modal sensing, video & image capture. AR-VR HMD: 6DOF pose, position & mapping; gaze & eye tracking; gesture tracking & recognition; see-through camera.
  • 19. Intel integrated processor graphics: built-in deep learning inference acceleration. Ubiquity/scalability: shipped in >1 billion Intel SoCs, with a broad choice of performance/power offerings across Intel® Atom™, Intel® Core™ and Intel® Xeon® processors. Media leadership: Intel® Quick Sync Video, fixed-function media blocks to improve power and performance; Intel® Media SDK, an API that provides access to hardware-accelerated codecs. Powerful & flexible architecture: rich data type support for 32-bit FP, 16-bit FP, 32-bit integer, and 16-bit integer with SIMD multiply-accumulate instructions. Memory architecture: shared on-die memory between CPU and GPU to enable lower latency and power. Software support: macOS (Core ML and MPS1), Windows (WinML), OpenVINO™ toolkit (Windows, Linux), clDNN.
  • 20. Intel® Gaussian Neural Accelerator (GNA): a DSP streaming co-processor (IP) for low-power audio inference & more. Ample throughput for speech, language, and other sensing inference. Low power: <100 mW power consumption for always-on applications. Flexibility: Gaussian mixture model (GMM) and neural network inference support. Try it today with the Intel® Speech Enabling Developer Kit: https://software.intel.com/en-us/iot/speech-enabling-dev-kit. Learn more: https://sigport.org/sites/default/files/docs/PosterFinal.pdf
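The GMM scoring that GNA-class hardware accelerates for always-on speech sensing can be sketched in a few lines. This is a pure-Python toy with made-up numbers, not the GNA API; real implementations work in the log domain with log-sum-exp rather than summing raw densities.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance
    Gaussian mixture model: the core per-frame scoring operation in
    classic speech/keyword pipelines."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        # log density of one diagonal-covariance Gaussian component
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        total += math.exp(log_p)  # toy-scale only; use log-sum-exp in practice
    return math.log(total)

# Two-component mixture over 2-D features (illustrative values)
w = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
vars_ = [[1.0, 1.0], [1.0, 1.0]]
print(gmm_log_likelihood([0.1, -0.2], w, means, vars_))
```

A frame near one of the component means scores a higher log-likelihood than an outlier frame, which is exactly the comparison an acoustic model makes per state.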
  • 21. Goal: an efficient data-centric architecture. Tiers by data access frequency: hot data in DRAM (hot tier); cooler, less often accessed data on Intel® 3D NAND SSD (warm tier); HDD/tape (cold tier); optimize performance given cost and power budget. The latency ladder from compute outward: core caches (L1, L2, LLC) at picoseconds to nanoseconds; the memory subsystem, 10s of GB at <100 nanoseconds; SSD, 10s of TB at <100 microseconds; network storage, 10s of TB at <100 milliseconds. Between memory and SSD sit two gaps: move data closer to compute (100s of GB at <1 microsecond) and maintain persistence (1s of TB at ~10 microseconds).
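The hot/warm/cold placement idea above can be sketched as a simple policy function. The thresholds below are purely illustrative assumptions of mine, not from any Intel sizing guide.

```python
def pick_tier(accesses_per_day, latency_budget_us):
    """Toy placement policy for the DRAM / 3D NAND SSD / HDD tiers.
    Hot, latency-critical data goes to DRAM; moderately warm data to
    SSD; everything else to the archival cold tier."""
    if accesses_per_day > 1000 or latency_budget_us < 1:
        return "DRAM (hot tier, nanosecond-class)"
    if accesses_per_day > 10 or latency_budget_us < 100_000:
        return "3D NAND SSD (warm tier, <100 microseconds)"
    return "HDD/tape (cold tier, archival)"

print(pick_tier(5000, 10))        # hot feature store -> DRAM
print(pick_tier(50, 50_000))      # warm training shards -> SSD
print(pick_tier(1, 10_000_000))   # cold raw archive -> HDD/tape
```

The point of the slide's "gap" tiers is that persistent memory lets such a policy keep far more data in the byte-addressable branch than DRAM alone would allow.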
  • 22. The best of both worlds with Intel® Optane™ DC Persistent Memory: memory attributes (performance comparable to DRAM at low latencies1) plus storage attributes (data persistence with higher capacity than DRAM2). 1"Performance comparable to DRAM": Intel persistent memory is expected to perform at latencies near DDR4 DRAM; "low latencies": data transferred across the memory bus sees latencies orders of magnitude lower than transferring data across PCIe or I/O buses to NAND or hard disks. 2Intel persistent memory offers 3 capacities (128 GB, 256 GB, 512 GB); individual DDR4 DRAM DIMMs max out at 256 GB. Performance results are based on testing as of February 22, 2019 and may not reflect all publicly available security updates; see slide 24 for details. No product or component can be absolutely secure.
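In App Direct mode, persistent memory is typically exposed to software as a memory-mapped (DAX) file that the application accesses with ordinary loads and stores: the model libraries like PMDK build on. A stdlib sketch using a regular file plus mmap illustrates that programming model; no real NVDIMM is involved, and the flush here stands in for the cache-flush step real persistent-memory code performs.

```python
import mmap
import os
import struct
import tempfile

# Create an 8-byte "persistent region" holding one 64-bit counter.
path = os.path.join(tempfile.mkdtemp(), "counter.pmem")
with open(path, "wb") as f:
    f.write(b"\x00" * 8)

# Map the region and update the counter in place with load/store-style
# access, instead of read()/write() block I/O.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 8) as pm:
        value, = struct.unpack_from("<Q", pm, 0)   # load
        struct.pack_into("<Q", pm, 0, value + 1)   # store
        pm.flush()  # analogous to flushing CPU caches to the media

# The update survives unmapping: the data is durable in the file.
with open(path, "rb") as f:
    print(struct.unpack("<Q", f.read())[0])  # prints 1
```

The byte-addressable update with no serialization or block I/O is what distinguishes the memory-attribute side of the slide from conventional storage access.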
  • 23. Connectivity: high-speed connectivity for massively parallel & distributed AI. Intel® Silicon Photonics: connects memory and compute, integrating connectivity technologies onto a single die for affordable, scalable solutions. SmartNIC (Cascade Glacier, coming soon): enables optimized performance for Intel® Xeon® processor-based systems. Intel® Omni-Path Architecture: provides a low-latency interconnect to scale to hundreds of thousands of nodes without losing performance or reliability.
  • 24. Intel® Omni-Path Architecture: evolutionary approach, revolutionary features, end-to-end solution. HFI adapters: single-port x8 (58 Gb/s) and x16 (100 Gb/s). Edge switches: 1U form factor, 24-port and 48-port. Director switches: QSFP-based, 288-port (7U chassis) and 1,152-port (20U chassis), both with 48-port leaves. Cables: passive copper and active optical from third-party vendors. HFI and switch ASICs: switch silicon up to 48 ports (1200 GB/s total bandwidth), HFI silicon up to 2 ports (50 GB/s total bandwidth), "-F" processors with integrated HFI, OEM custom designs. Software: open-source host software and fabric manager.
  • 25. Artificial intelligence at Intel, from edge device to data center (ai.intel.com). Solutions (for solution architects): platforms for finance, healthcare, energy, industrial, transport, retail, home & more, plus an AI solutions catalog (public & internal). Toolkits (for app developers), deep learning deployment: OpenVINO™† (Open Visual Inference & Neural Network Optimization toolkit for inference deployment on CPU, processor graphics, FPGA & VPU using TensorFlow*, Caffe* & MXNet*) and the Intel® Movidius™ SDK (optimized inference deployment for all Intel® Movidius™ VPUs using TensorFlow* & Caffe*); deep learning frameworks: TensorFlow*, MXNet*, Caffe*, BigDL/Spark* now optimized for CPU, with Caffe2*, PyTorch* & PaddlePaddle* optimizations in progress; Intel® Deep Learning Studio‡: an open-source tool to compress the deep learning development cycle. Libraries (for data scientists), machine learning libraries: Python (scikit-learn, pandas, NumPy), R (Cart, Random Forest, e1071), distributed (MLlib on Spark, Mahout). Foundation (for library developers), analytics, machine & deep learning primitives: Intel® Distribution for Python (optimized for machine learning), DAAL (Intel® Data Analytics Acceleration Library, for machine learning), MKL-DNN and clDNN (open-source deep neural network functions for CPU and processor graphics); deep learning graph compiler: Intel® nGraph™ Compiler (alpha), an open-sourced compiler for deep learning model computations optimized for multiple devices (CPU, GPU, NNP) using multiple frameworks (TF, MXNet, ONNX). Hardware (for IT system architects): including NNP L-1000 deep learning accelerators. †Formerly the Intel® Computer Vision SDK. *Other names and brands may be claimed as the property of others.
  • 26. Transform AI with Software. Akanksha Balani, Country Lead, Intel Software Tools, IAGS, Intel
  • 27. The intersection of data & compute growth. Daily, by 2020: 4 TB per autonomous vehicle, 5 TB per connected airplane, 1 PB per smart factory, 1.5 GB per average internet user, 750 PB per cloud video provider. Business insights, operational insights, security insights. Source: amalgamation of analyst data and Intel analysis.
  • 28. AI will transform every industry. Consumer: smart assistants, chatbots, search, personalization, augmented reality, robots. Health: enhanced diagnostics, drug discovery, patient care, research, sensory aids. Finance: algorithmic trading, fraud detection, research, personal finance, risk mitigation. Retail: support, experience, marketing, merchandising, loyalty, supply chain, security. Government: defense, data insights, safety & security, resident engagement, smarter cities. Energy: oil & gas exploration, smart grid, operational improvement, conservation. Transport: autonomous cars, automated trucking, aerospace, shipping, search & rescue. Industrial: factory automation, predictive maintenance, precision agriculture, field automation. Other: advertising, education, gaming, professional & IT services, telco/media, sports. Source: Intel forecast.
  • 29. Intel® AI tools: delivering robust toolsets & powerful resources, accelerating innovative AI solutions.
  • 30. Artificial intelligence at Intel, from edge device to data center: a repeat of the portfolio overview from slide 25 (solutions, toolkits, libraries, foundation, and hardware layers; ai.intel.com).
  • 31. Intel® Xeon® processors, now optimized for deep learning: deliver significant AI performance with hardware & software optimizations on the Intel® Xeon® Scalable family (optimized frameworks plus optimized Intel® MKL libraries). Inference throughput: the Intel® Xeon® Platinum 8180 processor delivers up to 241x1 higher Intel-optimized Caffe GoogLeNet v1 with Intel® MKL inference throughput compared to the Intel® Xeon® processor E5-2699 v3 with BVLC Caffe. Training throughput: up to 277x1 higher Intel-optimized Caffe AlexNet with Intel® MKL training throughput compared to the Intel® Xeon® processor E5-2699 v3 with BVLC Caffe. Inference and training throughput use FP32 instructions. 1The benchmark results may need to be revised as additional testing is conducted; the results depend on the specific platform configurations and workloads utilized in the testing, and may not be applicable to any particular user's components, computer system or workloads. Source: Intel measured as of June 2018. Configurations: see slide 4.
  • 32. Intel software: extract performance, from edge to data center to cloud. Media: build highly optimized media infrastructure, solutions & applications; fast, dense, high-quality transcoding. Computing and ML/DL: improve performance, scalability & reliability for applications and frameworks across technical & enterprise compute, HPC and AI (Intel® Distribution of Python, Intel® DAAL). Systems: take advantage of deep system-wide insight & analysis for system & embedded apps (manufacturing, retail, drones, robots...). Vision (AI & IoT): create solutions using computer vision (OpenVINO™ toolkit), deep learning, graphics, libraries, media, OpenCL™ & more (smart cities, autonomous driving, gaming...). Optimization tools & SDKs spanning AI, HPC, and enterprise.
  • 33. Intel® Parallel Studio XE: power through performance bottlenecks; code new breakthroughs for AI.
  • 34. 34 AISoftwareOptimization Intel® Parallel Studio XE Up to 35X faster application performance **Intel® Xeon Phi™ Processor Software Ecosystem Momentum Guide Performance results are based tests from 2016-2017 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations & functions. Any change to any of those factors may cause the results to vary. You should consult other information & performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/performance. See configurations in individual case study links. Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804. For more complete information about compiler optimizations, see our Optimization Notice. 
Science & Research: NERSC (National Energy Research Scientific Computing Center) – read case study. Artificial Intelligence: performance speedup of up to 23X with Intel-optimized scikit-learn vs. stock scikit-learn – Google Cloud Platform, read blog. Life Science: simulations ran up to 7.6X faster with 9X energy efficiency** – LAMMPS code, Sandia National Laboratories, read technology brief. For more success stories, review the Intel® Parallel Studio XE Case Studies.
  • 35. Intel® Parallel Studio XE for AI: High Performance, Scalable Software across Multiple Industries – Artificial Intelligence, Energy, EDA, Science & Research, Manufacturing, Government, Computer Software, IT, Healthcare, Digital Media, Telecommunications. [Chart: per-customer speedups ranging from 1.25X to 25X, including Kyoto University, the Walker Molecular Dynamics lab, and Google Cloud Platform at 23X.] Performance results are based on tests from ~2015-2017 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. For more complete information about performance and benchmark results, visit www.intel.com/benchmark. See configurations in the Intel® Parallel Studio XE Case Studies deck & individual case study links at this site. More success stories: Intel® Parallel Studio XE Case Studies deck; case studies site.
  • 36. Intel® Distribution for Python* – SUPERCHARGE PYTHON APPS. RETHINK HIGH PERFORMANCE FOR AI.
  • 37. Python* Landscape – adoption of Python continues to grow among domain experts & developers for its productivity benefits (Most Popular Coding Languages of 2018). Challenge #1: domain experts are not professional software programmers. Challenge #2: Python performance limits migration to production systems. Intel’s Python tools › accelerate Python performance › enable easy access › empower the community.
  • 38. Accelerate Python* with Intel® Distribution for Python* – High Performance Python* for Scientific Computing, Data Analytics, Machine & Deep Learning. Faster Performance (prebuilt & accelerated packages): › Accelerated NumPy/SciPy/scikit-learn with Intel® MKL1 & Intel® DAAL2 › Data analytics, machine learning & deep learning with scikit-learn, pyDAAL, TensorFlow* & Caffe* › Scale with Numba* & Cython* › Includes optimized mpi4py, works with Dask* & PySpark* › Optimized for latest Intel® architecture. Greater Productivity: › Prebuilt & optimized packages for numerical computing, machine/deep learning, HPC, & data analytics › Drop-in replacement for existing Python – no code changes required › Jupyter* notebooks, Matplotlib included › Free download & free for all uses including commercial deployment. Ecosystem Compatibility: › Supports Python 2.7 & 3.6, optimizations integrated in Anaconda* Distribution › Distribution & optimized packages available via Conda, PIP, APT-GET, YUM, & DockerHub; numerical performance optimizations integrated in Anaconda Distribution › Optimizations upstreamed to main Python trunk › Priority support with Intel® Parallel Studio XE. Operating systems: Windows*, Linux*, macOS*3. Intel® Architecture platforms. 1Intel® Math Kernel Library 2Intel® Data Analytics Acceleration Library 3Available only in Intel® Parallel Studio Composer Edition. Learn More: software.intel.com/distribution-for-python
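The "drop-in replacement, no code changes" claim above can be illustrated with a sketch: the script below is ordinary NumPy and runs unchanged on a stock build or on the Intel Distribution for Python, where (per the slide) the same BLAS/LAPACK calls are serviced by Intel MKL. Nothing here is Intel-specific; the array sizes are arbitrary.

```python
import numpy as np

# Identical source runs on the stock build and on the Intel Distribution
# for Python; in the latter, NumPy's BLAS/LAPACK calls are dispatched to
# Intel MKL with no code changes.
rng = np.random.default_rng(0)
a = rng.random((512, 512))
b = rng.random((512, 512))

c = a @ b                        # GEMM -> whatever BLAS NumPy is linked against
x = np.linalg.solve(a, b[:, 0])  # LAPACK dense linear solver

print(c.shape)                   # (512, 512)
print(np.allclose(a @ x, b[:, 0]))  # True: the solution verifies
```

Which backend is actually linked can be checked with `numpy.show_config()`; the program logic stays the same either way.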
  • 39. Intel® Performance Libraries – POWERFUL & AWARD-WINNING PERFORMANCE LIBRARIES TO OPTIMIZE CODE & ACCELERATE DEVELOPMENT. 1Data from Evans Data Software Developer surveys, 2011-2016
  • 40. Fast, Scalable Code with Intel® Math Kernel Library (Intel® MKL) › Speeds computations for scientific, engineering, financial and machine learning applications by providing highly optimized, threaded, and vectorized math functions › Provides key functionality for dense and sparse linear algebra (BLAS, LAPACK, PARDISO), FFTs, vector math, summary statistics, deep learning, splines and more › Dispatches optimized code for each processor automatically without the need to branch code › Optimized for single-core vectorization and cache utilization › Automatic parallelism for multi-core and many-core › Scales from cores to clusters › Available at no cost & royalty free › Great performance with minimal effort! Intel® MKL offers: dense & sparse linear algebra, fast Fourier transforms, vector math, vector RNGs, fast Poisson solver & more. 1Available only in Intel® Parallel Studio Composer Edition.
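The functional domains listed above (BLAS, FFTs, vector math, RNGs) can be exercised from Python through plain NumPy, which the Intel Distribution backs with MKL-based packages. This is only a sketch of the domains, not of MKL's native C API, and the mapping in the comments (which MKL component serves which call) is an assumption about that particular build rather than something the deck spells out.

```python
import numpy as np

# Touring the MKL functional domains via NumPy equivalents.
rng = np.random.default_rng(0)           # random number generation domain

x = rng.standard_normal(1024)
X = np.fft.fft(x)                        # FFT domain
x_back = np.fft.ifft(X).real             # forward/inverse round trip
print(np.allclose(x, x_back))            # True: signal recovered

y = np.exp(x) * np.sin(x)                # elementwise vector math domain

A = rng.standard_normal((256, 256))
B = A @ A.T                              # BLAS level-3 (dgemm-style) domain
print(B.shape)                           # (256, 256)
```

Native C/Fortran codes would instead call the MKL interfaces directly (e.g. CBLAS, DFTI); the Python route shown here is the low-effort path the "minimal effort" bullet is pointing at.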
  • 41. Speed up Analytics & Machine Learning with Intel® Data Analytics Acceleration Library (Intel® DAAL) › Highly tuned functions for classical machine learning & analytics performance from datacenter to edge running on Intel® processor-based devices › Simultaneously ingests data & computes results for highest throughput performance › Supports batch, streaming & distributed usage models to meet a range of application needs › Includes Python*, C++, Java* APIs, & connectors to popular data sources including Spark* & Hadoop*. Pipeline: Pre-processing (decompression, filtering, normalization) → Transformation (aggregation, dimension reduction) → Analysis (summary statistics, clustering, etc.) → Modeling (machine learning training, parameter estimation, simulation) → Decision Making (forecasting, decision trees, etc.), with Validation (hypothesis testing, model errors). What’s New in the 2019 Release – new algorithms: › Logistic Regression, the most widely used classification algorithm › Extended Gradient Boosting functionality for inexact split calculations & user-defined callback canceling for greater flexibility › User-defined Data Modification Procedure supports a wide range of feature extraction & transformation techniques. Learn More: software.intel.com/daal
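The pipeline stages on this slide can be mirrored in a few lines of plain NumPy, just to make the flow concrete: pre-processing (normalization), transformation (dimension reduction via PCA), then analysis (summary statistics). DAAL provides tuned, batch/streaming/distributed implementations of each stage; this sketch only shows the shape of the data flow, not the DAAL API.

```python
import numpy as np

# Synthetic raw features with unequal scales, standing in for ingested data.
rng = np.random.default_rng(1)
data = rng.standard_normal((1000, 20)) * rng.uniform(1, 5, 20)

# Pre-processing: z-score normalization.
normalized = (data - data.mean(axis=0)) / data.std(axis=0)

# Transformation: dimension reduction via PCA (SVD on the centered data).
_, _, vt = np.linalg.svd(normalized, full_matrices=False)
reduced = normalized @ vt[:5].T          # keep 5 principal components

# Analysis: summary statistics on the reduced representation.
print(reduced.shape)                     # (1000, 5)
print(np.allclose(reduced.mean(axis=0), 0.0, atol=1e-8))  # True: still centered
```

In a real DAAL (or today, oneDAL/daal4py) deployment each of these steps maps to a library algorithm object, which is what makes the streaming and distributed usage models possible.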
  • 42. BigDL: High Performance Deep Learning for Apache Spark* on CPU Infrastructure – no need to deploy costly accelerators, duplicate data, or suffer through scaling headaches! Designed & optimized for Intel® Xeon® processors, powered by Intel® MKL-DNN. Sits alongside Spark Core components (DataFrame, ML Pipelines, SQL, SparkR, Streaming, MLlib, GraphX) with feature parity, lower TCO, improved ease of use, and efficient scale-out.
  • 43. Use cases across Health, Finance, Retail, Manufacturing & Infrastructure: consumer sentiment analysis, image similarity search, image transfer learning, image generation, 3D image support, fraud detection, anomaly detection, recommendation (NCF, Wide & Deep), object detection, TensorFlow support, low-latency serving.
  • 44. Case Study: Image Recognition. Client: JD.com, 2nd largest online retailer in China, ~25M users. Challenge: building deep learning applications such as image similarity search without moving data. Solution: switched from a GPU to a CPU cluster, using Apache Spark* with BigDL running on Intel® Xeon® processors. Result: 4X gain, with Intel® Xeon® CPUs processing ~380M images.
  • 45. Security for Stadiums at World Cup 2018 – the integrated surveillance system connected to cameras at stadiums, which transmitted video data to operational HQ in each city. The Intel® Distribution of OpenVINO™ toolkit allowed AxxonSoft to distribute the neural-network video analytics across all available Intel hardware, for zone-entry detection, abandoned-object detection, and facial recognition. Result: 9,000+ surveillance cameras used to protect 2 million+ fans. See case study for details.
  • 46. Client: China UnionPay*, which specializes in banking services and payment systems; the 3rd largest payment network in the world. Challenge: detect fraudulent credit card transactions with more coverage and accuracy. Solution: Cloudera Enterprise (Hadoop cluster) and Apache Spark* with BigDL, running on Intel® Xeon® and 5th Gen Intel® Core™ processors for credit card fraud detection; historical data stored in Apache Hive*, data preprocessing done with Apache Spark SQL*. Result: 60% increase in coverage rate and 20% increase in accuracy rate, better than the traditional rule-based approach. “Performance of Intel® Xeon® processors and the sustained optimization of Apache Spark were key [to deploy] a single platform that consolidates and analyzes all types of data, from any channel, within a highly secure environment.” Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. *Other names and brands may be claimed as the property of others. https://ai.intel.com/nervana/wp-content/uploads/sites/53/2018/06/Intel-White-Paper-Union-Pay_2_hir-res_Keep-the-Size-of-Figure-6.pdf https://www.intel.com/content/www/us/en/financial-services-it/union-pay-case-study.html
  • 47. Client: Midea Group is a Chinese electrical appliance manufacturer with 21 manufacturing plants and 260 logistics centers across 200 countries. Challenge: Midea needed to eliminate defects caused by scratched surfaces, missing bolts, and misaligned labeling on surfaces (glass, polished metal, painted); human inspection was not able to meet target quality metrics or detection-rate requirements. Solution: an advanced defect inspection system built on top of Analytics Zoo, which provides a unified analytics + AI platform that seamlessly unites Spark, BigDL and TensorFlow* programs into an integrated pipeline; the system was based on Intel® Xeon® Scalable 6130/6148 servers and Core™ i7 edge devices. Result: working closely with Intel’s Analytics Zoo team, Midea built a highly optimized defect detection solution, and chose Intel® Xeon® Scalable 6130/6148 over GPU-based servers as it met their latency requirements and more easily integrated into their existing infrastructure. “Analytics Zoo from Intel provides a great tool for developing the end-to-end AI solutions, building pipelines across cloud and edge computing, and optimizing the hardware resources.” – Zheng Hu, Director of Computer Vision Research Institute, Midea. https://software.intel.com/en-us/articles/industrial-inspection-platform-in-midea-and-kuka-using-distributed-tensorflow-on-analytics Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. *Other names and brands may be claimed as the property of others.
  • 48. In-Train Vision Platform – enables pedestrian & vehicle identification at crossroads plus on-train empty-seat detection. Result: the platform provides multiple functions from onboard Wi-Fi to computer vision applications such as human/vehicle detection at crossroads, onboard empty-seat detection and intruder detection. OpenVINO™ provides a scalable, high-performance common platform across a variety of hardware for greater efficiencies.
  • 49. Intel® Distribution of OpenVINO™ Toolkit – COMPUTER VISION & DEEP LEARNING APPS... NOW FASTER.
  • 50. OpenVINO™ software toolkit (Visual Inferencing & Neural Network Optimization) – DEPLOY COMPUTER VISION & DEEP LEARNING CAPABILITIES TO THE EDGE. High performance, high efficiency for the edge; write once + scale to diverse accelerators; broad framework support. VPU = Vision Processing Unit (Movidius). Other names and brands may be claimed as the property of others.
  • 51. What’s Inside the OpenVINO™ toolkit. Intel® Deep Learning Deployment Toolkit: the Model Optimizer (convert & optimize) produces an IR (Intermediate Representation) file, which the Inference Engine consumes for optimized inference; 20+ pre-trained models, computer vision algorithms, and code samples are included. Traditional computer vision tools & libraries: OpenCV*, OpenVX*, optimized photography & vision libraries – for Intel® CPUs & CPUs with integrated graphics. Increase media/video/graphics performance: Intel® Media SDK (open source version) and OpenCL™ drivers & runtimes for CPUs with integrated graphics. Optimize Intel® FPGA (Linux* only): FPGA RunTime Environment (from the Intel® FPGA SDK for OpenCL™) and bitstreams. Intel® architecture-based platform support – OS support: CentOS* 7.4 (64-bit), Ubuntu* 16.04.3 LTS (64-bit), Microsoft Windows* 10 (64-bit), Yocto Project* version Poky Jethro v2.0.3 (64-bit). OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
  • 52. Increase Deep Learning Workload Performance on Public Models using OpenVINO™ toolkit & Intel® Architecture – get an even bigger performance boost with Intel® FPGA. [Chart: comparison of relative frames-per-second improvement on public models (GoogLeNet v1, VGG16*, SqueezeNet* 1.1; batch sizes 1 & 32) for standard Caffe* on CPU (baseline), OpenCV on CPU, and OpenVINO on CPU, on CPU + Intel® Processor Graphics (GPU, FP16), and on CPU + Intel® FPGA – up to 19.9x1 over the standard Caffe baseline.] 1Depending on workload, quality/resolution for FP16 may be marginally impacted. A performance/quality tradeoff from FP32 to FP16 can affect accuracy; customers are encouraged to experiment to find what works best for their situation. Performance results are based on testing as of June 13, 2018 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Configuration: testing by Intel as of June 13, 2018. Intel® Core™ i7-6700K CPU @ 2.90GHz fixed, GPU GT2 @ 1.00GHz fixed, internal testing, Test v3.15.21 – Ubuntu* 16.04, OpenVINO 2018 RC4, Intel® Arria® 10 FPGA 1150GX. Tests were based on various parameters such as model used (these are public), batch size, and other factors. Different models can be accelerated with different Intel hardware solutions, yet use the same Intel software tools.
  • 53. oneAPI – Single Programming Model to Deliver Cross-Architecture Performance. All information provided in this deck is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.
  • 54. Programming Challenge – a diverse set of data-centric hardware; no common programming language or APIs; inconsistent tool support across platforms; each platform requires a unique software investment. SVMS: Scalar (CPU), Vector (GPU), Matrix (AI), Spatial (FPGA). Optimization Notice
  • 55. Diverse Workloads Require Diverse Architectures – the future is a diverse mix of scalar, vector, matrix, & spatial architectures deployed in CPU, GPU, AI, FPGA & other accelerators.
  • 56. Intel’s oneAPI Core Concept – Project oneAPI delivers a unified programming model to simplify development across diverse architectures: a common developer experience across scalar, vector, matrix & spatial architectures (CPU, GPU, AI and FPGA); uncompromised native high-level-language performance; based on industry standards & open specifications. Stack: optimized applications → optimized middleware/frameworks → oneAPI language & libraries plus oneAPI tools → CPU, GPU, AI, FPGA.
  • 57. oneAPI for cross-architecture performance – optimized applications; optimized middleware & frameworks; oneAPI product: direct programming (Data Parallel C++), API-based programming (libraries), and analysis & debug tools; targeting scalar (CPU), vector (GPU), matrix (AI), and spatial (FPGA) architectures. Some capabilities may differ per architecture.
  • 58. Data Parallel C++ – a standards-based, cross-architecture language to deliver uncompromised parallel programming productivity and performance across CPUs and accelerators. Based on C++ with language enhancements being driven through a community project; an open, cross-industry alternative to a single-architecture proprietary language. There will still be a need to tune for each architecture.
  • 59. Get the Most from Your Code Today with Intel Tech.Decoded – visit TechDecoded.intel.io, a video series where developers learn to put into practice key optimization strategies with Intel development tools. Watch big-picture videos: focused conversations where tech visionaries share key concepts on front-line topics, what you need to know and why it matters. Dig deeper with Essentials: webinars covering strategies, practices and tools that help you optimize application and solution performance. Get started with Quick Hits: short videos and articles that deliver the how-tos of specific programming tasks using Intel tools. Topics: visual computing, systems & IoT, data science, data center & cloud computing, code modernization.
  • 60. Intel® Software & Intel® Developer Workshops – tools for C/C++/Python/Fortran developers across HPC, AI, IoT & cloud; partner programs focused on developer enablement. Developers, customers, partners: 150K developers trained, 129 customers engaged, 67 programs, 62 partners. software.intel.com | Techdecoded.intel.io
  • 66. Inference pipeline: load input image(s) → run inference 1 (model vehicle-license-plate-detection-barrier-0007, detects vehicles) → run inference 2 (model vehicle-attributes-recognition-barrier-0010, classifies vehicle attributes) → run inference 3 (model license-plate-recognition-barrier-0001, detects license plates) → display results.
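The three-model cascade above can be sketched as plain Python control flow. The `detect_vehicles` / `classify_vehicle` / `read_plate` functions below are hypothetical stand-ins for the three OpenVINO model invocations named on the slide (real code would call the Inference Engine with those IR models); the hard-coded boxes, attributes, and plate string are illustration only.

```python
def detect_vehicles(frame):
    # Stand-in for vehicle-license-plate-detection-barrier-0007.
    return [{"box": (10, 10, 120, 80)}, {"box": (200, 40, 330, 140)}]

def classify_vehicle(frame, box):
    # Stand-in for vehicle-attributes-recognition-barrier-0010.
    return {"type": "car", "color": "white"}

def read_plate(frame, box):
    # Stand-in for license-plate-recognition-barrier-0001.
    return "KA01AB1234"

def analyze(frame):
    """Chain the three inferences: each detection feeds the next two models."""
    results = []
    for det in detect_vehicles(frame):          # inference 1: find vehicles
        box = det["box"]
        attrs = classify_vehicle(frame, box)    # inference 2: vehicle attributes
        plate = read_plate(frame, box)          # inference 3: plate text
        results.append({"box": box, **attrs, "plate": plate})
    return results

for r in analyze(frame="input.jpg"):            # display results
    print(r)
```

The point of the structure is that stages 2 and 3 run once per detection from stage 1, which is exactly how the demo fans out work across models.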
  • 67. End-to-End Vision Workflow – video input → decode (Intel® Media SDK: CPU, GPU) → pre-processing (OpenCV*: CPU, GPU) → inference (Intel® Deep Learning Deployment Toolkit: CPU, GPU, FPGA, VPU) → post-processing (OpenCV: CPU, GPU) → encode (Intel Media SDK: CPU, GPU) → video output with results annotated.
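The five-stage workflow can be sketched as a simple stage table folded over each frame. The stage functions here are hypothetical placeholders (real implementations come from Media SDK, OpenCV, and the Deep Learning Deployment Toolkit), and the device tags just record which pools the slide maps each stage to.

```python
from functools import reduce

# (stage name, devices the slide assigns it to, placeholder implementation)
PIPELINE = [
    ("decode",      "CPU/GPU",          lambda f: {"raw": f}),
    ("preprocess",  "CPU/GPU",          lambda d: {**d, "tensor": "resized"}),
    ("inference",   "CPU/GPU/FPGA/VPU", lambda d: {**d, "labels": ["car"]}),
    ("postprocess", "CPU/GPU",          lambda d: {**d, "overlay": True}),
    ("encode",      "CPU/GPU",          lambda d: {**d, "out": "annotated.mp4"}),
]

def run(frame):
    """Thread one frame through every stage in order."""
    return reduce(lambda data, stage: stage[2](data), PIPELINE, frame)

result = run("frame0")
print(result["labels"], result["out"])
```

Keeping the device assignment as data next to each stage mirrors the heterogeneous scheduling idea: the pipeline shape is fixed while each stage can be retargeted to a different accelerator.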
  • 68. Key Vision Solutions Optimized by Intel® Distribution of OpenVINO™ toolkit – Philips. Intel teamed with Philips to show that servers powered by Intel® Xeon® Scalable processors & the Intel® Distribution of OpenVINO™ toolkit can efficiently perform deep learning inference on patients’ X-rays & computed tomography (CT) scans, without the need for accelerators. Achieved breakthrough performance for AI inferencing: ▪ 188x increase in throughput (images/sec) on a bone-age prediction model1 ▪ 38x increase in throughput (images/sec) on a lung segmentation model1. “Intel® Xeon® Scalable processors and OpenVINO toolkit appears to be the right solution for medical imaging AI workloads. Our customers can use their existing hardware to its maximum potential, without having to complicate their infrastructure, while still aiming to achieve quality output resolution at exceptional speeds.” — Vijayananda J., chief architect and fellow, Data Science and AI, Philips HealthSuite Insights, India. White paper. 1See white paper for performance details.
  • 69. Key Vision Solutions Optimized by Intel® Distribution of OpenVINO™ toolkit – GE Healthcare*. The Intel® Distribution of OpenVINO™ toolkit helped GE deliver optimized inferencing to its deep learning image-classification solution. By bringing AI to its clinical diagnostic scanning, GE no longer needed an expensive 3rd-party accelerator board, achieving: ▪ 5.9x inferencing performance above the target1 ▪ 14x inferencing speed over the baseline solution1 ▪ improved image quality, diagnostic capabilities, and clinical workflows. “With the OpenVINO™ toolkit, we are now able to optimize inferencing across Intel® silicon, exceeding our throughput goals by almost 6x,” said David Chevalier, Principal Engineer for GE Healthcare. “We want to not only keep deployment costs down for our customers, but also offer a flexible, high-performance solution for a new era of smarter medical imaging. Our partnership with Intel allows us to bring the power of AI to clinical diagnostic scanning and other healthcare workflows in a cost-effective manner.” Intel-GE Healthcare: Intel® Distribution of OpenVINO™ Optimizes Deep Learning Performance for Healthcare Imaging. 1See white paper for performance details.
  • 70. Demonstrated Industry Success – access Developer Success Stories for details & more examples.