Machine Learning with New Hardware Challenges
Oscar M.K. Law
High Tech Challenges
- Personal Computer (1980)
- Internet (1995)
- Smartphone (2007)
- Machine Learning (2016)
Machine Learning
- Visual/Audio Applications
  - Pattern Recognition
  - Pattern Detection
  - Voice Recognition
- Motion/Movement Control
  - Self-Driving
  - Drone Control
- Data Mining/Association
  - Medical
  - Financial
  - Legal
Neural Network
- Convolutional Neural Network (CNN)
  - Model
    - Mathematical Based Model
  - Hardware
    - Graphics Processing Unit (Nvidia)
    - Catapult Fabric (Microsoft)
    - Tensor Processing Unit (Google)
  - Applications
    - DCNN, R-CNN, Fast R-CNN, Faster R-CNN, R-FCN
- Spiking Neural Network (SNN)
  - Model
    - Physical Based Model
  - Hardware
    - TrueNorth Processor (IBM)
    - Zeroth Processor (Qualcomm)
Convolutional Neural Network
A. Krizhevsky, I. Sutskever and G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS-2012, p.1-p.9.
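[Figure: network architecture from the cited paper. Input image 224 x 224 x 3; convolutional feature maps of size 55 x 55 (48 maps), 27 x 27 (128 maps), and 13 x 13 (192, 192, and 128 maps), with three max pooling stages; two fully connected layers of 2048 units each; 1000-way output. The map and unit counts are those printed on one of the two parallel halves of the diagram.]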
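The feature-map sizes in the figure (224, 55, 27, 13) can be reproduced with the usual output-size formula for convolution and pooling. The following is a minimal sketch, not the authors' code: the kernel sizes and strides follow the cited paper, while the padding values are assumptions chosen so that a 224-pixel input yields the sizes shown on the slide.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding). Padding values are assumptions,
# picked so that a 224-pixel input gives the sizes in the figure.
LAYERS = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def trace(input_size=224):
    """Return a list of (layer name, spatial size) pairs, starting with the input."""
    sizes = [("input", input_size)]
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes


if __name__ == "__main__":
    for name, size in trace():
        print(f"{name}: {size} x {size}")
```

Under these assumptions the trace passes through 55, 27, and 13 before the final pooling stage reduces the map to 6 x 6, which is the input to the first fully connected layer.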
Convolutional Neural Network
- Architecture
  - 5 Convolutional Layers
  - 3 Fully Connected Layers
  - Gabor Filters
  - 650k Neurons
  - 60M Parameters
  - 630M Connections
- Runtime
  - Nvidia GTX 580 3 GB GPUs (two)
  - About one week
- Results (ILSVRC-2010 test set)
  - Top-1 Error Rate: 37.5%
  - Top-5 Error Rate: 17.0%
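The parameter count for a network with 5 convolutional and 3 fully connected layers can be checked with a few lines of arithmetic. This is a rough sketch: the layer shapes below are taken from the cited Krizhevsky et al. paper rather than from this deck, and the per-filter input depths (48 and 192) assume the paper's two-GPU split, so treat the table as an assumption.

```python
# (name, kernel_h, kernel_w, input_channels_seen_by_each_filter, output_channels)
# Shapes assumed from the cited paper; 48 and 192 reflect its two-GPU grouping.
CONV_LAYERS = [
    ("conv1", 11, 11, 3, 96),
    ("conv2", 5, 5, 48, 256),
    ("conv3", 3, 3, 256, 384),
    ("conv4", 3, 3, 192, 384),
    ("conv5", 3, 3, 192, 256),
]

# (name, inputs, outputs); fc6 reads a 6 x 6 x 256 feature volume.
FC_LAYERS = [
    ("fc6", 6 * 6 * 256, 4096),
    ("fc7", 4096, 4096),
    ("fc8", 4096, 1000),
]


def conv_params(kh, kw, cin, cout):
    """Weights plus one bias per output channel."""
    return kh * kw * cin * cout + cout


def fc_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def count_parameters():
    """Return a dict mapping layer name to its parameter count."""
    counts = {}
    for name, kh, kw, cin, cout in CONV_LAYERS:
        counts[name] = conv_params(kh, kw, cin, cout)
    for name, n_in, n_out in FC_LAYERS:
        counts[name] = fc_params(n_in, n_out)
    return counts


counts = count_parameters()
total = sum(counts.values())
fc_share = sum(counts[name] for name, _, _ in FC_LAYERS) / total

if __name__ == "__main__":
    for name, n in counts.items():
        print(f"{name}: {n:,}")
    print(f"total: {total:,} ({fc_share:.0%} in fully connected layers)")
```

With these shapes the total is roughly 61 million, and the three fully connected layers hold the large majority of the parameters, which is one reason memory bandwidth matters so much for this class of network.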
Convolutional Neural Network
Software | Developer | Platform | Interface | Hardware
Caffe | Berkeley Vision and Learning Center | Linux, OSX, Windows, Android | C++, Python, Matlab | CPU, GPU
MatConvNet | Oxford Visual Geometry Group | Linux, OSX, Windows | Matlab | CPU, GPU
Matlab | MathWorks | Linux, OSX, Windows | Matlab | CPU, GPU
TensorFlow | Google Brain Team | Linux, OSX, Windows | C++, Python | CPU, GPU, TPU
Torch 7 | R. Collobert, K. Kavukcuoglu, C. Farabet | Linux, OSX, iOS, Android, Windows | Lua, LuaJIT, C | CPU, GPU
Theano | Université de Montréal | Cross-Platform | Python | CPU, GPU
CNTK | Microsoft | Linux, OSX, Windows | Network Description Language | CPU, GPU, FPGA
CPU vs GPU
Attribute | Central Processing Unit (CPU) | Graphics Processing Unit (GPU)
Architecture Instruction Set | Single Instruction Single Data (SISD) | Single Instruction Multiple Data (SIMD)
Operation | Sequential | Parallel
Processor Core | Few | Many
Datapath | Custom | Synthesis
Clock Rate | High | Moderate
Bandwidth | Medium | Large
Power | Moderate | High
Temperature | Moderate | High
Graphics Processing Unit (Nvidia)
Pascal Architecture
Flagship Chip | GP100
Process | TSMC 16 nm FinFET
Maximum Transistors | 15.3B
Streaming Multiprocessors (SM) | 56 enabled on Tesla P100 (60 on the full GP100, 10 SM/GPC)
CUDA Cores | 3584 on Tesla P100 (3840 on the full GP100, 64 cores/SM)
Base Clock | 1328 MHz
Boost Clock | 1480 MHz
FP32 Performance | 10.6 TFLOPS
FP64 Performance | 5.3 TFLOPS
Memory Interface | 4096-bit HBM2
Maximum Bandwidth | 720 GB/s
Maximum Power | 300 W
J. Walton, Nvidia Pascal P100 Architecture Deep Dive, PC Gamer, Apr 07, 2016.
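The peak-throughput figures on this slide follow from core count and clock rate. The sketch below is a back-of-the-envelope check, assuming 56 enabled SMs with 64 FP32 cores each (Nvidia's published Tesla P100 configuration, not stated in this form on the slide), one fused multiply-add per core per cycle counted as 2 FLOPs, and an FP64 rate of half the FP32 rate. The helper names are hypothetical.

```python
def peak_tflops(cuda_cores, clock_mhz, flops_per_core_per_cycle=2):
    """Peak throughput in TFLOPS; a fused multiply-add counts as 2 FLOPs."""
    return cuda_cores * flops_per_core_per_cycle * clock_mhz * 1e6 / 1e12


def bandwidth_gb_s(bus_width_bits, transfer_rate_gt_s):
    """Memory bandwidth in GB/s from bus width and per-pin transfer rate."""
    return bus_width_bits / 8 * transfer_rate_gt_s


ENABLED_SMS = 56      # assumption: Tesla P100 configuration
CORES_PER_SM = 64     # assumption: FP32 cores per Pascal GP100 SM
cores = ENABLED_SMS * CORES_PER_SM

fp32_boost = peak_tflops(cores, 1480)   # at the 1480 MHz boost clock
fp64_boost = fp32_boost / 2             # assumes a 1:2 FP64:FP32 ratio
fp32_base = peak_tflops(cores, 1328)    # at the 1328 MHz base clock

# Per-pin transfer rate implied by 720 GB/s over a 4096-bit interface.
pin_rate_gt_s = 720 / (4096 / 8)

if __name__ == "__main__":
    print(f"CUDA cores: {cores}")
    print(f"FP32 at boost: {fp32_boost:.1f} TFLOPS")
    print(f"FP64 at boost: {fp64_boost:.1f} TFLOPS")
    print(f"FP32 at base:  {fp32_base:.1f} TFLOPS")
    print(f"Implied HBM2 rate: {pin_rate_gt_s:.2f} GT/s per pin")
```

Under these assumptions the boost-clock results round to the 10.6 and 5.3 TFLOPS quoted above, which suggests the quoted performance corresponds to the boost clock rather than the base clock.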
Catapult Fabric (Microsoft)
- Purpose
  - Designed for neural network classification
  - Targets power reduction
- Architecture
  - Field Programmable Gate Array (FPGA)
  - Software-configurable engine supports multiple layer configurations at runtime
  - A spatially distributed array of processing elements can be scaled up to thousands of units
  - On-chip redistribution network with efficient data buffering minimizes off-chip memory traffic
  - Power dissipation is reduced to about 25 W
K. Ovtcharov, O. Ruwase, J.Y. Kim, J. Fowers, K. Strauss, E.S. Chung, Accelerating Deep Convolutional Neural Networks Using Specialized Hardware, Microsoft Research, Feb 2015.
Tensor Processing Unit (Google)
- Purpose
  - Supports TensorFlow workloads
  - Targets neural network classification
- Architecture
  - Application Specific Integrated Circuit (ASIC)
  - Single Instruction Multiple Data (SIMD) Architecture
  - Low computational precision
  - Better performance/watt
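To make "low computational precision" concrete, the sketch below quantizes floating-point weights to signed 8-bit integers with a single shared scale. The slide does not state which number format the hardware uses, so the 8-bit width, the symmetric scheme, and the function names are all assumptions for illustration only.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.42, -1.27, 0.003, 0.9]   # made-up example values
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

if __name__ == "__main__":
    print("codes:", codes)
    print("scale:", scale)
    print("max rounding error:", max_error)
```

Each weight then occupies one byte instead of four, and the rounding error per value is bounded by half the scale. Narrower multipliers and less memory traffic are the usual route from reduced precision to better performance per watt.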
Hardware Optimization
- Algorithm
- Architecture
- Chip Design
Thanks

Machine Learning with New Hardware Challenges

  • 1. Machine Learning with New Hardware Challenges, Oscar M.K. Law
  • 2. High Tech Challenges: Personal Computer (1980), Internet (1995), Smartphone (2007), Machine Learning (2016)
  • 3. Machine Learning: Visual/Audio Applications (Pattern Recognition, Pattern Detection, Voice Recognition); Motion/Movement Control (Self-Driving, Drone Control); Data Mining/Association (Medical, Financial, Legal)
  • 4. Neural Network
      Convolutional Neural Network (CNN). Model: Mathematical Based Model. Hardware: Graphics Processing Unit (Nvidia), Catapult Fabric (Microsoft), Tensor Processing Unit (Google). Applications: DCNN, RCNN, Fast-RCNN, Faster-RCNN, RFCN
      Spiking Neural Network (SNN). Model: Physical Based Model. Hardware: TrueNorth Processor (IBM), Zeroth Processor (Qualcomm)
  • 5. Convolutional Neural Network [Figure: network architecture diagram, from a 224 x 224 x 3 input through convolutional stages with max pooling to 2048-unit fully connected layers and a 1000-way output] A. Krizhevsky, I. Sutskever and G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012, pp. 1-9.
  • 6. Convolutional Neural Network
      Architecture: 5 Convolutional Layers, 3 Fully Connected Layers, Gabor Filters, 650k Neurons, 60M Parameters, 630M Connections
      Runtime: Nvidia GTX 580 3GB GPU, One week
      Results: Top-1 Error Rate 37.5%, Top-5 Error Rate 17.0%
  • 7. Convolutional Neural Network. Columns: Software; Developer; Platform; Interface; Hardware
      Caffe; Berkeley Vision and Learning Center; Linux, OSX, Windows, Android; C++, Python, Matlab; CPU, GPU
      MatConvNet; Oxford Visual Geometry Group; Linux, OSX, Windows; Matlab; CPU, GPU
      Matlab; MathWorks; Linux, OSX, Windows; Matlab; CPU, GPU
      TensorFlow; Google Brain Team; Linux, OSX, Windows; C++, Python; CPU, GPU, TPU
      Torch 7; R. Collobert, K. Kavukcuoglu, C. Farabet; Linux, OSX, iOS, Android, Windows; Lua, LuaJIT, C; CPU, GPU
      Theano; Université de Montréal; Cross-Platform; Python; CPU, GPU
      CNTK; Microsoft; Linux, OSX, Windows; Network Description Language; CPU, GPU, FPGA
  • 8. CPU vs GPU. Architecture comparison, listed as attribute: Central Processing Unit (CPU) / Graphics Processing Unit (GPU)
      Instruction Set: Single Instruction Single Data (SISD) / Single Instruction Multiple Data (SIMD)
      Operation: Sequential / Parallel
      Processor Core: Few / Many
      Datapath: Custom / Synthesis
      Clock Rate: High / Moderate
      Bandwidth: Medium / Large
      Power: Moderate / High
      Temperature: Moderate / High
  • 9. Graphics Processing Unit (Nvidia), Pascal Architecture
      Flagship Chip: GP100
      Process: TSMC 16nm FinFET
      Maximum Transistors: 15.3B
      Stream Multiprocessor (SM): 56 (10SM/GPC)
      CUDA Cores: 3840 CC (60CU/SM)
      Base Clock: 1328MHz
      Boost Clock: 1480MHz
      FP32 Performance: 10.6 TFLOPS
      FP64 Performance: 5.3 TFLOPS
      Memory Interface: 4096-bit HBM2
      Maximum Bandwidth: 720 GB/s
      Maximum Power: 300W
      Source: J. Walton, Nvidia Pascal P100 Architecture Deep Dive, PC Gamer, Apr 07, 2016.
  • 10. Catapult Fabric (Microsoft)
      Purpose: Designed for Neural Network Classification; targets power reduction
      Architecture: Field Programmable Gate Array (FPGA); software-configurable engine supports multiple layer configurations at runtime; a spatially distributed array of processing elements can be scaled up to thousands of units; on-chip redistribution network with efficient data buffering minimizes off-chip memory traffic; power dissipation is significantly reduced, to only 25W
      Source: K. Ovtcharov, O. Ruwase, J.Y. Kim, J. Fowers, K. Strauss, E.S. Chung, Accelerating Deep Convolutional Networks Using Specialized Hardware, Microsoft Research, Feb 2015.
  • 11. Tensor Processing Unit (Google)
      Purpose: Supports TensorFlow; targets Neural Network Classification
      Architecture: Application Specific Integrated Circuit (ASIC); Single Instruction Multiple Data (SIMD) Architecture; low computational precision; better performance/watt
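
Slide 11 lists low computational precision as a TPU design point without naming a number format. As a rough illustration only, the sketch below assumes symmetric linear 8-bit quantization, which is one common scheme and not necessarily what the TPU uses. The function names (quantize, quantized_dot) are hypothetical and chosen for this example.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers with one shared scale (symmetric scheme).

    Assumption: symmetric linear quantization; the slides do not specify a scheme.
    """
    qmax = 2 ** (num_bits - 1) - 1              # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def quantized_dot(a, b, num_bits=8):
    """Dot product computed with integer multiply-accumulate, then rescaled."""
    qa, scale_a = quantize(a, num_bits)
    qb, scale_b = quantize(b, num_bits)
    acc = sum(x * y for x, y in zip(qa, qb))    # integer arithmetic only
    return acc * scale_a * scale_b              # single float rescale at the end


# Illustrative values, not taken from the slides.
weights = [0.5, -1.0, 0.25, 0.75]
inputs = [1.0, 0.5, -0.5, 2.0]

exact = sum(w * x for w, x in zip(weights, inputs))
approx = quantized_dot(weights, inputs)
print(f"float result: {exact:.4f}, 8-bit result: {approx:.4f}")
```

The integer result stays close to the floating-point one while every multiply works on 8-bit operands, which is the kind of trade that supports the slide's performance/watt claim.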
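
The Pascal figures on slide 9 can be sanity-checked with simple arithmetic. The sketch below assumes 64 FP32 CUDA cores per SM and two floating-point operations per core per cycle (one fused multiply-add); neither is stated on the slide, which lists 60CU/SM. It also assumes FP64 runs at half the FP32 rate, which matches the slide's 5.3 versus 10.6 TFlops. The helper names are hypothetical.

```python
def peak_tflops(cores, clock_mhz, flops_per_core_per_cycle=2):
    """Theoretical peak in TFLOPS: cores x clock x FLOPs per core per cycle."""
    return cores * clock_mhz * 1e6 * flops_per_core_per_cycle / 1e12


def per_pin_rate_gbps(bandwidth_gb_s, bus_width_bits):
    """Per-pin data rate (Gbit/s) implied by total bandwidth and bus width."""
    return bandwidth_gb_s * 8 / bus_width_bits


SM_COUNT = 56            # from slide 9
CORES_PER_SM = 64        # assumption, not on the slide
BOOST_CLOCK_MHZ = 1480   # from slide 9

cores = SM_COUNT * CORES_PER_SM
fp32_tflops = peak_tflops(cores, BOOST_CLOCK_MHZ)
fp64_tflops = fp32_tflops / 2      # assumption: FP64 at half the FP32 rate
pin_rate_gbps = per_pin_rate_gbps(720, 4096)   # both values from slide 9

print(f"{cores} cores -> FP32 {fp32_tflops:.2f} TFLOPS, FP64 {fp64_tflops:.2f} TFLOPS")
print(f"HBM2 per-pin rate: {pin_rate_gbps:.3f} Gbit/s")
```

Under these assumptions, 56 SMs at the boost clock reproduce the slide's 10.6 TFlops figure. The 3840-core count listed on the same slide would give a higher number, so the slide's core count and throughput figure appear to describe different chip configurations.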