Describe basic neural network design, with a focus on Convolutional Neural Network (CNN) architecture. Explain why CPUs and GPUs cannot fully satisfy CNN hardware requirements. List three hardware examples, from Nvidia, Microsoft, and Google. Finally, highlight optimization approaches for CNN design.
7. Convolutional Neural Network
Software   | Developer                                 | Platform                           | Interface                     | Hardware
-----------|-------------------------------------------|------------------------------------|-------------------------------|---------------
Caffe      | Berkeley Vision and Learning Center       | Linux, OSX, Windows, Android       | C++, Python, Matlab           | CPU, GPU
MatConvNet | Oxford Visual Geometry Group              | Linux, OSX, Windows                | Matlab                        | CPU, GPU
Matlab     | MathWorks                                 | Linux, OSX, Windows                | Matlab                        | CPU, GPU
TensorFlow | Google Brain Team                         | Linux, OSX, Windows                | C++, Python                   | CPU, GPU, TPU
Torch 7    | R. Collobert, K. Kavukcuoglu, C. Farabet  | Linux, OSX, iOS, Android, Windows  | Lua, LuaJIT, C                | CPU, GPU
Theano     | Université de Montréal                    | Cross-platform                     | Python                        | CPU, GPU
CNTK       | Microsoft                                 | Linux, OSX, Windows                | Network Description Language  | CPU, GPU, FPGA
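As a concrete example of the interfaces listed above, here is a minimal sketch of a small CNN defined through the TensorFlow Python API; the layer sizes and the 28x28 grayscale input shape are illustrative assumptions, not values from the table. The same model definition can run on CPU, GPU, or TPU backends.

```python
import tensorflow as tf

# A small illustrative CNN: two conv/pool stages followed by a classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()  # TensorFlow dispatches the same graph to CPU, GPU, or TPU
```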
8. CPU vs GPU
Architecture     | Central Processing Unit (CPU)           | Graphics Processing Unit (GPU)
-----------------|-----------------------------------------|------------------------------------------
Instruction Set  | Single Instruction Single Data (SISD)   | Single Instruction Multiple Data (SIMD)
Operation        | Sequential                              | Parallel
Processor Cores  | Few                                     | Many
Datapath         | Custom                                  | Synthesis
Clock Rate       | High                                    | Moderate
Bandwidth        | Medium                                  | Large
Power            | Moderate                                | High
Temperature      | Moderate                                | High
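The SISD vs SIMD row is the key difference for CNN workloads, whose inner loops are element-wise multiply-accumulates. A small Python sketch makes it concrete; NumPy's vectorized kernels stand in for data-parallel hardware, and timings are machine-dependent:

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

# SISD-style: one scalar operation per step, as on a single sequential core.
t0 = time.perf_counter()
out = np.empty_like(a)
for i in range(n):
    out[i] = a[i] * b[i]
t_scalar = time.perf_counter() - t0

# SIMD-style: one instruction applied across many data elements at once.
t0 = time.perf_counter()
out_vec = a * b
t_vector = time.perf_counter() - t0

print(f"scalar loop: {t_scalar:.3f} s, vectorized: {t_vector:.5f} s")
```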
9. Graphics Processing Unit (Nvidia)
Pascal Architecture
Flagship Chip: GP100
Process: TSMC 16 nm FinFET
Transistors: 15.3 billion
Streaming Multiprocessors (SM): 56 enabled (10 SM/GPC on the full die)
CUDA Cores: 3840 (64 cores/SM on the full die)
Base Clock: 1328 MHz
Boost Clock: 1480 MHz
FP32 Performance: 10.6 TFLOPS
FP64 Performance: 5.3 TFLOPS
Memory Interface: 4096-bit HBM2
Maximum Bandwidth: 720 GB/s
Maximum Power: 300 W
J. Walton, Nvidia Pascal P100 Architecture Deep Dive, PC Gamer, Apr 07, 2016.
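As a sanity check on the figures above, peak FP32 throughput follows from cores x clock x 2 FLOPs per fused multiply-add. Assuming the published Pascal figure of 64 FP32 cores per SM, the 10.6 TFLOPS number matches the 56 enabled SMs (3584 cores) rather than the full 3840-core die:

```python
# Back-of-envelope peak FP32 throughput for the P100 configuration above.
sm_count = 56               # enabled SMs (the full GP100 die has 60)
cores_per_sm = 64           # FP32 CUDA cores per Pascal SM
boost_clock_hz = 1.480e9    # 1480 MHz boost clock
flops_per_cycle = 2         # one fused multiply-add = 2 floating-point ops

peak_fp32 = sm_count * cores_per_sm * boost_clock_hz * flops_per_cycle
print(f"peak FP32: {peak_fp32 / 1e12:.1f} TFLOPS")  # -> 10.6 TFLOPS
```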
10. Catapult Fabric (Microsoft)
Purpose
Designed for neural network classification
Targets power reduction
Architecture
Field Programmable Gate Array (FPGA)
Software-configurable engine supports multiple layer configurations at runtime
A spatially distributed array of processing elements scales up to thousands of units (see the sketch below)
On-chip redistribution network with efficient data buffering minimizes off-chip memory traffic
Power dissipation is reduced to only 25 W
K. Ovtcharov, O. Ruwase, J.Y. Kim, J. Fowers, K. Strauss, E.S. Chung, Accelerating Deep Convolutional Networks Using Specialized Hardware, Microsoft Research, Feb 2015.
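A toy Python sketch of the spatially distributed processing-element (PE) idea referenced above: each PE computes one slice of a 1-D convolution, so throughput scales with the number of units while the input stays shared. This is purely illustrative, not the actual Catapult fabric design.

```python
import numpy as np

def pe_conv_slice(x, kernel, start, stop):
    """One PE: computes output rows [start, stop) of a valid 1-D convolution."""
    k = len(kernel)
    return [float(np.dot(x[i:i + k], kernel)) for i in range(start, stop)]

x = np.arange(32, dtype=np.float32)                    # input feature map
kernel = np.array([0.25, 0.5, 0.25], dtype=np.float32)
n_out = len(x) - len(kernel) + 1

n_pes = 4  # the fabric scales this to thousands of units
bounds = np.linspace(0, n_out, n_pes + 1, dtype=int)   # partition the output
out = []
for pe in range(n_pes):  # conceptually these slices run in parallel
    out.extend(pe_conv_slice(x, kernel, bounds[pe], bounds[pe + 1]))

# The distributed result matches a monolithic convolution.
assert np.allclose(out, np.convolve(x, kernel[::-1], mode="valid"))
```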
11. Tensor Processing Unit (Google)
Purpose
Supports the TensorFlow framework
Targets neural network classification
Architecture
Application Specific Integrated Circuit (ASIC)
Single Instruction Multiple Data (SIMD) Architecture
Low computational precision (see the quantization sketch below)
Better performance per watt
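A minimal sketch of the low-precision idea: quantize FP32 values to int8, accumulate in cheap integer arithmetic, then rescale. The symmetric linear scheme here is an illustrative assumption, not Google's actual quantization pipeline.

```python
import numpy as np

def quantize(x, n_bits=8):
    """Symmetric linear quantization of a float array to signed integers."""
    qmax = 2 ** (n_bits - 1) - 1          # 127 for int8
    scale = float(np.abs(x).max()) / qmax
    return np.round(x / scale).astype(np.int8), scale

weights = np.random.randn(256).astype(np.float32)
activations = np.random.randn(256).astype(np.float32)

qw, sw = quantize(weights)
qa, sa = quantize(activations)

# Integer multiply-accumulate (cheap in silicon), rescaled back to float.
dot_int8 = int(np.dot(qw.astype(np.int32), qa.astype(np.int32))) * sw * sa
dot_fp32 = float(np.dot(weights, activations))
print(f"fp32: {dot_fp32:.4f}  int8: {dot_int8:.4f}")  # small accuracy loss
```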