SlideShare a Scribd company logo
1 of 24
Download to read offline
NovuTensor: Hardware Acceleration of Deep Convolutional
Neural Networks for AI
User Friendly
NovuTensor
AI Chip
NovuStar
Supercomputer
Neural
Network
Models
Training NovuBrain
AI Module
Get Data
Training ability
Model accuracy
Full-StackAI
Solutions
NovuMind
Industry Corporation
Customer
&Alliance
• The core of CNN is 3-D Convolution Computation
• Deeper CNN needs more 3-D Convolution Computation
• AlexNet -> VGG -> ResNet -> …
• DCNN is a network topology composed of basic computation elements
(e.g., 3x3 3-D Convolutions )
•
•
•
•
•
TPU with 2D Systolic Array
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for AI," a Presentation from NovuMind
1080Ti Throughput:
Peak: 11TOPS/s
Effective: 2.4TOPS/s
(ResNet18)
Efficiency: 22%
TPU Throughput:
Peak: 96TOPS/s
Average: 23TOPS/s
Efficiency: 24%
2D Accelerators for Deep Learning
Mesh Structure and Systolic Array
• Overhead to prepare the 3D data into 2D data
• Overhead to feed data
• TPU uses 256x256 systolic array->needs 65536 data to fill
• Systolic Array got 2D parallel acceleration – only when data is loaded
1D, 2D, 3D – An Ice Cream Example
How Do You Eat Ice Cream?
• Data Movement:
• Need a scoop,then the art of scooping
• Need a spoon or cone that is mouth size
• Arithmetic:
• Need a mouth to consume the ice cream
• Unnecessary Assumptions with Mesh:
• dqual distance; deighboring data dependency
• Collective Streaming, and CNN 3D Data Processing: Data independency b/w PEs:
• no data flow
• no need to be rectangular, PEs can be flexible to have dozens to thousands
of MACs
•
•
•
The Scooping: Tile Based Operation (patented)
• Efficient Data Movement
• Reduce unnecessary data movement off chip and on chip
• Fully exploit data sharing, re-using:
• IFM sharing between OFMs
• Weight sharing within IFM (X-Y)
• Weight sharing within batch
• Efficient Arithmetic Logic
• Reduce unnecessary computation
PEP0
PEP1
PEP2
PEP3
IFM
broadcast
OFM[0]
OFM[1]
OFM[2]
OFM[3]
concatenate
OFM[3:0]
•
•
•
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for AI," a Presentation from NovuMind
"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for AI," a Presentation from NovuMind
NT-BB
FPGA
Server
GPU
Perform
ance
(FPS)
301 666
Power
(W)
12 250
1/20 Power Consumption
1/2 Performance
ResNet18
ImageNet Classification
20
Name
Power
Consumption (w)
Performance
(Effective TOPS,
VGG16)
Performance Power
Ratio (TOPS/W)
#1 Embedded 1.5 (500MHz) 0.2 0.13
#2 Smartphone NA (use as IP) 0.19 NA
#3 Smartphone NA (use as IP) 0.3 NA
NovuMind NT-BB
(“BlackBear”)
12~18 2.5 0.21 (FPGA)
NovuMind NT-GZ
(“Grizzly”)
5~10 11 (peak 14) 2.25 (ASIC)
Embedded GPU 10 0.4 0.04
Data Center AI ASIC 40 23 0.58
Server GPU 250 6.6 0.026
Low Power
Low Performance
Cannot meet the
requirements of most
AI computing needs.
High Power
High Performance
Cannot meet the
requirements of most
edge application
scenarios.
NovuTensor: Server performance within embedded power
The most power-efficient chip on the market!
Major AI Chips in Market
using AI super resolution technology to upscale low-resolution
video into 4k or 8k UHD video in real time
The world's 1st in
➢ Intelligently fill the
details to ensure ultra-
high quality
➢ Solve the problem of
media content lacking
4K / 8K resolutions for
online video and air
broadcasting
Super Resolution at CES2018: Beauty Made by
Muscle
( ) ( )
NovuTensor Kestrel: When Grizzly Flies
Call to Action
➢ CES January 2018:
http://share.gmw.cn/tech/2018-
01/11/content_27322512.htm?from=message&isappinstalled=0
➢ HP Enterprise + NovuMind at GTC October 2017
“Improving Deep Learning Scalability on HPE Servers With NovuMind: GPU RDMA Made
Easy”
https://www.leiphone.com/news/201710/GG9umC93Gtav2Eac.html
➢ Real Time Endoscopic Application at West China Hospital, July~August 2017
SCTV Video: http://www.iqiyi.com/v_19rrefrw1g.html
People’s Daily News: http://health.people.com.cn/n1/2017/0801/c14739-29440444.html
CCTV: http://jiankang.cctv.com/2017/07/28/ARTIE26PkbwL9t34pFS8jbIW170728.shtml
➢ AI Start-up NovuMind Helps European City Improve Their Green Life
Resources

More Related Content

What's hot

Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchRyousei Takano
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraRyousei Takano
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Gao Boyang
 
Managing Large Datasets in LabVIEW
Managing Large Datasets in LabVIEWManaging Large Datasets in LabVIEW
Managing Large Datasets in LabVIEWJames McNally
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)Fatima Qayyum
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsSepidehShirkhanzadeh
 
Effective machine learning_with_tpu
Effective machine learning_with_tpuEffective machine learning_with_tpu
Effective machine learning_with_tpuAthul Suresh
 
Large_Data_Block_Size_NSIC898
Large_Data_Block_Size_NSIC898Large_Data_Block_Size_NSIC898
Large_Data_Block_Size_NSIC898Martin Hassner
 
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...Tokyo Institute of Technology
 
Gossip-based resource allocation for green computing in large clouds
Gossip-based resource allocation for green computing in large cloudsGossip-based resource allocation for green computing in large clouds
Gossip-based resource allocation for green computing in large cloudsRerngvit Yanggratoke
 
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...Rahul Jain
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionNVIDIA Taiwan
 

What's hot (17)

Expectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software researchExpectations for optical network from the viewpoint of system software research
Expectations for optical network from the viewpoint of system software research
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
cnsm2011_slide
cnsm2011_slidecnsm2011_slide
cnsm2011_slide
 
Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1Boyang gao gpu k-means_gmm_final_v1
Boyang gao gpu k-means_gmm_final_v1
 
Managing Large Datasets in LabVIEW
Managing Large Datasets in LabVIEWManaging Large Datasets in LabVIEW
Managing Large Datasets in LabVIEW
 
GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)GPU Architecture NVIDIA (GTX GeForce 480)
GPU Architecture NVIDIA (GTX GeForce 480)
 
Memory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware AcceleratorsMemory Requirements for Convolutional Neural Network Hardware Accelerators
Memory Requirements for Convolutional Neural Network Hardware Accelerators
 
Effective machine learning_with_tpu
Effective machine learning_with_tpuEffective machine learning_with_tpu
Effective machine learning_with_tpu
 
Large_Data_Block_Size_NSIC898
Large_Data_Block_Size_NSIC898Large_Data_Block_Size_NSIC898
Large_Data_Block_Size_NSIC898
 
Google warehouse scale computer
Google warehouse scale computerGoogle warehouse scale computer
Google warehouse scale computer
 
GPU Programming
GPU ProgrammingGPU Programming
GPU Programming
 
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memor...
 
Gossip-based resource allocation for green computing in large clouds
Gossip-based resource allocation for green computing in large cloudsGossip-based resource allocation for green computing in large clouds
Gossip-based resource allocation for green computing in large clouds
 
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Trans...
 
GPU - Basic Working
GPU - Basic WorkingGPU - Basic Working
GPU - Basic Working
 
Ultra Fast SOM using CUDA
Ultra Fast SOM using CUDAUltra Fast SOM using CUDA
Ultra Fast SOM using CUDA
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 

Similar to "NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for AI," a Presentation from NovuMind

In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptxQ1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptxMemory Fabric Forum
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersCastLabKAIST
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Shien-Chun Luo
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioOPNFV
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerFörderverein Technische Fakultät
 
Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)Vsevolod Shabad
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architectureDhaval Kaneria
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit SupercomputerVigneshwarRamaswamy
 
Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.Rakib Hossain
 
Oow 2008 yahoo_pie-db
Oow 2008 yahoo_pie-dbOow 2008 yahoo_pie-db
Oow 2008 yahoo_pie-dbbohanchen
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...Hideyuki Tanaka
 
Performance analysis of 3D Finite Difference computational stencils on Seamic...
Performance analysis of 3D Finite Difference computational stencils on Seamic...Performance analysis of 3D Finite Difference computational stencils on Seamic...
Performance analysis of 3D Finite Difference computational stencils on Seamic...Joshua Mora
 

Similar to "NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for AI," a Presentation from NovuMind (20)

In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptxQ1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx
Q1 Memory Fabric Forum: Using CXL with AI Applications - Steve Scargall.pptx
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
 
Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)Lightweight DNN Processor Design (based on NVDLA)
Lightweight DNN Processor Design (based on NVDLA)
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)Tensor Processing Unit (TPU)
Tensor Processing Unit (TPU)
 
26_Fan.pdf
26_Fan.pdf26_Fan.pdf
26_Fan.pdf
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A SupercomputerIntroduction to National Supercomputer center in Tianjin TH-1A Supercomputer
Introduction to National Supercomputer center in Tianjin TH-1A Supercomputer
 
Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)Infrastructure optimization for seismic processing (eng)
Infrastructure optimization for seismic processing (eng)
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
Brkdct 3101
Brkdct 3101Brkdct 3101
Brkdct 3101
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
 
Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.
 
Oow 2008 yahoo_pie-db
Oow 2008 yahoo_pie-dbOow 2008 yahoo_pie-db
Oow 2008 yahoo_pie-db
 
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
ESPM2 2018 - Automatic Generation of High-Order Finite-Difference Code with T...
 
Performance analysis of 3D Finite Difference computational stencils on Seamic...
Performance analysis of 3D Finite Difference computational stencils on Seamic...Performance analysis of 3D Finite Difference computational stencils on Seamic...
Performance analysis of 3D Finite Difference computational stencils on Seamic...
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...Edge AI and Vision Alliance
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic LeapEdge AI and Vision Alliance
 
“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from Instrumental“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from InstrumentalEdge AI and Vision Alliance
 
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...Edge AI and Vision Alliance
 
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...Edge AI and Vision Alliance
 
May 2023 Embedded Vision Summit Opening Remarks (May 23)
May 2023 Embedded Vision Summit Opening Remarks (May 23)May 2023 Embedded Vision Summit Opening Remarks (May 23)
May 2023 Embedded Vision Summit Opening Remarks (May 23)Edge AI and Vision Alliance
 
“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...
“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...
“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...Edge AI and Vision Alliance
 
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole GroupEdge AI and Vision Alliance
 
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...Edge AI and Vision Alliance
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...Edge AI and Vision Alliance
 
“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...
“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...
“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...Edge AI and Vision Alliance
 
“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...
“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...
“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...Edge AI and Vision Alliance
 
“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...
“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...
“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
 
“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from Instrumental“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from Instrumental
 
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
 
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
 
May 2023 Embedded Vision Summit Opening Remarks (May 23)
May 2023 Embedded Vision Summit Opening Remarks (May 23)May 2023 Embedded Vision Summit Opening Remarks (May 23)
May 2023 Embedded Vision Summit Opening Remarks (May 23)
 
“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...
“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...
“Frontiers in Perceptual AI: First-person Video and Multimodal Perception,” a...
 
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
“3D Sensing: Market and Industry Update,” a Presentation from the Yole Group
 
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...
“Open Standards Unleash Hardware Acceleration for Embedded Vision,” a Present...
 
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
“Responsible AI: Tools and Frameworks for Developing AI Solutions,” a Present...
 
“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...
“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...
“Next-generation Computer Vision Methods for Automated Navigation of Unmanned...
 
“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...
“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...
“The OpenVX Standard API: Computer Vision for the Masses,” a Presentation fro...
 
“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...
“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...
“Modernizing the Development of AI-based IoT Devices with Wedge,” a Presentat...
 

Recently uploaded

TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)IES VE
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingFrancesco Corti
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdfThe Good Food Institute
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechProduct School
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxKaustubhBhavsar6
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxSatishbabu Gunukula
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxNeo4j
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4DianaGray10
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTopCSSGallery
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updateadam112203
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 

Recently uploaded (20)

TrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie WorldTrustArc Webinar - How to Live in a Post Third-Party Cookie World
TrustArc Webinar - How to Live in a Post Third-Party Cookie World
 
The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)The Importance of Indoor Air Quality (English)
The Importance of Indoor Air Quality (English)
 
Where developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is goingWhere developers are challenged, what developers want and where DevEx is going
Where developers are challenged, what developers want and where DevEx is going
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf2024.03.12 Cost drivers of cultivated meat production.pdf
2024.03.12 Cost drivers of cultivated meat production.pdf
 
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - TechWebinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
Webinar: The Art of Prioritizing Your Product Roadmap by AWS Sr PM - Tech
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
How to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptxHow to become a GDSC Lead GDSC MI AOE.pptx
How to become a GDSC Lead GDSC MI AOE.pptx
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
Oracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptxOracle Database 23c Security New Features.pptx
Oracle Database 23c Security New Features.pptx
 
Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptxEmil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
Emil Eifrem at GraphSummit Copenhagen 2024 - The Art of the Possible.pptx
 
UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4UiPath Studio Web workshop series - Day 4
UiPath Studio Web workshop series - Day 4
 
Top 10 Squarespace Development Companies
Top 10 Squarespace Development CompaniesTop 10 Squarespace Development Companies
Top 10 Squarespace Development Companies
 
SheDev 2024
SheDev 2024SheDev 2024
SheDev 2024
 
Patch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 updatePatch notes explaining DISARM Version 1.4 update
Patch notes explaining DISARM Version 1.4 update
 
Planetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile BrochurePlanetek Italia Srl - Corporate Profile Brochure
Planetek Italia Srl - Corporate Profile Brochure
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 

"NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for AI," a Presentation from NovuMind

  • 1. NovuTensor: Hardware Acceleration of Deep Convolutional Neural Networks for AI
  • 2. User Friendly NovuTensor AI Chip NovuStar Supercomputer Neural Network Models Training NovuBrain AI Module Get Data Training ability Model accuracy Full-StackAI Solutions NovuMind Industry Corporation Customer &Alliance
  • 3. • The core of CNN is 3-D Convolution Computation • Deeper CNN needs more 3-D Convolution Computation • AlexNet -> VGG -> ResNet -> … • DCNN is a network topology composed of basic computation elements (e.g., 3x3 3-D Convolutions )
  • 5. TPU with 2D Systolic Array
  • 7. 1080Ti Throughput: Peak: 11TOPS/s Effective: 2.4TOPS/s (ResNet18) Efficiency: 22% TPU Throughput: Peak: 96TOPS/s Average: 23TOPS/s Efficiency: 24%
  • 8. 2D Accelerators for Deep Learning Mesh Structure and Systolic Array • Overhead to prepare the 3D data into 2D data • Overhead to feed data • TPU uses 256x256 systolic array->needs 65536 data to fill • Systolic Array got 2D parallel acceleration – only when data is loaded
  • 9. 1D, 2D, 3D – An Ice Cream Example
  • 10. How Do You Eat Ice Cream? • Data Movement: • Need a scoop,then the art of scooping • Need a spoon or cone that is mouth size • Arithmetic: • Need a mouth to consume the ice cream
  • 11. • Unnecessary Assumptions with Mesh: • dqual distance; deighboring data dependency • Collective Streaming, and CNN 3D Data Processing: Data independency b/w PEs: • no data flow • no need to be rectangular, PEs can be flexible to have dozens to thousands of MACs
  • 12. • • • The Scooping: Tile Based Operation (patented)
  • 13. • Efficient Data Movement • Reduce unnecessary data movement off chip and on chip • Fully exploit data sharing, re-using: • IFM sharing between OFMs • Weight sharing within IFM (X-Y) • Weight sharing within batch • Efficient Arithmetic Logic • Reduce unnecessary computation
  • 16.
  • 19. NT-BB FPGA Server GPU Perform ance (FPS) 301 666 Power (W) 12 250 1/20 Power Consumption 1/2 Performance ResNet18 ImageNet Classification
  • 20. 20 Name Power Consumption (w) Performance (Effective TOPS, VGG16) Performance Power Ratio (TOPS/W) #1 Embedded 1.5 (500MHz) 0.2 0.13 #2 Smartphone NA (use as IP) 0.19 NA #3 Smartphone NA (use as IP) 0.3 NA NovuMind NT-BB (“BlackBear”) 12~18 2.5 0.21 (FPGA) NovuMind NT-GZ (“Grizzly”) 5~10 11 (peak 14) 2.25 (ASIC) Embedded GPU 10 0.4 0.04 Data Center AI ASIC 40 23 0.58 Server GPU 250 6.6 0.026 Low Power Low Performance Cannot meet the requirements of most AI computing needs. High Power High Performance Cannot meet the requirements of most edge application scenarios. NovuTensor: Server performance within embedded power The most power-efficient chip on the market! Major AI Chips in Market
  • 21. using AI super resolution technology to upscale low-resolution video into 4k or 8k UHD video in real time The world's 1st in ➢ Intelligently fill the details to ensure ultra- high quality ➢ Solve the problem of media content lacking 4K / 8K resolutions for online video and air broadcasting Super Resolution at CES2018: Beauty Made by Muscle
  • 22. ( ) ( ) NovuTensor Kestrel: When Grizzly Flies
  • 24. ➢ CES January 2018: http://share.gmw.cn/tech/2018- 01/11/content_27322512.htm?from=message&isappinstalled=0 ➢ HP Enterprise + NovuMind at GTC October 2017 “Improving Deep Learning Scalability on HPE Servers With NovuMind: GPU RDMA Made Easy” https://www.leiphone.com/news/201710/GG9umC93Gtav2Eac.html ➢ Real Time Endoscopic Application at West China Hospital, July~August 2017 SCTV Video: http://www.iqiyi.com/v_19rrefrw1g.html People’s Daily News: http://health.people.com.cn/n1/2017/0801/c14739-29440444.html CCTV: http://jiankang.cctv.com/2017/07/28/ARTIE26PkbwL9t34pFS8jbIW170728.shtml ➢ AI Start-up NovuMind Helps European City Improve Their Green Life Resources