SlideShare a Scribd company logo
A Novel Packet-Based
Accelerator for Resource-
Constrained Edge Devices
Sharad Chole, Chief Scientist
Expedera
• Up to 40% of SoC resources are devoted to NPUs
• Todays NPUs are performance, power, and area inefficient
• Scaling only amplifies the problem – NPU power and area requirements have grown
significantly compared to performance
• NPUs can be, and must be, better to enable the true promise of edge AI
• Alternative architectures can be deployed to support the rapidly
increasing needs of edge AI
• More efficient, highly utilized processors
• Scalable designs for reuse across platforms
• Let's show you how we can do this…
AI Inference Solutions Are Not Sustainable
2
© 2022 Expedera
Tesla FSD
-74 TOPS-
Apple A14
-11 TOPS-
Edge AI – Today vs Tomorrow
3
© 2022 Expedera
Future Needs
Current Solutions
• Multi-core architectures with complex compilers
• Stagnated by low utilization & area efficiency
• Expensive for edge SoCs
• Require too much DDR (performance/GBps)
• Low power efficiency (performance/W)
• Applications are struggling to approach markets
• Single tenant, high latency
• Small resolutions & low network diversity
• Scalable monolith with robust software toolchain
• High utilization & area efficiency
• Built for edge deployments
• Reduced/no requirement for DDR
• Enable lower thermal and battery constraints
• Model deployments with SLA guarantees
• Multi-tenant real time processing, scalable to
4K and beyond
Accelerators must meet tight cost, bandwidth & power consumption constraints while still delivering high performance
A Growing Diversity of Networks Demands Flexibility
4
© 2022 Expedera
Network Typical Tasks
ResNet 50
Yolo V3 Object detection
UNet
FSRCNN
Image segmentation, denoising
Embedded super resolution
MobileBERT Language understanding
Image classification, feature extraction
backbone
ResNet 50
Yolo V3
Unet
FSRCNN
MobileBERT
ResNet 50
Yolo V3
Unet
FSRCNN
MobileBERT
ResNet 50
Yolo V3
Unet
FSRCNN
MobileBERT
Weights
(Million)
Operations
(Billion)
Activations
(Million)
Edge AI Needs a Different Definition of Compute
5
© 2022 Expedera
Instruction
e.g. CPU, GPGPU, DSP
• Need to distribute workload and gather results back
• Heavy use of synchronization primitives for scalability
• Hierarchical memory systems cost power and area
Current AI engines are limited by high cost of context switching and cannot break down
layers into more granular pieces. Resulting in poor utilization and worse PPA.
Two types of compute abstractions are used today:
Neural Network Layer
e.g. Systolic Arrays, DNN APIs
• Need large on chip memory to process a layer
• Limited by layer ordering optimizations
• NN orchestration done through controller CPU
• Aggregate of work with notion of dependencies and
performance deterministic execution, based on a network-
centric approach
• A contiguous fragment of a neural network layer with entire
context of execution; layer type, attributes, priority
• Manage activations better/more intelligently
• Reordering packets does not incur penalty – as DLA supports
zero cost context switching
• Through reorganizing packets in the optimum order without
hurting accuracy, Expedera produces the minimum number
of moves
• Greatly increases performance while lowering silicon power and
area requirements
Packets: a Radical Approach to AI Inference Optimization
6
© 2022 Expedera
Neural
Network (NN)
Throughput, Bandwidth &
Latency Goals +
Re-ordering Algorithms
Step 2:
Packet
Ordering
Step 1:
Convert NN to
Packets
Conv1
Conv2
Conv3
Conv4
MaxPool1
MaxPool2
MaxPool3
Packet
Stream
DLA
HW Sequencers Engines
Engines
Resource
Manager
Memory
• Example shown:
• YoloV3 608 x 608, batch of 2
• Total 63 M weights, largest layer 4.3 M
• 235 M activations, largest layer 24 M
• 280 B operations
• Packets reduce DDR transfers by >5x
• Less intermediate data movement, higher throughput
• Lower system power, reduced BOM cost
• Uniformly spread-out bandwidth
• Sustained utilization, tolerant towards latency variations
Packets Save Memory and Bandwidth
7
© 2022 Expedera
Expedera requires fewer DDR transfers compared to typical architectures:
Higher throughput, lower power consumption, less chip area required
0
50
100
150
200
250
300
350
4 MB On-chip Memory
Million
Bytes
Layer-Based Expedera Packet-based
Total DDR Transfers
Packets Allow Right-sizing DLA in Context for SoC
8
© 2022 Expedera
• Packet-stream guarantees cycle-accurate DLA
performance
• Packets are performance deterministic & complete -
exact execution cycles as well as memory &
bandwidth usage is known
• Quickly right-size DLA for AI workloads with
visibility
• Expedera Estimator opens a DLA-centric view
into AI workload performance of SoC components
Cycle Accurate DLA Trace
NN Segment Scheduling
Representative
Dataset SoC Architecture
Neural
Network
CPU
Input
Traffic
Applications
DLA
• Unique “building block” architecture allows area-
efficient DLA configurations matching customer needs
• Scale compute independent of memory
• Unified pipeline architecture compute
• 18 TOPS/W (average, not peak; TMSC 7 nm, ResNet50, 1 GHz, no
sparsity required)
• Zero overhead context switching
• 70-90% utilization across entire performance range
• Monolithic - single control system drives entire DLA
Expedera OriginTM Deep Learning Accelerator (DLA)
9
© 2022 Expedera
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
PSM PSM PSM PSM PSM PSM PSM PSM
VSP Logic
VSP Logic
VSP Logic
VSP Logic
VSP Logic
VSP Logic
VSP Logic
VSP Logic
VSP Mem
VSP Mem
VSP Mem
VSP Mem
VSP Mem
VSP Mem
VSP Mem
VSP Mem
SSP
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
VSP Logic
VSP Logic
VSP Mem
VSP Mem
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
PSM PSM
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
PSM PSM
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
VSP Logic
VSP Logic
VSP Mem
VSP Mem
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
MMP
Tile
• Ease of integration into SoC environment
• DL framework supported through Apache TVM
• Multi-target compilation
• Neural network orchestration across SoC
• Extended features
• Mixed precision quantization
• Custom layer support
• Multi-job APIs
• Exploration into FPS, latency & power metrics
from architectural to deployment phase
TVM-based Software Stack
10
© 2022 Expedera
Offline
Stack
Online
Stack
Hardware
TVM
Quantizer
TVM Adapter & NN Partitioning
Expedera IR
Expedera Packet Stream
Other SoC
Backends
Model Executable Estimator
Applications
Expedera Runtime
Scheduler Profiler Debugger
Expedera Driver
Other Drivers
SoC Expedera DLA
Packet-based AI Processing Leads the Industry
11
© 2022 Expedera
“CPU-IP Vendors Chase High-end Wins”
- Microprocessor Report, January 2022
“Expedera Redefines AI Acceleration for the Edge”
- Microprocessor Report, April 2021
Competitive Benchmarks
6X
IPS Performance per Watt,
versus ARM Ethos
2.7X
TOPS Performance per silicon area,
versus Apple A13
6-10X
Maximum single-core performance (TOPS),
versus IP from Cadence, CEVA, & Imagination
• Data from worldwide
top 5 OEM, device
available for purchase
• Expedera Origin
deployed for 4K video
low light denoising
• 18 TOPS total capacity
Real-World Customer Measured Results
12
© 2022 Expedera
With Expedera
Prior Gen CPU-based NPU
FPS Comparison
With Expedera
Prior Gen CPU-based NPU
Power Consumption
20X improved FPS while consuming less than half the power over prior CPU-based NPU
Expedera Resources
2022 Embedded Vision Summit
Contact Us
Within the Alliance
https://www.edge-ai-vision.com/companies/expedera/
Website, including technical briefs
https://www.expedera.com/products-overview/
Email
info@expedera.com
Social:
© 2022 Expedera 13
Visit us at booth #320
/Company/Expedera/
/ExpederaInc
@ExpederaInc

More Related Content

More from Edge AI and Vision Alliance

“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
Edge AI and Vision Alliance
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
Edge AI and Vision Alliance
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
Edge AI and Vision Alliance
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
 

“A Novel Packet-based Accelerator for Resource-Constrained Edge Devices,” a Presentation from Expedera

  • 1. A Novel Packet-Based Accelerator for Resource- Constrained Edge Devices Sharad Chole, Chief Scientist Expedera
  • 2. • Up to 40% of SoC resources are devoted to NPUs • Todays NPUs are performance, power, and area inefficient • Scaling only amplifies the problem – NPU power and area requirements have grown significantly compared to performance • NPUs can be, and must be, better to enable the true promise of edge AI • Alternative architectures can be deployed to support the rapidly increasing needs of edge AI • More efficient, highly utilized processors • Scalable designs for reuse across platforms • Let's show you how we can do this… AI Inference Solutions Are Not Sustainable 2 © 2022 Expedera Tesla FSD -74 TOPS- Apple A14 -11 TOPS-
  • 3. Edge AI – Today vs Tomorrow 3 © 2022 Expedera Future Needs Current Solutions • Multi-core architectures with complex compilers • Stagnated by low utilization & area efficiency • Expensive for edge SoCs • Require too much DDR (performance/GBps) • Low power efficiency (performance/W) • Applications are struggling to approach markets • Single tenant, high latency • Small resolutions & low network diversity • Scalable monolith with robust software toolchain • High utilization & area efficiency • Built for edge deployments • Reduced/no requirement for DDR • Enable lower thermal and battery constraints • Model deployments with SLA guarantees • Multi-tenant real time processing, scalable to 4K and beyond Accelerators must meet tight cost, bandwidth & power consumption constraints while still delivering high performance
  • 4. A Growing Diversity of Networks Demands Flexibility 4 © 2022 Expedera Network Typical Tasks ResNet 50 Yolo V3 Object detection UNet FSRCNN Image segmentation, denoising Embedded super resolution MobileBERT Language understanding Image classification, feature extraction backbone ResNet 50 Yolo V3 Unet FSRCNN MobileBERT ResNet 50 Yolo V3 Unet FSRCNN MobileBERT ResNet 50 Yolo V3 Unet FSRCNN MobileBERT Weights (Million) Operations (Billion) Activations (Million)
  • 5. Edge AI Needs a Different Definition of Compute 5 © 2022 Expedera Instruction e.g. CPU, GPGPU, DSP • Need to distribute workload and gather results back • Heavy use of synchronization primitives for scalability • Hierarchical memory systems cost power and area Current AI engines are limited by high cost of context switching and cannot break down layers into more granular pieces. Resulting in poor utilization and worse PPA. Two types of compute abstractions are used today: Neural Network Layer e.g. Systolic Arrays, DNN APIs • Need large on chip memory to process a layer • Limited by layer ordering optimizations • NN orchestration done through controller CPU
  • 6. • Aggregate of work with notion of dependencies and performance deterministic execution, based on a network- centric approach • A contiguous fragment of a neural network layer with entire context of execution; layer type, attributes, priority • Manage activations better/more intelligently • Reordering packets does not incur penalty – as DLA supports zero cost context switching • Through reorganizing packets in the optimum order without hurting accuracy, Expedera produces the minimum number of moves • Greatly increases performance while lowering silicon power and area requirements Packets: a Radical Approach to AI Inference Optimization 6 © 2022 Expedera Neural Network (NN) Throughput, Bandwidth & Latency Goals + Re-ordering Algorithms Step 2: Packet Ordering Step 1: Convert NN to Packets Conv1 Conv2 Conv3 Conv4 MaxPool1 MaxPool2 MaxPool3 Packet Stream DLA HW Sequencers Engines Engines Resource Manager Memory
  • 7. • Example shown: • YoloV3 608 x 608, batch of 2 • Total 63 M weights, largest layer 4.3 M • 235 M activations, largest layer 24 M • 280 B operations • Packets reduce DDR transfers by >5x • Less intermediate data movement, higher throughput • Lower system power, reduced BOM cost • Uniformly spread-out bandwidth • Sustained utilization, tolerant towards latency variations Packets Save Memory and Bandwidth 7 © 2022 Expedera Expedera requires fewer DDR transfers compared to typical architectures: Higher throughput, lower power consumption, less chip area required 0 50 100 150 200 250 300 350 4 MB On-chip Memory Million Bytes Layer-Based Expedera Packet-based Total DDR Transfers
  • 8. Packets Allow Right-sizing DLA in Context for SoC 8 © 2022 Expedera • Packet-stream guarantees cycle-accurate DLA performance • Packets are performance deterministic & complete - exact execution cycles as well as memory & bandwidth usage is known • Quickly right-size DLA for AI workloads with visibility • Expedera Estimator opens a DLA-centric view into AI workload performance of SoC components Cycle Accurate DLA Trace NN Segment Scheduling Representative Dataset SoC Architecture Neural Network CPU Input Traffic Applications DLA
  • 9. • Unique “building block” architecture allows area- efficient DLA configurations matching customer needs • Scale compute independent of memory • Unified pipeline architecture compute • 18 TOPS/W (average, not peak; TMSC 7 nm, ResNet50, 1 GHz, no sparsity required) • Zero overhead context switching • 70-90% utilization across entire performance range • Monolithic - single control system drives entire DLA Expedera OriginTM Deep Learning Accelerator (DLA) 9 © 2022 Expedera MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile PSM PSM PSM PSM PSM PSM PSM PSM VSP Logic VSP Logic VSP Logic VSP Logic VSP Logic VSP Logic VSP Logic VSP Logic VSP Mem VSP Mem VSP Mem VSP Mem VSP Mem VSP Mem VSP Mem VSP Mem SSP MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile VSP Logic VSP Logic VSP Mem VSP Mem MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile PSM PSM MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile PSM PSM MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile VSP Logic VSP Logic VSP Mem VSP Mem MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile MMP Tile
  • 10. • Ease of integration into SoC environment • DL framework supported through Apache TVM • Multi-target compilation • Neural network orchestration across SoC • Extended features • Mixed precision quantization • Custom layer support • Multi-job APIs • Exploration into FPS, latency & power metrics from architectural to deployment phase TVM-based Software Stack 10 © 2022 Expedera Offline Stack Online Stack Hardware TVM Quantizer TVM Adapter & NN Partitioning Expedera IR Expedera Packet Stream Other SoC Backends Model Executable Estimator Applications Expedera Runtime Scheduler Profiler Debugger Expedera Driver Other Drivers SoC Expedera DLA
  • 11. Packet-based AI Processing Leads the Industry 11 © 2022 Expedera “CPU-IP Vendors Chase High-end Wins” - Microprocessor Report, January 2022 “Expedera Redefines AI Acceleration for the Edge” - Microprocessor Report, April 2021 Competitive Benchmarks 6X IPS Performance per Watt, versus ARM Ethos 2.7X TOPS Performance per silicon area, versus Apple A13 6-10X Maximum single-core performance (TOPS), versus IP from Cadence, CEVA, & Imagination
  • 12. • Data from worldwide top 5 OEM, device available for purchase • Expedera Origin deployed for 4K video low light denoising • 18 TOPS total capacity Real-World Customer Measured Results 12 © 2022 Expedera With Expedera Prior Gen CPU-based NPU FPS Comparison With Expedera Prior Gen CPU-based NPU Power Consumption 20X improved FPS while consuming less than half the power over prior CPU-based NPU
  • 13. Expedera Resources 2022 Embedded Vision Summit Contact Us Within the Alliance https://www.edge-ai-vision.com/companies/expedera/ Website, including technical briefs https://www.expedera.com/products-overview/ Email info@expedera.com Social: © 2022 Expedera 13 Visit us at booth #320 /Company/Expedera/ /ExpederaInc @ExpederaInc