COMPARE PERFORMANCE-POWER OF ARM CORTEX
VS RISC-V FOR AI APPLICATIONS.
Deepak Shankar
Founder
Mirabilis Design Inc.
Email: dshankar@mirabilisdesign.com
Agenda
Requirements and motivation for comparison
Methodology to compare systems/SoC/micro-architectures using different instruction sets
Examples to show how the methodology can be utilized
Improving accuracy of the comparison
Motivation for this modeling approach
Using ISA for Multi-Level Analysis
Processor Microarchitecture
System SoC Design
Embedded and Distributed System
Is it ARM v8 or 9, RISC-V
Or Leon, ARC, Tensilica?
How many clusters?
How many cores per cluster?
What is the system-level cache/bus?
Vector instr or GPU?
ARM or RISC-V?
How many cores/memory?
Requirements for a System Comparison
Common benchmark
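#trial data
import numpy as np

# obs_size and train_data_size are set elsewhere in the benchmark (not shown on this slide)
data=[]
for i in range(obs_size+1):
    indata=[]
    if i%2 ==0:
        # even rows: features drawn from 0..9, last element is the label 0
        for j in range(train_data_size):
            indata.append(np.random.randint(10))
        indata.append(0)
    else:
        # odd rows: features drawn from 0..2, last element is the label 1
        for j in range(train_data_size):
            indata.append(np.random.randint(3))
        indata.append(1)
    data.append(indata)
    print(indata)
costs=[]
learning_rate=0.005
for i in range(5000000):
    z=0
    ri=np.random.randint(len(data))  # pick a random training row
    point=data[ri]
    dz_dw=[]#partial derivatives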
Instruction Set
System SoC
Workflow
Network and Traffic Models
Failure and Cyber Threat Modes
Concurrent Software Execution makes
System Sizing Challenging
Complex behavior
- input stream
- data dependent behavior
Contention
- limited resources
- scheduling/arbitration
Interference of multiple applications
- limited resources
- scheduling/arbitration
- anomalies
I/O
DSP
CPU1
CPU2
task1 task2 task3 task4
Narrowing the Range of Expected
System Operation
System with faster Bus is slower in places
Unpredictable System Response
Task Allocations and Timing Deadline
T3 Expected T3 Complete
Consider three tasks- T1, T2 and T3
All tasks allocated to a single resource
Each Task has processing time (Equal)
and priority (T1>T2>T3)
Design Impacts of buffering,
preemption, offset times and
processing capacity
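The three-task example above can be sketched as a small fixed-priority simulation. This is only an illustration, not the VisualSim model: the task names and the priority order (T1>T2>T3) come from the slide, while the processing time of 2 time units, the arrival offsets, and the `Task` and `simulate` names are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int  # lower number = higher priority (T1 > T2 > T3)
    arrival: int   # assumed offset time, in time units
    work: int      # assumed processing time, in time units


def simulate(tasks, preemptive):
    """Run all tasks on one shared resource; return {name: completion time}."""
    remaining = {t.name: t.work for t in tasks}
    done = {}
    time = 0
    running = None
    while len(done) < len(tasks):
        ready = [t for t in tasks if t.arrival <= time and t.name not in done]
        if not ready:
            time += 1  # resource idles until the next arrival
            continue
        # Without preemption, the running task keeps the resource until it finishes.
        if preemptive or running is None or running.name in done:
            running = min(ready, key=lambda t: t.priority)
        remaining[running.name] -= 1
        time += 1
        if remaining[running.name] == 0:
            done[running.name] = time
    return done


# Equal processing times; the lowest-priority task arrives first (assumed offsets).
tasks = [
    Task("T1", priority=1, arrival=2, work=2),
    Task("T2", priority=2, arrival=1, work=2),
    Task("T3", priority=3, arrival=0, work=2),
]
preemptive_result = simulate(tasks, preemptive=True)
non_preemptive_result = simulate(tasks, preemptive=False)
print("preemptive:    ", preemptive_result)
print("non-preemptive:", non_preemptive_result)
```

With these assumed offsets, T3 completes at time 6 under preemption and at time 2 without it, which shows how preemption and offset times alone can move a completion time away from the expected one.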
Challenges must be studied globally, ideally in the context of power, performance, and reliability
METHODOLOGY
Architecture Exploration of IP/SoC/Systems
Single Model from Microarchitecture, System SoC and Embedded Systems
Track 30 targets per minute
Wifi or 5G device
Process 3 cameras, 4 Lidars & 5 Radars
95% cache hit-ratio
Gateways to ECAN, WiFi, BLE and TSN
Product
Idea
Optimized Architecture
Core
DMA
Message
Network
AI/CNN
Bus/NoC
Select IP: third-party and custom
Assemble Models
Conduct Trade-offs
Architecture Optimization
Functional Analysis
Validation
Simulation Environment
Documentation
Traffic,
Workload
Use-cases
IP Models
SW Performance Measurement
Hardware-in-the-Loop
Integrate SoC into System Model
Optimizing, Diagnosing and Validating
Result (statistics) file
Recommendation file
EXAMPLE
Sample code in C -> Intermediate file
10/22/2021 MIRABILIS DESIGN INC.
aarch64-linux-gnu-gcc -g -Wa,-adhln cal2.c
1 .arch armv8-a
2 .file "cal2.c"
3 .text
4 .Ltext0:
5 .cfi_sections .debug_frame
6 .section .rodata
7 .align 3
8 .LC0:
9 0000 256400 .string "%d"
10 .text
11 .align 2
12 .global main
14 main:
15 .LFB5:
16 .file 1 "cal2.c"
1:cal2.c **** #include<stdio.h>
2:cal2.c **** #include<stdbool.h>
3:cal2.c **** #include <stdlib.h>
4:cal2.c **** int main(){
17 .loc 1 4 0
18 .cfi_startproc
19 0000 FD7BBDA9 stp x29, x30, [sp, -48]!
20 .cfi_def_cfa_offset 48
21 .cfi_offset 29, -48
22 .cfi_offset 30, -40
23 0004 FD030091 add x29, sp, 0
24 .cfi_def_cfa_register 29
5:cal2.c **** int a = 10, b = 20 ;
25 .loc 1 5 0
Generate assembly using gcc: aarch64-linux-gnu-gcc -g -Wa,-adhln cal2.c
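One way to turn such a listing into an instruction profile is to count the mnemonics on the lines that carry an encoded instruction. The sketch below is an assumption about how that step could be scripted, not the tool's actual flow: the `instruction_mix` helper and the regular expression are hypothetical, and the pattern only covers the line shape seen in the listing above (listing line number, 4-digit hex offset, 8-digit hex encoding, mnemonic).

```python
import re
from collections import Counter

# Matches listing lines that carry an encoded instruction, for example
#   "19 0000 FD7BBDA9 stp x29, x30, [sp, -48]!"
# Directives, labels and interleaved C source lines do not match.
INSTR_LINE = re.compile(
    r"^\s*\d+\s+[0-9A-Fa-f]{4}\s+[0-9A-Fa-f]{8}\s+([A-Za-z][\w.]*)"
)


def instruction_mix(listing_text):
    """Count mnemonics in an assembler listing such as the one shown above."""
    counts = Counter()
    for line in listing_text.splitlines():
        match = INSTR_LINE.match(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# A few lines copied from the listing above.
sample = """\
 9 0000 256400 .string "%d"
 14 main:
 19 0000 FD7BBDA9 stp x29, x30, [sp, -48]!
 20 .cfi_def_cfa_offset 48
 23 0004 FD030091 add x29, sp, 0
"""
mix = instruction_mix(sample)
print(mix)
```

Only the `stp` and `add` lines are counted here; the `.string` data line has a 6-digit encoding and is skipped by the pattern.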
Using VisualSim Cycle-Accurate Library
to Create Out-of-Order Custom Processor
Integrate Behavior, Timing, Power and Software
Plots and Statistics
MODELING ACCURACY
Comparing Results for ARM A53 with
Simulated Values
Benchmark                                     FPGA        VisualSim   Difference
Integer processing                            5.94 ms     6.425 ms    7.55%
Most load operations with random addresses    12.084 ms   11.863 ms   1.08%
Most store operations with random addresses   13.984 ms   14.65 ms    4.5%
Test System
Xilinx Zynq® UltraScale+™ XCZU9EG-2FFVB1156E MPSoC on the ZCU102 board
4-core ARM Cortex A53 at 1200 MHz; 32 KiB I-cache; 32 KiB D-cache; 1 MiB L2; 2 GB DDR4-2400 DRAM
Over 93-98% accuracy
Power Comparison between
Simulated vs Measured
Frequency    Simulated Power   Measured Power   Delta percentage
500.0 MHz    0.037 W           0.038 W          2.63%
600.0 MHz    0.053 W           0.051 W          -3.92%
700.0 MHz    0.073 W           0.080 W          8.75%
800.0 MHz    0.097 W           0.090 W          -7.77%
1000.0 MHz   0.157 W           0.159 W          1.25%
1100.0 MHz   0.193 W           0.188 W          -2.65%
1200.0 MHz   0.233 W           0.227 W          -2.64%
1300.0 MHz   0.277 W           0.269 W          -2.97%
Source: Anandtech.com
Over 97% accuracy
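The delta column in the power comparison appears consistent with (measured - simulated) / measured. The short sketch below recomputes it from the simulated and measured values on the slide; the formula is inferred from the numbers rather than stated in the deck, and the `delta_percent` helper is a name chosen for illustration.

```python
# (frequency in MHz, simulated power in W, measured power in W), from the slide.
rows = [
    (500, 0.037, 0.038),
    (600, 0.053, 0.051),
    (700, 0.073, 0.080),
    (800, 0.097, 0.090),
    (1000, 0.157, 0.159),
    (1100, 0.193, 0.188),
    (1200, 0.233, 0.227),
    (1300, 0.277, 0.269),
]


def delta_percent(simulated, measured):
    """Signed difference relative to the measured value, in percent (assumed formula)."""
    return (measured - simulated) / measured * 100.0


deltas = [delta_percent(sim, meas) for _, sim, meas in rows]
for (freq, sim, meas), delta in zip(rows, deltas):
    print(f"{freq:>5} MHz  sim={sim:.3f} W  meas={meas:.3f} W  delta={delta:+.2f}%")

# Mean absolute delta across all frequency points.
mean_abs_delta = sum(abs(d) for d in deltas) / len(deltas)
print(f"mean |delta| = {mean_abs_delta:.2f}%")
```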
COMPARISON BETWEEN ARM CORTEX A53, A77
AND RISC-V U74
C code
Model Configurations
ARM Cortex A53
Speed = 1200.0 MHz
L1 Cache :
◦ I_Cache : 1200 MHz
2 way associative
◦ D_Cache: 1200 MHz
4 way associative
L2 Cache : 1200 MHz
16 way associative
AXI : 600 MHz
DRAM : 1200 MHz
ARM Cortex A77
Speed = 3000.0 MHz
L0 Cache : 3000.0 MHz
: MoP cache
L1 cache :
◦ I_Cache : 3000.0 MHz
4 way associative
◦ D_Cache : 3000.0 MHz
4 way associative
L2 Cache : 3000.0 MHz
8 way associative
DSU Cache : 3000.0 MHz
16 way associative
AXI : 1200 MHz
DRAM : 1466.67 MHz
RISC-V u74
Speed = 1200.0 MHz
L1 Cache :
◦ I_Cache : 1200 MHz
4 way associative
◦ D_Cache: 1200 MHz
4 way associative
TileLink : 1200 MHz
L2 Cache : 1200 MHz
8 way associative
AXI : 600 MHz
DRAM : 1200 MHz
VisualSim Model Layout
Results – Latency for running the same C code
Processor        Instructions   Latency (s)   Max MIPS
ARM Cortex A53   ~5,666,000     0.0055846     ~1039
ARM Cortex A77   ~4,478,000     0.0011795     ~3960
RISC-V u74       ~6,058,000     0.007726      ~797
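As a quick cross-check of these results, the average instruction rate and the speed-up relative to the A53 can be derived from the instruction counts and latencies. This is a sketch under one assumption: the latency values are taken to be in seconds, which the slide does not state. The instruction counts are the slide's approximate values (about 5.67, 4.48 and 6.06 million), and `average_mips` is a helper name introduced here.

```python
# (processor, approximate instruction count, latency) from the results slide.
# Assumption: latency is in seconds.
results = [
    ("ARM Cortex A53", 5_666_000, 0.0055846),
    ("ARM Cortex A77", 4_478_000, 0.0011795),
    ("RISC-V u74", 6_058_000, 0.007726),
]


def average_mips(instructions, latency_s):
    """Average million instructions per second over the whole run."""
    return instructions / latency_s / 1e6


baseline_latency = results[0][2]  # A53 is used as the reference
summary = {}
for name, instructions, latency in results:
    mips = average_mips(instructions, latency)
    speedup = baseline_latency / latency
    summary[name] = (mips, speedup)
    print(f"{name:<15} avg MIPS = {mips:7.1f}   speed-up vs A53 = {speedup:.2f}x")
```

The averages computed this way sit slightly below the Max MIPS column, as expected for a peak figure.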
Results – Cache stats
(Hit ratio in %; mean latency in s)
Processor        MoP Hit   MoP Mean   I1 Hit   I1 Mean    D1 Hit   D1 Mean    L2 Hit   L2 Mean    DSU Hit   DSU Mean
                 Ratio     Latency    Ratio    Latency    Ratio    Latency    Ratio    Latency    Ratio     Latency
ARM Cortex A53   -         -          99.97    1.93E-09   99.98    2.02E-09   18.75    9.33E-08   -         -
ARM Cortex A77   99.90     1.75E-09   67.22    6.25E-08   99.96    7.32E-10   14.19    1.82E-07   6.96      2.05E-09
RISC-V u74       -         -          99.98    4.15E-09   99.98    1.86E-09   39.58    5.25E-08   -         -
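The mean latencies are easier to compare across cores when expressed in core clock cycles. The sketch below converts them using the core speeds from the Model Configurations slide (1200 MHz for the A53 and u74, 3000 MHz for the A77). It assumes the latencies are reported in seconds; `to_cycles` is a helper name introduced here.

```python
# Core clock per processor in MHz, from the Model Configurations slide.
core_mhz = {
    "ARM Cortex A53": 1200.0,
    "ARM Cortex A77": 3000.0,
    "RISC-V u74": 1200.0,
}

# Mean latencies from the cache statistics; assumed to be in seconds.
mean_latency_s = {
    "ARM Cortex A53": {"I1": 1.93e-09, "D1": 2.02e-09, "L2": 9.33e-08},
    "ARM Cortex A77": {"MoP": 1.75e-09, "I1": 6.25e-08, "D1": 7.32e-10,
                       "L2": 1.82e-07, "DSU": 2.05e-09},
    "RISC-V u74": {"I1": 4.15e-09, "D1": 1.86e-09, "L2": 5.25e-08},
}


def to_cycles(latency_s, clock_mhz):
    """Convert a latency in seconds to core clock cycles."""
    return latency_s * clock_mhz * 1e6


cycles = {
    cpu: {level: to_cycles(lat, core_mhz[cpu]) for level, lat in levels.items()}
    for cpu, levels in mean_latency_s.items()
}
for cpu, levels in cycles.items():
    line = ", ".join(f"{level}={value:.1f}" for level, value in levels.items())
    print(f"{cpu:<15} {line} cycles")
```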
INTRODUCTION TO MIRABILIS DESIGN
About Mirabilis Design
Started in 2007 and based in Santa Clara, CA, USA.
Development and support centers in the US, India, Germany, China, Japan, Taiwan and the Czech Republic
Largest source of system modeling IP with embedded timing and power
System architecture exploration of electronics, semiconductors and software
Over 250 products worldwide across Semiconductors, Aerospace, Computing and Automotive
VisualSim- Modeling and simulation software
100+ man-years of experience in system design and exploration of digital electronics
Select the “Right” configuration to match customer request
VisualSim Architect
Traffic, Traces, Script, Analytics
Hardware: Core, DMA, Bus, cache, Memory
Software: RTOS, Task graph, code execution
Network: Protocols, Gateways, Security, channels
Intelligent
Diagnostic
/multi-core
Simulation
Graphical and
Hierarchical
Stochastic & Cycle-Accurate
Comprehensive Architecture Exploration Solution
• Performance, Power and Functional Trade-off
• Large library of parameterized components
• Stochastic, Hybrid and Cycle-Accurate models
• For multiple core, SoC, System and Software
• API for simulators, programs and traces
• Optimizer to detect the best configuration
Largest Systems-Level Model Library
Largest library of traffic, resources, hardware, software and analysis
Traffic
• Distribution
• Sequence
• Trace file
• Instruction profile
Reports
• Timing and Buffer
• Throughput/Util
• Ave/peak power
• Statistics
Power
• State power table
• Power management
• Energy harvesters
• Battery
• RegEx operators
SoC Buses
• AMBA and Corelink
• AHB, APB, AXI, ACE, CHI, CMN600
• Network-on-Chip
• TileLink
System Bus
• PCI/PCI-X/PCIe
• Rapid IO
• AFDX
• OpenVPX
• VME
• SPI 3.0
• 1553B
Processors
• GPU, DSP, µP and µC
• RISC-V
• Nvidia- Drive-PX
• PowerPC
• X86- Intel and AMD
• DSP- TI and ADI
• MIPS, Tensilica, SH
ARM
• M-, R-, 7TDMI
• A8, A53, A55, A72, A76, A77
Custom Creator
• Script language
• 600 RegEx fn
• Task graph
• Tracer
• C/C++/Java
• Python
Support
• Listener and Trace
• Debuggers
• Assertions
Stochastic
• FIFO/LIFO Queue
• Time Queue
• Quantity Queue
• System Resource
• Schedulers
• Cyber Security
RTOS
• Template
• ARINC 653
• AUTOSAR
Memory
• Memory Controller
• DDR DRAM 2, 3, 4, 5
• LPDDR 2, 3, 4
• HBM, HMC
• SDR, QDR, RDRAM
Storage
• Flash & NVMe
• Storage Array
• Disk and SATA
• Fibre Channel
• FireWire
Networking
• Ethernet & GigE
• Audio-Video Bridging
• 802.11 and Bluetooth
• 5G
• Spacewire
• CAN-FD
• TTEthernet
• FlexRay
• TSN & IEEE802.1Q
FPGA
• Xilinx- Zynq, Virtex, Kintex
• Intel-Stratix, Arria
• Microsemi- Smartfusion
• Programmable logic template
• Interface traffic generator
Software
• GEM5
• Software code integration
• Instruction trace
• Statistical software model
• Task graph
Interfaces
• Virtual Channel
• DMA
• Crossbar
• Serial Switch
• Bridge
RTL-like
• Clock, Wire-Delay
• Registers, Latches
• Flip-flop
• ALU and FSM
• Mux, DeMux
• Lookup table
COMPARE PERFORMANCE-POWER OF ARM CORTEX
VS RISC-V FOR AI APPLICATIONS.
Deepak Shankar
Founder
Mirabilis Design Inc.
Email: dshankar@mirabilisdesign.com

More Related Content

What's hot

InfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must KnowInfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must Know
Mellanox Technologies
 
Uvm dac2011 final_color
Uvm dac2011 final_colorUvm dac2011 final_color
Uvm dac2011 final_color
Jamal EL HAITOUT
 
UVM Methodology Tutorial
UVM Methodology TutorialUVM Methodology Tutorial
UVM Methodology Tutorial
Arrow Devices
 
SPI Protocol
SPI ProtocolSPI Protocol
SPI Protocol
Anurag Tomar
 
ARM - Advance RISC Machine
ARM - Advance RISC MachineARM - Advance RISC Machine
ARM - Advance RISC Machine
EdutechLearners
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power
Deepak Shankar
 
RISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a TimeRISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a Time
Atish Patra
 
Introduction to ARM
Introduction to ARMIntroduction to ARM
Introduction to ARM
Puja Pramudya
 
Pcie basic
Pcie basicPcie basic
Pcie basic
Saifuddin Kaijar
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
Linaro
 
Bootloaders
BootloadersBootloaders
Bootloaders
Anil Kumar Pugalia
 
RISC-V Online Tutor
RISC-V Online TutorRISC-V Online Tutor
RISC-V Online Tutor
RISC-V International
 
CPU Verification
CPU VerificationCPU Verification
CPU Verification
Ramdas Mozhikunnath
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
Manish Prajapati
 
Introduction to System verilog
Introduction to System verilog Introduction to System verilog
Introduction to System verilog
Pushpa Yakkala
 
Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...
Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...
Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...
Andrei Kholodnyi
 
Arm processor
Arm processorArm processor
Arm processor
PrashantSingh056
 
AMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptxAMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptx
Sairam Chebrolu
 
PCIe DL_layer_3.0.1 (1)
PCIe DL_layer_3.0.1 (1)PCIe DL_layer_3.0.1 (1)
PCIe DL_layer_3.0.1 (1)
Rakeshkumar Sachdev
 
RISC-V Introduction
RISC-V IntroductionRISC-V Introduction
RISC-V Introduction
RISC-V International
 

What's hot (20)

InfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must KnowInfiniBand Essentials Every HPC Expert Must Know
InfiniBand Essentials Every HPC Expert Must Know
 
Uvm dac2011 final_color
Uvm dac2011 final_colorUvm dac2011 final_color
Uvm dac2011 final_color
 
UVM Methodology Tutorial
UVM Methodology TutorialUVM Methodology Tutorial
UVM Methodology Tutorial
 
SPI Protocol
SPI ProtocolSPI Protocol
SPI Protocol
 
ARM - Advance RISC Machine
ARM - Advance RISC MachineARM - Advance RISC Machine
ARM - Advance RISC Machine
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power
 
RISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a TimeRISC-V Boot Process: One Step at a Time
RISC-V Boot Process: One Step at a Time
 
Introduction to ARM
Introduction to ARMIntroduction to ARM
Introduction to ARM
 
Pcie basic
Pcie basicPcie basic
Pcie basic
 
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to EmbeddedLAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
 
Bootloaders
BootloadersBootloaders
Bootloaders
 
RISC-V Online Tutor
RISC-V Online TutorRISC-V Online Tutor
RISC-V Online Tutor
 
CPU Verification
CPU VerificationCPU Verification
CPU Verification
 
Risc and cisc eugene clewlow
Risc and cisc   eugene clewlowRisc and cisc   eugene clewlow
Risc and cisc eugene clewlow
 
Introduction to System verilog
Introduction to System verilog Introduction to System verilog
Introduction to System verilog
 
Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...
Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...
Mixed-critical adaptive AUTOSAR stack based on VxWorks, Linux, and virtualiza...
 
Arm processor
Arm processorArm processor
Arm processor
 
AMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptxAMBA 5 COHERENT HUB INTERFACE.pptx
AMBA 5 COHERENT HUB INTERFACE.pptx
 
PCIe DL_layer_3.0.1 (1)
PCIe DL_layer_3.0.1 (1)PCIe DL_layer_3.0.1 (1)
PCIe DL_layer_3.0.1 (1)
 
RISC-V Introduction
RISC-V IntroductionRISC-V Introduction
RISC-V Introduction
 

Similar to Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021

Webinar on RISC-V
Webinar on RISC-VWebinar on RISC-V
Webinar on RISC-V
Deepak Shankar
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
Deepak Shankar
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V International
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture exploration
Deepak Shankar
 
Processors selection
Processors selectionProcessors selection
Processors selection
Pradeep Shankhwar
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
Deepak Shankar
 
Exploration of Radars and Software Defined Radios using VisualSim
Exploration of  Radars and Software Defined Radios using VisualSimExploration of  Radars and Software Defined Radios using VisualSim
Exploration of Radars and Software Defined Radios using VisualSim
Deepak Shankar
 
Deview 2013 rise of the wimpy machines - john mao
Deview 2013   rise of the wimpy machines - john maoDeview 2013   rise of the wimpy machines - john mao
Deview 2013 rise of the wimpy machines - john mao
NAVER D2
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training Class
Deepak Shankar
 
Introduction to computer architecture .pptx
Introduction to computer architecture .pptxIntroduction to computer architecture .pptx
Introduction to computer architecture .pptx
Fatma Sayed Ibrahim
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例
Kazuhito Ohkawa
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Lablup Inc.
 
HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
Anand Haridass
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
ScyllaDB
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
inside-BigData.com
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Databricks
 
Demystify OpenPOWER
Demystify OpenPOWERDemystify OpenPOWER
Demystify OpenPOWER
Anand Haridass
 
Explain briefly about the major enhancements in ARM processor archite.pdf
Explain briefly about the major enhancements in ARM processor archite.pdfExplain briefly about the major enhancements in ARM processor archite.pdf
Explain briefly about the major enhancements in ARM processor archite.pdf
arjunenterprises1978
 
Something about SSE and beyond
Something about SSE and beyondSomething about SSE and beyond
Something about SSE and beyond
Lihang Li
 
Andes RISC-V processor solutions
Andes RISC-V processor solutionsAndes RISC-V processor solutions
Andes RISC-V processor solutions
RISC-V International
 

Similar to Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021 (20)

Webinar on RISC-V
Webinar on RISC-VWebinar on RISC-V
Webinar on RISC-V
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture exploration
 
Processors selection
Processors selectionProcessors selection
Processors selection
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
 
Exploration of Radars and Software Defined Radios using VisualSim
Exploration of  Radars and Software Defined Radios using VisualSimExploration of  Radars and Software Defined Radios using VisualSim
Exploration of Radars and Software Defined Radios using VisualSim
 
Deview 2013 rise of the wimpy machines - john mao
Deview 2013   rise of the wimpy machines - john maoDeview 2013   rise of the wimpy machines - john mao
Deview 2013 rise of the wimpy machines - john mao
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training Class
 
Introduction to computer architecture .pptx
Introduction to computer architecture .pptxIntroduction to computer architecture .pptx
Introduction to computer architecture .pptx
 
Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例Kauli SSPにおけるVyOSの導入事例
Kauli SSPにおけるVyOSの導入事例
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
 
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them AllScylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
Scylla Summit 2022: ScyllaDB Rust Driver: One Driver to Rule Them All
 
Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
Using Deep Learning on Apache Spark to Diagnose Thoracic Pathology from Chest...
 
Demystify OpenPOWER
Demystify OpenPOWERDemystify OpenPOWER
Demystify OpenPOWER
 
Explain briefly about the major enhancements in ARM processor archite.pdf
Explain briefly about the major enhancements in ARM processor archite.pdfExplain briefly about the major enhancements in ARM processor archite.pdf
Explain briefly about the major enhancements in ARM processor archite.pdf
 
Something about SSE and beyond
Something about SSE and beyondSomething about SSE and beyond
Something about SSE and beyond
 
Andes RISC-V processor solutions
Andes RISC-V processor solutionsAndes RISC-V processor solutions
Andes RISC-V processor solutions
 

More from Deepak Shankar

How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
Deepak Shankar
 
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Deepak Shankar
 
Modeling Abstraction
Modeling AbstractionModeling Abstraction
Modeling Abstraction
Deepak Shankar
 
Accelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim ArchitectAccelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim Architect
Deepak Shankar
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
Deepak Shankar
 
Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers. Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers.
Deepak Shankar
 
Automotive network and gateway simulation
Automotive network and gateway simulationAutomotive network and gateway simulation
Automotive network and gateway simulation
Deepak Shankar
 
Using ai for optimal time sensitive networking in avionics
Using ai for optimal time sensitive networking in avionicsUsing ai for optimal time sensitive networking in avionics
Using ai for optimal time sensitive networking in avionics
Deepak Shankar
 
Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0
Deepak Shankar
 
Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed system
Deepak Shankar
 
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Deepak Shankar
 
Develop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML applicationDevelop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML application
Deepak Shankar
 
Webinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE networkWebinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE network
Deepak Shankar
 
Webinar on radar
Webinar on radarWebinar on radar
Webinar on radar
Deepak Shankar
 
Webinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based SimulationWebinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Deepak Shankar
 
Using VisualSim Architect for Semiconductor System Analysis
Using VisualSim Architect for Semiconductor System AnalysisUsing VisualSim Architect for Semiconductor System Analysis
Using VisualSim Architect for Semiconductor System Analysis
Deepak Shankar
 
Webinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System AnalysisWebinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System Analysis
Deepak Shankar
 
How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?
Deepak Shankar
 
How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?How to create innovative architecture using VisualSim?
How to create innovative architecture using VisualSim?
Deepak Shankar
 
How to create innovative architecture using ViualSim?
How to create innovative architecture using ViualSim?How to create innovative architecture using ViualSim?
How to create innovative architecture using ViualSim?
Deepak Shankar
 

More from Deepak Shankar (20)

How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
 
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
 
Modeling Abstraction
Modeling AbstractionModeling Abstraction
Modeling Abstraction
 
Accelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim ArchitectAccelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim Architect
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
 
Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers. Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers.
 
Automotive network and gateway simulation
Automotive network and gateway simulationAutomotive network and gateway simulation
Automotive network and gateway simulation
 
Compare Performance-Power of Arm Cortex vs RISC-V for AI Applications (October 2021)

  • 1. COMPARE PERFORMANCE-POWER OF ARM CORTEX VS RISC-V FOR AI APPLICATIONS. Deepak Shankar Founder Mirabilis Design Inc. Email: dshankar@mirabilisdesign.com
  • 2. Agenda
      ◦ Requirements and motivation for comparison
      ◦ Methodology to compare systems/SoCs/microarchitectures using different instruction sets
      ◦ Examples to show how the methodology can be utilized
      ◦ Improving accuracy of the comparison
      ◦ Motivation for this modeling approach
  • 3. Using ISA for Multi-Level Analysis
      ◦ Levels: Processor Microarchitecture; System SoC Design; Embedded and Distributed System
      ◦ Questions: Is it ARM v8 or v9, RISC-V, or Leon, ARC, Tensilica? How many clusters? How many cores per cluster? What is the system-level cache/bus? Vector instructions or GPU? ARM or RISC-V? How many cores/memory?
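  • 4. Requirements for a System Comparison
      ◦ Common benchmark (Python excerpt; the slide cuts off inside the training loop):
          import numpy as np  # needed for np.random below

          # obs_size and train_data_size are defined elsewhere (not shown on the slide)
          # trial data: even rows draw from 0..9 with label 0, odd rows from 0..2 with label 1
          data = []
          for i in range(obs_size + 1):
              indata = []
              if i % 2 == 0:
                  for j in range(train_data_size):
                      indata.append(np.random.randint(10))
                  indata.append(0)
              else:
                  for j in range(train_data_size):
                      indata.append(np.random.randint(3))
                  indata.append(1)
              data.append(indata)
              print(indata)
          costs = []
          learning_rate = 0.005
          for i in range(5000000):
              z = 0
              ri = np.random.randint(len(data))  # pick a random training row
              point = data[ri]
              dz_dw = []  # partial derivatives
      ◦ Instruction Set
      ◦ System SoC
      ◦ Workflow
      ◦ Network and Traffic Models
      ◦ Failure and Cyber Threat Modes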
  • 5. Concurrent Software Execution makes System Sizing Challenging
      ◦ Complex behavior: input stream; data-dependent behavior
      ◦ Contention: limited resources; scheduling/arbitration
      ◦ Interference of multiple applications: limited resources; scheduling/arbitration; anomalies
      ◦ [Figure labels: I/O, DSP, CPU1, CPU2; task1, task2, task3, task4]
  • 6. Narrowing the Range of Expected System Operation System with faster Bus is slower in places Unpredictable System Response
  • 7. Task Allocations and Timing Deadline
      ◦ Consider three tasks: T1, T2 and T3
      ◦ All tasks are allocated to a single resource
      ◦ Each task has a processing time (equal) and a priority (T1 > T2 > T3)
      ◦ Design impacts of buffering, preemption, offset times and processing capacity
      ◦ Challenges must be studied globally, ideally in the context of power, performance, and reliability
      ◦ [Figure labels: T3 Expected, T3 Complete]
  • 8. METHODOLOGY
  • 9. Architecture Exploration of IP/SoC/Systems Single Model from Microarchitecture, System SoC and Embedded Systems Track 30 targets per minute WiFi or 5G device Process 3 cameras, 4 Lidars & 5 Radars 95% cache hit-ratio Gateways to ECAN, WiFi, BLE and TSN Product Idea Optimized Architecture Core DMA Message Network AI/CCN Bus/NoC Select IP (third-party & custom) Assemble Models Conduct Trade-offs Architecture Optimization Functional Analysis Validation Simulation Environment Documentation Traffic, Workload Use-cases IP Models SW Performance Measurement Hardware-in-the-Loop Integrate SoC into System Model
  • 10. Optimizing, Diagnosing and Validating Result(statistic) file Recommendation file
  • 11. EXAMPLE
  • 12. Sample code in C -> Intermediate file
      Generate assembly using gcc: aarch64-linux-gnu-gcc -g -Wa,-adhln cal2.c
         1               .arch armv8-a
         2               .file "cal2.c"
         3               .text
         4               .Ltext0:
         5               .cfi_sections .debug_frame
         6               .section .rodata
         7               .align 3
         8               .LC0:
         9 0000 256400   .string "%d"
        10               .text
        11               .align 2
        12               .global main
        14               main:
        15               .LFB5:
        16               .file 1 "cal2.c"
         1:cal2.c **** #include<stdio.h>
         2:cal2.c **** #include<stdbool.h>
         3:cal2.c **** #include <stdlib.h>
         4:cal2.c **** int main(){
        17               .loc 1 4 0
        18               .cfi_startproc
        19 0000 FD7BBDA9 stp x29, x30, [sp, -48]!
        20               .cfi_def_cfa_offset 48
        21               .cfi_offset 29, -48
        22               .cfi_offset 30, -40
        23 0004 FD030091 add x29, sp, 0
        24               .cfi_def_cfa_register 29
         5:cal2.c **** int a = 10, b = 20 ;
        25               .loc 1 5 0
  • 13. Using VisualSim Cycle-Accurate Library to Create Out-of-Order Custom Processor Integrate Behavior, Timing, Power and Software
  • 15. MODELING ACCURACY
  • 16. Comparing Results for ARM A53 with Simulated Values
      Benchmark                                      FPGA        VisualSim   Difference
      Integer processing                             5.94 ms     6.425 ms    7.55%
      Most load operations with random addresses     12.084 ms   11.863 ms   1.08%
      Most store operations with random addresses    13.984 ms   14.65 ms    4.5%
      Test system: Xilinx Zynq® UltraScale+™ XCZU9EG-2FFVB1156E MPSoC running on the ZCU102 board; 4-core ARM Cortex A53 at 1200 MHz; 32 KiB i-cache; 32 KiB d-cache; 1 MiB L2; 2 GB DDR4-2400 DRAM
      Over 93-98% accuracy
  • 17. Power Comparison between Simulated vs Measured
      Frequency     Simulated Power   Measured Power   Delta percentage
      500.0 MHz     0.037 W           0.038 W           2.63%
      600.0 MHz     0.053 W           0.051 W          -3.92%
      700.0 MHz     0.073 W           0.080 W           8.75%
      800.0 MHz     0.097 W           0.090 W          -7.77%
      1000.0 MHz    0.157 W           0.159 W           1.25%
      1100.0 MHz    0.193 W           0.188 W          -2.65%
      1200.0 MHz    0.233 W           0.227 W          -2.64%
      1300.0 MHz    0.277 W           0.269 W          -2.97%
      Source: Anandtech.com
      Over 97% accuracy
  • 18. COMPARISON BETWEEN ARM CORTEX A53, A77 AND RISC-V U74
  • 20. Model Configurations
      ◦ ARM Cortex A53: Speed = 1200.0 MHz
          L1 I_Cache: 1200 MHz, 2-way associative
          L1 D_Cache: 1200 MHz, 4-way associative
          L2 Cache: 1200 MHz, 16-way associative
          AXI: 600 MHz
          DRAM: 1200 MHz
      ◦ ARM Cortex A77: Speed = 3000.0 MHz
          L0 Cache: 3000.0 MHz, MoP cache
          L1 I_Cache: 3000.0 MHz, 4-way associative
          L1 D_Cache: 3000.0 MHz, 4-way associative
          L2 Cache: 3000.0 MHz, 8-way associative
          DSU Cache: 3000.0 MHz, 16-way associative
          AXI: 1200 MHz
          DRAM: 1466.67 MHz
      ◦ RISC-V U74: Speed = 1200.0 MHz
          L1 I_Cache: 1200 MHz, 4-way associative
          L1 D_Cache: 1200 MHz, 4-way associative
          TileLink: 1200 MHz
          L2 Cache: 1200 MHz, 8-way associative
          AXI: 600 MHz
          DRAM: 1200 MHz
  • 22. Results – Latency for running the same C code
      Processor         Instructions   Latency      Max MIPS
      ARM Cortex A53    ~5,666,000     0.0055846    ~1039
      ARM Cortex A77    ~4,478,000     0.0011795    ~3960
      RISC-V U74        ~6,058,000     0.007726     ~797
  • 23. Results – Cache stats (Hit = hit ratio in %, Lat = mean latency)
      Processor         MoP Hit   MoP Lat    I1 Hit   I1 Lat     D1 Hit   D1 Lat     L2 Hit   L2 Lat     DSU Hit   DSU Lat
      ARM Cortex A53    -         -          99.97    1.93E-09   99.98    2.02E-09   18.75    9.33E-08   -         -
      ARM Cortex A77    99.90     1.75E-09   67.22    6.25E-08   99.96    7.32E-10   14.19    1.82E-07   6.96      2.05E-09
      RISC-V U74        -         -          99.98    4.15E-09   99.98    1.86E-09   39.58    5.25E-08   -         -
  • 25. About Mirabilis Design
      ◦ Started in 2007 and based in Santa Clara, CA, USA
      ◦ Development and support centers in US, India, Germany, China, Japan, Taiwan and Czech Republic
      ◦ Largest source of system modeling IP with embedded timing and power
      ◦ System architecture exploration of electronics, semiconductors and software
      ◦ Over 250 products worldwide across Semiconductors, Aerospace, Computing and Automotive
      ◦ VisualSim: modeling and simulation software
      ◦ 100+ man-years of experience in system design and exploration of digital electronics
      ◦ Select the "Right" configuration to match customer request
  • 26. VisualSim Architect
      ◦ Traffic, Traces, Script, Analytics
      ◦ Hardware: Core, DMA, Bus, Cache, Memory
      ◦ Software: RTOS, Task graph, code execution
      ◦ Network: Protocols, Gateways, Security, channels
      ◦ Intelligent Diagnostic/multi-core Simulation
      ◦ Graphical and Hierarchical
      ◦ Stochastic & Cycle-Accurate
      Comprehensive Architecture Exploration Solution
      ◦ Performance, Power and Functional Trade-off
      ◦ Large library of parameterized components
      ◦ Stochastic, Hybrid and Cycle-Accurate models
      ◦ For multiple core, SoC, System and Software
      ◦ API for simulators, programs and traces
      ◦ Optimizer to detect the best configuration
  • 27. Largest Systems-Level Model Library: largest library of traffic, resources, hardware, software and analysis
      ◦ Traffic: Distribution; Sequence; Trace file; Instruction profile
      ◦ Reports: Timing and Buffer; Throughput/Util; Ave/peak power; Statistics
      ◦ Power: State power table; Power management; Energy harvesters; Battery; RegEx operators
      ◦ SoC Buses: AMBA and CoreLink; AHB, APB, AXI, ACE, CHI, CMN600; Network-on-Chip; TileLink
      ◦ System Bus: PCI/PCI-X/PCIe; Rapid IO; AFDX; OpenVPX; VME; SPI 3.0; 1553B
      ◦ Processors: GPU, DSP, µP and µC; RISC-V; Nvidia Drive-PX; PowerPC; x86 (Intel and AMD); DSP (TI and ADI); MIPS, Tensilica, SH
      ◦ ARM: M-, R-, 7TDMI; A8, A53, A55, A72, A76, A77
      ◦ Custom Creator: Script language; 600 RegEx fn; Task graph; Tracer; C/C++/Java; Python
      ◦ Support: Listener and Trace; Debuggers; Assertions
      ◦ Stochastic: FIFO/LIFO Queue; Time Queue; Quantity Queue; System Resource; Schedulers; Cyber Security
      ◦ RTOS: Template; ARINC 653; AUTOSAR
      ◦ Memory: Memory Controller; DDR DRAM 2, 3, 4, 5; LPDDR 2, 3, 4; HBM, HMC; SDR, QDR, RDRAM
      ◦ Storage: Flash & NVMe; Storage Array; Disk and SATA; Fibre Channel; FireWire
      ◦ Networking: Ethernet & GigE; Audio-Video Bridging; 802.11 and Bluetooth; 5G; SpaceWire; CAN-FD; TTEthernet; FlexRay; TSN & IEEE 802.1Q
      ◦ FPGA: Xilinx (Zynq, Virtex, Kintex); Intel (Stratix, Arria); Microsemi (SmartFusion); Programmable logic template; Interface traffic generator
      ◦ Software: GEM5; Software code integration; Instruction trace; Statistical software model; Task graph
      ◦ Interfaces: Virtual Channel; DMA; Crossbar; Serial Switch; Bridge
      ◦ RTL-like: Clock, Wire-Delay; Registers, Latches; Flip-flop; ALU and FSM; Mux, DeMux; Lookup table
  • 28. COMPARE PERFORMANCE-POWER OF ARM CORTEX VS RISC-V FOR AI APPLICATIONS. Deepak Shankar Founder Mirabilis Design Inc. Email: dshankar@mirabilisdesign.com
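Slide 12 shows an assembler listing produced with gcc's -Wa,-adhln option. As a rough illustration of how such a listing could be turned into a static instruction profile, the Python sketch below counts mnemonics on the lines that carry an offset and a 4-byte encoding. The function name instruction_profile and the regular expression are assumptions made for this example; the deck does not describe how VisualSim itself consumes the listing, and a static count is not the same as a dynamic execution trace.

```python
import re
from collections import Counter

# A few lines copied from the slide-12 listing (gcc -Wa,-adhln output).
LISTING = """\
   9 0000 256400   .string "%d"
  12               .global main
  14               main:
  19 0000 FD7BBDA9 stp x29, x30, [sp, -48]!
  20               .cfi_def_cfa_offset 48
  23 0004 FD030091 add x29, sp, 0
   5:cal2.c **** int a = 10, b = 20 ;
"""

# Listing line number, 4-hex-digit offset, 8-hex-digit encoding, then the mnemonic.
# Directives (leading '.'), labels and interleaved C source lines do not match.
INSTR_RE = re.compile(r"^\s*\d+\s+[0-9A-Fa-f]{4}\s+[0-9A-Fa-f]{8}\s+([a-z][\w.]*)")


def instruction_profile(listing: str) -> Counter:
    """Count AArch64 mnemonics in an assembler listing (static count only)."""
    counts = Counter()
    for line in listing.splitlines():
        match = INSTR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


profile = instruction_profile(LISTING)
print(dict(profile))
```

The .string line is skipped because its encoding is only three bytes long, so only stp and add are counted in this sample.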
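The "Delta percentage" column on slide 17 appears to follow (measured - simulated) / measured × 100; recomputing it this way matches the listed values to within 0.01 percentage points. The Python sketch below reproduces the column under that assumption. The helper name delta_pct and the summary statistics are illustrative additions, not part of the deck.

```python
# (frequency in MHz, simulated power in W, measured power in W), from slide 17
ROWS = [
    (500.0, 0.037, 0.038),
    (600.0, 0.053, 0.051),
    (700.0, 0.073, 0.080),
    (800.0, 0.097, 0.090),
    (1000.0, 0.157, 0.159),
    (1100.0, 0.193, 0.188),
    (1200.0, 0.233, 0.227),
    (1300.0, 0.277, 0.269),
]


def delta_pct(simulated_w: float, measured_w: float) -> float:
    """Signed difference relative to the measured value, in percent (assumed formula)."""
    return (measured_w - simulated_w) / measured_w * 100.0


deltas = {freq: delta_pct(sim, meas) for freq, sim, meas in ROWS}
for freq, delta in deltas.items():
    print(f"{freq:7.1f} MHz  delta = {delta:+.2f}%")

# Summary figures over all rows (absolute values, so signs do not cancel).
mean_abs_delta = sum(abs(d) for d in deltas.values()) / len(deltas)
worst_freq = max(deltas, key=lambda f: abs(deltas[f]))
print(f"mean |delta| = {mean_abs_delta:.2f}%, largest at {worst_freq:.0f} MHz")
```

Using the absolute value for the summary keeps over- and under-estimates from cancelling each other out.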
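The results on slide 22 can be cross-checked by dividing the instruction count by the latency. The Python sketch below does this under the assumption that the latency column is in seconds (the slide gives no unit) and that the instruction counts use lakh-style digit grouping, so "56,66,000" is 5,666,000. The resulting average rates come out a few percent below the "Max MIPS" column, which would be consistent with that column reporting a peak rather than an average. The names RESULTS and average_mips are illustrative.

```python
# processor -> (approximate instruction count, latency in seconds [assumed unit], Max MIPS from slide 22)
RESULTS = {
    "ARM Cortex A53": (5_666_000, 0.0055846, 1039),
    "ARM Cortex A77": (4_478_000, 0.0011795, 3960),
    "RISC-V U74": (6_058_000, 0.007726, 797),
}


def average_mips(instructions: int, latency_s: float) -> float:
    """Average millions of instructions per second over the whole run."""
    return instructions / latency_s / 1e6


avg = {name: average_mips(n, t) for name, (n, t, _) in RESULTS.items()}

# Speedup of each core relative to the Cortex A53 run (ratio of latencies).
baseline_latency = RESULTS["ARM Cortex A53"][1]
speedup = {name: baseline_latency / t for name, (_, t, _) in RESULTS.items()}

for name, (_, _, max_mips) in RESULTS.items():
    print(f"{name:15s} avg = {avg[name]:7.1f} MIPS  (max listed ~{max_mips})  "
          f"speedup vs A53 = {speedup[name]:.2f}x")
```

Note that the A77 model runs at 3000 MHz while the other two run at 1200 MHz, so the latency ratio reflects both clock speed and microarchitecture.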