Introduction to AMD Versal SOC-FPGA
Versal adaptive SoCs deliver application- and system-level value for cloud, network, and edge applications.
The 7 nm architecture combines heterogeneous compute engines with a breadth of hardened memory and interfacing technologies for superior performance per watt.
The system-on-chip (SoC) combines CPUs, DSPs, I/O, and RAM control with programmable hardware logic.
It is built around an integrated shell composed of a programmable network on chip (NoC), which enables seamless memory-mapped access to the full height and width of the device.
Current Approach to Architecting FPGA
Math algorithm modeling
◦ Conducts Functional or math simulation to study precision and fidelity of new algorithms
Requirements database
◦ Requirements modeling list and static test changes
List the delays in a spreadsheet and add them up
◦ Average or worst case without concurrent activity
Emulation/Boards
◦ Run benchmarks and capture the latency and memory throughput
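The spreadsheet approach above can be sketched in a few lines. This is only an illustration: the stage names and delay values are hypothetical and are not taken from the slides.

```python
# Hypothetical per-stage delays in microseconds: (average, worst case).
# These numbers are placeholders for illustration only.
STAGE_DELAYS_US = {
    "sensor_input": (2.0, 3.0),
    "preprocess": (5.0, 9.0),
    "memory_access": (1.5, 6.0),
    "compute": (12.0, 15.0),
}


def spreadsheet_estimate(delays):
    """Add up average and worst-case delays, ignoring concurrent activity."""
    average = sum(avg for avg, _ in delays.values())
    worst = sum(worst for _, worst in delays.values())
    return average, worst


average_us, worst_us = spreadsheet_estimate(STAGE_DELAYS_US)
print(f"average = {average_us} us, worst case = {worst_us} us")
```

A plain sum like this has no notion of contention for a shared bus or memory, which is the gap a time-based simulation is meant to cover.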
FPGA Designer Requirements
1. High-level system architecture mapping. System architects (MATLAB) to evaluate the advantage
provided by Xilinx Versal heterogeneous architecture
• Persona: System architect
• High-level application capture w/ models of key application building blocks
• Fast application exploration to heterogeneous HW targets (PS, PL, AIE, PL+AIE)
2. Algorithm Trade-Offs: Initial architecture application mapping decisions to choose the right fabric for
optimal performance for each algorithm
• Persona: PL/RTL, AIE, & PS designers
• Make trade studies of how different implementations and mappings change system performance
• Identify potential bottlenecks
Introduction to VisualSim System-Level IP
Architecture Platform for AMD Versal FPGA
VisualSim is system-level modeling and simulation SW
Platform for rapid Trade-off
◦ Performance/Power/Area during planning
◦ Study speed, power, failure and bottlenecks
Optimize implementation, resource and timing constraints of
algorithm tasks
New Versal FPGA IP is a Stochastic model containing
◦ Heterogeneous compute resources
◦ DDR and HBM memory interfaces
◦ Statistics for latency, throughput, utilization and power
◦ User expandable resource usage table
◦ External Interfaces
◦ Task block with mapping function
◦ Traffic generator for workloads
System-Level Application Exploration Tool
[Figure: flow diagram with blocks App Model, Resource Model, App resource target, Simulate/test, and Identify congestion?]
What is it?
• Stochastic models focused on early application mapping
exploration to heterogeneous SoC compute resources.
• Users are system architects responsible for high-level
complex system mapping, not design entry
• Rapid trade-off and design iteration prior to algorithm
and design entry
Existing AMD-Xilinx tools
Where does it fit?
• Extends the AMD-Xilinx general toolset with application analysis focused on the pre-design-entry phase
• Lower-fidelity, stochastic model vs. full design entry simulation tools
• Provides guidance to down-stream sub-system (AIE, PL,
PS) design entry development teams
Persona and what each generates:
• System Architect: subsystem requirements
• Developer: PL: RTL/HLS; AIE: C/C++; PS: C/C++/Python
• System Eng: BootFW, Gen configs
• SW Eng: Runtime SW, OTA life-cycle
Tool flow:
• App Exploration Analysis Tool (VisualSim)
• Design Entry (Vivado & Vitis): RTL, C/C++
• Design Generation (Vivado & Vitis): *.bit, *.pdi, *.elf
• On-Target Runtime Deployment (Linux, libs, ??
App Exploration Tool – Elements
Resource models
• AIE tile & subsystem
• NoC backbone, AXI interconnects and Direct Path
• PL function “task” models
• DDR memory controller & devices
• Arm CPU models
User app functions & stimulus
• Target persona: System architect
• Traffic pattern generators
• Task/compute behavior models
• Task & data description via XML semantic language including SysML
• C-code for CPU & GPU like targets including Tarmac, Gem5 trace
• Stochastic and cycle-accurate function models for FPGA
[Figure: the tool flow (App Exploration Analysis Tool; Design Entry; Design Generation; On-Target Runtime Deployment) beside a resource model block diagram with six AIE tiles, six Interconnect Models, a NoC Backbone, three PL Custom Models, DDR Memory, PS Subsystem, and Interfaces]
Architecture Trade-off using Versal FPGA for Image
Processing Algorithm
Mapping Algorithm to Versal SoC-FPGA
Implementing an image processing algorithm on an AMD-Xilinx Versal FPGA.
Each task is mapped to a resource.
Standard
Library
Component
Basic/Starting Configuration
Grayscale_Conversion - PS [A72 Core 1]
IIR – Logic (PL)
FFT – AI Engine Tile
Edge_Image - Logic (PL)
iFFT – AI Engine Tile
Edge_Image_Enhancement – Logic (PL)
Segmentation – PS [A72 Core 2]
Image
Processing
Algorithm
Algorithm Task Table
This table defines the amount of resources each task would consume on each resource type (PS, PL,
AI Tiles) if it were mapped there. Each task is then mapped to the resource of choice
from the behavioural flow.
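As a rough illustration, the basic/starting configuration can be written as a task-to-resource table. The dictionary layout and the helper names `tasks_by_resource` and `remap` are hypothetical and are not VisualSim's own format or API; only the task and resource names come from the slide.

```python
from collections import defaultdict

# Basic/starting configuration from the slide: task -> resource.
BASE_MAPPING = {
    "Grayscale_Conversion": "PS [A72 Core 1]",
    "IIR": "Logic (PL)",
    "FFT": "AI Engine Tile",
    "Edge_Image": "Logic (PL)",
    "iFFT": "AI Engine Tile",
    "Edge_Image_Enhancement": "Logic (PL)",
    "Segmentation": "PS [A72 Core 2]",
}


def tasks_by_resource(mapping):
    """Group task names by the resource they are mapped to."""
    grouped = defaultdict(list)
    for task, resource in mapping.items():
        grouped[resource].append(task)
    return dict(grouped)


def remap(mapping, task, resource):
    """Return a new mapping with one task moved; the input is left unchanged."""
    if task not in mapping:
        raise KeyError(f"unknown task: {task}")
    updated = dict(mapping)
    updated[task] = resource
    return updated


for resource, tasks in tasks_by_resource(BASE_MAPPING).items():
    print(f"{resource}: {', '.join(tasks)}")
```

Keeping the mapping as data means a trade study is a one-line change, for example `remap(BASE_MAPPING, "Segmentation", "AI Engine Tile")`.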
Requirements and AI-based Tracking
All the requirements (Latency, throughput, power, utilization etc.) can be listed in this csv file.
At the end of simulation, a report which says whether the requirements were met is generated.
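A minimal sketch of such a requirements check follows. The CSV column names, the NoC utilization limit, and the measured values are all hypothetical, since the slides do not show the actual file layout; only the 80 msec application latency limit appears in the deck.

```python
import csv
import io
import operator

# Hypothetical CSV layout; the real requirements file format is not shown.
REQUIREMENTS_CSV = """name,metric,op,limit
App latency,app_latency_ms,<,80
NoC utilization,noc_utilization_pct,<=,70
"""

# Comparison operators allowed in the "op" column.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def check_requirements(csv_text, measured):
    """Return (name, measured value, passed) for each requirement row."""
    report = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = measured[row["metric"]]
        passed = OPS[row["op"]](value, float(row["limit"]))
        report.append((row["name"], value, passed))
    return report


# Hypothetical end-of-simulation statistics.
measured = {"app_latency_ms": 62.0, "noc_utilization_pct": 85.0}
for name, value, passed in check_requirements(REQUIREMENTS_CSV, measured):
    print(f"{name}: {value} -> {'met' if passed else 'NOT met'}")
```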
Run 1 – Base Configuration
Application latency increases over time.
The increase in latency is due to Segmentation.
Remap the Segmentation task to AI Engine tiles.
Run 2 – Segmentation Mapped to AI Engine
Application latency stays within a bounded range.
NoC utilization is high.
To reduce utilization, change the interconnect for Segmentation from NoC to Direct Path.
Run 3 – Using Direct Path between PS and AI
Latency is deterministic.
Latency requirement (App latency < 80 msec) is met.
Utilization across the NoC is acceptable.
VisualSim Algorithm Mapping Methodology using
Radar Application
Radar Signal Processing Application
Explore Analog and Digital Algorithms
Multi-Domain Simulator with pre-built library provides significant precision and accuracy
Mapping to DSP on FPGA
Behavior Graph and Mapping File
VisualSim Architect
The Dispatcher sends each task to the target hardware module for processing and handles transitions
Map individual functions to
resources in Mapping Table
Simulate Base Model (Clk = 600 MHz)
The latency requirements for both ST (Static Target) and MT (Moving Target)
estimation are not met
Parameter Regression on Multi-Core
Different parameter combinations based on the
configured ranges are generated and simulated
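One way to generate such combinations is a Cartesian product over the configured ranges. In this sketch only the 600 and 1000 MHz clock values appear in the slides; the other parameter names and values are hypothetical.

```python
from itertools import product

# Configured ranges. Only the 600 and 1000 MHz clock points come from the
# slides; every other name and value here is a hypothetical placeholder.
PARAMETER_RANGES = {
    "clock_mhz": [600, 800, 1000],
    "num_cores": [1, 2, 4],
    "cache_kb": [32, 64],
}


def generate_runs(ranges):
    """Return one parameter dictionary per combination of values."""
    names = list(ranges)
    value_lists = [ranges[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


runs = generate_runs(PARAMETER_RANGES)
print(len(runs), "runs; first:", runs[0])
```

Each dictionary would then be handed to one simulation run; the number of runs grows multiplicatively with every added parameter.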
AI-Based Study using Requirements
• Run number 19, with the clock frequency at 1000 MHz, satisfied the performance requirements we had set.
• Because the frequency was increased from 600 MHz, total power consumption went up when running the system at 1000 MHz.
• Architects can evaluate different processing resources (DSP vs. Xeon cores vs. Power cores) if they have stringent power thresholds.
Requirements being evaluated for each simulation
run in the parameter sweep
Overall Results – We can identify the simulation runs which
meet the requirements and select the right configuration
after considering cost vs performance trade-offs
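The selection step can be sketched as a filter followed by a cost criterion. The result rows, the latency limit, and the choice of lowest power as the tie-breaker are hypothetical assumptions, not data or policy from the slides.

```python
# Hypothetical sweep results; these values are not from the slides.
RESULTS = [
    {"run": 1, "clock_mhz": 600, "latency_ms": 1.9, "power_w": 3.1},
    {"run": 2, "clock_mhz": 800, "latency_ms": 1.3, "power_w": 4.0},
    {"run": 3, "clock_mhz": 1000, "latency_ms": 0.9, "power_w": 5.2},
    {"run": 4, "clock_mhz": 1000, "latency_ms": 0.8, "power_w": 6.0},
]


def select_configuration(results, max_latency_ms):
    """Keep runs that meet the latency limit, then pick the lowest power."""
    passing = [r for r in results if r["latency_ms"] <= max_latency_ms]
    if not passing:
        return None
    return min(passing, key=lambda r: r["power_w"])


best = select_configuration(RESULTS, max_latency_ms=1.0)
print("selected run:", best["run"] if best else "none")
```

Power stands in for cost here; any other cost metric could be substituted in the `key` function.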
Generating Documentation - Interactive
Failure Analysis
Hardware Failure
Loss of processing cores, limited storage, reduced or lost memory devices, bus overload, or incorrect signals
Software failure
Resource starvation, deadlocks, data overwrite
Network failure
Network congestion, misconfiguration, link loss, and network errors
RTOS failure
Inability to meet real-time deadlines, malicious changes to the schedule table, and execution beyond time slots
Power Failure
Both reduced and full power failure: slower processing speed and a limited number of resources that can execute concurrently
Functional Unit Testing
Test Software with
direct integration
or in FPGA
Software Design and Optimization
Source code -> GCC compiler (target arch.), compile and disassembly -> Objdump disassembly -> Trace in VisualSim-usable format
Select processor core -> Obtain pipeline structure from official documentation -> Create the list of parameters and their possible values -> Map parameters -> Stats
Reconfigure parameter map to improve performance
Update source code to improve performance
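The disassembly-to-trace step might look like the sketch below. It assumes text in the style that `objdump -d` prints for an object file built with `gcc -c`; the sample listing is hand-written for illustration. The trace format VisualSim expects is not described in the slides, so a plain list of mnemonics stands in for it.

```python
from collections import Counter

# Hand-written sample in the style of `objdump -d` output (x86-64, AT&T
# syntax). Each instruction line is: address, tab, raw bytes, tab, text.
SAMPLE_DISASSEMBLY = (
    "0000000000000000 <add>:\n"
    "   0:\t55                   \tpush   %rbp\n"
    "   1:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "   4:\t89 7d fc             \tmov    %edi,-0x4(%rbp)\n"
    "   7:\t8b 45 fc             \tmov    -0x4(%rbp),%eax\n"
    "   a:\t5d                   \tpop    %rbp\n"
    "   b:\tc3                   \tret\n"
)


def instruction_trace(disassembly):
    """Extract the mnemonic of each instruction line, in program order."""
    trace = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Skip symbol headers, blank lines, and byte-only continuation lines.
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        parts = fields[2].split()
        if parts:
            trace.append(parts[0])
    return trace


trace = instruction_trace(SAMPLE_DISASSEMBLY)
print(trace)
print(Counter(trace).most_common())
```

The instruction mix from `Counter` is the kind of summary that could feed an instruction profile; operands and addresses are dropped in this simplified version.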
VisualSim Interconnect Optimization for the
Network-on-Chip
Interconnect Architecture Exploration
• Analyze SoC NoC and memory sub-system architectures
◦ Coherent vs. non-coherent sub-system connectivity
◦ I/O coherency bandwidth allocation
◦ QoS: control, configuration, and data intensive
• Analyze SoC end-to-end flow control, credits, queueing, and arbitration mechanisms
◦ Analyze scheduling and distribution of tasks throughout the compute pipeline
• Analyze the importance of different flow control mechanisms
◦ e.g., credit allocation schemes, token bucket mechanisms, and rate limit configurations
• Analyze SW-HW interfaces and communication
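As an example of one of the flow control mechanisms named above, here is a minimal token bucket sketch. It is a generic textbook form driven by explicit timestamps, not VisualSim's implementation; the rate, capacity, and arrival times are hypothetical.

```python
class TokenBucket:
    """Token-bucket rate limiter driven by explicit timestamps in seconds."""

    def __init__(self, rate_per_s, capacity):
        self.rate_per_s = rate_per_s  # tokens added per second
        self.capacity = capacity      # maximum burst size in tokens
        self.tokens = capacity        # the bucket starts full
        self.last_time = 0.0

    def allow(self, now, cost=1.0):
        """Refill for the elapsed time, then admit the request if possible."""
        elapsed = now - self.last_time
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last_time = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Hypothetical settings: 2 tokens/s sustained rate, burst of 3.
bucket = TokenBucket(rate_per_s=2.0, capacity=3.0)
arrivals = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0]
decisions = [bucket.allow(t) for t in arrivals]
print(decisions)
```

The capacity sets how large a burst is admitted at once, while the rate sets the long-run throughput; both are the kind of parameters a rate limit study would sweep.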
End-to-End Latency - Time taken for the return trip
1. Cross point delay
2. Buffering at cross point and slave
3. Transfer and control delay at cross point, slave and
cache coherent domains
4. Memory read or write delay
5. Wire delay
Network Latency – Latency across cross point
Throughput– Memory and PL-AIE bandwidth
Multi-Level NoC- Vertical vs Horizontal
Analysis Scenarios
Parameter | Scenario 1 | Scenario 2
Optimal network configuration (packets take only one or two hops to reach the destination) | Yes | Non-optimal network configuration (non-optimal placement of nodes)
Router frequency in MHz (frequency at which the XP operates) | 2500 | 2500
Flit_Size in bytes (max packet size allowed on the network; an incoming packet larger than Flit_Size is fragmented) | 256 | 1024
X-dimension | 8 | 8
Y-dimension | 8 | 8
Packet size | 256 | 1024
Analysis shows the HBM throughput is 40 GBps because of the optimal network configuration and high router frequency
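The fragmentation rule for Flit_Size can be checked with a short calculation. This is a sketch of the stated rule only; the function name is hypothetical, and the last call uses a hypothetical mismatch that is not one of the two scenarios.

```python
def flits_per_packet(packet_bytes, flit_bytes):
    """Number of fragments needed when a packet exceeds the flit size."""
    if packet_bytes <= 0 or flit_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: round up to a whole number of flits.
    return (packet_bytes + flit_bytes - 1) // flit_bytes


# Scenario 1 and Scenario 2 from the table: packet size equals Flit_Size.
print(flits_per_packet(256, 256))
print(flits_per_packet(1024, 1024))
# Hypothetical mismatch: a 1024-byte packet on a 256-byte flit network.
print(flits_per_packet(1024, 256))
```

In both listed scenarios the packet size matches Flit_Size, so each packet travels as a single flit.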
VisualSim Power Optimization
Power Evaluation (Case 1)
Cumulative
Power
Average Power
Spikes indicate the number of devices being turned “on” and “off”.
Power States
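A state-based power estimate of this kind can be sketched as follows. The state names, durations, and power values are hypothetical; the slides show plots only, not the underlying power table.

```python
# Hypothetical power-state timeline: (state, duration in ms, power in mW).
TIMELINE = [
    ("Active", 4.0, 500.0),
    ("Standby", 10.0, 50.0),
    ("Active", 2.0, 500.0),
    ("Off", 4.0, 0.0),
]


def energy_and_average_power(timeline):
    """Return (cumulative energy in uJ, average power in mW)."""
    # mW multiplied by ms gives microjoules.
    energy_uj = sum(duration_ms * power_mw for _, duration_ms, power_mw in timeline)
    total_ms = sum(duration_ms for _, duration_ms, _ in timeline)
    return energy_uj, energy_uj / total_ms


energy_uj, average_mw = energy_and_average_power(TIMELINE)
print(f"energy = {energy_uj} uJ, average power = {average_mw} mW")
```

Each transition between states shows up as a step in instantaneous power, which is what appears as a spike when devices switch on and off.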
VisualSim AI Engine Network Optimization
Mapping to AIE Tile System PE – 12x14
Results for 5*4 AIE Tiles Configuration
Spikes in the processing element latency plot correspond to off-chip memory accesses
Results for 14*12 AIE Tiles Configuration
Power consumption is significantly higher: more PEs are active and there are more memory accesses
Results – with DDR4 vs DDR3
Compared to DDR3, the average latency per PE is reduced and more MACs have completed processing
Integrating FPGA in an End-User Application
Architecting Hardware-Software for
Infotainment System
[Figure: infotainment block diagram with DRAM, Display IO, AMBA AXI Bus, CPU, GPU, Display Ctrl, PCIe, Video Camera, SRAM, and Packet]
System Overview
◦ Camera: 30 fps, VGA equivalent
◦ CPU: multi-core ARM Cortex-A53, 1.2 GHz
◦ GPU: 64 cores (8 warps × 8 PEs), 32 threads, 1 GHz
◦ DisplayCtrl: display buffer of 293,888 bytes
◦ SRAM: SDR, 64 MB, 1.0 GHz
◦ DRAM: DDR3, 64 MB, 2.4 GHz
Explore at the board- and semiconductor-level to size uP/GPU, memory bandwidth and bus/switch configuration
Develop an integrated Infotainment Processor
• Size GPU, AXI bus and memory controller
• Target is a high-end automotive infotainment system
• Ensure the sequence of flows from the Video Camera to the Display Controller is correct
• Determine the maximum throughput that can be processed with no overflows
VisualSim Model of Infotainment System
[Figure: VisualSim model of the infotainment system. Labels: NXP i.MX6 / nVIDIA Drive PX; Xilinx FPGA Kintex 8; Discrete; DMA; ARM A53; GPU; Display Ctrl; SRAM3; DRAM3; Video IN; Parameters; Video OUT]
Conducting Architecture Trade-off
• By changing the amount of video input data (packet number), observe the SRAM -> DRAM transfer
performance and examine the upper limit performance of the video input that the system can tolerate.
[Plot labels: 210 Packet/Sec, 12 ms, 21 Packet/Sec, 41.4 us, 300 Packet/Sec]
• 250 Packet/Sec is the system limit
• With 300 Packet/Sec, simulation cannot be
executed due to FIFO buffer overflow.
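The kind of sweep described above can be imitated with a small FIFO model. This is a simplified sketch, not the VisualSim model: the drain rate of 250 packets/sec is an assumption chosen to match the reported limit, and the FIFO depth and packet count are hypothetical.

```python
from fractions import Fraction


def fifo_overflows(arrival_rate, drain_rate, depth, num_packets):
    """Return True if a FIFO of the given depth overflows.

    Packets arrive at a fixed rate and are drained one at a time at a fixed
    rate. Rates are whole packets per second; Fraction keeps timing exact.
    """
    interarrival = Fraction(1, arrival_rate)
    service = Fraction(1, drain_rate)
    departures = []  # departure times of packets still held in the FIFO
    last_departure = Fraction(0)
    for i in range(num_packets):
        now = i * interarrival
        departures = [d for d in departures if d > now]
        if len(departures) >= depth:
            return True  # no room for the arriving packet
        start = max(now, last_departure)
        last_departure = start + service
        departures.append(last_departure)
    return False


# Assumed drain rate and hypothetical depth; input rates as in the study.
for rate in (21, 210, 250, 300):
    status = "overflow" if fifo_overflows(rate, 250, depth=4, num_packets=100) else "ok"
    print(rate, "Packet/Sec:", status)
```

Once the input rate exceeds the drain rate the backlog grows without bound, so any finite buffer eventually overflows; a deeper FIFO only delays that point.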
Using FPGA Model in Communication System
End-to-End System Simulation
End-to-end simulation
Antenna to Ethernet
Protocol, Baseband,
Analog, RF,
Antenna and Channel
Rx vs Tx pwr
Link margin
Component selection
Baseband
Microwave
Antenna
Implementation with Domain Tools
Integrating RF and Analog Flow
RF
Baseband
Antenna Modeling
MIRABILIS DESIGN LEADS THE SYSTEM DESIGN INNOVATION
About Mirabilis Design
Software Company based in Silicon Valley
Integrates Model-based Systems Engineering with the electronics development flow
Development and Support Centers
USA, India, China, South Korea, Japan and Europe
VisualSim Architect - Modeling and Simulation Software
Graphical modeling, multi-domain simulator, system-level IP, analysis tools and open API
Digital Enablement of the Electronics Product Development Front-End
Market Segments
Semiconductors, Automotive, and Aerospace and Defense
Design Enablement
Architecture trade-offs, system validation, early functional testing and communication
Networking
System Design Solution and Platform
VisualSim Architect: graphical and hierarchical modeling
System-Level IP: parameterized components that cover hardware, software and networking
Multi-domain Simulator: Digital, FSM, Untimed & Continuous
MBSE: linking requirements with multi-core regression with AI
Cloud and Desktop Version available
Key Innovations
• Parameterized library components for hardware to
create any vendor variation
• Real-time plotting
• Single-event calendar that can communicate with
both VisualSim and external simulators
• Behavior to architecture mapping
• Support for all design and analysis from Concept to
implementation flow for electronics
VisualSim System Level IP
Custom Creator: Script language, 600 RegEx, Task graph, Use cases, Programming languages
Algorithm: Control, analog, DSP, communication, audio, imaging
Power: Table, Energy harvesters, Battery
Traffic: Distribution, Sequence, Trace file, Instruction profile
Reports: Latency, Throughput, Utilization, Ave/peak power, Statistics
RTL-Like: Clock, Wire-Delay, Registers, Latches and Flip-flop, ALU and FSM, Mux, DeMux, Lookup table
RTOS: Generic RTOS, ARINC 653, AUTOSAR
SOC: AMBA (AHB/APB/AXI), Corelink, CoreConnect, Network-on-Chip, Virtual Channel, DMA, Crossbar, Serial Switch, Bridge
Board-Level: VME, PCI/PCI-X/PCIe, SPI 3.0, Rapid IO, 1553B, FlexRay, CAN-FD, AFDX, TTEthernet, OpenVPX
Processors: ARM (M-Series), ARM (A8, A72, A53, A76), RISC-V, Nvidia Drive-PX, Configurable GPU, DSP, µP and µC, PowerPC, X86 (Intel and AMD), DSP (TI and ADI), Others: MIPS, Tensilica, Renesas SH, Marvell
Stochastic: Queue, Time Queue, Quantity Queue, System Resources, Scheduling algorithms
Storage: Flash, NVMe, Disk, Memory Controller, MPMC, Fibre Channel, Fire Wire
Networking: Switched Ethernet, Resilient Packet Ring, RP3, Wireless LAN 802.11, Bluetooth and PAN, Spacewire, Audio-Video Bridging, IEEE802.1Q
Memory: Memory Controller, SDR, DDR DRAM 2,3,4, LPDDR 2, 3, 4, HBM, HMC, QDR, RDRAM
FPGA: Xilinx (Zynq, Virtex, Kintex), Intel (Stratix, Arria), Microsemi (Smartfusion), Programmable logic generator, External links to I/O, Network and Memory
Largest Library of System Modeling IP Components
VisualSim Integrated System Design Flow
[Figure labels: MBSE Concept; Failure & Security; Functional Unit Testing; Embedded Systems; FPGA/ASIC; Mission and Vehicles; RF/Analog/Antenna; Hardware/Software Flow; To Implementation (Schematics, HDL, Embedded C/C++/Java, Emulators, test equipment, FPGA Boards); Document Generation; External Users; Government; Systems Integrator; Protocols & DSP/Imaging; 3rd Party Provided]
Integrating What-if’s to Functional Testing
1. Algorithmic Optimization (Fidelity & Precision)
2. Architecture Exploration (Speed, Power & Area)
3. Specialized Testing and Demo
4. Communication & Sharing
5. Functional Testing
[Figure labels: Systems engineering; Marketing; VisualSim Architect]
VisualSim drives Efficiency & Productivity
Model Creation (6)
Implementation (18)
Using Current Design Methodology
Project Schedule
Implementation (12)
Using VisualSim Design Methodology
Time savings based on a 24-month project is 20-40%
Note: All times in months
Communication and Refinement (4)
Analysis (2.5)
Model Creation (0.5)
Analysis (1.5)
Communication and Refinement (6)
Advantageous over generic modeling environment due to less time & greater applicability across the organization
Mirabilis_Design AMD Versal System-Level IP Library

More Related Content

Similar to Mirabilis_Design AMD Versal System-Level IP Library

Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemDeepak Shankar
 
Accelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim ArchitectAccelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim ArchitectDeepak Shankar
 
Varun Gatne - Resume - Final
Varun Gatne - Resume - FinalVarun Gatne - Resume - Final
Varun Gatne - Resume - FinalVarun Gatne
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V International
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceOdinot Stanislas
 
Using VisualSim Architect for Semiconductor System Analysis
Using VisualSim Architect for Semiconductor System AnalysisUsing VisualSim Architect for Semiconductor System Analysis
Using VisualSim Architect for Semiconductor System AnalysisDeepak Shankar
 
Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0Deepak Shankar
 
Hari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna
 
Develop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML applicationDevelop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML applicationDeepak Shankar
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Ontico
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
Webinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE networkWebinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE networkDeepak Shankar
 

Similar to Mirabilis_Design AMD Versal System-Level IP Library (20)

Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed system
 
Accelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim ArchitectAccelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim Architect
 
Varun Gatne - Resume - Final
Varun Gatne - Resume - FinalVarun Gatne - Resume - Final
Varun Gatne - Resume - Final
 
3.9-Software.pptx
3.9-Software.pptx3.9-Software.pptx
3.9-Software.pptx
 
Webinar on RISC-V
Webinar on RISC-VWebinar on RISC-V
Webinar on RISC-V
 
Webinar on radar
Webinar on radarWebinar on radar
Webinar on radar
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
 
Using a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application PerformanceUsing a Field Programmable Gate Array to Accelerate Application Performance
Using a Field Programmable Gate Array to Accelerate Application Performance
 
Using VisualSim Architect for Semiconductor System Analysis
Using VisualSim Architect for Semiconductor System AnalysisUsing VisualSim Architect for Semiconductor System Analysis
Using VisualSim Architect for Semiconductor System Analysis
 
Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0
 
Choosing the right processor
Choosing the right processorChoosing the right processor
Choosing the right processor
 
Modeling Abstraction
Modeling AbstractionModeling Abstraction
Modeling Abstraction
 
Hari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna Vetsa Resume
Hari Krishna Vetsa Resume
 
Develop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML applicationDevelop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML application
 
Resume
ResumeResume
Resume
 
Shubhankar pawade resume
Shubhankar pawade resumeShubhankar pawade resume
Shubhankar pawade resume
 
Introduction to EDA Tools
Introduction to EDA ToolsIntroduction to EDA Tools
Introduction to EDA Tools
 
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
Dataplane networking acceleration with OpenDataplane / Максим Уваров (Linaro)
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
 
Webinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE networkWebinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE network
 

More from Deepak Shankar

How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? Deepak Shankar
 
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...Deepak Shankar
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Deepak Shankar
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSDeepak Shankar
 
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Deepak Shankar
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsDeepak Shankar
 
Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers. Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers. Deepak Shankar
 
Automotive network and gateway simulation
Automotive network and gateway simulationAutomotive network and gateway simulation
Automotive network and gateway simulationDeepak Shankar
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture explorationDeepak Shankar
 
Using ai for optimal time sensitive networking in avionics
Using ai for optimal time sensitive networking in avionicsUsing ai for optimal time sensitive networking in avionics
Using ai for optimal time sensitive networking in avionicsDeepak Shankar
 
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...Deepak Shankar
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training ClassDeepak Shankar
 
Webinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based SimulationWebinar: Detecting Deadlocks in Electronic Systems using Time-based Simulation
Webinar: Detecting Deadlocks in Electronic Systems using Time-based SimulationDeepak Shankar
 
Webinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System AnalysisWebinar on Functional Safety Analysis using Model-based System Analysis
Webinar on Functional Safety Analysis using Model-based System AnalysisDeepak Shankar
 
Is accurate system-level power measurement challenging? Check this out!
Is accurate system-level power measurement challenging? Check this out!Is accurate system-level power measurement challenging? Check this out!
Is accurate system-level power measurement challenging? Check this out!Deepak Shankar
 
Architectural tricks to maximize memory bandwidth
Architectural tricks to maximize memory bandwidthArchitectural tricks to maximize memory bandwidth
Architectural tricks to maximize memory bandwidthDeepak Shankar
 
Mirabilis design Inc - Brochure
Mirabilis design Inc - BrochureMirabilis design Inc - Brochure
Mirabilis design Inc - BrochureDeepak Shankar
 

More from Deepak Shankar (17)

How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
 
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
 
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
 
Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers. Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers.
 
Automotive network and gateway simulation
Automotive network and gateway simulationAutomotive network and gateway simulation
Mirabilis_Design AMD Versal System-Level IP Library

  • 1.
  • 2. Introduction to AMD Versal SOC-FPGA Versal adaptive SoCs deliver unparalleled application- and system-level value for cloud, network, and edge applications. The disruptive 7 nm architecture combines heterogeneous compute engines with a breadth of hardened memory and interfacing technologies for superior performance/watt. System-on-chip (SoC) combines CPUs, DSPs, I/O and RAM control along with programmable hardware logic. Built around an integrated shell composed of a programmable network on chip (NoC), which enables seamless memory-mapped access to the full height and width of the device.
  • 3. Current Approach to Architecting FPGA Math algorithm modeling ◦ Conducts Functional or math simulation to study precision and fidelity of new algorithms Requirements database ◦ Requirements modeling list and static test changes List the delays in spreadsheet and add them up ◦ Average or worst case without concurrent activity Emulation/Boards ◦ Run benchmarks and capture the latency and memory throughput
  • 4. FPGA Designer Requirements 1. High-level system architecture mapping. System architects (MATLAB) to evaluate the advantage provided by Xilinx Versal heterogeneous architecture • Persona: System architect • High-level application capture w/ models of key application building blocks • Fast application exploration to heterogeneous HW targets (PS, PL, AIE, PL+AIE) 2. Algorithm Trade-Offs: Initial architecture application mapping decisions to choose the right fabric for optimal performance for each algorithm • Persona: PL/RTL, AIE, & PS designers • Make trade studies of how different implementations and mappings change system performance • Identify potential bottlenecks
  • 5. Introduction to VisualSim System-Level- IP Architecture Platform for AMD Versal FPGA VisualSim is system-level modeling and simulation SW Platform for rapid Trade-off ◦ Performance/Power/Area during planning ◦ Study speed, power, failure and bottlenecks Optimize implementation, resource and timing constraints of algorithm tasks New Versal FPGA IP is a Stochastic model containing ◦ Heterogeneous compute resources ◦ DDR and HBM memory interfaces ◦ Statistics for latency, throughput, utilization and power ◦ User expandable resource usage table ◦ External Interfaces ◦ Task block with mapping function ◦ Traffic generator for workloads
  • 6. System-Level Application Exploration Tool App Model Resource Model App resource target Simulate/test Identify congestion ? What is it? • Stochastic models focused on early application mapping exploration to heterogeneous SoC compute resources. • Users are system architects responsible for high-level complex system mapping, not design entry • Rapid trade-off and design iteration prior to algorithm and design entry Existing AMD-Xilinx tools Where does it fit? • Extends the AMD-Xilinx general toolset with application analysis focused on the pre-design-entry stage • Lower-fidelity, stochastic-based model vs. full design entry simulation tools • Provides guidance to downstream sub-system (AIE, PL, PS) design entry development teams Persona Generates System Architect Subsystem requirements Developer PL: RTL/HLS AIE: C/C++ PS: C/C++/Python System Eng BootFW Gen configs SW Eng Runtime SW OTA life-cycle App Exploration Analysis Tool Design Entry RTL, C/C++ Design Generation *.bit, *.pdi, *.elf On-Target Runtime Deployment (Linux, libs, ?? VisualSim App Exploration Analysis Tool Design Entry (Vivado & Vitis) RTL, C/C++ Design Generation (Vivado & Vitis) *.bit, *.pdi, *.elf On-Target Runtime Deployment (Linux, libs, ??
  • 7. App Exploration Tool – Elements Resource models • AIE tile & subsystem • NOC backbone, AXI interconnects and Direct Path • PL function “task” models • DDR memory controller & devices • Arm CPU models User app functions & stimulus • Target persona: System architect • Traffic pattern generators • Task/compute behavior models • Task & data description via XML semantic language including SysML • C-code for CPU & GPU like targets including Tarmac, Gem5 trace • Stochastic and cycle-accurate function models for FPGA App Exploration Analysis Tool Design Entry (Vivado & Vitis) RTL, C/C++ Design Generation (Vivado & Vitis) *.bit, *.pdi, *.elf On-Target Runtime Deployment (Linux, libs, AIE AIE AIE AIE AIE AIE Interconnect Model Interconnect Model Interconnect Model Interconnect Model Interconnect Model Interconnect Model NOC Backbone PL Custom Models PL Custom Models PL Custom Models DDR Memory PS Subsystem Interfaces
  • 8. Architecture Trade-off using Versal FPGA for Image Processing Algorithm
  • 9. Mapping Algorithm to Versal SoC-FPGA Implementing an Image Processing algorithm on AMD-Xilinx Versal FPGA. Each task is mapped to a resource. Standard Library Component Basic/Starting Configuration: Grayscale_Conversion - PS [A72 Core 1]; IIR - Logic (PL); FFT - AI Engine Tile; Edge_Image - Logic (PL); iFFT - AI Engine Tile; Edge_Image_Enhancement - Logic (PL); Segmentation - PS [A72 Core 2] Image Processing Algorithm
  • 10. Algorithm Task Table This table defines the amount of resources each task would consume on each resource type (PS, PL, AI Tiles) if it were mapped there. Each task is mapped to the resource of choice from the behavioural flow.
  • 11. Requirements and AI-based Tracking All the requirements (Latency, throughput, power, utilization etc.) can be listed in this csv file. At the end of simulation, a report which says whether the requirements were met is generated.
  • 12. Run 1 – Base Configuration Application latency increases over time. The increase in latency is due to Segmentation. Remap the Segmentation task to AI Engine Tiles
  • 13. Run 2 – Segmentation Mapped to AI Engine Application latency is in a bounded range. NoC Utilization is high. To reduce utilization, changed interconnect for Segmentation from NoC to Direct
  • 14. Run 3 – Using Direct Path between PS and AI Latency is deterministic. Latency requirement (App latency < 80 msec) is met. Utilization across NoC is acceptable
  • 15. VisualSim Algorithm Mapping Methodology using Radar Application
  • 16. Radar Signal Processing Application
  • 17. Explore Analog and Digital Algorithms Multi-Domain Simulator with pre-built library provides significant precision and accuracy
  • 18. Mapping to DSP on FPGA
  • 19. Behavior Graph and Mapping File VisualSim Architect Dispatcher sends it to the target hardware module for processing and Handle Transitions Map individual functions to resources in Mapping Table
  • 20. Simulate Base Model (Clk = 600 MHz) The Requirements – latency for both ST (Static Target) and MT (Moving Target) estimation is not met
  • 21. Parameter Regression on Multi-Core Different parameter combinations based on the configured ranges are generated and simulated
  • 22. AI-Based Study using Requirements • Run number 19 – clock frequency at 1000 MHz satisfied the performance requirements we had set. • Since the frequency was increased from 600 MHz, the total power consumption went up while running the system at 1000 MHz • Architect can evaluate different processing resources – DSP vs Xeon cores vs Power cores if they have stringent power thresholds Requirements being evaluated for each simulation run in the parameter sweep Overall Results – We can identify the simulation runs which meet the requirements and select the right configuration after considering cost vs performance trade-offs
  • 24. Failure Analysis Hardware failure: loss of processing cores, limited storage, reduced or lost memory device, or bus overload/incorrect signals. Software failure: resource starvation, deadlocks, data overwrite. Network failure: network congestion, misconfiguration, link loss and network errors. RTOS failure: unable to achieve real-time deadlines, malicious change in schedule table, and execution beyond time slots. Power failure: both reduced and full power failure; slower processing speed, and only a limited number of resources can execute concurrently.
  • 25. Functional Unit Testing Test Software with direct integration or in FPGA
  • 26. Software Design and Optimization GCC Compiler – Target arch. Compile and disassembly Source code Objdump – Disassembly Trace in VisualSim usable format Select Processor core Obtain Pipeline structure from official documentation Create the list of parameters and their possible values Map parameters Stats Reconfigure parameter map to improve performance Update Source code to improve performance
  • 27. VisualSim Interconnect Optimization for the Network-on-Chip
  • 28. Interconnect Architecture Exploration • Analyze SoC NoC and memory sub-system architectures • Coherent vs. non-coherent sub-system connectivity • IO coherency BW allocation • QoS: control, configuration and data intensive • Analyze SoC end-to-end flow control, credits, queueing and arbitration mechanisms • Analyze scheduling and distribution of tasks throughout the compute pipeline • Analyze the importance of different flow control mechanisms • e.g., credit allocation schemes, token bucket mechanisms and rate limit configurations • Analyze SW-HW interfaces and communication End-to-End Latency - Time taken for the return trip 1. Cross point delay 2. Buffering at cross point and slave 3. Transfer and control delay at cross point, slave and cache coherent domains 4. Memory read or write delay 5. Wire delay Network Latency - Latency across cross point Throughput - Memory and PL-AIE bandwidth
  • 29. Multi-Level NoC - Vertical vs Horizontal
  • 30. Analysis Scenarios Scenarios: 1, 2. Optimal network configuration (packets only have to take one or two hops to reach the destination): Yes. Non-optimal network configuration (non-optimal placement of nodes). Router frequency in MHz (frequency at which the XP operates): 2500, 2500. Flit_Size in bytes (maximum packet size allowed on the network; an incoming packet larger than Flit_Size is fragmented): 256, 1024. X-dimension: 8, 8. Y-dimension: 8, 8. Packet size: 256, 1024. Analysis shows the HBM throughput is 40 GBps because of the optimal network configuration and high frequency
  • 32. Power Evaluation (Case 1) Cumulative Power Average Power Spikes indicate devices being turned "on" and "off".
  • 34. VisualSim AI Engine Network Optimization
  • 35. Mapping to AIE Tile System PE – 12x14
  • 36. Results for 5*4 AIE Tiles Configuration Spikes in the Processing Element latency plot – Offchip memory accesses
  • 37. Results for 14*12 AIE Tiles Configuration Power consumption is significantly higher: more PEs are active, with more memory accesses
  • 38. Results – with DDR4 vs DDR3 Compared to DDR3, the average latency per PE has reduced and more MACs have completed processing
  • 39. Integrating FPGA in an End-User Application
  • 40. Architecting Hardware-Software for Infotainment System DRAM Display IO AMBA AXI Bus CPU GPU Display Ctrl PCIe Video Camera SRAM Packet System Overview ◦ Camera : 30fps, VGA corresponds ◦ CPU : Multi-core ARM Cortex-A53 1.2GHz ◦ GPU : 64Cores(8Warps×8PEs), 32Threads, 1GHz ◦ DisplayCtrl : DisplayBuffer 293,888Byte ◦ SRAM: SDR, 64MB, 1.0GHz ◦ DRAM : DDR3, 64MB, 2.4GHz Explore at the board- and semiconductor-level to size uP/GPU, memory bandwidth and bus/switch configuration Develop an integrated Infotainment Processor • Size GPU, AXI bus and memory controller • Target is a high-end Automotive infotainment • Ensure sequence of flows from Video Camera to Display Controller is correct • Determine the maximum throughput that can be processed with no overflows
  • 41. VisualSim Model of Infotainment System NXP i.MX6 / nVIDIA Drive PX Xilinx FPGA Kintex 8 Discrete DMA ARM A53 GPU Display Ctrl SRAM3 DRAM3 Video IN Parameters Video OUT
  • 42. Conducting Architecture Trade-off • By changing the amount of video input data (packet number), observe the SRAM -> DRAM transfer performance and examine the upper limit performance of the video input that the system can tolerate. 210Packet/Sec 12ms 21Packet/Sec 41.4us 300Packet/Sec • 250 Packet/Sec is the system limit • With 300 Packet/Sec, simulation cannot be executed due to FIFO buffer overflow.
  • 43. Using FPGA Model in Communication System
  • 44. End-to-End System Simulation End-to-end simulation Antenna to Ethernet Protocol, Baseband, Analog, RF, Antenna and Channel Rx vs Tx pwr Link margin Component selection Baseband Microwave Antenna Implementation with Domain Tools
  • 45. Integrating RF and Analog Flow RF Baseband
  • 47. MIRABILIS DESIGN LEADS THE SYSTEM DESIGN INNOVATION About Mirabilis Design
  • 48. About Mirabilis Design Software Company based in Silicon Valley Integrates Model-based Systems Engineering with the electronics development flow Development and Support Centers USA, India, China, South Korea, Japan and Europe VisualSim Architect - Modeling and Simulation Software Graphical modeling, multi-domain simulator, system-level IP, analysis tools and open API Digital Enablement of the Electronics Product Development Front-End Market Segments Semiconductors, Automotive, and Aerospace and Defense Design Enablement Architecture trade-offs, system validation, early functional testing and communication Networking
  • 49. System Design Solution and Platform VisualSim Architect • Graphical and Hierarchical Modeling System-Level IP • Parameterized components that cover hardware, software and networking Multi-domain Simulator Digital, FSM Untimed & Continuous MBSE Linking Requirements with multi-core Regression with AI Cloud and Desktop Version available Key Innovations • Parameterized library components for hardware to create any vendor variation • Real-time plotting • Single-event calendar that can communicate with both VisualSim and external simulators • Behavior to architecture mapping • Support for all design and analysis from Concept to implementation flow for electronics
  • 50. VisualSim System Level IP Custom Creator Algorithm Power Control, analog, DSP, communication, audio imaging Table, Energy harvesters, Battery Distribution, Sequence, Trace file, Instruction profile Traffic Reports Latency, Throughput, Utilization, Ave/peak power, Statistics RTL-Like RTOS Clock, Wire-Delay, Registers, Latches and Flip-flop, ALU and FSM, Mux, DeMux, Lookup table Generic RTOS, ARINC 653, AUTOSAR AMBA (AHB/ APB/ AXI), Corelink, CoreConnect, Network-on-Chip, Virtual Channel, DMA, Crossbar, Serial Switch, Bridge SOC Board- Level VME, PCI/PCI-X/PCIe, SPI 3.0, Rapid IO, 1553B, FlexRay, CAN- FD, AFDX, TTEthernet, OpenVPX Processors ARM (M-Series), ARM (A8, A72, A53, A76), RISC-V, Nvidia- Drive-PX, Configurable GPU, DSP, mP and mC, PowerPC, X86- Intel and AMD, DSP- TI and ADI, Others: MIPS, Tensilica, Renesas SH, Marvell Stochastic Queue, Time Queue, Quantity Queue, System Resources, Scheduling algorithms Script language, 600 RegEx, Task graph, Use cases, Programming languages Storage Flash, NVMe, Disk Memory Controller, MPMC, Fibre Channel, Fire Wire Switched Ethernet, Resilient Packet Ring, RP3, Wireless LAN 802.11, Bluetooth and PAN, Spacewire, Audio-Video Bridging, IEEE802.1Q Networking Memory • Memory Controller, SDR, DDR DRAM 2,3,4, LPDDR 2, 3, 4, HBM, HMC, QDR, RDRAM FPGA Xilinx- Zynq, Virtex, Kintex, Intel-Stratix, Arria, Microsemi- Smartfusion, Programmable logic generator, External links to I/O, Network and Memory Largest Library of System Modeling IP Components
  • 51. VisualSim Integrated System Design Flow MBSE Concept Failure & Security Functional Unit Testing Embedded Systems FPGA/ ASIC Mission and Vehicles RF/Analog/ Antenna Hardware/ Software Flow To Implementation (Schematics, HDL, Embedded C/C++/Java Emulators, test equipment, FPGA Boards) Document Generation External Users Government Systems Integrator Protocols & DSP/Imaging 3rd Party Provided 4. Communication & Sharing 5. Functional Testing 1. Algorithmic Optimization (Fidelity & Precision) Systems engineering Marketing VisualSim Architect VisualSim Architect Integrating What-if’s to Functional Testing 2. Architecture Exploration (Speed, Power & Area) 3. Specialized Testing and Demo
  • 52. VisualSim drives Efficiency & Productivity Model Creation (6) Implementation (18) Using Current Design Methodology Project Schedule Implementation (12) Using VisualSim Design Methodology Time savings based on 24 month project is 20-40% Note: All times in months Communication and Refinement (4) Analysis (2.5) Model Creation (0.5) Analysis (1.5) Communication and Refinement (6) Advantageous over generic modeling environment due to less time & greater applicability across the organization
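
Slides 11 and 21-22 describe listing requirements in a CSV file, sweeping parameter combinations, and reporting which simulation runs meet the requirements. The Python sketch below illustrates that loop only in outline. It is not VisualSim's API: the CSV column names, the limits, and the `toy_model` formulas are all assumptions invented for illustration, standing in for a real simulation run.

```python
import csv
import io
from itertools import product

# Hypothetical requirements file; column names and limits are assumptions.
REQUIREMENTS_CSV = """metric,limit
latency_ms,80
power_w,20
"""


def load_requirements(text):
    """Parse 'metric,limit' rows into a dict of upper bounds."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["metric"]: float(row["limit"]) for row in reader}


def toy_model(clock_mhz, cores):
    """Stand-in for one simulation run (invented formulas, not a real model).

    Latency falls and power rises as clock frequency and core count grow.
    """
    latency_ms = 96000.0 / (clock_mhz * cores)
    power_w = 0.005 * clock_mhz * cores
    return {"latency_ms": latency_ms, "power_w": power_w}


def sweep(requirements, clocks, core_counts):
    """Run every parameter combination and flag whether all limits are met."""
    results = []
    for run, (clock, cores) in enumerate(product(clocks, core_counts), start=1):
        metrics = toy_model(clock, cores)
        met = all(metrics[name] <= limit for name, limit in requirements.items())
        results.append(
            {"run": run, "clock_mhz": clock, "cores": cores, **metrics, "met": met}
        )
    return results


if __name__ == "__main__":
    reqs = load_requirements(REQUIREMENTS_CSV)
    for row in sweep(reqs, clocks=[600, 800, 1000], core_counts=[1, 2]):
        print(row)
```

Keeping the pass/fail check separate from the model means the same requirements file can be reused across every run of the sweep, which mirrors the per-run requirement evaluation the slides describe.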

Editor's Notes

  1. Instant Power represents the instantaneous power consumption of the devices (listed in the power table) at every clock cycle. Average Power represents the average power of each device across its states (Standby, Active, Wait, Idle and Retention).
  2. Here the maximum network latency is 3.5 × 10⁻⁹ s (3.5 ns) and the maximum end-to-end latency is 1.6 × 10⁻⁷ s (160 ns). The analysis shows that an optimal network configuration and a high frequency result in better latency.
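
Note 1 distinguishes instantaneous power from average power across device states. As a rough illustration, the Python sketch below computes both from a per-device state trace. The power values, the trace format, and the function names are assumptions made for this example; they are not taken from the deck or from VisualSim.

```python
# Hypothetical power table in watts per state; values are illustrative only.
POWER_TABLE_W = {
    "Standby": 0.05,
    "Active": 1.0,
    "Wait": 0.3,
    "Idle": 0.1,
    "Retention": 0.02,
}


def average_power(trace, power_table=POWER_TABLE_W):
    """Time-weighted average power over a list of (state, duration_s) entries."""
    total_time = sum(duration for _, duration in trace)
    if total_time <= 0:
        raise ValueError("trace must cover a positive duration")
    energy_j = sum(power_table[state] * duration for state, duration in trace)
    return energy_j / total_time


def instant_power(trace, t, power_table=POWER_TABLE_W):
    """Power at time t: the power of whichever state the device is in at t."""
    elapsed = 0.0
    for state, duration in trace:
        elapsed += duration
        if t < elapsed:
            return power_table[state]
    raise ValueError("t is beyond the end of the trace")


if __name__ == "__main__":
    # Device is Active for 2 s, then Idle for 2 s.
    trace = [("Active", 2.0), ("Idle", 2.0)]
    print(instant_power(trace, 1.0))
    print(average_power(trace))
```

The instantaneous value jumps whenever the device changes state, which is the kind of spike the power plots show, while the average smooths those transitions over the whole trace.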