Evaluating UCIe-based multi-die SoC to meet timing and power
Logistics of the Webinar
All attendees will be placed on mute
To ask a question, click the Cloud Chat icon and type your question. Folks are standing by to answer your questions.
There will also be a time at the end for Q&A
Agenda
Overview of UCIe™ (Universal Chiplet Interconnect Express™)
Introduction to system modeling with UCIe and other intellectual property (IP)
Assembling system models using the UCIe protocol
Examples of SoC architectures using UCIe
Use Case
Mirabilis Design and VisualSim Architect
UCIe Background Information
Background on die-to-die Interconnect
•Packing a large number of functions running at different clock rates onto a monolithic die does not scale
•Solution: Integrate multiple dies into a single package – Chiplets
•Chiplet Challenge:
• Die-to-die communication is very slow and consumes too much power
• No single standard available to handle the routing, signalling and multiple clock domains
• Cache coherency across dies
• Support for multiple protocols
•Exploration:
• Need a mechanism to predict the expected latency and power consumption
• Test feasibility of different configurations and assign compute resources on individual dies
• Study the impact of failures or extreme latency
• Explore different scheduling and Quality-of-Service algorithms
Universal Chiplet Interconnect Express (UCIe) is the Future
•Customizable, package-level integration of chiplets
• Combines best-in-class die-to-die interconnect and protocol connections from an interoperable, multi-vendor ecosystem
• Open industry-standard interconnect
•Offers high-bandwidth, low-latency, power-efficient, and cost-effective on-package connectivity
•Implement compute in an advanced process node to deliver power-efficient performance at a higher cost, while reusing memory and I/O controllers from earlier designs in an established (n-1 or n-2) process node
•Future designs will connect AI engines on different dies and will require deterministic latency
•An optimal design requires accurate assignment of resource pooling, resource sharing and message passing
•Theoretical UCIe bandwidth is 4x that of PCIe 6.0 (in the Tbps range)
•Actual bandwidth depends on the burst data available and on the sizes of the Tx and replay buffers
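Why buffer size matters can be made concrete with a bandwidth-delay bound. The sketch below assumes a transmitter may keep at most one replay buffer's worth of unacknowledged data in flight, so sustained throughput cannot exceed the buffer size divided by the ACK round-trip time. The function name and all numbers are hypothetical, not taken from the UCIe specification.

```python
def effective_bandwidth_gbps(link_rate_gbps: float,
                             replay_buffer_bytes: int,
                             round_trip_ns: float,
                             offered_load_gbps: float) -> float:
    """Upper bound on sustained link throughput in Gb/s (hypothetical model).

    Three limits apply and the smallest one wins:
      * the raw link rate,
      * the replay-buffer limit: at most `replay_buffer_bytes` of
        unacknowledged data can be in flight per ACK round trip,
      * the offered load, i.e. how much burst data the sender actually has.
    """
    # bytes * 8 = bits; bits per ns is numerically equal to Gb/s
    buffer_limit_gbps = replay_buffer_bytes * 8 / round_trip_ns
    return min(link_rate_gbps, buffer_limit_gbps, offered_load_gbps)


# Hypothetical numbers: 2048 Gb/s link, 4 KB replay buffer, 50 ns ACK round trip
print(effective_bandwidth_gbps(2048, 4096, 50, 1000))
```

With these assumed values the link is capped at about 655 Gb/s by the replay buffer, well below its raw rate; enlarging the buffer (or shortening the round trip) moves the bound back toward the link rate.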
UCIe Specification at a Glance
How Does UCIe Work?
Multiple layers separate out the interconnect tasks
The Physical layer is responsible for electrical signaling, clocking, link training and the sideband
The Die-to-Die adapter provides link state management and parameter negotiation for the chiplets. It optionally guarantees reliable delivery of data through CRC and a link-level retry mechanism.
◦ When multiple protocols are supported, it defines the underlying arbitration mechanism.
The FLIT (flow control unit) defines the underlying transfer mechanism when the adapter is responsible for reliable transfer
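A minimal sketch of the CRC-plus-retry idea follows. It uses zlib's CRC-32 purely as a stand-in (the UCIe adapter specifies its own CRC), and models the channel as flipping one bit on hypothetical (sequence, attempt) pairs. The receiver ACKs a FLIT whose CRC matches and NAKs otherwise; the transmitter then replays the copy it kept in its replay buffer. All names here are illustrative.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Flit:
    seq: int
    payload: bytes
    crc: int


def make_flit(seq: int, payload: bytes) -> Flit:
    # CRC-32 is a stand-in only; it is not the CRC defined by UCIe.
    return Flit(seq, payload, zlib.crc32(payload))


def send_with_retry(payloads, corrupt_attempts, max_retries=3):
    """Deliver non-empty payloads in order using CRC check + link-level retry.

    corrupt_attempts: set of (seq, attempt) pairs on which the channel
    flips one bit of the payload. Returns (delivered, total_transmissions).
    """
    delivered = []
    transmissions = 0
    for seq, payload in enumerate(payloads):
        flit = make_flit(seq, payload)  # held in the replay buffer until ACKed
        for attempt in range(max_retries + 1):
            transmissions += 1
            data = flit.payload
            if (seq, attempt) in corrupt_attempts:
                data = bytes([data[0] ^ 0x01]) + data[1:]  # single-bit error
            if zlib.crc32(data) == flit.crc:
                delivered.append(data)  # receiver: CRC good -> ACK
                break
            # receiver: CRC bad -> NAK; loop replays the buffered FLIT
        else:
            raise RuntimeError(f"FLIT {seq} failed after {max_retries} retries")
    return delivered, transmissions


# FLIT 1 is corrupted on its first two attempts and succeeds on the third
print(send_with_retry([b"abc", b"def", b"ghi"], {(1, 0), (1, 1)}))
```

Each retry costs a full FLIT time, which is one reason error rates show up directly in latency statistics.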
UCIe Packaging
Two package types: Standard and Advanced
◦ Standard has 16 lanes
◦ Advanced has 64 lanes
To increase bandwidth, multiple modules are supported
For 2 modules
◦ Standard has 32 lanes
◦ Advanced has 128 lanes
◦ Each module sends different bytes of the data
The number of lanes increases with the module count
Multiple PHY logic blocks provide greater data transfer with better scheduling
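The lane counts above translate directly into raw bandwidth. Assuming one bit per lane per transfer and ignoring FLIT, CRC and sideband overhead, a sketch for comparing configurations is shown below (the function and table names are hypothetical):

```python
# Lanes per module, from the package types listed above
LANES_PER_MODULE = {"standard": 16, "advanced": 64}


def raw_bandwidth_gbps(package: str, modules: int, rate_gt_s: float) -> float:
    """Idealized unidirectional bandwidth in Gb/s.

    Assumes one bit per lane per transfer; protocol overhead is ignored.
    """
    lanes = LANES_PER_MODULE[package] * modules
    return lanes * rate_gt_s


for pkg, mods, rate in [("standard", 1, 4), ("advanced", 1, 32), ("advanced", 2, 32)]:
    print(pkg, mods, rate, raw_bandwidth_gbps(pkg, mods, rate))
```

An Advanced single module at 32 GT/s gives 64 × 32 = 2048 Gb/s per direction; in this idealized model, adding a second module doubles both the lanes and the bandwidth.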
Transition to UCIe
[Figure: Typical SoC (monolithic approach) vs. Next Gen SoC (chiplets in a modular approach), showing CXL 2.0, PCIe Gen 6 interface and Streaming connections]
Commonality with PCIe 6.0
•UCIe protocol emulates PCIe for chiplets
•UCIe transfers packets in FLITs
• PCIe 6.0 uses a fixed FLIT size of 256 bytes
• UCIe FLIT size varies with the sender and receiver protocol
•Credit-based flow control mechanism
•Packets use ACK or NAK to confirm good reception
• Selective and Standard ACK options
•Advanced port status and error checking
• CRC checksum
•Bandwidth depends on the number of lanes
• Standard vs Advanced package
• Multi-Module option
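Credit-based flow control can be sketched as follows. The sketch assumes one credit per free receiver buffer slot and instantaneous credit return; a real link returns credits with some delay, which this model ignores. The class name is hypothetical.

```python
from collections import deque


class CreditLink:
    """Credit-based flow control: the sender transmits only while it holds credits."""

    def __init__(self, rx_buffer_slots: int):
        self.credits = rx_buffer_slots  # one credit per free receiver slot
        self.rx_buffer = deque()

    def try_send(self, flit) -> bool:
        if self.credits == 0:
            return False  # back-pressure: receiver has no free slot
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def receiver_consume(self):
        flit = self.rx_buffer.popleft()
        self.credits += 1  # slot freed, so a credit goes back to the sender
        return flit


link = CreditLink(rx_buffer_slots=2)
print([link.try_send(i) for i in range(3)])  # third send is blocked
```

The sender never overruns the receiver; the cost is that throughput stalls whenever credits run out, so receiver buffer depth is another sizing parameter to explore.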
System-level Architecture Analysis of UCIe-based multi-die SoC
Multi-Media Application – UCIe Template provided by Intel
[Figure: block diagram with CPU high-performance cores, CPU low-power cores, audio/video encoder/decoder, I/O tile and three memory (MEM) blocks, connected through PCIe 6.0 and CXL 3.0 over UCIe, with UCIe retimers to an off-package interconnect and an NVMe SSD chiplet]
How much should the retimer timeout be set to?
Do we need a multi-module setup?
How much should the transfer rate between UCIe links be set to: 4 GT/s, 8 GT/s … or 32 GT/s?
Start with a System Block Diagram
VisualSim Model
Create a VisualSim model using existing building blocks
Stats
[Figure: latency statistics for the Advanced package, 4-module, 32 GT/s configuration vs. the Standard package, single-module, 4 GT/s configuration]
A latency difference of roughly 300x can be observed. However, for non-time-critical applications, the Standard UCIe package option looks attractive.
Study the statistics to decide on the best configuration
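Part of that gap follows from raw bandwidth alone. The sketch below assumes one bit per lane per transfer, ignores protocol overhead and queueing, and computes the serialization time of a hypothetical 256-byte transfer in both configurations:

```python
LANES_PER_MODULE = {"standard": 16, "advanced": 64}


def serialization_ns(num_bytes: int, package: str, modules: int,
                     rate_gt_s: float) -> float:
    """Time to clock num_bytes onto the lanes (idealized, no overhead)."""
    lanes = LANES_PER_MODULE[package] * modules
    bits = num_bytes * 8
    return bits / (lanes * rate_gt_s)  # 1 Gb/s == 1 bit/ns


adv = serialization_ns(256, "advanced", 4, 32)  # Advanced, 4 modules, 32 GT/s
std = serialization_ns(256, "standard", 1, 4)   # Standard, 1 module, 4 GT/s
print(adv, std, std / adv)
```

Serialization alone gives a 128x ratio (0.25 ns vs. 32 ns); the remainder of the roughly 300x gap reported by the simulation is not captured by this idealized calculation.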
Application Examples of UCIe-based multi-die SoC
Example 1 – Multi-Media applications
[Figure: CPU high-performance cores, CPU low-power cores, audio/video encoder/decoder and I/O tile with three MEM blocks, connected via PCIe 6.0 and CXL 3.0, with a retimer to an off-package interconnect]
Example 2: Automotive Autonomous Driving
[Figure: UCIe-connected AI engine tiles (warp scheduler, four PEs, local memory), GPU, analog chiplet (two ADC/DAC/PLL groups) and processor subsystem (core, L1, bus, SLC)]
Example 3: Cache Coherency using UCIe
[Figure: UCIe linking a 32nm SERDES, 7nm GPU, 5nm RISC-V cores, 10nm ARM cores, 10nm DSP, 22nm SLC chiplet and 28nm LPDDR5, with four caches]
Design Challenges in Implementing UCIe
•A huge memory transaction can block a high-priority control access
• For time-critical applications, such situations are not desirable
• Example: Automotive communication system
•Multiple chiplets can be connected easily and efficiently
• Resource sizing per chiplet needs to be correct to maximize bandwidth usage
• Example applications: Data Center and AI Accelerators
•Migrating from a monolithic die to chiplets in smartphones is efficient
• Limited memory needs to be partitioned for different dies to access with minimal contention
• Example: Apple M1 Ultra uses chiplets to double the performance
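The blocking problem in the first bullet can be quantified with a small model. Assuming a hypothetical bulk transfer that starts at t = 0 and occupies the link for a fixed number of FLIT times, the sketch compares the wait of a high-priority control FLIT when the arbiter must finish the whole transaction versus when it may insert the control FLIT at the next FLIT boundary. FLIT-boundary preemption is an assumed policy for illustration, not a claim about any specific UCIe implementation.

```python
import math


def control_wait_ns(bulk_flits: int, flit_ns: float, control_arrival_ns: float,
                    preempt_at_flit_boundary: bool) -> float:
    """Wait time of a high-priority control FLIT arriving during a bulk transfer.

    The bulk transfer starts at t=0 and occupies bulk_flits * flit_ns.
    """
    bulk_end = bulk_flits * flit_ns
    if control_arrival_ns >= bulk_end:
        return 0.0  # link already idle
    if preempt_at_flit_boundary:
        # wait only for the FLIT currently on the wire to finish
        next_boundary = math.ceil(control_arrival_ns / flit_ns) * flit_ns
        return next_boundary - control_arrival_ns
    return bulk_end - control_arrival_ns  # wait for the whole transaction


print(control_wait_ns(1000, 2.0, 3.0, preempt_at_flit_boundary=False))
print(control_wait_ns(1000, 2.0, 3.0, preempt_at_flit_boundary=True))
```

With 1000 bulk FLITs of 2 ns each and a control FLIT arriving at 3 ns, the non-preemptive wait is 1997 ns versus 1 ns with FLIT-boundary preemption; the worst case grows with transaction size, which is what makes large transactions a problem for time-critical traffic.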
Performance challenges
•The user defines CXL stacks with two protocols sharing the physical link.
•The arbiter across the Die-to-Die adapter must send FLITs alternately between the 2 protocols.
• If one of the protocol layers doesn’t have data to transmit, “NOP” frames are inserted instead of payload. If one of the protocol stacks is idle most of the time, bandwidth is essentially wasted on the “NOP” frames.
•Increasing the number of modules for either the Standard or Advanced package provides more bandwidth.
• But is that extra bandwidth needed for the application?
•What happens if multiple chiplets in your design require data stored at the same address location in another chiplet?
• Consider the impact of cache coherency
•Can peak throughput be guaranteed for your application in a shared resource environment?
• AI Engine distributed across multiple dies
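The NOP overhead in the first challenge can be illustrated with a strict-alternation arbiter. This sketch assumes the arbiter gives even FLIT slots to protocol A and odd slots to protocol B, and sends a NOP whenever the owner of a slot has nothing queued. The function name and queue sizes are hypothetical.

```python
def alternating_arbiter(queue_a: int, queue_b: int, slots: int):
    """Strict alternation between two protocol stacks over `slots` FLIT slots.

    queue_a / queue_b: number of FLITs each protocol has waiting.
    Returns (flits_sent_a, flits_sent_b, nop_slots).
    """
    sent_a = sent_b = nops = 0
    for slot in range(slots):
        if slot % 2 == 0:  # even slots belong to protocol A
            if sent_a < queue_a:
                sent_a += 1
            else:
                nops += 1
        else:  # odd slots belong to protocol B
            if sent_b < queue_b:
                sent_b += 1
            else:
                nops += 1
    return sent_a, sent_b, nops


print(alternating_arbiter(queue_a=100, queue_b=5, slots=100))
```

With 100 FLITs queued for A, 5 for B and 100 slots, A sends 50, B sends 5 and 45 slots carry NOPs: A is held to half the link even though B is mostly idle.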
Analyzing UCIe-based multi-die SoC using VisualSim System Model
Autonomous driving
[Figure: UCIe-connected AI engine tiles (warp scheduler, four PEs, local memory), GPU, memory chiplet (ADC, DDR5) and processor subsystem (core, L1, bus, SLC)]
• Optimal mesh size (m x n)?
• Best sample size (16 bytes vs. 32 bytes, etc.)?
Use a single-protocol stack or a multi-protocol stack?
Do we need PCIe Gen6, or can we still use Gen5 to meet application requirements?
VisualSim System Model using UCIe in ADAS SoC
Statistics for Multi-Die SoC
• Note the AI Engine latency spikes.
• For multi-protocol stacks, each protocol gets half the bandwidth.
• Older-generation protocols are mixed with PCIe 6.
• A smaller FLIT size increases latency.
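One way to see why a smaller FLIT can raise latency is a fixed per-FLIT overhead. Assuming a hypothetical 16-byte overhead per FLIT (header plus CRC) and a 512 Gb/s link, the sketch computes the wire time of a 4096-byte message for two payload sizes. It models only this overhead, not per-FLIT processing delay.

```python
import math


def transfer_ns(message_bytes: int, flit_payload_bytes: int,
                overhead_bytes: int, link_gbps: float) -> float:
    """Wire time of a message split into FLITs with fixed per-FLIT overhead."""
    flits = math.ceil(message_bytes / flit_payload_bytes)
    wire_bits = flits * (flit_payload_bytes + overhead_bytes) * 8
    return wire_bits / link_gbps  # 1 Gb/s == 1 bit/ns


print(transfer_ns(4096, 240, 16, 512))  # 256-byte FLITs
print(transfer_ns(4096, 48, 16, 512))   # 64-byte FLITs
```

Under these assumptions a 240-byte payload needs 18 FLITs (72 ns), while a 48-byte payload needs 86 FLITs (86 ns), because the overhead is paid far more often.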
Comparing Different Configurations using UCIe Interface
All Die Adapters use PCIe 6.0
Die Adapters use PCIe 6.0 and Streaming Protocols (AXI)
Lower latency when using PCIe 6.0
Mirabilis Design
VisualSim Architect
About Mirabilis Design
Engineering Solutions focused on innovation in electronics
Based in Silicon Valley, USA
Development and support centers in the US, India, Japan, China and the Czech Republic
60 large corporations, research centers and 73 universities as customers
Enabled 250 products in semiconductors, automotive, defense and space
VisualSim Architect is a system simulation tool and IP library for hardware, software and networking
Mirabilis Design – Milestones
[Figure: milestones timeline, 2003 to 2023, including: company incorporated; modeling services and 1st customer; stochastic modeling and Innovation Award; integration API and 10th customer; network modeling and University Program; hardware modeling; VisualSim Aerospace and Simulator of the Year; Best ESL at DAC and 2nd at Arm TechCon; VisualSim Automotive and Europe operations; failure analysis and Asia team created; Best Embedded Systems Presentation Award at DAC; SysML API and requirements; new VisualSim; Best in Show at Embedded World; Communication System Designer; SystemVerilog and UPF/CPF link]
VisualSim Architect
• Cloud and Desktop
• Multi-simulation engine: Digital, Untimed & Continuous
• Library of Systems, Networks, Semi, FPGA & Software
• Generate statistics, documentation & traces
• Algorithms, Protocol, AI Insight, Performance, Power, Functional, Stochastic, Scripting, Sim API
Performance: Latency, Throughput, Buffer occupancy
Power: Instant, Average, Cumulative, Heat, Temperature; Battery and power generation sizing
Functionality: Correctness, efficiency and Quality-of-Service
Failure Analysis and Functional Safety: Generate errors and test for compliance
Software Evaluation: Test quality of C++ and impact on system performance
System-level Modeling and Simulation Software that integrates requirements, exploration & verification
Over 500 System-Level IP Components
Comprehensive implementation-accurate Library
Traffic
• Distribution
• Sequence
• Trace file
• Instruction profile
Power
• State power table
• Power management
• Energy harvesters
• Battery
• RegEx operators
SoC Buses
• AMBA and CoreLink
• AHB, APB, AXI, ACE, CHI, CMN-600
• Network-on-Chip
• TileLink
System Bus
• PCI / PCI-X / PCIe
• RapidIO
• AFDX
• OpenVPX
• VME
• SPI 3.0
• 1553B
ARM
• M-, R-, 7TDMI
• A8, A53, A55, A72, A76, A77, Neoverse
Custom Creator
• Script language
• 600 RegEx functions
• Task graph
• Tracer
• C/C++/Java
• Python
Stochastic
• FIFO/LIFO Queue
• Time Queue
• Quantity Queue
• System Resource
• Schedulers
• Cyber Security
Memory
• Memory Controller
• DDR DRAM 2, 3, 4, 5
• LPDDR 2, 3, 4
• HBM, HMC
• SDR, QDR, RDRAM
Networking
• Ethernet & GigE
• Audio-Video Bridging
• 802.11 and Bluetooth
• 5G
• SpaceWire
• CAN-FD
• TTEthernet
• FlexRay
• TSN & IEEE 802.1Q
• ARINC 664/AFDX
Interfaces
• Virtual Channel
• DMA
• Crossbar
• Serial Switch
• Bridge
Algorithms
• Signal Processing
• Analog
• Antenna
RTOS
• Template
• ARINC 653
• AUTOSAR
Storage
• Flash & NVMe
• Storage Array
• Disk and SATA
• Fibre Channel
• FireWire
Software
• gem5
• Software code integration
• Instruction trace
• Statistical software model
• Task graph
RTL-Like
• Clock, Wire-Delay
• Registers, Latches
• Flip-flop
• ALU and FSM
• Mux, DeMux
• Lookup table
Processors
• GPU, DSP, µP and µC
• RISC-V
• SiFive U74
• NVIDIA Drive PX
• PowerPC
• x86 (Intel and AMD)
• DSP (TI and ADI)
• MIPS, Tensilica, SH
Reports
• Timing and Buffer
• Throughput/Utilization
• Average/peak power
• Statistics
FPGA
• Xilinx (Zynq, Virtex, Kintex)
• Intel (Stratix, Arria)
• Microsemi SmartFusion
• Programmable logic template
• Interface traffic generator

More Related Content

What's hot

PCI express
PCI expressPCI express
PCI express
sarangaprabod
 
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
gnkeshava
 
AMD Chiplet Architecture for High-Performance Server and Desktop Products
AMD Chiplet Architecture for High-Performance Server and Desktop ProductsAMD Chiplet Architecture for High-Performance Server and Desktop Products
AMD Chiplet Architecture for High-Performance Server and Desktop Products
AMD
 
IP PCIe
IP PCIeIP PCIe
IP PCIe
SILKAN
 
Pcie drivers basics
Pcie drivers basicsPcie drivers basics
Pcie drivers basics
Venkatesh Malla
 
Pc ie tl_layer (3)
Pc ie tl_layer (3)Pc ie tl_layer (3)
Pc ie tl_layer (3)
Rakeshkumar Sachdev
 
SoC Design
SoC DesignSoC Design
Soc architecture and design
Soc architecture and designSoc architecture and design
Soc architecture and design
Satya Harish
 
Broadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptxBroadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptx
Memory Fabric Forum
 
System on chip buses
System on chip busesSystem on chip buses
System on chip buses
A B Shinde
 
Pci express modi
Pci express modiPci express modi
Pci express modi
proma_goswami
 
Shared Memory Centric Computing with CXL & OMI
Shared Memory Centric Computing with CXL & OMIShared Memory Centric Computing with CXL & OMI
Shared Memory Centric Computing with CXL & OMI
Allan Cantle
 
PCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingPCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingDVClub
 
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON
 
Soc - Intro, Design Aspects, HLS, TLM
Soc - Intro, Design Aspects, HLS, TLMSoc - Intro, Design Aspects, HLS, TLM
Soc - Intro, Design Aspects, HLS, TLMSubhash Iyer
 
System On Chip
System On ChipSystem On Chip
System On Chip
A B Shinde
 
Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-ExpressDVClub
 
Routing.ppt
Routing.pptRouting.ppt
Routing.ppt
Sunesh N.V
 

What's hot (20)

PCI express
PCI expressPCI express
PCI express
 
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
PCIe and PCIe driver in WEC7 (Windows Embedded compact 7)
 
SOC design
SOC design SOC design
SOC design
 
AMD Chiplet Architecture for High-Performance Server and Desktop Products
AMD Chiplet Architecture for High-Performance Server and Desktop ProductsAMD Chiplet Architecture for High-Performance Server and Desktop Products
AMD Chiplet Architecture for High-Performance Server and Desktop Products
 
IP PCIe
IP PCIeIP PCIe
IP PCIe
 
Pcie drivers basics
Pcie drivers basicsPcie drivers basics
Pcie drivers basics
 
Pc ie tl_layer (3)
Pc ie tl_layer (3)Pc ie tl_layer (3)
Pc ie tl_layer (3)
 
SoC Design
SoC DesignSoC Design
SoC Design
 
Soc architecture and design
Soc architecture and designSoc architecture and design
Soc architecture and design
 
Broadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptxBroadcom PCIe & CXL Switches OCP Final.pptx
Broadcom PCIe & CXL Switches OCP Final.pptx
 
System on chip buses
System on chip busesSystem on chip buses
System on chip buses
 
Pci express modi
Pci express modiPci express modi
Pci express modi
 
Shared Memory Centric Computing with CXL & OMI
Shared Memory Centric Computing with CXL & OMIShared Memory Centric Computing with CXL & OMI
Shared Memory Centric Computing with CXL & OMI
 
PCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingPCI Express Verification using Reference Modeling
PCI Express Verification using Reference Modeling
 
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick
 
Soc - Intro, Design Aspects, HLS, TLM
Soc - Intro, Design Aspects, HLS, TLMSoc - Intro, Design Aspects, HLS, TLM
Soc - Intro, Design Aspects, HLS, TLM
 
System On Chip
System On ChipSystem On Chip
System On Chip
 
Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-Express
 
AMBA 2.0 PPT
AMBA 2.0 PPTAMBA 2.0 PPT
AMBA 2.0 PPT
 
Routing.ppt
Routing.pptRouting.ppt
Routing.ppt
 

Similar to Evaluating UCIe based multi-die SoC to meet timing and power

Trends and challenges in IP based SOC design
Trends and challenges in IP based SOC designTrends and challenges in IP based SOC design
Trends and challenges in IP based SOC design
AishwaryaRavishankar8
 
Cloud Networking Trends
Cloud Networking TrendsCloud Networking Trends
Cloud Networking Trends
Michelle Holley
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
Peter Clapham
 
Educating the computer architects of tomorrow's critical systems with RISC-V
Educating the computer architects of tomorrow's critical systems with RISC-VEducating the computer architects of tomorrow's critical systems with RISC-V
Educating the computer architects of tomorrow's critical systems with RISC-V
RISC-V International
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
Deepak Shankar
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
HPCC Systems
 
ODSA Sub-Project Launch
ODSA Sub-Project LaunchODSA Sub-Project Launch
ODSA Sub-Project Launch
ODSA Workgroup
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
Netronome
 
OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017
Radisys Corporation
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
Dr. Swaminathan Kathirvel
 
SoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based NetworkingSoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based Networking
Netronome
 
Quantum Cryptography
Quantum Cryptography  Quantum Cryptography
The Art of Displaying Industrial Data
The Art of Displaying Industrial DataThe Art of Displaying Industrial Data
The Art of Displaying Industrial Data
Inductive Automation
 
Build the network of the future on your terms today
Build the network of the future on your terms todayBuild the network of the future on your terms today
Build the network of the future on your terms today
Dell World
 
Syste O CHip Concepts for Students.ppt
Syste O CHip Concepts for Students.pptSyste O CHip Concepts for Students.ppt
Syste O CHip Concepts for Students.ppt
monzhalabs
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
Hitesh Mohapatra
 
Using Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M usersUsing Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M users
Mirantis
 
Advantech Intelligent Communication Gateways are ARM-based robust platforms w...
Advantech Intelligent Communication Gateways are ARM-based robust platforms w...Advantech Intelligent Communication Gateways are ARM-based robust platforms w...
Advantech Intelligent Communication Gateways are ARM-based robust platforms w...
samveed
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spacejsvetter
 
TRACK B: Multicores & Network On Chip Architectures/ Oren Hollander
TRACK B: Multicores & Network On Chip Architectures/ Oren HollanderTRACK B: Multicores & Network On Chip Architectures/ Oren Hollander
TRACK B: Multicores & Network On Chip Architectures/ Oren Hollanderchiportal
 

Similar to Evaluating UCIe based multi-die SoC to meet timing and power (20)

Trends and challenges in IP based SOC design
Trends and challenges in IP based SOC designTrends and challenges in IP based SOC design
Trends and challenges in IP based SOC design
 
Cloud Networking Trends
Cloud Networking TrendsCloud Networking Trends
Cloud Networking Trends
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Educating the computer architects of tomorrow's critical systems with RISC-V
Educating the computer architects of tomorrow's critical systems with RISC-VEducating the computer architects of tomorrow's critical systems with RISC-V
Educating the computer architects of tomorrow's critical systems with RISC-V
 
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERSROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
ROLE OF DIGITAL SIMULATION IN CONFIGURING NETWORK PARAMETERS
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
ODSA Sub-Project Launch
ODSA Sub-Project LaunchODSA Sub-Project Launch
ODSA Sub-Project Launch
 
ODSA Sub-Project Launch
 ODSA Sub-Project Launch ODSA Sub-Project Launch
ODSA Sub-Project Launch
 
OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
SoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based NetworkingSoC Solutions Enabling Server-Based Networking
SoC Solutions Enabling Server-Based Networking
 
Quantum Cryptography
Quantum Cryptography  Quantum Cryptography
Quantum Cryptography
 
The Art of Displaying Industrial Data
The Art of Displaying Industrial DataThe Art of Displaying Industrial Data
The Art of Displaying Industrial Data
 
Build the network of the future on your terms today
Build the network of the future on your terms todayBuild the network of the future on your terms today
Build the network of the future on your terms today
 
Syste O CHip Concepts for Students.ppt
Syste O CHip Concepts for Students.pptSyste O CHip Concepts for Students.ppt
Syste O CHip Concepts for Students.ppt
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Using Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M usersUsing Kubernetes to make cellular data plans cheaper for 50M users
Using Kubernetes to make cellular data plans cheaper for 50M users
 
Advantech Intelligent Communication Gateways are ARM-based robust platforms w...
Advantech Intelligent Communication Gateways are ARM-based robust platforms w...Advantech Intelligent Communication Gateways are ARM-based robust platforms w...
Advantech Intelligent Communication Gateways are ARM-based robust platforms w...
 
Exploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design spaceExploring emerging technologies in the HPC co-design space
Exploring emerging technologies in the HPC co-design space
 
TRACK B: Multicores & Network On Chip Architectures/ Oren Hollander
TRACK B: Multicores & Network On Chip Architectures/ Oren HollanderTRACK B: Multicores & Network On Chip Architectures/ Oren Hollander
TRACK B: Multicores & Network On Chip Architectures/ Oren Hollander
 

More from Deepak Shankar

How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
Deepak Shankar
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
Deepak Shankar
 
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Deepak Shankar
 
Modeling Abstraction
Modeling AbstractionModeling Abstraction
Modeling Abstraction
Deepak Shankar
 
Accelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim ArchitectAccelerated development in Automotive E/E Systems using VisualSim Architect
Accelerated development in Automotive E/E Systems using VisualSim Architect
Deepak Shankar
 
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Deepak Shankar
 
Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
Deepak Shankar
 
Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers. Capacity Planning and Power Management of Data Centers.
Capacity Planning and Power Management of Data Centers.
Deepak Shankar
 
Automotive network and gateway simulation
Automotive network and gateway simulationAutomotive network and gateway simulation
Automotive network and gateway simulation
Deepak Shankar
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture exploration
Deepak Shankar
 
Using ai for optimal time sensitive networking in avionics
Using ai for optimal time sensitive networking in avionicsUsing ai for optimal time sensitive networking in avionics
Using ai for optimal time sensitive networking in avionics
Deepak Shankar
 
Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0Designing memory controller for ddr5 and hbm2.0
Designing memory controller for ddr5 and hbm2.0
Deepak Shankar
 
Task allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed systemTask allocation on many core-multi processor distributed system
Task allocation on many core-multi processor distributed system
Deepak Shankar
 
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Introduction to Architecture Exploration of Semiconductor, Embedded Systems, ...
Deepak Shankar
 
Develop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML applicationDevelop High-bandwidth/low latency electronic systems for AI/ML application
Develop High-bandwidth/low latency electronic systems for AI/ML application
Deepak Shankar
 
Webinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE networkWebinar on Latency and throughput computation of automotive EE network
Webinar on Latency and throughput computation of automotive EE network
Deepak Shankar
 
Webinar on radar
Webinar on radarWebinar on radar
Webinar on radar
Deepak Shankar
 
Webinar on RISC-V
Webinar on RISC-VWebinar on RISC-V
Webinar on RISC-V
Deepak Shankar
 
System Architecture Exploration Training Class
System Architecture Exploration Training ClassSystem Architecture Exploration Training Class
System Architecture Exploration Training Class
Deepak Shankar
 
Exploration of Radars and Software Defined Radios using VisualSim
Exploration of  Radars and Software Defined Radios using VisualSimExploration of  Radars and Software Defined Radios using VisualSim
Exploration of Radars and Software Defined Radios using VisualSim
Deepak Shankar
 

More from Deepak Shankar (20)

How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
 
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
Mastering IoT Design: Sense, Process, Connect: Processing: Turning IoT Data i...
 
Modeling Abstraction
Modeling AbstractionModeling Abstraction
Modeling Abstraction
 
Evaluating UCIe based multi-die SoC to meet timing and power

  • 1. Evaluating UCIe based multi-die SoC to meet timing and power
  • 2. Logistics of the Webinar: All attendees will be placed on mute. To ask a question, click the Cloud Chat icon and type your question; staff are standing by to answer. There will also be time for Q&A at the end.
  • 3. Agenda: Overview of UCIe™ (Universal Chiplet Interconnect Express™); Introduction to system modeling with UCIe and other Intellectual Properties; Assembling system models using the UCIe protocol; Examples of SoC architectures using UCIe; Use Case; Mirabilis Design and VisualSim Architect
  • 5. Background on Die-to-Die Interconnect •Packing a large number of functions running at different clock rates onto a monolithic die is not scalable •Solution: integrate multiple dies into a single package (chiplets) •Chiplet challenges: • Die-to-die communication is very slow and consumes too much power • No single standard is available to handle routing, signalling and multiple clock domains • Cache coherency across dies • Support for multiple protocols •Exploration needs: • A mechanism to predict the expected latency and power consumption • Testing the feasibility of different configurations and assigning compute resources to individual dies • Studying the impact of failures or extreme latency • Exploring different scheduling and Quality-of-Service algorithms
  • 6. Universal Chiplet Interconnect Express (UCIe) Is the Future •Customizable, package-level integration of chiplets • Combines best-in-class die-to-die interconnect and protocol connections from an interoperable, multi-vendor ecosystem • Open industry-standard interconnect •Offers high-bandwidth, low-latency, power-efficient and cost-effective on-package connectivity •Compute can be implemented in an advanced process node to deliver power-efficient performance (at higher cost), while the memory and I/O controllers are reused from an earlier design in an established (n-1 or n-2) process node •Future designs will have AI engines on different dies interacting over the interconnect, which requires deterministic latency •An optimal design requires accurate assignment of resource pooling, resource sharing and message passing •UCIe theoretical bandwidth is 4x that of PCIe 6.0 (Tbps range) •Actual bandwidth depends on the burst data available and on the sizes of the Tx and replay buffers
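The last bullet can be made concrete with a bandwidth-delay sketch. Assuming the sender may keep at most one replay buffer's worth of unacknowledged data in flight, sustained throughput is capped at the smaller of the raw link rate and replay-buffer size divided by the ACK round-trip time. The function name, the 4 KiB buffer and the 50 ns round trip are hypothetical illustration values, not numbers from the UCIe specification.

```python
def effective_throughput_gbps(raw_gbps, replay_buffer_bytes, rtt_ns):
    """Upper bound on sustained throughput (Gb/s) for a link whose sender must
    stall once the replay buffer is full of unacknowledged data.

    Hypothetical window model: window-limited rate = buffer bits / round trip.
    Bits per nanosecond are numerically equal to Gb/s.
    """
    window_limited_gbps = replay_buffer_bytes * 8 / rtt_ns
    return min(raw_gbps, window_limited_gbps)


# Hypothetical: 2048 Gb/s raw link, 4 KiB replay buffer, 50 ns ACK round trip.
print(effective_throughput_gbps(2048.0, 4096, 50.0))  # → 655.36
```

In this example the replay buffer, not the lane count, limits throughput; doubling the buffer (or halving the round trip) would raise the cap until the raw rate becomes the bottleneck.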
  • 8. How Does UCIe Work? Multiple layers separate the interconnect tasks. The Physical layer is responsible for electrical signaling, clocking, link training and the sideband. The Die-to-Die adapter provides link state management and parameter negotiation for the chiplets. It optionally guarantees reliable delivery of data through CRC and a link-level retry mechanism. ◦ When multiple protocols are supported, it defines the underlying arbitration mechanism. The FLIT (flow control unit) defines the underlying transfer mechanism when the adapter is responsible for reliable transfer.
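The CRC-plus-retry idea in the Die-to-Die adapter can be sketched as a simplified go-back-N loop: every FLIT stays in a replay buffer until it is acknowledged, the receiver checks the CRC, and a mismatch triggers a NAK that makes the sender replay from that FLIT. This is a minimal illustration under stated assumptions: `zlib.crc32` stands in for the adapter's CRC (it is not the polynomial UCIe defines), the NAK is assumed to arrive instantly, and `transfer` and `corrupt_attempts` are hypothetical names.

```python
import zlib
from collections import deque


def transfer(payloads, corrupt_attempts):
    """Simplified link-level retry (go-back-N style) over a lossy channel.

    payloads: list of non-empty bytes objects, one per FLIT.
    corrupt_attempts: set of transmission-attempt indices whose first byte is
    flipped on the wire (a stand-in for channel bit errors).
    Returns (payloads delivered in order, number of NAK-triggered replays).
    """
    # Replay buffer: each FLIT is kept with its CRC until it is ACKed.
    # zlib.crc32 is only a stand-in; it is not the CRC defined by UCIe.
    replay = deque((p, zlib.crc32(p)) for p in payloads)
    delivered = []
    attempt = 0
    replays = 0
    while replay:
        for payload, crc in list(replay):
            wire = payload
            if attempt in corrupt_attempts:
                wire = bytes([payload[0] ^ 0x01]) + payload[1:]
            attempt += 1
            if zlib.crc32(wire) != crc:
                replays += 1        # receiver NAKs; sender replays from this FLIT
                break
            delivered.append(wire)  # good CRC: receiver ACKs
            replay.popleft()        # ACK frees the replay-buffer entry
    return delivered, replays


delivered, replays = transfer([b"flit0", b"flit1", b"flit2"], corrupt_attempts={1})
print(delivered, replays)  # → [b'flit0', b'flit1', b'flit2'] 1
```

Each replay costs extra link time, which is one reason a performance model needs an error-rate parameter in addition to raw bandwidth.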
  • 9. UCIe Packaging: Two package types, standard and advanced. ◦ Standard has 16 lanes per module ◦ Advanced has 64 lanes per module. Multi-module configurations are supported to increase bandwidth. For 2 modules: ◦ Standard has 32 lanes ◦ Advanced has 128 lanes ◦ Each module sends different bytes of the data. The number of lanes grows with the module count, and multiple PHY logic instances allow greater data transfer with better scheduling.
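The lane counts above translate into raw bandwidth with simple arithmetic: lanes × transfer rate ÷ 8. The sketch below assumes one bit per lane per transfer and ignores FLIT header, CRC and protocol overhead, so it gives an upper bound rather than achievable throughput; the function name is hypothetical.

```python
# Lanes per module, from the slide: standard = 16, advanced = 64.
LANES_PER_MODULE = {"standard": 16, "advanced": 64}


def raw_bandwidth_gbytes_per_s(package, modules, gt_per_s):
    """Raw per-direction bandwidth in GB/s.

    Assumes each lane carries one bit per transfer and ignores FLIT header,
    CRC and protocol overhead, so real throughput will be lower.
    """
    lanes = LANES_PER_MODULE[package] * modules
    return lanes * gt_per_s / 8  # Gb/s -> GB/s


for package in ("standard", "advanced"):
    for modules in (1, 2):
        print(package, modules, raw_bandwidth_gbytes_per_s(package, modules, 32))
# standard 1 64.0
# standard 2 128.0
# advanced 1 256.0
# advanced 2 512.0
```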
  • 10. Transition to UCIe [Figure: a typical SoC built with a monolithic approach vs. a next-gen SoC that uses chiplets in a modular approach; interface labels include CXL 2.0, PCIe Gen 6 and Streaming]
  • 11. Commonality with PCIe 6.0 •The UCIe protocol emulates PCIe for chiplets •UCIe transfers packets in FLITs • PCIe 6.0 uses a fixed FLIT size of 256 bytes • The UCIe FLIT size varies with the sender and receiver protocol •Credit-based flow control mechanism •Packets use ACK or NAK to confirm good reception • Selective and standard ACK options •Advanced port status and error checking • CRC checksum •Bandwidth depends on the number of lanes • Standard vs. advanced package • Multi-module option
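Credit-based flow control, listed above, can be sketched in a few lines: the receiver advertises one credit per free buffer slot, the sender spends a credit per FLIT, and it must stall at zero credits until the receiver drains a slot and returns the credit. This is a minimal single-link illustration; the class and method names are hypothetical, and credit return is modeled as instantaneous rather than carried in return messages.

```python
from collections import deque


class CreditLink:
    """Minimal credit-based flow control between one sender and one receiver.

    The receiver advertises one credit per free buffer slot; the sender spends
    a credit per FLIT and stalls at zero credits until a credit is returned.
    """

    def __init__(self, rx_buffer_slots):
        self.credits = rx_buffer_slots   # initial credit advertisement
        self.rx_buffer = deque()

    def send(self, flit):
        if self.credits == 0:
            return False                 # back-pressure: no credit, no send
        self.credits -= 1
        self.rx_buffer.append(flit)
        return True

    def receiver_consume(self):
        flit = self.rx_buffer.popleft()  # receiver processes one FLIT...
        self.credits += 1                # ...and returns its credit
        return flit


link = CreditLink(rx_buffer_slots=2)
print([link.send(b"f%d" % i) for i in range(3)])  # → [True, True, False]
link.receiver_consume()
print(link.send(b"f2"))  # → True
```

Because the receiver buffer size sets the number of credits, it directly bounds how many FLITs can be in flight, which ties back to the earlier point that buffer sizing limits actual bandwidth.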
  • 12. System-level Architecture Analysis of UCIe based multi-die SoC
  • 13. Multi-Media Application: UCIe Template provided by Intel [Figure: system block diagram with high-performance CPU cores, low-power CPU cores, an audio/video encoder/decoder, an I/O tile and three memory dies, with PCIe 6.0 and CXL 3.0 links, UCIe retimers, an off-package interconnect and an NVMe SSD chiplet] Questions: What should the retimer timeout be set to? Do we need a multi-module setup? What transfer rate should the UCIe links use: 4 GT/s, 8 GT/s … or 32 GT/s? Start with a system block diagram.
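Before running a full simulation, the slide's questions about a multi-module setup and the transfer rate can be narrowed with a quick sweep: enumerate package, module count and rate, and keep only the combinations whose raw bandwidth meets a requirement. This is a hypothetical pre-filter, not the VisualSim method; the 100 GB/s requirement is made up, bandwidth ignores protocol overhead, and ties are broken alphabetically only for a stable order. Latency, power and area, which decide between the survivors, are exactly what the system model is for.

```python
from itertools import product

LANES_PER_MODULE = {"standard": 16, "advanced": 64}  # from the packaging slide
MODULE_COUNTS = (1, 2, 4)                             # module counts used in this deck
RATES_GT_S = (4, 8, 32)                               # rates named on this slide


def raw_gbytes_per_s(package, modules, rate_gt_s):
    # One bit per lane per transfer; FLIT and protocol overhead ignored.
    return LANES_PER_MODULE[package] * modules * rate_gt_s / 8


def candidate_configs(required_gbytes_per_s):
    """Every (bandwidth, package, modules, rate) combination whose raw bandwidth
    meets the requirement, least over-provisioned first."""
    configs = [
        (raw_gbytes_per_s(p, m, r), p, m, r)
        for p, m, r in product(LANES_PER_MODULE, MODULE_COUNTS, RATES_GT_S)
        if raw_gbytes_per_s(p, m, r) >= required_gbytes_per_s
    ]
    return sorted(configs)


for config in candidate_configs(100)[:3]:
    print(config)
# (128.0, 'advanced', 2, 8)
# (128.0, 'advanced', 4, 4)
# (128.0, 'standard', 2, 32)
```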
  • 14. VisualSim Model Create a VisualSim model using existing building blocks
  • 15. Stats: Comparing the advanced-package, 4-module, 32 GT/s configuration with the standard-package, single-module, 4 GT/s configuration shows a latency difference of roughly 300x. However, for non-time-critical applications, the standard UCIe package looks attractive. Study the statistics to decide on the best configuration.
  • 16. Application Examples of UCIe based multi-die SoC
  • 17. Example 1: Multi-Media Applications [Figure: high-performance CPU cores, low-power CPU cores, an audio/video encoder/decoder, an I/O tile and three memory dies, with PCIe 6.0 and CXL 3.0 links, a retimer and an off-package interconnect]
  • 18. Example 2: Automotive Autonomous Driving [Figure: chiplets connected by UCIe: AI engine tiles (warp scheduler, four PEs, local memory), a GPU, an analog chiplet (two sets of ADC, DAC and PLL) and a processor subsystem (core, L1, bus, SLC)]
  • 19. Example 3: Cache Coherency Using UCIe [Figure: chiplets in different process nodes connected by UCIe: SERDES (32nm), GPU (7nm), RISC-V cores (5nm), Arm cores (10nm), DSP (10nm), SLC chiplet (22nm) and LPDDR5 (28nm); four of the chiplets are shown with caches]
  • 20. Design Challenges in Implementing UCIe •A huge memory transaction can block a high-priority control access • Such situations are undesirable for time-critical applications • Example: automotive communication systems •Multiple chiplets can be connected easily and efficiently • Resources must be sized correctly for each chiplet to maximize bandwidth usage • Example applications: data centers and AI accelerators •Migrating from a monolithic die to chiplets in smartphones is efficient • Limited memory must be partitioned so that different dies can access it with minimal contention • Example: Apple M1 Ultra uses chiplets to double the performance
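The first challenge, a large memory transaction blocking a high-priority control access, is easy to quantify: a request that arrives just after a non-preemptible transfer starts must wait for the rest of it. The sketch compares a 64 KiB transaction sent as one unit with an arbiter that can switch at 256-byte FLIT boundaries. It assumes the arbiter always picks the control request next and ignores queuing behind other high-priority traffic; the 64 GB/s link and the transfer sizes are hypothetical.

```python
def worst_case_wait_ns(blocking_bytes, link_gbytes_per_s):
    """Worst-case wait for a high-priority request that arrives just after a
    non-preemptible unit of blocking_bytes started transmitting.
    1 GB/s equals 1 byte/ns, so bytes / (GB/s) gives nanoseconds."""
    return blocking_bytes / link_gbytes_per_s


LINK_GBYTES_PER_S = 64.0  # hypothetical: standard package, 1 module, 32 GT/s
print(worst_case_wait_ns(64 * 1024, LINK_GBYTES_PER_S))  # whole 64 KiB transaction → 1024.0
print(worst_case_wait_ns(256, LINK_GBYTES_PER_S))        # one 256-byte FLIT → 4.0
```

Interleaving at FLIT granularity cuts the worst-case blocking by the ratio of transaction size to FLIT size (256x here), which is why arbitration granularity matters for automotive control traffic.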
  • 21. Performance Challenges •The user defines CXL stacks with two protocols sharing the physical link. •The arbiter in the Die-to-Die adapter must send FLITs alternately between the two protocols. • If one of the protocol layers has no data to transmit, "NOP" frames are inserted instead of payload. If one protocol stack is idle most of the time, the bandwidth allotted to it is essentially wasted on NOP frames. •Increasing the number of modules in either the standard or the advanced package provides more bandwidth. • But does the application need that extra bandwidth? •What happens if multiple chiplets in your design need data stored at the same address in another chiplet? • Consider the impact of cache coherency •Can peak throughput be guaranteed for your application in a shared-resource environment? • Example: an AI engine distributed across multiple dies
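The NOP-insertion effect described above can be reproduced with a slot-level sketch of strict alternation: even slots belong to protocol stack A, odd slots to stack B, and an empty owner fills its slot with a NOP. The function and argument names are hypothetical, and the model covers only the strict-alternation behavior the slide describes.

```python
from collections import deque


def alternate_arbiter(flits_a, flits_b, slots):
    """Strictly alternate link slots between two protocol stacks.

    Slot k belongs to stack A when k is even and to stack B when k is odd.
    If the owning stack has nothing queued, a NOP FLIT fills the slot.
    Returns (payload_flits, nop_flits).
    """
    queues = (deque(flits_a), deque(flits_b))
    payload = nops = 0
    for slot in range(slots):
        owner = queues[slot % 2]
        if owner:
            owner.popleft()
            payload += 1
        else:
            nops += 1
    return payload, nops


# Stack A always busy, stack B idle: half the slots carry NOPs.
print(alternate_arbiter(range(100), [], slots=100))  # → (50, 50)
```

With one stack idle, exactly half of the link carries NOPs even though stack A still has data waiting, which is the waste the slide warns about.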
  • 22. Analyzing UCIe based multi-die SoC using VisualSim System Model
  • 23. Autonomous Driving [Figure: chiplets connected by UCIe: AI engine tiles (warp scheduler, four PEs, local memory), a GPU, a memory chiplet, ADC, DDR5 and a processor subsystem (core, L1, bus, SLC)] • What is the optimal mesh size (m×n)? • What is the best sample size (16 bytes vs. 32 bytes, etc.)? • Should we use a single protocol stack or a multi-protocol stack? • Do we need PCIe Gen 6, or does Gen 5 still meet the application requirements?
  • 24. VisualSim System Model using UCIe in ADAS SoC
  • 25. Statistics for Multi-Die SoC • Note the AI Engine latency spikes • With multiple protocols, each protocol gets half the bandwidth • Older-generation protocols are mixed with PCIe 6.0 • A smaller FLIT size increases latency
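One plausible mechanism behind the last observation is fixed per-FLIT overhead: if every FLIT spends some bytes on header and CRC, smaller FLITs carry less payload each, so a message needs more FLITs and more total bytes on the wire. The sketch below shows only that serialization effect; the 16-byte overhead, 4 KiB message and 64 GB/s link are hypothetical, and it is not the model used in the VisualSim simulation.

```python
import math


def message_time_ns(message_bytes, flit_bytes, overhead_bytes, link_gbytes_per_s):
    """Serialization time of one message split into FLITs.

    Each FLIT carries (flit_bytes - overhead_bytes) of payload; the overhead
    stands in for header and CRC bytes. 1 GB/s equals 1 byte/ns.
    """
    payload_per_flit = flit_bytes - overhead_bytes
    flits = math.ceil(message_bytes / payload_per_flit)
    return flits * flit_bytes / link_gbytes_per_s


# Hypothetical: 4 KiB message, 16 overhead bytes per FLIT, 64 GB/s link.
for flit_bytes in (64, 256):
    print(flit_bytes, message_time_ns(4096, flit_bytes, 16, 64.0))
# 64 86.0
# 256 72.0
```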
  • 26. Comparing Different Configurations Using the UCIe Interface: all Die Adapters use PCIe 6.0 vs. Die Adapters use PCIe 6.0 and streaming protocols (AXI). Result: lower latency when using PCIe 6.0.
  • 28. About Mirabilis Design: Engineering solutions focused on innovation in electronics. Based in Silicon Valley, USA, with development and support centers in the US, India, Japan, China and the Czech Republic. Customers include 60 large corporations, research centers and 73 universities. Enabled 250 products in semiconductors, automotive, defense and space. VisualSim Architect provides system simulation and IP for hardware, software and networking.
  • 29. Mirabilis Design: Milestones [Timeline figure, 2003 to 2023; milestones include: company incorporated, first modeling-services customer, hardware modeling, stochastic modeling, network modeling, Innovation Award, Integration API, 10th customer, university program, VisualSim Aerospace, Simulator of the Year, Best ESL at DAC, 2nd at Arm TechCon, VisualSim Automotive, Europe operations, failure analysis, Asia team created, Best Embedded Systems Presentation Award at DAC 2021, SysML API, requirements, new VisualSim, Best in Show at Embedded World, Communication System Designer, SystemVerilog and UPF/CPF link]
  • 30. VisualSim Architect: System-level modeling and simulation software that integrates requirements, exploration and verification. Cloud and desktop. Multi-simulation engine: digital, untimed and continuous. Library of systems, networks, semiconductors, FPGA and software. Generates statistics, documentation and traces. Capabilities: algorithms, protocols, AI insight, performance, power, functional, stochastic, scripting, simulation API. Performance: latency, throughput, buffer occupancy. Power: instant, average, cumulative, heat, temperature; battery and power-generation sizing. Functionality: correctness, efficiency and Quality-of-Service. Failure analysis and functional safety: generate errors and test for compliance. Software evaluation: test the quality of C++ code and its impact on system performance.
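The power metrics listed above (instant, average, cumulative) relate through a simple integral. For a piecewise-constant power trace, cumulative energy is the sum of power × duration over each interval, and average power is that energy divided by the total time. The sketch below is a generic calculation with hypothetical names and sample values, not the VisualSim API.

```python
def power_summary(samples, end_time_ns):
    """Summarize a piecewise-constant power trace.

    samples: list of (time_ns, power_mw), sorted by time; each power level
    holds until the next sample (the last one holds until end_time_ns).
    mW x ns = pJ, so the cumulative energy is reported in picojoules.
    """
    boundaries = [t for t, _ in samples[1:]] + [end_time_ns]
    energy_pj = sum(p * (t_next - t) for (t, p), t_next in zip(samples, boundaries))
    duration_ns = end_time_ns - samples[0][0]
    return {
        "instant_mw": samples[-1][1],          # most recent power level
        "peak_mw": max(p for _, p in samples),
        "average_mw": energy_pj / duration_ns,
        "cumulative_pj": energy_pj,
    }


print(power_summary([(0, 100.0), (10, 300.0), (30, 50.0)], end_time_ns=40))
# → {'instant_mw': 50.0, 'peak_mw': 300.0, 'average_mw': 187.5, 'cumulative_pj': 7500.0}
```

Average power alone hides the 300 mW peak, which is why both numbers are worth reporting when sizing supplies and thermal margins.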
  • 31. Over 500 Systems-Level IP Components: a comprehensive, implementation-accurate library
      ◦ Traffic: distribution, sequence, trace file, instruction profile
      ◦ Power: state power table, power management, energy harvesters, battery, RegEx operators
      ◦ SoC Buses: AMBA and CoreLink (AHB, APB, AXI, ACE, CHI, CMN600), Network-on-Chip, TileLink
      ◦ System Bus: PCI / PCI-X / PCIe, RapidIO, AFDX, OpenVPX, VME, SPI 3.0, 1553B
      ◦ Arm: M- and R-series, 7TDMI, A8, A53, A55, A72, A76, A77, Neoverse
      ◦ Custom Creator: script language, 600 RegEx functions, task graph, tracer, C/C++/Java, Python
      ◦ Stochastic: FIFO/LIFO queue, time queue, quantity queue, system resource, schedulers, cyber security
      ◦ Memory: memory controller, DDR2/DDR3/DDR4/DDR5 DRAM, LPDDR2/LPDDR3/LPDDR4, HBM, HMC, SDR, QDR, RDRAM
      ◦ Networking: Ethernet and GigE, Audio-Video Bridging, 802.11 and Bluetooth, 5G, SpaceWire, CAN-FD, TTEthernet, FlexRay, TSN and IEEE 802.1Q, ARINC 664/AFDX
      ◦ Interfaces: virtual channel, DMA, crossbar, serial switch, bridge
      ◦ Algorithms: signal processing, analog, antenna
      ◦ RTOS: template, ARINC 653, AUTOSAR
      ◦ Storage: Flash and NVMe, storage array, disk and SATA, Fibre Channel, FireWire
      ◦ Software: GEM5, software code integration, instruction trace, statistical software model, task graph
      ◦ RTL-Like: clock, wire delay, registers, latches, flip-flop, ALU and FSM, mux, demux, lookup table
      ◦ Processors: GPU, DSP, microprocessors and microcontrollers, RISC-V, SiFive U74, NVIDIA Drive PX, PowerPC, x86 (Intel and AMD), DSP (TI and ADI), MIPS, Tensilica, SH
      ◦ Reports: timing and buffer, throughput/utilization, average/peak power, statistics
      ◦ FPGA: Xilinx (Zynq, Virtex, Kintex), Intel (Stratix, Arria), Microsemi SmartFusion, programmable logic template, interface traffic generator

Editor's Notes

  1. Replace background with something relevant to VisualSim