AndesClarity for RISC-V
Vector Processor
Chuan-Hua Chang, Ph.D.
Associate VP, Architecture
Andes Technology
RISC-V Summit, 12/2020
Agenda
1. Overview of AndesClarity
2. AndesCore™ NX27V Pipeline
3. Program Analysis Example
4. Concluding Remarks
AndesClarity
Taking RISC-V® Mainstream 4
Overview of AndesClarity
• A pipeline visualizer and analyzer for Andes V5 processors.
– Reports performance statistics and execution bottlenecks.
– Ideal for complex pipelines, especially the NX27V vector processor.
• Integrated into the AndeSight™ IDE as a plugin.
– Uses the execution log from the Andes core simulator (AndeSim).
– RISC-V instructions are easily linked back to the source code.
• Presents performance information graphically:
– High-level “Instructions per Cycle” information
– Instruction flow through the pipeline
– Data dependencies and resource usage
Usage scenarios for AndesClarity
• Algorithm tuning:
– Identify bottlenecks caused by pipeline stalls or resource usage.
– Experiment with different enhancements of the same task.
• Compiler enhancement and flag tuning:
– Identify issues in compiler-generated code.
– Compare performance across different compiler options.
• Architecture exploration:
– Explore different SoC architectures and processor configurations.
– Discover potential processor micro-architecture improvements.
AndesClarity Main Interfaces
• Performance viewer
– IPC (Instructions per Cycle)
– OPC (Operations per Cycle), designed for vector instructions.
• Pipeline stage viewer
– Instruction-centric view
– Resource-centric view
Performance Viewer
• Timeline view of “Instructions per Cycle” or “Operations per Cycle”.
• A vector instruction counts as VL operations instead of 1.
• Users can zoom in to see more detail.
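As a rough illustration of how the two metrics differ, the Python sketch below computes IPC and OPC over one window of cycles. The `Retired` record, the window contents, and the cycle count are hypothetical and are not the AndeSim log format; the only rule taken from the slide is that a vector instruction counts as VL operations.

```python
from dataclasses import dataclass


@dataclass
class Retired:
    """One retired instruction (hypothetical record, not the AndeSim log format)."""
    mnemonic: str
    is_vector: bool
    vl: int = 1  # active vector length; ignored for scalar instructions


def ipc(instrs, cycles):
    """Instructions per cycle: every instruction counts once."""
    return len(instrs) / cycles


def opc(instrs, cycles):
    """Operations per cycle: a vector instruction counts as VL operations."""
    ops = sum(i.vl if i.is_vector else 1 for i in instrs)
    return ops / cycles


# Made-up window: two vector instructions with VL = 8 and two scalar ones.
window = [
    Retired("vmul.vv", True, vl=8),
    Retired("vredsum.vs", True, vl=8),
    Retired("addi", False),
    Retired("sw", False),
]
CYCLES = 16  # made-up window length

ipc_value = ipc(window, CYCLES)  # 4 instructions over 16 cycles
opc_value = opc(window, CYCLES)  # 8 + 8 + 1 + 1 operations over 16 cycles
```

With these made-up numbers the same window scores much higher on OPC than on IPC, which is why OPC is the more telling metric for vector code.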
Instruction-Centric Pipeline Viewer
• Shows the instruction sequence against each instruction's flow through the pipeline stages, along with the resources it uses.
• The focused instruction can be highlighted.
Resource-Centric Pipeline Viewer
• Shows pipeline stages and resources against instruction occupancy.
• An instruction's footprint across multiple paths and resources can be examined.
Display Dependency & Stall Reason
• Dependent instructions (producer and consumer) can be highlighted.
• Stall reasons are displayed to help identify performance issues.
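The producer/consumer relationship can be sketched as a read-after-write scan over an instruction list. The Python below is only an illustration of the idea, not how AndesClarity is implemented; the tiny parser, the `STORES` set, and the register pattern are assumptions that cover just the four-instruction FDCT sequence shown later in this deck.

```python
import re

# Matches register names such as v10, a4, s10 and sp (illustrative subset).
REG = re.compile(r"\b(?:[vxast]\d+|sp)\b")
STORES = {"sw"}  # stores read all their register operands and write none


def parse(line):
    """Split 'op dst, src...' into (op, dst or None, [sources])."""
    op, _, operands = line.strip().partition(" ")
    regs = REG.findall(operands)
    if op in STORES:
        return op, None, regs
    return op, regs[0], regs[1:]


def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each read-after-write pair."""
    last_writer = {}
    deps = []
    for idx, line in enumerate(program):
        _, dst, srcs = parse(line)
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], idx, reg))
        if dst is not None:
            last_writer[dst] = idx
    return deps


program = [
    "vmul.vv v2, v10, v20",
    "vredsum.vs v3, v2, v1",
    "vmv.x.s a4, v3",
    "sw a4, 136(sp)",
]
deps = raw_dependencies(program)
```

Each instruction in this sequence consumes the result of the one before it, so the scan reports one unbroken dependency chain from `vmul.vv` to `sw`.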
Example Vector Processor Pipeline
AndesCore™ NX27V
[Figure: AndesCore™ NX27V block diagram. Pipeline stages: Fetch, Decode, Execute, Memory, Retire. Labeled blocks: IFU, GPR, Integer Execution Unit, Data Cache, Exception Handling, VPU, VIQ, ACE pipeline, Custom Coprocessor, Streaming Ports. Labeled paths: vector, scalar, V/F/D insn, command, data.]
Vector Program Analysis Example
Vectorizing FDCT: Initial Development
• Repeated code sequence:
vmul.vv v2, v10, v20
vredsum.vs v3, v2, v1
vmv.x.s a4, v3
sw a4, 136(sp)
• Average performance per iteration: 16 cycles.
• Discovery:
– “sw” waits for a4 to be ready and blocks later vector instructions from entering the vector pipeline.
– Frequent interaction between the scalar and vector pipelines hurts performance.
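To make the data flow of one iteration concrete, here is a small Python model of what the four instructions compute: an element-wise multiply, a sum reduction into element 0, a move of that element into a scalar register, and a scalar store. It is a functional sketch only; the input values are made up, integer wrap-around is ignored, and the dictionary stands in for the stack slot at offset 136.

```python
def vmul_vv(vs2, vs1):
    """Element-wise product, as in vmul.vv vd, vs2, vs1."""
    return [a * b for a, b in zip(vs2, vs1)]


def vredsum_vs(vs2, vs1):
    """Sum reduction: element 0 of the result is vs1[0] plus the sum of vs2."""
    return [vs1[0] + sum(vs2)]


v10 = [1, 2, 3, 4]  # made-up inputs
v20 = [5, 6, 7, 8]
v1 = [0]            # reduction seed held in element 0

v2 = vmul_vv(v10, v20)   # vmul.vv    v2, v10, v20
v3 = vredsum_vs(v2, v1)  # vredsum.vs v3, v2, v1
a4 = v3[0]               # vmv.x.s    a4, v3   (vector to scalar GPR)
stack = {136: a4}        # sw         a4, 136(sp)
```

The last two steps are the scalar round trip that the discovery above identifies as the bottleneck: the store cannot issue until the reduction result has crossed from the vector side into `a4`.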
AndesClarity for FDCT Initial Opt.
32 cycles / 2 iterations
Vector instruction queue is not as full.
Vectorizing FDCT: 2nd Optimization
• Do not move data to a scalar GPR; use a masked vector store instead.
• Repeated code sequence:
vmul.vv v2, v10, v20
vredsum.vs v3, v2, v1
vsw.v v3, (s10), v0.t
addi s10, sp, imm
• Average performance per iteration: 8.3 cycles.
• Discovery:
– The vector instruction queue is full, which is good. However, …
– Storing just one element with a vector store is inefficient.
AndesClarity for FDCT 2nd Opt.
25 cycles / 3 iterations
Vector instruction queue is full now.
Vectorizing FDCT: 3rd Optimization
• Use vslideup to gather data into vector registers.
• Repeated code sequence:
vmul.vv v2, v10, v20
vredsum.vs v3, v2, v1
vslideup.vi v24, v3, const, v0.t
• Average performance per iteration: 6.75 cycles.
• Discovery:
– “vmul” of the next iteration cannot enter the VIQ.
– Too many dependencies; functional-unit overlap is low.
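The gathering step can be modeled in a few lines of Python. The function below follows the general behavior of a masked slide-up: destination element i receives source element i − offset when i is at or above the offset and its mask bit is set, and every other destination element keeps its old value. Which mask bits the original code sets in v0 is not shown on the slide, so the one-hot mask here is an assumption, as are the vector length and the reduction values.

```python
def vslideup_vi(vd, vs2, offset, mask):
    """Masked slide-up model: vd[i] = vs2[i - offset] for i >= offset where mask[i] is set.

    Elements below `offset` and masked-off elements keep their old value.
    """
    out = list(vd)
    for i in range(offset, len(vd)):
        if mask[i]:
            out[i] = vs2[i - offset]
    return out


VL = 4                         # made-up vector length
v24 = [0] * VL                 # register that collects the results
reductions = [70, 11, 22, 33]  # made-up vredsum results, one per iteration

for k, total in enumerate(reductions):
    v3 = [total, -1, -1, -1]            # only element 0 is meaningful; -1 is a placeholder
    mask = [i == k for i in range(VL)]  # assumed one-hot mask: enable only element k
    v24 = vslideup_vi(v24, v3, k, mask)
```

After four iterations `v24` holds one reduction result per element, so a single full-width vector store can replace four one-element stores.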
AndesClarity for FDCT 3rd Opt.
27 cycles / 4 iterations
Vectorizing FDCT: 4th Optimization
• Interleave three iterations to reduce dependencies.
• Repeated code sequence:
vmul.vv v2,…; vmul.vv v4,…; vmul.vv v6,…
vredsum.vs v3, v2,…; vredsum.vs v5, v4,…; vredsum.vs v7, v6,…
vslideup.vi v24, v3,…; vslideup.vi v25, v5,…; vslideup.vi v26, v7,…
• Average performance per iteration: 6.1 cycles.
• Discovery:
– Utilization of functional units increases.
– Iteration latency is dominated by the non-pipelined “vredsum”.
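The interleaving itself is a mechanical transformation, sketched below in Python: give each iteration its own temporaries, then emit the instructions stage by stage so that consecutive instructions are independent. The register assignment assumes that each slide-up reads its own iteration's reduction result (v3, v5, v7), and the elided operands are kept as `...` because the slide does not show them.

```python
def iteration(prod, red, dst):
    """One FDCT iteration with its own temporaries (register choice is illustrative)."""
    return [
        f"vmul.vv {prod}, ...",
        f"vredsum.vs {red}, {prod}, ...",
        f"vslideup.vi {dst}, {red}, ...",
    ]


def interleave(iterations):
    """Interleave equal-length instruction lists stage by stage."""
    return [instr for stage in zip(*iterations) for instr in stage]


schedule = interleave([
    iteration("v2", "v3", "v24"),
    iteration("v4", "v5", "v25"),
    iteration("v6", "v7", "v26"),
])

for line in schedule:
    print(line)
```

In the resulting schedule every instruction is separated from its producer by two instructions that do not depend on it, which gives the functional units more opportunity to overlap.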
AndesClarity for FDCT 4th Opt.
37 cycles / 6 iterations
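The cycle counts from the four AndesClarity snapshots can be compared with a short calculation. The Python below only divides the figures already quoted in this deck; the dictionary keys are informal labels. Note that the last window works out to about 6.17 cycles per iteration, slightly above the 6.1 average quoted for the 4th optimization, which may have been measured over a longer run.

```python
# (cycles, iterations) as read from the four snapshots in this deck.
snapshots = {
    "initial": (32, 2),
    "2nd opt": (25, 3),
    "3rd opt": (27, 4),
    "4th opt": (37, 6),
}

# Cycles per iteration for each version.
per_iter = {name: cycles / iters for name, (cycles, iters) in snapshots.items()}

# Speedup of each version relative to the initial development.
speedup = {name: per_iter["initial"] / value for name, value in per_iter.items()}

for name in snapshots:
    print(f"{name}: {per_iter[name]:.2f} cycles/iteration, {speedup[name]:.2f}x")
```

Measured on these windows, the four steps together bring the cost per iteration down from 16 cycles to a little over 6.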
What Users Can Learn
• Processor micro-architecture characteristics.
• Execution latencies of instructions.
• Interaction between scalar pipeline and vector pipeline.
– Vector load/store on data cache.
– Data movement between scalar and vector GPRs.
• Reasons for lost performance.
• Resource utilization and data bandwidth under different code sequences.
Concluding Remarks
• AndesClarity helps users quickly
– understand the bottlenecks in an application's code sequences.
– learn the complex pipeline and micro-architecture characteristics of a powerful vector processor.
– discover enhancements that improve application performance.
• AndesClarity is a powerful tool for vector processors such as the AndesCore™ NX27V vector processor.
• AndesClarity is integrated into the AndeSight™ development environment as a plugin for easy application profiling and debugging.
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 

Andes: AndesClarity for RISC-V Vector Processor

  • 1. AndesClarity for RISC-V Vector Processor
      Chuan-Hua Chang, Ph.D.
      Associate VP, Architecture, Andes Technology
      RISC-V Summit, 12/2020
  • 2. Agenda
      1. Overview of AndesClarity
      2. AndesCore™ NX27V Pipeline
      3. Program Analysis Example
      4. Concluding Remarks
  • 4. Overview of AndesClarity
      • A pipeline visualizer/analyzer for Andes V5 processors.
          – Reports performance statistics and execution bottlenecks.
          – Ideal for complex pipelines, especially the NX27V vector processor.
      • Integrated into the AndeSight™ IDE as a plugin.
          – Uses the execution log from the Andes core simulator (AndeSim).
          – RISC-V instructions are easily linked back to source code.
      • Presents performance information graphically:
          – High-level “Instructions Per Cycle” information
          – Instruction execution pipelining flow
          – Data dependencies and resource usage
  • 5. Usage scenarios for AndesClarity
      • Algorithm tuning:
          – Identify bottlenecks from pipeline stalls or resource usage.
          – Experiment with different enhancements of the same task.
      • Compiler enhancement and flag tuning:
          – Identify issues in compiler-generated code.
          – Compare performance with different compiler options.
      • Architecture exploration:
          – Explore different SoC architectures and processor configurations.
          – Discover potential processor micro-architecture improvements.
  • 6. AndesClarity Main Interfaces
      • Performance viewer
          – IPC (Instructions Per Cycle)
          – OPC (Operations Per Cycle), designed for vector instructions.
      • Pipeline stage viewer
          – Instruction-centric view
          – Resource-centric view
  • 7. Performance Viewer
      • Timeline view of “Instructions per Cycle” or “Operations per Cycle”.
      • A vector instruction counts as VL operations instead of 1.
      • The user can zoom in to see more detail.
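The difference between the two metrics can be shown with a minimal sketch. The trace format, the cycle count, and the vector length below are hypothetical values chosen for illustration; they are not AndeSim's log format or measured data.

```python
# Hypothetical trace: (mnemonic, is_vector) for each retired instruction.
# Format and values are illustrative assumptions, not AndeSim output.
TRACE = [
    ("addi", False),
    ("vmul.vv", True),
    ("vredsum.vs", True),
    ("sw", False),
]


def ipc(trace, cycles):
    """Instructions per cycle: every instruction counts once."""
    return len(trace) / cycles


def opc(trace, cycles, vl):
    """Operations per cycle: a vector instruction counts as vl operations."""
    ops = sum(vl if is_vector else 1 for _, is_vector in trace)
    return ops / cycles


if __name__ == "__main__":
    print(ipc(TRACE, cycles=16))        # 4 instructions over 16 cycles
    print(opc(TRACE, cycles=16, vl=8))  # 2*8 vector ops + 2 scalar ops over 16 cycles
```

With these assumed numbers the IPC is 0.25 while the OPC is 1.125, which is why OPC is the more informative timeline for vector code.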
  • 8. Instruction-Centric Pipeline Viewer
      • Instruction sequence vs. instruction pipeline stage flow, along with the resources used.
      • The focused instruction can be highlighted.
  • 9. Resource-Centric Pipeline Viewer
      • Pipeline stages/resources vs. instruction occupancy.
      • An instruction's footprint across multiple paths and resources can be examined.
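The instruction-centric and resource-centric views are two groupings of the same occupancy data. The sketch below assumes a hypothetical record format of (cycle, instruction, resource); the resource names "VMUL" and "VALU" are made up for illustration and are not taken from the NX27V documentation.

```python
from collections import defaultdict

# Hypothetical occupancy records: (cycle, instruction, resource).
# Resource names are illustrative assumptions.
RECORDS = [
    (0, "vmul.vv", "VIQ"),
    (1, "vmul.vv", "VMUL"),
    (1, "vredsum.vs", "VIQ"),
    (2, "vredsum.vs", "VALU"),
]


def instruction_centric(records):
    """Group by instruction: which resource it occupies in each cycle."""
    view = defaultdict(list)
    for cycle, insn, resource in sorted(records):
        view[insn].append((cycle, resource))
    return dict(view)


def resource_centric(records):
    """Group by resource: which instruction occupies it in each cycle."""
    view = defaultdict(list)
    for cycle, insn, resource in sorted(records):
        view[resource].append((cycle, insn))
    return dict(view)


if __name__ == "__main__":
    print(instruction_centric(RECORDS))
    print(resource_centric(RECORDS))
```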
  • 10. Display Dependency & Stall Reason
      • Dependent instructions (producer and consumer) can be highlighted.
      • Stall reasons are displayed to help identify performance issues.
  • 12. AndesCore™ NX27V
      [Block diagram. Labels: Fetch, Decode, Execute, Memory, Retire; IFU; GPR; Integer Execution Unit; Data Cache; Exception Handling; VPU; VIQ; V/F/D insn; Vector; scalar; ACE pipeline (Custom Coprocessor: command, data, Execute); Streaming Ports; More.]
  • 14. Vectorizing FDCT: Initial Development
      • Repeated code sequence:
            vmul.vv    v2, v10, v20
            vredsum.vs v3, v2, v1
            vmv.x.s    a4, v3
            sw         a4, 136(sp)
      • Average performance per iteration: 16 cycles.
      • Discovery:
          – “sw” waits for a4 to be ready and blocks later vector instructions from entering the vector pipeline.
          – Frequent interaction between the scalar and vector pipelines hurts performance.
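The producer/consumer chain behind this stall can be sketched as a read-after-write scan over the four instructions of the initial sequence. The destination and source register sets are annotated by hand here as an assumption of the sketch; this is not how AndesClarity derives dependencies.

```python
# (mnemonic, destination registers, source registers), annotated by hand.
SEQUENCE = [
    ("vmul.vv",    {"v2"}, {"v10", "v20"}),
    ("vredsum.vs", {"v3"}, {"v2", "v1"}),
    ("vmv.x.s",    {"a4"}, {"v3"}),
    ("sw",         set(),  {"a4", "sp"}),
]


def raw_dependencies(sequence):
    """Return (producer, consumer, register) triples for read-after-write pairs."""
    last_writer = {}
    deps = []
    for mnemonic, dests, srcs in sequence:
        # A source register written earlier creates a dependency on its last writer.
        for reg in sorted(srcs):
            if reg in last_writer:
                deps.append((last_writer[reg], mnemonic, reg))
        for reg in dests:
            last_writer[reg] = mnemonic
    return deps


if __name__ == "__main__":
    for producer, consumer, reg in raw_dependencies(SEQUENCE):
        print(f"{consumer} waits on {producer} via {reg}")
```

Every instruction depends on the one before it, and the last link (a4) crosses from the vector pipeline into the scalar one.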
  • 15. AndesClarity for FDCT Initial Opt.
      32 cycles / 2 iterations. Vector instruction queue is not as full.
  • 16. Vectorizing FDCT: 2nd Optimization
      • Do not move data to a scalar GPR; use a masked vector store instead.
      • Repeated code sequence:
            vmul.vv    v2, v10, v20
            vredsum.vs v3, v2, v1
            vsw.v      v3, (s10), v0.t
            addi       s10, sp, imm
      • Average performance per iteration: 8.3 cycles.
      • Discovery:
          – The vector instruction queue is full. This is good. However, …
          – It is not efficient to vector-store just one element.
  • 17. AndesClarity for FDCT 2nd Opt.
      25 cycles / 3 iterations. Vector instruction queue is full now.
  • 18. Vectorizing FDCT: 3rd Optimization
      • Use vslideup to gather data into vector registers.
      • Repeated code sequence:
            vmul.vv     v2, v10, v20
            vredsum.vs  v3, v2, v1
            vslideup.vi v24, v3, const, v0.t
      • Average performance per iteration: 6.75 cycles.
      • Discovery:
          – “vmul” of the next iteration cannot enter the VIQ.
          – Too many dependencies; overlap between functional units is low.
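What one iteration of this sequence computes can be modeled in a few lines. This is a simplified integer model of the three instructions, ignoring element width, overflow, and tail policy. It also assumes the mask enables only the element being written, which is a guess at how the slide uses v0.t; the input data and coefficient rows are made up.

```python
def vmul_vv(vs2, vs1):
    """Element-wise multiply."""
    return [a * b for a, b in zip(vs2, vs1)]


def vredsum_vs(vs2, vs1, vd):
    """Sum-reduce vs2, add vs1[0], and write the result to element 0 of vd."""
    out = list(vd)
    out[0] = vs1[0] + sum(vs2)
    return out


def vslideup_vi(vd, vs2, offset, mask):
    """Masked slide-up: vd[i] = vs2[i - offset] for enabled i >= offset."""
    out = list(vd)
    for i in range(offset, len(vd)):
        if mask[i]:
            out[i] = vs2[i - offset]
    return out


VL = 4
x = [1, 2, 3, 4]                                     # hypothetical input samples
coeffs = [[1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 2]]  # hypothetical coefficient rows
zero = [0] * VL

v24 = [0] * VL
for k, row in enumerate(coeffs):
    v2 = vmul_vv(x, row)                 # vmul.vv
    v3 = vredsum_vs(v2, zero, zero)      # vredsum.vs: dot product in v3[0]
    mask = [i == k for i in range(VL)]   # assumed: only element k is enabled
    v24 = vslideup_vi(v24, v3, k, mask)  # vslideup.vi: place v3[0] at v24[k]

print(v24)
```

Each iteration leaves its dot product in one element of v24, so the results stay in a vector register instead of being stored one at a time.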
  • 19. AndesClarity for FDCT 3rd Opt.
      27 cycles / 4 iterations
  • 20. Vectorizing FDCT: 4th Optimization
      • Interleave three iterations to reduce dependencies.
      • Repeated code sequence:
            vmul.vv v2,…; vmul.vv v4,…; vmul.vv v6,…
            vredsum.vs v3, v2,…; vredsum.vs v5, v4,…; vredsum.vs v7, v6,…
            vslideup.vi v24, v3,…; vslideup.vi v25, v5,…; vslideup.vi v26, v7,…
      • Average performance per iteration: 6.1 cycles.
      • Discovery:
          – Utilization of functional units increases.
          – Iteration latency is dominated by the non-pipelined “vredsum”.
  • 21. AndesClarity for FDCT 4th Opt.
      37 cycles / 6 iterations
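The per-iteration averages quoted on the optimization slides follow directly from the cycle and iteration counts shown in the four AndesClarity views. A short sketch of the arithmetic, using only those numbers (the function and variable names are illustrative):

```python
# (variant, cycles, iterations) as given on the four AndesClarity view slides.
MEASUREMENTS = [
    ("initial", 32, 2),
    ("2nd opt", 25, 3),
    ("3rd opt", 27, 4),
    ("4th opt", 37, 6),
]


def summarize(measurements):
    """Return (variant, cycles per iteration, speedup over the first variant)."""
    _, base_cycles, base_iters = measurements[0]
    baseline = base_cycles / base_iters
    rows = []
    for name, cycles, iterations in measurements:
        per_iter = cycles / iterations
        rows.append((name, per_iter, baseline / per_iter))
    return rows


if __name__ == "__main__":
    for name, per_iter, speedup in summarize(MEASUREMENTS):
        print(f"{name}: {per_iter:.2f} cycles/iteration, {speedup:.2f}x vs initial")
```

Note that 37 / 6 is about 6.17, which the slide reports as 6.1; overall, the 4th optimization is roughly 2.6 times faster per iteration than the initial version.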
  • 22. What Users Can Learn
      • Processor micro-architecture characteristics.
      • Execution latencies of instructions.
      • Interaction between the scalar pipeline and the vector pipeline.
          – Vector load/store on the data cache.
          – Data movement between scalar and vector GPRs.
      • Reasons for lost performance.
      • Resource utilization and data bandwidth under different code sequences.
  • 23. Concluding Remarks
      • AndesClarity can help a user quickly
          – understand the bottlenecks of an application's code sequences.
          – learn the complex pipeline/micro-architecture characteristics of a powerful vector processor.
          – discover enhancements that improve application performance.
      • AndesClarity is a powerful tool for vector processors, e.g., the AndesCore™ NX27V vector processor.
      • AndesClarity is integrated into the AndeSight™ development environment as a plugin for easy application profiling and debugging.