Techniques for Efficient RTL Clock and
Memory Gating Takedown of Next Generation
High-performance Microprocessor Designs
Arun Joseph, Spandana Rachamalla, Rahul Rao, Shashidhar Reddy
IBM Systems, Contact: arujosep@in.ibm.com
Slide 2
Motivation
• Over the last decade or so, several techniques have been proposed for enabling RTL analysis. With the
advent of FinFET-based designs [1], there is renewed focus on dynamic power analysis and mitigation [2],
especially early in the design flow [3], using techniques like clock gating [4] and memory activity takedown.
Additionally, high-performance microprocessor design blocks are getting larger, with an increasing number of
clock gating domains and notable differences in activity across these domains and workloads [5].
• The turnaround time of RTL analysis using quick synthesis followed by netlist-based tool
engines [6,7,8] is not efficient enough for rapid clock and memory activity exploration. Formal techniques [9,10],
though comprehensive, do not accurately capture the dependency of clock and memory activity on the
workloads.
• The techniques in [3] for early activity analysis require development of dedicated software for test-bench
creation and activity analysis. Also, even though a thorough analysis is presented in a graphically rich
manner, further debug and activity reduction of highly active blocks is non-intuitive to the designer.
• In this paper, we present a new platform for enabling rapid clock and memory activity takedown and its
application to the design of a next generation industry class high-performance microprocessor [1]. To the
best of our knowledge, this is the first such platform that brings together principles of designer-level logic
verification, logic debug and light-weight netlist creation, while specifically catering to the requirements of
rapid RTL activity takedown.
Slide 3
Main Idea
Key Idea
• Bring together principles of designer-level logic
verification trace import and simulation replay [11],
logic debug [12, 14] and virtual logic netlist based
RTL analysis [13], to specifically cater to the
requirements of rapid RTL activity takedown.
• Fig. 1 shows the key EDA building blocks of the
platform, and Fig. 2 shows how these were “tied
together” to create the EinsCG+ platform for
enabling rapid clock and memory activity takedown.
(Details in footnotes)
Figure 1. Clock & memory gating platform – pieces of the puzzle
Figure 2. EinsCG+ software architecture
Key Benefits
• Provides logic designers with a pre-configured (yet
familiar) platform for meeting clock and memory
gating targets.
• Enables rapid exploration of power saving
opportunities across IP blocks, use-scenarios,
workloads and workload windows. (Input block RTL to
next set of pin-pointed opportunities in ~3 minutes)
• Low platform development cost (~1 month). Except for
building block 9, the rest are available in modern
industry class EDA suites like [14].
Slide 4
EinsCG+ Iteration
• Fig. 3 shows the generic use model of EinsCG+ for rapidly taking down
clock and memory activity. The quick analyses illustrated in Fig. 4
aid quick decision making and tracking.
• One key benefit of EinsCG+ is that, if activity needs to be further
reduced, it provides the RTL designer with a familiar logic debug
environment specifically preconfigured for clock gating (Fig. 5).
• The wave view is automatically preloaded with pin-pointed clock
gating opportunities in the design, sorted by return on investment.
The compiled version of the design is also preloaded to enable
structure-assisted debug such as “why” and “trace-back” analysis.
Seamless per-simulation-cycle clock gating debug across the wave
window, RTL source, hierarchy and logic browser is also enabled.
Figure 3. EinsCG+ iteration: Generic use-model
Figure 4. EinsCG+ quick analyses: (a) Tracking across releases (b) Per clock gating domain multi-workload clock and
data activity report (c) Multi workload memory activity report
Figure 5. EinsCG+ advanced clock gating configured debug view (panels: Source View, Wave View preloaded with
sorted and pin-pointed clock gating opportunities, Hierarchy View, Logic View)
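The slides do not define how clock gating opportunities are scored or sorted. As a minimal sketch, assuming an opportunity is a cycle in which a domain's clock fires although none of its latches change value, and assuming return on investment can be approximated by the number of such cycles multiplied by the domain's latch count, a ranking could look like the following. The `DomainTrace` type, its field names and the scoring formula are hypothetical and are not taken from EinsCG+.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DomainTrace:
    """Per-cycle trace of one clock gating domain (hypothetical format)."""
    name: str
    latch_count: int          # latches driven by this gated clock
    clock_enabled: List[int]  # 1 if the gated clock fired in that cycle
    data_changed: List[int]   # 1 if any latch in the domain changed value


def clock_activity(t: DomainTrace) -> float:
    """Fraction of cycles in which the domain clock fired."""
    return sum(t.clock_enabled) / len(t.clock_enabled)


def wasted_cycles(t: DomainTrace) -> int:
    """Cycles where the clock fired although no latch changed value."""
    return sum(1 for c, d in zip(t.clock_enabled, t.data_changed) if c and not d)


def roi_score(t: DomainTrace) -> int:
    """Assumed ROI proxy: wasted clock cycles weighted by latch count."""
    return wasted_cycles(t) * t.latch_count


def rank_opportunities(traces: List[DomainTrace]) -> List[DomainTrace]:
    """Sort domains so the most rewarding gating candidates come first."""
    return sorted(traces, key=roi_score, reverse=True)


if __name__ == "__main__":
    # Illustrative data only; domain names are made up.
    traces = [
        DomainTrace("fetch_q", 64, [1, 1, 1, 1], [1, 0, 0, 1]),
        DomainTrace("lsu_buf", 256, [1, 1, 0, 0], [1, 0, 0, 0]),
        DomainTrace("cfg_reg", 16, [1, 1, 1, 1], [0, 0, 0, 0]),
    ]
    for t in rank_opportunities(traces):
        print(t.name, clock_activity(t), wasted_cycles(t), roi_score(t))
```

With this proxy, a large domain that is clocked unnecessarily for one cycle can outrank a small domain that is clocked unnecessarily all the time, which is one way to read "sorted by return on investment"; a real flow would likely weight by clock power rather than raw latch count.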
Slide 5
Experimental Evaluation
• Use-case 1: EinsCG+ helped identify sub-design blocks of a design under test (DUT) that did not meet clock and/or memory gating
criteria, and allowed independent iteration on those blocks to close on targets before redoing the analysis on the DUT. The capability of
EinsCG+ to perform on-the-fly re-simulation from an existing DUT trace eliminated the need for higher-level DUT analysis for
each RTL update iteration, while allowing block-level clock gating updates by different
designers to be evaluated in parallel.
• Use-case 2: Use of vendor IP in microprocessor designs is becoming increasingly common in the era of Open Compute [15].
Such IP blocks are often used in different modes across the design. While the vendor IP blocks may be designed efficiently for
power, incorrect mode configuration can result in high activity and power. Independent, iterative EinsCG+ analysis on specific
vendor IP instances in the design made it possible to ensure the correct mode configuration.
• Use-case 3: EinsCG+ analysis on the design was used to identify activity peaks and the corresponding simulation windows. To
take down these activity peaks, EinsCG+ was used iteratively on the simulation windows of interest, especially for larger workloads.
Figure 6. Use-case 1: EinsCG+ iterations on sub-designs
Figure 7. Use-case 2: Vendor IP mode configuration
Figure 8. Use-case 3: EinsCG+ workload analysis
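Use-case 3 depends on locating activity peaks and the simulation windows that contain them, but the slides do not say how this is done. One plausible sketch, assuming a per-cycle activity value (for example, the fraction of latches clocked in that cycle) has already been extracted from the trace, is a fixed-size sliding window with a threshold. The function name, window size and threshold below are illustrative assumptions, not part of EinsCG+.

```python
from typing import List, Tuple


def peak_windows(activity: List[float], window: int,
                 threshold: float) -> List[Tuple[int, int]]:
    """Return merged [start, end) cycle ranges whose mean activity >= threshold."""
    if window <= 0 or window > len(activity):
        raise ValueError("window must be between 1 and len(activity)")

    ranges: List[Tuple[int, int]] = []
    running = sum(activity[:window])  # sum over the first window
    for start in range(len(activity) - window + 1):
        if start > 0:
            # Slide by one cycle: add the entering cycle, drop the leaving one.
            running += activity[start + window - 1] - activity[start - 1]
        if running / window >= threshold:
            end = start + window
            if ranges and start <= ranges[-1][1]:
                # Overlaps or touches the previous hit: extend that range.
                ranges[-1] = (ranges[-1][0], end)
            else:
                ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # Made-up per-cycle activity with two bursts.
    trace = [0.1, 0.1, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.8, 0.8]
    print(peak_windows(trace, window=2, threshold=0.75))  # [(2, 5), (8, 10)]
```

Each returned range could then be replayed on its own, which matches the idea of iterating only on the simulation windows of interest instead of re-running a large workload end to end.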
Slide 6
Summary
• We introduced what is, to the best of our knowledge, the first platform that brings together principles of designer-level logic
verification trace import and simulation replay, logic debug and virtual logic netlist based
RTL analysis to specifically cater to the requirements of rapid RTL clock and memory activity
takedown.
• We showed that the platform was developed in ~1 month using existing EDA building
blocks used in an industry context.
• We presented the application of the platform to the design of a next generation industry
class microprocessor, across a range of use-cases. The platform took ~3 minutes to go from an
input block RTL to the next iteration of pin-pointed opportunities.
• We believe the techniques described in the paper are generic and advocate applying
the same techniques to enable rapid activity takedown.

Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generation High-performance Microprocessor Designs

  • 1. Techniques for Efficient RTL Clock and Memory Gating Takedown of Next Generation High-performance Microprocessor Designs Arun Joseph, Spandana Rachamalla, Rahul Rao, Shashidhar Reddy IBM Systems, Contact: arujosep@in.ibm.com
  • 2.  Over the last decade or so, several techniques were proposed for enabling RTL analysis. But with the advent of FinFET based designs [1], there is renewed focus on dynamic power analysis and mitigation [2], especially early in the design flow [3] using techniques like clock gating [4] and memory activity takedown. Additionally, high performance microprocessor design blocks are getting larger, with increasing number of clock gating domains and with notable differences in activity across these domains and workloads [5].  The turn around time for performing RTL analysis using quick synthesis followed by netlist based tool engines [6,7,8] is not efficient for rapid clock and memory activity exploration. Formal techniques [9,10], though comprehensive, do not accurately capture the dependency of clock and memory activity on the workloads.  Techniques in [3], for early activity analysis, require development of dedicated software for test-bench creation and activity analysis. Also, even though thorough analysis is presented in a graphically rich manner, further debug and activity reduction of highly active blocks is non-intuitive to the designer.  In this paper, we present a new platform for enabling rapid clock and memory activity takedown and its application for design of a next generation industry class high performance microprocessor [1]. To the best of our knowledge, this is the first such platform which brings together principles of designer level logic verification, logic debug and light-weight netlist creation, while specifically catering to the requirements of rapid RTL activity takedown. Slide 2 Motivation
  • 3. Slide 3 Main Idea  Bring together principles of designer level logic verification trace import and simulation replay [11], logic debug [12, 14] and virtual logic netlist based RTL analysis [13], to specifically cater to the requirements of rapid RTL activity takedown.  Fig1 shows key EDA building blocks for building the platform and Fig2 shows how these were “tied together” to create the EinsCG+ platform for enabling rapid clock and memory activity takedown. (Details in foot notes) Figure 1. Clock & memory gating platform – pieces of the puzzle Figure 2. EinsCG+ software architecture  Enables logic designers with a pre-configured (yet familiar) platform for meeting clock and memory gating targets.  Enables rapid exploration of power saving opportunities across IP blocks, use-scenarios, workloads and workload windows. (Input block RTL to next set of pin-pointed opportunities in ~3 minutes)  Low platform development cost (~1 month). Except building block 9, rest are available in modern day industry class EDA-suites like [14]. Key Idea Key Benefits
  • 4.  Fig3 shows the generic use model of EinsCG+ to rapidly takedown clock and memory activity. The quick analyses illustrated in Fig4 aids quick decision making and tracking.  One key benefit of EinsCG+ is that, if activity needs to be further reduced, it provides the RTL designer a familiar logic debug environment specifically preconfigured for clock gating (Fig5).  The wave view is automatically preloaded with pin-pointed clock gating opportunities in the design, sorted on return of investment. The compiled version of design is also preloaded to enable structural assisted debug like “why”, “trace-back” analysis. Seamless per simulation cycle clock gating debug across the wave window, RTL source, hierarchy and logic browser is also enabled. Slide 4 EinsCG+ Iteration Figure 3. EinsCG+ iteration: Generic use-model Figure 4. EinsCG+ quick analyses (a) Tracking across releases (b) Per clock gating domain multi-workload clock and data activity report (c) Multi workload memory activity report Figure 5. EinsCG+ advanced clock gating configured debug view Source View Wave View Preloaded with Sorted & Pin-pointed Clock Gating Opportunities Hierarchy View Logic View (a) (b) (c)
  • 5. Slide 5 Experimental Evaluation. Use-case1: EinsCG+ helped identify sub-design blocks of a design under test (DUT) that did not meet clock and/or memory gating criteria, and allowed designers to iterate independently on those blocks to close on targets before redoing the analysis on the DUT. The capability of EinsCG+ to perform on-the-fly re-simulation from an existing DUT trace eliminated the need for higher level DUT analysis on each RTL update iteration, while allowing different designers to evaluate block level clock gating updates in parallel. Use-case2: Use of vendor IP in microprocessor designs is becoming increasingly common in the era of Open Compute [15]. Such IP blocks are often used in different modes across the design. While the vendor IP blocks may be designed efficiently for power, incorrect mode configuration can result in high activity and power. Independent, iterative EinsCG+ analysis on specific vendor IP instances in the design made it possible to ensure the correct mode configuration. Use-case3: EinsCG+ analysis on the design was used to identify activity peaks and the corresponding simulation windows. To take down these activity peaks, EinsCG+ was used iteratively on the simulation windows of interest, especially for larger workloads. Figure 6. Use-case1: EinsCG+ iterations on sub-designs. Figure 7. Use-case2: Vendor IP mode configuration. Figure 8. Use-case3: EinsCG+ workload analysis.
  • 6. Slide 6 Summary. We introduced what is, to the best of our knowledge, the first platform that brings together principles of designer level logic verification trace import and simulation replay, logic debug and virtual logic netlist based RTL analysis to cater specifically to the requirements of rapid RTL clock and memory activity takedown. We demonstrated how the platform was developed in ~1 month using existing EDA building blocks used in an industry context. We presented the application of the platform to the design of a next generation industry class microprocessor, across a range of use-cases. The platform enabled the path from an input block RTL to the next iteration of pin-pointed opportunities in ~3 minutes. We believe the techniques described in the paper are generic, and we advocate applying the same techniques elsewhere to enable rapid activity takedown.
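Use-case3 on Slide 5 relies on locating activity peaks and the simulation windows that contain them. The slides do not say how EinsCG+ picks those windows, so the following is only a minimal sketch under stated assumptions: the per-cycle activity is assumed to be available as a plain list of numbers (for example, the count of clocked latches per cycle), and `find_peak_window` is a hypothetical helper name, not part of EinsCG+.

```python
from typing import Sequence, Tuple


def find_peak_window(activity: Sequence[float], width: int) -> Tuple[int, int, float]:
    """Return (start, end, mean) for the fixed-width window with the highest
    mean activity. `end` is exclusive. Ties resolve to the earliest window.

    `activity` is an assumed per-cycle activity measure, one entry per cycle.
    """
    if width <= 0 or width > len(activity):
        raise ValueError("width must be between 1 and len(activity)")

    # Sum of the first window, then slide one cycle at a time.
    window_sum = sum(activity[:width])
    best_sum, best_start = window_sum, 0
    for start in range(1, len(activity) - width + 1):
        # Add the cycle entering the window, drop the cycle leaving it.
        window_sum += activity[start + width - 1] - activity[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start

    return best_start, best_start + width, best_sum / width
```

With a 3000-cycle trace whose activity is elevated only in cycles 1200 to 1299, a call with `width=100` selects that range, which could then be replayed as a shorter simulation window.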

Editor's Notes

  1. [1] Thompto et al., "POWER9 Processor for the Cognitive Era," HotChips 2016. [2] http://www.edn.com/electronics-blogs/eda-power-up/4438874/FinFET-impact-on-dynamic-power [3] "Putting On the Dynamic Power Glasses: A FinFET-Aware Approach for Early Realistic Block Activity Analysis and Exploration," DAC '16. [4] Jacobson et al., "Stretching the limits of clock-gating efficiency in server-class processors," in High-Performance Computer Architecture, 2005 (HPCA-11). [5] "Efficient Techniques for Per Clock Gating Domain Contributor based Power Abstraction of IP Blocks for Hierarchical Power Analysis," DAC '16. [6] Guy D et al., "VDHL/Verilog expertise and gate synthesis automation system," Patent US 6,289,498 B1. [7] P. Hurst, "Automatic synthesis of clock gating logic with controlled netlist perturbation," in Proc. of DAC '08, pages 654–657, 2008. [8] Sundaresan, K. et al., "A Tool for Exploring Advanced RTL Clock Gating Opportunities in Microprocessor Design." [9] Arbel, E.; Eisner, C.; Rokhlenko, O., "Resurrecting infeasible clock-gating functions," Design Automation Conference, 2009 (DAC '09). [10] Y. Kuo, S. Weng, and S. Chang, "A novel sequential circuit optimization with clock gating logic," in Proc. of ICCAD '08, pages 230–233, 2008. Memory activity takedown: techniques that reduce, for example, the number of reads and writes per cycle to arrays and other memory elements.
  2. [11] S. Bergman et al., "Designer-level verification - An industrial experience story," DATE 2015. [12] B. Wile, J. C. Goss, and W. Roesner, Comprehensive Functional Verification: The Complete Industry Cycle. Elsevier, 2005. [13] "Virtual logic netlist: Enabling efficient RTL analysis," Sixteenth International Symposium on Quality Electronic Design, 2015. [14] Darringer et al., "EDA in IBM: past, present, and future," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Dec 2000. Details of the flow: As shown in Figure 2, the input RTL is first compiled, and structural analysis is then performed to generate structural data and simulation directives. The simulation directives determine which signals in the simulation model need to be monitored to obtain clock and data switching activity. These are typically the signals at the inputs and outputs of latches and design macros, and at the read/write ports of arrays and other memory elements. Additional structural information, such as per clock domain information (how many clock gating domains a macro contains and which latches fall into each of them), is also extracted from the RTL model. This is used later for capturing and tracking the clock activity and data activity of macros on a per clock gating domain basis. Next, from any higher level simulation trace (system, chip, core, or unit level; for example from pre-silicon logic simulators or hardware-accelerated simulators), a scenario is generated for the simulation window of interest using a designer level verification tool [11]. The simulation window can be the full simulation, a subset of its cycles where high clock or memory activity is seen, or any other window of interest. For example, from a higher level simulation covering cycles 0-3000, the EinsCG+ simulation window can be cycles 1200-1300, where an activity peak is seen. 
Logic simulation is then performed using this scenario as a test case, along with the simulation directives and a simulation model (generated from the input RTL), to generate switching activity data. Light-weight netlist creators like VLN [13] are used to enable the reuse of some backend power analysis engines for performing workload aware clock and memory gating analyses. A virtual logic netlist (VLN) is an incomplete yet logical netlist graph of the design. VLN enables rapid RTL analysis using backend tool engines without the need for time-intensive synthesis techniques. The outputs of EinsCG+ are various activity analyses and a logic debug environment, specially preconfigured with pin-pointed opportunities for further clock gating debug. Multiple activity analyses, such as per clock gating domain clock and data analysis, memory gating analysis and redundancy removal estimation, are performed and reports are generated. Benefits: When evaluated across different macros used in the design of [1], the time from an input block RTL to the next set of pin-pointed gating opportunities for the same block was within ~3 minutes for most macros. This was evaluated across a broad range of core and uncore macros (blocks) of a next generation high performance microprocessor design [1], and across different workload traces. The turnaround time for smaller macros was ~1 minute or less; for very large macros it was within ~8 minutes. Bigger macros are those with much more complex logic and larger numbers of latches and clock gating domains (>~1000). The development cost for the overall platform was ~1 month. This effort was primarily spent on tying together the different building blocks 1-9 and on overall user-experience improvements.
  3. Details: Figure 4: In addition to the reporting and visualization techniques in [3], EinsCG+ analysis also enables reporting (and visualization) of both clock and data activity of macros on a per clock gating domain basis (as shown in Fig4) across multiple workloads. This is critical for next generation high performance microprocessor designs like [1], where macros are getting larger with every generation, with an increasing number of clock gating domains (in some cases more than 1000 in a single macro) and with notable differences in activity across these clock gating domains and across workloads [5]. Additionally, EinsCG+ enables analysis of memory activity events such as readspercycle and writespercycle, to help focus on the reduction of memory activity. Figure 5: To further reduce activity, the logic designer can simply focus on debugging the pin-pointed gating opportunities presented in the familiar logic debug frontend, in the order listed in the wave viewer (Fig5). Once some set of actions to reduce activity has been taken, the logic designer can simply iterate (as shown in Fig3) to get to the next set of opportunities in ~3 minutes. Once the activity targets have been met, no further opportunities are presented.
  4. Presented are three specific use-cases of EinsCG+ during the design of next generation microprocessors like [1]. [15] Open Compute Project: http://www.opencompute.org/
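The notes above describe per clock gating domain clock and data activity reports, memory events per cycle, and a wave view sorted by return on investment. None of the underlying formulas are given, so the sketch below is illustrative only. It assumes a hypothetical per-domain trace format (`DomainTrace`), and it assumes a return-on-investment proxy of "cycles in which the domain was clocked but no data changed, multiplied by the number of latches in the domain". This proxy is an assumption made for illustration, not the EinsCG+ metric.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class DomainTrace:
    """Per-cycle observations for one clock gating domain (hypothetical format)."""
    latches: int                    # number of latches in the domain
    clock_enabled: Sequence[bool]   # True if the domain was clocked in that cycle
    data_changed: Sequence[bool]    # True if any latch data changed in that cycle


def domain_report(trace: DomainTrace) -> Dict[str, float]:
    """Compute clock activity, data activity and an assumed ROI score."""
    cycles = len(trace.clock_enabled)
    if cycles == 0 or cycles != len(trace.data_changed):
        raise ValueError("traces must be non-empty and of equal length")

    # Cycles where the clock fired but no data changed: candidate gating cycles.
    wasted = sum(
        1 for clk, chg in zip(trace.clock_enabled, trace.data_changed)
        if clk and not chg
    )
    return {
        "clock_activity": sum(1 for clk in trace.clock_enabled if clk) / cycles,
        "data_activity": sum(1 for chg in trace.data_changed if chg) / cycles,
        "wasted_cycles": wasted,
        # Assumed ROI proxy: wasted clocked cycles weighted by domain size.
        "score": wasted * trace.latches,
    }


def rank_opportunities(
    domains: Dict[str, DomainTrace],
) -> List[Tuple[str, Dict[str, float]]]:
    """Return (domain name, report) pairs, highest assumed ROI score first."""
    reports = [(name, domain_report(trace)) for name, trace in domains.items()]
    return sorted(reports, key=lambda item: item[1]["score"], reverse=True)


def events_per_cycle(events_in_cycle: Sequence[int]) -> float:
    """Average memory events (e.g. array reads or writes) per cycle."""
    if not events_in_cycle:
        raise ValueError("need at least one cycle")
    return sum(events_in_cycle) / len(events_in_cycle)
```

A domain that is clocked every cycle while its data changes in only one cycle out of four reports a clock activity of 1.0 against a data activity of 0.25. A gap of that kind between the two numbers is what a per-domain report would bring to the designer's attention.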