Efficient Techniques for Per Clock Gating Domain
Contributor based Power Abstraction of IP Blocks
for Hierarchical Power Analysis
Arun Joseph, Nagu Dhanwada, Spandana Rachamalla, William Dungan,
Ricardo Nigaglioni
IBM Systems Group
Motivation
• IP blocks are becoming larger, with an increasing number of clock gating domains.
• Much more aggressive solutions are being adopted to improve the clock gating of these very large IP blocks.
• For several workloads, significant heterogeneity in both clock and data activity is seen across the multiple clock gating domains within an IP block.
Background: Contributor based Power Analysis Flow
[Figure: contributor based power analysis flow. Box labels: Cell Library; Library Characterization (Corner 1 … Corner N); Contributor Power Model; Macro/IP Block; Macro Power Abstract Generation; Contributor based Macro Power Abstract; Chip; Chip Level Power Analysis (Corner 1 … Corner N, Workload 1 … Workload N); Input to Wafer Test, System Planning, Power Sorting and Binning.]
Background
• Accurate and efficient full chip power analysis is an important step in the design of power efficient microprocessor and SoC chips.
• A power model abstraction flow based on contributors is PVT-independent and enables efficient hierarchical chip level power analysis.
• The dynamic power and leakage power model abstraction, primarily targeted at full chip power analysis, was presented in [3]. That prior work parameterized capacitance switching due to clock gating onto a single clock gate control for the entire macro.
• This approximation to a single macro wide clock gate control works fairly well for full chip dynamic power analysis, but improved dynamic power abstraction techniques are needed to enable more accurate power analysis.
Main Idea: Multi Clock Gating Domain Abstract
[Figure: side-by-side comparison of the single clock gate control base power abstract and the multiple clock gate domain power abstract, for an IP block containing clock gating domains d1, d2, d3 and d4.]

Single Clock Gate Control Base Power Abstract
IP Block Power Abstract Generation:
1. Case Setup
2. Simulation
3. Power Contributor Element Generation and Contributor Accumulation
Multiple clock gating domains in the IP block are approximated to a single macro wide clock gate control as a part of the abstraction process.
Weights and activity factors are set during activity extraction in chip level power analysis.

Name               Weight / Activity factor(s)
AlwaysCeff
GatableCeff        (1 - clock_gating)
PiSfDepCeff        input_switch_rate
LoSfDepCeff        latch_output_switch_rate
PiLoXPCeff         input_switch_rate, latch_output_switch_rate

Multiple Clock Gate Domain Power Abstract
IP Block Power Abstract Generation:
1. Marking and Domain Identification
2. Case Setup
3. Simulation
4. Power Contributor Element Generation and Accumulation
Chip level power analysis uses per clock gating domain activity extraction.

Name               Weight / Activity factor(s)
AlwaysCeff
PiSfDepCeff        input_switch_rate
GatableCeff        (1 - clock_gating)
LoSfDepCeff        latch_output_switch_rate
PiLoXPCeff         input_switch_rate, latch_output_switch_rate
GatableCeff.d1     (1 - clock_gating.d1)
LoSfDepCeff.d1     latch_output_switch_rate.d1
PiLoXPCeff.d1      input_switch_rate, latch_output_switch_rate.d1
GatableCeff.d2     (1 - clock_gating.d2)
LoSfDepCeff.d2     latch_output_switch_rate.d2
PiLoXPCeff.d2      input_switch_rate, latch_output_switch_rate.d2
LoSfDepCeff.d1-d2  latch_output_switch_rate.d1, latch_output_switch_rate.d2
PiLoXPCeff.d1-d2   input_switch_rate, latch_output_switch_rate.d1, latch_output_switch_rate.d2
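
As a hedged illustration of how a chip level tool might evaluate the contributor tables above: the sketch below assumes each contributor's effective capacitance (Ceff) is multiplied by the product of the terms listed beside it, and that the scaled contributors are summed. The slides do not state this formula. The helper names (`one_minus`, `rate`, `effective_ceff`) and all Ceff and activity values are hypothetical placeholders.

```python
from math import prod


def one_minus(name):
    """Factor of the form (1 - activity[name]), e.g. (1 - clock_gating)."""
    return lambda activity: 1.0 - activity[name]


def rate(name):
    """Factor that reads activity[name] directly, e.g. input_switch_rate."""
    return lambda activity: activity[name]


# Hypothetical single-domain abstract: name -> (Ceff, list of factors).
SINGLE_DOMAIN = {
    "AlwaysCeff": (4.0, []),
    "GatableCeff": (10.0, [one_minus("clock_gating")]),
    "PiSfDepCeff": (2.0, [rate("input_switch_rate")]),
    "LoSfDepCeff": (3.0, [rate("latch_output_switch_rate")]),
    "PiLoXPCeff": (1.0, [rate("input_switch_rate"),
                         rate("latch_output_switch_rate")]),
}

# Hypothetical two-domain abstract using the ".d1", ".d2", ".d1-d2" suffixes.
MULTI_DOMAIN = {
    "AlwaysCeff": (4.0, []),
    "GatableCeff.d1": (6.0, [one_minus("clock_gating.d1")]),
    "GatableCeff.d2": (4.0, [one_minus("clock_gating.d2")]),
    "LoSfDepCeff.d1-d2": (3.0, [rate("latch_output_switch_rate.d1"),
                                rate("latch_output_switch_rate.d2")]),
}


def effective_ceff(abstract, activity):
    """Sum every contributor scaled by the product of its factors.

    A contributor with no factors (AlwaysCeff) is counted in full,
    because the product of an empty sequence is 1.
    """
    return sum(ceff * prod(f(activity) for f in factors)
               for ceff, factors in abstract.values())


single_activity = {
    "clock_gating": 0.5,
    "input_switch_rate": 0.2,
    "latch_output_switch_rate": 0.1,
}
multi_activity = {
    "clock_gating.d1": 0.9,   # d1 is gated off most of the time
    "clock_gating.d2": 0.1,   # d2 is mostly running
    "latch_output_switch_rate.d1": 0.05,
    "latch_output_switch_rate.d2": 0.3,
}

print(effective_ceff(SINGLE_DOMAIN, single_activity))
print(effective_ceff(MULTI_DOMAIN, multi_activity))
```

With per-domain contributors, a workload that gates d1 heavily while d2 keeps running is no longer forced onto one macro wide clock_gating value, which is the heterogeneity the Motivation slide describes.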
Multi Clock gate domain Abstraction: Three Variants
[Figure: flow for the three variants. Box labels: Domain identification; Marking of domains; Domains combination list creation; Per case simulations; Per domain and single domain Ceffs computation and accumulation; Per domain IP power abstract creation; Per domain bill of materials file generation; No-sim based clock power only abstraction; Domain collapsing; Domain parameterized clock power abstract; Domain parameterized clock and data power abstract.]
• Quick tracing based clock power only abstraction
• Clock and data power abstraction based on domain merging using domain combination lists
• Domain collapsing for handling large, extensively gated designs
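
The slides do not spell out how the domain combination list is built. One plausible reading, sketched here as an assumption, is that marking and tracing give each net the set of clock gating domains that can influence it, and that the combination list is the set of distinct domain sets observed. The function names and the example netlist are hypothetical.

```python
def domain_combinations(net_domains):
    """Return the distinct non-empty domain sets seen on any net.

    net_domains maps a net name to the set of clock gating domains
    traced to it. Nets with no domain are skipped here; in this sketch
    they would belong to the always-on contributors.
    """
    combos = {frozenset(domains) for domains in net_domains.values() if domains}
    # Order by size, then by name, so single domains come first.
    return sorted(combos, key=lambda c: (len(c), sorted(c)))


def combo_suffix(combo):
    """Build a contributor suffix such as 'd1-d2' for a domain set."""
    return "-".join(sorted(combo))


# Hypothetical tracing result for a small block with two domains.
traced = {
    "n1": {"d1"},
    "n2": {"d2"},
    "n3": {"d1", "d2"},
    "n4": {"d1"},
    "n5": set(),
}

print([combo_suffix(c) for c in domain_combinations(traced)])
# prints ['d1', 'd2', 'd1-d2']
```

Only combinations that actually occur are listed, so the list can be much shorter than the set of all subsets of domains.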
Experimentation
[Figure: experimental setup. Box labels: Workload based Simulation (VHDL Sim Data, Waveform File, Activity Files); Gate Level Block (workload driven model); Workload driven power; Gate Level Block Power Abstract; Activity Files (SAIF like); Unit Level Activity Extraction and Power Rollup; Abstraction based Power Analysis; Abstraction based Power; In sync; Compare.]
Comparison of workload driven power simulation with the power abstract based
estimation for the three approaches
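
The results are reported as signed percent errors. A minimal sketch of that comparison follows, assuming the error is taken relative to the workload driven gate level power. The slides do not state the sign convention, and the function name and the numbers are hypothetical.

```python
def percent_error(estimate, reference):
    """Signed percent error of an abstract based estimate.

    A negative result means the abstract underestimates the
    workload driven reference power.
    """
    if reference == 0:
        raise ValueError("reference power must be non-zero")
    return 100.0 * (estimate - reference) / reference


# Hypothetical power numbers in arbitrary units.
reference_power = 10.0   # workload driven gate level simulation
abstract_power = 9.3     # power abstract based estimate

print(round(percent_error(abstract_power, reference_power), 2))
```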
Experimental Results
Comparison for Design D4. D4 has 87 latches, 2 clock gating domains, 4 domain combinations, ~12000 gates and nets.
Comparison for Design D3. D3 has 1200 latches, 21 clock gating domains, 83 domain combinations.
Comparison for Design D1. D1 has 640 latches, 9 clock gating domains, 44 domain combinations, ~13000 standard cell instances and nets.
Comparison for Design D2. D2 has 520 latches, 3 clock gating domains, 3 domain combinations, ~10000 gates and nets.

Approach                        Model Size Increase  Data Power %Error  Clock Power %Error  TAT Benefit
Single domain                                        -54.87             -7.26               2.02
No-sim clock                    1.17                 -54.87             -0.02               1.93
Domain combinations             2.03                 -16.30             -0.02               1.58
Domain combinations & collapse  1.13                 -17.41             -0.02               1.78

Approach                        Model Size Increase  Data Power %Error  Clock Power %Error  TAT Benefit
Single domain                                        -6.7               4.8                 1.82
No-sim clock                    1.03                 -6.7               0.3                 1.75
Domain combinations             1.12                 -0.7               0.3                 1.61
Domain combinations & collapse  1.12                 -0.7               0.3                 1.61

Approach                        Model Size Increase  Data Power %Error  Clock Power %Error  TAT Benefit
Single domain                                        -8.7               4.1                 2.35
No-sim clock                    1.06                 -8.7               0.3                 2.22
Domain combinations             1.98                 -2.3               0.3                 1.64
Domain combinations & collapse  1.08                 -4.4               0.3                 1.96

Approach                        Model Size Increase  Data Power %Error  Clock Power %Error  TAT Benefit
Single domain                                        -22.4              0.4                 1.51
No-sim clock                    1.10                 -22.4              0.4                 1.47
Domain combinations             1.25                 2.7                0.4                 1.41
Domain combinations & collapse  1.25                 2.7                0.4                 1.41

Conclusion
• Presented approaches for generating multiple clock gating domain parameterized, PVT independent power abstracts for large IP blocks.
• The gating domain parameterization is accomplished by separating the attribution of switching due to each single domain through a marking and tracing process, which removes the need for separate domain by domain simulation.
• Experimental results comparing the proposed approaches on IP blocks of varying sizes from a real, industry strength microprocessor design highlight the accuracy impact while keeping the run time and model size increase in an acceptable range.
• As an extension, we are exploring approaches that preserve each of the domains independently, using formulations based on constructing clock gating domain conflict hypergraphs and coloring them to determine domain interactions.
9
Conclusion
 Presented approaches for generation of multiple clock gating domain
parameterized PVT independent power abstracts for large IP blocks.
 We accomplish the gating domain parameterization through separation of
the attribution of switching due to each single domain through a marking
and tracing process, thereby precluding the need for separate domain by
domain simulation to achieve the parameterization.
 Experimental results comparing proposed approach on IP blocks of
varying sizes from a real industry strength microprocessor design clearly
highlight accuracy impact while keeping run time and model size increase
in an acceptable range.
 In terms of extensions, we are exploring approaches where we could
preserve each of the domains independently, for which we are looking into
formulations based on constructing clock gating domain conflict hyper
graphs and coloring them to determine domain interactions.

More Related Content

What's hot

WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...
WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...
WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...Journal For Research
 
Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...
Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...
Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...TELKOMNIKA JOURNAL
 
Core Objective 1: Highlights from the Central Data Resource
Core Objective 1: Highlights from the Central Data ResourceCore Objective 1: Highlights from the Central Data Resource
Core Objective 1: Highlights from the Central Data ResourceAnubhav Jain
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...
Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...
Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...paperpublications3
 
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLEA SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLEEditor IJMTER
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」Preferred Networks
 
Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...
Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...
Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...IJERA Editor
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensOscar Law
 
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...Salford Systems
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ijdpsjournal
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsIJMER
 
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...AMD Developer Central
 
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...William Nadolski
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from IntelEdge AI and Vision Alliance
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibDataWorks Summit
 
Multi-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generationMulti-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generationMahesh Khadatare
 

What's hot (19)

WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...
WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...
WIND SPEED & POWER FORECASTING USING ARTIFICIAL NEURAL NETWORK (NARX) FOR NEW...
 
Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...
Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...
Fuzzified Single Phase Automatic Sequential Reactive Power Compensation with ...
 
Implementation of Low Power Test Pattern Generator Using LFSR
Implementation of Low Power Test Pattern Generator Using LFSRImplementation of Low Power Test Pattern Generator Using LFSR
Implementation of Low Power Test Pattern Generator Using LFSR
 
Core Objective 1: Highlights from the Central Data Resource
Core Objective 1: Highlights from the Central Data ResourceCore Objective 1: Highlights from the Central Data Resource
Core Objective 1: Highlights from the Central Data Resource
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...
Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...
Combination of Immune Genetic Particle Swarm Optimization algorithm with BP a...
 
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLEA SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
A SURVEY - COMPARISON OF MULTIPLIERS USING DIFFERENT LOGIC STYLE
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...
Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...
Design of Low Power Sequential System Using Multi Bit FLIP-FLOP With Data Dri...
 
Carry
CarryCarry
Carry
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware Challegens
 
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
 
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
ASSESSING THE PERFORMANCE AND ENERGY USAGE OF MULTI-CPUS, MULTI-CORE AND MANY...
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous Platforms
 
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
 
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
Times Series Feature Extraction Methods of Wearable Signal Data for Deep Lear...
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
 
Convolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlibConvolutional Neural Networks at scale in Spark MLlib
Convolutional Neural Networks at scale in Spark MLlib
 
Multi-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generationMulti-core GPU – Fast parallel SAR image generation
Multi-core GPU – Fast parallel SAR image generation
 

Viewers also liked

EDUC5102G Session 4 Presentation
EDUC5102G Session 4 PresentationEDUC5102G Session 4 Presentation
EDUC5102G Session 4 PresentationRobert Power
 
Reportaxe Escrita
Reportaxe EscritaReportaxe Escrita
Reportaxe Escritarachelsone
 
FVCAG: A framework for formal verification driven power modelling and verific...
FVCAG: A framework for formal verification driven power modelling and verific...FVCAG: A framework for formal verification driven power modelling and verific...
FVCAG: A framework for formal verification driven power modelling and verific...Arun Joseph
 
Creativetechnopreneur
Creativetechnopreneur Creativetechnopreneur
Creativetechnopreneur Sony Baghtiar
 
портфоліо
портфоліо портфоліо
портфоліо stepanyuk434
 
Metodologi penelitian program mm kelas bni 2012
Metodologi penelitian program mm kelas bni 2012Metodologi penelitian program mm kelas bni 2012
Metodologi penelitian program mm kelas bni 2012Reins Tangkowit
 
ヴォーンDC使い方 スライド
ヴォーンDC使い方 スライドヴォーンDC使い方 スライド
ヴォーンDC使い方 スライドroadcruise
 
10 frases para motivar e vender mais!
10 frases para motivar e vender mais!10 frases para motivar e vender mais!
10 frases para motivar e vender mais!Adm. Daniel Paulino
 
Segmentação de mercado - LUXO
Segmentação de mercado - LUXOSegmentação de mercado - LUXO
Segmentação de mercado - LUXODaniel Silva
 
09 atendente de farmácia (organização de uma farmácia)
09   atendente de farmácia (organização de uma farmácia)09   atendente de farmácia (organização de uma farmácia)
09 atendente de farmácia (organização de uma farmácia)Elizeu Ferro
 
Sortez de la meute : Réussir son branding personnel avec les médias sociaux
Sortez de la meute : Réussir son branding personnel avec les médias sociauxSortez de la meute : Réussir son branding personnel avec les médias sociaux
Sortez de la meute : Réussir son branding personnel avec les médias sociauxJean-François Lévesque, LL.M.
 

Viewers also liked (14)

EDUC5102G Session 4 Presentation
EDUC5102G Session 4 PresentationEDUC5102G Session 4 Presentation
EDUC5102G Session 4 Presentation
 
Reportaxe Escrita
Reportaxe EscritaReportaxe Escrita
Reportaxe Escrita
 
burton
burtonburton
burton
 
FVCAG: A framework for formal verification driven power modelling and verific...
FVCAG: A framework for formal verification driven power modelling and verific...FVCAG: A framework for formal verification driven power modelling and verific...
FVCAG: A framework for formal verification driven power modelling and verific...
 
Keddy Minette
Keddy MinetteKeddy Minette
Keddy Minette
 
Creativetechnopreneur
Creativetechnopreneur Creativetechnopreneur
Creativetechnopreneur
 
портфоліо
портфоліо портфоліо
портфоліо
 
Metodologi penelitian program mm kelas bni 2012
Metodologi penelitian program mm kelas bni 2012Metodologi penelitian program mm kelas bni 2012
Metodologi penelitian program mm kelas bni 2012
 
ヴォーンDC使い方 スライド
ヴォーンDC使い方 スライドヴォーンDC使い方 スライド
ヴォーンDC使い方 スライド
 
Welcome to Jeunesse
Welcome to JeunesseWelcome to Jeunesse
Welcome to Jeunesse
 
10 frases para motivar e vender mais!
10 frases para motivar e vender mais!10 frases para motivar e vender mais!
10 frases para motivar e vender mais!
 
Segmentação de mercado - LUXO
Segmentação de mercado - LUXOSegmentação de mercado - LUXO
Segmentação de mercado - LUXO
 
09 atendente de farmácia (organização de uma farmácia)
09   atendente de farmácia (organização de uma farmácia)09   atendente de farmácia (organização de uma farmácia)
09 atendente de farmácia (organização de uma farmácia)
 
Sortez de la meute : Réussir son branding personnel avec les médias sociaux
Sortez de la meute : Réussir son branding personnel avec les médias sociauxSortez de la meute : Réussir son branding personnel avec les médias sociaux
Sortez de la meute : Réussir son branding personnel avec les médias sociaux
 

Similar to Per domain power analysis

Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009James McGalliard
 
How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? Deepak Shankar
 
Empirically Derived Abstractions in Uncore Power Modeling for a Server-Class...
Empirically Derived Abstractions in Uncore Power Modeling for a  Server-Class...Empirically Derived Abstractions in Uncore Power Modeling for a  Server-Class...
Empirically Derived Abstractions in Uncore Power Modeling for a Server-Class...Arun Joseph
 
Design and testing of systolic array multiplier using fault injecting schemes
Design and testing of systolic array multiplier using fault injecting schemesDesign and testing of systolic array multiplier using fault injecting schemes
Design and testing of systolic array multiplier using fault injecting schemesCSITiaesprime
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluationGIORGOS STAMELOS
 
Power Optimization with Efficient Test Logic Partitioning for Full Chip Design
Power Optimization with Efficient Test Logic Partitioning for Full Chip DesignPower Optimization with Efficient Test Logic Partitioning for Full Chip Design
Power Optimization with Efficient Test Logic Partitioning for Full Chip DesignPankaj Singh
 
Run-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsRun-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsNECST Lab @ Politecnico di Milano
 
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...NECST Lab @ Politecnico di Milano
 
hetshah_resume
hetshah_resumehetshah_resume
hetshah_resumehet shah
 
Low-Power Design and Verification
Low-Power Design and VerificationLow-Power Design and Verification
Low-Power Design and VerificationDVClub
 
A Hybrid Approach to Standard Cell Power Characterization based on PVT Indepe...
A Hybrid Approach to Standard Cell Power Characterization based on PVT Indepe...A Hybrid Approach to Standard Cell Power Characterization based on PVT Indepe...
A Hybrid Approach to Standard Cell Power Characterization based on PVT Indepe...Arun Joseph
 
DESIGN OF SIMULATION DIFFERENT 8-BIT MULTIPLIERS USING VERILOG CODE BY SAIKIR...
DESIGN OF SIMULATION DIFFERENT 8-BIT MULTIPLIERS USING VERILOG CODE BY SAIKIR...DESIGN OF SIMULATION DIFFERENT 8-BIT MULTIPLIERS USING VERILOG CODE BY SAIKIR...
DESIGN OF SIMULATION DIFFERENT 8-BIT MULTIPLIERS USING VERILOG CODE BY SAIKIR...Saikiran perfect
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
VHDL Implementation of High Speed and Low Power BIST Based Vedic Multiplier
VHDL Implementation of High Speed and Low Power BIST Based Vedic MultiplierVHDL Implementation of High Speed and Low Power BIST Based Vedic Multiplier
VHDL Implementation of High Speed and Low Power BIST Based Vedic MultiplierIRJET Journal
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudGábor Szárnyas
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 
PowerArtist: RTL Design for Power Platform
PowerArtist: RTL Design for Power PlatformPowerArtist: RTL Design for Power Platform
PowerArtist: RTL Design for Power PlatformAnsys
 
A Modified Design of Test Pattern Generator for Built-In-Self- Test Applications
A Modified Design of Test Pattern Generator for Built-In-Self- Test ApplicationsA Modified Design of Test Pattern Generator for Built-In-Self- Test Applications
A Modified Design of Test Pattern Generator for Built-In-Self- Test ApplicationsIJERA Editor
 
Kakarla Sriram K _resume_sep_2016
Kakarla Sriram K _resume_sep_2016Kakarla Sriram K _resume_sep_2016
Kakarla Sriram K _resume_sep_2016srkkakarla
 

Similar to Per domain power analysis (20)

Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009Benchmark Analysis of Multi-core Processor Memory Contention April 2009
Benchmark Analysis of Multi-core Processor Memory Contention April 2009
 
How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration? How to achieve 95%+ Accurate power measurement during architecture exploration?
How to achieve 95%+ Accurate power measurement during architecture exploration?
 
Empirically Derived Abstractions in Uncore Power Modeling for a Server-Class...
Empirically Derived Abstractions in Uncore Power Modeling for a  Server-Class...Empirically Derived Abstractions in Uncore Power Modeling for a  Server-Class...
Empirically Derived Abstractions in Uncore Power Modeling for a Server-Class...
 
Design and testing of systolic array multiplier using fault injecting schemes
Design and testing of systolic array multiplier using fault injecting schemesDesign and testing of systolic array multiplier using fault injecting schemes
Design and testing of systolic array multiplier using fault injecting schemes
 
Performance and Energy evaluation
Performance and Energy evaluationPerformance and Energy evaluation
Performance and Energy evaluation
 
Power Optimization with Efficient Test Logic Partitioning for Full Chip Design
Power Optimization with Efficient Test Logic Partitioning for Full Chip DesignPower Optimization with Efficient Test Logic Partitioning for Full Chip Design
Power Optimization with Efficient Test Logic Partitioning for Full Chip Design
 
Run-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsRun-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environments
 
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
 
hetshah_resume
hetshah_resumehetshah_resume
hetshah_resume
 
Low-Power Design and Verification
Low-Power Design and VerificationLow-Power Design and Verification
Low-Power Design and Verification
 
Per domain power analysis

  • 1. Efficient Techniques for Per Clock Gating Domain Contributor based Power Abstraction of IP Blocks for Hierarchical Power Analysis. Arun Joseph, Nagu Dhanwada, Spandana Rachamalla, William Dungan, Ricardo Nigaglioni, IBM Systems Group.
  • 2. Motivation
      - IP blocks are becoming larger, with an increased number of clock gating domains.
      - Much more aggressive solutions are being adopted for improving the clock gating of these very large IP blocks.
      - There are several workloads for which significant heterogeneity in both clock and data activity is seen across the multiple clock gating domains within an IP block.
  • 3. Background: Contributor based Power Analysis Flow
      [Figure: flow diagram. Box labels: Cell Library; Library Characterization (Corner 1 ... Corner N); Contributor Power Model; Macro/IP Block; Macro Power Abstract Generation; Contributor based Macro Power Abstract; Chip; Chip Level Power Analysis (Corner 1 ... Corner N, Workload 1 ... Workload N); Input to Wafer Test, System Planning, Power Sorting and Binning.]
  • 4. Background
      - Accurate and efficient full chip power analysis is an important step in the design of power efficient microprocessor and SoC chips.
      - A power model abstraction flow based on contributors is PVT-independent and enables efficient hierarchical chip level power analysis.
      - The dynamic power and leakage power model abstraction primarily targeted for full chip power analysis was presented in [3]. This prior work was based on parameterizing capacitance switching due to clock gating onto a single clock gate control for the entire macro.
      - This approximation to a single macro wide clock gate control works fairly well for full chip dynamic power analysis, but there is a need to explore improved dynamic power abstraction techniques for enabling more accurate power analysis.
  • 5. Main Idea: Multi Clock Gating Domain Abstract
      [Figure: an IP block with clock gating domains d1, d2, d3, d4, and its IP Block Power Abstract.]

      Single Clock Gate Control Base Power Abstract
      IP Block Power Abstract Generation:
        1. Case Setup
        2. Simulation
        3. Power Contributor Element Generation and Contributor Accumulation
      Multiple clock gating domains in the IP block are approximated to a single macro wide clock gate control as a part of the abstraction process.
      Chip Level Power Analysis: weights and activity factors are set during activity extraction in chip level power analysis.

        Name          Weight              Activity factor(s)
        AlwaysCeff
        GatableCeff   (1 - clock_gating)
        PiSfDepCeff                       input_switch_rate
        LoSfDepCeff                       latch_output_switch_rate
        PiLoXPCeff                        input_switch_rate, latch_output_switch_rate

      Multiple Clock Gate Domain Power Abstract
      IP Block Power Abstract Generation:
        1. Marking and Domain Identification
        2. Case Setup
        3. Simulation
        4. Power Contributor Element Generation and Accumulation
      Chip level power analysis: per clock gating domain activity extraction.

        Name                Weight                 Activity factor(s)
        AlwaysCeff
        PiSfDepCeff                                input_switch_rate
        GatableCeff         (1 - clock_gating)
        LoSfDepCeff                                latch_output_switch_rate
        PiLoXPCeff                                 input_switch_rate, latch_output_switch_rate
        GatableCeff.d1      (1 - clock_gating.d1)
        LoSfDepCeff.d1                             latch_output_switch_rate.d1
        PiLoXPCeff.d1                              input_switch_rate, latch_output_switch_rate.d1
        GatableCeff.d2      (1 - clock_gating.d2)
        LoSfDepCeff.d2                             latch_output_switch_rate.d2
        PiLoXPCeff.d2                              input_switch_rate, latch_output_switch_rate.d2
        LoSfDepCeff.d1-d2                          latch_output_switch_rate.d1, latch_output_switch_rate.d2
        PiLoXPCeff.d1-d2                           input_switch_rate, latch_output_switch_rate.d1, latch_output_switch_rate.d2
  • 6. Multi Clock Gate Domain Abstraction: Three Variants
      [Figure: flow diagram. Box labels: Domain identification; Marking of domains; Domains combination list creation; Per case simulations; Per domain and single domain Ceffs computation and accumulation; Per domain IP power abstract creation; Per domain bill of materials file generation; No-sim based clock power only abstraction; Domain collapsing; Domain parameterized clock power abstract; Domain parameterized clock and data power abstract.]
      - Quick tracing based clock power only abstraction.
      - Clock and data power abstraction based on domain merging using domain combination lists.
      - Domain collapsing for handling large, extensively gated designs.
  • 7. Experimentation
      [Figure: experimental flow. Workload based simulation path: Gate Level Block (workload driven model); VHDL Sim; Waveform File; Data Activity Files; Workload driven power. Abstraction based power analysis path: Gate Level Block; Block Power Abstract; Activity Files (SAIF like); Unit Level Activity Extraction and Power Rollup; Abstraction based Power. The two paths are kept in sync and their power results are compared.]
      Comparison of workload driven power simulation with the power abstract based estimation for the three approaches.
  • 8. Experimental Results
      Designs compared:
      - D4 has 87 latches, 2 clock gating domains, 4 domain combinations, ~12000 gates and nets.
      - D3 has 1200 latches, 21 clock gating domains, 83 domain combinations.
      - D1 has 640 latches, 9 clock gating domains, 44 domain combinations, ~13000 standard cell instances and nets.
      - D2 has 520 latches, 3 clock gating domains, 3 domain combinations, ~10000 gates and nets.

      Result tables, in the order they appear on the slide:

        Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
        Single domain                                          -54.87              -7.26                2.02
        No-sim clock                     1.17                  -54.87              -0.02                1.93
        Domain combinations              2.03                  -16.30              -0.02                1.58
        Domain combinations & collapse   1.13                  -17.41              -0.02                1.78

        Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
        Single domain                                          -6.7                4.8                  1.82
        No-sim clock                     1.03                  -6.7                0.3                  1.75
        Domain combinations              1.12                  -0.7                0.3                  1.61
        Domain combinations & collapse   1.12                  -0.7                0.3                  1.61

        Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
        Single domain                                          -8.7                4.1                  2.35
        No-sim clock                     1.06                  -8.7                0.3                  2.22
        Domain combinations              1.98                  -2.3                0.3                  1.64
        Domain combinations & collapse   1.08                  -4.4                0.3                  1.96

        Approach                         Model Size Increase   Data Power %Error   Clock Power %Error   TAT Benefit
        Single domain                                          -22.4               0.4                  1.51
        No-sim clock                     1.10                  -22.4               0.4                  1.47
        Domain combinations              1.25                  2.7                 0.4                  1.41
        Domain combinations & collapse   1.25                  2.7                 0.4                  1.41
  • 9. Conclusion
      - Presented approaches for generation of multiple clock gating domain parameterized, PVT independent power abstracts for large IP blocks.
      - We accomplish the gating domain parameterization through separation of the attribution of switching due to each single domain through a marking and tracing process, thereby precluding the need for separate domain by domain simulation to achieve the parameterization.
      - Experimental results comparing the proposed approach on IP blocks of varying sizes from a real industry strength microprocessor design clearly highlight the accuracy impact while keeping run time and model size increase in an acceptable range.
      - In terms of extensions, we are exploring approaches where we could preserve each of the domains independently, for which we are looking into formulations based on constructing clock gating domain conflict hypergraphs and coloring them to determine domain interactions.
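
The contributor tables on slide 5 pair each Ceff entry with a weight and one or more activity factors. A minimal Python sketch of how such an abstract could be evaluated is given here. The slides do not state the rollup formula, so this sketch assumes that each contributor adds Ceff x weight x (product of its activity factors) to the switched capacitance, and that dynamic power is that capacitance times Vdd^2 times frequency. The `Contributor` class, the function names and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Contributor:
    """One row of a power abstract (hypothetical representation)."""
    name: str
    ceff: float                        # effective capacitance, arbitrary units
    gating: Optional[str] = None       # gating factor name; weight = 1 - factor
    activities: Tuple[str, ...] = ()   # names of activity factors


def switched_capacitance(contributors: Sequence[Contributor],
                         factors: Dict[str, float]) -> float:
    """Sum Ceff * weight * product(activity factors) over all contributors.

    Assumption: factors combine multiplicatively; the slides do not say so.
    """
    total = 0.0
    for c in contributors:
        weight = 1.0 if c.gating is None else 1.0 - factors[c.gating]
        activity = prod(factors[name] for name in c.activities)  # empty -> 1
        total += c.ceff * weight * activity
    return total


def dynamic_power(contributors: Sequence[Contributor],
                  factors: Dict[str, float],
                  vdd: float, freq_hz: float) -> float:
    """Assumed relation: P = C_switched * Vdd^2 * f."""
    return switched_capacitance(contributors, factors) * vdd ** 2 * freq_hz


# Two-domain abstract using names from the slide 5 table; values are made up.
abstract = [
    Contributor("AlwaysCeff", 1.0),
    Contributor("GatableCeff.d1", 2.0, gating="clock_gating.d1"),
    Contributor("GatableCeff.d2", 4.0, gating="clock_gating.d2"),
    Contributor("LoSfDepCeff.d1-d2", 8.0,
                activities=("latch_output_switch_rate.d1",
                            "latch_output_switch_rate.d2")),
]

# Per clock gating domain factors, as extracted at chip level (made up).
factors = {
    "clock_gating.d1": 0.5,   # d1 is gated half of the time
    "clock_gating.d2": 1.0,   # d2 is always gated
    "latch_output_switch_rate.d1": 0.5,
    "latch_output_switch_rate.d2": 0.25,
}

print(switched_capacitance(abstract, factors))
```

With a single macro wide `clock_gating` factor, d1 and d2 would have to share one weight. Keeping one factor per domain is what lets a fully gated domain such as d2 drop out of the sum.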

Editor's Notes

  1. For instance, such analysis is required for the power sort process, which is used for determining product shipping frequencies. The key enablers of this flow are the concept of contributor based power models [1, 2], an abstract definition, and a method for generating such abstracts for complex IP blocks.
  2. Base (Single Clock Gate Control) Abstraction: The dynamic power abstraction introduced in [3] is performed in terms of the dynamic power contributors. It characterizes power as a function of a clock_gating weight factor and the input_switch_rate and latch_output_switch_rate activity factors. The capacitance, and the weight and activity factors (which are computed during higher level power analysis), are obtained by approximating all the clock gating domains as a single macro wide clock gate control.
     Proposed Multi-Domain Abstraction: In the proposed abstraction, there are clock and data Ceff components that correspond to the individual clock gating domains. The per domain Ceff, along with the weight and activity factors (computed on a per clock gating domain basis during higher level analysis), is used for hierarchical per clock gating domain power analysis. This makes the analysis more efficient, more accurate, and more usable for driving more aggressive use of clock gating in the logic design process.
  3. Workload driven gate level simulation (GLS) based power is compared with abstraction (ABS) based power to validate the proposed abstractions.
     GLS based power: A netlist for an IP block is simulated for several thousand cycles by applying switching patterns extracted from RTL simulations of higher level realistic workloads. The switching at every net is computed to get the average power dissipated for the simulated switching patterns.
     ABS based power: The same netlist is simulated to generate the different power abstracts. For the same switching patterns, the switching activity factors, including the clock gating factor and the switching factors at the primary inputs and latch outputs, are computed on a per gating domain basis. The computed activity factors are applied to the generated power abstracts (base, and per clock gating domain) to calculate ABS based power.
     All experiments were run on a 24-core 2.6 GHz Xeon machine running RHEL 5 with 256 GB of memory. Designs from both core and uncore units of the microprocessor were studied. A variant of a thermal design point workload is used for the comparison.
  4. “TAT Benefit” is the improvement in runtime when using ABS based power computation, compared against the runtime of GLS based power computation. “Model size increase” refers to the ratio of the size of the per clock gating domain power abstract (in bytes when stored on disk) to the size of the base power abstract. The domain collapse procedure is triggered when the size of the domain combinations list is greater than a certain threshold (DT), and the list is then collapsed to DX% of its size. Both DT and DX are programmable, and they were empirically chosen as DT=10 and DX=10%.
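
The definitions in note 4 can be written out as a short Python sketch. Only the DT and DX values and the model size ratio come from the note. The rounding of the collapsed list size (round up, at least one entry), the reading of “TAT Benefit” as a runtime ratio, and the sign convention of the percentage error are assumptions made here. All function names are hypothetical.

```python
def collapsed_size(n_combinations: int, dt: int = 10, dx_percent: int = 10) -> int:
    """Size of the domain combinations list after the collapse step.

    Collapse is triggered only when the list is larger than DT; the list is
    then reduced to DX% of its size. Rounding up is an assumption.
    """
    if n_combinations <= dt:
        return n_combinations          # below threshold: no collapse
    # Integer ceiling of n * DX / 100, never below one entry.
    return max(1, (n_combinations * dx_percent + 99) // 100)


def model_size_increase(per_domain_bytes: float, base_bytes: float) -> float:
    """Ratio of per clock gating domain abstract size to base abstract size."""
    return per_domain_bytes / base_bytes


def tat_benefit(gls_runtime: float, abs_runtime: float) -> float:
    """Assumed to be the GLS runtime divided by the ABS runtime."""
    return gls_runtime / abs_runtime


def percent_error(abs_power: float, gls_power: float) -> float:
    """Assumed sign convention: negative means ABS underestimates GLS."""
    return 100.0 * (abs_power - gls_power) / gls_power


# Domain combination counts taken from the design descriptions on slide 8.
for n in (3, 4, 44, 83):
    print(n, collapsed_size(n))
```

Under these assumptions, lists with 3 or 4 combinations are left untouched, which is consistent with two of the result tables showing identical rows for "Domain combinations" and "Domain combinations & collapse".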