This document describes a new approach for developing a high-level synthesis tool for low power VLSI designs called Gaut_w. Gaut_w is composed of low power modules that are used before an architectural synthesis tool to optimize designs at the behavioral and architectural levels for power savings. The key modules of Gaut_w are high level power estimation, module selection to choose optimal operators and supply voltages, optimization criteria to minimize area and power, and operator assignment to decrease switching activity. Experimental results on discrete wavelet transform algorithms show power savings from using Gaut_w.
PERFORMANCE FACTORS OF CLOUD COMPUTING DATA CENTERS USING [(M/G/1) : (∞/GDM O...ijgca
The ever-increasing status of the cloud computing hypothesis and the budding concept of federated cloud computing have enthused research efforts towards intellectual cloud service selection, aimed at developing techniques that enable cloud users to gain maximum benefit from cloud computing by selecting services which provide optimal performance at the lowest possible cost. Cloud computing is a novel paradigm for the provision of computing infrastructure, which aims to shift the location of the computing infrastructure to the network in order to reduce the maintenance costs of hardware and software resources. Cloud computing systems vitally provide access to large pools of resources. Resources provided by cloud computing systems hide a great deal of services from the user through virtualization. In this paper, the cloud data center is modelled as a queuing system with single task arrivals and a task request buffer of infinite capacity.
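The M/G/1 queue with an infinite buffer described above has a closed-form mean-value analysis. A minimal sketch of the Pollaczek-Khinchine mean-value formulas, assuming FCFS service; the function and parameter names are illustrative, not from the paper:

```python
def mg1_metrics(lam, mean_s, second_moment_s):
    """Mean performance measures of an M/G/1 queue (single task arrivals,
    infinite task request buffer) via the Pollaczek-Khinchine formula."""
    rho = lam * mean_s                              # server utilization
    assert rho < 1, "queue is unstable when rho >= 1"
    wq = lam * second_moment_s / (2 * (1 - rho))    # mean wait in the buffer
    w = wq + mean_s                                 # mean response time
    lq = lam * wq                                   # mean buffer occupancy (Little's law)
    return {"rho": rho, "Wq": wq, "W": w, "Lq": lq}

# Exponential service with E[S] = 1 gives E[S^2] = 2, recovering M/M/1 results
m = mg1_metrics(lam=0.5, mean_s=1.0, second_moment_s=2.0)
```

For exponential service at rate 1 and arrival rate 0.5, this yields a mean response time of 2.0, matching the classical M/M/1 result 1/(μ-λ).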
A MULTI-OBJECTIVE PERSPECTIVE FOR OPERATOR SCHEDULING USING FINEGRAINED DVS A...VLSICS Design
The stringent power budget of fine-grained power-managed digital integrated circuits has driven chip designers to optimize power at the cost of area and delay, which were the traditional cost criteria for circuit optimization. The emerging scenario motivates us to revisit the classical operator scheduling problem given the availability of DVFS-enabled functional units that can trade off cycles with power. We study the design space defined by this trade-off and present a branch-and-bound (B/B) algorithm to explore this state space and report the Pareto-optimal front with respect to area and power. The scheduling also aims at maximum resource sharing and is able to attain significant area and power gains for complex benchmarks when timing constraints are relaxed by a sufficient amount. Experimental results show that the algorithm, operating without any user constraint (area/power), is able to solve the problem for most available benchmarks, and that the use of power budget or area budget constraints leads to significant performance gain.
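The Pareto-optimal front over area and power that the B/B search reports can be illustrated with a simple dominance filter. This is not the paper's algorithm, only the front-extraction step applied to candidate (area, power) design points:

```python
def pareto_front(points):
    """Return the non-dominated (area, power) pairs.

    A point q dominates p if q is no worse than p in both objectives
    and differs from p; dominated points are excluded from the front."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return sorted(front)

# Five candidate schedules as (area, power) pairs
front = pareto_front([(1, 5), (2, 4), (3, 3), (2, 6), (4, 4)])
# → [(1, 5), (2, 4), (3, 3)]
```

(2, 6) is dominated by (2, 4) and (4, 4) by (3, 3), so only the three trade-off points survive.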
REAL-TIME ADAPTIVE ENERGY-SCHEDULING ALGORITHM FOR VIRTUALIZED CLOUD COMPUTINGijdpsjournal
Cloud computing has become an ideal computing paradigm for scientific and commercial applications. The increased availability of cloud models and allied developing models creates an easier cloud computing environment. Energy consumption and effective energy management are two important challenges in virtualized computing platforms. Energy consumption can be minimized by allocating computationally intensive tasks to a resource at a suitable frequency. An optimal Dynamic Voltage and Frequency Scaling (DVFS) based strategy of task allocation can minimize the overall consumption of energy and meet the required QoS. However, such strategies do not control the internal and external switching of server frequencies, which causes performance degradation. In this paper, we propose the Real-Time Adaptive Energy-Scheduling (RTAES) algorithm, which exploits the reconfiguration proficiency of Cloud Computing Virtualized Data Centers (CCVDCs) for computationally intensive applications. The RTAES algorithm minimizes the consumption of energy and time during computation, reconfiguration and communication. Our proposed model demonstrates its effectiveness in implementation, scalability, power consumption and execution time with respect to other existing approaches.
Discrete wavelet transform-based RI adaptive algorithm for system identificationIJECEIAES
In this paper, we propose a new adaptive filtering algorithm for system identification. The algorithm is based on the recursive inverse (RI) adaptive algorithm, which suffers from low convergence rates in some applications, i.e., when the eigenvalue spread of the autocorrelation matrix is relatively high. The proposed algorithm applies the discrete wavelet transform (DWT) to the input signal which, in turn, helps to overcome the low convergence rate of the RI algorithm with relatively small step-size(s). Different scenarios have been investigated in different noise environments in a system identification setting. Experiments demonstrate the advantages of the proposed DWT recursive inverse (DWT-RI) filter in terms of convergence rate and mean-square error (MSE) compared to the RI, discrete cosine transform LMS (DCT-LMS), discrete wavelet transform LMS (DWT-LMS) and recursive least-squares (RLS) algorithms under the same conditions.
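The transform-domain idea can be sketched as follows: a single-level Haar DWT decorrelates the input before adaptation, which reduces the effective eigenvalue spread. The RI recursion itself is not reproduced here; a generic LMS step stands in for the adaptive update, so this is an illustration of the preprocessing, not the DWT-RI algorithm:

```python
def haar_dwt(x):
    """Single-level orthonormal Haar DWT of an even-length signal:
    returns approximation coefficients followed by detail coefficients."""
    s = 2 ** -0.5
    approx = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return approx + detail

def lms_step(w, u, d, mu):
    """One LMS weight update on a (possibly DWT-transformed) input vector u,
    with desired response d and step-size mu."""
    y = sum(wi * ui for wi, ui in zip(w, u))
    e = d - y
    return [wi + mu * e * ui for wi, ui in zip(w, u)], e
```

Because the Haar transform is orthonormal, signal energy is preserved, so the adaptive filter sees the same power but in decorrelated coordinates.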
Harmony Search Algorithm Based Optimal Placement of Static Capacitors for Los...IRJET Journal
This document proposes using a Harmony Search Algorithm (HSA) to optimally place static capacitors in a radial distribution network in order to reduce power losses. HSA is a metaheuristic optimization technique inspired by musical improvisation that can handle both continuous and discrete variables. It is applied to determine the best locations and sizes of capacitors at different load levels on the IEEE 69-bus test system. The results show that optimal capacitor placement using HSA reduces power losses and improves the voltage profile compared to other techniques like genetic algorithms.
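A minimal discrete Harmony Search can make the improvisation loop concrete. The parameter names (`hms`, `hmcr`, `par`) follow common HSA terminology; the pitch adjustment is simplified to a re-pick from the candidate set, and a real capacitor-placement run would supply a power-loss cost function in place of the toy cost used below:

```python
import random

def harmony_search(cost, candidates, hms=10, hmcr=0.9, par=0.3,
                   iters=500, seed=1):
    """Minimal Harmony Search over discrete decision variables.

    candidates[i] is the allowed value set for variable i (e.g. capacitor
    sizes per bus); cost maps a full assignment to a scalar loss."""
    rng = random.Random(seed)
    memory = [[rng.choice(c) for c in candidates] for _ in range(hms)]
    memory.sort(key=cost)                       # best harmony first
    for _ in range(iters):
        new = []
        for i, c in enumerate(candidates):
            if rng.random() < hmcr:             # recall from harmony memory
                v = rng.choice(memory)[i]
                if rng.random() < par:          # simplified pitch adjustment
                    v = rng.choice(c)
                new.append(v)
            else:                               # random improvisation
                new.append(rng.choice(c))
        if cost(new) < cost(memory[-1]):        # replace the worst harmony
            memory[-1] = new
            memory.sort(key=cost)
    return memory[0]
```

For capacitor placement, each variable would index a bus and each candidate set the allowed kVAr sizes (including zero for "no capacitor").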
Averaging Method for PWM DC-DC Converters Operating in Discontinuous Conducti...CSCJournals
In this paper, a one-cycle-average (OCA) discrete-time model for PWM dc-dc converters based on a closed-loop control method for discontinuous conduction mode (DCM) is presented. It leads to an exact discrete-time mathematical representation of the OCA values of the output signal even at low frequency. It also provides an exact discrete-time mathematical representation of the average values of other internal signals with little increase in simulation time. A comparison of this model to other existing models is presented through a numerical example of a boost converter. Detailed simulation results confirm the better accuracy and speed of the proposed model.
The document discusses a resource provisioning framework for MapReduce jobs running in the cloud. It proposes using a signature matching algorithm to identify optimal configurations by matching a job's resource consumption signature to a database. If no match is found, it uses an SLO-based algorithm to calculate the minimum number of map and reduce slots needed to finish a job within a deadline. It also describes algorithms for priority-based scheduling, skew mitigation, bottleneck detection and removal, and deadlock prevention to improve performance.
An Algorithm for Optimized Cost in a Distributed Computing SystemIRJET Journal
This document summarizes an algorithm for optimized cost allocation in a distributed computing system. The algorithm considers a set of tasks that need to be assigned to processors across multiple phases. It calculates execution costs, residing costs, communication costs, and reallocation costs to determine the optimal allocation that minimizes overall system costs. The algorithm is demonstrated on a sample problem involving 4 tasks to be allocated across 2 processors over 5 phases. Cost matrices are provided and the algorithm partitions the problem into subproblems to determine the lowest cost allocation for each phase and overall.
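The phase-by-phase structure described above (execution cost per phase plus a reallocation cost when the assignment changes between phases) admits a standard dynamic program over subproblems, one per phase. This sketch uses illustrative inputs, not the cost matrices of the original paper:

```python
def optimal_allocation(exec_cost, realloc_cost):
    """Minimum total cost over all phase-wise processor assignments.

    exec_cost[k][p] is the cost of running the tasks under assignment p
    in phase k; realloc_cost is charged whenever consecutive phases use
    different assignments. Solved by a per-phase dynamic program."""
    n = len(exec_cost[0])
    best = list(exec_cost[0])        # best[p] = min cost ending phase 0 in p
    for phase in exec_cost[1:]:
        prev = best
        best = [phase[p] + min(prev[q] + (0 if q == p else realloc_cost)
                               for q in range(n))
                for p in range(n)]
    return min(best)

# Three phases, two assignments: cheap reallocation makes switching pay off
total = optimal_allocation([[1, 5], [5, 1], [1, 5]], realloc_cost=1)
# → 5 (switch each phase); with realloc_cost=10 it is 7 (stay on assignment 0)
```

Each subproblem "cheapest way to end phase k in assignment p" depends only on phase k-1, mirroring the partition into per-phase subproblems in the document.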
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
This document summarizes a study that uses fuzzy logic to solve the unit commitment problem in power generation systems. The unit commitment problem aims to determine the optimal on/off schedule of generating units to minimize operating costs while meeting demand and constraints. The study models several factors like generator capacity, fuel costs, and startup costs as fuzzy variables. It defines membership functions for these fuzzy inputs and the output of production cost. It then develops 45 fuzzy logic rules relating the inputs to the output. Finally, it uses the centroid defuzzification method to obtain crisp numerical solutions for production cost from the fuzzy model. The goal is to demonstrate that a fuzzy logic approach can efficiently solve the unit commitment problem.
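The two numeric ingredients of the fuzzy approach (membership functions for the fuzzified inputs and centroid defuzzification of the production-cost output) can be sketched briefly. The triangular shape and sample points are illustrative assumptions; the study's 45 rules are not reproduced:

```python
def tri_mf(x, a, b, c):
    """Triangular membership function with feet at a and c, peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def centroid_defuzzify(xs, mus):
    """Crisp output = centroid of the aggregated output membership
    function, sampled at points xs with membership degrees mus."""
    return sum(x * m for x, m in zip(xs, mus)) / sum(mus)
```

For example, a "medium generation cost" set peaking at 5 gives full membership at 5 and half membership at 2.5; the centroid then turns the rule-aggregated fuzzy output into a single production-cost number.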
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
This document proposes a flexible accelerator architecture that exploits carry-save arithmetic to efficiently implement digital signal processing kernels. The architecture includes flexible computational units that can be configured to perform chained addition, multiplication, and addition operations directly on carry-save formatted data without intermediate conversions. Experimental results show the proposed architecture delivers average gains of 61.91% in area-delay product and 54.43% in energy consumption compared to state-of-the-art flexible data paths.
This document discusses adaptive system-level scheduling under fluid traffic flow conditions in multiprocessor systems. It proposes a scheduling mechanism that accounts for traffic-centric system design. The mechanism evaluates scheduling methods based on effectiveness, robustness, and flexibility. It also introduces a processor-FPGA scheduling approach that reduces schedule length by taking advantage of FPGA reconfiguration. Simulation results show that processor-FPGA scheduling outperforms multiprocessor-only scheduling under certain traffic conditions. Future work will focus on formulating a traffic-centric scheduling method.
Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Eff...Tiziano De Matteis
This talk has been given at PPoPP 2016 (Barcelona)
The paper addresses the problem of designing control strategies for elastic stream processing applications. Elasticity allows applications to rapidly change their configuration (e.g. the number of used resources) on-the-fly, in response to fluctuations of their workload. In this work we face this problem by adopting the Model Predictive Control technique, a control-theoretic method aimed at finding the optimal application configuration along a limited prediction horizon by solving an online optimization problem. Our control strategies are designed to address latency constraints, by using Queueing Theory models, and energy consumption by changing the number of used cores and the CPU frequency through the Dynamic Voltage and Frequency Scaling (DVFS) function of modern multi-core CPUs. The proactive capabilities, in addition to the latency- and energy-awareness, represent the novel features of our approach. Experiments performed using a high-frequency trading application show the effectiveness compared with state-of-the-art techniques.
A full version of the slides (with transitions) is available at: https://docs.google.com/presentation/d/1VZ3y3RQDLFi_xA7Rl0Vj1iqBdoerxCMG4y53uMz9Ziw/edit?usp=sharing
Design and Implementation of a Programmable Truncated Multiplierijsrd.com
Truncated multiplication reduces part of the power required by multipliers by computing only the most-significant bits of the product. The most common approach to truncation includes physical reduction of the partial product matrix and compensation for the reduced bits via different hardware compensation sub-circuits. However, this results in fixed systems optimized for a given application at design time. A novel approach to truncation is proposed, where a full-precision multiplier is implemented but the active section of the partial product matrix is selected dynamically at run-time. This allows power reduction to be traded off against signal degradation, and the trade-off can be modified at run-time. Such an architecture brings together the power reduction benefits of truncated multipliers and the flexibility of reconfigurable and general-purpose devices. An efficient implementation of such a multiplier is presented in a custom digital signal processor, where the concept of software compensation is introduced and analysed for different applications. Experimental results and power measurements are studied, including power measurements from both post-synthesis simulations and a fabricated IC implementation. This is the first system-level DSP core using a fine-grain truncated multiplier. Results demonstrate the effectiveness of the programmable truncated MAC (PTMAC) in achieving power reduction, with minimum impact on functionality for a number of applications. Software compensation is also shown to be effective when deploying truncated multipliers in a system.
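The numerical effect of a run-time selectable active section can be modelled in a few lines: a full 2n-bit product is formed, and only the top `keep` bits are retained. This models the arithmetic error only, not the hardware compensation sub-circuits; names are illustrative:

```python
def truncated_mult(a, b, width, keep):
    """Product of two `width`-bit unsigned operands with a run-time
    selectable active section: only the top `keep` bits of the
    2*width-bit product are retained (low bits zeroed, same scale).
    In hardware, the dropped columns stop switching, saving power."""
    full = a * b
    drop = 2 * width - keep          # low-order bits outside the active section
    return (full >> drop) << drop

# 8x8 multiplier: keeping all 16 bits is exact, keeping 8 bits loses < 2^8
exact = truncated_mult(255, 255, width=8, keep=16)   # → 65025
approx = truncated_mult(255, 255, width=8, keep=8)   # → 65024
```

The truncation error is bounded by 2^drop, which is the signal-degradation side of the run-time trade-off the abstract describes.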
Scheduling of Heterogeneous Tasks in Cloud Computing using Multi Queue (MQ) A...IRJET Journal
This document proposes a Multi Queue (MQ) task scheduling algorithm for heterogeneous tasks in cloud computing. It aims to improve upon the Round Robin and Weighted Round Robin algorithms by overcoming their drawbacks. The MQ algorithm splits tasks and resources into separate queues based on size/length and speed. Small tasks are scheduled on slower resources and large tasks on faster resources. The document compares the performance of MQ to Round Robin and Weighted Round Robin algorithms based on makespan, average resource utilization, and load balancing level using CloudSim simulations. The results show that MQ scheduling performs better than the other algorithms in most cases in terms of these metrics.
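The queue-splitting rule can be sketched directly: tasks are partitioned by length around the median, resources by speed, and short tasks go to slower resources while long tasks go to faster ones. This is a simplified reading of the MQ idea with illustrative names, assuming at least two resources and round-robin within each queue:

```python
def mq_schedule(tasks, resources):
    """Multi Queue sketch: split tasks (by length) and resources (by
    speed) into two queues each, then map small tasks to slow resources
    and large tasks to fast resources, round-robin within each queue."""
    tasks, resources = sorted(tasks), sorted(resources)
    mid_t, mid_r = len(tasks) // 2, len(resources) // 2
    small, large = tasks[:mid_t], tasks[mid_t:]
    slow, fast = resources[:mid_r], resources[mid_r:]
    plan = []
    for i, t in enumerate(small):
        plan.append((t, slow[i % len(slow)]))
    for i, t in enumerate(large):
        plan.append((t, fast[i % len(fast)]))
    return plan

# Task lengths 10..90, resource speeds 1 (slow) and 4 (fast)
plan = mq_schedule([10, 80, 20, 90], [1, 4])
# → [(10, 1), (20, 1), (80, 4), (90, 4)]
```

Keeping long tasks off slow resources is what shortens the makespan relative to plain Round Robin, which ignores both task length and resource speed.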
IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLAeeiej_journal
This document discusses the implementation of an unsigned multiplier using a modified carry select adder technique. It begins with an introduction to digital arithmetic operations such as multiplication and addition. It then describes the proposed system, which uses a modified carry select adder based multiplier to reduce area over a traditional carry look-ahead adder based multiplier, while maintaining similar delay. The document provides details on the design of the regular and modified square-root carry select adders used in the multiplier. It discusses how replacing ripple carry adders with binary-to-excess-1 converters in the modified design can further reduce area and power consumption.
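The binary-to-excess-1 converter (BEC) at the heart of the modified CSLA can be modelled bit-by-bit: it produces input+1, which is exactly the carry-in = 1 sum the regular CSLA computes with a duplicate ripple-carry adder. A behavioural sketch (LSB-first bit lists, names illustrative):

```python
def bec(bits):
    """Binary-to-Excess-1 Converter: returns (value + 1) mod 2**n for an
    n-bit value given LSB-first. The modified square-root CSLA uses this
    in place of a second ripple-carry adder for the carry-in = 1 path."""
    out, carry = [], 1
    for b in bits:
        out.append(b ^ carry)        # sum bit of b + carry
        carry = b & carry            # carry propagates only through 1s
    return out

def csla_select(sum_c0, sum_c1, carry_in):
    """Carry-select stage: pick the precomputed sum matching the
    actual carry-in, e.g. csla_select(s, bec(s), cin)."""
    return sum_c1 if carry_in else sum_c0
```

Since `bec` needs only one XOR/AND pair per bit, it is smaller and lower-power than the duplicated adder it replaces, which is the area/power saving the document reports.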
This document presents a GPU-accelerated algorithm for solving the Group Steiner Problem (GSP) which arises in routing phases of VLSI circuit design. The algorithm uses a depth-bounded heuristic to construct approximate minimum-cost Steiner trees in parallel using CUDA-aware MPI. Evaluation on a supercomputer shows the parallel implementation achieves up to 302x speedup over serial algorithms and scales well with problem size and processor count.
Memory Polynomial Based Adaptive Digital PredistorterIJERA Editor
Digital predistortion (DPD) is a baseband signal processing technique that corrects for impairments in RF power amplifiers (PAs). These impairments cause out-of-band emissions or spectral regrowth and in-band distortion, which correlate with an increased bit error rate (BER). Wideband signals with a high peak-to-average ratio are more susceptible to these unwanted effects. To reduce these impairments, this paper proposes modeling the digital predistorter for the power amplifier using the GSA algorithm.
Design and Implementation of Low Power DSP Core with Programmable Truncated V...ijsrd.com
Programmable truncated Vedic multiplication is a method that uses a Vedic multiplier with programmable truncation control bits, reducing part of the area and power required by multipliers by computing only the most-significant bits of the product. The basic process of truncation includes physical reduction of the partial product matrix and compensation for the reduced bits via different hardware compensation sub-circuits. This results in fixed systems optimized for a given application at design time. A novel approach to truncation is proposed, where a full-precision Vedic multiplier is implemented but the active section of the truncation is selected dynamically at run-time by truncation control bits. Such an architecture brings together the power reduction benefits of truncated multipliers and the flexibility of reconfigurable and general-purpose devices. An efficient implementation of such a multiplier is presented in a custom digital signal processor, where the concept of software compensation is introduced and analyzed for different applications. Experimental results and power measurements are studied, including power measurements from both post-synthesis simulations and a fabricated IC implementation. This is the first system-level DSP core using a high-speed Vedic truncated multiplier. Results demonstrate the effectiveness of the programmable truncated MAC (PTMAC) in achieving power reduction, with minimum impact on functionality for a number of applications. Compared with previous parallel multipliers, the Vedic multiplier is expected to be faster and to occupy less area. The programmable truncated Vedic multiplier (PTVM) is the basic block implemented for the arithmetic and PTMAC units.
On the-joint-optimization-of-performance-and-power-consumption-in-data-centersCemal Ardil
The document summarizes research on jointly optimizing performance and power consumption in data centers. It models the process of mapping tasks in a data center onto machines as a multi-objective problem to minimize both energy consumption and response time (makespan), subject to deadline and architectural constraints. It proposes using a simple goal programming technique that guarantees Pareto optimal solutions with good convergence. Simulation results show the technique achieves superior performance compared to other approaches and is competitive with optimal solutions for small-scale problems.
This document describes a proposed VLSI implementation of a high-speed DCT architecture for H.264 video codec design. It presents a Booth radix-8 multiplier-based multiply-accumulate (MAC) unit to improve throughput and minimize area complexity for 8x8 2D DCT computation. The proposed MAC architecture achieves a maximum operating frequency of 129.18MHz while reducing area by 64% compared to a regular merged MAC unit with a conventional multiplier. FPGA implementation and performance analysis demonstrate the suitability of the proposed DCT architecture for applications in HDTV systems.
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Power consumption is an important metric in the context of wireless sensor networks (WSNs). In this paper, we describe a new Energy-Degree (EDD) Clustering Algorithm for WSNs. A node with higher residual energy and higher degree is more likely to be elected as a clusterhead (CH). The intercluster and intracluster communications are realized over one hop. The principal goal of our algorithm is to balance the energy consumption and load among all nodes. By comparing the EDD clustering algorithm with the LEACH algorithm, simulation results have shown its effectiveness in saving energy.
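The election rule (higher residual energy and higher degree make a node more likely to become a CH) can be sketched as a weighted ranking. The 0.5/0.5 weights and the assumption that energy and degree are pre-normalized to comparable scales are illustrative, not taken from the paper:

```python
def elect_clusterheads(nodes, n_heads):
    """EDD-style sketch: rank nodes by a weighted score of residual
    energy and degree (neighbor count); the top-scoring nodes become
    clusterheads. Weights 0.5/0.5 are an illustrative assumption and
    presume both quantities are normalized to [0, 1]-like scales."""
    def score(n):
        return 0.5 * n["energy"] + 0.5 * n["degree"]
    return sorted(nodes, key=score, reverse=True)[:n_heads]

nodes = [
    {"id": 1, "energy": 0.9, "degree": 4},
    {"id": 2, "energy": 0.2, "degree": 1},
    {"id": 3, "energy": 0.8, "degree": 5},
]
heads = elect_clusterheads(nodes, 1)   # node 3 wins: well-connected, high energy
```

With one-hop intercluster and intracluster communication, favoring high-degree CHs keeps most members within direct reach of their clusterhead.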
A multi objective hybrid aco-pso optimization algorithm for virtual machine p...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres...AtakanAral
This document summarizes Atakan Aral's PhD thesis progress report on modeling and optimizing resource allocation in cloud computing. The report outlines Aral's contributions, including developing a topology-based matching algorithm for distributed VM placement and evaluating it against baseline methods. Evaluation covers factors like bandwidth, costs, loads, and optimization criteria including deployment time, communication latency, throughput, and rejection rates. Future work is planned to enhance the algorithm and evaluation.
This document discusses topographical synthesis and its benefits. Topographical synthesis enables congestion prediction and avoidance early in the design flow during synthesis. It uses advanced optimization algorithms to deliver high quality results that correlate well to physical implementation. This eliminates costly iterations between synthesis and layout. Key benefits include accurately predicting congestion hotspots, performing optimizations to minimize congestion, and delivering results matched to post-layout timing and area without needing wire load models.
This document provides an overview of a course on Synthesis and Optimization of Digital Circuits. The course aims to present automatic logic synthesis techniques for computer-aided design of very large-scale integrated circuits and systems. The course will broadly survey the state of the art in logic-level synthesis and optimization techniques for combinational and sequential circuits. It will cover various representations of Boolean functions and their applications in logic synthesis. Students will implement Verilog and C/C++ projects focusing on optimization and synthesis methods discussed in class.
A VLSI (Very Large Scale Integration) system integrates millions of "electronic components" in a small area (a few mm² to a few cm²).
The goal is to design "efficient" VLSI systems that have:
Circuit speed (high)
Power consumption (low)
Design area (low)
This document summarizes a presentation given at the 4th International Conference on Advances in Energy Research titled "Pinch Analysis for MultiDimensional Sustainable Energy Systems Planning". It discusses how pinch analysis, a process integration technique, can be applied to model multi-objective optimization problems in sustainable energy system planning by considering factors like energy return on investment, cost, and carbon emissions. A case study applying this approach to the energy system in the Philippines is presented, showing a Pareto optimal front of solutions balancing these objectives.
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
This document summarizes a study that uses fuzzy logic to solve the unit commitment problem in power generation systems. The unit commitment problem aims to determine the optimal on/off schedule of generating units to minimize operating costs while meeting demand and constraints. The study models several factors like generator capacity, fuel costs, and startup costs as fuzzy variables. It defines membership functions for these fuzzy inputs and the output of production cost. It then develops 45 fuzzy logic rules relating the inputs to the output. Finally, it uses the centroid defuzzification method to obtain crisp numerical solutions for production cost from the fuzzy model. The goal is to demonstrate that a fuzzy logic approach can efficiently solve the unit commitment problem.
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
This document proposes a flexible accelerator architecture that exploits carry-save arithmetic to efficiently implement digital signal processing kernels. The architecture includes flexible computational units that can be configured to perform chained addition, multiplication, and addition operations directly on carry-save formatted data without intermediate conversions. Experimental results show the proposed architecture delivers average gains of 61.91% in area-delay product and 54.43% in energy consumption compared to state-of-the-art flexible data paths.
This document discusses adaptive system-level scheduling under fluid traffic flow conditions in multiprocessor systems. It proposes a scheduling mechanism that accounts for traffic-centric system design. The mechanism evaluates scheduling methods based on effectiveness, robustness, and flexibility. It also introduces a processor-FPGA scheduling approach that reduces schedule length by taking advantage of FPGA reconfiguration. Simulation results show that processor-FPGA scheduling outperforms multiprocessor-only scheduling under certain traffic conditions. Future work will focus on formulating a traffic-centric scheduling method.
Keep Calm and React with Foresight: Strategies for Low-Latency and Energy-Eff... - Tiziano De Matteis
This talk was given at PPoPP 2016 (Barcelona).
The paper addresses the problem of designing control strategies for elastic stream processing applications. Elasticity allows applications to rapidly change their configuration (e.g. the number of used resources) on-the-fly, in response to fluctuations of their workload. In this work we face this problem by adopting the Model Predictive Control technique, a control-theoretic method aimed at finding the optimal application configuration along a limited prediction horizon by solving an online optimization problem. Our control strategies are designed to address latency constraints, by using Queueing Theory models, and energy consumption by changing the number of used cores and the CPU frequency through the Dynamic Voltage and Frequency Scaling (DVFS) function of modern multi-core CPUs. The proactive capabilities, in addition to the latency- and energy-awareness, represent the novel features of our approach. Experiments performed using a high-frequency trading application show the effectiveness compared with state-of-the-art techniques.
A full version of the slides (with transitions) is available at: https://docs.google.com/presentation/d/1VZ3y3RQDLFi_xA7Rl0Vj1iqBdoerxCMG4y53uMz9Ziw/edit?usp=sharing
Design and Implementation of a Programmable Truncated Multiplier - ijsrd.com
Truncated multiplication reduces part of the power required by multipliers by computing only the most-significant bits of the product. The most common approach to truncation includes physical reduction of the partial product matrix and compensation for the reduced bits via different hardware compensation sub-circuits. However, this results in fixed systems optimized for a given application at design time. A novel approach to truncation is proposed, where a full-precision multiplier is implemented but the active section of the partial product matrix is selected dynamically at run time. This allows a trade-off between power reduction and signal degradation that can be modified at run time. Such an architecture brings together the power reduction benefits of truncated multipliers and the flexibility of reconfigurable and general-purpose devices. An efficient implementation of such a multiplier is presented in a custom digital signal processor, where the concept of software compensation is introduced and analysed for different applications. Experimental results and power measurements are studied, including power measurements from both post-synthesis simulations and a fabricated IC implementation. This is the first system-level DSP core using a fine-grain truncated multiplier. Results demonstrate the effectiveness of the programmable truncated MAC (PTMAC) in achieving power reduction with minimum impact on functionality for a number of applications. Software compensation is also shown to be effective when deploying truncated multipliers in a system.
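The run-time truncation idea can be illustrated with a small behavioural model: a full partial-product matrix is formed, but only columns at or above a programmable index contribute, with a constant correction for the dropped columns. The average-case compensation constant below is an assumption standing in for the paper's software compensation, not the fabricated design.

```python
def truncated_multiply(a, b, n, t):
    """n-bit unsigned multiply keeping only partial-product columns >= t."""
    acc = 0
    for i in range(n):
        for j in range(n):
            col = i + j
            if col >= t and ((a >> i) & 1) and ((b >> j) & 1):
                acc += 1 << col
    # Average-case compensation: column c holds min(c+1, n, 2n-1-c) partial
    # products, each 1 with probability 1/4 for uniform random inputs.
    comp = sum((min(c + 1, n, 2 * n - 1 - c) << c) for c in range(t)) // 4
    return acc + comp

exact = 1234 * 5678
approx = truncated_multiply(1234, 5678, 16, 8)   # drop the 8 low columns
```

Setting `t = 0` recovers the exact product, which is the flexibility the abstract describes: the same datapath serves both full-precision and truncated modes.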
Scheduling of Heterogeneous Tasks in Cloud Computing using Multi Queue (MQ) A... - IRJET Journal
This document proposes a Multi Queue (MQ) task scheduling algorithm for heterogeneous tasks in cloud computing. It aims to improve upon the Round Robin and Weighted Round Robin algorithms by overcoming their drawbacks. The MQ algorithm splits tasks and resources into separate queues based on size/length and speed. Small tasks are scheduled on slower resources and large tasks on faster resources. The document compares the performance of MQ to Round Robin and Weighted Round Robin algorithms based on makespan, average resource utilization, and load balancing level using CloudSim simulations. The results show that MQ scheduling performs better than the other algorithms in most cases in terms of these metrics.
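A minimal sketch of the splitting rule described above, assuming median-based thresholds and round-robin dispatch within each queue; the document's actual thresholds and queue counts may differ.

```python
def mq_schedule(tasks, resources):
    """tasks: {id: length}; resources: {id: speed}. Returns {task: resource}."""
    med_len = sorted(tasks.values())[len(tasks) // 2]
    med_spd = sorted(resources.values())[len(resources) // 2]
    small = [t for t, l in tasks.items() if l < med_len]
    large = [t for t, l in tasks.items() if l >= med_len]
    slow = [r for r, s in resources.items() if s < med_spd]
    fast = [r for r, s in resources.items() if s >= med_spd]
    plan = {}
    # small tasks -> slow resources, large tasks -> fast resources
    for queue, pool in ((small, slow or fast), (large, fast or slow)):
        for k, t in enumerate(queue):           # round-robin within the pair
            plan[t] = pool[k % len(pool)]
    return plan
```

Keeping long tasks off slow machines is what shortens the makespan relative to plain Round Robin, which ignores both task length and resource speed.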
IMPLEMENTATION OF UNSIGNED MULTIPLIER USING MODIFIED CSLA - eeiej_journal
This document discusses the implementation of an unsigned multiplier using a modified carry select adder technique. It begins with an introduction to digital arithmetic operations like multiplication and addition. It then describes the proposed system, which uses a modified carry select adder based multiplier to reduce area over a traditional carry look ahead adder based multiplier, while maintaining similar delay times. The document provides details on the design of regular and modified square root carry select adders used in the multiplier. It discusses how replacing ripple carry adders with binary to excess-1 converters in the modified design can further reduce area and power consumption.
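The binary-to-excess-1 idea can be shown behaviourally: instead of a second ripple-carry adder for the carry-in-1 case, the modified design adds 1 to the carry-in-0 sum and lets a mux pick once the real carry arrives. This is an illustrative model, not the gate-level design; block width and composition are assumptions.

```python
def csla_bec_block(a, b, cin, width):
    """One carry-select block: single RCA plus a BEC instead of dual RCAs."""
    s0 = a + b                    # RCA result assuming carry-in = 0
    s1 = s0 + 1                   # BEC output: the carry-in = 1 result
    chosen = s1 if cin else s0    # mux selected by the incoming carry
    return chosen & ((1 << width) - 1), chosen >> width

def add8(a, b):
    """Two 4-bit blocks chained into an 8-bit adder."""
    lo, c = csla_bec_block(a & 0xF, b & 0xF, 0, 4)
    hi, c2 = csla_bec_block((a >> 4) & 0xF, (b >> 4) & 0xF, c, 4)
    return (c2 << 8) | (hi << 4) | lo
```

The area saving comes from the BEC (`s0 + 1`) needing fewer gates than a full second adder, at similar delay, which matches the abstract's claim.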
This document presents a GPU-accelerated algorithm for solving the Group Steiner Problem (GSP) which arises in routing phases of VLSI circuit design. The algorithm uses a depth-bounded heuristic to construct approximate minimum-cost Steiner trees in parallel using CUDA-aware MPI. Evaluation on a supercomputer shows the parallel implementation achieves up to 302x speedup over serial algorithms and scales well with problem size and processor count.
Memory Polynomial Based Adaptive Digital Predistorter - IJERA Editor
Digital predistortion (DPD) is a baseband signal processing technique that corrects for impairments in RF power amplifiers (PAs). These impairments cause out-of-band emissions or spectral regrowth and in-band distortion, which correlate with an increased bit error rate (BER). Wideband signals with a high peak-to-average ratio are more susceptible to these unwanted effects. To reduce these impairments, this paper proposes modeling the digital predistorter for the power amplifier using the GSA algorithm.
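The memory polynomial named in the title has the standard form y[n] = Σ_m Σ_k c[m,k] · x[n−m] · |x[n−m]|^(k−1); a direct evaluation is sketched below. The coefficients are placeholders — in the paper they would be found by the GSA optimizer.

```python
def memory_polynomial(x, coeffs):
    """Evaluate a memory polynomial on complex baseband samples x.
    coeffs[m][k-1] multiplies x[n-m] * |x[n-m]|**(k-1)."""
    M = len(coeffs)                 # memory depth + 1
    y = []
    for n in range(len(x)):
        acc = 0j
        for m in range(M):
            if n - m < 0:           # before the start of the record
                continue
            xm = x[n - m]
            for k, c in enumerate(coeffs[m], start=1):
                acc += c * xm * abs(xm) ** (k - 1)
        y.append(acc)
    return y
```

With a single linear tap (`coeffs = [[1.0]]`) the model is the identity; the odd-order terms and memory taps are what let it invert the PA's nonlinearity and memory effects.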
Design and Implementation of Low Power DSP Core with Programmable Truncated V... - ijsrd.com
Programmable truncated Vedic multiplication uses a Vedic multiplier together with programmable truncation control bits, reducing part of the area and power required by multipliers by computing only the most-significant bits of the product. The basic process of truncation includes physical reduction of the partial product matrix and compensation for the reduced bits via different hardware compensation sub-circuits, which results in fixed systems optimized for a given application at design time. A novel approach to truncation is proposed, where a full-precision Vedic multiplier is implemented but the active section of the truncation is selected dynamically at run time by the truncation control bits. Such an architecture brings together the power reduction benefits of truncated multipliers and the flexibility of reconfigurable and general-purpose devices. An efficient implementation of such a multiplier is presented in a custom digital signal processor, where the concept of software compensation is introduced and analyzed for different applications. Experimental results and power measurements are studied, including power measurements from both post-synthesis simulations and a fabricated IC implementation. This is the first system-level DSP core using a high-speed Vedic truncated multiplier. Results demonstrate the effectiveness of the programmable truncated MAC (PTMAC) in achieving power reduction with minimum impact on functionality for a number of applications. Compared with previous parallel multipliers, the Vedic multiplier is expected to be faster and to occupy less area. The programmable truncated Vedic multiplier (PTVM) is the basic building block of the arithmetic and PTMAC units.
On the-joint-optimization-of-performance-and-power-consumption-in-data-centers - Cemal Ardil
The document summarizes research on jointly optimizing performance and power consumption in data centers. It models the process of mapping tasks in a data center onto machines as a multi-objective problem to minimize both energy consumption and response time (makespan), subject to deadline and architectural constraints. It proposes using a simple goal programming technique that guarantees Pareto optimal solutions with good convergence. Simulation results show the technique achieves superior performance compared to other approaches and is competitive with optimal solutions for small-scale problems.
This document describes a proposed VLSI implementation of a high-speed DCT architecture for H.264 video codec design. It presents a Booth radix-8 multiplier-based multiply-accumulate (MAC) unit to improve throughput and minimize area complexity for 8x8 2D DCT computation. The proposed MAC architecture achieves a maximum operating frequency of 129.18MHz while reducing area by 64% compared to a regular merged MAC unit with a conventional multiplier. FPGA implementation and performance analysis demonstrate the suitability of the proposed DCT architecture for applications in HDTV systems.
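The radix-8 Booth recoding underlying the proposed MAC can be sketched as follows; this is the standard recoding, shown to illustrate the digit set, not the paper's circuit.

```python
def booth_radix8(b, n_digits):
    """Radix-8 Booth recoding of a non-negative integer b.
    Returns digits d_i in [-4, 4] with sum(d_i * 8**i) == b,
    valid while b < 2**(3*n_digits - 1)."""
    bit = lambda k: (b >> k) & 1 if k >= 0 else 0
    return [(-4) * bit(3*i + 2) + 2 * bit(3*i + 1) + bit(3*i) + bit(3*i - 1)
            for i in range(n_digits)]
```

Because the digit set includes ±3, radix-8 Booth multipliers must generate a 3× multiple of the multiplicand (the "hard multiple"); handling that multiple efficiently is a key part of any radix-8 MAC design.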
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Power consumption is an important metric in the context of wireless sensor networks (WSNs). In this paper, we describe a new Energy-Degree (EDD) clustering algorithm for WSNs. A node with higher residual energy and higher degree is more likely to be elected as a clusterhead (CH). The inter-cluster and intra-cluster communications are realized in one hop. The principal goal of our algorithm is to balance the power consumption and energy load among all nodes. Comparing the EDD clustering algorithm with the LEACH algorithm, simulation results have shown its effectiveness in saving energy.
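The election rule described above can be sketched as a weighted score over normalized residual energy and node degree. The weights, the CH ratio, and the top-k selection are assumptions for illustration; the paper may combine the two factors differently.

```python
def elect_clusterheads(nodes, ratio=0.2, w_energy=0.7, w_degree=0.3):
    """nodes: {id: (residual_energy, degree)}; returns the chosen CH ids."""
    e_max = max(e for e, _ in nodes.values()) or 1   # avoid divide-by-zero
    d_max = max(d for _, d in nodes.values()) or 1
    score = {i: w_energy * e / e_max + w_degree * d / d_max
             for i, (e, d) in nodes.items()}
    k = max(1, int(len(nodes) * ratio))              # number of clusterheads
    return sorted(score, key=score.get, reverse=True)[:k]
```

Re-running the election each round with updated residual energies is what spreads the CH burden and balances the energy load across nodes.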
A multi objective hybrid aco-pso optimization algorithm for virtual machine p... - eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Modeling and Optimization of Resource Allocation in Cloud [PhD Thesis Progres... - AtakanAral
This document summarizes Atakan Aral's PhD thesis progress report on modeling and optimizing resource allocation in cloud computing. The report outlines Aral's contributions, including developing a topology-based matching algorithm for distributed VM placement and evaluating it against baseline methods. Evaluation covers factors like bandwidth, costs, loads, and optimization criteria including deployment time, communication latency, throughput, and rejection rates. Future work is planned to enhance the algorithm and evaluation.
This document discusses topographical synthesis and its benefits. Topographical synthesis enables congestion prediction and avoidance early in the design flow during synthesis. It uses advanced optimization algorithms to deliver high quality results that correlate well to physical implementation. This eliminates costly iterations between synthesis and layout. Key benefits include accurately predicting congestion hotspots, performing optimizations to minimize congestion, and delivering results matched to post-layout timing and area without needing wire load models.
This document provides an overview of a course on Synthesis and Optimization of Digital Circuits. The course aims to present automatic logic synthesis techniques for computer-aided design of very large-scale integrated circuits and systems. The course will broadly survey the state of the art in logic-level synthesis and optimization techniques for combinational and sequential circuits. It will cover various representations of Boolean functions and their applications in logic synthesis. Students will implement Verilog and C/C++ projects focusing on optimization and synthesis methods discussed in class.
A VLSI (Very Large Scale Integration) system integrates millions of "electronic components" in a small area (a few mm² to a few cm²).
The goal is to design "efficient" VLSI systems that have:
circuit speed (high)
power consumption (low)
design area (low)
This document discusses architectural synthesis of DSP structured datapaths. It provides an overview of the architectural level synthesis problem and subtasks like scheduling, binding, and architecture optimization. The document describes using novel mathematical programming formulations to optimize performance and structural complexity for DSP synthesis. It also discusses techniques to improve the solution time for integer linear programming formulations, and provides results for typical high-level synthesis benchmarks.
PMU-Based Real-Time Damping Control System Software and Hardware Architecture... - Luigi Vanfretti
Poster Presentation at the IEEE PES General Meeting. Low-frequency, electromechanically induced, inter-area oscillations are of concern in the continued stability of interconnected power systems. Wide Area Monitoring, Protection and Control (WAMPAC) systems based on wide-area measurements such as synchrophasor (C37.118) data can be exploited to address the inter-area oscillation problem. This work develops a hardware prototype of a synchrophasor-based oscillation damping control system. A Compact Reconfigurable Input Output (cRIO) controller from National Instruments is used to implement the real-time prototype. This paper presents the design process followed for the development of the software architecture. The design method followed a three-step process of design proposal, design refinement and finally attempted implementation. The goals of the design, the challenges faced and the refinements necessary are presented. The design implemented is tested and validated on OPAL-RT's eMEGASIM real-time simulation platform and a brief discussion of the experimental results is included.
The document explores design processes used by architectural and engineering organizations. It begins by stating that all such organizations have design processes, whether documented or informal. It then indicates it will examine some common design processes from simple to more professional approaches. Finally, the document requests feedback from relevant disciplines like architecture, structure, electrical, and more on a tender stage design process for building projects that the author has created in Microsoft Project to integrate all necessary disciplines.
This document discusses responsive web design and how it changes the design process. It recommends prioritizing content and identifying content chunks when designing for different screen sizes. Designers should decide on breakpoints and create grid templates for different device widths. The process involves wireframing and designing for both desktop and mobile simultaneously through iteration. Effective collaboration between designers and developers is important when screen sizes are considered.
This document discusses various concepts related to physical design implementation. It describes the inputs and outputs of physical design tools, important checks to perform before starting design such as clock and high fanout net budgeting, and concepts like floorplanning, placement, routing, libraries, multi-voltage design, and clock tree synthesis and optimization.
Logic synthesis with synopsys design compiler - naeemtayyab
This document provides an overview of logic synthesis with Synopsys Design Compiler. It discusses the ASIC design flow, logic synthesis process, the Design Compiler tool, and the steps to use Design Compiler including project setup, reading the design, setting constraints, optimizing the design, and analyzing results. The goals of logic synthesis are to convert HDL to an optimized gate-level design given a library and constraints. Design Compiler is used to perform logic synthesis and optimization for area, speed or power.
The document discusses various aspects of physical design in VLSI circuits. It describes the physical design cycle which involves transforming a circuit diagram into a layout through steps like partitioning, floorplanning, placement, routing, and compaction. It also discusses different design styles like full-custom, standard cell, and gate array. Full-custom design allows maximum flexibility but has higher complexity, while restricted models like standard cell and gate array simplify the design process at the cost of less optimization in the layout. Physical design aims to produce layouts that meet timing and area constraints.
Episode 55 : Conceptual Process Synthesis-Design
Process Flowsheet Synthesis: Method to determine a process flowsheet that satisfies all product, operational and other requirements
SAJJAD KHUDHUR ABBAS
CEO, Founder & Head of SHacademy
Chemical Engineering, Al-Muthanna University, Iraq
Oil & Gas Safety and Health Professional - OSHACADEMY
Trainer of Trainers (TOT) - Canadian Center of Human Development
Building codes govern the design and construction of buildings to ensure safety and establish standards. Codes have existed for millennia and are updated regularly to reflect advances in technology and materials. The modern building code focuses on occupancy classifications, fire prevention, structural integrity, accessibility, and other life safety issues. Architects and engineers use the building code throughout the design process to ensure their designs meet all applicable requirements.
This document discusses sequential and combinational circuits. Combinational circuits are made of logic gates and their outputs depend only on the current inputs. Sequential circuits contain memory elements like flip-flops in addition to combinational logic, so their outputs depend on the current inputs and the circuit's previous state. There are two types of sequential circuits: synchronous circuits use a clock signal to control state changes, while asynchronous circuits change state directly in response to input changes.
The document discusses various concepts and methodologies related to software design including design specification modules, design languages like use case diagrams and class diagrams, fundamental design concepts like abstraction and modularity, modular design methods and criteria for evaluation, control terminology, effective modular design principles of high cohesion and low coupling, design heuristics, and ten heuristics for user interface design.
Sequential circuits consist of combinational logic and memory elements like latches and flip-flops. There are different types of latches and flip-flops that differ in their trigger mechanisms and outputs, including SR latches, D latches, and edge-triggered flip-flops like SR, D, and JK flip-flops. Asynchronous inputs can directly set or reset flip-flop outputs independent of the clock signal.
This document discusses green computing and proposes a simulator for evaluating green scheduling algorithms. It begins with background on green computing and why it is important. It then outlines the key components of the simulator, including: a computation model using DAGs, an energy consumption model based on CPU throttling levels, and an abstraction for energy-aware schedulers. The document describes classes for modeling cores, throttling levels, and the overall simulation framework, which is designed to be extensible to different scheduling algorithms, core types, and energy models. The goal is to simulate and evaluate scheduling heuristics to minimize energy consumption while meeting performance targets.
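The simulator's three pieces — a DAG computation model, throttling-level energy model, and scheduler abstraction — can be sketched together in a few lines. The throttling table, single-core scheduler, and all numbers are invented for illustration; the real simulator models multiple cores and pluggable heuristics.

```python
# Throttling level -> (relative speed, relative power); invented values.
LEVELS = {0: (1.0, 1.0), 1: (0.8, 0.6), 2: (0.5, 0.3)}

def simulate(dag, work, level):
    """Run a task DAG on one core at a fixed throttling level.
    dag: {task: set of predecessors}; work: {task: cycles}.
    Returns (finish_time, energy)."""
    speed, power = LEVELS[level]
    done, t, energy = set(), 0.0, 0.0
    while len(done) < len(dag):
        ready = [v for v in dag if v not in done and dag[v] <= done]
        task = min(ready)            # deterministic pick among ready tasks
        dt = work[task] / speed      # throttling stretches execution time
        t += dt
        energy += power * dt         # ...but lowers power draw
        done.add(task)
    return t, energy
```

Even this toy model exhibits the trade-off the simulator is built to study: deeper throttling cuts energy while lengthening the schedule, so a scheduler must pick levels that still meet the performance target.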
A verilog based simulation methodology for estimating statistical test for th... - ijsrd.com
Low-power estimation is an important aspect of digital VLSI circuit design. Estimation covers the power dissipation of a circuit, which must then be reduced. Power estimates are specific to a particular component of power, so when optimizing circuits for low power, the designer should know the effect of each design technique on each component. Different methods exist for reducing each power component. This paper presents the estimation of short-circuit and total power, power reduction techniques, and the application of the different proposed techniques, providing information about their effect on each of these components.
Analysis Of Optimization Techniques For Low Power VLSI Design - Amy Cernava
This document summarizes optimization techniques for low power VLSI design. It notes that power management is a key challenge in deep sub-micrometer designs due to increased complexity, and surveys state-of-the-art optimization methods at different levels of abstraction that target the design of low-power digital circuits. These include techniques at the technology level, such as multi-threshold CMOS and multiple supply voltages; at the circuit level, transistor sizing; and at the logic level, techniques like don't-care optimization, path balancing, and factorization to reduce switching activity and power dissipation.
SIMULTANEOUS OPTIMIZATION OF STANDBY AND ACTIVE ENERGY FOR SUB-THRESHOLD CIRC... - VLSICS Design
Continued downscaling of CMOS feature sizes and threshold voltages has dramatically increased leakage current, so leakage power reduction is an important design issue for both active and standby modes as technology scaling continues. In this paper, a simultaneous active- and standby-energy optimization methodology is proposed for 22 nm sub-threshold CMOS circuits. In the first phase, we investigate dual-threshold-voltage design for minimizing active energy per cycle. A slack-based genetic algorithm is proposed to find the optimal reverse body bias assignment for the set of non-critical-path gates, ensuring low active energy per cycle at the maximum allowable frequency and the optimal supply voltage. The second phase determines the optimal reverse body bias that can be applied to all gates for standby power optimization at the supply voltage found in the first phase. There are therefore two sets of gates and two reverse-body-bias values, one per set; the reverse body bias is switched between these two values according to the mode of operation. Experimental results are obtained for ISCAS-85 benchmark circuits such as 74L85, 74283, ALU74181, and a 16-bit RCA. The optimized circuits show significant energy savings (from 14.5% to 42.28%) and standby power savings (from 62.8% to 67%).
Implementation of Area Effective Carry Select Adders - Kumar Goud
Abstract: In integrated circuit design, area occupancy plays a vital role because of the increasing demand for portable systems. The Carry Select Adder (CSLA) is a fast adder used in data-processing processors for performing fast arithmetic functions. Given the structure of the CSLA, there is scope for reducing its area through efficient gate-level modification. In this paper, 16-bit, 32-bit, 64-bit and 128-bit Regular Linear CSLA, Modified Linear CSLA, Regular Square-root CSLA (SQRT CSLA) and Modified SQRT CSLA architectures have been developed and compared. The Regular CSLA is still area-consuming due to its dual Ripple-Carry Adder (RCA) structure; to reduce area, the CSLA can be implemented using a single RCA and an add-one circuit instead of dual RCAs. Comparing the Regular Linear CSLA with the Regular SQRT CSLA, the Regular SQRT CSLA has reduced area; likewise, comparing the Modified Linear CSLA with the Modified SQRT CSLA, the Modified SQRT CSLA has reduced area. The results and analysis show that the Modified Linear CSLA and Modified SQRT CSLA provide better outcomes than the Regular Linear CSLA and Regular SQRT CSLA respectively. This project aimed at implementing a high-performance, optimized FPGA architecture. Modelsim 10.0c is used for simulating the CSLA, which is synthesized using Xilinx PlanAhead 13.4 and implemented on a Virtex-5 FPGA kit.
Keywords: Field Programmable Gate Array (FPGA), efficient, Carry Select Adder (CSLA), Square-root CSLA (SQRTCSLA).
IJCER (www.ijceronline.com) International Journal of computational Engineerin... - ijceronline
1) The document proposes techniques for reducing power consumption in VLSI circuits, including minimizing bus transitions using coding schemes, resistive feedback paths to eliminate glitches, and voltage scaling.
2) A resistive feedback method is developed to eliminate glitches in CMOS circuits which reduces power consumption and improves performance.
3) Simulation results show that the proposed resistive feedback technique is effective at minimizing glitches and reducing unnecessary power dissipation compared to a design without feedback paths.
Design Analysis of Delay Register with PTL Logic using 90 nm Technology - IJEEE
This paper presents a low-area, power-efficient delay register using CMOS transistors. The proposed register has a smaller area than the conventional register. The register design consists of 6 NMOS and 6 PMOS transistors. The proposed delay register has been designed in a logic editor and simulated using 90 nm technology. Layout simulation and parametric analysis have also been performed to obtain the results. In this paper, the register has been designed using both fully automatic layout design and semi-custom layout design, and the performance of these designs has been analyzed and compared in terms of power, delay and area. The simulation results show that the PTL-based delay register design improves power by 0.05% and area by 61.8%.
Implementation of low power divider techniques using radixeSAT Journals
This document summarizes a research paper on implementing low power divider techniques using radix-8 division. It describes the design of a radix-8 divider that aims to reduce energy dissipation without penalizing latency. The algorithm and implementation of the divider are explained. Several low power techniques are then applied, including switching off inactive blocks, retiming the recurrence, reducing transitions in multiplexers, changing the redundant representation, partitioning the selection function, and modifying the conversion and rounding process. Applying these techniques can reduce energy dissipation by up to 70% compared to a standard implementation. The performance of the radix-8 divider is also compared to a radix-4 divider and one using overlapped radix-2 stages.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
This document describes the design and verification of an 8x8 Vedic multiplier using a 90nm CMOS process. It presents the design methodology, including the use of Vedic multiplication algorithms to reduce computational steps compared to traditional methods. Transistor-level schematics for 2x2 and 4x4 multiplier modules are designed in Cadence using a 90nm library. The 4x4 module uses ripple carry adders to sum partial products in parallel. Simulation results verify the transistor-level designs match an ideal multiplier designed in Verilog, demonstrating an efficient digital multiplier based on Vedic mathematics.
Dual technique of reconfiguration and capacitor placement for distribution sy...IJECEIAES
Radial Distribution System (RDS) suffer from high real power losses and lower bus voltages. Distribution System Reconfiguration (DSR) and Optimal Capacitor Placement (OCP) techniques are ones of the most economic and efficient approaches for loss reduction and voltage profile improvement while satisfy RDS constraints. The advantages of these two approaches can be concentrated using of both techniques together. In this study two techniques are used in different ways. First, the DSR technique is applied individually. Second, the dual technique has been adopted of DSR followed by OCP in order to identify the technique that provides the most effective performance. Three optimization algorithms have been used to obtain the optimal design in individual and dual technique. Two IEEE case studies (33bus, and 69 bus) used to check the effectiveness of proposed approaches. A Direct Backward Forward Sweep Method (DBFSM) has been used in order to calculate the total losses and voltage of each bus. Results show the capability of the proposed dual technique using Modified Biogeography Based Optimization (MBBO) algorithm to find the optimal solution for significant loss reduction and voltage profile enhancement. In addition, comparisons with literature works done to show the superiority of proposed algorithms in both techniques.
DESIGN & ANALYSIS OF A CHARGE RE-CYCLE BASED NOVEL LPHS ADIABATIC LOGIC CIRCU...VLSICS Design
This paper focuses on principles of adiabatic logic, its classification and comparison of various adiabatic logic designs. An attempt has been made in this paper to modify 2PASCL (Two Phase Adiabatic Static CMOS Logic) adiabatic logic circuit to minimize delay of the different 2PASCL circuit designs. This modifications in the circuits leads to improvement of Power Delay Product (PDP) which is one of the figure of merit to optimize the circuit with factors like power dissipation and delay of the circuit. This paper investigates the design approaches of low power adiabatic gates in terms of energy dissipation and uses of Simple PN diode instead of MOS diode which reduces the effect of Capacitances at high transition and power clock frequency. A computer simulation using SPECTRE from Cadence is carried out on different adiabatic circuits, such as Inverter, NAND, NOR, XOR and 2:1 MUX.
This document analyzes the energy dissipation of digital half band filters operated in the sub-threshold region with throughput constraints. It explores various architectures of a 12-bit half band filter including the basic implementation and unfolded structures. Simulation results show that the unfolded by 2 architecture dissipates 22% less energy per sample compared to the original filter, making it the most energy efficient. The unfolded by 4 architecture best meets throughput requirements of 120K-1M samples/sec, dissipating less energy than other implementations in this speed range.
A Review: Compensation of Mismatches in Time Interleaved Analog to Digital Co...IJERA Editor
The execution of today's correspondence frameworks is exceedingly subject to the utilized Analog-to-Digital converters (ADCs), and with a specific end goal to give more flexibility and exactness to the developing correspondence innovations, superior-ADCs are needed. In this respect, the time-interleaved operation of an exhibit of ADCs (TI-ADC) might be a sensible result. A TI-ADC can build its throughput by utilizing M channel ADCs or sub converters in parallel and examining the data motion in a period-interleaved way. In any case, the execution of a TI-ADC gravely suffers from the bungles around the channel ADCs. In this paper we survey the advancement in the configuration of low-intricacy advanced remedy structures and calculations for time-interleaved ADCs in the course of the most recent five years. We devise a discrete-time model, state the outline issue, and finally infer the calculations and structures. Specifically, we examine proficient calculations to outline time-differing remedy filters and additionally iterative structures using polynomial based filters. Thusly, the remuneration structure may be utilized to repay time-differing recurrence reaction befuddles in time-interleaved ADCs, and in addition to remake uniform examples from nonuniformly tested indicators. We examine the recompense structure, research its execution, and exhibit requisition zones of the structure through various illustrations. At long last, we give a standpoint to future examination questions.
ENERGY PERFORMANCE OF A COMBINED HORIZONTAL AND VERTICAL COMPRESSION APPROACH...IJCNCJournal
Energy efficiency is an essential issue to be reckoned in wireless sensor networks development. Since the low-powered sensor nodes deplete their energy in transmitting the collected information, several strategies have been proposed to investigate the communication power consumption, in order to reduce the amount of transmitted data without affecting the information reliability. Lossy compression is a promising solution recently adapted to overcome the challenging energy consumption, by exploiting the data correlation and discarding the redundant information. In this paper, we propose a hybrid compression approach based on two dimensions specified as horizontal (HC) and vertical compression (VC), typically implemented in cluster-based routing architecture. The proposed scheme considers two key performance metrics, energy expenditure, and data accuracy to decide the adequate compression approach based on HC-VC or VC-HC configuration according to each WSN application requirement. Simulation results exhibit the performance of both proposed approaches in terms of extending the clustering network lifetime.
DESIGN OF IMPROVED RESISTOR LESS 45NM SWITCHED INVERTER SCHEME (SIS) ANALOG T...VLSICS Design
This work presents three different approaches which eliminates the resistor ladder completely and hence
reduce the power demand drastically of a Analog to Digital Converter. The first approach is Switched
Inverter Scheme (SIS) ADC; The test result obtained for it on 45nm technology indicates an offset error of
0.014 LSB. The full scale error is of -0.112LSB. The gain error is of 0.07 LSB, actual full scale range of
0.49V, worst case DNL & INL each of -0.3V. The power dissipation for the SIS ADC is 207.987 μwatts;
Power delay product (PDP) is 415.9 fWs, and the area is 1.89μm2. The second and third approaches are
clocked SIS ADC and Sleep transistor SIS ADC. Both of them show significant improvement in power
dissipation as 57.5% & 71% respectively. Whereas PDP is 229.7 fWs and area is 0.05 μm2 for Clocked SIS
ADC and 107.3 fWs & 1.94 μm2 for Sleep transistor SIS ADC.
How to Setup Warehouse & Location in Odoo 17 InventoryCeline George
In this slide, we'll explore how to set up warehouses and locations in Odoo 17 Inventory. This will help us manage our stock effectively, track inventory levels, and streamline warehouse operations.
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...PECB
Denis is a dynamic and results-driven Chief Information Officer (CIO) with a distinguished career spanning information systems analysis and technical project management. With a proven track record of spearheading the design and delivery of cutting-edge Information Management solutions, he has consistently elevated business operations, streamlined reporting functions, and maximized process efficiency.
Certified as an ISO/IEC 27001: Information Security Management Systems (ISMS) Lead Implementer, Data Protection Officer, and Cyber Risks Analyst, Denis brings a heightened focus on data security, privacy, and cyber resilience to every endeavor.
His expertise extends across a diverse spectrum of reporting, database, and web development applications, underpinned by an exceptional grasp of data storage and virtualization technologies. His proficiency in application testing, database administration, and data cleansing ensures seamless execution of complex projects.
What sets Denis apart is his comprehensive understanding of Business and Systems Analysis technologies, honed through involvement in all phases of the Software Development Lifecycle (SDLC). From meticulous requirements gathering to precise analysis, innovative design, rigorous development, thorough testing, and successful implementation, he has consistently delivered exceptional results.
Throughout his career, he has taken on multifaceted roles, from leading technical project management teams to owning solutions that drive operational excellence. His conscientious and proactive approach is unwavering, whether he is working independently or collaboratively within a team. His ability to connect with colleagues on a personal level underscores his commitment to fostering a harmonious and productive workplace environment.
Date: May 29, 2024
Tags: Information Security, ISO/IEC 27001, ISO/IEC 42001, Artificial Intelligence, GDPR
-------------------------------------------------------------------------------
Find out more about ISO training and certification services
Training: ISO/IEC 27001 Information Security Management System - EN | PECB
ISO/IEC 42001 Artificial Intelligence Management System - EN | PECB
General Data Protection Regulation (GDPR) - Training Courses - EN | PECB
Webinars: https://pecb.com/webinars
Article: https://pecb.com/article
-------------------------------------------------------------------------------
For more information about PECB:
Website: https://pecb.com/
LinkedIn: https://www.linkedin.com/company/pecb/
Facebook: https://www.facebook.com/PECBInternational/
Slideshare: http://www.slideshare.net/PECBCERTIFICATION
Temple of Asclepius in Thrace. Excavation resultsKrassimira Luka
The temple and the sanctuary around were dedicated to Asklepios Zmidrenus. This name has been known since 1875 when an inscription dedicated to him was discovered in Rome. The inscription is dated in 227 AD and was left by soldiers originating from the city of Philippopolis (modern Plovdiv).
This presentation was provided by Racquel Jemison, Ph.D., Christina MacLaughlin, Ph.D., and Paulomi Majumder. Ph.D., all of the American Chemical Society, for the second session of NISO's 2024 Training Series "DEIA in the Scholarly Landscape." Session Two: 'Expanding Pathways to Publishing Careers,' was held June 13, 2024.
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxEduSkills OECD
Iván Bornacelly, Policy Analyst at the OECD Centre for Skills, OECD, presents at the webinar 'Tackling job market gaps with a skills-first approach' on 12 June 2024
Leveraging Generative AI to Drive Nonprofit InnovationTechSoup
In this webinar, participants learned how to utilize Generative AI to streamline operations and elevate member engagement. Amazon Web Service experts provided a customer specific use cases and dived into low/no-code tools that are quick and easy to deploy through Amazon Web Service (AWS.)
This document provides an overview of wound healing, its functions, stages, mechanisms, factors affecting it, and complications.
A wound is a break in the integrity of the skin or tissues, which may be associated with disruption of the structure and function.
Healing is the body’s response to injury in an attempt to restore normal structure and functions.
Healing can occur in two ways: Regeneration and Repair
There are 4 phases of wound healing: hemostasis, inflammation, proliferation, and remodeling. This document also describes the mechanism of wound healing. Factors that affect healing include infection, uncontrolled diabetes, poor nutrition, age, anemia, the presence of foreign bodies, etc.
Complications of wound healing like infection, hyperpigmentation of scar, contractures, and keloid formation.
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.pptHenry Hollis
The History of NZ 1870-1900.
Making of a Nation.
From the NZ Wars to Liberals,
Richard Seddon, George Grey,
Social Laboratory, New Zealand,
Confiscations, Kotahitanga, Kingitanga, Parliament, Suffrage, Repudiation, Economic Change, Agriculture, Gold Mining, Timber, Flax, Sheep, Dairying,
Gender and Mental Health - Counselling and Family Therapy Applications and In...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
A Visual Guide to 1 Samuel | A Tale of Two HeartsSteve Thomason
These slides walk through the story of 1 Samuel. Samuel is the last judge of Israel. The people reject God and want a king. Saul is anointed as the first king, but he is not a good king. David, the shepherd boy is anointed and Saul is envious of him. David shows honor while Saul continues to self destruct.
LAND USE LAND COVER AND NDVI OF MIRZAPUR DISTRICT, UPRAHUL
This Dissertation explores the particular circumstances of Mirzapur, a region located in the
core of India. Mirzapur, with its varied terrains and abundant biodiversity, offers an optimal
environment for investigating the changes in vegetation cover dynamics. Our study utilizes
advanced technologies such as GIS (Geographic Information Systems) and Remote sensing to
analyze the transformations that have taken place over the course of a decade.
The complex relationship between human activities and the environment has been the focus
of extensive research and worry. As the global community grapples with swift urbanization,
population expansion, and economic progress, the effects on natural ecosystems are becoming
more evident. A crucial element of this impact is the alteration of vegetation cover, which plays a
significant role in maintaining the ecological equilibrium of our planet.Land serves as the foundation for all human activities and provides the necessary materials for
these activities. As the most crucial natural resource, its utilization by humans results in different
'Land uses,' which are determined by both human activities and the physical characteristics of the
land.
The utilization of land is impacted by human needs and environmental factors. In countries
like India, rapid population growth and the emphasis on extensive resource exploitation can lead
to significant land degradation, adversely affecting the region's land cover.
Therefore, human intervention has significantly influenced land use patterns over many
centuries, evolving its structure over time and space. In the present era, these changes have
accelerated due to factors such as agriculture and urbanization. Information regarding land use and
cover is essential for various planning and management tasks related to the Earth's surface,
providing crucial environmental data for scientific, resource management, policy purposes, and
diverse human activities.
Accurate understanding of land use and cover is imperative for the development planning
of any area. Consequently, a wide range of professionals, including earth system scientists, land
and water managers, and urban planners, are interested in obtaining data on land use and cover
changes, conversion trends, and other related patterns. The spatial dimensions of land use and
cover support policymakers and scientists in making well-informed decisions, as alterations in
these patterns indicate shifts in economic and social conditions. Monitoring such changes with the
help of Advanced technologies like Remote Sensing and Geographic Information Systems is
crucial for coordinated efforts across different administrative levels. Advanced technologies like
Remote Sensing and Geographic Information Systems
9
Changes in vegetation cover refer to variations in the distribution, composition, and overall
structure of plant communities across different temporal and spatial scales. These changes can
occur natural.
BÀI TẬP DẠY THÊM TIẾNG ANH LỚP 7 CẢ NĂM FRIENDS PLUS SÁCH CHÂN TRỜI SÁNG TẠO ...
Low power tool paper
How to transform
an architectural synthesis tool for
low power VLSI designs
S. Gailhard, N. Julien, J.-Ph. Diguet, E. Martin
IUP-LESTER
2, rue le Coat Saint Haouen
56100 Lorient
France
E-mail: gailhard@univ-ubs.fr
Fax : (33) 2.97.88.05.51 Phone : (33) 2.97.88.05.45
Abstract - High Level Synthesis (HLS) for low power VLSI design is a complex optimization problem due to the Area/Time/Power interdependence. As few low power design tools are available, a new approach providing a modular low power synthesis method is proposed. Although currently based on the generic architectural synthesis tool Gaut, the method could use different "commercial" tools instead.
Our Gaut_w HLS tool consists of the following low power modules: high level power dissipation estimation, assignment, module selection (operators and supply voltage), optimization criteria and an operators library.
As an illustration, power saving factors on DWT algorithms are presented.
Keywords - Digital Signal Processing (DSP), High Level
Synthesis (HLS), Low-Power Methods, Modular approach,
Discrete Wavelet Transform (DWT).
1. Introduction
Most VLSI research has focused on optimizing circuit speed and area to perform complex signal processing. Indeed, the rapid advances in VLSI technology and the increasing computation required by recent real-time DSP applications lead to large chip densities and high clock frequencies. Minimizing the power dissipation of modern circuits, such as mobile systems, has therefore become a crucial problem. To minimize the power consumption of a system, VLSI researchers work on low power HLS tools. Power optimization can be performed at several levels, and the expected power saving factor at each level can be expressed as follows: behavioral (10-100%), architectural (10-100%), logical (10-50%) or physical (<20%) [3]. The expected power savings are thus most significant at the behavioral and architectural levels, yet these two domains have been less explored in the literature. The method integrated in our HLS tool Gaut_w proposes techniques to optimize the design at these two levels.
Many architectural synthesis tools have been proposed. These tools differ in the kinds of applications they can implement and in the targeted architecture. It is therefore important to develop low-power methods that can be quickly integrated into each user's architectural synthesis tool. For such a purpose, the best way is to develop low power modules that are used before the architectural synthesis tool in the design flow.
In this paper, a new approach to HLS for low power and time constrained DSP systems is proposed. The second section presents previous works on power estimation and optimization. Section 3 briefly describes our Gaut_w HLS tool and its different modules (High Level Power Estimation, Module Selection, Optimization Criteria, Assignment). In Section 4, results on the DWT algorithm [2] illustrating our method are given, followed by a conclusion.
2. Previous Works
2.1 Power estimation
There are three major sources of power dissipation in digital CMOS circuits, summarized in equation (1). The first term represents the switching power component, the second term is due to direct-path short-circuit current and the last term is the power dissipation due to leakage current.

P = Pswitching + Pshortcircuit + Pleakage = (C . p) . Vdd^2 . f    (1)

where C is the physical capacitance of the CMOS circuit, p is the power consuming transition (0→1) probability and Vdd the supply voltage.
However, for CMOS circuits it is assumed that both short-circuit and leakage power dissipation are negligible. Therefore, the power dissipation is given by (2) [4]:

P = Ccomp . Vdd^2 . f    (2)

where Ccomp is the average capacitance switched to perform a computation and f the sampling frequency.
It is well known that signal statistics influence the power dissipation [5]. Nevertheless, at a high level, it can be assumed that signal values are uniformly distributed. The capacitance Ccomp is then described by expression (3) [6]:

Ccomp = Σi Ni . Ci + Nreg . Creg + Cinterconnect + Ccontrol    (3)

where Ni is the number of i operations, Ci the i operator capacitance (which includes the intra-block routing and gate capacitances), Nreg.Creg the effective register capacitance, Cinterconnect the physical interconnect capacitance estimation and Ccontrol the controller capacitance estimation.
At the architectural level, the complete architecture is known and signals statistics in the
circuit can be calculated. Therefore, power dissipation estimation in a typical execution can be
computed more precisely [5].
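The high-level estimate of expressions (2) and (3) can be sketched in a few lines of code (a minimal illustration; the operation counts, capacitance values and helper names below are hypothetical, not taken from the paper's library):

```python
# Sketch of the high-level power estimate of expressions (2) and (3).
# All capacitance values and operation counts are illustrative assumptions.

def switched_capacitance(op_counts, op_caps, n_reg, c_reg,
                         c_interconnect, c_control):
    """Ccomp = sum_i Ni*Ci + Nreg*Creg + Cinterconnect + Ccontrol, as in (3)."""
    c_ops = sum(op_counts[op] * op_caps[op] for op in op_counts)
    return c_ops + n_reg * c_reg + c_interconnect + c_control

def estimate_power(c_comp, vdd, f):
    """P = Ccomp * Vdd^2 * f, as in (2); capacitances in F, f in Hz -> P in W."""
    return c_comp * vdd ** 2 * f

# Example: 4 additions and 2 multiplications (hypothetical capacitances).
c_comp = switched_capacitance(
    op_counts={"add": 4, "mult": 2},
    op_caps={"add": 1.2e-12, "mult": 15.0e-12},   # F switched per activation
    n_reg=8, c_reg=0.5e-12,
    c_interconnect=2.0e-12, c_control=1.0e-12,
)
p = estimate_power(c_comp, vdd=5.0, f=20e6)  # 5 V supply, 20 MHz sampling
```

Such an estimate only needs per-resource capacitances and activity counts, which is what makes it usable before synthesis.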
2.2 Power optimization
Low power optimization is achieved by reducing either Ci, through technology considerations, or Ccomp, through operator selection and architectural synthesis. Moreover, voltage scaling has a great impact on the power dissipation because of the quadratic dependence on Vdd in (2). Unfortunately, reducing Vdd significantly increases the operator latency T when the supply voltage approaches the device threshold voltage Vt (4) [4]:

T ∝ Vdd / (Vdd − Vt)^2    (4)

It is therefore important to find a good trade-off between power dissipation and time performance for DSP applications.
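The quadratic power gain and the latency penalty of expressions (2) and (4) can be put side by side with a short sketch (the voltage values are hypothetical examples, not library data):

```python
# Relative effect of supply voltage scaling, following (2) and (4).
# The supply and threshold voltages below are illustrative.

def power_ratio(vdd_new, vdd_ref):
    """P scales with Vdd^2 at fixed Ccomp and f, from (2)."""
    return (vdd_new / vdd_ref) ** 2

def latency_ratio(vdd_new, vdd_ref, vt):
    """T is proportional to Vdd / (Vdd - Vt)^2, from (4)."""
    t = lambda v: v / (v - vt) ** 2
    return t(vdd_new) / t(vdd_ref)

# Scaling a hypothetical 5 V design down to 3.3 V with Vt = 0.8 V:
p = power_ratio(3.3, 5.0)         # well below 1: large power saving
t = latency_ratio(3.3, 5.0, 0.8)  # above 1: operators become slower
```

The power drops by more than half while the latency grows by less than 2x in this example, which is precisely the trade-off the module selection has to arbitrate under a time constraint.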
2.3 Low power HLS tools
Few HLS optimization tools have been proposed in the literature. Hyper_LP [6] has integrated high level transformations and an ATP space exploration that shows the impact of supply voltage scaling on the design. Concerning module selection, most previous works target only design area optimization [7]. A module selection scheme (different operators at different supply voltages) has also been proposed which gives lower bounds on area, performance and power dissipation [8]. Finally, scheduling methods have been presented for multi-voltage datapaths [9].
Nevertheless, no HLS tool combining a modular approach with a classical architectural synthesis tool has been proposed yet.
3. HLS tool Gaut_w
3.1 Overview of the design flow
The design flow of our HLS tool is illustrated in Figure 1. The three entries are the algorithmic description of the application (behavioral VHDL), the operators library and the optimization criteria. At present, this tool is based on the Gaut architectural synthesis tool [1,14], but such a global method could be extended to any other tool. Gaut_w contains two low power modules: Module Selection, which selects the best operator set and the best supply voltage for a given time constraint, and Assignment, which assigns operations of the Data Flow Graph (DFG) to physical operators in order to decrease the transition rate at operator inputs. These modules minimize a cost function (depending on both area and power) whose weighting parameters are chosen by the user (Optimization Criteria). The design is optimized through a fast power dissipation estimation based on the average power dissipation of the resources (Operators library) used in the architecture and on a probabilistic estimation of the accesses to each resource (registers, memory, operators).
[Figure: design flow diagram — the algorithmic specification, the operators library and the optimization criteria (user needs) feed the Gaut_w modules (Module Selection with operators and supply voltage, Assignment, High Level Power Dissipation Estimation with fast power estimation) around compilation to a Data Flow Graph and the architectural synthesis tool, producing an RTL specification that goes through logic synthesis down to the VLSI circuit.]
Figure 1 : Low Power DSP HLS Tool
3.2 Formal library
The synchronous architecture synthesized by the Gaut tool uses operators, memories, registers and a clock tree.
3.2.1 Operators
Each operator has been simulated with the Compass logic simulator to estimate its average power dissipation [10]. The power estimation is the mean result of 5 simulations with 10,000-value stimuli [11] (using Compass and Powercalc), yielding a 95% confidence interval and a less than 1% error. This method provides a good trade-off between simulation time and result precision.
The library therefore contains operators with different ATP characteristics. For example, an addition in the DFG can be realized with 4 different operators (Table I).
3.2.2 Memory library
The great number of transistors in a memory implies that the power dissipation due to the leakage current is no longer negligible compared to the switching component. Therefore, the power dissipation estimator integrated in the Compass CAD tool is used to determine both the switching and the leakage components of the power dissipation. These terms depend on the memory size.
The memory power dissipation is thus based on a static power dissipation Stat(S).Vdd^2 and on a dynamic effective capacitance Cdyn(S). We do not have an analytic expression, only values corresponding to each memory size.
3.2.3 Clock library
The CClk clock tree capacitance estimation is based on the estimated register number NReg(S) and on the clock tree capacitance Creg per register: CClk(S) = NReg(S) . Creg.
3.2.4 Library example
Table I is a library extract, where T represents the latency time at a 5.5 Volt supply voltage, S the area and P the power dissipation estimation at a 5.5 Volt supply voltage. The effective capacitance Ci of an operator can be calculated with expression (2) (P, Vdd and f are known). The latency time at different supply voltages is estimated using expression (4) (Vt is a known device characteristic).
16 bits Operators                    T (ns)   S (10^-3 mm^2)   P (µW)
Add (DataPath)                       14        73                412
Fast add (DataPath)                   8.6     122                711
Add (VHDL)                           18.9      53                229
Fast add (VHDL)                       9.9     117                626
Mult (DataPath)                      32.3    1337              18827
Pipeline mult (DataPath)             22      1337              18821
Mult (VHDL)                          53.3    1023              11820
Fast mult (VHDL)                     33.2    1270              16231
Register                              4        70                 71
Clock per register                    -         -                 65
Memory 28×16 bits (static: 36 mW)    20      1163                961

Table I : Library example (CMOS 1 µm technology)
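Following expressions (2) and (4), an operator's effective capacitance and its latency at a scaled supply voltage can be recovered from a library entry, as sketched below (the 20 MHz operating frequency and the Vt value are illustrative assumptions, not library data):

```python
# Derive the effective capacitance Ci = P / (Vdd^2 * f) from expression (2)
# and rescale the library latency with T ∝ Vdd / (Vdd - Vt)^2 from (4).
# The frequency f and threshold voltage Vt below are illustrative assumptions.

def effective_capacitance(p_watts, vdd, f):
    return p_watts / (vdd ** 2 * f)

def latency_at(t_ref_ns, vdd_ref, vdd_new, vt):
    scale = lambda v: v / (v - vt) ** 2
    return t_ref_ns * scale(vdd_new) / scale(vdd_ref)

# 'Add (DataPath)' entry of Table I: T = 14 ns, P = 412 µW at Vdd = 5.5 V.
c_add = effective_capacitance(412e-6, vdd=5.5, f=20e6)     # assumed f
t_add_3v3 = latency_at(14.0, vdd_ref=5.5, vdd_new=3.3, vt=0.8)  # assumed Vt
```

This is how a library characterized at a single voltage can populate the whole voltage dimension of the design space.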
3.3 Power dissipation estimation
The aim of this module is to give a fast power dissipation estimation calculated at a high level of abstraction. Currently, the method targets time constrained DSP applications: the operators used in the circuit have an activation rate close to 100%.
The first estimation step is the computation of the scheduling probability of each operation in the DFG [12]. This permits estimating the storage probability of each variable (transfer via a register, via the memory and a register, or via a register only) and then the probable activity of the memory (Nmem) and of the registers (NReg). The probable number of registers can also be estimated, allowing the power dissipation of the clock tree to be estimated. The Ni operator activity is calculated on the DFG. This probable resource activity, combined with the average power dissipation available in the library, leads to the total power dissipation Power(S) (5):
Power(S) = Vdd^2 . [ (1/Tr) . (COp(S) + CReg(S) + CDyn_Mem(S)) + (1/TClk) . CClk + StatMem(S) ]    (5)

With:
COp(S) = Σi Ni . Ci
CReg(S) = Nreg . Creg
CClk(S) = NReg . Creg
CDyn_Mem(S) = Nmem . Cdyn

where COp(S) and CReg(S) represent respectively the effective capacitances of the operators and of the registers, each of these capacitances switching every Tr ns (time constraint). CClk is the clock tree capacitance estimation, NReg(S) the probable number of registers and Creg the capacitance per register (CClk switches every TClk ns: internal clock period). StatMem(S).Vdd^2 is the static power dissipation and CDyn_Mem(S) the dynamic effective capacitance of the memory.
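Expression (5) can be turned into a small estimator over the library values (a hedged sketch; all numeric inputs are hypothetical and `power_estimate` is not the actual Gaut_w module):

```python
# Sketch of the total power estimate of expression (5).
# Capacitances in farads, times in seconds; all values are illustrative.

def power_estimate(vdd, t_r, t_clk,
                   c_op, c_reg, c_dyn_mem, c_clk, stat_mem):
    """Power(S) per (5): datapath capacitances switch every Tr, the clock
    tree every TClk, plus the memory static term StatMem (in W/V^2, so
    that StatMem * Vdd^2 is the static power)."""
    dynamic = (c_op + c_reg + c_dyn_mem) / t_r + c_clk / t_clk
    return vdd ** 2 * (dynamic + stat_mem)

# Hypothetical candidate architecture S:
p = power_estimate(
    vdd=5.5, t_r=50e-9, t_clk=10e-9,
    c_op=40e-12, c_reg=4e-12, c_dyn_mem=5e-12,
    c_clk=6e-12, stat_mem=1.2e-3,
)
```

Because every term comes from the library and the probabilistic activity estimates, this evaluation is cheap enough to be called inside the module selection loop.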
3.4 Optimization criteria
Each designer has his or her own optimization problem: circuits for mobile electronic systems require low power dissipation, whereas some designs need a minimum area circuit in order to be cheaper. The optimization criteria must therefore depend on both area and power (Figure 2).
[Figure: the Area/Performance/Power design space, with the time constraint as a lower limit on performance; within it the user chooses what to optimize: power, area, or both.]
Figure 2 : Optimization criteria
Many functions can be integrated in our optimization criteria module. Cost functions that are linear in both area and power dissipation have been chosen; a factor α (0 ≤ α ≤ 1) is introduced to select the power and area weights, and β is a normalization coefficient (6).

Costα(S) = (1 − α) . β . Area(S) + α . Power(S)    (6)

where Area(S) is the area estimation (operators, registers, interconnections and buses [7]) and Power(S) is the power dissipation estimation (5).
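A minimal sketch of cost function (6) follows (the β value and the candidate figures are hypothetical; in practice β would normalize area against power so the two terms are comparable):

```python
# Cost function (6): Cost_alpha(S) = (1 - alpha)*beta*Area(S) + alpha*Power(S).
# The candidates and the normalization beta below are illustrative.

def cost(area, power, alpha, beta):
    assert 0.0 <= alpha <= 1.0
    return (1.0 - alpha) * beta * area + alpha * power

# Two hypothetical candidate solutions: (area in mm^2, power in mW)
candidates = {"fast_ops": (1.5, 40.0), "slow_ops": (0.9, 55.0)}
beta = 30.0  # illustrative normalization, ~mW per mm^2

# alpha near 1 favors low power, alpha near 0 favors small area:
power_oriented = min(candidates,
                     key=lambda s: cost(*candidates[s], alpha=0.8, beta=beta))
area_oriented = min(candidates,
                    key=lambda s: cost(*candidates[s], alpha=0.2, beta=beta))
```

The same search picks different winners as α moves, which is exactly how the user steers the synthesis toward power or area.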
3.5 Assignment
The assignment of operations to operators is an important task of architectural synthesis for decreasing the power dissipation. A high switching factor at the operator inputs implies a high switching rate of the operator gates and hence a high power dissipation [5]. However, at a high level of abstraction, it is impossible to know the circuit architecture exactly and to calculate the activity rate of all operator inputs. Therefore, only two models of operator power dissipation are used:
• both operator inputs change between two operations, with white noise inputs
• only one operator input changes between two operations, the other one being constant
The power dissipation is 25 to 50% greater in the first model than in the second, depending on the operator. The first model estimates the power dissipation of an operation on two variables, while the second estimates the power dissipation of an operation on a variable and a constant. Therefore, at a high level of abstraction, it is important to guide the architectural synthesis to use the second model as often as possible in order to optimize the design power dissipation. Consider the following DFG. A, B, C, D and E are variables whose values change every clock cycle, but Cst is a constant that keeps the same value in every clock cycle.
Figure 3 : A DFG example (four multiplications — A*B, Cst*R, D*E and Cst*C — feeding
two additions; Cst is a constant input and R an intermediate result)
The multiplications of the DFG can be scheduled in different ways on one or more
multipliers: the operators' power dissipation depends on the scheduling. With the power
dissipation model used, it is clear that the minimum power dissipation is obtained when one
multiplier is dedicated to multiplications by the constant value Cst: two multiplications are
then performed under the second model (Cst*R, Cst*C) and two under the first model (A*B
and D*E) to compute the DFG. If only one multiplier is used, the operator's power dissipation
is greater: for example, if Cst*R is scheduled after A*B, then both operator inputs change
(A→Cst and B→R) and the multiplication falls under the first model.
Therefore, an assignment module detecting operations with a constant operand has been
developed. The Assignment Module then decides whether to bind such operations to dedicated
operators so as to optimize the cost function (6) defined by the user (Optimization Criteria).
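The constant-detection part of such an assignment pass can be sketched as follows. This is a toy illustration, not the actual GAUT module: operations whose operand name marks it as a constant are grouped onto a dedicated operator so that one of its inputs stays stable between operations (the low-activity second model), while all-variable operations share the remaining operators.

```python
def assign_multiplications(ops):
    """Toy constant-aware binding pass.

    `ops` is a list of (lhs, rhs) operand-name pairs for the multiplications of
    a DFG. Names starting with "Cst" are treated as constants (an assumption of
    this sketch). Operations with a constant operand are bound to a dedicated
    multiplier ("mult_const"), the others to shared ones ("mult_var").
    """
    const_ops, var_ops = [], []
    for lhs, rhs in ops:
        if lhs.startswith("Cst") or rhs.startswith("Cst"):
            const_ops.append((lhs, rhs))  # second model: one stable input
        else:
            var_ops.append((lhs, rhs))    # first model: both inputs toggle
    return {"mult_const": const_ops, "mult_var": var_ops}
```

On the DFG of Figure 3, Cst*R and Cst*C land on the dedicated multiplier while A*B and D*E share the other one, which is exactly the low-power binding discussed above.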
3.6 Module selection
Module selection is the second step in the GAUT HLS design flow (Figure 1). Its purpose
is to determine the optimal supply voltage and the optimal operator set from a given library for
a DSP application (an algorithm and a time constraint), according to the cost functions defined
above.
The module selection presented here explores all the solutions obtained with different
operators (ATP, mono- or multi-function, pipelined operators) and different supply voltages.
Our DFG processing is carried out in the order defined in Figure 4. To estimate the cost
function (6), the operator allocation has to be obtained from the number of pipeline stages.
This requires knowing the operators' latency, and therefore the selected operators and the
supply voltage (4). To simplify the ATP space exploration and reduce the number of cost
estimations, the following lemma is proposed [7]: the use of multi-functional units is necessary
only if one of their functions is underused, i.e. if there is at least one mono-functional operator
in the operator set S with an efficiency below 100%. Therefore, a first step provides the best
selection among mono-functional operators. If mono-functional operators are underused, an
optimization with multi-functional units is then performed to reduce the circuit area.
In conclusion, our module selection algorithm determines the best operator set for each
supply voltage (mono- and multi-function) and the best supply voltage in a range from
Vdd_min to Vdd_max (Figure 4) [13].
for Supply voltage from Vdd_min to Vdd_max
    for all mono-functional operator sets
        Latency calculation of each selected operator (4)
        Calculation of the number of pipeline stages
        Allocation estimation in each pipeline stage
        Cost calculation (6)
    end for mono-functional operator sets
    Multi-functional operators resolution
    => Best operator set selection for this supply voltage
end for Supply voltage
=> Best selection (supply voltage and operator set)
Figure 4 : Module Selection Algorithm
Nevertheless, it is also possible to restrict the search to normalised supply voltages (5 V, 3.3 V, ...).
This ATP exploration leads to the minimum of the cost function chosen by the designer.
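The outer structure of the exploration in Figure 4 can be sketched as a pair of nested loops. This is a skeleton under stated assumptions: `estimate_cost(vdd, op_set)` stands in for the whole inner pipeline (latency calculation, pipeline-stage count, allocation estimation and eq. (6)), and the multi-functional resolution step is omitted.

```python
def module_selection(vdd_range, operator_sets, estimate_cost):
    """Skeleton of the Figure 4 exploration.

    For each supply voltage, evaluate every candidate mono-functional operator
    set and keep the cheapest; then keep the best (vdd, set) pair overall.
    `estimate_cost(vdd, op_set)` is a caller-supplied stand-in for the
    latency / pipeline / allocation estimation followed by cost eq. (6).
    Returns (cost, vdd, op_set) of the best selection.
    """
    best = None
    for vdd in vdd_range:  # for Supply voltage from Vdd_min to Vdd_max
        best_for_vdd = min(
            ((estimate_cost(vdd, s), vdd, s) for s in operator_sets),
            key=lambda t: t[0],  # best operator set for this supply voltage
        )
        if best is None or best_for_vdd[0] < best[0]:
            best = best_for_vdd  # best (supply voltage, operator set) overall
    return best
```

The search is exhaustive over the candidate grid, which matches the paper's observation that the full exploration still runs in under a CPU minute for realistic libraries.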
4. Results
In this submission, only a few results on the DWT algorithm are presented. More results
will be given in the final version of this paper, with the impact of each method (voltage
scaling, selection of different operators, assignment) on the power dissipation.
4.1 Discrete Wavelet Transform algorithm
There are many image transforms, such as the Fast Fourier Transform, the Discrete Cosine
Transform and the wavelet transform. The last one has several interesting properties and is
increasingly used. The wavelet transform used here is based on a high-pass filter "g" and a
low-pass filter "h" [2]. Several steps are needed to compute the result, as shown in Figures 5
and 6. The H and G filters are first applied on the rows and then on the columns of the initial
image. To compute the DWT with one more resolution level, the same computation is carried
out on a quarter of the image.
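One resolution level of this separable scheme can be sketched as follows: filter the rows with h and g and keep one column out of two, then filter the columns and keep one row out of two, yielding the four subbands of Figure 5. The function names are illustrative, and the mirrored border handling is an assumption of this sketch (the paper does not specify the border treatment).

```python
def fir_symmetric(x, h):
    """Symmetric (linear-phase) FIR filter; h = [h0, h1, ...] is the half filter.
    Signal borders are mirrored."""
    n, k = len(x), len(h) - 1
    out = []
    for i in range(n):
        acc = h[0] * x[i]
        for j in range(1, k + 1):
            left = x[abs(i - j)]                       # mirror at the left border
            right = x[n - 1 - abs(n - 1 - i - j)]      # mirror at the right border
            acc += h[j] * (left + right)
        out.append(acc)
    return out

def dwt_level(image, h, g):
    """One DWT resolution level on a list-of-lists image: rows filtered then
    2:1 column decimation, columns filtered then 1:2 row decimation, giving
    the approximation and the three detail subbands of Figure 5."""
    def rows(img, f):   # filter each row, keep one column out of two
        return [fir_symmetric(r, f)[::2] for r in img]
    def cols(img, f):   # filter each column, keep one row out of two
        t = list(map(list, zip(*img)))
        filtered = [fir_symmetric(c, f) for c in t]
        return list(map(list, zip(*filtered)))[::2]
    low, high = rows(image, h), rows(image, g)
    return cols(low, h), cols(low, g), cols(high, h), cols(high, g)
```

Recursing with `dwt_level` on the returned approximation subband (a quarter of the image) adds one more resolution level, exactly as described above.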
Figure 5 : DWT with three resolution levels (starting from the image C0, the H and G filters
are applied first on the rows then on the columns at each resolution level, producing the
subbands C-1, D-1h, D-1v, D-1d, then C-2, D-2h, D-2v, D-2d; 2:1 : keep one column out of 2,
1:2 : keep one row out of 2)
Figure 6 : Result after the DWT algorithm with two resolution levels (subbands C-2, D-2h,
D-2v, D-2d, D-1h, D-1v and D-1d)
Several h and g filters are possible; two linear-phase FIR filter pairs of different sizes have
been chosen because of their good results [2]:
9-3 : 9-tap linear-phase FIR filter (h), 3-tap linear-phase FIR filter (g)
9-7 : 9-tap linear-phase FIR filter (h), 7-tap linear-phase FIR filter (g)
The algorithm can be used with different resolution levels. The example treated here is a
stream of 512×512-pixel images at 10 images per second, which implies a time constraint of
1/10 s = 100 ms per image. To satisfy this constraint, one algorithm is specified on an n×n
block (algo_bloc) and another one performs the computation on a single image point
(algo_point).
The power dissipation of both algorithms is compared right after their description. The
power dissipation estimation module thus makes it possible, at the first step of the system
design, to select the most suitable algorithm with respect to power dissipation.
Table II gives the power dissipation estimates and the required CPU time on a Sparc
Station. It has been shown on a motion detection application [11] that the absolute power
dissipation error is less than 20% compared with the value obtained by the Compass logic
simulator (after architectural and logic synthesis with GAUT [1,14]).
Results                          algo_bloc     algo_point
Power dissipation estimation     1131 mW       436 mW
CPU time on a Sparc Station      13 minutes    7 seconds
Table II : Comparison of two DWT implementations
Therefore, at a high level of abstraction, the fast power dissipation estimation makes it
possible to compare two different implementations and to choose the better algorithm at an
early stage of the design flow. These results show that the algo_point algorithm is better
suited to power minimisation.
4.2 Module selection results
The algo_point algorithm is selected to illustrate the Module Selection results. The circuit
has to implement the algorithm with three resolution levels at 10 images per second.
Therefore, a processor performing the two FIR filters (g, a 7-tap and h, a 9-tap linear-phase
FIR filter) (7) every 300 ns has been specified:
y(n)  = h0 . x(2n) + Σ(i=1..4) hi . (x(2n−i) + x(2n+i))
y'(n) = g0 . x(2n) + Σ(i=1..3) gi . (x(2n−i) + x(2n+i))    (7)
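A direct reading of the symmetric-filter equations above can be sketched as follows: one output sample is produced per two input samples, so the filtering is fused with the 2:1 decimation of Figure 5. The function name is illustrative and the mirrored border handling is an assumption of this sketch.

```python
def decimated_fir(x, c):
    """y(n) = c[0]*x[2n] + sum_{i>=1} c[i]*(x[2n-i] + x[2n+i]),
    as in eq. (7); `c` is the symmetric half-filter [c0, c1, ...].
    Produces len(x)//2 outputs; out-of-range indices are mirrored."""
    n_in = len(x)
    def sample(k):                  # mirror indices that fall off the signal
        k = abs(k)
        if k >= n_in:
            k = 2 * (n_in - 1) - k
        return x[k]
    y = []
    for n in range(n_in // 2):
        acc = c[0] * sample(2 * n)
        for i in range(1, len(c)):
            acc += c[i] * (sample(2 * n - i) + sample(2 * n + i))
        y.append(acc)
    return y
```

With the half filter [h0, h1, h2, h3, h4] this evaluates the 9-tap h branch of (7); the 7-tap g branch uses a half filter of length 4.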
Table I in Section 3 shows the part of the operator library used in our HLS tool for this
application. The time constraint is 300 ns. Figure 7 shows the results (Power(S) (5), Area(S)
and Cost(S) (6)) of the best operator set for each supply voltage, with α equal to 0.5.
Figure 7 : Cost as a function of Vdd on a DWT algorithm
The power dissipation decreases with the supply voltage because of the quadratic
dependence of the power dissipation on Vdd (1). However, when the supply voltage decreases,
the operator latency (4) increases, which requires allocating more operators and increases the
design area.
Table III shows the GAUT HLS tool results for several supply voltages: the best operator
set (module selection), the operator power dissipation estimate, and the total area and number
of operators in the circuit after architectural synthesis.
Vdd (Volts)   Selected operators (latency, ns)                 Power (mW)   Area (mm2)
4.9           1 Adder Datapath (20), 3 Mult Vhdl (70)          360          5.16
3.9           2 Adder Vhdl (40), 3 Mult Vhdl (90)              222          5.76
3.8           2 Adder Vhdl (40), 2 Fast Mult Vhdl (60)         277          4.82
3.2           2 Adder Datapath (40), 3 Fast Mult Vhdl (80)     199          6.14
Table III : Module selection results on a DWT algorithm
The breaks in Figure 7 are explained by Table III. For a 3.9 V supply voltage, three
multipliers and two adders are allocated, whereas for a 3.8 V supply voltage, two adders but
only two multipliers are allocated. Hence the area cost is smaller at 3.8 V whereas the power
dissipation estimate is greater.
The cost function (6) depends on the α term, which sets the relative importance of area and
power dissipation. Figure 8 shows the impact of varying this term on the cost function. The
optimal supply voltage for 0.2 < α < 1 is around 3.8 V. This means that the supply voltage can
be decreased from 5 V to 3.8 V without a significant area penalty (only a few per cent) and
with a 40% power dissipation reduction. The α term therefore lets each user optimize the
design according to their own problem (area or power dissipation).
Figure 8 : Alpha impact on the cost function
Module selection requires less than one CPU minute on a Sparc station. The method can
therefore be applied to a large operator library and to complex applications without
computation time problems.
5. Conclusion
A generic method for low power VLSI design, integrated in the Gaut_w HLS tool
(estimation and optimization), has been presented. This new approach is modular and runs
before architectural synthesis; it can therefore be applied to any other architectural synthesis
tool. A fast power dissipation estimation at a high level of abstraction has also been
developed, which compares different time-constrained DSP algorithms at an early stage of the
design.
The results presented on DWT algorithms show that the high-level power dissipation
estimation quickly yields a factor-of-3 power saving through the selection of the best
algorithm. Moreover, a 40% power reduction can reasonably be expected by using module
selection, without a significant area penalty.
References:
[1] E. Martin, O. Sentieys, H. Dubois, J.-L. Philippe
GAUT, an architectural synthesis tool for dedicated signal processors
Proceedings EURO-DAC '93
[2] M. Antonini, M. Barlaud, P. Mathieu, I. Daubechies
Image coding using wavelet transform
IEEE Trans. on Image Processing, Vol. 1, No. 2, April 1992, pp. 205-220
[3] P. E. Landman
Low-power architectural design methodologies
PhD Thesis, Berkeley, August 1994
[4] A. Chandrakasan, S. Sheng, R. W. Brodersen
Low-power CMOS digital design
IEEE Journal of Solid-State Circuits, Vol. 27, No. 4, April 1992, pp. 473-484
[5] P. E. Landman, J. M. Rabaey
Architectural power analysis: the dual bit type method
IEEE Trans. on VLSI Systems, Vol. 3, No. 2, June 1995, pp. 173-187
[6] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, R. Brodersen
Optimizing power using transformations
IEEE Trans. on CAD, Vol. 14, No. 1, Jan. 1995, pp. 12-30
[7] O. Sentieys, J.-Ph. Diguet, J. L. Philippe, E. Martin
Hardware module selection for real time pipeline architectures using probabilistic estimation
9th Annual IEEE Int. ASIC Conf. and Exhibit, 1996, pp. 147-150
[8] Z. S. Shen, C. C. Jong
Exploring module selection space for architectural synthesis of low power designs
IEEE Int. Symp. on Circuits and Systems, Hong Kong, June 1997, pp. 1532-1535
[9] M. C. Johnson, K. Roy
Scheduling and optimal voltage selection for low-power multi-voltage DSP datapaths
IEEE Int. Symp. on Circuits and Systems, Hong Kong, June 1997, pp. 2152-2155
[10] R. Burch, F. Najm, P. Yang, T. Trick
McPOWER: a Monte Carlo approach to power estimation
IEEE ICCAD 1992, pp. 90-97
[11] S. Gailhard, O. Ingremeau, N. Julien, J.-Ph. Diguet, E. Martin
Une méthode probabiliste pour estimer la consommation à un niveau algorithmique
(A probabilistic method for power estimation at the algorithmic level)
Colloque CAO, Villars de Lans, Jan. 1997, pp. 124-127
[12] J.-Ph. Diguet, O. Sentieys, J. L. Philippe, E. Martin
Probabilistic resource estimation for pipeline architecture
IEEE Workshop on VLSI Signal Processing, Sakai, Japan, Oct. 1995
[13] S. Gailhard, O. Sentieys, N. Julien, E. Martin
Area/Time/Power space exploration in module selection for DSP high level synthesis
Int. Workshop PATMOS '97, Louvain-la-Neuve, Belgium, Sept. 1997, pp. 35-44
[14] A. Baganne, J.-L. Philippe, E. Martin
Hardware interface design for real time embedded systems
7th IEEE GLSVLSI '97, Urbana-Champaign, Illinois, March 1997