This document summarizes a research paper that presents a methodology for cycle-accurate simulation of energy dissipation in embedded systems. The methodology tightly couples component models to enable accurate estimates of performance and energy consumption within 5% of hardware measurements. The simulator can be used to explore hardware design alternatives and estimate the impact of software changes. It also includes a profiler to relate energy consumption to source code, allowing quick identification and evaluation of energy-efficient software optimizations. The tools were tested on an industrial application called the SmartBadge, reducing energy consumption by 77% for an MP3 audio decoder example.
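The profiler's core idea, relating measured energy back to source lines, can be sketched roughly as follows. This is a toy attribution model with hypothetical helper names, not the paper's cycle-accurate tooling:

```python
def attribute_energy(samples, line_of_pc):
    """Fold (program_counter, joules) samples into per-source-line totals,
    the kind of mapping an energy profiler produces for hotspot hunting."""
    totals = {}
    for pc, joules in samples:
        line = line_of_pc(pc)              # debug-info lookup in a real profiler
        totals[line] = totals.get(line, 0.0) + joules
    return totals
```

Here `line_of_pc` stands in for the debug-information lookup (address-to-line mapping) a real profiler would perform; sorting the resulting totals immediately surfaces the energy-hungry source lines to optimize.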
Reengineering including reverse & forward EngineeringMuhammad Chaudhry
The document discusses forward engineering and reverse engineering. Reverse engineering involves extracting design information from source code through activities like data and processing analysis. This helps understand the internal structure and functionality. Forward engineering applies modern principles to redevelop existing applications, extending their capabilities. It may involve migrating applications to new architectures like client-server or object-oriented models. Reverse engineering provides critical information for both maintenance and reengineering existing software.
Review On Design Of Low Power Multiply And Accumulate Unit Using Baugh-Wooley...IRJET Journal
This document reviews designs for low power multiply and accumulate (MAC) units. It summarizes several papers on MAC unit architectures that aim to improve speed and reduce power consumption. For 8-bit MAC units, designs using a Baugh-Wooley multiplier have increased delay but very low power compared to other techniques. For 16-bit MAC units, a proposed 2-cycle MAC architecture has less power and delay than other 2-cycle and 3-cycle MAC units. For 32-bit MAC units, designs using a Baugh-Wooley multiplier with a high performance multiplier tree exhibit comparable delay, less power dissipation, and smaller area than designs using a modified Booth multiplier. In general, incorporating a Baugh-Wooley multiplier tends to reduce power consumption in MAC unit designs.
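As a rough illustration of the Baugh-Wooley technique the reviewed designs rely on, the sketch below forms a two's-complement product from AND-gate partial products, complemented sign-row/column terms, and two correction constants. It is a bit-level model for checking the scheme, not any paper's hardware design:

```python
def baugh_wooley_mul(a, b, n):
    """Multiply two n-bit two's-complement values (given as n-bit patterns)
    using the Baugh-Wooley partial-product arrangement; returns the
    2n-bit two's-complement product pattern."""
    ab = [(a >> i) & 1 for i in range(n)]
    bb = [(b >> j) & 1 for j in range(n)]
    total = 0
    # unsigned core: all partial products below the sign position
    for i in range(n - 1):
        for j in range(n - 1):
            total += (ab[i] & bb[j]) << (i + j)
    # sign-by-sign partial product
    total += (ab[n - 1] & bb[n - 1]) << (2 * n - 2)
    # complemented cross terms, weighted by 2^(n-1)
    for i in range(n - 1):
        total += (1 - (ab[i] & bb[n - 1])) << (i + n - 1)
    for j in range(n - 1):
        total += (1 - (ab[n - 1] & bb[j])) << (j + n - 1)
    # correction constants: a '1' injected at bit n and at bit 2n-1
    total += (1 << n) + (1 << (2 * n - 1))
    return total % (1 << (2 * n))
```

The appeal in hardware is that every partial product stays a plain AND (or NAND) gate, so the same carry-save reduction tree used for unsigned multipliers handles signed operands, which is where the power advantage over Booth recoding comes from.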
The document discusses optimizing facility efficiency in federal mission-critical environments. It recommends taking a long-term approach to planning by understanding organizational goals and bridging IT and facilities. Key steps include assessing existing facilities, selecting efficient equipment, right-sizing capacity, and establishing monitoring, maintenance, and benchmarking programs to ensure optimization over time. Regular maintenance is emphasized as critical for sustained efficiency gains and reliability.
A Brief Survey of Current Power Limiting Strategies (IRJET Journal)
This document provides an overview of current power limiting strategies used in computing systems. It begins with background on power capping capabilities in Intel processors using Running Average Power Limit (RAPL). The document then reviews over 15 strategies that directly limit power consumption while minimizing performance impact, including feedback controllers, dynamic voltage and frequency scaling, task scheduling policies, and power budgeting techniques. It concludes that power limiting has become important for maximizing exascale system performance within power constraints.
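On Linux, the RAPL counters mentioned above are exposed through the powercap sysfs interface (`/sys/class/powercap/intel-rapl:0/energy_uj` for the package-0 domain). A minimal sketch of deriving average package power from that counter, assuming the counter does not wrap during the interval:

```python
import time

def power_from_samples(e0_uj, e1_uj, interval_s):
    """Average watts from two readings of the monotonic energy_uj counter."""
    return (e1_uj - e0_uj) / interval_s / 1e6   # microjoules -> watts

def sample_package_power(interval_s=1.0,
                         path="/sys/class/powercap/intel-rapl:0/energy_uj"):
    """Read the package-0 RAPL energy counter twice, interval_s apart."""
    with open(path) as f:
        e0 = int(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        e1 = int(f.read())
    return power_from_samples(e0, e1, interval_s)
```

Setting a cap works through the sibling `constraint_0_power_limit_uw` attribute (in microwatts), which is the enforcement knob many of the surveyed feedback controllers build on.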
This document discusses energy efficient computing architectures. It examines whether architectural features that improve performance also improve energy efficiency. Cache parameters like size affect both performance and energy consumption. Reduced Instruction Set Computing (RISC) architectures can improve performance and efficiency by making instructions a static size. Register file architectures decrease memory traffic and improve efficiency by holding small registers in cache. Wireless sensor networks aim to have long battery life through efficient hardware and software designs like turning radios off during idle periods using protocols like T-MAC.
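The battery-life benefit of duty-cycling a radio, as T-MAC-style protocols do, comes down to simple average-current arithmetic. A sketch with illustrative numbers (not taken from the document):

```python
def battery_life_hours(capacity_mah, active_ma, sleep_ma, duty_cycle):
    """Estimate battery life for a node whose radio is on for duty_cycle
    of the time at active_ma and otherwise sleeps at sleep_ma."""
    avg_ma = duty_cycle * active_ma + (1 - duty_cycle) * sleep_ma
    return capacity_mah / avg_ma
```

With a 1% duty cycle, a 20 mA radio contributes only 0.2 mA to the average draw, which is why turning radios off during idle periods dominates sensor-node lifetime.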
A survey on dynamic energy management at virtualization level in cloud data c... (csandit)
Data centers have become indispensable infrastructure for data storage and for facilitating the development of the diversified network services and applications offered by the cloud. Rapid development of these applications and services imposes various resource demands that result in increased energy consumption. This necessitates efficient energy management techniques in data centers, not only for operational cost but also to reduce the amount of heat released from storage devices. Virtualization is a powerful tool for energy management that achieves efficient utilization of data center resources. Although energy management at data centers can be static or dynamic, virtualization-level energy management techniques contribute more energy conservation than hardware-level ones. This paper surveys various issues related to dynamic energy management at the virtualization level in cloud data centers.
Design and Implementation of Low Power DSP Core with Programmable Truncated V... (ijsrd.com)
Programmable truncated Vedic multiplication uses a Vedic multiplier together with programmable truncation control bits, reducing part of the area and power required by multipliers by computing only the most-significant bits of the product. The basic process of truncation involves physical reduction of the partial product matrix and compensation for the removed bits via hardware compensation subcircuits; this results in fixed systems optimized for a given application at design time. A novel approach to truncation is proposed in which a full-precision Vedic multiplier is implemented, but the active section of the truncation is selected dynamically at run-time by the truncation control bits. Such an architecture combines the power-reduction benefits of truncated multipliers with the flexibility of reconfigurable and general-purpose devices. An efficient implementation of such a multiplier is presented in a custom digital signal processor, where the concept of software compensation is introduced and analyzed for different applications. Experimental results and power measurements are studied, including power measurements from both post-synthesis simulations and a fabricated IC implementation. This is the first system-level DSP core using a high-speed Vedic truncated multiplier. Results demonstrate the effectiveness of the programmable truncated MAC (PTMAC) in achieving power reduction with minimal impact on functionality for a number of applications. Compared with previous parallel multipliers, the Vedic design should be much faster and smaller in area; the programmable truncated Vedic multiplier (PTVM) is the basic building block of the arithmetic and PTMAC units.
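The truncation idea can be modeled at the bit level: keep only partial-product columns at or above a cut-off `t` and add a fixed compensation term for the dropped columns. This simplified sketch uses an unsigned multiplier and an averaged compensation constant, not the paper's Vedic architecture or its software compensation scheme:

```python
def truncated_mul(a, b, n=8, t=4):
    """n-by-n unsigned multiply keeping only partial products in columns >= t,
    plus a simple averaged compensation constant for the dropped columns."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j >= t:
                total += (((a >> i) & 1) * ((b >> j) & 1)) << (i + j)
    # each dropped a_i*b_j bit is 1 with probability ~1/4 for random inputs,
    # so compensate with a quarter of the dropped columns' total weight
    comp = sum(1 << (i + j) for i in range(n) for j in range(n) if i + j < t) // 4
    return total + comp
```

Making `t` a run-time input rather than a synthesis-time constant is the "programmable" part; `t = 0` recovers the exact product, while larger `t` trades a bounded error for fewer active partial-product gates.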
This document summarizes an approach for designing system-on-chip (SOC) devices. It describes analyzing initial designs to meet key requirements, then systematically optimizing designs by addressing issues related to memory, interconnect, processors, and customization. It provides an example application study of analyzing and optimizing AES encryption designs to meet various throughput requirements for applications like Wi-Fi. Potential optimizations explored include modifying cache size and exploring parallel pipelined architectures. The discussion aims to illustrate SOC design techniques using a simplified example.
The combination of Cisco's UCS Manager and Schneider Electric's DCIM solution provides Cisco UCS customers with an opportunity to bridge the gap between IT and Facilities, offer transparency across the two silos, and positively impact the rest of the organization in terms of efficiency, uptime, and capex/opex costs. This is achieved by optimising the existing physical infrastructure capacities, thus reducing overprovisioning and improving the balance between supply and demand, resulting in continual availability and optimal energy efficiency.
Adapting Software Components Dynamically for the Grid (IOSR Journals)
This document summarizes a research paper on dynamically adapting software components for grid computing. It presents a framework for dynamic adaptation that incorporates a layered architecture with concepts of dynamicity. The framework separates adaptation mechanisms from component content. It defines four steps for adaptation: observe the execution environment, decide if adaptation is needed, plan adaptation actions, and execute planned actions. The framework uses three levels - a functional level for component services, a component-independent level for adaptation mechanisms, and a component-specific level for developer customization. It evaluates using this framework to design dynamically adaptable scientific application components.
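The four-step cycle can be expressed as a small control loop that keeps the adaptation logic outside the component, in the spirit of the framework's component-independent level. The names here are illustrative, not the paper's API:

```python
class AdaptationEngine:
    """Runs an observe / decide / plan / execute cycle over a component,
    keeping adaptation mechanisms separate from component content."""

    def __init__(self, observe, decide, plan, execute):
        self.observe, self.decide = observe, decide
        self.plan, self.execute = plan, execute

    def step(self, component):
        env = self.observe(component)          # 1. observe the execution environment
        if self.decide(env):                   # 2. decide if adaptation is needed
            actions = self.plan(env)           # 3. plan the adaptation actions
            for act in actions:                # 4. execute the planned actions
                self.execute(component, act)
        return component
```

A developer would supply component-specific `observe`/`execute` callbacks (the framework's component-specific level) while reusing the engine itself unchanged across components.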
Schneider Electric's PowerLogic is the world's largest and most advanced range of software and meters for helping organizations understand and control energy-related cost, risk and opportunity.
Electricity usage costs have become an increasing fraction of the total cost of ownership (TCO) for data centers. It is possible to dramatically reduce the electrical consumption of typical data centers through appropriate design of the data center physical infrastructure and through the design of the IT architecture. This paper explains how to quantify the electricity savings and provides examples of methods that can greatly reduce electrical power consumption.
Data center power and cooling infrastructure worldwide wastes more than 60,000,000 megawatt-hours per year of electricity that does no useful work powering IT equipment. This represents an enormous financial burden on industry, and is a significant public policy environmental issue. This paper describes the principles of a new, commercially available data center architecture that can be implemented today to dramatically improve the electrical efficiency of data centers.
AN EFFICIENT DSP ARCHITECTURE DESIGN IN FPGA USING LOOP BACK ALGORITHM (IJARIDEA Journal)
Abstract— Digital signal processors play a significant role in electronic devices, biomedical applications, and communication protocols. Efficient IC design is a key factor in achieving low-power, high-throughput IP core development for portable devices. Digital signal processors play a major role in real-time computing and processing, but area overhead and power consumption are major drawbacks to achieving efficient design constraints. A flexible DSP architecture using a loop-back algorithm is proposed to overcome existing design limitations. For example, a design of an 8-point FFT architecture requires 3 stages for the butterfly computation unit, i.e., 48 adders and 12 multipliers, leading to high power and area consumption. To reduce area and power, the loop-back algorithm is proposed; it requires 16 adders and 4 multipliers for the overall design. In addition, various DSP templates such as the FFT, first-order FIR filter, and second-order FIR filter are introduced and mapped onto the architecture as processing elements with the loop-back algorithm applied. In the design of the FFT and FIR templates, adders and multipliers such as the parallel prefix adder, shift-and-add multiplier, and Baugh-Wooley multiplier are used to analyze efficient DSP architecture design. Simulation and analysis of latency, area, and power efficiency against existing structures are carried out using ModelSim 6.4a, with synthesis using Xilinx 14.3 ISE.
Keywords— Baugh-Wooley Multiplier, Loop Back Algorithm, Parallel Prefix Adders.
Performance evaluation of a multi-core system using Systems development meth... (Yoshifumi Sakamoto)
I propose applying a Systems development method utilizing Reverse modeling and Model-based Simulation (SRMS) in order to solve the issues of derivational development. Derivational development is a method for developing derived products based on products that have already been released. Many organizations developing embedded systems have applied derivational development to improve development efficiency, time-to-market, and product quality.
However, applying derivational development to large-scale embedded systems in the traditional way, to meet the performance requirements of higher functionality, carries high risk. In addition, as the dependability requirements of embedded systems are extremely high, the dependability of the development process and the safety of the product must be proven by evidence. Continuing derivational development over many years can therefore significantly inhibit the evidence of dependability in embedded systems. We apply SRMS in order to solve these issues.
In this presentation, I propose a methodology to solve the issues of derivational development by applying SRMS to an SoC-equipped Multi Function Peripheral/Printer. Performance evaluation and energy consumption estimation of the SoC are performed in the early stages of SoC development, estimating performance and energy consumption with model-based simulation at a high level of abstraction, because at the early stages of SoC development, modeling at a high level of abstraction is required for both the system architecture of the SoC and the behavior of the system.
The proposed method is applied to an actual embedded system in an MFP (Multi Function Peripheral/Printer). The proposed SRMS method is promising for solving the issues of derivational development in embedded system development.
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud... (IDES Editor)
Cloud computing is an emerging computing paradigm. It aims to share data, resources and services transparently among users of a massive grid. Although the industry has started selling cloud-computing products, research challenges in various areas, such as architectural design, task decomposition, task distribution, load distribution, load scheduling, task coordination, etc., are still unclear. Therefore, we study methods to reason about and model cloud computing as a step towards identifying fundamental research questions in this paradigm. In this paper, we propose a model for load distribution in cloud computing by modeling such systems as cognitive systems, using aspects that depend not only on the present state of the system but also on a set of predefined transitions and conditions. The entirety of this model is then bundled to cater to the task of job distribution using the concept of application metadata. Later, we draw a qualitative and simulation-based summarization of the proposed model. We finally evaluate the results and draw up a series of key conclusions in cloud computing for future exploration.
Trends in HPC Power Metrics and where to from here (Ramkumar Nagappan, Intel)
This document discusses trends in high performance computing (HPC) power metrics and where the field is headed. It covers current metrics like PUE, ITUE, and TUE that are used to measure data center efficiency. It notes that while PUE has been effective, newer metrics provide more granular information. The document also summarizes trends in CPU/system energy efficiency, noting significant improvements. It identifies challenges in accurately measuring energy efficiency, and outlines upcoming usage models involving power caps and ramp controls. Overall, the document analyzes the current state of HPC power/energy metrics and suggests the field may need new metrics to further improve energy efficiency.
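The three metrics relate by simple ratios: PUE divides total facility power by IT power, ITUE divides IT power by the power that actually reaches the compute components, and TUE is their product. A quick sketch with made-up numbers:

```python
def pue(total_facility_kw, it_kw):
    """Power Usage Effectiveness: total facility power over IT power (ideal = 1.0)."""
    return total_facility_kw / it_kw

def itue(it_kw, compute_kw):
    """IT-power Usage Effectiveness: IT power over power reaching the compute
    components, exposing PSU/fan overhead inside the IT equipment itself."""
    return it_kw / compute_kw

def tue(total_facility_kw, it_kw, compute_kw):
    """Total-power Usage Effectiveness: TUE = PUE * ITUE, i.e. facility watts
    spent per watt of useful compute."""
    return pue(total_facility_kw, it_kw) * itue(it_kw, compute_kw)
```

For a 1500 kW facility delivering 1000 kW to IT equipment, of which 800 kW reaches compute components, PUE = 1.5, ITUE = 1.25, and TUE = 1.875; TUE is the more granular figure because a low PUE can hide high overhead inside the IT gear.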
[Case study] Dakota Electric Association: Solutions to streamline GIS, design... (Schneider Electric)
Applications:
Integration of GIS-based processes makes existing circuits and proposed circuits
available in the same system so operations staff can work in parallel with the designers
rather than in succession.
Customer benefits:
• Model, design and manage critical infrastructure
• Highly configurable
• Easily adapted for multiple uses
• Proactively identify needed repairs and replacements well in advance
Optimized Design of an ALU Block Using Power Gating Technique (IJERA Editor)
Power is the limiting factor in traditional CMOS scaling and must be dealt with aggressively. With the scaling of technology and the need for high performance and more functionality, power dissipation becomes a major bottleneck for system design. Power gating of functional units has proven to be an effective technique to reduce power consumption. This paper describes the design of an ALU block with a sleep mode to reduce the power consumption of the circuit. Local sleep transistors are used to achieve the sleep mode: while one functional unit is working, another functional unit is in the idle state, i.e., the idle logic blocks are disconnected from the power supply. The architecture and functionality of the ALU are implemented on FPGA and tested using the DSCH tool. Power analysis is carried out using the MICROWIND tool.
The document discusses challenges and business success related to software reuse. It outlines topics like reuse challenges, technologies, economics, case studies and empirical investigations. Regarding challenges, it notes organizational, technical, domain engineering, and economic aspects. For technologies, it discusses software analysis/visualization, product lines, and architectures. It also examines cost/benefit relationships, metrics, and legal issues regarding reuse. Case studies from HP and Ericsson demonstrate quality, productivity and economic benefits of large-scale reuse programs. Strategies for successful reuse include formal reuse programs with quality control and continuous improvement.
This document summarizes trends in arithmetic logic unit (ALU) and floating point architecture from 2001 to 2016. Key metrics like execution time and power consumption are shown to decrease over this period. The document discusses these trends, relating metrics like node length, power/energy consumption, and data bus width to the year. Research from each year is examined, showing improvements like reduced transistor size/count, lower voltages, and new technologies that increased efficiency and performance of ALUs over the 15 year period.
System on Chip Based RTC in Power Electronics (journal BEEI)
Current control systems and emulation systems (Hardware-in-the-Loop, HIL, or Processor-in-the-Loop, PIL) for high-end power-electronic applications often consist of numerous components and interlinking busses: a microcontroller for communication and high-level control, a DSP for real-time control, an FPGA section for fast parallel actions and data acquisition, and multiport RAM structures or bus systems as the interconnecting structure. System-on-Chip (SoC) combines many of these functions on a single die. This gives the advantage of space reduction combined with cost reduction and very fast internal communication. Such systems are becoming very relevant for research and also for industrial applications. The SoC used here as an example combines a dual-core ARM 9 hard processor system (HPS) and an FPGA, including fast interlinks between these components. SoC systems require careful software and firmware concepts to provide real-time control and emulation capability. This paper demonstrates an optimal way to use the resources of the SoC and discusses challenges caused by the internal structure of the SoC. The key idea is to use asymmetric multiprocessing: one core uses a bare-metal operating system for hard real time, while the other core runs a "real-time" Linux for service functions and communication. The FPGA is used for flexible process-oriented interfaces (A/D, D/A, switching signals), quasi-hard-wired protection, and the precise timing of the real-time control cycle. This way of implementation is generally known and sometimes even suggested, but to the authors' knowledge seldom implemented and documented in the context of demanding real-time control or emulation. The paper details the implementation, including process interfaces, and discusses the advantages and disadvantages of the chosen concept. Measurement results demonstrate the properties of the solution.
This document discusses the benefits of modular data center design and proposes a common language for specifying modular architectures. It outlines desired characteristics like flexibility, scalability, and cost effectiveness. Standardization and scalability are key goals of modularity. The document defines modular architecture and discusses how modularity can be implemented at different levels and through different linkages between subsystems. Benefits include lower costs, simpler expansion, and pre-characterized performance. Challenges include vendor lock-in and site compatibility. Future trends may include increased packaging of subsystems and the emergence of industry standards.
32 bit × 32 bit multiprecision razor based dynamic (Mastan Masthan)
This document summarizes a research paper that presents a reconfigurable multiplier circuit that can dynamically adjust its precision, voltage, and frequency to minimize power consumption based on workload. It incorporates multiple smaller precision multipliers that can operate independently or in parallel. Razor flip-flops and dynamic voltage scaling are used to aggressively lower the voltage while ensuring correctness. Experimental results showed the design achieved up to 86.3% power reduction with only 11.1% area overhead compared to a fixed-width multiplier.
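The reconfigurable idea rests on the standard decomposition of a wide multiply into narrower sub-products. Below is a sketch of the 32-bit-from-16-bit case; the paper's design additionally gates unused sub-multipliers and applies Razor-based voltage scaling, which this arithmetic model does not capture:

```python
def mul32_from_16(a, b):
    """Compose a 32x32 unsigned product from four 16x16 sub-multiplications.
    When operands fit in 16 bits, only the low-low sub-multiplier is needed,
    which is what lets a reconfigurable design power-gate the other three."""
    al, ah = a & 0xFFFF, a >> 16
    bl, bh = b & 0xFFFF, b >> 16
    ll = al * bl            # low  x low
    lh = al * bh            # low  x high
    hl = ah * bl            # high x low
    hh = ah * bh            # high x high
    return ll + ((lh + hl) << 16) + (hh << 32)
```

Detecting that `ah` and `bh` are zero at run time and skipping the three idle sub-products is the workload-driven precision adjustment the abstract describes.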
A verilog based simulation methodology for estimating statistical test for th... (ijsrd.com)
Low-power estimation is an important aspect of digital VLSI circuit design. The estimation covers the power dissipation of a circuit, which is then to be reduced. Power estimates are specific to particular components of power; in the process of optimizing circuits for low power, the user should know the effects of design techniques on each component. There are different methods for reducing each power component. In this paper, the estimation of power such as short-circuit power and total power, power reduction techniques, and the application of different proposed techniques are presented. Hence, it is necessary to provide information about the effect on each of these components.
Investigation of the challenges in establishing plug and play low voltage dc ... (PromiseBeshel)
A research proposal to improve the stability, efficiency, and reliability problems of low voltage DC microgrids from a communication control strategy point of view.
A Machine Learning Toolkit for Power Systems Security Analysis (butest)
This document describes a machine learning toolkit called UMLPSE that has been developed to facilitate power systems security analysis and contingency analysis. UMLPSE has a modular structure combining independent power flow and machine learning tool packages. It incorporates a data warehouse to store operating point data and contingency analysis results. The toolkit is applied to assess contingencies on the power grid of Crete, Greece involving 61 buses and 78 lines. It uses various operating points simulated through different connectivity, load, and generation scenarios to train machine learning models to automatically rank contingencies by risk for new operating points.
This document presents information about a PIC microcontroller-based educational trainer being developed by Engr. Muhammad Mujtaba Asad at Universiti Tun Hussein Onn Malaysia. The trainer aims to integrate several embedded system modules onto a single board to provide students hands-on experience with programming PIC microcontrollers and implementing applications. It seeks to address limitations in current technical education which is more theoretical than practical. The trainer will support teaching and learning of microprocessor and embedded system design courses affordably.
This document summarizes a research paper that proposes a synthesizable AMBA AXI protocol checker to verify communication properties in a system-on-chip (SoC) design. It contains 44 rules to check the AMBA AXI protocol and provides verification of the AXI master, slave, and default slave protocols. The protocol checker uses a rule-based methodology and model simulation to thoroughly verify chip-level behaviors and help debug issues, improving design quality and reducing verification time and costs.
The combination of Cisco's UCS Manager and Schneider Electric's DCIM solution provides Cisco UCS customers with an opportunity to bridge the gap between IT and Facilities, offer transparency across the two silos, and positively impact the rest of the organization – both in terms of efficiency, uptime and capex/opex costs. This is achieved by optimising the existing physical infrastructure capacities, thus reducing overprovisioning and improving the balance between supply and demand, resulting in continual availability and optimal energy efficiency.
Adapting Software Components Dynamically for the GridIOSR Journals
This document summarizes a research paper on dynamically adapting software components for grid computing. It presents a framework for dynamic adaptation that incorporates a layered architecture with concepts of dynamicity. The framework separates adaptation mechanisms from component content. It defines four steps for adaptation: observe the execution environment, decide if adaptation is needed, plan adaptation actions, and execute planned actions. The framework uses three levels - a functional level for component services, a component-independent level for adaptation mechanisms, and a component-specific level for developer customization. It evaluates using this framework to design dynamically adaptable scientific application components.
Schneider Electric's PowerLogic is the world's largest and most advanced range of software and meters for helping organizations understand and control energy-related cost, risk and opportunity.
Electricity usage costs have become an increasing fraction of the total cost of ownership (TCO) for data centers. It is possible to dramatically reduce the electrical consumption of typical data centers through appropriate design of the data center physical infrastructure and through the design of the IT architecture. This paper explains how to quantify the electricity savings and provides examples of methods that can greatly reduce electrical power consumption.
Data center power and cooling infrastructure worldwide wastes more than 60,000,000 megawatt-hours per year of electricity that does no useful work powering IT equipment. This represents an enormous financial burden on industry, and is a significant public policy environmental issue. This paper describes the principles of a new, commercially available data center architecture that can be implemented today to dramatically improve the electrical efficiency of data centers.
AN EFFICIENT DSP ARCHITECTURE DESIGN IN FPGA USING LOOP BACK ALGORITHMIJARIDEA Journal
Abstract— Advanced signal processors play a significant role in electronic devices, biomedical applications, and communication protocols. Efficient IC design is a key factor in achieving low-power, high-throughput IP core development for portable devices. Digital signal processors play a major part in real-time computing and processing, but area overhead and power consumption are major drawbacks to achieving efficient design constraints. A flexible DSP architecture using a loop-back algorithm is proposed to overcome these existing design limitations. For example, the design of an 8-point FFT architecture requires 3 stages of butterfly computation units, with 48 adders and 12 multipliers, leading to high power and area consumption. To reduce area and power, the loop-back algorithm is proposed, which requires only 16 adders and 4 multipliers for the overall design. In addition, the design of various DSP templates such as the FFT, first-order FIR filter, and second-order FIR filter is introduced and mapped onto the architecture as processing elements, with the loop-back algorithm applied. In the design of the FFT and FIR templates, arithmetic units such as the parallel prefix adder, shift-and-add multiplier, and Baugh-Wooley multiplier are used to analyze efficient DSP architecture design. Simulation and analysis of latency, area, and power efficiency against existing structures are carried out using ModelSim 6.4a, with synthesis in Xilinx 14.3 ISE.
Keywords— Baugh-Wooley Multiplier, Loop Back Algorithm, Parallel Prefix Adders.
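The 8-point FFT mentioned in the abstract is the standard radix-2 butterfly computation; a minimal recursive sketch in plain Python (not the proposed hardware architecture) shows the 3-stage structure whose adders and multipliers the loop-back algorithm reuses:

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; for N=8 this
    unrolls into the 3 butterfly stages mentioned in the abstract."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # one butterfly: a complex multiply (twiddle) plus add/subtract
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

x = [1, 1, 1, 1, 0, 0, 0, 0]
X = fft([complex(v) for v in x])
print(round(X[0].real))  # 4  (DC bin = sum of the inputs)
```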
Performance evaluation of a multi-core system using Systems development meth...Yoshifumi Sakamoto
I propose to apply a Systems development method utilizing Reverse modeling and Model-based Simulation (SRMS) in order to solve the issues of derivational development. Derivational development is a method for developing derived products based on products that have already been released. Many organizations developing embedded systems have applied derivational development to improve development efficiency, time-to-market, and product quality.
However, applying derivational development in the traditional way to large-scale embedded systems with the performance requirements of higher functionality carries high risk. In addition, as the dependability requirements of embedded systems are extremely high, the dependability of the development process and the safety of the product must be proven by evidence. Continuing derivational development over many years can therefore significantly undermine the evidence of dependability of embedded systems. We apply SRMS in order to solve these issues.
In this presentation, I propose a methodology that solves the issues of derivational development by applying SRMS to an SoC-equipped Multi Function Peripheral/Printer. Performance evaluation and energy consumption estimation of the SoC are performed in the early stages of SoC development, estimating performance and energy consumption by model-based simulation at a high level of abstraction, because at the early stages of SoC development, modeling at a high level of abstraction is required for both the system architecture of the SoC and the behavior of the system.
The proposed method is applied to an actual embedded system, an MFP (Multi Function Peripheral/Printer). The proposed SRMS method is promising for solving the issues of derivational development in embedded system development.
Multilevel Hybrid Cognitive Load Balancing Algorithm for Private/Public Cloud...IDES Editor
Cloud computing is an emerging computing paradigm. It aims to share data, resources, and services transparently among users of a massive grid. Although the industry has started selling cloud-computing products, research challenges in various areas, such as architectural design, task decomposition, task distribution, load distribution, load scheduling, task coordination, etc., are still unclear. Therefore, we study methods to reason about and model cloud computing as a step towards identifying fundamental research questions in this paradigm. In this paper, we propose a model for load distribution in cloud computing by modeling clouds as cognitive systems, using aspects which depend not only on the present state of the system but also on a set of predefined transitions and conditions. The entirety of this model is then bundled to cater to the task of job distribution using the concept of application metadata. Later, we draw a qualitative and simulation-based summarization of the proposed model. We finally evaluate the results and draw up a series of key conclusions in cloud computing for future exploration.
Trends in HPC Power Metrics and where to from here Ramkumar Nagappan Intel FinalRamkumar Nagappan
This document discusses trends in high performance computing (HPC) power metrics and where the field is headed. It covers current metrics like PUE, ITUE, and TUE that are used to measure data center efficiency. It notes that while PUE has been effective, newer metrics provide more granular information. The document also summarizes trends in CPU/system energy efficiency, noting significant improvements. It identifies challenges in accurately measuring energy efficiency, and outlines upcoming usage models involving power caps and ramp controls. Overall, the document analyzes the current state of HPC power/energy metrics and suggests the field may need new metrics to further improve energy efficiency.
[Case study] Dakota Electric Association: Solutions to streamline GIS, design...Schneider Electric
Applications:
Integration of GIS-based processes makes existing circuits and proposed circuits available in the same system, so operations staff can work in parallel with the designers rather than in succession.
Customer benefits:
• Model, design and manage critical infrastructure
• Highly configurable
• Easily adapted for multiple uses
• Proactively identify needed repairs and replacements well in advance
Optimized Design of an Alu Block Using Power Gating TechniqueIJERA Editor
Power is the limiting factor in traditional CMOS scaling and must be dealt with aggressively. With the scaling of technology and the need for high performance and more functionality, power dissipation becomes a major bottleneck in system design. Power gating of functional units has proved to be an effective technique for reducing power consumption. This paper describes the design of an ALU block with a sleep mode to reduce the power consumption of the circuit. Local sleep transistors are used to implement the sleep mode. During sleep mode, one functional unit is working while another functional unit is in the idle state; i.e., the idle logic blocks are disconnected from the power supply. The architecture and functionality of the ALU are implemented on an FPGA and tested using the DSCH tool. Power analysis is carried out using the MICROWIND tool.
The document discusses challenges and business success related to software reuse. It outlines topics like reuse challenges, technologies, economics, case studies and empirical investigations. Regarding challenges, it notes organizational, technical, domain engineering, and economic aspects. For technologies, it discusses software analysis/visualization, product lines, and architectures. It also examines cost/benefit relationships, metrics, and legal issues regarding reuse. Case studies from HP and Ericsson demonstrate quality, productivity and economic benefits of large-scale reuse programs. Strategies for successful reuse include formal reuse programs with quality control and continuous improvement.
This document summarizes trends in arithmetic logic unit (ALU) and floating point architecture from 2001 to 2016. Key metrics like execution time and power consumption are shown to decrease over this period. The document discusses these trends, relating metrics like node length, power/energy consumption, and data bus width to the year. Research from each year is examined, showing improvements like reduced transistor size/count, lower voltages, and new technologies that increased efficiency and performance of ALUs over the 15 year period.
System on Chip Based RTC in Power ElectronicsjournalBEEI
Current control systems and emulation systems (Hardware-in-the-Loop, HIL or Processor-in-the-Loop, PIL) for high-end power-electronic applications often consist of numerous components and interlinking busses: a micro controller for communication and high level control, a DSP for real-time control, an FPGA section for fast parallel actions and data acquisition, multiport RAM structures or bus systems as interconnecting structure. System-on-Chip (SoC) combines many of these functions on a single die. This gives the advantage of space reduction combined with cost reduction and very fast internal communication. Such systems become very relevant for research and also for industrial applications. The SoC used here as an example combines a Dual-Core ARM 9 hard processor system (HPS) and an FPGA, including fast interlinks between these components. SoC systems require careful software and firmware concepts to provide real-time control and emulation capability. This paper demonstrates an optimal way to use the resources of the SoC and discusses challenges caused by the internal structure of SoC. The key idea is to use asymmetric multiprocessing: One core uses a bare-metal operating system for hard real time. The other core runs a “real-time” Linux for service functions and communication. The FPGA is used for flexible process-oriented interfaces (A/D, D/A, switching signals), quasi-hard-wired protection and the precise timing of the real-time control cycle. This way of implementation is generally known and sometimes even suggested, but to the authors' knowledge seldom implemented and documented in the context of demanding real-time control or emulation. The paper details the way of implementation, including process interfaces, and discusses the advantages and disadvantages of the chosen concept. Measurement results demonstrate the properties of the solution.
This document discusses the benefits of modular data center design and proposes a common language for specifying modular architectures. It outlines desired characteristics like flexibility, scalability, and cost effectiveness. Standardization and scalability are key goals of modularity. The document defines modular architecture and discusses how modularity can be implemented at different levels and through different linkages between subsystems. Benefits include lower costs, simpler expansion, and pre-characterized performance. Challenges include vendor lock-in and site compatibility. Future trends may include increased packaging of subsystems and the emergence of industry standards.
32 bit×32 bit multiprecision razor based dynamicMastan Masthan
This document summarizes a research paper that presents a reconfigurable multiplier circuit that can dynamically adjust its precision, voltage, and frequency to minimize power consumption based on workload. It incorporates multiple smaller precision multipliers that can operate independently or in parallel. Razor flip-flops and dynamic voltage scaling are used to aggressively lower the voltage while ensuring correctness. Experimental results showed the design achieved up to 86.3% power reduction with only 11.1% area overhead compared to a fixed-width multiplier.
A verilog based simulation methodology for estimating statistical test for th...ijsrd.com
Low-power estimation is an important aspect of digital VLSI circuit design. The estimation covers the power dissipation of a circuit, which must then be reduced. Power estimates are specific to particular components of power. To optimize circuits for low power, the user should know the effects of design techniques on each component. Different power-reduction methods target different power components. This paper presents the estimation of power components such as short-circuit power and total power, power reduction techniques, and the application of the different proposed techniques. Hence, it is necessary to provide information about the effect of each technique on each of these components.
Investigation of the challenges in establishing plug and play low voltage dc ...PromiseBeshel
A research proposal to improve the stability, efficiency, and reliability problems of low voltage DC microgrids from a communication control strategy point of view.
A Machine Learning Toolkit for Power Systems Security Analysisbutest
This document describes a machine learning toolkit called UMLPSE that has been developed to facilitate power systems security analysis and contingency analysis. UMLPSE has a modular structure combining independent power flow and machine learning tool packages. It incorporates a data warehouse to store operating point data and contingency analysis results. The toolkit is applied to assess contingencies on the power grid of Crete, Greece involving 61 buses and 78 lines. It uses various operating points simulated through different connectivity, load, and generation scenarios to train machine learning models to automatically rank contingencies by risk for new operating points.
This document presents information about a PIC microcontroller-based educational trainer being developed by Engr. Muhammad Mujtaba Asad at Universiti Tun Hussein Onn Malaysia. The trainer aims to integrate several embedded system modules onto a single board to provide students hands-on experience with programming PIC microcontrollers and implementing applications. It seeks to address limitations in current technical education which is more theoretical than practical. The trainer will support teaching and learning of microprocessor and embedded system design courses affordably.
This document analyzes the impact of introducing an alternative sluice method of irrigation in 1999 on groundwater levels in the Parambikulam Aliyar Project area in Tamil Nadu, India. It uses groundwater level data from 17 observation wells from 1971-2010 and rainfall data from 28 stations. The results show that groundwater levels increased in 9 of the wells after 1999, decreased in 3 wells, and remained unchanged in 1 well. Most increases were seen in the Valayar and Aliyar sub-basins, where rainfall also increased somewhat over the same period. The study aims to assess whether the alternative irrigation method has positively impacted groundwater recharge in parts of the project area.
This document proposes and evaluates a classification system for automatically classifying apples based on images using a nearest neighbor classifier. The system divides apple images into windows, extracts statistical features from each window, eliminates unnecessary windows, and classifies the apple as defected or non-defected based on the features. It aims to discriminate stem ends and calyxes, which are natural parts of apples, from defects. The results show the proposed window-based approach is effective and more efficient than existing techniques at classifying apples and distinguishing stem ends/calyxes from defects.
1) The document presents a half bridge converter topology for battery charging applications.
2) A single stage half bridge converter is proposed that provides DC voltage regulation and power factor correction with only one controller.
3) The converter operation depends on whether the inductor is operating in discontinuous or continuous conduction mode, with discontinuous mode avoiding high voltage stress at light loads.
This document summarizes a study that investigated modifying polystyrene (PS) with nanoclay to improve its mechanical properties. PS nanocomposites were prepared with different types of nanoclays using in-situ polymerization. The mechanical properties were highest with 2% vinyl clay, due to better interaction between vinyl groups on the clay and styrene monomer. A central composite design was used to optimize the clay and latex content. The model equations derived showed good fitting to experimental data. Scanning electron microscopy images showed the nanoclay was well dispersed improving mechanical properties. Thermogravimetric analysis indicated the composites had increased thermal stability compared to pure PS.
The document proposes a method for image enhancement through noise suppression using a Nonlinear Parameterized Adaptive Recursive (PAR) model in the spatial domain. The PAR model uses an intentional median filter that performs filtering only on noisy pixels, adaptively varying the window size and number of iterations. Experimental results on images corrupted with salt and pepper noise show the PAR model achieves better noise suppression than traditional, recursive, and adaptive median filters as measured by higher peak signal-to-noise ratios and shorter computational times. The PAR model is thus useful for interactive image processing by providing a family of possible denoised images.
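The core idea of filtering only noisy pixels with an adaptively growing window can be sketched as follows. This is a simplified illustration of the general adaptive median approach for salt-and-pepper noise, not the authors' exact PAR model; the window limit and noise test are assumptions:

```python
def adaptive_median_denoise(img, max_window=5):
    """Filter only pixels flagged as salt (255) or pepper (0) noise,
    growing the window when the local median is itself noisy."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] not in (0, 255):
                continue  # clean pixel: left untouched
            win = 1
            while win <= max_window // 2:
                vals = [img[j][i]
                        for j in range(max(0, y - win), min(h, y + win + 1))
                        for i in range(max(0, x - win), min(w, x + win + 1))]
                vals.sort()
                med = vals[len(vals) // 2]
                if med not in (0, 255):
                    out[y][x] = med
                    break
                win += 1  # median still noisy: enlarge the window
    return out

img = [[10, 255, 12],
       [11,   0, 13],
       [10,  12, 11]]
print(adaptive_median_denoise(img))  # [[10, 12, 12], [11, 11, 13], [10, 12, 11]]
```

Because clean pixels are skipped entirely, the filter preserves detail and runs faster than a median filter applied to every pixel, which matches the paper's reported advantage over traditional median filtering.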
This document discusses using a fuzzy logic system to maximize spectrum access in cognitive radio. It proposes controlling spectrum access through a fuzzy logic system to minimize call blocking and interference. The fuzzy logic system will use three descriptors as inputs: the secondary user's spectrum utilization efficiency, degree of mobility, and distance to the primary user. The spectrum chosen for access will be based on the maximum possibility of better utilization while avoiding interference to primary users.
The document describes a floral shaped patch antenna designed to operate at multiple frequency bands through the introduction of four slits. This provides a bandwidth of 4.04%. To further increase the bandwidth, four parasitic T-shaped patches are directly coupled to the floral patch. This creates three resonant modes close together, increasing the bandwidth to 8.582%. The combined effect of slits and parasitic elements provides a simple method for a low profile, broadband antenna operating from 3.68GHz to 4.01GHz.
This document reviews research on using artificial intelligence to predict laser cutting processes. It discusses how laser cutting works and the main types of lasers used. It also provides an overview of artificial intelligence and its applications. The literature survey summarizes several studies that used artificial intelligence methods like neural networks to model and optimize laser cutting quality factors based on processing parameters. The conclusion is that laser power and cutting speed are important for quality, and artificial intelligence can help determine the best parameter combinations.
This document describes a heart prescreening device developed using a low-power MSP430 microcontroller. The device aims to provide cardiac screening in rural areas by non-invasively analyzing heart sounds. It uses signal processing techniques to extract and classify heart sounds into normal or abnormal categories. The system architecture incorporates adaptive denoising algorithms and physiological considerations for analysis. A prototype was developed using a Texas Instruments MSP430 microcontroller due to its low-power capabilities. Preliminary results found the device could help determine cardiac conditions and be used by paramedics for rural healthcare screening.
This document summarizes a research paper that analyzes malicious data attacks in underwater sensor networks. It first defines underwater sensor networks and discusses their key challenges, including limited bandwidth, impaired acoustic channels, high propagation delays, and limited energy. It then introduces the vector-based forwarding routing protocol that is energy-efficient and handles node mobility. Finally, it discusses security issues for underwater sensor networks including denial of service attacks, which can be launched at different network layers to eliminate the network's capacity. The document analyzes how a denial of service attack introduces malicious nodes that hack sensor node data.
This document summarizes an investigation into the load carrying capacity of hollow concrete block masonry walls. It begins with an introduction to the use and advantages of hollow concrete blocks. It then describes the experimental program conducted, which included compressing individually hollow concrete blocks to determine their strength and constructing walls of different mortar mixes (1:3, 1:4, 1:5) to test their load capacities and crack patterns. The results showed that walls constructed with a 1:3 mortar mix had the highest load capacities. In conclusion, hollow concrete blocks provide economical construction while still providing adequate strength for load-bearing walls.
The document discusses the potential future effects of nanomedicine on human generations. It describes how nanomedicine uses engineered nanodevices and knowledge of the human body to diagnose, treat, and prevent disease. Specifically, it outlines how nanodiamonds coated with drugs and proteins can target and destroy cancer cells without harming normal cells. The document also discusses how nanotubes may be used to deliver cancer drugs directly to diseased cells. While nanomedicine holds promise to revolutionize disease treatment, the document notes potential health risks from nanoparticle exposure that require further study.
The document introduces the concept of a restrained triple connected dominating set and restrained triple connected domination number (γrtc) of a graph. A restrained triple connected dominating set is a restrained dominating set where the induced subgraph is triple connected. γrtc is defined as the minimum cardinality of a restrained triple connected dominating set. Bounds on γrtc are provided for general graphs. Exact values of γrtc are given for certain standard graphs like cycles, complete graphs, and complete bipartite graphs. Properties of γrtc are explored, such as the relationship between a graph and its spanning subgraphs.
This document describes a proposed modified cluster-based fuzzy-genetic data mining algorithm. The algorithm aims to mine both association rules and membership functions from quantitative transaction data. It uses a genetic algorithm approach that represents each set of membership functions as a chromosome. Chromosomes are clustered using a modified k-means approach to reduce computational costs. The representative chromosome of each cluster is used to calculate fitness values. Offspring are produced through genetic operators and selected through roulette wheel selection. The algorithm iterates until obtaining a set of membership functions with high fitness. These are then used to mine multilevel fuzzy association rules from the transaction data. The algorithm is illustrated through a simple example involving transaction data containing purchases of items like milk, bread, etc
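The roulette wheel selection step mentioned above is standard in genetic algorithms: each chromosome is picked with probability proportional to its fitness. A minimal sketch (the chromosome names and the injectable `rng` parameter are placeholders for illustration):

```python
import random

def roulette_select(population, fitness, rng=random.random):
    """Pick one chromosome with probability proportional to fitness."""
    total = sum(fitness)
    r = rng() * total          # spin the wheel: a point on [0, total)
    cum = 0.0
    for chrom, fit in zip(population, fitness):
        cum += fit             # walk the cumulative fitness segments
        if r <= cum:
            return chrom
    return population[-1]      # guard against floating-point round-off

pop = ["mf_set_a", "mf_set_b", "mf_set_c"]   # candidate membership-function sets
fit = [1.0, 3.0, 6.0]
picked = roulette_select(pop, fit, rng=lambda: 0.5)  # deterministic spin at r = 5.0
print(picked)  # mf_set_c
```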
This document summarizes the design of a microstrip decagon carpet fractal antenna for wireless applications. The antenna was designed using a decagon shape up to the 3rd iteration on an Arlon substrate with a relative permittivity of 2.7. Simulation software was used to analyze the antenna performance. The results showed the antenna could operate in multiple frequency bands between 1-12GHz, making it suitable for wireless communication applications. Parameters like substrate properties and antenna dimensions affected the multiband performance. Radiation patterns indicated good antenna gain across operating frequencies.
This document summarizes a research paper that proposes the SPAMINE algorithm to generate similar item sets from temporal databases. It begins with an introduction on association rule mining and issues with existing approaches on large, sparse datasets. It then reviews related work on frequent item set generation and temporal association rule discovery. The document describes the SPAMINE algorithm's use of lattice-dominant scanning to generate support time sequences for candidate item sets and calculate similarity using Euclidean distance. The goal is to find item sets with support variations over time similar to a reference sequence.
Sustainable Development using Green ProgrammingIRJET Journal
The document discusses sustainable development using green programming. It notes that programmers typically receive training on programming languages and methodologies but not on software energy consumption. Modern technologies like mobile apps and cloud computing require increased awareness of energy usage. The document outlines various functions that are associated with high energy consumption like graphics, computation, algorithms, memory usage, and networking. It then discusses methods to improve energy efficiency such as using better algorithms, caching, multithreading, and native code. A survey found that programmers have limited knowledge of energy efficiency and best practices for reducing software energy usage. The document argues for educating programmers on the importance of creating energy-efficient software.
APE-Annotation Programming For Energy Efficiency in Androidkaranwayne
Here is a complete seminar report on "APE-Annotation Programming For Energy Efficiency". This is a recent topic, only lately introduced, and it serves as motivation for power-saving procedures in Android smartphones.
This document presents a hardware benchmarking application that was developed to test various aspects of computer performance. The application measures performance on single and multi-core CPUs, graphics processing, and RAM computation speed. It uses standard benchmark tests to provide a score that indicates a system's relative performance compared to reference databases of other configurations. The application aims to help users and manufacturers evaluate computer hardware performance objectively.
A METHODOLOGY FOR IMPROVEMENT OF ROBA MULTIPLIER FOR ELECTRONIC APPLICATIONSVLSICS Design
In this paper, we propose an approximate multiplier that is high-speed yet energy-efficient. The approach is to round the operands to the nearest exponent of two, so that the computation-intensive part of the multiplication is omitted, improving speed and energy consumption. The efficiency of the proposed multiplier is evaluated by comparing its performance with those of some approximate and exact multipliers using different design parameters. The proposed approach combines the conventional RoBA multiplier with a Kogge-Stone parallel prefix adder. The results revealed that, in most (all) cases, the newly designed RoBA multiplier architectures outperformed the corresponding approximate (exact) multipliers. The improved RoBA multiplier parameters make it suitable for voice or image smoothing applications in DSP.
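The rounding idea behind RoBA can be illustrated in a few lines: approximating a·b as ar·b + a·br − ar·br, where ar and br are the operands rounded to the nearest power of two, turns every remaining product into a shift in hardware. A simplified unsigned sketch, not the paper's exact circuit or tie-breaking rules:

```python
def round_to_pow2(x):
    """Round a non-negative integer to the nearest power of two
    (ties round up; simplified assumption for this sketch)."""
    if x == 0:
        return 0
    lo = 1 << (x.bit_length() - 1)   # largest power of two <= x
    hi = lo << 1
    return hi if (hi - x) <= (x - lo) else lo

def roba_multiply(a, b):
    """RoBA-style approximation: a*b ~= ar*b + a*br - ar*br.
    Each remaining product has a power-of-two factor, i.e. a shift."""
    ar, br = round_to_pow2(a), round_to_pow2(b)
    return ar * b + a * br - ar * br

print(roba_multiply(11, 7), 11 * 7)  # 80 77 (small error, no full multiply)
```

Note that when either operand is already a power of two the result is exact, since the rounding introduces no error for that operand.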
Predicting Machine Learning Pipeline Runtimes in the Context of Automated Mac...IRJET Journal
This document discusses predicting machine learning pipeline runtimes in the context of automated machine learning (AutoML). It presents an approach to predict the runtime of two-step machine learning pipelines with up to one preprocessor. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. This allows AutoML tools to anticipate whether a pipeline will time out, improving resource usage by avoiding wasted evaluations. The approach is shown to increase successful evaluations while preserving or improving best solutions.
This document analyzes and compares various adder architectures using Verilog. The objectives are to evaluate adder performance in terms of size, delay, and power consumption. Adders to be compared include ripple carry, carry lookahead, carry skip, carry select, and carry save adders. The methodology involves designing adders in Verilog, synthesizing in Xilinx, and analyzing simulation results to compare metrics like speed, area usage, and power consumption. Comparative analysis provides insights into adder tradeoffs and implications for applications like ALU design, embedded systems, signal processing, and more.
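As a reference point for such a comparison, the ripple carry adder — the smallest but slowest of the listed designs — can be modeled bit by bit. This sketch mirrors the hardware structure and is not taken from any Verilog in the study:

```python
def full_adder(a, b, cin):
    """One-bit full adder: sum and carry-out from two bits and a carry-in."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(x, y, width=8):
    """Add two unsigned integers bit by bit.

    The carry 'ripples' through all `width` stages, which is why the
    critical-path delay grows linearly with operand width -- the
    tradeoff that lookahead/skip/select adders attack.
    """
    result, carry = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry   # carry is the overflow bit
```

Carry lookahead and carry select designs break exactly this serial carry chain, trading extra area and power for delay.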
IRJET- Analysis of Software Cost Estimation TechniquesIRJET Journal
This document analyzes and compares different software cost estimation techniques using machine learning algorithms. It uses the COCOMO and function point estimation models on NASA project datasets to test the performance of the ZeroR and M5Rules classifiers. The M5Rules classifier produced more accurate results with lower mean absolute errors and root mean squared errors compared to COCOMO, function points, and the ZeroR classifier. Therefore, the study suggests using M5Rules techniques to build models for more precise software effort estimation.
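For context, the COCOMO side of such a comparison rests on the basic effort equation effort = a·KLOC^b. The sketch below uses the standard published organic-mode constants, which are not necessarily the calibration used in the study:

```python
# Standard basic-COCOMO organic-mode constants (published values, an
# assumption here -- the study may calibrate differently).
A, B = 2.4, 1.05

def cocomo_effort(kloc: float) -> float:
    """Estimated effort in person-months: a * KLOC^b."""
    return A * kloc ** B

def mean_absolute_error(actual, predicted):
    """One of the metrics used above to rank the estimators."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Comparing estimators then reduces to computing `mean_absolute_error` (and the analogous root mean squared error) of each model's predictions against the recorded project efforts.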
Enhancing the Software Effort Prediction Accuracy using Reduced Number of Cos...IRJET Journal
This document presents research on modifying the COCOMO II software cost estimation model to improve prediction accuracy. The researchers reduced the number of cost estimation factors (called cost drivers) from 17 to 13 by adjusting the definitions and impact levels to better reflect current industry situations. They estimated effort for software projects using the modified model and found lower percentage errors compared to the original COCOMO II model, demonstrating improved estimation efficiency. The goal of the research was to analyze cost drivers and their impact on effort estimation in COCOMO II and enhance the model for more accurate predictions.
Performance prediction for software architecturesMr. Chanuwan
The document proposes an approach called APPEAR for predicting software performance in component-based systems. APPEAR uses both structural and statistical modeling techniques. It consists of two main parts: (1) calibrating a statistical regression model by measuring performance of existing applications, and (2) using the calibrated model to predict performance of new applications. Both parts are based on a model that describes relevant execution properties in terms of a "signature". The method supports flexible choice of parts modeled structurally versus statistically. It is being validated on two industrial case studies.
This document proposes an approach called APPEAR (Analysis and Prediction of Performance for Evolving Architectures) for predicting software performance in component-based systems. APPEAR uses both structural and statistical modeling techniques. Structural modeling reasons about component properties, while statistical modeling abstracts irrelevant execution details. APPEAR consists of two parts: 1) calibrating a statistical regression model by measuring existing applications, and 2) using the calibrated model to predict new application performance. Both parts are based on a signature model describing relevant execution properties. APPEAR supports choosing parts for structural vs. statistical modeling to balance accuracy and effort. It is being validated on two industrial case studies.
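The calibrate-then-predict loop can be sketched with a one-number signature; the real method uses much richer signatures and industrial measurements:

```python
# Minimal sketch of the APPEAR idea: fit a regression model on measured
# (signature, runtime) pairs from existing applications, then predict a
# new application from its signature.  The signature is reduced to a
# single operation count here for illustration.

def calibrate(signatures, runtimes):
    """Least-squares fit of runtime = a * signature + b."""
    n = len(signatures)
    mx = sum(signatures) / n
    my = sum(runtimes) / n
    a = sum((x - mx) * (y - my) for x, y in zip(signatures, runtimes)) \
        / sum((x - mx) ** 2 for x in signatures)
    return a, my - a * mx

def predict(model, signature):
    a, b = model
    return a * signature + b
```

The structural part of APPEAR decides which execution properties go into the signature; the statistical part is this fit, which abstracts away the irrelevant details.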
IRJET- An Energy-Saving Task Scheduling Strategy based on Vacation Queuing & ...IRJET Journal
This document summarizes a research paper that proposes an energy-saving task scheduling strategy for cloud computing based on vacation queuing and optimization of resources. The proposed approach aims to minimize energy consumption, reduce processing time, and increase the number of sleeping nodes to make the system more efficient. It introduces a task scheduling algorithm that assigns tasks to computing nodes based on their properties using a load balancer. Simulation results show the proposed algorithm reduces energy consumption while meeting task performance compared to the vacation queuing algorithm. The document discusses related work on energy optimization techniques, presents the proposed approach, and analyzes results showing improvements in energy usage, time, and idle nodes.
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...IJCSEA Journal
Energy consumption is a critical design issue in real-time systems, especially in battery-operated systems. Maintaining high performance while extending the battery life between charges is an interesting challenge for system designers. Dynamic Voltage Scaling and Dynamic Frequency Scaling allow us to adjust the supply voltage and processor frequency to adapt to the workload demand for better energy management. Usually, higher processor voltage and frequency lead to higher system throughput, while energy reduction can be obtained using lower voltage and frequency. Many real-time scheduling algorithms have been developed recently to reduce energy consumption in portable devices that use voltage-scalable processors. For a real-time application comprising a set of real-time tasks with precedence and resource constraints executing on a distributed system, we propose a dynamic energy-efficient scheduling algorithm with a weighted First Come First Served (WFCFS) scheme. This also considers the run-time behaviour of tasks, to further exploit the idle periods of processors for energy saving. Our algorithm is compared with the existing Modified Feedback Control Scheduling (MFCS), First Come First Served (FCFS), and Weighted Scheduling (WS) algorithms that use the Service-Rate-Proportionate (SRP) Slack Distribution Technique. Our proposed algorithm achieves about 5 to 6 percent more energy savings and increased reliability over the existing ones.
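The voltage/frequency trade-off described above follows the textbook CMOS switched-capacitance relation; this sketch shows the relation only, not the proposed WFCFS algorithm:

```python
def dynamic_energy(c_eff, vdd, n_cycles):
    """Textbook CMOS dynamic-energy model: E = C_eff * Vdd^2 per cycle."""
    return c_eff * vdd ** 2 * n_cycles

def exec_time(n_cycles, freq_hz):
    """Lowering the clock stretches execution time proportionally."""
    return n_cycles / freq_hz
```

For example, halving the frequency while scaling the core from 1.5 V down to 1.0 V (illustrative numbers) roughly doubles a task's execution time but cuts its dynamic energy by (1.0/1.5)² ≈ 56%, which is why DVS/DFS pays off whenever slack exists in the schedule.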
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...ijesajournal
Task-intensive electronic control units (ECUs) in the automotive domain, equipped with multicore processors, real-time operating systems (RTOSs), and various application software, should perform efficiently and time-deterministically. The parallel computational capability offered by this multicore hardware can only be exploited if the ECU application software is parallelized. Given such parallelized software, the real-time operating system's scheduler component should schedule the time-critical tasks so that all the computational cores are utilized to a greater extent and the safety-critical deadlines are met. As original equipment manufacturers (OEMs) are always motivated to add more sophisticated features to existing ECUs, a large number of task sets must be effectively scheduled for execution within bounded time limits. In this paper, a hybrid scheduling algorithm is proposed that meticulously calculates the running slack of every task and estimates the probability of meeting the deadline either by staying in the same partitioned queue or by migrating to another. The algorithm was run and tested using a scheduling simulator with different real-time task models of periodic tasks. It was also compared with the existing static priority scheduler suggested by the Automotive Open Systems Architecture (AUTOSAR). The performance parameters considered are the percentage of core utilization, average response time, and task deadline missing rate. It has been verified that the proposed algorithm shows considerable improvements over the existing partitioned static priority scheduler on each of the performance parameters mentioned above.
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...ijesajournal
This document proposes a hybrid scheduling algorithm for real-time critical tasks in multicore automotive ECUs. It describes the existing static priority and partitioned scheduling approach used in AUTOSAR that has issues like poor core utilization and low priority tasks missing deadlines. The proposed algorithm uses both global and partitioned queues, allows task migration, and prioritizes tasks based on their slack (period minus remaining execution time). It was tested on a simulation tool using different task models and showed improvements over the existing approach in core utilization, average response time, and deadline missing rates.
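The slack rule described above (period minus remaining execution time) can be sketched as follows; the task fields are illustrative, not from the paper:

```python
# Slack as defined in the summary above: task period minus remaining
# execution time.  Field names are hypothetical.

def slack(task):
    remaining = task["wcet"] - task["executed"]
    return task["period"] - remaining

def pick_next(ready_tasks):
    """Run the ready task with the least slack: it is the one closest to
    missing its deadline, and the first candidate for migration when its
    own partitioned queue cannot accommodate it."""
    return min(ready_tasks, key=slack)
```

A task whose slack shrinks toward zero in one partitioned queue is exactly the task the hybrid scheme considers migrating to another core's queue.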
A LIGHT WEIGHT VLSI FRAME WORK FOR HIGHT CIPHER ON FPGAIRJET Journal
This document discusses the implementation of a lightweight VLSI design for the HIGHT cipher on an FPGA. It begins with an introduction to lightweight VLSI architecture and its applications in low-resource devices. It then provides background on the HIGHT cipher and discusses prior work implementing cryptographic algorithms on FPGAs. The document goes on to describe the proposed VLSI design for the HIGHT cipher, which is optimized for size, power, and speed. It achieves a throughput of 25 Mbps with an encryption/decryption delay of 0.64 ms. Evaluation results demonstrate the effectiveness and suitability of the design for low-power applications.
1. The document discusses the unit ITEL01 - Embedded Systems which covers introduction to embedded computing, instruction sets, textbooks and reference books.
2. It defines what an embedded system is, discusses the hardware and software components, and classification of microprocessors.
3. The challenges in embedded system design process are also summarized which includes meeting specifications within constraints of cost, power and size.
1. The document discusses the unit ITEL01 - Embedded Systems which covers introduction to embedded computing, instruction sets, textbooks and reference books.
2. It defines what an embedded system is, discusses the hardware and software components, and classification of microprocessors.
3. The document covers advantages of programmable CPUs, functionality of embedded systems, challenges in design, performance aspects, and formalisms for system design including UML.
V.Prasanna Kumar / International Journal of Engineering Research and Applications
(IJERA) ISSN: 2248-9622 www.ijera.com
Vol. 2, Issue 6, November- December 2012, pp.312-325
Energy-Efficient Design of Battery-Powered Embedded Systems
V.Prasanna Kumar M.Tech,
Asst.Professor, Dept Of E.C.E, L.N.B.C.I.E.T, Satara
Abstract
Energy-efficient design of battery-powered systems demands optimizations in both hardware and software. We present a modular approach for enhancing instruction-level simulators with cycle-accurate simulation of energy dissipation in embedded systems. Our methodology has tightly coupled component models, thus making our approach more accurate. Performance and energy computed by our simulator are within a 5% tolerance of hardware measurements on the SmartBadge [2]. We show how the simulation methodology can be used for hardware design exploration aimed at enhancing the SmartBadge with a real-time MPEG video feature. In addition, we present a profiler that relates energy consumption to the source code. Using the profiler we can quickly and easily redesign the MP3 audio decoder software to run in real time on the SmartBadge with low energy consumption. A performance increase of 92% and an energy consumption decrease of 77% over the original executable specification have been achieved.

Index Terms—Low-power design, performance tradeoffs, power consumption model, system-level.

I. INTRODUCTION
Energy consumption is a critical factor in system-level design of embedded portable appliances. In addition, low costs with fast time to market are crucial. As a result, typical portable appliances are built of commodity components and have a microprocessor-based architecture. Full system evaluation is often done on prototype boards, resulting in long design times. Field programmable gate array (FPGA) hardware emulators are sometimes used for functional debugging but cannot give accurate estimates of energy consumption or performance. Performance can be evaluated using instruction-set simulators (e.g., [1]), but there is limited or no support for energy consumption evaluation.

Ideally, when designing an embedded system built of commodity components, a designer would like to explore a limited number of architectural alternatives and test functionality, energy consumption, and performance without the need to build a prototype first. In addition, designers need to optimize software both during hardware development and once the prototype is built, in order to get the best performance and energy consumption from the system. Embedded software optimization requires tools for estimating the impact of program transformations on energy consumption and performance.

This work presents a complete solution for all the embedded system design issues discussed above. The distinctive features of our approach are the following: i) complete system-level and component energy consumption estimates as well as battery lifetime estimates; ii) the ability to explore multiple architectural alternatives; and iii) easy estimation of the impact of software changes both during and after the architectural exploration. The tool set is integrated within the instruction set simulator provided by ARM Ltd. [1]. It consists of two components: a cycle-accurate system-level energy consumption simulator with battery lifetime estimation, and a system profiler that correlates both energy consumption and performance with the code. Our tools have been tested on a real-life industrial application, and have proven to be both accurate (within 5% of hardware measurements) and highly effective in optimizing the energy consumption in embedded systems (energy consumption reduced by 77%). In addition, they are very flexible and easy to adapt to different systems. The tools contain general models for all typical embedded system components but the microprocessor. In order to adapt the tools to another processor, the ARM ISS needs to be replaced by the ISS for the processor of interest.

The rest of this manuscript is organized as follows. We discuss related work in Section II. The system model and the methodology for cycle-accurate simulation of energy dissipation are presented in Section III. Section IV shows that the simulation results of timing and energy dissipation using the methodology presented are within 5% of the hardware measurements for the Dhrystone test case. Hardware architecture trade-offs for the SmartBadge's real-time MPEG video decode design are explored using cycle-accurate energy simulation in Section V. The profiling support we have developed is presented in Section VI. A software design example of an MP3 audio decoder for the SmartBadge that uses our profiler is shown in Section VII.
II. RELATED WORK
As portable embedded systems have grown in importance in recent years, so has the need for tools that enable energy consumption estimation for such systems. CAE support for embedded system design is still limited. Commercial tools target mainly functional verification and performance estimation [3]–[6], but provide no support for energy-related cost metrics.

Processor energy consumption is generally estimated by instruction-level power analysis, first proposed by Tiwari et al. [24], [25]. This technique estimates the energy consumed by a program by summing the energy consumed by the execution of each instruction. Instruction-by-instruction energy costs, together with nonideal effects, are precharacterized once for all for each target processor. An approach proposed recently in [12] attempts to evaluate the effects of different cache and bus configurations using linear equations to relate the main cache characteristics to system performance and energy consumption. This approach does not account for the highly nonlinear behavior of cache accesses across different cache configurations, which is both data and architecture dependent.

A few research prototype tools that estimate the energy consumption of processor core, caches, and main memory in SOC design have been proposed [7], [10]. Memory energy consumption is estimated using cost-per-access models. Processor execution traces are used to drive memory models, thereby neglecting the nonnegligible impact of a nonideal memory system on program execution. The final system energy is obtained by summing over the contribution of each component. The main limitation of the approaches presented in [7] and [10] is that the interaction between the memory system (or I/O peripherals) and the processor is not modeled. A more recent approach presented in [11] combines multiple power estimators into one simulation engine, thus enabling detailed simulation of some components while using high-level models for others. This approach is able to account for the interaction between memory, cache, and processor at run time, but at the cost of potentially long run-times. Longer run-times are caused by the different abstraction levels of the various simulators and by the overhead of communication between the different components. Techniques that enable significant simulation speedup are presented, but at the cost of loss of detail in the software design and in the input data trace.

Cycle-accurate register-transfer level energy estimation is presented in [8]. This tool integrates an RT-level processor simulator with the DineroIII cache simulator and a memory model. It is shown to be within 15% of HSPICE simulations. Unfortunately, this approach is not practical for component-based designs such as the one presented in this paper, as it requires knowledge of the internal design of system components. In addition, it is slower than our approach as it models at a lower abstraction level.

An alternative approach for energy estimation, using measurements as a basis for estimation, is presented in the PowerScope tool [9]. PowerScope requires two computers to collect the measurement statistics, some changes to the operating system source code, and a digital multimeter. Although this system enables accurate code profiling of an existing system, it would be very difficult to use it for the combined hardware and software architecture exploration we present in this paper, as in the early design stages neither hardware nor operating systems or software are available for measurements.

Finally, previous approaches do not focus on battery life optimization, the ultimate goal of energy optimization for portable systems. In fact, when the battery subsystem is not considered in energy estimation, significant errors can result [21]. Some analytical estimates of the tradeoff between battery capacity and delay in digital CMOS systems are presented in [18]. Battery capacity is strongly dependent on the discharge current, as can be seen from any battery data sheet [22]. Hence, it is important to accurately model discharge current as a function of time in an embedded system.

In contrast to previous approaches, in this work memory models and the processor instruction-level simulator are tightly integrated, together with an accurate battery model, into a cycle-accurate simulation engine. Estimation results obtained with our simulator are shown to be within 5% of measured energy consumption in hardware. In addition, we accurately model battery discharge current. Since we develop only one simulation engine, there is no overhead in executing simulators at different levels of abstraction, or in the interface between them. Thus, our approach enables fast and accurate architecture exploration for both energy consumption and performance.

In an industrial environment, the degrees of freedom in hardware design for embedded portable appliances are often very limited, but for software much more freedom is available. As a result, a primary requirement for a system-level design methodology is to effectively support code energy consumption optimization. Several techniques for code optimization have been presented in the past. A methodology that combines automated and manual software optimizations focused on optimizing memory accesses has been presented in [17]. Tiwari et al. [24], [25] use instruction-level energy models to develop compiler-driven energy optimizations such as instruction reordering, reduction of memory operands, operand swapping in Booth multipliers, efficient usage of memory banks, and a series of processor-specific optimizations. Several other optimizations have been suggested, such as energy-efficient register labeling during the compile
phase [19], procedure inlining and loop unrolling [7], as well as instruction scheduling [27]. Work presented in [20] applies a set of compiler optimizations concurrently and evaluates the resulting energy consumption via simulation. All of the techniques discussed above focus on automated instruction-level optimizations driven by the compiler. Unfortunately, currently available commercial compilers have limited capabilities. The improvements gained when using standard compiler optimizations are marginal compared to writing energy-efficient source code [16]. The largest energy savings were observed at the interprocedural level, which compilers have not been able to exploit.

Code optimization requires extensive program execution analysis to identify energy-critical bottlenecks and to provide feedback on the impact of transformations. Profiling is typically used to relate performance to the source code for the CPU and L1 cache [1]. Leveraging our estimation engine, we implemented a code profiling tool that gives the percentages of time and energy spent in each procedure for every system component, not only the CPU and L1 cache. Thanks to energy profiling, the programmer can easily identify the most energy-critical procedures, apply transformations, and estimate their impact not only on processor energy consumption, but also on the memory hierarchy and system busses.

Fig. 1. SmartBadge.

Our approach enables complete system-level and component energy consumption estimates as well as battery lifetime estimates. In addition, it provides the ability to quickly explore multiple architectural alternatives. Finally, it enables software optimization both during and after architectural exploration using our energy profiling tool. In the following section we present the cycle-accurate energy simulator architecture together with the energy consumption models for the components modeled.

III. SYSTEM MODEL
Typical portable embedded systems have processors, storage, and peripherals. We use the SmartBadge [2] throughout this paper as a vehicle to illustrate our methodology and to obtain hardware measurements. The SmartBadge, shown in Fig. 1, is an embedded system consisting of the StrongARM-1100 processor, FLASH, SRAM, sensors, and a modem/audio analog front-end on a PCB powered by batteries through a DC–DC converter. The initial goal in designing the SmartBadge was to allow a computer or a human user to provide location and environmental information to a location server through a heterogeneous network. The SmartBadge could be used as a corporate ID card, attached (or built in) to devices such as PDAs and mobile telephones, or incorporated in computing systems. The design goal for the SmartBadge has since been extended to combine location awareness and authentication with audio and video support. We will illustrate how our methodology has been used for architecture exploration of the new SmartBadge that needed to support a real-time MPEG video decode feature. In addition, we will show how our profiler and code optimizations can be used to improve the code of an MP3 audio decoder.

The system we use in this work to illustrate our methodology, the SmartBadge, has an ARM processor. As a result, we implemented the energy models as extensions to the cycle-accurate instruction-level simulator for the ARM processor family, called the ARMulator [1]. The ARMulator is normally used for functional and performance validation. Fig. 2 shows the simulator architecture. The typical sequence of steps needed to set up system simulation can be summarized as follows: 1) the designer provides a simple functional model for each system component other than the processor; 2) the functional model is annotated with a cycle-accurate performance model; 3) application software (written in C) is cross-compiled and loaded in specified locations of the system memory model; and 4) the simulator runs the code, and the designer can analyze execution using a cross-debugger or by collecting statistics. A designer interested in using our methodology would only need to additionally provide cycle-accurate energy models for each component during step 2) of the simulation setup. Thus, the designer can obtain power estimates with little incremental effort.

We developed a methodology for enhancing cycle-accurate simulators with energy models of typical components used in embedded system design. Each component is characterized by an equivalent capacitance for each of its power states. The energy spent per cycle is a function of the equivalent capacitance, the operating voltage, and the frequency. The equivalent capacitance allows us to easily scale the energy consumed by each component as the frequency or voltage of operation changes. Equivalent capacitances are calculated from the information provided in data sheets.

Internal operation of our simulator proceeds as
follows. On each cycle of execution the ARMulator sends out information about the state of the processor ("cycle type") and its address and data busses. Two main classes of processor cycle types are processor active, where active power is consumed, and processor idle, where idle power is consumed. The processor idle state represents an off-chip memory request. The number of cycles that the processor remains idle depends on the L2 cache and memory model access times. The L2 cache, when present, is always accessed before the main memory and so is active on every memory access request. On an L2 cache miss, main memory is accessed. The memory model accounts for the energy spent during the memory access. The interconnect energy model calculates the energy consumed by the interconnect and pins based on the number of lines switched during the cycle on the data and address busses. The DC–DC converter energy model sums all the currents consumed each cycle by the other system components, accounts for its efficiency loss, and obtains the total energy drawn from the battery. The battery model accounts for battery efficiency losses due to the difference between the rated current and the discharge current computed for the current cycle.

Fig. 2. Simulator architecture.

The total energy consumed by the system per cycle is the sum of the energies consumed by the processor and L1 cache (E_CPU), interconnect and pins (E_line), memory (E_mem), L2 cache (E_L2), the DC–DC converter (E_DC), and the efficiency losses in the battery (E_bat):

E_cycle = E_CPU + E_line + E_mem + E_L2 + E_DC + E_bat.

The total energy consumed during the execution of the software on a given hardware architecture is the sum of the energies consumed during each cycle. Models for energy consumption and performance estimation of each system component are described in the following sections.

A. Processor
The ARM simulator provides a cycle-accurate, instruction-level model for ARM processors and the L1 on-chip cache. The model was enhanced with energy consumption estimates based on the information provided by the data sheets. Two power states are considered: the active state, in which the processor is running with the on-chip cache, and the state in which the processor is executing NOPs while waiting to fill the cache.

Note that in the case of the StrongARM processor used in this work, the data sheet values for current consumption correspond well to the measured values. Wan [26] extended the StrongARM processor model with base current costs for each instruction. The average power consumption for most of the instructions is 200 mW measured at 170 MHz. Load and store instructions required 260 mW each. Because the difference in energy per instruction is minimal, it can be expected that the average power consumption value from the data sheets is on the same level of accuracy as the instruction-level model. Thus we can use data sheet values to derive equivalent capacitances for the StrongARM. Note that for other processors data sheet values would need to be verified by measurement, as data sheets often report the maximum power consumption instead of the typical one.

When the processor is executing with the on-chip cache, it consumes the active power specified in the data sheet, measured at a given voltage V_spec and frequency of operation f_spec. The total equivalent active capacitance within the processor is estimated as

C_CPU = P_active / (V_spec^2 · f_spec).

The amount of energy consumed by the processor and L1 cache per cycle, at a specified processor cycle time and CPU core voltage V_CPU, is

E_CPU = C_CPU · V_CPU^2.

When there is an on-chip cache miss, the processor stalls and executes NOP instructions, which consume less power.
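The equivalent-capacitance bookkeeping described in this section can be sketched as follows. The symbol names are ours, the 200 mW at 170 MHz figure is the StrongARM value quoted above, and the 1.5 V core voltage is an assumption not stated in the excerpt:

```python
def equivalent_capacitance(p_watts, v_spec, f_spec_hz):
    """C_eff = P / (V^2 * f), derived once from the datasheet operating point."""
    return p_watts / (v_spec ** 2 * f_spec_hz)

def energy_per_cycle(c_eff, v_cpu):
    """E = C_eff * V^2: rescales automatically as the core voltage changes."""
    return c_eff * v_cpu ** 2

# StrongARM active figure quoted in the text: ~200 mW at 170 MHz.
# The 1.5 V core voltage is an illustrative assumption.
C_CPU = equivalent_capacitance(0.2, 1.5, 170e6)

# At the datasheet operating point, energy/cycle times frequency
# simply reproduces the datasheet power.
P_check = energy_per_cycle(C_CPU, 1.5) * 170e6
```

The value of the formulation is that C_CPU stays fixed while the voltage and frequency are scaled during architecture exploration, so one datasheet point characterizes the whole operating range.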
can be estimated from the power consumed during estimated from the ac- tive power specified in the data
execution of sheet measured at voltage and frequency
NOPs at voltage and frequency
Multibank memory can be represented as multiple
The energy consumed within the processor core per cycle while executing NOPs is defined analogously.

B. Memory and L2 Cache
The processor issues an off-chip memory access when there is an L1 cache miss. The cache-fill request will either be serviced by the L2 cache, if one is present in the design, or directly from the main memory. On an L2 cache miss, a request is issued to the processor to fetch data from the main memory. Data sheets specify the memory and L2 cache access times and the energy consumed during the active and idle states of operation.

Memory access time is scaled by the processor cycle time to obtain the number of cycles the processor has to wait to serve a request (6). Wait cycles are defined for two different types of memory accesses: sequential and nonsequential. A sequential access is at the address immediately following the address of the previous access. In burst-type memory the sequential access normally takes only a fraction of the time of the first, nonsequential access.

Two energy consumption states are defined for each type of memory: active and idle. The energy consumed per cycle while the memory is in the active state operating at supply voltage V is a function of the equivalent active capacitance C_active, the voltage of operation, and the total number of access cycles: E_active = C_active · V² · (n_cycles + 1). The active memory capacitance can be estimated from the data sheets for one-bank memories.

The idle state can be further subdivided into multiple states that describe modes of operation for different types of memories. For example, a DRAM might have two idle states: refresh and sleep. The designer specifies the percentage of the time the memory spends in each idle state. The total idle energy per cycle for memory is E_idle = t_cycle · Σ_s p_s · P_s, where P_s is the power consumption in idle state s and p_s is the fraction of time spent in that state. Both RAM and ROM are represented with the same memory model, but with different parameters.

The L2 cache access time and energy consumption are treated the same way as for any other memory. The L2 cache organization is determined by the number of banks, lines per bank, and words per line. Line replacement can follow any of the well-known replacement policies. The cache hit rate is strongly dependent on the cache organization, which in turn affects the total memory access time and the energy consumption. Note that we simulate the details of the L2 cache access and thus know the exact L2 cache miss rate.

C. Interconnect and Pins
The interconnects on the PCB can contribute a large portion of the off-chip capacitance. Capacitance per unit length of the interconnect is a parameter in the energy model that can be obtained from the PCB manufacturer. The length of an interconnect can be estimated by the designer based on the approximate placement of the selected components on the PCB. Pin capacitance values are reported on the data sheets.

For each component, the average lengths of the clock line and the data and address buses between the processor and the component are provided as input simulation parameters. Hence, the designer is free to use any wire-length estimate [14] or measurement. The interconnect lengths used in our simulation of the SmartBadge come from the prototype board layout.

Fig. 3. DC–DC converter efficiency.

The total capacitance switched during one cycle is shown in (10). It depends on the capacitance of one interconnect line and the pins attached to it, and on the number of lines switched during the cycle.
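As an illustrative sketch of this interconnect and pin model (identifiers and the example values are ours, not the paper's; 0.5·C·V² is the standard switched-capacitance energy relation):

```python
# Sketch of the interconnect/pin capacitance model (Subsection C).
# Identifiers and example values are ours, not the paper's.

def line_capacitance_pf(cap_per_cm_pf, length_cm, pin_caps_pf):
    """One interconnect line: per-unit-length PCB capacitance times the
    designer-estimated length, plus the capacitance of attached pins."""
    return cap_per_cm_pf * length_cm + sum(pin_caps_pf)

def switched_energy_j(c_line_pf, n_lines_switched, v_swing):
    """Energy per cycle for the lines that actually switched, using the
    standard 0.5 * C * V**2 switched-capacitance relation."""
    c_total_f = c_line_pf * n_lines_switched * 1e-12
    return 0.5 * c_total_f * v_swing ** 2

# Example: a 10 cm stripline at 1.6 pF/cm (the value used later for the
# SmartBadge board) loaded by two 5 pF pins, with 16 bus lines toggling
# at a 3.3 V swing.
c_line = line_capacitance_pf(1.6, 10.0, [5.0, 5.0])  # 26 pF per line
e_cycle = switched_energy_j(c_line, 16, 3.3)
```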
V.Prasanna Kumar / International Journal of Engineering Research and Applications (IJERA), ISSN: 2248-9622, www.ijera.com, Vol. 2, Issue 6, November-December 2012, pp. 312-325
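Returning to the memory model of Subsection B, the timing and energy relations can be sketched as follows (identifiers are ours, and the formulas follow the prose rather than the paper's stripped equations, so treat them as assumptions):

```python
import math

# Sketch of the memory timing/energy model of Subsection B.
# Identifiers are ours; equation forms follow the prose, not the
# paper's (stripped) equations, so treat them as assumptions.

def wait_cycles(access_time_ns, t_cycle_ns):
    """Cycles the processor stalls for one request: the data-sheet
    access time scaled by the processor cycle time, rounded up."""
    return math.ceil(access_time_ns / t_cycle_ns)

def active_energy_j(c_active_f, v_supply, n_cycles):
    """Per-access active energy: equivalent active capacitance times
    V**2 times the (n_cycles + 1) access cycles, per the text."""
    return c_active_f * v_supply ** 2 * (n_cycles + 1)

def idle_energy_per_cycle_j(t_cycle_s, idle_states):
    """Total idle energy per cycle: each idle state's power weighted by
    the designer-specified fraction of time spent in it.
    idle_states: list of (fraction_of_time, power_watts)."""
    return t_cycle_s * sum(frac * p for frac, p in idle_states)

# Example: a DRAM with two idle states, refresh and sleep.
stall = wait_cycles(100, 30)   # 100 ns access at a 30 ns cycle time
e_idle = idle_energy_per_cycle_j(5e-9, [(0.7, 1e-3), (0.3, 1e-5)])
```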
The total energy consumed per cycle is a function of the voltage swing on the lines that switched, the total capacitance switched, and the total time to access the memory.

D. DC–DC Converter
DC–DC converter losses can account for a significant fraction of the total energy consumption. Fig. 3, derived from the datasheets, shows the dependence of efficiency on the DC–DC converter output current. The total current drawn from the DC–DC converter by the system each cycle is the sum of the currents drawn by each system component. A component current is defined by I = E / (V · t_cycle), where E is the energy consumed by the component during a cycle of length t_cycle at operating voltage V. The total current drawn from the battery can then be calculated from the component currents and the converter efficiency.

Efficiency can be estimated using linear interpolation from the data points derived from the output-current-versus-efficiency plot in the data sheet. From our experience, a table with 20 points derived from the data sheets gives enough information for accurate linear estimation of values not directly represented in the table. The total energy the DC–DC converter draws out of the battery each cycle follows from the battery current. The energy consumed by the DC–DC converter itself is the difference between the energy provided by the battery and the energy consumed from the DC–DC converter by all other components.

E. Battery Model
The main battery characteristic is its rated capacity, measured in milliwatt-hours. Since the total available battery capacity varies with the discharge rate, manufacturers specify plots in the datasheets of discharge rate versus battery efficiency, similar to the one shown in Fig. 4.

The discharge rate (or discharge current ratio) is the ratio of the average current drawn by the DC–DC converter to the rated discharge current, where the rated discharge current I_rated is derived from the battery specification and I_avg is the average current drawn by the DC–DC converter. As a battery cannot respond to instantaneous changes in current, a first-order time constant T is defined to determine the short-term average current drawn from the battery [23]. Given T and the processor cycle time t_cycle, we can compute n = T / t_cycle, the number of cycles over which the average DC–DC current is calculated as the mean of the instantaneous current drawn from the battery. With the discharge current ratio, we estimate battery efficiency using a battery efficiency plot such as the one shown in Fig. 4.

Fig. 4. Battery efficiency.

The total energy loss of the battery per cycle is the product of the energy drained from the battery by the system with the efficiency loss. Given the battery capacity model described above, battery estimation is performed as follows. First, the designer characterizes the battery with its rated capacity, the time constant, and the table of points describing the discharge plot, similar to the one shown in Fig. 4. During each simulation cycle, the discharge current ratio is computed from the rated battery current and the average DC–DC current calculated over the last n cycles. Efficiency is calculated using linear interpolation between the points from the discharge plot. The total energy drawn from the battery during the cycle is obtained from (19). Lower efficiency means that less battery energy remains, and thus the battery lifetime is proportionally lower.
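The per-cycle battery bookkeeping described above can be sketched as follows (function names and the three-point sample table are ours; the real tool interpolates a table of roughly 20 points taken from the data sheets):

```python
import bisect

# Sketch of the per-cycle battery bookkeeping (Subsections D and E).
# Names and the sample table are ours; the real tool uses a ~20-point
# table taken from the data sheets.

def interpolate(table, x):
    """Linear interpolation in an x-sorted (x, y) table, clamped at the
    ends -- the scheme the text uses for both the DC-DC efficiency and
    the battery discharge plots."""
    xs = [p[0] for p in table]
    ys = [p[1] for p in table]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_left(xs, x)
    frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])

def battery_energy_drawn_j(e_system_j, discharge_table, i_avg_a, i_rated_a):
    """Energy drained from the battery in one cycle: the system energy
    divided by the efficiency at the current discharge ratio."""
    ratio = i_avg_a / i_rated_a          # discharge current ratio
    eff = interpolate(discharge_table, ratio)
    return e_system_j / eff

# Illustrative discharge plot: efficiency falls as the short-term
# average current approaches the rated discharge current.
plot = [(0.2, 0.95), (1.0, 0.85), (2.0, 0.60)]
e_batt = battery_energy_drawn_j(1e-8, plot, 0.3, 1.0)
```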
For example, if battery efficiency is 60% and the rated capacity is 100 mAhr at 1 V, then the battery would be drained in 12 min at an average DC–DC current of 300 mA; with an efficiency of 100%, the battery would last 20 min.

IV. VALIDATION OF THE SIMULATION METHODOLOGY
We validated the cycle-accurate power simulator by comparing the computed energy consumption with measurements on the SmartBadge prototype implementation. The SmartBadge prototype consists of the StrongARM-1100 processor, the DC–DC converter, FLASH, and SRAM on a PCB. All the components except the CPU core are powered through the 3.3 V supply line; the CPU core runs on a 1.5 V supply. The DC–DC converter is powered by the 3.5 V supply. The DC–DC converter efficiency table contains 22 points derived from the plot shown in Fig. 3. A stripline interconnect model is used with 1.6 pF/cm capacitance, calculated from the PCB characteristics [13]. Table I shows the other system components. The average current consumed by the processor's power supply and the total current drawn from the battery are measured with digital multimeters. Execution time is measured using the processor timer.

The industry-standard Dhrystone benchmark is used as a vehicle for methodology verification. Measurements and simulations have been done for ten different operating frequencies of the SA-1100 and SA-110 processors. The Dhrystone test case is run 10 million times, with 445 instructions per loop. Simulations ran on an HP Vectra PC with a Pentium II MMX 300 MHz processor and 128 MB of memory. Hardware ran 450 times faster than the simulations without the energy models; simulations with energy models were slightly slower (about 7%). Fig. 5 shows the average processor core and battery currents. The average simulation current is obtained by dividing the total energy consumed by the processor core or the battery by the respective supply voltage and the total execution time.

TABLE I
DHRYSTONE TEST CASE SYSTEM DESIGN

Simulation results are within 5% of the hardware measurements for the same frequency of operation. The methodology presented in this paper for cycle-accurate energy consumption simulation is thus accurate enough to be used for architecture design exploration in embedded system designs. An example of such exploration is presented next.

V. EMBEDDED MPEG DECODER SYSTEM DESIGN EXPLORATION
The primary motivation for the development of the cycle-accurate energy consumption simulation methodology is to provide an easy way for embedded system designers to evaluate multiple hardware and software architectures with respect to performance and energy consumption constraints. In this section we present an application of the simulation methodology to embedded MPEG video decoder system design exploration. The MPEG decoder design consists of the processor, the off-chip memory, the DC–DC converter, the output to the LCD display, and the interface to the source of the MPEG stream. The input and output portions of the MPEG decoder design are not considered at this point; we focus on selecting the memory hierarchy that is most energy efficient.

The characteristics of the memory components considered are shown in Table II. Two different instruction memories were evaluated: low-power FLASH and power-hungry burst FLASH. We looked at three different data memories: low-power SRAM, faster burst SRAM, and very power-hungry burst SDRAM. Both instruction and data memories are 1 MB in size. We also considered using an L2 cache in addition to the L1 cache; the unified L2 cache is 256 Kb, four-way set associative. The hardware configurations simulated are shown in Table III. The MPEG video decode sequence we used has 12 frames running at 30 frames/s, with two I, three P, and seven B frames. We found that the results we obtained with this shorter video sequence matched well the results obtained with a longer trace.

Fig. 6 shows the amount of time each system component is active during the MPEG decode and the amount of energy consumed. The original configuration is limited by the bandwidth of the data memory. The L2 cache is very fast but also consumes too much energy. The burst SDRAM design fully solves the memory bandwidth problem with the least energy consumption.
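The exploration just described boils down to simulating each configuration and then keeping the cheapest one that meets the real-time deadline. A sketch with made-up per-frame numbers (only the selection logic reflects the text; the figures are invented for illustration):

```python
# Sketch of the selection step in the design exploration above.
# The per-frame times and energies below are invented; only the
# "meet 30 frames/s, then minimize energy" logic is the point.

configs = [
    ("low-power FLASH + SRAM",   0.041, 0.012),  # (name, s/frame, J/frame)
    ("burst FLASH + burst SRAM", 0.031, 0.015),
    ("burst FLASH + SDRAM",      0.029, 0.013),
    ("with L2 cache",            0.022, 0.024),
]

deadline_s = 1 / 30  # real-time budget per frame at 30 frames/s

feasible = [c for c in configs if c[1] <= deadline_s]
best = min(feasible, key=lambda c: c[2])  # least energy among feasible
```

With these illustrative numbers the burst-SDRAM configuration wins, matching the text's conclusion that a faster, more power-hungry memory can still be the energy-efficient choice.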
Fig. 5. Average processor core and battery currents.

TABLE II
MEMORY ARCHITECTURES FOR MPEG DESIGN

Fig. 6. Performance and energy consumption for hardware architectures.

TABLE III
HARDWARE CONFIGURATIONS

Fig. 7. Cycle-accurate energy plot.
Instruction memory constitutes a very small portion of the total energy, due to the relatively large L1 cache in comparison to the MPEG code size. The DC–DC converter consumes a significant amount of the total energy and thus should be considered in system simulations. We conclude from this example that using faster and more power-hungry memory can be energy efficient.

The analysis of peak energy consumption and the fine tuning of the architectures can be done by studying the energy consumption and the memory access patterns over a period of time. Fig. 7 shows the energy consumption over time of the processor with burst FLASH and SRAM. Peak energy consumption can reach twice the average consumption, so the thermal characteristics of the hardware design, the DC–DC converter, and the battery have to be specified accordingly.
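The Fig. 7-style peak analysis can be sketched as a sliding-window pass over the per-cycle energy trace (names and the toy trace are ours):

```python
# Sketch of peak-vs-average power from a cycle-accurate energy trace,
# as in the Fig. 7 discussion. Names and the toy trace are ours.

def avg_and_peak_power_w(energy_j, t_cycle_s, window_cycles):
    """Average power over the whole trace, and the worst (peak) power
    over any window of `window_cycles` consecutive cycles."""
    avg = sum(energy_j) / (len(energy_j) * t_cycle_s)
    peak = 0.0
    for i in range(len(energy_j) - window_cycles + 1):
        w = sum(energy_j[i:i + window_cycles])
        peak = max(peak, w / (window_cycles * t_cycle_s))
    return avg, peak

trace = [1.0, 1.0, 4.0, 4.0, 1.0, 1.0]   # joules per cycle, illustrative
avg_w, peak_w = avg_and_peak_power_w(trace, 1.0, 2)
# here the peak is twice the average -- the situation the text says the
# thermal design, DC-DC converter, and battery must accommodate
```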
For best battery utilization, it is important to
match the current consumption of the embedded
system to the discharge characteristic of the battery. On the other hand, the more capacity the battery has, the heavier and more expensive it will be. Fig. 8 shows that the instantaneous battery efficiency varies greatly over time with MPEG decode running on the hardware described above.

Lower-capacity batteries have larger efficiency losses. Fig. 9 shows that the total decrease in battery lifetime when continually running the MPEG algorithm on a battery with a lower rated discharge current can be as high as 16%. The battery's time constant was set to ms.
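The lifetime penalty shown in Figs. 8 and 9 follows directly from the efficiency model; a sketch (the helper name is ours) that also reproduces the arithmetic of the battery example given earlier:

```python
# Sketch of the efficiency/lifetime relation behind Figs. 8 and 9.
# The helper name is ours; usable capacity is the rated capacity
# scaled by the efficiency at the operating discharge rate.

def lifetime_min(capacity_mah, efficiency, load_ma):
    """Minutes of operation until the usable capacity is drained."""
    return 60.0 * capacity_mah * efficiency / load_ma

# The earlier battery example: 100 mAhr drained at a 300 mA average
# DC-DC current lasts 12 min at 60% efficiency; at 100% efficiency
# the same arithmetic gives 100*60/300 = 20 min.
t_60 = lifetime_min(100.0, 0.60, 300.0)
t_100 = lifetime_min(100.0, 1.00, 300.0)
```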
The design exploration example presented in this section illustrates how the methodology for cycle-accurate energy consumption simulation can be used to select and fine-tune the hardware configuration that gives the best tradeoff between performance and energy consumption.

Fig. 9. Percent decrease in battery lifetime for MPEG decoder.
The main limitation of the cycle-accurate energy simulator is that the impact of code optimizations is not easily evaluated. For example, in order to evaluate the energy efficiency of two different implementations of a particular portion of the software, the designer would need to obtain cycle-by-cycle plots and then manually relate cycles to the software portion of interest. The profiling methodology presented next addresses this limitation.

VI. PROFILING OF SOFTWARE ENERGY CONSUMPTION
The profiler architecture is shown in Fig. 10. The shaded portion represents the extension we made to the cycle-accurate energy simulator to enable code profiling. Profiling for energy and performance enables designers to identify those portions of their source code that need to be further optimized in order to decrease energy consumption, increase performance, or both. Our profiler enables designers to explore multiple different hardware and software architectures, as well as to do statistical analysis based on the input samples. In this way the design can be optimized for both energy consumption and performance based on the expected input data set.

The profiler operates as follows. Source code is compiled using a compiler for the target processor (e.g., application or operating-system code). The output of the compiler is the executable that the cycle-accurate simulator executes (represented in Fig. 10 as assembly code that is input into the simulator) and a map of the locations of each procedure in the executable that the profiler uses to gather statistics (the map is a correspondence of assembly code blocks to procedures in the "C" source code). In order to increase the simulation speed, a user-defined profiling interval is set so that the profiler gathers statistics only at predetermined time increments. Usually an interval of 1 s is sufficient. Note that longer intervals give slightly faster execution time at a loss of accuracy, while very short intervals (on the order of a few cycles) have a larger calculation overhead. For example, the energy consumption calculation adds approximately 10% overhead to standard cycle-accurate performance simulation. Profiling with an interval of 1 s adds negligible overhead over energy simulation (less than 1%), with still accurate results. During each cycle of operation, the cycle-accurate energy consumption simulator calculates the current total execution time and energy consumption of all system components as shown in (1). The profiler works concurrently with the cycle-accurate simulator: it periodically samples the simulation results (using the sample interval specified by the user) and maps the energy and performance to the function executed, using information gathered at compile time. Once the simulation is complete, the results of profiling can be printed out by the total energy or time spent in each function.

Fig. 8. Battery efficiency for MPEG decoder.

The main advantage of the profiler is that it allows designers to obtain an energy-consumption breakdown by procedures in their source code after running only one simulation. This information is of critical
importance when designing an embedded system, as it enables designers to quickly identify and address the areas in the source code that will provide the largest overall energy savings. A good example of profiler usage is shown in Table IV. The table shows a portion of the energy profile for MP3 audio decode. The first column gives the name of the top procedure, followed by its children. The next column gives the total energy spent for that procedure; for example, the total energy spent running the program is 0.32 mWhr. The final column gives the amount of energy spent only in that particular procedure. From the profile it is clear which child of the top procedure, together with its descendants, spends the most energy (0.0671 mWhr), and that the largest portion of that energy is consumed by one of its own children; those are the procedures to focus optimization on. Although in this example we showed the source-code profile of total battery energy consumption, the profiler can report energy consumption for any system component, such as the SRAM or the interconnect.

The profiler allows for fast and accurate evaluation of software and hardware architectures. Most importantly, it gives good guidance to the designer during the design process without requiring manual intervention. In addition, the profiler accounts for all embedded system components, not just the processor and the L1 cache as most general-purpose profilers do. In the next section we present a real design example that uses the profiler to guide the implementation of the source code optimizations described earlier for the MP3 audio decoder running on the SmartBadge.

VII. OPTIMIZING MP3 AUDIO DECODER
The block diagram of the MPEG Layer III audio decoding algorithm (MP3) is shown in Fig. 11. It consists of three blocks: frame unpacking, reconstruction, and inverse mapping. The first step in decoding is synchronizing the incoming bitstream and the decoder. Huffman decoding of the subband coefficients is performed before requantization. Stereo processing, if applicable, occurs before the inverse mapping, which consists of an inverse modified discrete cosine transform (IMDCT) followed by a polyphase synthesis filterbank. We obtained the original MP3 audio decoder software from the International Organization for Standardization [28]. Our design goal was to obtain real-time performance with low energy consumption while keeping full compliance with the MPEG standard.

Given the limited compiler support available [16], our approach to code optimization is based on manual code rewriting and optimization guided by our profiler. Code transformations are applied in layers, starting from a high level of abstraction and moving down to very detailed, architecture-specific optimization. In the next three subsections we describe the three optimization layers in detail, moving from high to low abstraction. The results of the optimizations applied to the MP3 decoder are presented in the last subsection. Note that all the optimizations presented in the following subsections were performed manually.
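The sampling-based attribution described in Section VI can be sketched as follows (the procedure names other than SubBandSynthesis are hypothetical; in the real tool the currently executing procedure is found through the compiler-generated address map):

```python
from collections import defaultdict

# Sketch of sampling-based energy attribution (Section VI).
# Procedure names other than SubBandSynthesis are hypothetical; the
# real tool maps sampled addresses to procedures via the compiler map.

def profile(samples):
    """samples: iterable of (procedure_name, energy_since_last_sample_j).
    Returns the total energy charged to each procedure."""
    totals = defaultdict(float)
    for proc, delta_e in samples:
        totals[proc] += delta_e
    return dict(totals)

profile_result = profile([
    ("SubBandSynthesis", 2.0e-3), ("imdct", 1.5e-3),
    ("SubBandSynthesis", 2.5e-3), ("huffman_decode", 0.5e-3),
])
# SubBandSynthesis accumulates the most energy and would top the report
```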
A. Algorithmic Optimization
The top layer in the optimization hierarchy targets algorithms. The original specification is first profiled to identify all computational kernels, i.e., the procedures where most time and power are spent. Each computational kernel is then analyzed from a functional viewpoint. Then, the alternative algorithms for implementing the same functionality are considered and compared
ŠIMUNIĆ et al.: ENERGY-EFFICIENT DESIGN OF BATTERY-POWERED EMBEDDED SYSTEMS

Fig. 10. Profiler architecture.
Table IV
SAMPLE ENERGY PROFILING

with the original one. At this level of abstraction, we consider only high-level estimators of algorithmic efficiency (such as the number of basic operations).

Our approach to algorithmic optimization in MP3 decoding has been conservative. First, we focused on just one computational kernel where a large fraction of run time (and power) was spent, namely the subband synthesis. Second, we did not try to develop new original algorithms, but used previously published algorithmic enhancements [29], [30] that are still fully compliant with the MPEG standard. The new algorithm incorporates an integer implementation of the scaled Chen discrete cosine transform (DCT) instead of a generic DCT in the polyphase synthesis filterbank. The use of a scaled DCT reduces the DCT multiply count by 28%.

B. Data Optimization
At a lower level of abstraction than the algorithmic level, we optimize code by changing the representation of the data manipulated by the algorithms. The main objective is to match the characteristics of the target architecture with the processed data. In our case, the executable specification of the MPEG decoder performed most computations using doubles, while the SA-1100 processor has no hardware floating-point support. As a result, a direct implementation of the decoding algorithm, even after algorithmic optimization, was unacceptably slow and power-consuming. Trying to reduce the precision of the floating-point computation, as discussed in [31], would have helped only marginally, as the processor would still have to emulate all floating-point operations in software.

To overcome this problem, we developed a fixed-precision library and implemented all computational kernels of the algorithm using fixed-precision numbers. The number of decimal digits can be set at compile time. The ARM architecture is designed to support computation with 32-bit integers with maximum efficiency; hence, little can be gained by reducing the data size below 32 bits. On the other hand, when multiplying two 32-bit numbers the result is a 64-bit number, and directly truncating the result of a multiplication to 32 bits frequently leads to incorrect results because of overflow. To increase robustness, 64-bit numbers have been used for fixed-point computation. This data type is supported by the ARM compiler through the definition of a 64-bit integer type. Computing with 64-bit integers is less efficient than using 32-bit integers, but the results are accurate and the risk of overflow is minimized. Data optimization produced significant energy savings and speedups for almost all computational kernels of MP3 without any perceivable degradation in quality. The fixed-point library developed for this purpose contains macros for conversion between fixed point and floating point, accuracy adjustment, and elementary function computation.

C. Instruction Flow Optimization
Moving further down in abstraction level, the third layer of optimizations targets low-level instruction flow. After extensive profiling, the most critical loops are identified and carefully analyzed. Source code is then rewritten to make the computation more efficient. Well-known techniques such as loop merging, unrolling, software pipelining, and loop-invariant extraction [35], [36] have been applied. In the innermost loops, code can be written directly as inline assembly to better exploit specialized instructions.

Instruction flow optimizations have been extensively applied in the MP3 decoder, obtaining significant speedup. We do not describe these optimizations in detail because they are common knowledge in the optimizing-compilers literature [35], [36]. However, in our case most optimizations were performed manually due to lack of support by the ARM compiler.

A simple example of this class of transformation is the use of the multiply-accumulate instruction available in the ARM SA-1100 core. The inner loops of the subband synthesis and the inverse modified cosine transform (the two key computational kernels of the MP3 decoder) contain
matrix multiplications, which can be implemented efficiently with multiply-accumulate. In this case, we forced the ARM compiler to use the instruction by inlining it in assembly.

Table VI
ENERGY FOR MP3 IMPLEMENTATIONS

D. Results of MP3 Audio Decode Optimization
Table V shows the top three functions in energy consumption for each code revision we worked on. The original code has a very large overhead due to floating-point emulation: about 80% of the energy consumption. The next largest issue is the redesign of the SubBandSynthesis function that implements the polyphase synthesis filterbank. The details of each optimization type, namely the algorithmic, data, and instruction-level optimizations, have been presented above.

We will use the SubBandSynthesis function redesign as a vehicle to illustrate the use of our profiler. In the initial stage, we transferred all critical operations from floating point to fixed point. The transfer resolved the issue with floating-point operations, but at the same time increased the SubBandSynthesis fraction of total energy six times. Next we introduced a series of instruction-level optimizations that resulted in a 30% decrease of the SubBandSynthesis fraction of total energy, to 34.32%, as shown in Table V. In parallel, we had decided to try the algorithmic changes on the current code. The profiling results in Table V show that the algorithmic optimizations considerably reduced the energy consumption of the SubBandSynthesis function: it does not appear in the top three functions, and in fact it is only 3.2% of the total energy consumption. The final step is to combine the algorithmic changes with the data and instruction-level changes, resulting in a decrease of the SubBandSynthesis fraction of energy consumption to 6% of the total.
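The fixed-point conversion discussed above can be sketched as follows (a simplification of the actual library: values are scaled by 2^27, matching the 27 bits of precision chosen for the final decoder, and products are widened before renormalizing, which is the overflow avoidance described in Section VII-B; Python integers are unbounded, so the 64-bit intermediate width is implicit):

```python
# Sketch of the fixed-point scheme from the data-optimization layer.
# Values are integers scaled by 2**FRAC; products are formed at full
# width and shifted back, avoiding the 32-bit overflow described above.

FRAC = 27  # fractional bits (the final decoder used 27 bits of precision)

def to_fix(x):
    """Float -> fixed-point integer."""
    return int(round(x * (1 << FRAC)))

def fix_mul(a, b):
    """Multiply two fixed-point values; the wide product is
    renormalized by shifting FRAC bits back out."""
    return (a * b) >> FRAC

def to_float(a):
    """Fixed-point integer -> float."""
    return a / (1 << FRAC)

z = to_float(fix_mul(to_fix(0.5), to_fix(0.25)))  # ~0.125
```

In C the product would be formed in a 64-bit type before the shift; truncating to 32 bits first is exactly the overflow the text warns about.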
Fig. 11. MP3 audio decoder architecture.

Table V
PROFILING FOR MP3 IMPLEMENTATIONS

System and component energy consumptions are shown in Table VI for the different revisions of the source code optimization. Positive percentages show energy decrease with respect to the original code. Table VII shows the same results but for performance measurements; positive percentages show performance increase. Although the energy savings of the algorithmic versus the data and instruction-level optimizations are comparable with respect to the original code, the performance improvement of the data and instruction-level optimizations is significant. Note that the increase in energy consumption and the decrease in performance of the Flash is due to the increase in code size caused by the algorithmic change in the SubBandSynthesis procedure. The total improvement in system performance and energy consumption more than makes up for the degradation of Flash performance and energy consumption. The combined optimizations give real-time performance for MP3 audio decode, which is a primary constraint for this project. In addition, lower energy consumption enables longer battery life. Note that a faster implementation that is also more energy efficient might imply higher power consumption, which can be an issue for the thermal design of the device. In the case presented in this paper, it
Table VII
PERFORMANCE FOR MP3 IMPLEMENTATIONS

Table VIII
FIXED-POINT PRECISION AND COMPLIANCE

was critical to get real-time performance with longer battery lifetime. The average and peak power consumption constraints are met with our final design.

The final MP3 audio decoder's compliance with the MPEG standard has been tested as a function of the precision of the fixed-point computation. We used the compliance test provided by the MPEG standard [32], [33]. The range of RMS error between the samples defines the compliance level. Table VIII shows the results: clearly, a larger number of precision bits results in better compliance. In our final MP3 audio decoder we used 27 bits of precision. Using our design tools to guide the software optimization process, we have been able to increase performance by 92% while decreasing energy consumption by 77%, with full compliance to the MP3 audio decode standard.

VIII. CONCLUSION
We developed a methodology for cycle-accurate simulation of performance and energy consumption in embedded systems. Accuracy, modularity, and ease of integration with the instruction-level simulators widely used in industry make this methodology very applicable to embedded system hardware and software design exploration. Simulation is found to be within 5% of the hardware measurements for the Dhrystone benchmark. We presented MPEG video decoder embedded system design exploration as an example of how our methodology can be used in practice to aid in the selection of the best hardware configuration.

We have also developed a tool for profiling the energy consumption of software in embedded systems. Profiling results enabled us to quickly and easily target the redesign of the MP3 audio decoder software. Our final MP3 audio decoder is fully compliant with the MPEG standard and runs in real time with low energy consumption. Using our design tools we have been able to increase performance by 92% while decreasing energy consumption by 77%.

REFERENCES
[1] Advanced RISC Machines Ltd (ARM), ARM Software Development Toolkit Version 2.11, 1996.
[2] G. Q. Maguire, M. Smith, and H. W. P. Beadle, "SmartBadges: A wearable computer and communication system," in Proc. 6th Int. Workshop Hardware/Software Codesign, 1998, invited talk.
[3] CoWare, CoWare N2C. [Online]. Available: www.coware.com/n2c.html
[4] Mentor Graphics. [Online]. Available: www.mentor.com/codesign
[5] Synopsys. [Online]. Available: www.synopsys.com/products/hwsw
[6] Cadence. [Online]. Available: www.cadence.com/alta/products
[7] Y. Li and J. Henkel, "A framework for estimating and minimizing energy dissipation of embedded HW/SW systems," in Proc. Design Automation Conf., 1998, pp. 188-193.
[8] N. Vijaykrishnan, M. Kandemir, M. Irwin, H. Kim, and W. Ye, "Energy-driven integrated hardware-software optimizations using SimplePower," in Proc. 27th Int. Symp. Computer Architecture, 2000, pp. 24-30.
[9] J. Flinn and M. Satyanarayanan, "PowerScope: A tool for profiling the energy usage of mobile applications," in Proc. 2nd IEEE Workshop Mobile Computing Systems and Applications, 1999, pp. 23-30.
[10] B. Kapoor, "Low power memory architectures for video applications," in Proc. 8th Great Lakes Symp. VLSI, 1998, pp. 2-7.
[11] M. Lajolo, A. Raghunathan, and S. Dey, "Efficient power co-estimation techniques for SOC design," in Proc. Design, Automation and Test in Europe Conf., 2000, pp. 27-34.
[12] T. Givargis, F. Vahid, and J. Henkel, "Fast cache and bus power estimation for parameterized SOC design," in Proc. Design, Automation and Test in Europe Conf., 2000, pp. 333-339.
[13] OZ Electronics Manufacturing, PCB Modeling Tools. [Online]. Available: www.oem.com.au/manu/pcbmodel.html
[14] A. El Gamal and Z. A. Syed, "A stochastic model for interconnections in custom integrated circuits," IEEE Trans. Circuits Syst., vol. CAS-28, pp. 888-894, Sept. 1981.
[15] T. Simunic, L. Benini, and G. De Micheli, "Cycle-accurate simulation of energy consumption in embedded systems," in Proc. Design Automation Conf., 1999, pp. 867-872.
[16] T. Simunic, L. Benini, G. De Micheli, and M. Hans, "Energy-efficient design of battery-powered embedded systems," in Proc. Int. Symp. Low-Power Electronics and Design, 1999, pp. 212-217.
[17] F. Catthoor, S. Wuytack, E. De Greef, F. Balasa, L. Nachtergaele, and A. Vandecappelle, Custom Memory Management Methodology: Exploration of Memory Organization for Embedded Multimedia System Design. New York: Kluwer, 1998.
[18] M. Pedram and Q. Wu, "Battery-powered digital CMOS design," in Proc. Design, Automation and Test in Europe Conf., 1999, pp. 17-23.
[19] H. Mehta, R. M. Owens, M. J. Irwin, R. Chen, and D. Ghosh, "Techniques for low energy software," in Proc. Int. Symp. Low Power Electronics and Design, 1997, pp. 72-75.
[20] M. Kandemir, N. Vijaykrishnan, M. Irwin, and W. Ye, "Influence of compiler optimizations on system power," in Proc. 27th Int. Symp. Computer Architecture, 2000, pp. 35-41.