A Seminar Presentation on Compiler Techniques for Energy Reduction in High-Performance Microprocessors
1. Presented by
Miss. Neha D. Jaiswal
Guided by: Prof. P.M. Pandit
Co-guided by: Prof. S.A. Fanan
JAWAHARLAL DARDA INSTITUTE OF ENGINEERING & TECHNOLOGY, YAVATMAL
DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION ENGINEERING
2. Contents
1. History
2. What is Power?
3. What is a Compiler?
4. What is Cache Memory?
5. Introduction to Microprocessors
6. Compiler Enhancement
7. How Is Compiling Done?
8. Hardware Enhancement
9. Energy Estimation
10. Conclusion
11. References
3. History
In 1970, at Carnegie Mellon University, William A. Wulf developed an
optimizing compiler.
In 1981, he was the founder and vice president of Tartan Laboratories,
a compiler technology company.
4. What is Power?
Power is the rate of doing work. It is the amount of energy
consumed per unit time.
In formal terms,
$$P = \frac{W}{T} \qquad (1)$$
$$E = P \cdot T \qquad (2)$$
where P is power, E is energy, T is a specific time interval, and W is the
total work performed in that interval.
• Power is measured in watts.
• For a microprocessor, power is the rate at which the processor
consumes electrical energy and dissipates it in the form of heat.
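Equations (1) and (2) can also be applied numerically when power is not constant. The sketch below is a minimal illustration that assumes power is sampled at a fixed interval; the sample values and the helper names `energy_joules` and `average_power` are hypothetical. It applies Eq. (2) to each interval, sums the results, and then recovers the average power with Eq. (1), treating the consumed energy as the work W.

```python
from __future__ import annotations


def energy_joules(power_samples_w: list[float], interval_s: float) -> float:
    """Apply E = P * T (Eq. 2) to each sampling interval and sum the results."""
    return sum(p * interval_s for p in power_samples_w)


def average_power(energy_j: float, total_time_s: float) -> float:
    """Apply P = W / T (Eq. 1), treating the consumed energy as the work W."""
    return energy_j / total_time_s


# Hypothetical power readings (watts), one every 0.5 s.
samples = [30.0, 35.0, 32.5]
e = energy_joules(samples, 0.5)
print(e)                                      # 48.75 (joules)
print(average_power(e, 0.5 * len(samples)))   # 32.5 (watts)
```

Summing per-interval products is the discrete form of integrating power over time; with a single constant sample it reduces to Eq. (2) exactly.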
5. What is a Compiler?
A compiler is a special program that processes statements written in
a particular programming language and turns them into machine
language or "code" that a computer's processor uses.
When compiling a program, the compiler first analyzes all of the
language statements one after the other and then builds the output
code.
The output of compilation is called object code.
Fig. Process of compiling
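The analyze-then-build structure described above can be illustrated with a toy compiler. This is a minimal sketch, not a real compiler: it uses Python's standard `ast` module for the analysis phase and emits instructions for a hypothetical stack machine (the `PUSH`, `LOAD`, `ADD`, `SUB`, `MUL` and `DIV` opcodes are invented for illustration).

```python
from __future__ import annotations

import ast

# Hypothetical opcodes for a toy stack machine.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}


def compile_expr(source: str) -> list[str]:
    """Compile one arithmetic expression into stack-machine instructions."""
    tree = ast.parse(source, mode="eval")   # analysis: source text -> syntax tree
    code: list[str] = []

    def emit(node: ast.AST) -> None:
        # Code generation: a post-order walk emits operands before their operator.
        if isinstance(node, ast.Expression):
            emit(node.body)
        elif isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)
            emit(node.right)
            code.append(OPCODES[type(node.op)])
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(f"PUSH {node.value}")
        elif isinstance(node, ast.Name):
            code.append(f"LOAD {node.id}")
        else:
            raise ValueError(f"unsupported construct: {type(node).__name__}")

    emit(tree)
    return code


print(compile_expr("a + 2 * b"))  # ['LOAD a', 'PUSH 2', 'LOAD b', 'MUL', 'ADD']
```

The whole statement is analyzed into a tree before any output is produced, which mirrors the two phases named in the slide.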
6. What is Cache Memory?
Cache memory is a small, volatile type
of computer memory.
Cache provides high-speed data
access to a processor.
It stores frequently used program
instructions and data.
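How a cache exploits repeated accesses can be shown with a small simulation. The sketch below assumes a direct-mapped organization (each memory block maps to exactly one cache line) and uses hypothetical sizes; real caches may be set-associative and use other line sizes. `simulate_direct_mapped` is a hypothetical helper.

```python
def simulate_direct_mapped(addresses, num_lines, line_size):
    """Count hits and misses for a direct-mapped cache (byte addresses)."""
    tags = [None] * num_lines          # one stored tag per cache line
    hits = misses = 0
    for addr in addresses:
        block = addr // line_size      # memory block that holds this byte
        index = block % num_lines      # the only line this block can occupy
        tag = block // num_lines       # identifies which block sits in that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag          # fetch the block from memory, evicting the old one
    return hits, misses


# A hypothetical loop of eight 4-byte instructions executed ten times.
trace = list(range(0, 32, 4)) * 10
print(simulate_direct_mapped(trace, num_lines=4, line_size=16))  # (78, 2)
```

Only the first pass through the loop misses; every later fetch hits, which is why frequently used code benefits from being cached.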
7. Introduction to Microprocessors
Modern microprocessors are large power consumers:
• UltraSPARC-II consumes 58 W maximum power at 296 MHz.
• Pentium Pro consumes 35 W at 200 MHz.
• Alpha 21164PC consumes 32.5 W at 433 MHz.
In a microprocessor, the instruction cache (I-Cache) subsystem is one of
the main power consumers.
Fig. I-Cache
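Applying Eq. (2) with T equal to one clock period (T = 1/f) gives the average energy consumed per clock cycle, a rough way to compare the processors listed above. The sketch assumes the quoted maximum power is drawn continuously, which overstates typical operation; `energy_per_cycle_nj` is a hypothetical helper.

```python
def energy_per_cycle_nj(power_w: float, freq_mhz: float) -> float:
    """E = P * T with T = 1 / f, returned in nanojoules."""
    period_s = 1.0 / (freq_mhz * 1e6)
    return power_w * period_s * 1e9


for name, power_w, freq_mhz in [("UltraSPARC-II", 58.0, 296.0),
                                ("Alpha 21164PC", 32.5, 433.0)]:
    print(f"{name}: {energy_per_cycle_nj(power_w, freq_mhz):.1f} nJ per cycle")
```

With these figures the UltraSPARC-II spends roughly 196 nJ per cycle and the Alpha 21164PC roughly 75 nJ.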
8. To reduce this consumption, an additional mini-cache, the L-Cache, is placed between the
I-Cache and the central processing unit (CPU) core; it buffers
instructions that are nested within loops.
In the compiler technique for energy reduction, the compiler modifies the
code in ways that greatly simplify the required hardware, eliminate
unnecessary instruction fetching, and consequently reduce signal
switching activity and the dissipated energy.
The compiler decides which basic blocks are placed in the L-Cache,
distributing instructions according to their priority.
Fig. CPU, L-Cache and main memory
9. How Is Compiling Done?
A control flow graph (CFG) is built
to describe each function of
the original program.
The block placement algorithm
is shown in the figure.
It takes the input code and the profile
data as inputs.
Fig. Block placement algorithm (input code and profile data → nesting computation → for each BB: LabelTree construction, BB selection and placement → global placement → branch insertion)
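Before placement, the tool needs the CFG of each function. A common way to build it, shown here as a sketch and not necessarily the tool's own method, is the classic leader algorithm: the first instruction, every branch target, and every instruction after a branch or return start a new basic block. The two-field instruction format and the `br`/`jmp`/`ret` opcodes are hypothetical; attaching profile counts to the blocks is not shown.

```python
from __future__ import annotations

BRANCHES = ("br", "jmp")          # br = conditional branch, jmp = unconditional jump
ENDERS = BRANCHES + ("ret",)      # opcodes that end a basic block


def build_cfg(code: list[tuple[str, int | None]]):
    """Split code into basic blocks and return (blocks, successors)."""
    leaders = {0}
    for i, (op, target) in enumerate(code):
        if op in BRANCHES:
            leaders.add(target)                 # a branch target starts a block
        if op in ENDERS and i + 1 < len(code):
            leaders.add(i + 1)                  # so does the instruction after it
    starts = sorted(leaders)
    blocks = [(s, starts[k + 1] if k + 1 < len(starts) else len(code))
              for k, s in enumerate(starts)]   # half-open [start, end) ranges
    block_at = {s: k for k, (s, _) in enumerate(blocks)}

    succ: dict[int, list[int]] = {k: [] for k in range(len(blocks))}
    for k, (_, end) in enumerate(blocks):
        op, target = code[end - 1]              # the last instruction decides the edges
        if op in BRANCHES:
            succ[k].append(block_at[target])    # taken edge
        if op not in ("jmp", "ret") and end < len(code):
            succ[k].append(k + 1)               # fall-through edge
    return blocks, succ


program = [("mov", None), ("add", None), ("br", 1), ("ret", None)]
blocks, succ = build_cfg(program)
print(blocks)  # [(0, 1), (1, 3), (3, 4)]
print(succ)    # {0: [1], 1: [1, 2], 2: []}
```

Block 1 branches back to itself, which is exactly the kind of loop the following steps look for.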
10. • The following slides describe each block in detail:
1. First Step: Nesting Computation
The tool finds the loops and computes the nesting of every basic block.
The figure describes the data structure used and the information produced.
The loop nesting is shown in the figure along with the CFG and the label sets.
Basic blocks within a loop that contains a function call are not eligible for
caching.
Fig. First step of block placement.
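The nesting computation can be sketched with the textbook approach of dominators and back edges (a swapped-in standard technique; the slide does not say how the tool finds loops). An edge t → h is a back edge when h dominates t, and its natural loop is h plus every block that reaches t without passing through h. Labeling each loop by its header, and treating a block as ineligible when any loop containing it also contains a call, are assumptions; all names are hypothetical.

```python
from __future__ import annotations


def predecessors(succ: dict[str, list[str]]) -> dict[str, set[str]]:
    pred = {n: set() for n in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].add(u)
    return pred


def dominators(succ, entry):
    """Iterative dataflow: dom(n) = {n} plus the intersection of dom(p) over preds p."""
    nodes, pred = set(succ), predecessors(succ)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in pred[n]:
                new &= dom[p]
            new.add(n)
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def natural_loops(succ, entry):
    """Map each loop header to its body, using back edges t -> h where h dominates t."""
    dom, pred = dominators(succ, entry), predecessors(succ)
    loops: dict[str, set[str]] = {}
    for t, targets in succ.items():
        for h in targets:
            if h in dom[t]:
                body, stack = {h}, []
                if t != h:
                    body.add(t)
                    stack.append(t)
                while stack:                       # walk backwards until the header
                    for p in pred[stack.pop()]:
                        if p not in body:
                            body.add(p)
                            stack.append(p)
                loops.setdefault(h, set()).update(body)
    return loops


def nesting_info(succ, entry, has_call):
    """Return (nesting depth, label set, cache-eligible blocks)."""
    loops = natural_loops(succ, entry)
    depth = {n: 0 for n in succ}
    labels = {n: set() for n in succ}
    for header, body in loops.items():
        for n in body:
            depth[n] += 1
            labels[n].add(header)                  # label = loop header (assumption)
    call_loops = {h for h, body in loops.items() if body & has_call}
    eligible = {n for n in succ if labels[n] and not labels[n] & call_loops}
    return depth, labels, eligible


cfg = {
    "entry": ["B1"],
    "B1": ["B2"],          # outer loop header
    "B2": ["B3"],          # inner loop header
    "B3": ["B2", "B4"],    # inner back edge B3 -> B2
    "B4": ["B1", "exit"],  # outer back edge B4 -> B1
    "exit": [],
}
depth, labels, eligible = nesting_info(cfg, "entry", has_call=set())
print(depth)             # {'entry': 0, 'B1': 1, 'B2': 2, 'B3': 2, 'B4': 1, 'exit': 0}
print(sorted(eligible))  # ['B1', 'B2', 'B3', 'B4']
```

Passing `has_call={"B4"}` marks the outer loop as containing a call, and under this reading no block remains eligible.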
11. 2. Second Step: LabelTree Construction
The LabelTree describes the nesting relationship
between basic blocks.
Fig. LabelTree.
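A LabelTree can be represented as an ordinary tree whose internal nodes are loops and whose leaves are basic blocks. In this sketch, which is an assumption about the structure and not the tool's definition, each loop's parent is the smallest loop that strictly contains it, each block hangs under the innermost loop containing it, and blocks outside every loop hang under a hypothetical `root` node. Loops are given as header-to-body sets, like the output of a nesting computation.

```python
from __future__ import annotations

ROOT = "root"


def build_label_tree(loops: dict[str, set[str]], blocks: list[str]) -> dict[str, list[str]]:
    """Return a children map: node -> list of child loops and blocks."""
    def node(header: str) -> str:
        return f"loop {header}"

    def innermost(headers: list[str]) -> str:
        # The enclosing loop with the smallest body is the innermost one.
        return node(min(headers, key=lambda h: len(loops[h]))) if headers else ROOT

    children: dict[str, list[str]] = {ROOT: []}
    children.update({node(h): [] for h in loops})
    for h, body in loops.items():
        enclosing = [g for g, other in loops.items() if g != h and body < other]
        children[innermost(enclosing)].append(node(h))
    for b in blocks:
        containing = [h for h, body in loops.items() if b in body]
        children[innermost(containing)].append(b)
    return children


loops = {"B1": {"B1", "B2", "B3", "B4"}, "B2": {"B2", "B3"}}
tree = build_label_tree(loops, ["entry", "B1", "B2", "B3", "B4", "exit"])
print(tree)
# {'root': ['loop B1', 'entry', 'exit'], 'loop B1': ['loop B2', 'B1', 'B4'], 'loop B2': ['B2', 'B3']}
```

The depth of a block in this tree equals its loop-nesting level, so the nesting relationship can be read directly from the structure.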
12. 3. Third Step: Basic Block Selection and Placement
• The compiler knows the maximum number of basic
blocks that can be placed in the L-Cache.
• In this step, the algorithm scans the basic
blocks in descending order of execution frequency.
• The most important blocks are considered first
and therefore have a greater chance of being placed in
the L-Cache.
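The greedy scan described above can be sketched as follows. Block sizes (in instructions), frequencies and the L-Cache capacity are hypothetical, and the `eligible` flag stands for the result of the first step. The slide does not say whether the scan stops or continues when a block does not fit; this sketch assumes it skips that block and keeps scanning.

```python
from dataclasses import dataclass


@dataclass
class BasicBlock:
    name: str
    size: int        # instructions (hypothetical unit)
    freq: int        # execution count from the profile data
    eligible: bool   # False if, e.g., its loop contains a function call


def select_blocks(blocks, capacity):
    """Scan blocks by descending frequency and keep those that fit in the L-Cache."""
    chosen, used = [], 0
    for bb in sorted(blocks, key=lambda b: b.freq, reverse=True):
        if bb.eligible and used + bb.size <= capacity:
            chosen.append(bb.name)
            used += bb.size
    return chosen


blocks = [
    BasicBlock("entry", 2, 1, False),
    BasicBlock("B1", 3, 90, True),
    BasicBlock("B2", 4, 800, True),
    BasicBlock("B3", 6, 1000, True),
    BasicBlock("B4", 5, 100, True),
]
print(select_blocks(blocks, capacity=13))  # ['B3', 'B2', 'B1']
```

B4 is more frequent than B1 but no longer fits after B3 and B2, so the smaller B1 takes the remaining space.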
13. 4. Fourth and Fifth Steps: Global Placement in the Memory
• This step places the basic blocks in the global address space.
• The algorithm takes as input the placement of the basic blocks with respect
to the L-Cache and tries to minimize the required memory space as much as
possible.
Fig. Placing blocks in cache
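One way to realize this step is sketched below under an explicit assumption: the L-Cache is direct-mapped, so a block assigned L-Cache offset `o` must start at a memory address `a` with `a mod S = o`, where S is the L-Cache size. Selected blocks are laid out in offset order, and each alignment gap is filled first-fit with unconstrained blocks to reduce padding. Sizes, offsets and names are hypothetical. Moving blocks out of their original order breaks fall-through paths, which is why branch insertion follows; that step is not modeled.

```python
from __future__ import annotations


def global_placement(cached, uncached, lcache_size):
    """
    cached:   (name, L-Cache offset, size) for blocks chosen in the selection step
    uncached: (name, size) for all other blocks
    Returns (start address of every block, total memory footprint).
    """
    pending = list(uncached)
    layout: dict[str, int] = {}
    addr = 0
    for name, offset, size in sorted(cached, key=lambda c: c[1]):
        gap = (offset - addr) % lcache_size     # units until an address mapping to offset
        for item in list(pending):              # first fit: fill the gap with free blocks
            u_name, u_size = item
            if u_size <= gap:
                layout[u_name] = addr
                addr += u_size
                gap -= u_size
                pending.remove(item)
        addr += gap                             # whatever is left becomes padding
        layout[name] = addr
        addr += size
    for u_name, u_size in pending:              # unconstrained blocks go at the end
        layout[u_name] = addr
        addr += u_size
    return layout, addr


layout, total = global_placement(
    cached=[("B3", 0, 6), ("B2", 10, 4)],
    uncached=[("entry", 2), ("B1", 3), ("B4", 5), ("exit", 1)],
    lcache_size=16,
)
print(layout, total)
# {'B3': 0, 'entry': 6, 'exit': 8, 'B2': 10, 'B1': 14, 'B4': 17} 22
```

Only one unit of padding is wasted here, because `entry` and `exit` fill most of the gap before B2.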
14. Hardware Enhancement
• Implementing the L-Cache scheme requires additional hardware,
shown in the figure.
• The L-Cache tag array produces an output only if the
blocked part signal is on. This signal is generated by the
instruction fetch unit (IFU).
• In that case, the comparator checks for a match, and if it finds one,
it instructs the multiplexer to drive the contents of the L-Cache
onto the data path.
Fig. L-Cache organisation
15. • At the same time, the data portion of the L-Cache asserts its
output and sends the new instruction to the data path. The I-Cache
is disabled for that clock cycle, since the blocked part signal is on.
• If the blocked part signal is off, the I-Cache controller activates the
I-Cache without waiting for the L-Cache hit signal. In this way, the
L-Cache can be bypassed without a delay penalty.
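The fetch behaviour described on the last two slides can be modelled in software to count how often each array is activated. This is a behavioural sketch, not the circuit: it assumes a direct-mapped L-Cache holding one instruction per line, and the miss case (the I-Cache supplies the instruction and the L-Cache line is refilled) is an assumption, since the slides do not describe it. `LCache` and `fetch` are hypothetical names.

```python
class LCache:
    """Direct-mapped L-Cache model; one instruction per line (simplifying assumption)."""

    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.tags = [None] * num_lines


def fetch(lcache: LCache, pc: int, blocked_part: bool, counts: dict) -> str:
    """Return which array supplies the instruction at pc and count array activations."""
    if not blocked_part:
        # L-Cache stays idle; the I-Cache is activated at once, so no delay penalty.
        counts["icache"] += 1
        return "I-Cache"
    index, tag = pc % lcache.num_lines, pc // lcache.num_lines
    counts["lcache"] += 1                  # tag and data portions are read together
    if lcache.tags[index] == tag:
        return "L-Cache"                   # mux selects L-Cache data; I-Cache disabled
    # Assumption (not on the slides): on a miss the I-Cache supplies the
    # instruction and the L-Cache line is refilled for the next loop iteration.
    counts["icache"] += 1
    lcache.tags[index] = tag
    return "I-Cache (refill)"


counts = {"lcache": 0, "icache": 0}
lc = LCache(num_lines=8)
# A hypothetical 4-instruction loop (pc 100-103) run 3 times, then one non-loop fetch.
sources = [fetch(lc, pc, True, counts) for _ in range(3) for pc in range(100, 104)]
sources.append(fetch(lc, 200, False, counts))
print(sources.count("L-Cache"), counts)  # 8 {'lcache': 12, 'icache': 5}
```

After the first iteration fills the L-Cache, every loop fetch is served without activating the I-Cache.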
16. Energy Estimation
The energy models use run-time information on cache
utilization, i.e., the number of accesses, input statistics, etc.
A 0.8-µm technology with a 3.3-V supply voltage
is assumed.
These models are used to estimate the energy of
both the I-Cache and the L-Cache.
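At its simplest, an access-count model multiplies the number of accesses to each structure by an energy per access. The sketch below uses normalized, hypothetical per-access energies (an L-Cache access is assumed to cost a quarter of an I-Cache access) and hypothetical access counts; it does not reproduce the 0.8-µm, 3.3-V models or their results. `fetch_energy` is a hypothetical helper.

```python
def fetch_energy(lcache_accesses: int, icache_accesses: int,
                 e_lcache: float, e_icache: float) -> float:
    """Total instruction-fetch energy: accesses times energy per access, per cache."""
    return lcache_accesses * e_lcache + icache_accesses * e_icache


E_ICACHE, E_LCACHE = 1.0, 0.25      # normalized energy per access (hypothetical)

# Baseline: all 1000 fetches go to the I-Cache.
baseline = fetch_energy(0, 1000, E_LCACHE, E_ICACHE)
# With an L-Cache: 800 fetches come from cached blocks (20 of them miss and
# refill from the I-Cache) and the other 200 fetches bypass the L-Cache.
with_lcache = fetch_energy(800, 200 + 20, E_LCACHE, E_ICACHE)
print(baseline, with_lcache, 1 - with_lcache / baseline)
```

In this made-up scenario the fetch energy falls from 1000 to 420 normalized units, about 58% less; the real saving depends on the per-access energies the detailed models produce.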
17. Conclusion
The cache is one of the most power-consuming modules of a CPU, so
reducing its activity results in energy reduction.
This reduces the total energy consumption of the microprocessor.
Major energy gains can be obtained if both the compiler and the
hardware are designed with low energy consumption as a goal.
18. References
[1] J. Edmondson, “Internal organization of the Alpha 21164, a 300 MHz 64-bit quad-issue CMOS RISC microprocessor,” Digital Tech. J., vol. 7, no. 1, pp. 119–135, 1995.
[2] D. Dobberpuhl, “The design of a high-performance low-power microprocessor,” in Proc. Int. Symp. Low Power Electronics and Design, 1996, pp. 11–16.
[3] S. Manne, D. Grunwald, and A. Klauser, “Pipeline gating: Speculation control for energy reduction,” in Proc. Int. Symp. Computer Architecture, 1998, pp. 132–141.
[4] V. Tiwari, S. Malik, and A. Wolfe, “Power analysis of embedded software: A first step toward software power minimization,” IEEE Trans. VLSI Syst., vol. 2, pp. 437–445, Dec. 1994.
[5] V. Tiwari, S. Malik, A. Wolfe, and T. C. Lee, “Instruction level power analysis and optimization of software,” J. VLSI Signal Processing, vol. 13, Aug. 1996.