Design of 16 bit low power processor using clock gating technique


International Journal of Electronics and Communication Engineering & Technology (IJECET), ISSN 0976-6464 (Print), ISSN 0976-6472 (Online), Volume 3, Issue 3, October-December (2012), pp. 333-340, © IAEME. Journal Impact Factor (2012): 3.5930 (Calculated by GISI).

DESIGN OF 16 BIT LOW POWER PROCESSOR USING CLOCK GATING TECHNIQUE

Khaja Mujeebuddin Quadry, Research Scholar, JNTU Anantapur, A.P., India.
Dr. Syed Abdul Sattar, Professor & Dean of Academics, Royal Institute of Technology & Science, Chevella, R. R. Dist., A.P., India.
Dr. K. Soundara Rajan, Professor, Dept. of ECE, JNTU Anantapur, A.P., India. Email: soundararajan_jntucea@yahoo.com

ABSTRACT

Low power design is gaining importance due to the increasing need for battery-operated portable devices with high computing capability. The reliability of an integrated circuit depends on the heat dissipated in the circuit. The cost of the system also increases with the cooling systems needed for heat removal. A large fraction of the power consumed by synchronous logic is due to the clock distribution network and the high switching activity at its nodes. Clock gating is a well-known technique for reducing clock power. In this paper we present the design of a 16 bit processor in 90nm technology, applying the clock gating principle at the fine-grained level to minimize power dissipation.

I. INTRODUCTION

Clock gating is a technique used to reduce power dissipation in the clock distribution network. This is achieved by shutting down the clock of any component whenever it is not being used or accessed. It involves inserting combinational logic along the clock path to prevent unnecessary switching of sequential elements. By shutting down the idle units we can prevent the circuit from consuming unnecessary power. A portion of the clock tree can also be shut down by masking off the clock at an internal node of the tree using an AND gate.
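The AND-gate masking described above can be illustrated with a small behavioural model. This is a minimal sketch in Python, not the authors' design: the waveform sampling, the helper names `rising_edges` and `gate_clock`, and the activity pattern are all assumptions made for illustration. It counts how many rising clock edges reach a unit with and without gating.

```python
def rising_edges(wave):
    """Count 0 -> 1 transitions in a sampled waveform (list of 0/1 values)."""
    return sum(1 for prev, cur in zip(wave, wave[1:]) if prev == 0 and cur == 1)


def gate_clock(clk, enable):
    """Mask the clock with an enable signal: gated_clk = clk AND enable."""
    return [c & e for c, e in zip(clk, enable)]


# Eight clock cycles, each sampled once in its low phase and once in its high phase.
clk = [0, 1] * 8

# Hypothetical activity pattern: the unit is needed only in cycles 2, 3 and 6.
active_cycles = {2, 3, 6}

# The enable is held constant over a whole cycle, so it never changes while clk is high.
enable = [1 if cycle in active_cycles else 0 for cycle in range(8) for _ in range(2)]

gated_clk = gate_clock(clk, enable)
ungated_edges = rising_edges(clk)
gated_edges = rising_edges(gated_clk)
print(ungated_edges, gated_edges)
```

In this model the gated unit sees a clock edge only in the cycles where it is enabled. The enable is deliberately kept stable for a full cycle, because with a plain AND gate an enable that changes while the clock is high would appear as a glitch on the gated clock.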
Figure 1. Processor power breakdown [16]

Masking the clock at an internal node prevents wasteful switching in the clock tree and saves power in the clock tree in addition to saving power in the functional units which are fed by the clock. In modern processors and SoCs the clock distribution network is responsible for an increasing fraction of the dynamic power consumption [15]. Figure 1 shows the breakdown of power consumption for a recent high-performance microprocessor [16]. The clock power is expected to increase as the complexity and the operating frequency of circuits keep growing as a result of technology scaling [11]. Designing the clock tree has thus become critical not only for performance but also for power, and the development of new modeling capabilities and synthesis techniques that help in controlling the clock tree power effectively is one of the challenges that EDA engineers currently face [13]. Different solutions for minimizing the power consumed by the clock tree have been investigated in the recent past. In this paper, we present the design of a 16 bit processor, applying the clock gating technique for power optimization at the gate and RT levels.

The rest of this paper is organized as follows. In Section II we briefly review previous work on minimization of power using clock gating. Section III describes the design of the 16 bit processor and how clock gating is applied. Section IV discusses simulation results. Finally, Section V concludes the manuscript with some final remarks.

II. PREVIOUS WORK

The problem of minimizing the power dissipated by clock distribution networks has been addressed by many authors, and a brief overview of their work is given below.

In [14] Jaewon Oh and Massoud Pedram presented a zero-skew gated clock routing technique for VLSI circuits, in which the clock-tree topology is constructed based on the locations and the activation frequencies of the modules, while the locations of the internal nodes of the clock tree are determined using a dynamic programming approach. In [11] Hans Jacobson et al. examined the power reduction benefits of two newly invented schemes called transparent pipeline clock-gating and elastic pipeline clock-gating. In their work they bounded the practical limits of clock gating efficiency in future microprocessors. In [10] Jochen Preiss et al. introduced fine-grain clock-gating schemes for fused multiply-add-type floating-point units (FPU). This method is based on instruction type, precision and operand values.
In [9] Donno et al. presented a methodology in which low-power clock trees are obtained through aggressive exploitation of the clock-gating technology. In [7] M. Kamaraju et al. presented a power optimized ALU for an efficient data path with the clock gating technique and achieved a 33.3% saving in power dissipation. In [8] M. Kamaraju et al. presented an FPGA based power optimized programmable embedded controller with a power dissipation of 15 mW and a frequency of operation of 15 MHz. In [5] Khaja Mujeebuddin Quadry and Dr. Syed Abdul Sattar presented an FPGA based design of a low power 16 bit processor with a power dissipation of 25 mW at an operating frequency of 30.931 MHz; a saving of 21% is achieved after applying various low power techniques. In [6] N. Sivasankara Reddy presented a low power 16 bit processor with a power dissipation of 1.37 mW and a saving of 29% by using low power techniques. In [4] Samiappa Sakthikumaran et al. proposed a 16-bit non-pipelined RISC processor with 329.3 µW power dissipation and a total area of 65012 nm² using 90nm technology. In [3] Jagrit Kathuria et al. presented a review of existing clock gating techniques. In [2] Ali Elkateeb presented a practical introduction to soft-core processor design through step-by-step integration of the processor's components. In [1] Shmuel Wimer and Israel Koren presented a probabilistic model of the clock gating network that allows the expected power savings and the implied overhead to be quantified. They presented expressions for the power savings in a gated clock tree and derived the optimal gater fan-out based on flip-flop toggling probabilities and process technology parameters. The resulting clock gating methodology achieves 10% savings of the total clock tree switching power. However, the described approaches give little attention to integration issues with existing design flows.

III. PROCESSOR WITH CLOCK GATING

A 16 bit RISC processor is designed and clock gating is applied to the design. The processor has 24 basic instructions covering arithmetic, logical, data transfer, branching and control operations. The processor consists of a set of 16 bit registers: R0 to R7, PC, IR, RegY and Add_reg R. The 16 bit ALU is designed to perform the arithmetic and logic operations. The control unit generates the control signals according to the instruction being executed. The state machine of the control unit has four basic states: idle, fetch, decode and execute. The processor has two flag bits, zero and carry.

Figure 2. Processor architecture
The architecture of the processor is shown in Figure 2. It has two buses, Bus_1 and Bus_2, which are driven by Mux_1 and Mux_2 respectively. Mux_1 is an 8-to-1 multiplexer. Registers R0 to R6 and PC are inputs to Mux_1, and a 3 bit select line is used to select any of these registers. The output of Mux_1 drives Bus_1. The output of Bus_1 is given as input to the ALU, Bus_2 and memory. Mux_2 is a 4-to-1 multiplexer that uses a 2 bit select line to select among the ALU output, Bus_1 and the memory word; the output of Mux_2 drives Bus_2. The data from Bus_2 is loaded into any one of the 16 bit registers by using the respective load_x signal from the control unit. The instruction format of the processor is shown in Figure 3. The source and destination registers are each specified by a 3 bit address. The opcode is 5 bits wide, hence a total of 32 instructions are possible.

Figure 3. Instruction format (fields: opcode, source, destination; bits 15 down to 0)

There is a provision for increasing the number of instructions and the number of registers, as 4 bits are left for future use. In case the source or destination is a memory location, the address of the memory location is given in the second word of the instruction. The program counter holds the address of the current instruction to be executed. The contents of the program counter are transferred to the address register through Bus_1 and Bus_2. The contents of the memory location pointed to by the address register are transferred to the instruction register through Bus_2, and the program counter is incremented. The instruction is decoded by the control unit, which generates the control signals needed to perform the operation. Once the processor was designed, its functionality was verified for all the instructions.
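The instruction format described above (a 5 bit opcode and two 3 bit register addresses in a 16 bit word) can be sketched as a pair of encode and decode helpers. The field widths come from the text, but the exact bit positions are not recoverable from Figure 3, so the layout used here (opcode in bits 15 to 11, source in bits 10 to 8, destination in bits 7 to 5, remaining low-order bits left at zero) is an assumption, as are the function names and the opcode value in the usage note.

```python
OPCODE_BITS = 5   # width stated in the text
REG_BITS = 3      # width stated in the text

# Assumed field positions; the original figure does not survive extraction.
OPCODE_SHIFT = 11  # opcode in bits 15..11
SRC_SHIFT = 8      # source register in bits 10..8
DST_SHIFT = 5      # destination register in bits 7..5


def encode(opcode, src, dst):
    """Pack opcode and register addresses into one 16 bit instruction word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 5 bits")
    if not (0 <= src < (1 << REG_BITS) and 0 <= dst < (1 << REG_BITS)):
        raise ValueError("register address does not fit in 3 bits")
    return (opcode << OPCODE_SHIFT) | (src << SRC_SHIFT) | (dst << DST_SHIFT)


def decode(word):
    """Split a 16 bit instruction word into (opcode, src, dst)."""
    opcode = (word >> OPCODE_SHIFT) & ((1 << OPCODE_BITS) - 1)
    src = (word >> SRC_SHIFT) & ((1 << REG_BITS) - 1)
    dst = (word >> DST_SHIFT) & ((1 << REG_BITS) - 1)
    return opcode, src, dst
```

For example, with a hypothetical opcode value of 3, `decode(encode(3, 2, 5))` returns `(3, 2, 5)`, that is, source register R2 and destination register R5.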
The processor is optimized for power dissipation by applying the clock gating technique at the fine-grained level. In clock gating, the clock to a circuit is disabled to save power, both by eliminating power dissipation on the clock network and by preventing unnecessary activity in logic modules. In the processor architecture, the units for which the clock is to be gated are identified, and the gating condition is evaluated separately for each module. In the case of the register file, only the source and destination registers are used in the execution while the other registers are idle, hence their clock is masked with an AND gate by using an enable signal.

Figure 4. Clock gating at fine grained level
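The register-file gating condition described above can be sketched as follows. This is a behavioural illustration under stated assumptions, not the authors' RTL: it assumes the eight registers R0 to R7 are indexed 0 to 7 and that the enable of a register is asserted exactly when it is the source or destination of the current instruction. The function names are hypothetical.

```python
NUM_REGS = 8  # R0 to R7


def register_clock_enables(src, dst):
    """Return one enable bit per register: 1 only for the source and destination."""
    return [1 if reg in (src, dst) else 0 for reg in range(NUM_REGS)]


def gated_register_clocks(clk, enables):
    """Mask the clock of each register with its enable (clk AND enable)."""
    return [clk & en for en in enables]


# Instruction using R2 as source and R5 as destination, during the high clock phase.
enables = register_clock_enables(src=2, dst=5)
clocks = gated_register_clocks(1, enables)
print(enables, sum(clocks))
```

In this sketch only two of the eight register clocks toggle for a two-operand instruction; the other six registers receive no clock edge.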
Figure 4 shows how the various modules are connected to the clock through the masking AND gate with an enable signal. The condition for activating the enable signal for the various modules is found by carefully analyzing the functionality and timing diagram of the processor.

Figure 5. Clock enable signal for flip flop

Figure 5 shows how a clock enable signal is derived for a flip flop: whenever there is no change between the previous output and the present input, the clock signal is masked by the clk_en signal, which is computed by performing the XOR operation between D and Q. The cost paid for this power saving is the overhead of the extra logic. The carry and zero flag registers are implemented with this method.

A group enable signal is generated for the 16 bit registers, as each consists of a group of flip flops. Instruction-based clock gating is also incorporated by carefully partitioning the CPU registers into blocks and deriving a common clock enable signal to turn off each register group independently [9].

IV. SIMULATION RESULTS

Figure 6 shows simulation results from 485 ns to 715 ns of the simulation time. The SUB instruction is executed from 485 ns to 525 ns (4 cycles), the BRZ instruction from 525 ns to 555 ns (3 cycles), the ADD instruction from 555 ns to 595 ns (4 cycles), the AND instruction from 595 ns to 635 ns (4 cycles), the OR instruction from 635 ns to 675 ns (4 cycles), and the XOR instruction from 675 ns to 715 ns (4 cycles). The control signals used to execute these instructions can be seen in the figure. The processor was tested by executing a number of test programs from the test bench, and its functionality was verified.

Figure 6. Simulation results
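The flip-flop enable scheme of Figure 5, where clk_en is the XOR of D and Q, can be modelled behaviourally as below. This is a minimal sketch rather than the authors' circuit: the class `GatedFlipFlop`, the helper `run` and the input sequence are hypothetical, and the model simply counts how many clock edges actually reach the flip flop.

```python
def clk_en(d, q):
    """Enable the clock only when the input differs from the stored value."""
    return d ^ q


class GatedFlipFlop:
    def __init__(self, q=0):
        self.q = q
        self.clock_pulses = 0  # rising edges that reach the flip flop

    def tick(self, d):
        """Apply one rising edge of the system clock with input d."""
        if clk_en(d, self.q):
            # Clock passes the gate: the flip flop is clocked and captures d.
            self.clock_pulses += 1
            self.q = d
        # Otherwise the clock is masked and q keeps its old value, which equals d.
        return self.q


def run(inputs):
    """Feed a sequence of D values and return (outputs, clock pulses delivered)."""
    ff = GatedFlipFlop()
    outputs = [ff.tick(d) for d in inputs]
    return outputs, ff.clock_pulses


inputs = [0, 0, 1, 1, 1, 0, 0, 1]
outputs, pulses = run(inputs)
print(outputs, pulses)
```

The outputs match what an ungated flip flop would produce, since a masked edge would have reloaded the same value anyway, while the number of delivered clock edges drops to the number of input changes. This is why the scheme suits slowly changing bits such as the carry and zero flags.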
Figure 7. Net power usage

Table 1. Power and area report

16 bit Processor    Cells / Area (nm²)    Leakage power (nW)    Dynamic power (nW)    Total power (nW)
w/o low power       1123 / 34378          170.928               347131.269            347302.197
With low power      1141 / 34638          179.588               281869.759            282049.348

Figure 7 shows the net power usage report generated by Cadence Encounter(R) RTL Compiler. Table 1 shows the number of cells, cell area, leakage power and dynamic power dissipation of the processor with and without the clock gating technique. We have observed that a 23.15% power saving is achieved after the application of the clock gating technique.

V. CONCLUSION

The 16 bit processor is designed in 90nm technology, simulated, and functionally verified, and power optimization is done by applying the clock gating technique at the fine-grained level. Instruction-level clock gating is done by grouping the modules according to the instructions. The activation functions for enabling and disabling the clock for each group of flip flops are evaluated carefully. The power and area are evaluated before and after the application of the clock gating technique and are shown in Table 1; it is observed that an overall power saving of 23.15% is achieved. The absolute power dissipation of the designed processor is 282049.348 nW, which is less than that of the 16 bit processor designed using 90nm technology presented in [4]. The frequency of operation of the designed processor is 226 MHz. Leakage power dissipation will increase as the technology is scaled down; it can be reduced by applying the power gating technique in combination with the clock gating technique.
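The figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below uses only the values from the table; the variable names are hypothetical, and the choice of which baseline to divide by is an assumption, since the text does not state how the 23.15% figure was derived.

```python
# Values from Table 1, in nW.
leak_without, dyn_without, total_without = 170.928, 347131.269, 347302.197
leak_with, dyn_with, total_with = 179.588, 281869.759, 282049.348

# Each total should be the sum of its leakage and dynamic components.
total_check_without = round(leak_without + dyn_without, 3)
total_check_with = round(leak_with + dyn_with, 3)

# Reduction in total power, relative to the design without clock gating.
saving_vs_ungated = 100 * (total_without - total_with) / total_without

# Reduction in dynamic power, relative to the design with clock gating.
dynamic_vs_gated = 100 * (dyn_without - dyn_with) / dyn_with

print(round(saving_vs_ungated, 2), round(dynamic_vs_gated, 2))
```

With these numbers, the reduction in total power relative to the ungated design comes to about 18.79%, while the 23.15% quoted in the text coincides with the dynamic power reduction expressed relative to the gated design. This is an observation from the table values, not a statement made by the authors. The table also shows that gating adds 18 cells and slightly raises leakage power, which is the logic overhead mentioned in Section III.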
REFERENCES

[1] Shmuel Wimer, Israel Koren, "The Optimal Fan-Out of Clock Network for Power Minimization by Adaptive Gating", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 20, No. 10, pp. 1772-1780, October 2012.
[2] Ali Elkateeb, "A Processor Design Course Project: Creating Soft-Core MIPS Processor Using Step-by-Step Components Integration Approach", International Journal of Information and Education Technology, Vol. 1, No. 5, pp. 432-440, December 2011.
[3] Jagrit Kathuria, M. Ayoub Khan, Arti Noor, "A review of clock gating techniques", International Journal of Electronics and Communication Engineering, Vol. 1, No. 2, pp. 106-114, August 2011.
[4] Samiappa Sakthikumaran, S. Salivahanan, V. S. Kanchana Bhaaskaran, "16-Bit RISC Processor Design for Convolution Application", Proceedings of IEEE International Conference on Recent Trends in Information Technology, ICRTIT 2011, 978-1-4577-0590-8.
[5] Khaja Mujeebuddin Quadry, Dr. Syed Abdul Sattar, "Design of 16 bit low power processor", International Journal of Computer Science and Information Security (IJCSIS), Vol. 10, No. 6, pp. 67-71, June 2012, ISSN 1947-5500.
[6] N. Sivasankara Reddy, "Minimization of power dissipation in 16 bit processor using low power techniques", Asian Journal of Applied Sciences, 4(6): 657-662, 2011, ISSN 1996-3343.
[7] M. Kamaraju, K. Lal Kishore, A. V. N. Tilak, "Power Optimized ALU for Efficient Datapath", International Journal of Computer Applications (0975-8887), Volume 11, No. 11, pp. 39-43, December 2010.
[8] M. Kamaraju, K. Lal Kishore, A. V. N. Tilak, "Power optimized programmable embedded Controller", International Journal of Computer Networks & Communications (IJCNC), Vol. 2, No. 4, pp. 97-107, July 2010.
[9] Monica Donno, Enrico Macii, Luca Mazzoni, "Power aware clock-tree planning", Proceedings of the 2004 International Symposium on Physical Design, pp. 138-147, New York, NY, USA, 2004, ISBN 1-58113-817-2.
[10] Jochen Preiss, Maarten Boersma, Silvia Melitta Mueller, "Advanced Clockgating Schemes for Fused-Multiply-Add-Type Floating-Point Units", 19th IEEE International Symposium on Computer Arithmetic, pp. 48-56, 2009.
[11] Hans Jacobson, Pradip Bose, Zhigang Hu, Rick Eickemeyer, Lee Eisen, John Griswell, "Stretching the Limits of Clock-Gating Efficiency in Server-Class Processors", Proceedings of the 11th Int'l Symposium on High-Performance Computer Architecture (HPCA-11 2005), 1530-0897/05, © 2005 IEEE.
[12] D. Duarte, V. Narayanan, M. J. Irwin, "Impact of Technology Scaling in the Clock System Power", IEEE Computer Society Annual Symposium on VLSI, pp. 52-57, Pittsburgh, PA, April 2002.
[13] D. Duarte, V. Narayanan, M. J. Irwin, "A Clock Power Model to Evaluate Impact of Architectural and Technology Optimizations", IEEE Transactions on VLSI Systems, Vol. 10, No. 6, pp. 844-855, December 2002.
[14] Jaewon Oh, Massoud Pedram, "Gated Clock Routing for Low-Power Microprocessor Design", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 20, No. 6, pp. 715-722, June 2001.
[15] T. Mudge, "Power: A First-Class Architectural Design Constraint", IEEE Computer, Vol. 34, No. 4, pp. 52-58, April 2001.
[16] V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, F. Baez, "Reducing Power in High-Performance Microprocessors", DAC-35: ACM/IEEE Design Automation Conference, pp. 732-737, San Francisco, CA, June 1998.
[17] Raj Kumar Tiwari, Santosh Kumar Agrahari, "Low Power Arm Processor Based Embedded System", International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 2, 2012, pp. 369-374, Published by IAEME.
[18] B. K. V. Prasad, P. Satishkumar, B. Stephencharles, T. Prasad, "Low Power Design Of Wallance Tree Multiplier", International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 3, 2012, pp. 258-264, Published by IAEME.
[19] P. Sreenivasulu, Krishnna Veni, Dr. K. Srinivasa Rao, Dr. A. Vinaya Babu, "Low Power Design Techniques Of Cmos Digital Circuits", International Journal of Electronics and Communication Engineering & Technology (IJECET), Volume 3, Issue 2, 2012, pp. 199-208, Published by IAEME.

AUTHORS PROFILE

Khaja Mujeebuddin Quadry (Member IEEE) is presently working as Associate Professor & Head of Department, ECE, RITS, Chevella, Hyderabad, A.P., India. He obtained a Diploma in Electronics and Communication Engineering from the State Board of Technical Education and Training, A.P., India in 1993, a BE degree in Electronics and Communication Engineering from Osmania University in 1997, and an ME degree in VLSI & Embedded System Design from Osmania University in 2007. Presently he is a research scholar of JNTUA, Anantapur, A.P., India. He is a Life member of the Institution of Electronics and Telecommunication Engineers (IETE), India. He has 6 years of industrial experience and 8 years of teaching experience.

Dr. Syed Abdul Sattar is presently working as Dean of Academics & Professor of the ECE department, RITS, Chevella, Hyderabad. He completed his B.E. in ECE in 1990 from Marathwada University, Aurangabad, and his M.Tech. in DSCE from JNTU Hyderabad in 2002. He received his first Ph.D., in Computer Science, from Golden State University, USA, in 2004, and his second Ph.D., in ECE, from JNTU Hyderabad, A.P., India in 2007. He is a fellow member of the Institution of Electronics and Telecommunication Engineers, India, and a Life member of the Indian Society for Technical Education. His area of specialization is wireless communications and image processing. He has about 21 years of combined experience in teaching and industry and is the recipient of a national award as Engineering Scientist of the Year 2006 from NESA, New Delhi, India. He has about 73 publications in international and national journals and conferences. Presently he is guiding research scholars in ECE and Computer Science from different universities. He is a member of the Board of Studies for a central university and reviewer/editorial member/chief editor for national and international journals.

Dr. K. Soundara Rajan obtained his Master's degree and Ph.D. from IIT, Roorkee. He has more than 30 years of teaching experience and 12 years of research experience. He has guided 10 PhD scholars successfully; presently 10 PhD scholars are under his guidance. He has published more than 79 publications at national and international level. He is Member of an International Research Journal of Higher Education, Life member of ISTE, and Regional Coordinator for NAFEN (National Foundation of Indian Engineers, New Delhi). He is a former Principal and Rector of JNTUA. Presently he is OSD to the Vice Chancellor at JNTUA, Anantapur, A.P., India.