Synchronous Design of 8259 Programmable Interrupt Controller



Sia, Chee Yap
Universiti Sains Malaysia (USM); Intel Corporation, Penang, Malaysia

Bakhtiar Affendi Rosdi
Universiti Sains Malaysia (USM), Penang, Malaysia

Lee, Ming Chew
Intel Corporation, Penang, Malaysia

Abstract: This paper presents the design of a synchronous 8259 Programmable Interrupt Controller (PIC) that is functionally compatible with the existing asynchronous 8259 PIC design. The main objective is to reduce the design review effort required at each process technology migration. The design also addresses the disadvantages and potential hazards inherent in the asynchronous 8259 PIC, such as timing loops, race conditions, undetectable signal pulse widths, and glitches. It is a clock-gated synchronous design that uses only flip-flops as memory elements, implemented in a standard-cell based 32 nm CMOS process. Pre-layout simulation results demonstrate an equivalent interrupt handling mechanism, with increases of approximately 0.3% in total gate count and 12.2% in area compared to the existing design. Total dynamic power consumption increases by 4.7 uW, but it remains within the acceptable 'uW' range. The increase can be explained by the higher switching activity of the gated clock signal in the synchronous 8259 PIC compared to the handshaking signals in the asynchronous counterpart.

Keywords: Programmable Interrupt Controller; asynchronous design; synchronous design; process migration; hazards and races

I. INTRODUCTION

The interrupt controller is a common device in computer systems. It deals with interrupts generated by the peripherals and the processors, handles the interrupt priorities, and delegates execution to a processor [1]. An interrupt request is an asynchronous signal, typically triggered by an I/O device that needs to be serviced [2].
The 8259 PIC is designed to minimize the software and real-time overhead in handling multi-level priority interrupts [3]. In today's Intel architecture, the I/O Controller Hub (ICH) incorporates the functionality of two cascaded 8259 controllers that provide the system interrupts for Industry Standard Architecture (ISA) compatible devices, as shown in Figure 1 [4].

Generally, the sequence of events in the 8259 PIC during an interrupt request and service starts when one or more of the Interrupt Request (IRQ) lines are raised high in edge mode, or seen high in level mode. The 8259 PIC then sends the interrupt (INTR) signal to the Central Processing Unit (CPU) if an asserted interrupt is not masked. The CPU acknowledges and responds with an interrupt acknowledge (INTA_B) cycle. Upon receiving the INTA_B pulse, the highest priority Interrupt Service Register (ISR) bit is set and the corresponding Interrupt Request Register (IRR) bit is reset. The PIC then returns the interrupt vector, completing the interrupt cycle [3, 5]. If the Automatic End of Interrupt (AEOI) mode is used, the bit set earlier in the ISR is reset. Otherwise, the interrupt controller waits for an appropriate End of Interrupt (EOI) command at the end of the interrupt service routine [5].

Figure 1: 8259 PICs located in ICH

Currently, the circuits of the existing asynchronous 8259 PIC are designed with an internally generated clock. From time to time, the dynamic state of this asynchronous circuit has to be reviewed, at considerable effort, to ensure that it operates correctly [14, 16]. For example, the large number of sequential cells that require clock balancing must be identified; clock-versus-data paths and clock-versus-reset paths that originate from a common driver must be reviewed; and the modeled netlist buffer delays must be checked for consistency with the buffer cells instantiated in the Register Transfer Level (RTL) description.
Furthermore, this circuit might fail in operation due to potential hazards such as race conditions, timing loops, and glitches, which are characteristic of asynchronous design [12]. However, because of the legacy nature of the design, little research has been conducted on a robust solution that addresses these problems as a whole. In view of the disadvantages and potential hazards associated with the current 8259 PIC, this paper proposes a synchronous design of the 8259 PIC, implemented with a standard-cell based approach in a 32 nm CMOS process. The synchronous 8259 PIC has been benchmarked against the asynchronous counterpart implemented using the same process and standard-cell library.

The remainder of the paper is organized as follows. Section 2 reviews the previous architecture of the 8259 PIC. Section 3 introduces the synchronous design of the 8259 PIC, based on the same specifications as the existing design. Section 4 discusses the simulation results of the synchronous 8259 PIC and compares them against the asynchronous counterpart. Finally, Section 5 concludes the paper and proposes future work.

2011 International Conference on Computer Applications and Industrial Electronics (ICCAIE 2011), 978-1-4577-2059-8/11/$26.00 ©2011 IEEE

II. ASYNCHRONOUS ARCHITECTURE OF 8259 PIC

Figure 2 depicts the top level block diagram of the existing 8259 PIC, partitioned into two major functional blocks: the Priority Resolver (PR) and the T-unit Storage Unit (TSU). The PR block mainly acts as a priority arbiter that accepts 8 interrupt request inputs and determines the order of interrupt servicing. The servicing order is either based on the last serviced interrupt or programmed by software, with the option to select the next interrupt request input to be serviced. The TSU block functions both as the register storage unit that outputs to the PR block and as the operating mode sequencer of the 8259 PIC. The final interrupt output, INTR, is driven from the TSU block to the logical Advanced PIC (APIC) residing in the CPU, after which the PIC waits for INTA_B.

Figure 2: Top level view of legacy 8259 PIC

Figure 3 breaks down the PR block into eight priority cells (PRCEL), each mapped to one interrupt request input, where the interrupt is sampled as either level or edge triggered. Each cell contains four priority select bits, i.e. pPrSelAO and pPrSelAO_b for the IRR status, and pPrSelBO and pPrSelBO_b for the ISR status, which determine the priority of the PRCEL's request.
The priority select bits also keep track of the interrupt priority rotation bits.

Figure 3: PRCELs in asynchronous PR block

The priority select signals for IRR/ISR in the priority cell blocks are connected serially from interrupt request input 0 (IRQ0) to interrupt request input 7 (IRQ7), and the output of the last priority cell block is fed into the first priority cell block. This forms a timing loop that does not pass through any sequential device [7]. Timing loops can cause endless computation loops in many design tools; one example is shown in Figure 4. In static timing analysis (STA), each loop has to be broken (virtually), which means that the timing check must be disabled on one of the segments forming the loop. The path that is left untimed carries a risk of timing failure, so the segment must be chosen very carefully.

Figure 4: No sequential element in the timing loop

In each PRCEL, the pISRSet signal is generated from the pPrInfDin signal of the feedback loop together with the input freeze signal, sFrzPro, from the TSU block, as shown in Figure 5. There is a risk that sFrzPro is delayed, or that pPrInfDin transitions earlier, across process, supply voltage, and operating temperature (PVT) variations and across different silicon processes. If the overlap between the two signals shrinks, the pulse required to reset the IRR/ISR becomes too narrow to be detected by the latch, and subsequent interrupt servicing cannot proceed. Figure 6 illustrates that the width of this intended pulse is around 100 ps, which might not meet the minimum pulse width required by the input of the next sequential cell [18]. If the pulse is not detected, the IRR bit stays high and subsequent incoming interrupt requests are not serviced.
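The pulse-width risk described above can be made concrete with a small calculation. The following Python sketch is illustrative only: it assumes, for simplicity, that pISRSet behaves like the logical AND of two active-high windows, and all edge times and the minimum pulse width of 60 ps are hypothetical values chosen so that the nominal case reproduces the roughly 100 ps pulse of Figure 6.

```python
def and_pulse_width_ps(window_a, window_b):
    """Width of the overlap of two high windows (rise_ps, fall_ps); 0 if disjoint."""
    rise = max(window_a[0], window_b[0])
    fall = min(window_a[1], window_b[1])
    return max(0, fall - rise)


def is_detectable(width_ps, min_pulse_width_ps):
    """A pulse is captured only if it meets the cell's minimum pulse width."""
    return width_ps >= min_pulse_width_ps


# Hypothetical numbers, not taken from the paper's netlist.
MIN_PULSE_WIDTH_PS = 60            # assumed requirement of the capturing latch
p_pr_inf_din = (0, 600)            # pPrInfDin high from 0 ps to 600 ps
s_frz_pro = (500, 1200)            # sFrzPro high from 500 ps to 1200 ps
s_frz_pro_late = (580, 1280)       # same signal arriving 80 ps later (PVT shift)

nominal_width = and_pulse_width_ps(p_pr_inf_din, s_frz_pro)
shifted_width = and_pulse_width_ps(p_pr_inf_din, s_frz_pro_late)

print(nominal_width, is_detectable(nominal_width, MIN_PULSE_WIDTH_PS))
print(shifted_width, is_detectable(shifted_width, MIN_PULSE_WIDTH_PS))
```

With these assumed numbers an 80 ps shift in one signal is enough to shrink the pulse below the detection threshold, which is the failure mode the text describes.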
Figure 5: Generation of pISRSet signal

Figure 6: The pulse width of pISRSet signal

The blocks under the TSU hierarchy and their corresponding functions are given in Figure 7 and TABLE 1 respectively. The generation of the sFrez_b signal is depicted in Figure 8, which contains two different delay buffers of 4 ns and 5 ns. The 1 ns difference between these delay buffers is the duration of the unintended glitch in the sFrez_b signal, as shown in Figure 9. The glitch puts the IRR into a state in which it re-samples edge-triggered interrupt requests during the period between the first and second INTA_B pulses, as shown in Figure 10. This yields an undesired interrupt vector if another, higher priority interrupt request arrives during the glitch.

Figure 7: TSU block under asynchronous architecture

TABLE 1: FUNCTIONS OF EACH BLOCK IN TSU UNDER ASYNCHRONOUS ARCHITECTURE

Block    Description
SCWSM    Initializes the 8259 PIC into standby mode, pending interrupt requests.
SAKSM    Separates INTA_B into two pulses and feeds them to block SCTL, since the interrupt service procedure is based on the INTA_B signal.
SCAS     Slave address comparator that decides whether the interrupt requests come from a slave 8259 PIC.
SREG     Contains all the storage units of the 8259 PIC. It instantiates the flip-flops needed to store the data acquired from the data bus.
SCTL     Control unit that sends the enable signals to the other blocks in the TSU.
SRDMX    Multiplexer with the IMR bits, IRR bits, ISR bits, and interrupt vector address bits as its inputs.

Figure 8: Generation of sFrez_b signal

Figure 9: Unintended glitch of 1ns in sFrez_b signal

Figure 10: Risk of unintended glitch during IRR sampling

Generally, asynchronous circuits are affected by critical races or hazards due to internal gate delays [8]. In CMOS technologies, internal delay values depend on the load capacitances, which cannot be predicted precisely. Hence, it is impossible to fix a precise delay value that prevents critical races or hazards [9]. In the existing SREG block, a few paths might cause race conditions because the clock and the enable signals of certain sequential elements share a common driver. One of these paths is shown in Figure 11. The latch might capture undesired data if the inserted delay buffer does not provide sufficient setup time.

Figure 11: The clock input of latch and the enable signal generated from the common driver

To solve these problems and hazards associated with the asynchronous design of the 8259 PIC, the 8259 PIC is re-architected as a synchronous design in which the clock source for the sequential elements originates from the PCI clock domain, replacing the clock that was internally generated from the write signal, WR_B.

III. SYNCHRONOUS ARCHITECTURE OF 8259 PIC

Synchronous design is the most common methodology used to design and develop large, complex digital systems. Most EDA tools are based on this model and thus facilitate design automation [11]. Synchronization ensures that operations occur in the logically correct order, and is a critical factor in ensuring reliable system operation [13]. To maintain compatibility with the asynchronous legacy 8259 PIC, the top level block diagram in Figure 2 is unchanged in terms of the input and output pins that interact with the external units.

In the synchronous design, the blocks under the PR hierarchy are partitioned based on their functionality, as shown in Figure 12. Descriptions of each block in Figure 12 are listed in TABLE 2. In contrast to the asynchronous design, no timing loop is reported in this design, so a simple approach to STA can be adopted: it is unnecessary to set break points and disable timing checks.
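Whether a netlist contains a timing loop can be checked mechanically on its connectivity graph. The following is a minimal Python sketch, not the authors' flow or the algorithm of any particular STA tool: it runs a depth-first search that refuses to pass through sequential cells, so any cycle it reports is purely combinational. The node names (PRCEL0 to PRCEL7, REG) and the single-fanout ring are illustrative assumptions that only mimic the serial connection of the priority cells in the legacy PR block.

```python
def find_combinational_loop(fanout, sequential):
    """Return one cycle that crosses no sequential cell, or [] if there is none.

    fanout maps a cell name to the list of cells it drives;
    sequential is the set of cell names that are flip-flops.
    """
    nodes = set(fanout) | {dst for dsts in fanout.values() for dst in dsts}
    state = {n: 0 for n in nodes}   # 0 = unvisited, 1 = on current path, 2 = done
    path = []

    def visit(node):
        state[node] = 1
        path.append(node)
        for nxt in fanout.get(node, []):
            if nxt in sequential:
                continue            # a flip-flop ends the combinational path
            if state[nxt] == 1:     # reached a cell already on the current path
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == 0:
                loop = visit(nxt)
                if loop:
                    return loop
        path.pop()
        state[node] = 2
        return []

    for start in sorted(nodes):
        if start not in sequential and state[start] == 0:
            loop = visit(start)
            if loop:
                return loop
    return []


# Legacy-style ring: PRCEL7 feeds PRCEL0 directly (hypothetical netlist).
legacy_ring = {f"PRCEL{i}": [f"PRCEL{(i + 1) % 8}"] for i in range(8)}

# Same ring with a register (hypothetical cell "REG") placed in the feedback path.
registered_ring = {f"PRCEL{i}": [f"PRCEL{i + 1}"] for i in range(7)}
registered_ring["PRCEL7"] = ["REG"]
registered_ring["REG"] = ["PRCEL0"]

print(find_combinational_loop(legacy_ring, set()))
print(find_combinational_loop(registered_ring, {"REG"}))
```

In the first case the search returns the full ring, which is the situation that forces a manual loop break in STA; in the second case the register cuts the path and no loop is reported.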
Figure 12: Synchronous PR block is partitioned based on functions

TABLE 2: FUNCTIONS OF EACH BLOCK IN PR UNDER SYNCHRONOUS ARCHITECTURE

Block              Description
CU                 State machine acting as the main control unit. It sends the enable signals depending on the interrupt acknowledge (INTA_B) signal. It is part of the clock gating strategy: in each state, enable signals go only to particular blocks.
IRR                Register that stores the interrupt request status and produces the masked interrupt status.
ISR                Register that stores the current interrupt service status, indicating which IRQs are currently in service and which nested ISR bits are pending clearing.
Priority Resolver  Combinational logic that resolves the interrupt priorities based on the mode of operation, such as Special Mask Mode (SMM) and Special Fully Nested Mode (SFNM).
PRSET              Register that indicates the lowest priority among the 8 IRQs when priority rotation or rotational EOI is used. IRQ7 is the lowest priority by default.
EOI                Register that indicates the type of EOI used to clear the ISR status bit.
INTR               State machine that sends the INTR signal based on the masked IRR status and the priority level of the IRQs.

In the synchronous design, the pISRSet signal is generated by the Control Unit (CU) state machine instead of by the closed feedback loop. The pulse width of the signal is therefore no longer influenced by changes in PVT, but is set by the clock period of the state machine. Accordingly, the pulse width of this signal can be guaranteed to be one PCI clock cycle, as depicted in Figure 13, which meets the minimum pulse width required by the input of the next cell.
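The idea that a clocked state machine fixes the pulse width at one clock period can be sketched in a few lines of Python. This is a behavioral illustration under stated assumptions, not the CU block itself: the paper does not list the CU states, so the model below is reduced to a hypothetical two-state edge detector that samples INTA_B once per clock and raises its output for exactly one cycle after each falling edge.

```python
def one_cycle_pulse(inta_b_samples):
    """Clocked falling-edge detector.

    inta_b_samples holds the value of the active-low INTA_B input as seen at
    successive rising clock edges. The returned list holds the pulse output
    for the same cycles: 1 for exactly one cycle per falling edge, else 0.
    """
    previous = 1                       # assumed idle level of INTA_B
    pulses = []
    for sample in inta_b_samples:
        pulses.append(1 if previous == 1 and sample == 0 else 0)
        previous = sample              # state register updated once per clock
    return pulses


# INTA_B held low for three cycles, then for one cycle (hypothetical stimulus).
stimulus = [1, 1, 0, 0, 0, 1, 1, 0, 1]
print(one_cycle_pulse(stimulus))
```

However long INTA_B stays low, the output is high for a single sample, so its width is one clock period regardless of gate delays; that is the property the synchronous pISRSet relies on.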
Figure 13: State machine guarantees that the intended signal lasts for a clock period

The blocks under the synchronous TSU hierarchy and their corresponding functions are almost the same as in the asynchronous legacy unit, except that the Interrupt Acknowledge State Machine (SAKSM) block is removed, since the CU block under the synchronous PR hierarchy is able to produce the necessary control signals, including the second INTA_B signal. With the synchronous state machine of the CU block, sFrez_b is replaced by the IRR_en signal, as depicted in Figure 14, and the glitch hazard is eliminated. Hence, the IRR status is frozen during the whole interval of the first and second INTA_B pulses.

Figure 14: Signal of IRR_en shows no glitch for IRR to re-sample during INTA cycle

Compared to the asynchronous design in Figure 11, the potential race condition paths in the register storage unit (SREG) block are resolved: the clock input of the flip-flops is the enabled PCI clock, as shown in Figure 15. The synchronous architecture replaces latches with D flip-flops to ease timing analysis at later stages. Flip-flops have the advantage of a sampling property, whereby variations and glitches between two rising edges have no effect on the content of the memory [11]. Furthermore, there is no need to insert delay buffers in the data path, as was done in Figure 11.

Figure 15: Synchronous design eliminates race conditions

The dynamic power is given by [15, 19]:

$$P_{dyn} = \alpha f C V_{DD}^{2} \qquad (1)$$

where α is the switching activity, which represents the probability of a transition per clock cycle, f is the clock frequency, C is the capacitance at the node, and V_DD is the power supply voltage. The term αf therefore represents the transition frequency at the node. The contribution of the clock signal to dynamic power can reach 20% to 50% of total system power [19]. There are two reasons. First, the switching activity of a clocked node is equal to one, the highest in the system, because in each period the node makes a full transition cycle (i.e., 0->1->0). Second, the total node capacitance is high due to the large number of clocked nodes [15]. Hence, reducing the power consumed by the clock signal can save a large share of the system power. As shown in Figure 15, the gated clock approach is a power saving strategy that avoids burning energy whenever the flip-flops' outputs do not change.
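As a rough worked example of equation (1), the Python sketch below compares a free-running clocked node with the same node behind a clock gate. Every numeric value is a hypothetical assumption made for illustration (100 MHz clock, 1 pF lumped node capacitance, 1.0 V supply, gate enabled in 10% of cycles); none of them come from the paper's measurements, and modeling gating as a proportional reduction of α is a simplification.

```python
def dynamic_power_w(alpha, f_hz, c_farad, vdd_volt):
    """Equation (1): P_dyn = alpha * f * C * VDD^2, returned in watts."""
    return alpha * f_hz * c_farad * vdd_volt ** 2


# Hypothetical operating point, chosen only to keep the arithmetic readable.
F_CLK_HZ = 100e6        # assumed clock frequency
C_NODE_F = 1e-12        # assumed lumped capacitance of the clocked nodes
VDD_V = 1.0             # assumed supply voltage
ENABLE_FRACTION = 0.1   # assumed fraction of cycles in which the gate is open

# Free-running clock: switching activity alpha = 1, as stated for clocked nodes.
p_free_running = dynamic_power_w(1.0, F_CLK_HZ, C_NODE_F, VDD_V)

# Gated clock: the node toggles only while enabled, so alpha scales down.
p_gated = dynamic_power_w(1.0 * ENABLE_FRACTION, F_CLK_HZ, C_NODE_F, VDD_V)

print(f"free-running: {p_free_running * 1e6:.1f} uW")
print(f"gated:        {p_gated * 1e6:.1f} uW")
```

Under these assumptions the gated node dissipates one tenth of the free-running power. The sketch ignores the energy of the gating cell itself, which a real estimate would have to add.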
The gated clock approach is not free, however: to prevent glitches on the clock network, a latch is introduced for each enable signal, which adds to the clock energy consumption.

IV. RESULTS AND DISCUSSIONS

At the beginning of the synchronous 8259 PIC design flow, the asynchronous architecture of the legacy 8259 PIC was reviewed and coded in Verilog. After pre-layout simulation and validation, synthesis was done with Design Compiler (DC), in which the 8259 PIC hardware was mapped to a 32 nm CMOS process using the Intel standard-cell library.

The synchronous approach greatly reduces the effort engineers spend maintaining this legacy unit across process technology improvements. For the asynchronous 8259 PIC, they have to verify the timing correctness of its operation with a Gate Level Simulation (GLS) model. Moreover, 100% coverage of functional correctness requires 2^n test benches to be run, where n is the number of inputs to the asynchronous design. For the synchronous design of the 8259 PIC, correctness in terms of timing and functionality can be verified by applying timing constraints during STA.

Delay elements are required between sequential functions to compensate for races and skew, which are characteristic of the asynchronous design [12]. In addition, the number of buffer stages needed for the same amount of delay in this legacy unit keeps increasing as the process technology shrinks, which might accumulate the margin errors inherent in each buffer. Variations in component delays (e.g., due to statistical variations in operating and manufacturing conditions) affect the performance and correctness of these circuits. The synchronous design, on the other hand, only has to ensure that the clock period is large enough to accommodate the worst critical path delay, including clock skew and all process variations, without the need to insert delay buffers manually [14, 16].
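The clock period requirement stated above corresponds to the usual setup check on the worst register-to-register path. The Python sketch below shows that arithmetic in a minimal form; it is not the authors' constraint file, and all delay values, as well as the 2 ns clock period, are hypothetical numbers in picoseconds.

```python
def setup_slack_ps(t_clk, t_clk_to_q, t_logic, t_setup, t_skew):
    """Setup slack of one register-to-register path, in picoseconds.

    The launched data must arrive t_setup before the next capturing edge,
    with t_skew treated as a pessimistic loss of available time.
    """
    return t_clk - (t_clk_to_q + t_logic + t_setup + t_skew)


def min_clock_period_ps(t_clk_to_q, t_logic, t_setup, t_skew):
    """Smallest clock period for which the setup slack is non-negative."""
    return t_clk_to_q + t_logic + t_setup + t_skew


# Hypothetical worst-case path of the design (all values assumed).
WORST_PATH = dict(t_clk_to_q=120, t_logic=1500, t_setup=80, t_skew=100)

slack = setup_slack_ps(t_clk=2000, **WORST_PATH)
t_min = min_clock_period_ps(**WORST_PATH)

print(f"slack at 2000 ps period: {slack} ps")
print(f"minimum period: {t_min} ps ({1e6 / t_min:.1f} MHz)")
```

Because the whole timing requirement collapses into this one inequality, retargeting the design to another process only requires re-evaluating the path delays against the clock period, rather than re-tuning individual delay buffers.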
Another advantage of synchronous design is design reusability. The main timing constraint of a synchronous design is embedded in the period of the clock signal, which depends mainly on the propagation delay of the combinational part. As long as the clock period is large enough, the same design can be implemented in different device technologies [11]. In terms of operating frequency, the synchronous 8259 PIC can operate at 500 MHz without negative setup or hold slack, whereas the maximum clock frequency in the ICH is 470 MHz. This may allow the synchronous design to become modular, so that it can be plugged into different bus systems in the ICH.

The pre-layout simulation of an edge-triggered IRQ4 without EOI, shown in Figure 16 and Figure 17, demonstrates that the synchronous and asynchronous designs of the 8259 PIC have the same interrupt handling operations and protocols. The only difference is that the behavior of the synchronous 8259 PIC is based on the positive edge of the PCI clock signal, while the behavior of the asynchronous one is based on the positive edges of handshaking signals from combinational logic.
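The interrupt handling protocol that both designs implement can be summarized by a small behavioral model. The Python sketch below is a simplification written for illustration, not the paper's Verilog: it assumes fixed priority with IRQ0 highest, fully nested operation, no AEOI, and a single acknowledge step that stands in for both INTA_B pulses. The class name and the vector base of 0x08 are hypothetical.

```python
class SimplePic:
    """Behavioral sketch of IRR/ISR/IMR handling in a single 8259-style PIC."""

    def __init__(self, vector_base=0x08):
        self.irr = 0                 # Interrupt Request Register, bit n = IRQn
        self.isr = 0                 # Interrupt Service Register
        self.imr = 0                 # Interrupt Mask Register, 1 = masked
        self.vector_base = vector_base

    @staticmethod
    def _highest_priority(bits):
        """Lowest-numbered set bit (IRQ0 is the highest priority), or None."""
        for n in range(8):
            if bits & (1 << n):
                return n
        return None

    def raise_irq(self, n):
        """Latch an edge-triggered request on IRQn."""
        self.irr |= 1 << n

    @property
    def intr(self):
        """INTR is high if an unmasked request outranks everything in service."""
        pending = self._highest_priority(self.irr & ~self.imr)
        if pending is None:
            return False
        in_service = self._highest_priority(self.isr)
        return in_service is None or pending < in_service

    def acknowledge(self):
        """INTA cycle: set the ISR bit, clear the IRR bit, return the vector."""
        if not self.intr:
            return None              # spurious-interrupt handling is not modeled
        n = self._highest_priority(self.irr & ~self.imr)
        self.isr |= 1 << n
        self.irr &= ~(1 << n)
        return self.vector_base + n

    def eoi(self):
        """Non-specific EOI: clear the highest priority in-service bit."""
        n = self._highest_priority(self.isr)
        if n is not None:
            self.isr &= ~(1 << n)


pic = SimplePic()
pic.raise_irq(4)                     # edge-triggered IRQ4
vector = pic.acknowledge()           # CPU runs the INTA cycle, no EOI follows
print(hex(vector), bin(pic.isr), bin(pic.irr))
```

Because no EOI is issued in this scenario, the ISR bit for IRQ4 stays set, and a later request on a lower priority line such as IRQ5 would not raise INTR until `eoi()` is called.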
Figure 16: Overall behavior of asynchronous 8259 PIC in handling an edge-triggered interrupt

Figure 17: Overall behavior of synchronous 8259 PIC in handling an edge-triggered interrupt

TABLE 3 compares the gate count, area, and total dynamic power of the asynchronous and synchronous designs of the 8259 PIC. With the switching activity file generated by the same testbench, the power consumption of the synchronous design is relatively higher, but it remains in the acceptable "uW" range [6]. This is because the clock driver has to constantly provide a powerful clock that reaches all parts of the circuit, although clock gating avoids sending the clock signal to inactive blocks. The slight increase in area results from the larger number of flip-flops, since a flip-flop is about twice as large as a D latch. In addition, latches have less input capacitance and consume less switching power than comparable flip-flops, and their use can lead to substantial savings in power [17].

TABLE 3: COMPARISONS BETWEEN ASYNCHRONOUS AND SYNCHRONOUS 8259 PICS

Parameters                            Asynchronous 8259 PIC    Synchronous 8259 PIC
Gate count, combinational elements    549                      520
Gate count, sequential elements       88                       119
Total area                            980 um^2                 1100 um^2
Total dynamic power                   0.0020 mW                0.0067 mW

V. CONCLUSION

The synchronous design of the 8259 PIC has been presented and benchmarked against the corresponding asynchronous design. Simulation results show that its functionality is the same as that of the legacy unit, while it also resolves the problems and hazards of the asynchronous design. It is intended to be a modular and reusable intellectual property (IP) block. As future work, the dynamic power consumption of the synchronous design can be further reduced by applying a clock gate control block to iPCICLK before it reaches the 8259 PIC.

ACKNOWLEDGMENT

The authors would like to thank Intel Malaysia for providing the benchmark asynchronous design and the tools used in the re-architecture process.

REFERENCES

[1] A. Tumeo, M. Branca, L. Camerini, M. Monchiero, G. Palermo, F. Ferrandi, and D. Sciuto, "An interrupt controller for FPGA-based multiprocessors," in International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 82-87, Samos, Greece, 2007.
[2] Intel Architecture Software Developer's Manual Volume 1: Basic Architecture, 1999.
[3] Intel 8259A Programmable Interrupt Controller (8259A/8259A-2), 1988.
[4] Intel I/O Controller Hub 10 (ICH10) Family Datasheet, October 2008.
[5] 82C59A Priority Interrupt Controller: Application Note, April 1999.
[6] Y. Shi, B. H. Gwee, and J. Chang, "Asynchronous DSP for low-power energy-efficient embedded systems," Microprocessors and Microsystems, vol. 35, pp. 318-328, 2011.
[7] S. Churiwala, C. Kumar, and S. Verma (Atrenta (I) Pvt. Ltd), "Exploring the types of combinational loops," in EETimes Asia, March 2010.
[8] E. Vittoz, C. Piguet, and W. Hammer, "Model of the logic gate," in Proc. J. d'Electronique EPF-L, Lausanne, 1977, pp. 455-467.
[9] C. Piguet, "Logic Synthesis of Race-Free Asynchronous CMOS Circuits," IEEE Journal of Solid-State Circuits, vol. 26, no. 3, p. 371, March 1991.
[10] S. H. Unger, "Hazards, critical races, and metastability," IEEE Trans. Comput., vol. 44, pp. 754-768, 1995.
[11] P. P. Chu, "RTL Hardware Design Using VHDL: Coding for Efficiency, Portability and Scalability," John Wiley & Sons, Inc., Hoboken, NJ, 2006.
[12] P. Forshaw and R. Hahn, "Synchronous design: The right technique for digital ASICs," in Proc. The Third Annual IEEE ASIC Seminar and Exhibit, pp. P6-1.1-P6-1.5, Rochester, New York, Sept. 1990.
[13] D. G. Messerschmitt, "Synchronization in digital system design," IEEE J. Select. Areas Commun., vol. 8, pp. 1404-1419, 1990.
[14] P. A. Beerel, R. O. Ozdag, and M. Ferretti, "A Designer's Guide to Asynchronous VLSI," Cambridge University Press.
[15] G. Palumbo, F. Pappalardo, and S. Sannella, "Evaluation on power reduction applying gated clock approaches," in Proc. IEEE Int. Symp. Circuits and Systems, vol. 4, pp. 85-88, 2002.
[16] S. Hauck, "Asynchronous design methodologies: An overview," in Proc. IEEE, vol. 83, no. 1, pp. 69-93, 1995.
[17] K. van Berkel, R. Burgess, J. Kessels, A. Peeters, M. Roncken, F. Schalij, and R. van de Wiel, "A single-rail re-implementation of a DCC error detector using a generic standard-cell library," in IEEE Computer Society Press, Asynchronous Design Methodologies, pp. 72-79, 1995.
[18] J. Bhasker and R. Chadha, "Static Timing Analysis for Nanometer Designs: A Practical Approach," Springer, 1st Edition, April 2009.
[19] A. H. Farrahi, C. Chen, A. Srivastava, G. Tellez, and M. Sarrafzadeh, "Activity-driven clock design," IEEE Trans. Comput.-Aided Des., vol. 20, p. 705, 2001.