counterpart implemented using the same process and standard-
The remaining part of the paper is organized as follows.
Section 2 reviews the previous architecture of 8259 PIC.
Section 3 introduces the synchronous design of 8259 PIC
based on the same specifications of existing design. Section 4
discusses the simulation results of the synchronous 8259 PIC
and compares them against the asynchronous counterpart.
Finally, Section 5 concludes the paper and proposes future work.
II. ASYNCHRONOUS ARCHITECTURE OF 8259 PIC
Figure 2 depicts the top level block diagram of an existing
8259 PIC, partitioned into 2 major functional blocks, i.e.
Priority Resolver (PR) and T-unit Storage Unit (TSU). The PR
block mainly acts as a priority arbiter that accepts 8 interrupt
request inputs to determine the order of interrupt servicing.
The order of servicing priority is based on the last serviced
interrupt or is programmed through software, with the option
to select the next interrupt request input to be serviced.
On the other hand, the TSU block functions both as the register
storage unit that outputs to the PR block and as the 8259 PIC
operating mode sequencer. The final interrupt output, INTR, is
generated from the TSU block to the logical Advanced PIC
(APIC) residing in the CPU, after which the 8259 PIC waits for INTA_B.
Figure 2: Top level view of legacy 8259 PIC
Figure 3 breaks down the PR block into eight priority cells
(PRCEL), each mapped to one interrupt request input, where the
interrupt is sampled as either level-triggered or edge-triggered. Each
cell contains four priority select bits, i.e. pPrSelAO and
pPrSelAO_b for IRR status, and pPrSelBO and pPrSelBO_b for
ISR status, to determine the priority of the PRCEL’s request. The
priority select bits also keep track of the interrupt priority
Figure 3: PRCELs in asynchronous PR block
The priority select signals for IRR/ISR in the priority cell
blocks are connected serially from interrupt request input 0
(IRQ0) to interrupt request input 7 (IRQ7), and the output of
the last priority cell block is fed back into the first priority cell
block, forming a timing loop that does not pass through any
sequential device. Timing loops can cause endless
computation loops in many design tools. One example
is shown in Figure 4. In static timing analysis (STA), each
loop has to be broken (virtually), which means that timing checks
must be disabled on one of the segments forming the loop.
The path that is left untimed carries a risk of timing failure.
Hence, the challenge is to choose the segment
Figure 4: No sequential element in the timing loop
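As a rough illustration of why the ring of priority cells forms a timing loop, the sketch below models the cells as nodes of a directed graph and searches for a cycle that passes through no sequential element. The node names, the graph encoding, and the `find_combinational_loop` helper are assumptions made for illustration; they are not taken from the legacy netlist or from any STA tool.

```python
def find_combinational_loop(edges, sequential):
    """Return one cycle (list of nodes) that passes through no sequential
    element, or None if every cycle is broken by a register or latch."""
    # Sequential nodes terminate timing paths, so edges touching them are dropped.
    graph = {}
    for src, dst in edges:
        if src in sequential or dst in sequential:
            continue
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack = []

    def dfs(node):
        color[node] = GREY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:           # back edge: a loop is closed
                return stack[stack.index(nxt):]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            loop = dfs(node)
            if loop:
                return loop
    return None


# Hypothetical ring: PRCEL0 -> PRCEL1 -> ... -> PRCEL7 -> PRCEL0
cells = [f"PRCEL{i}" for i in range(8)]
ring = [(cells[i], cells[(i + 1) % 8]) for i in range(8)]
loop_without_register = find_combinational_loop(ring, sequential=set())

# Same ring, but the feedback from PRCEL7 goes through a (hypothetical) register.
ring_with_reg = ring[:-1] + [("PRCEL7", "REG"), ("REG", "PRCEL0")]
loop_with_register = find_combinational_loop(ring_with_reg, sequential={"REG"})
```

In this model the purely combinational ring is reported as a loop containing all eight cells, while the ring with a register in the feedback path yields no loop.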
In each PRCEL, the pISRSet signal is generated from the
pPrInfDin signal of the feedback loop together with the
input freeze signal, sFrzPro, from the TSU block, as shown in
Figure 5. There is a risk that sFrzPro may be delayed or that
pPrInfDin may transition earlier across process, supply voltage,
and operating temperature (PVT) variations and across different
silicon processes. If the overlap between the
two signals shrinks, the pulse needed to reset the
IRR/ISR becomes too narrow for the latch to detect, and subsequent
interrupt servicing cannot proceed. Figure 6
illustrates that the width of this intended pulse is around 100ps,
which might not meet the minimum pulse width
required by the next sequential cell’s input. If the pulse is not
detected, IRR will stay high and the subsequent incoming
interrupt requests will not be serviced.
Figure 5: Generation of pISRSet signal
Figure 6: The pulse width of pISRSet signal
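The dependence of the pISRSet pulse width on the overlap of two signals can be sketched numerically. The example below is a minimal model that assumes the pulse exists only while both signals are active; the window edges, the 80 ps minimum pulse width, and the helper names are hypothetical, and only the nominal width of about 100 ps comes from the text above.

```python
def overlap_pulse_width_ps(a_start, a_end, b_start, b_end):
    """Width in ps of the interval during which both signals are active."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def pulse_is_captured(width_ps, min_pulse_width_ps):
    """A latch can only detect a pulse at least as wide as its minimum."""
    return width_ps >= min_pulse_width_ps


MIN_PULSE_WIDTH_PS = 80  # assumed library requirement, not from the paper

# Nominal corner (hypothetical windows): pPrInfDin active over 0..500 ps,
# sFrzPro active over 400..900 ps, giving a 100 ps overlap.
nominal_ps = overlap_pulse_width_ps(0, 500, 400, 900)

# Skewed corner: pPrInfDin ends 30 ps earlier and sFrzPro arrives 40 ps later.
skewed_ps = overlap_pulse_width_ps(0, 470, 440, 900)

print(nominal_ps, pulse_is_captured(nominal_ps, MIN_PULSE_WIDTH_PS))
print(skewed_ps, pulse_is_captured(skewed_ps, MIN_PULSE_WIDTH_PS))
```

Under these assumed numbers a combined shift of only 70 ps is enough to turn a detectable pulse into one that the latch would miss.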
The block diagrams under the hierarchy of TSU and their
corresponding functions are given in Figure 7 and TABLE 1
respectively. The generation of the sFrez_b signal is depicted in
Figure 8, where there are two different delay buffers, i.e. 4ns and
5ns. The 1ns difference between these delay buffers is the
duration of the unintended glitch in the sFrez_b signal, as shown in
Figure 9. The glitch causes the IRR to re-sample edge-triggered
interrupt requests in the interval between the first and
second INTA_B, as shown in Figure 10. This yields an
undesired interrupt vector if another higher-priority
interrupt request arrives while the glitch is asserted.
Figure 7: TSU block under asynchronous architecture
TABLE 1: FUNCTIONS OF EACH BLOCK IN TSU UNDER ASYNCHRONOUS ARCHITECTURE
Block    Function
SCWSM    To initialize the 8259 PIC into standby mode and pending for
SAKSM    To differentiate the INTA_B into two pulses and then feed them to block SCTL, since the interrupt service procedure is based on the INTA_B signal
SCAS     Slave address comparator to decide whether the interrupt requests are from a slave 8259 PIC
SREG     This unit includes all the storage units for the 8259 PIC. It instantiates the flip-flops needed to store the data acquired from the data bus
SCTL     Control unit that sends the enable signals to other blocks in
SRDMX    Multiplexer which has the IMR bits, IRR bits, ISR bits, and interrupt vector address bits as its inputs.
Figure 8: Generation of sFrez_b signal
Figure 9: Unintended glitch of 1ns in sFrez_b signal
Figure 10: Risk of unintended glitch during IRR sampling
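The origin of the 1ns glitch can be illustrated with a small time-stepped model. This is only a sketch: the actual gate structure of Figure 8 is not reproduced here, and the model simply assumes that a glitch is visible whenever the 4ns and 5ns delayed copies of the same edge disagree. The function names and the 0.25 ns sampling step are illustrative assumptions.

```python
def delayed_step(t_ns, edge_ns, delay_ns):
    """Value at time t of a 0->1 step at edge_ns after a delay buffer."""
    return 1 if t_ns >= edge_ns + delay_ns else 0


def disagreement_times(edge_ns, delay_a_ns, delay_b_ns, step_ns, horizon_ns):
    """Sample times at which the two delayed copies of the edge differ."""
    times = []
    for i in range(int(horizon_ns / step_ns) + 1):
        t = i * step_ns
        a = delayed_step(t, edge_ns, delay_a_ns)
        b = delayed_step(t, edge_ns, delay_b_ns)
        if a != b:
            times.append(t)
    return times


STEP_NS = 0.25  # assumed sampling resolution
window = disagreement_times(edge_ns=0.0, delay_a_ns=4.0, delay_b_ns=5.0,
                            step_ns=STEP_NS, horizon_ns=10.0)
glitch_start_ns = window[0]
glitch_width_ns = window[-1] - window[0] + STEP_NS
print(glitch_start_ns, glitch_width_ns)
```

In this simplified model the window of disagreement opens when the faster buffer switches and closes when the slower one follows, so its width equals the difference between the two delays.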
Generally, asynchronous circuits are affected by critical
races or hazards due to internal gate delays. In CMOS
technologies, internal delay values depend on load
capacitances, which cannot be predicted precisely. Hence, it is
impossible to fix a precise delay value to prevent critical races
or hazards. In the existing SREG block, a few
paths might cause race conditions because the clock and the
enable signals of certain sequential elements share a common
driver. One of these paths is shown in
Figure 11: the latch might capture
undesired data if the inserted delay buffer does not provide
sufficient setup time.
Figure 11: The clock input of latch and the enable signal
generated from the common driver
To solve the problems and hazards associated with
the asynchronous design of the 8259 PIC, the 8259 PIC is
re-architected into a synchronous design in which the
clock for the sequential elements originates from the PCI
clock domain, replacing the internally generated clock derived
from the write signal, WR_B.
III. SYNCHRONOUS ARCHITECTURE OF 8259 PIC
Synchronous design is the most common methodology
used to design and develop large, complex digital systems.
Most EDA tools are based on this model and thus facilitate
design automation. Synchronization ensures that
operations occur in a logically correct order, and is a critical
factor in ensuring reliable system operation.
In order to maintain compatibility with the asynchronous
legacy 8259 PIC, the top level block diagram in Figure 2 is
unchanged in terms of the input and output pins that interact
with the external units. In the synchronous design, the blocks
under the hierarchy of PR are partitioned based on their
functionalities, as shown in Figure 12. Descriptions of each
block in Figure 12 are listed in TABLE 2. Compared to
the asynchronous design, there is no timing loop reported in
this design, thus a simple approach for STA can be adopted
since it is unnecessary to set the break points and disable the
Figure 12: Synchronous PR block is partitioned based on functions
TABLE 2: FUNCTIONS OF EACH BLOCK IN PR UNDER SYNCHRONOUS ARCHITECTURE
Block              Function
CU                 State machine acting as the main control unit that sends the enable signals depending on the interrupt acknowledge (INTA_B) signal. It is part of the clock gating strategy, with the enable signals sent to particular blocks in one state.
IRR                Register to store the interrupt request status and produce masked interrupt status.
ISR                Register to store the current interrupt service status, indicating which IRQs are currently in service and which nested ISR bits are pending for clearing.
Priority Resolver  Combinational logic to resolve the interrupt priorities based on the mode of operation, such as Special Mask Mode (SMM) and Special Fully Nested Mode
PRSET              Register to indicate the lowest priority out of the 8 IRQs if there is priority rotation or rotational EOI. IRQ7 is the lowest priority by default.
EOI                Register to indicate type of EOI for clearing the ISR
INTR               State machine to send the INTR signal based on the masked IRR status and the priority level of IRQs.
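The interaction between the masked IRR status and the lowest-priority setting held in PRSET can be sketched as follows. This is a simplified model under stated assumptions: it ignores the ISR and the special modes, and the function name and bit encodings (`irr`, `imr`, `lowest`) are illustrative rather than taken from the design.

```python
def resolve_priority(irr, imr, lowest=7):
    """Return the number (0-7) of the highest-priority unmasked request,
    or None if nothing is pending.

    irr    -- 8-bit request bitmap, bit i set means IRQi is requesting
    imr    -- 8-bit mask bitmap, bit i set means IRQi is masked
    lowest -- IRQ currently holding the lowest priority (IRQ7 by default);
              the IRQ after it, modulo 8, holds the highest priority
    """
    pending = irr & ~imr & 0xFF
    for offset in range(1, 9):
        irq = (lowest + offset) % 8
        if pending & (1 << irq):
            return irq
    return None


# IRQ2 and IRQ4 pending, nothing masked, default order IRQ0..IRQ7
print(resolve_priority(0b00010100, 0b00000000))
# Same requests with IRQ2 masked
print(resolve_priority(0b00010100, 0b00000100))
# Same requests after rotation makes IRQ3 the lowest priority
print(resolve_priority(0b00010100, 0b00000000, lowest=3))
```

With the default setting the scan order is IRQ0 to IRQ7; after a rotation that makes IRQ3 the lowest priority, the scan starts at IRQ4 and wraps around to IRQ3.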
In the synchronous design, the pISRSet signal is
generated by the Control Unit (CU) state machine instead of
by the closed feedback loop. Therefore, the signal’s
pulse width is no longer influenced by PVT variations but is
set by the clock period of the state machine. Accordingly,
the pulse width of this signal can be guaranteed to be one PCI
clock cycle, as depicted in Figure 13, thus meeting the
minimum pulse width required by the next cell’s input.
Figure 13: State machine guarantees that the intended
signal lasts for a clock period
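The one-cycle guarantee can be illustrated with a small clocked state machine model. The state names, the transition conditions, and the active-high `inta_active` input are assumptions for illustration only; the actual CU state encoding is not described in the text.

```python
IDLE, SET_ISR, WAIT_RELEASE = "IDLE", "SET_ISR", "WAIT_RELEASE"


def next_state(state, inta_active):
    """Hypothetical transition function, evaluated once per clock edge."""
    if state == IDLE:
        return SET_ISR if inta_active else IDLE
    if state == SET_ISR:
        return WAIT_RELEASE          # leave after exactly one clock cycle
    # WAIT_RELEASE: hold until the acknowledge is released
    return WAIT_RELEASE if inta_active else IDLE


def simulate(inta_samples):
    """Clock the FSM once per sample; return the pISRSet value per cycle."""
    state = IDLE
    outputs = []
    for inta_active in inta_samples:
        state = next_state(state, inta_active)
        outputs.append(1 if state == SET_ISR else 0)   # Moore-style output
    return outputs


trace = simulate([0, 1, 1, 1, 1, 0, 0])
print(trace)
```

Because the output is decoded from the state alone, its width is fixed at one clock period regardless of how long the acknowledge input stays active.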
The block diagrams under the hierarchy of synchronous
TSU and their corresponding functions are almost the same as
the asynchronous legacy unit except that the block of Interrupt
Acknowledge State Machine (SAKSM) is removed since the
CU block under hierarchy of synchronous PR is able to
produce the necessary control signals including the second
INTA_B signal. By using the synchronous state machine of the
CU block, sFrez_b is replaced by the IRR_en signal in the
synchronous design, as depicted in Figure 14, and the glitch
hazard is eliminated. Hence, the IRR status is frozen during
the whole interval of the first and second INTA_B.
Figure 14: Signal of IRR_en shows no glitch for IRR to
re-sample during INTA cycle
Compared to the asynchronous design in Figure 11, the
potential race condition paths in the register storage unit (SREG)
block are resolved because the clock input of the flip-flops is the
enabled PCI clock, as shown in Figure 15. Instead of using
latches, the synchronous architecture uses D flip-flops
to ease timing analysis at a later stage. Flip-flops
have the advantage of edge sampling: variations and
glitches between two rising edges have no effect on the
content of the memory. Furthermore, there is no need to
insert any delay buffers in the data path as was done in Figure 11.
Figure 15: Synchronous design eliminates race conditions
According to the formula of dynamic power:
$P_{dyn} = \alpha f C V_{DD}^{2}$    [15, 19]    (1)
where α is the switching activity and represents the
probability of a transition per clock cycle, f is the clock
frequency, C is the capacitance at the node and VDD is the
power supply voltage. The term ‘αf’ thus represents the
transition frequency at the node. The clock signal can account
for 20%-50% of total system power. This is because its
switching activity equals one, the highest in the system (in
each period the clocked node makes a full transition cycle,
i.e., 0->1->0), and
the total node capacitance is high due to the large number of
clocked nodes. Hence, reducing clock signal power
consumption can yield large savings in system power
consumption. Referring to Figure 15, the gated clock approach is a
power saving strategy implemented to avoid burning energy
whenever the flip-flops’ outputs do not change. However, in
order to prevent glitches on the clock network, a latch is
introduced for each enable signal, which itself contributes
to clock energy consumption.
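The effect of clock gating on a single clocked node can be estimated with the standard form of the dynamic power relation, P = α·f·C·VDD². The sketch below is illustrative only: the frequency, capacitance, supply voltage, and enable duty are assumed values, not measurements from the design, and the overhead of the gating latch is deliberately left out.

```python
def dynamic_power_w(alpha, f_hz, c_farad, vdd_volt):
    """Dynamic power of one node in watts: alpha * f * C * VDD^2."""
    return alpha * f_hz * c_farad * vdd_volt ** 2


# Hypothetical values, chosen only for illustration.
F_CLK_HZ = 33e6         # assumed clock frequency
C_CLK_NODE_F = 10e-15   # assumed capacitance of one clocked node (10 fF)
VDD_V = 1.0             # assumed supply voltage
ENABLE_DUTY = 0.1       # assumed fraction of cycles with the enable high

# Free-running clock node: one full 0->1->0 cycle per period, so alpha = 1.
ungated_w = dynamic_power_w(1.0, F_CLK_HZ, C_CLK_NODE_F, VDD_V)

# Gated clock node: it toggles only in enabled cycles, so alpha scales with
# the enable duty. The extra latch on the enable path is not modelled here.
gated_w = dynamic_power_w(ENABLE_DUTY, F_CLK_HZ, C_CLK_NODE_F, VDD_V)

print(f"ungated: {ungated_w * 1e6:.3f} uW, gated: {gated_w * 1e6:.3f} uW")
```

In this model the saving on a gated node is proportional to the fraction of cycles in which its enable is low, which is why gating pays off mainly for blocks that are idle most of the time.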
IV. RESULTS AND DISCUSSIONS
At the beginning of the synchronous 8259 PIC design flow, the
legacy 8259 PIC’s asynchronous architecture is reviewed and
coded in Verilog. After pre-layout simulation and
validation, synthesis is performed with Design Compiler (DC),
in which the 8259 PIC’s hardware is mapped to a 32nm CMOS
process using an Intel standard-cell library.
The synchronous approach greatly reduces the effort
engineers spend maintaining this legacy unit across
process technology improvements; for the asynchronous 8259 PIC,
they have to verify the timing correctness of its operations
with a Gate Level Simulation (GLS) model. Besides, for
100% coverage of functional correctness, 2^n test
benches need to be run, where ‘n’ is the number of inputs to
the asynchronous design. As for the synchronous design of the 8259
PIC, correctness in terms of timing and functionality can
be ensured by applying the timing constraints during the stage of
Delay elements are required between sequential functions
to compensate for races and skew, which is characteristic
of this asynchronous design. In addition,
the number of buffer stages needed to obtain the same delay
in this legacy unit keeps increasing as the process technology
shrinks, and the margin errors inherited from each buffer
may thus accumulate. Variations in
component delays (e.g., due to statistical variations in
operating and manufacturing conditions) affect the
performance and correctness of these circuits. On the other
hand, the synchronous design only has to ensure that the clock
period is large enough to accommodate the worst-case critical path
delay, including clock skew and all process variations, without
the need to manually insert delay buffers [14, 16].
Another advantage of synchronous design is design
reusability. The main timing constraint of the synchronous
design is embedded in the period of the clock signal, which
depends mainly on the propagation delay of the combinational
part. As long as the clock period is large enough, the same
design can be implemented in different device technologies.
In terms of operating frequency, the synchronous
8259 PIC can operate at 500MHz without
negative setup or hold slack, whereas the maximum clock
frequency in the ICH is up to 470MHz. This may enable the
synchronous design to become modular so that it can be
plugged into different bus systems in the ICH.
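The relationship between clock frequency and setup slack can be made concrete with a short calculation. The sketch below assumes the usual single-cycle setup check (clock period minus clock-to-Q, combinational delay, setup time, and skew); all path delay values are hypothetical and only the 500MHz and 470MHz figures come from the text.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def setup_slack_ns(freq_mhz, clk_to_q_ns, comb_delay_ns, setup_ns, skew_ns):
    """Setup slack of a register-to-register path; negative means a violation."""
    required_ns = clk_to_q_ns + comb_delay_ns + setup_ns + skew_ns
    return clock_period_ns(freq_mhz) - required_ns


# Hypothetical worst-case path, for illustration only.
PATH = dict(clk_to_q_ns=0.15, comb_delay_ns=1.50, setup_ns=0.10, skew_ns=0.10)

slack_500 = setup_slack_ns(500.0, **PATH)   # 2.0 ns period
slack_470 = setup_slack_ns(470.0, **PATH)   # longer period, more margin

print(f"500 MHz slack: {slack_500:.3f} ns, 470 MHz slack: {slack_470:.3f} ns")
```

A path that closes timing at 500MHz necessarily has additional margin at any lower frequency, which is the basis of the reusability argument above.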
The pre-layout simulation, which considers the edge-triggered
IRQ4 without EOI as in Figure 16 and Figure 17,
shows that both the synchronous and asynchronous designs of
the 8259 PIC have the same interrupt handling operations and
protocols. The only difference is that the behavior of
synchronous 8259 PIC is based on the positive edge of PCI
clock signal and the behavior of asynchronous one is based on
the positive edge of handshaking signals from combinational
Figure 16: Overall behavior of asynchronous 8259 PIC in
handling an edge-triggered interrupt
Figure 17: Overall behavior of synchronous 8259 PIC in
handling an edge-triggered interrupt
TABLE 3 shows the comparison of gate count, area, and
total dynamic power between the asynchronous and
synchronous designs of the 8259 PIC. With the switching activity
file generated by the same testbench, the power consumption
of the synchronous design is relatively higher, but it remains in the
acceptable “uW” range. This is because the clock
driver has to constantly provide a strong clock that reaches
all parts of the circuit, although clock gating can avoid
sending the clock signal to inactive blocks. The slight
increase in area is due to the increased number of
flip-flops, each of which is about twice as large as a
D latch. In addition, latches have less input capacitance
and consume less switching power than comparable flip-flops,
and their use can lead to substantial savings in power.
TABLE 3: COMPARISONS BETWEEN ASYNCHRONOUS AND
SYNCHRONOUS 8259 PICS
Parameters Asynchronous 8259
Sequential elements 88 119
Total Area 980 um2
Total Dynamic Power 0.0020mW 0.0067mW
V. CONCLUSION
The synchronous design of the 8259 PIC is presented and
benchmarked against the corresponding asynchronous design.
Simulation results show that its functionality is the same as that of the
legacy unit while resolving the identified problems and
hazards; it is also intended to be a modular and reusable
intellectual property (IP).
As for future work, the dynamic power consumption of the
synchronous design can be further reduced by applying a
clock gate control block on the iPCICLK before it goes to
ACKNOWLEDGMENT
The authors would like to thank Intel Malaysia for
providing the benchmark asynchronous design and the tools
that are used in the re-architecture process.
REFERENCES
[1] A. Tumeo, M. Branca, L. Camerini, M. Monchiero, G. Palermo, F.
Ferrandi, and D. Sciuto, “An interrupt controller for FPGA-based
multiprocessors,” in International Conference on Embedded Computer
Systems: Architectures, Modeling and Simulation, pp. 82–87, Samos,
[2] Intel Architecture Software Developer’s Manual Volume 1: Basic
[3] Intel 8259A Programmable Interrupt Controller (8259A/8259A-2), 1988.
[4] Intel® I/O Controller Hub 10 (ICH10) Family Datasheet, October 2008.
[5] 82C59A Priority Interrupt Controller: Application Note, April 1999.
[6] Y. Shi, B. H. Gwee, J. Chang, “Asynchronous DSP for low-power
energy-efficient embedded systems,” Microprocessors and
Microsystems, vol. 35, pp. 318–328, 2011.
[7] S. Churiwala, C. Kumar, S. Verma from Atrenta (I) Pvt. Ltd, “Exploring
the types of combinational loops,” in EETimes Asia, March 2010.
[8] E. Vittoz, C. Piguet, and W. Hammer, “Model of the logic gate,” in
Proc. J. d’Electronique EPF-L, Lausanne, 1977, pp. 455-467.
[9] C. Piguet, “Logic Synthesis of Race-Free Asynchronous CMOS
Circuits,” IEEE Journal of Solid-State Circuits, vol. 26, no. 3, March
1991, pp. 371.
[10] S. H. Unger, “Hazards, critical races, and metastability,” IEEE Trans.
Comput., vol. 44, pp. 754-768, 1995.
[11] P. P. Chu, “RTL Hardware Design Using VHDL: Coding for Efficiency,
Portability and Scalability,” John Wiley & Sons, Inc., Hoboken, NJ,
[12] P. Forshaw, R. Hahn, “Synchronous design: The right technique for
digital ASICs,” in Proc. The Third Annual IEEE ASIC Seminar and
Exhibit, pp. P6-1.1-P6-1.5, Rochester, New York, Sept. 1990.
[13] D. G. Messerschmitt, “Synchronization in digital system design,” IEEE
J. Select. Areas Commun., vol. 8, pp. 1404-1419, 1990.
[14] P. A. Beerel, R. O. Ozdag, M. Ferretti, “A Designer’s Guide to
Asynchronous VLSI,” in Cambridge University Press.
[15] G. Palumbo, F. Pappalardo, and S. Sannella, “Evaluation on power
reduction applying gated clock approaches,” in Proc. IEEE Int. Symp.
Circuits and Systems, vol. 4, pp. 85-88, 2002.
[16] S. Hauck, “Asynchronous design methodologies: An overview,” in Proc.
IEEE, vol. 83, no. 1, pp. 69-93, 1995.
[17] K. van Berkel, R. Burgess, J. Kessels, A. Peeters, M. Roncken, F.
Schalij, R. van de Wiel, “A single-rail re-implementation of a DCC error
detector using a generic standard-cell library,” in IEEE Computer
Society Press, Asynchronous Design Methodologies, pp. 72-79, 1995.
[18] J. Bhasker, R. Chadha, “Static Timing Analysis for Nanometer Designs:
A Practical Approach,” Springer, 1st Edition, April 2009.
[19] A. H. Farrahi, C. Chen, A. Srivastava, G. Tellez, M. Sarrafzadeh,
“Activity-driven clock design,” IEEE Trans. Comput.-Aided Des., vol.
20, pp. 705, 2001.