UNIT-III CASE STUDIES - FPGA & CPLD ARCHITECTURES & APPLICATIONS
Meant for M.Tech I semester VLSI students of SKUCET
Transcript

Dr.Y.Narasimmha Murthy Ph.D yayavaram@yahoo.com

UNIT III : CASE STUDIES [CPLD & FPGA ARCHITECTURE & APPLICATIONS]

INTRODUCTION

Field Programmable Gate Arrays (FPGAs) consist of an array of programmable logic blocks, including general logic, memory and multiplier blocks, surrounded by a programmable routing fabric that allows the blocks to be interconnected. The array is surrounded by programmable input/output blocks, labeled I/O in the figure, that connect the chip to the outside world. Here the term "programmable" indicates the ability to program a function into the chip after silicon fabrication is complete. This is made possible by the programming technology, a method of changing the behavior of the pre-fabricated chip in the "field," where system users create designs. The first programmable logic devices used very small fuses as the programming technology. Every FPGA depends on a programming technology to control the programmable switches that give FPGAs their programmability.

Programming Technologies

A number of programming technologies have been used for reconfigurable architectures. Each has different characteristics that significantly affect the programmable architecture. The well-known technologies are:
(i) SRAM-based programming technology
(ii) Flash programming technology (EEPROM)
(iii) Anti-fuse based programming technology

SRAM-Based Programming Technology

Static memory cells are the basic cells of SRAM-based FPGAs. Most commercial vendors, such as Xilinx, Lattice and Altera, use static memory (SRAM) based programming technology in their devices. These devices use static memory cells distributed throughout the FPGA to provide configurability. An example of such a memory cell is shown below. In an SRAM-based FPGA, SRAM cells are mainly used for the following purposes:
(i) To program the routing interconnect of the FPGA, which is generally steered by small multiplexers.
(ii) To program the Configurable Logic Blocks (CLBs) that are used to implement logic functions.

Most SRAM cells are used to set the select lines of the multiplexers that steer interconnect signals. The majority of the remaining SRAM cells store the data in the lookup tables (LUTs) that SRAM-based FPGAs typically use to implement logic functions. Historically, SRAM cells were also used to control the tri-state buffers and simple pass transistors that served as programmable interconnect.

SRAM-based programming technology has become the dominant approach for FPGAs because of its re-programmability and its use of standard CMOS process technology, which lets SRAM-based FPGAs benefit from the increased integration, higher speed and lower dynamic power consumption of new processes with smaller geometries.

There are, however, a number of drawbacks associated with SRAM-based programming technology. An SRAM cell requires six transistors, which makes this technology costly in terms of area compared to other programming technologies. Further, SRAM cells are volatile, so external devices are required to store the configuration data permanently. These external devices add to the cost and area overhead of SRAM-based FPGAs. Security of the data is also a problem. Since the configuration information must be loaded into the device at power-up, there is the possibility that the configuration information
could be intercepted and stolen for use in a competing system. Encryption techniques are used to overcome this problem.

The electrical properties of pass transistors are also not ideal. SRAM-based FPGAs typically rely on pass transistors to implement multiplexers. However, these are far from ideal switches, as they have significant on-resistance and present an appreciable capacitive load. As FPGAs migrate to smaller device geometries, these issues may be exacerbated.

Flash Programming Technology

An important alternative to SRAM-based programming technology is flash or EEPROM based programming technology. This technology injects charge onto a gate that "floats" above the transistor, the approach used in flash and EEPROM memory cells. These cells are non-volatile; they do not lose information when the device is powered down. With modern IC fabrication processes, it has become possible to use the floating-gate cells directly as switches. Flash memory cells, in particular, are now used because of their improved area efficiency. The widespread use of flash memory cells for non-volatile memory chips ensures that flash manufacturing processes will benefit from steady decreases in process geometries.

Flash-based programming technology offers several advantages: it is non-volatile, and it is more area efficient than SRAM-based programming technology. It also has disadvantages. Unlike SRAM-based devices, flash-based devices cannot be reprogrammed an unlimited number of times, and flash-based technology uses a non-standard CMOS process. The most important of its advantages is non-volatility.
This feature eliminates the need for the external resources required to store and load configuration data when SRAM-based programming technology is used. Additionally, a flash-based device can function immediately upon power-up instead of having to wait for configuration data to load. The flash approach is also more area efficient than SRAM-based technology, which requires up to six transistors to implement the programmable storage. The programming circuitry, such as the high- and low-voltage buffers needed to program the cell, contributes an area overhead not present in SRAM-based devices. However, this cost is relatively modest, as it is amortized across numerous programmable elements. In comparison to
anti-fuses, an alternative non-volatile programming technology, flash-based FPGAs are reconfigurable and can be programmed without being removed from a printed circuit board. The use of a floating gate to control the switching transistor adds design complexity, because care must be taken to ensure that the source-drain voltage remains low enough to prevent charge injection into the floating gate. Since newer processes require lower voltage levels, this issue may become less of a concern in the future.

One disadvantage of flash-based devices is that they cannot be reprogrammed an unlimited number of times. Charge buildup in the oxide eventually prevents a flash-based device from being properly erased and programmed. Devices such as the Actel ProASIC3 are rated for only 500 programming cycles. For most uses of FPGAs, this programming count is more than sufficient; in many cases FPGAs are programmed for only one use. Another significant disadvantage of flash devices is the need for a non-standard CMOS process. Also, like static memory-based technology, this programming technology suffers from relatively high resistance and capacitance due to the use of transistor-based switches.

One trend that has recently emerged is the use of flash storage in combination with SRAM programming technology. In devices from Altera, Xilinx and Lattice, on-chip flash memory provides non-volatile storage while SRAM cells still control the programmable elements in the design. This addresses the problems associated with the volatility of pure-SRAM approaches, such as the cost of additional storage devices or the possibility of configuration data interception, while maintaining the unlimited re-configurability of SRAM-based devices.
It is important to recognize that, since the programming technology is still based on SRAM cells, these devices are no different from pure-SRAM devices from an FPGA architecture standpoint. However, the incorporation of flash memory generally means that the processing technology will not be as advanced as that of pure-SRAM devices. Additionally, the devices incur more area overhead than pure-SRAM devices, since both flash and SRAM bits are required for every programmable element.

Anti-fuse Programming Technology

An alternative to SRAM and floating gate-based technologies is anti-fuse programming technology. This technology is based on structures that exhibit very high resistance under normal circumstances but can be programmably "blown" (in reality, connected) to create a low-resistance link.
An anti-fuse is a two-terminal device whose unprogrammed state presents a very high resistance between its terminals. When a high voltage (from 11 to 20 volts, depending on the type of anti-fuse) is applied across its terminals, the anti-fuse "blows" and creates a low-resistance link. This link is permanent. Anti-fuses in use today are built using either an Oxide-Nitride-Oxide (ONO) dielectric between N+ diffusion and polysilicon, or amorphous silicon between metal layers or between polysilicon and the first layer of metal.

Programming an anti-fuse requires extra circuitry to deliver the high programming voltage and a relatively high current of 5 mA or more. This is done through fairly sizable pass transistors that provide addressing to each anti-fuse. Anti-fuse technology is used in the FPGAs from Actel, QuickLogic and Crosspoint.

A major advantage of the anti-fuse is its small size, little more than the cross-section of two metal wires. But this advantage is reduced by the large size of the necessary programming transistors, which must handle large currents, and by the inclusion of isolation transistors that are sometimes needed to protect low-voltage transistors from the high programming voltages. A second major advantage of the anti-fuse is its relatively low series resistance. The on-resistance of the ONO anti-fuse is 300 to 500 ohms, while that of the amorphous silicon anti-fuse is 50 to 100 ohms. Additionally, the parasitic capacitance of an unprogrammed amorphous anti-fuse is significantly lower than for other programming technologies.

The limitations of this technology are that it does not use a standard CMOS process and that anti-fuse based devices cannot be reprogrammed.

The ideal technology would be re-programmable, non-volatile, and built in a standard CMOS process. It is clear that none of the above technologies satisfies all of these conditions.
However, SRAM-based programming technology is the most widely used. The main reason is its use of a standard CMOS process. For this reason it is expected that this technology will continue to dominate the other two programming technologies.
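The effect of switch resistance and capacitance on routing speed can be illustrated with a first-order Elmore delay estimate. The sketch below is not taken from the notes: it treats a routed connection as a chain of identical switch stages, uses the series resistances quoted in this unit (upper ends of the stated ranges), and assumes a single illustrative node capacitance (`C_NODE_FARADS`), so the numbers are only useful for comparing technologies against each other.

```python
def elmore_delay(n_switches, r_on_ohms, c_node_farads):
    """Elmore delay in seconds of a chain of identical switch/wire stages.

    The capacitance at node k is charged through the k switches upstream
    of it, so the total is R * C * (1 + 2 + ... + n).
    """
    return sum(k * r_on_ohms * c_node_farads
               for k in range(1, n_switches + 1))


# Series resistances quoted in this unit (upper ends of the stated ranges).
R_ON_OHMS = {
    "SRAM pass transistor": 1000.0,
    "ONO anti-fuse": 500.0,
    "amorphous-silicon anti-fuse": 100.0,
}

# ASSUMPTION: the same illustrative load at every node, not a datasheet value.
C_NODE_FARADS = 0.1e-12

for name, r_on in R_ON_OHMS.items():
    delay_ns = elmore_delay(4, r_on, C_NODE_FARADS) * 1e9
    print(f"{name}: {delay_ns:.2f} ns across 4 switches")
```

Because the delay grows with the square of the number of switches in series, the estimate also shows why the number of switches a signal traverses matters as much as the resistance of each one.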
Comparison of Programming Technologies

Programming   Re-Programmable   Volatile   Series       Capacitance   Storage
Technology                                 Resistance   in pF         Cell Area
Static RAM    In-circuit        Yes        1 KΩ         15            5X
Anti-Fuse     No                No         50-500 Ω     1.2 - 5.0     1X
EPROM         Outside circuit   No         2 KΩ         10            1X
EEPROM        In-circuit        No         2 KΩ         10            2X

XILINX XC3000 FPGA Device

Xilinx introduced the first FPGA family, the XC2000 series, in 1985 and later offered three more series of FPGAs: XC3000, XC4000 and XC5000. The first modern-era FPGA had 64 logic blocks and 58 inputs and outputs. The XC3000 series was introduced by Xilinx in 1987 and became the most successful family of FPGAs. The XC3000 architecture includes enhancements to the XC2000 architecture that improve performance, density and usability. The XC3000 architecture was developed with manual tools for design implementation, and the architecture also shows a bias towards manual design.

The XC3000 family covers a range of nominal device densities from 2,000 to 9,000 gates, with practically achievable densities from 1,000 to 6,000 gates and up to 144 user-definable I/Os. Device speeds, described in terms of maximum guaranteed toggle frequencies, range from 70 to 125 MHz.

The XC3000 Configurable Logic Block is substantially larger than that of the XC2000. Each of its two lookup tables has four inputs and requires 16 bits of configuration memory. The two lookup tables can be combined with a multiplexer to produce any function of five inputs and some functions of up to seven inputs. The XC3000 architecture allows faster logic implementation with a minimum number of CLBs in series.

There are now four distinct families within the XC3000 series of FPGA devices:
• XC3000A Family
• XC3000L Family
• XC3100A Family
• XC3100L Family
All four families share a common architecture, development software, design and programming methodology, and common package pin-outs.
• XC3000A Family: The XC3000A is an enhanced version of the basic XC3000 family, featuring additional interconnect resources and other user-friendly enhancements.
• XC3000L Family: The XC3000L is identical in architecture and features to the XC3000A family, but operates at a nominal supply voltage of 3.3 V. The XC3000L is the right solution for battery-operated and low-power applications.
• XC3100A Family: The XC3100A is a performance-optimized relative of the XC3000A family. While both families are bitstream and footprint compatible, the XC3100A family extends toggle rates to 370 MHz and in-system performance to over 80 MHz. The XC3100A family also offers one additional array size, the XC3195A.
• XC3100L Family: The XC3100L is identical in architecture and features to the XC3100A family, but operates at a nominal supply voltage of 3.3 V.

The basic LCA (Logic Cell Array) of the XC3000 consists of three components: Programmable I/O Blocks, Configurable Logic Blocks and Programmable Interconnect. In addition, a small amount of configuration memory is present.

Programmable I/O Block

Each user-configurable IOB, as shown below, provides an interface between an external package pin of the device and the internal user logic. Each IOB includes both registered and direct input paths. Each IOB provides a programmable 3-state output buffer, which may be driven by a registered or direct output signal. Configuration options give each IOB an inversion, a controlled slew rate and a high-impedance pull-up. Each input circuit also provides input clamping diodes for electrostatic protection, and circuits to inhibit latch-up produced by input currents.
Each IOB includes input and output storage elements and I/O options selected by configuration memory cells. A choice of two clocks is available on each die edge. The polarity of each clock line (not each flip-flop or latch) is programmable. A clock line that triggers the flip-flop on the rising edge is an active-Low Latch Enable (latch transparent) signal, and vice versa. Passive pull-up can only be enabled on inputs, not on outputs. All user inputs are programmed for TTL or CMOS thresholds.

Configurable Logic Block

Each CLB includes a combinatorial logic section, two flip-flops and a program-memory-controlled multiplexer selection of function. It has the following components:
• Five logic variable inputs A, B, C, D and E
• A direct data input DI
• An enable clock EC
• A clock (invertible) K
• An asynchronous direct RESET RD
• Two outputs X and Y
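The statement that two four-input lookup tables plus a multiplexer can produce any function of the five CLB inputs follows from Shannon expansion: f(A,B,C,D,E) = E'.f(A,B,C,D,0) + E.f(A,B,C,D,1). The following sketch illustrates the idea. The helper names and the choice of E as the multiplexer select are assumptions made for illustration, not the actual XC3000 configuration format.

```python
from itertools import product


def make_lut4(func):
    """Return the 16 configuration bits of a 4-input LUT for func(a, b, c, d)."""
    return [func(a, b, c, d) & 1 for a, b, c, d in product((0, 1), repeat=4)]


def read_lut4(bits, a, b, c, d):
    """Look up one bit; A is the most significant address bit."""
    return bits[(a << 3) | (b << 2) | (c << 1) | d]


def split_five_input(func5):
    """Shannon-expand func5 on E into two 4-input LUT contents."""
    lut_f = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))  # used when E = 0
    lut_g = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))  # used when E = 1
    return lut_f, lut_g


def clb_eval(lut_f, lut_g, a, b, c, d, e):
    """ASSUMPTION: E drives the select line of the multiplexer after the LUTs."""
    return read_lut4(lut_g if e else lut_f, a, b, c, d)


def majority5(a, b, c, d, e):
    """Example five-input function: 1 when at least three inputs are 1."""
    return int(a + b + c + d + e >= 3)


lut_f, lut_g = split_five_input(majority5)
print("F bits:", lut_f)
print("G bits:", lut_g)
```

Each table holds 16 bits, matching the 16 bits of configuration memory per lookup table mentioned for the XC3000, and the pair reproduces the five-input function for all 32 input combinations.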
[Figure: XC3000 CLB]

Each CLB has a combinatorial logic section, two flip-flops and an internal control section. The CLB has five logic inputs (A, B, C, D and E), a common clock input (K), an asynchronous direct RESET input (RD) and an enable clock (EC), as shown in the block diagram. Each CLB also has two outputs (X and Y), which may drive interconnect networks.

Data input for the flip-flops within a CLB is supplied from the function F or G outputs of the combinatorial logic, or from the block input DI. Both flip-flops in each CLB share the asynchronous RD which, when enabled, is dominant over clocked inputs. All flip-flops are reset by the active-Low chip input, RESET, or during the configuration process. The flip-flops share the enable clock (EC) which, when Low, recirculates the flip-flops' present states and inhibits response to the data-in or combinatorial function inputs on a CLB. The user may enable these control inputs and select their sources. The user may also select the clock net input (K), as well as its active sense within each CLB. This programmable inversion eliminates the need to route both phases of a clock signal throughout the device.

Programmable Interconnect

Programmable interconnection resources in the Field Programmable Gate Array provide routing paths to connect the inputs and outputs of the IOBs and CLBs into logic networks. Interconnections
between blocks are composed of a two-layer grid of metal segments. Specially designed pass transistors, each controlled by a configuration bit, form programmable interconnect points (PIPs) and switching matrices used to implement the necessary connections between selected metal segments and block pins. Three types of metal resources are provided to accommodate various network interconnect requirements:
• General Purpose Interconnect
• Direct Connection
• Long lines (multiplexed busses and wide AND gates)

[Figure: XC3000 Interconnect]

XILINX XC4000 FPGA Device

The XC4000 features a Configurable Logic Block (CLB) that is based on look-up tables (LUTs). A LUT is a small one-bit-wide memory array, where the address lines of the memory are the inputs of the logic block and the one-bit output of the memory is the LUT output. A LUT with K inputs therefore corresponds to a 2^K x 1 bit memory and can realize any logic function of its K inputs by programming the function's truth table directly into the memory. The XC4000 CLB contains three separate LUTs, in the configuration
as shown below. There are two 4-input LUTs that are fed by CLB inputs, and the third LUT can be used in combination with the other two. This arrangement allows the CLB to implement a wide range of logic functions of up to nine inputs, two separate functions of four inputs, or other possibilities. Each CLB also contains two flip-flops.

[Figure: Xilinx XC4000 Configurable Logic Block (CLB)]

To provide high-density devices that support the integration of entire systems, the XC4000 chips have "system oriented" features. For example, each CLB contains circuitry that allows it to perform arithmetic efficiently (i.e., a circuit that implements a fast carry operation for adder-like circuits), and the LUTs in a CLB can be configured as read/write RAM cells. A newer version of this family, the 4000E, has the additional feature that the RAM can be configured as a dual-port RAM with a single write port and two read ports. In the 4000E, RAM blocks can be synchronous RAM. Also, each XC4000 chip includes very wide AND-planes around the periphery of the logic block array to facilitate implementing circuit blocks such as wide decoders.
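The view of a K-input LUT as a 2^K x 1 bit memory can be made concrete with a small model. The sketch below is illustrative only: the `LUT` class is hypothetical, and it assumes that the third LUT takes the outputs of the two 4-input LUTs plus one extra CLB input, which is one way to reach a function of nine inputs (here, nine-input parity).

```python
from itertools import product


class LUT:
    """A K-input lookup table modelled as a 2**K x 1 bit memory."""

    def __init__(self, k, func):
        self.k = k
        # Programming the LUT means writing the truth table into the memory.
        self.mem = [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]

    def __call__(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        addr = 0
        for bit in inputs:  # the first input is the most significant address bit
            addr = (addr << 1) | (bit & 1)
        return self.mem[addr]


def parity(*bits):
    """1 when an odd number of the inputs are 1."""
    return sum(bits) % 2


lut_f = LUT(4, parity)  # first 4-input LUT, fed by CLB inputs
lut_g = LUT(4, parity)  # second 4-input LUT, fed by CLB inputs
lut_h = LUT(3, parity)  # ASSUMPTION: third LUT sees F, G and one extra input


def clb_parity9(f_inputs, g_inputs, h1):
    """Nine-input parity built from the three LUTs of one CLB."""
    return lut_h(lut_f(*f_inputs), lut_g(*g_inputs), h1)


print(len(lut_f.mem), "bits in a 4-input LUT;", len(lut_h.mem), "bits in a 3-input LUT")
```

Only functions that decompose this way fit in a single CLB, which is why the text speaks of a wide range of functions of up to nine inputs rather than every such function.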
The other important feature of this FPGA is its interconnect structure. The XC4000 interconnect is arranged in horizontal and vertical channels. Each channel contains some number of short wire segments that span a single CLB (the number of segments in each channel depends on the specific part number), longer segments that span two CLBs, and very long segments that span the entire length or width of the chip. Programmable switches are available to connect the inputs and outputs of the CLBs to the wire segments, or to connect one wire segment to another. The figure below shows only the wire segments in a horizontal channel; it does not show the vertical routing channels, the CLB inputs and outputs, or the routing switches.

The salient feature of the Xilinx interconnect is that signals must pass through switches to reach one CLB from another, and the total number of switches traversed depends on the particular set of wire segments used. Thus, the speed-performance of an implemented circuit depends in part on how the CAD tools allocate wire segments to individual signals.

Actel FPGAs

In contrast to Xilinx FPGAs, the devices manufactured by Actel are based on anti-fuse technology. Actel offers three main families: Act 1, Act 2 and Act 3. Actel devices are based on a structure similar to traditional gate arrays; the logic blocks are arranged in rows, and there are horizontal routing channels between adjacent rows. This architecture is shown in the figure below. The logic blocks in the Actel devices are relatively small in comparison to the LUT-based ones, and are based on multiplexers. The figure illustrates the logic block in the Act 3 and shows that it comprises an AND and an OR gate that are connected to a multiplexer-based
circuit block. The multiplexer circuit is arranged such that, in combination with the two logic gates, a very wide range of functions can be realized in a single logic block. About half of the logic blocks in an Act 3 device also contain a flip-flop.

[Figure: Actel FPGA structure]

Actel's interconnect is organized in horizontal routing channels. The channels consist of wire segments of various lengths, with anti-fuses to connect logic blocks to wire segments or one wire to another. Also, Actel chips have vertical wires that overlay the logic blocks, for signal paths that span multiple rows. In terms of speed-performance, it is evident that Actel chips are not fully predictable, because the number of anti-fuses traversed by a signal depends on how the wire
segments are allocated during circuit implementation by CAD tools. However, Actel provides a rich selection of wire segments of different lengths in each channel and has developed algorithms that guarantee strict limits on the number of anti-fuses traversed by any two-point connection in a circuit, which improves speed-performance significantly.

QuickLogic pASIC FPGAs

QuickLogic is the main competitor of Actel in anti-fuse based FPGAs. It produces two families of devices, called pASIC and pASIC-2. The pASIC-2 is an enhanced version of pASIC. The pASIC consists of a regular two-dimensional array of blocks called pASIC Logic Blocks (pLBs). The logic capacity of the first generation of QuickLogic FPGAs is between 48 and 380 pLBs, or 500 to 4000 equivalent MPGA gates. As shown in the figure below, pASIC has similarities to other FPGAs: the overall structure is array-based like Xilinx FPGAs, the logic blocks use multiplexers similar to Actel FPGAs, and the interconnect consists of only long lines, as in the Altera FLEX 8000. It is to be noted that the pASIC architecture is now independently developed by Cypress also.

[Figure: Structure of QuickLogic pASIC FPGA]

The ViaLink anti-fuse consists of a top layer of metal, an insulating layer of amorphous silicon, and a bottom layer of metal. Compared to Actel's PLICE anti-fuse, ViaLink offers a very low on-resistance of about 50 ohms (PLICE is about 300 ohms) and a low parasitic capacitance. The ViaLink anti-fuses are present at every crossing of logic block pins and interconnect wires, providing generous connectivity.

[Figure: QuickLogic (Cypress) Logic Cell]

pASIC's multiplexer-based logic block is shown in the above figure. It is more complex than Actel's Logic Module, with more inputs and wide (6-input) AND gates on the multiplexer select lines. Every logic block also contains a flip-flop.

Altera FLEX 8000 and FLEX 10000 FPGAs

The first FPGA chips from Altera were simple arrays of logic cells, which are relatively simple logic elements (LEs), each comprising a three-input look-up table (LUT) to generate logic functions, a single configurable flip-flop, and multiplexers for routing the signals and selecting clocks. The logic cells were connected by switch boxes instead of fixed interconnect. The general architecture of Altera's FPGAs is shown in the diagram below.
There are two high-performance FPGA series, called the FLEX series. Altera's FLEX 8000 series consists of a three-level hierarchy similar to that of CPLDs. However, the lowest level of the hierarchy consists of a set of lookup tables, rather than an SPLD-like block, and so the FLEX 8000 is categorized here as an FPGA. It should be noted, however, that FLEX 8000 is a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a four-input LUT as its basic logic block. Logic capacity ranges from about 4,000 gates to more than 15,000 gates for the 8000 series. The architecture of FLEX 8000 is shown in the figure below.

The basic logic block, called a Logic Element (LE), contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for arithmetic circuits (similar to the Xilinx XC4000). The LE also includes cascade circuitry that allows efficient implementation of wide AND functions. In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term borrowed from Altera's CPLDs). As shown in the figure below, each LAB contains local interconnect, and each local wire can connect any LE to any other LE within the same LAB.

[Figure: Architecture of Altera FLEX 8000 FPGAs]
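The role of the dedicated carry circuitry can be sketched with a ripple-carry adder in which each bit occupies one LE. This is an illustrative model, not Altera's actual circuit: it assumes the LUT produces the sum bit while the carry logic produces the carry-out that feeds the next LE, and it uses a width of 8 to match the eight LEs of one LAB.

```python
def le_add_bit(a, b, carry_in):
    """One Logic Element in arithmetic use (illustrative model).

    The LUT computes the sum bit; the carry circuitry computes the
    carry-out that is passed directly to the next LE in the chain.
    """
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def lab_add(x, y, width=8):
    """Add two unsigned integers using `width` chained LEs (one LAB when width=8)."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = le_add_bit((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


total, carry_out = lab_add(0b10110101, 0b01101110)
print(f"sum = {total:#010b}, carry out = {carry_out}")
```

Without a dedicated carry path, each carry would have to travel through the general interconnect and its programmable switches, which is the delay the special-purpose circuitry is meant to avoid.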
[Figure: Altera FLEX 8000 Logic Element (LE)]

Local interconnect also connects to the FLEX 8000's global interconnect, called FastTrack. FastTrack is similar to Xilinx long lines in that each FastTrack wire extends the full width or height of the device. However, a major difference between FLEX 8000 and Xilinx chips is that FastTrack consists of only long lines. This makes the FLEX 8000 easy for CAD tools to configure automatically. All FastTrack horizontal wires are identical, and so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many shorter segments, because there are fewer programmable switches in the longer paths. Predictability is further aided by the fact that connections between horizontal and vertical lines pass through active buffers.

The FLEX 8000 architecture has been extended in the state-of-the-art FLEX 10000 family. FLEX 10000 offers all of the features of FLEX 8000, with the addition of variable-sized blocks of SRAM, called Embedded Array Blocks (EABs). Each row in a FLEX 10000 chip has an EAB on one end. Each EAB is configurable to serve as an SRAM block with a variable aspect ratio: 256 x 8, 512 x 4, 1K x 2, or 2K x 1. In addition, an EAB can alternatively be configured to implement a complex logic circuit, such as a multiplier, by employing it as a large multi-output lookup table. Altera provides, as part of its CAD tools, several macrofunctions that implement useful logic circuits in EABs. Counting the EABs as logic gates, FLEX 10000 offers the highest logic capacity of any FPGA, although it is hard to provide an accurate number.
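Two points about the EAB can be checked with a short sketch: every listed aspect ratio uses the same 2,048 bits, and the 256 x 8 organisation is exactly large enough to act as a lookup table for a 4-bit by 4-bit multiplier (8 address bits in, 8 product bits out). The function names and the packing of the two operands into the address are assumptions made for illustration, not Altera's macrofunction interface.

```python
# (words, bits per word) for each EAB organisation listed in the text.
ASPECT_RATIOS = [(256, 8), (512, 4), (1024, 2), (2048, 1)]
EAB_BITS = 2048

# Every organisation is the same memory viewed with a different word width.
assert all(words * width == EAB_BITS for words, width in ASPECT_RATIOS)


def build_multiplier_table():
    """Contents of a 256 x 8 EAB used as a 4x4-bit multiplier LUT.

    ASSUMPTION: operand A occupies the upper four address bits and
    operand B the lower four.
    """
    return [(addr >> 4) * (addr & 0xF) for addr in range(256)]


def eab_multiply(table, a, b):
    """Multiply two 4-bit values with a single memory read."""
    return table[((a & 0xF) << 4) | (b & 0xF)]


table = build_multiplier_table()
print("13 * 11 =", eab_multiply(table, 13, 11))
print("largest stored product:", max(table))
```

The largest product, 15 x 15, fits in the 8-bit word, so no entry overflows; a wider multiplier would need several EABs or a different decomposition.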
Concurrent Logic FPGA Device

The manufacturer Concurrent Logic offers the CFA6006 FPGA device, which is based on a two-dimensional array of identical blocks, where each block is symmetrical on its four sides. The array holds 3136 such blocks, providing a total logic capacity of about 5000 equivalent gates. Connections are formed using multiplexers that are configured by a static RAM programming technology. The structure of the Concurrent logic block is shown in the diagram below. It comprises user-configurable multiplexers, basic gates and a D-type flip-flop. The Concurrent FPGA is especially suitable for register-intensive and arithmetic applications, since the logic block can easily implement a half-adder and a register bit. There are two direct connections, A and B, formed by routing signals through the multiplexers within the blocks. Long connections are implemented using a bussing network, in which wires of various lengths are superimposed on the array of logic blocks.

Crosspoint Solutions FPGAs

The Crosspoint FPGAs differ from other FPGAs because they are configurable at the transistor level, as opposed to the logic-block level in other FPGAs. Basically, the architecture consists of rows of transistor pairs, where the rows are separated by horizontal wiring segments. Vertical wiring segments are also available for connections among the rows.
Each transistor row comprises two lines of series-connected transistors, one line being NMOS and the other PMOS. The wiring resources allow individual transistor pairs to be interconnected to implement CMOS logic gates. The programming technology used for the programmable switches is similar to the ViaLink anti-fuse, which is based on amorphous silicon.

The structure of the transistor pair rows is shown in the diagram below. The diagram shows the implementation of a NOR gate and a NAND gate using the transistor lines. The transistor gates, drains and sources can be programmably interconnected to other transistors and also to power and ground. The series connection along a line is broken where necessary by permanently holding a transistor in its OFF state. A wide range of logic gates can be implemented by the transistor lines and the interconnection patterns.

The FPGAs currently offered by Crosspoint Solutions have a total logic capacity of 4200 gates. The chip has 256 rows of transistor pairs, and an additional 64 rows of multiplexer-like structures are provided. With their row-based architecture, anti-fuse programming technology and multiplexers, the Crosspoint FPGAs are most similar to Actel FPGAs.

ALGOTRONIX CAL-1024

This design has a two-dimensional mesh array structure which resembles the gate-array "sea of gates" architecture previously identified in Figure . Like the Xilinx architecture, Algotronix uses static RAM programming technology to specify the function performed by each logic cell
and to control the switching of connections between cells. The CAL1024 design contains 1024 identical logic cells arranged in a 32 x 32 matrix. The design is considered a mesh-connected architecture, since each cell is directly connected to its nearest north, south, east and west neighbors. In addition to these direct connections, two global interconnect signals are routed to each cell to distribute clock and other "low skew requirement" control signals. Figure 19 shows the basic array architecture, indicating both nearest-neighbor and global connections to the logic cells. In addition to these logical connections, row select lines and bit select lines, which are not shown in the figure, are connected to program each cell's SRAM bits.

[Figure: ALGOTRONIX Array Architecture]

The basic building block of the Algotronix design is a configurable cell containing multiplexers and a function unit. As indicated in the figure, the function unit is preceded by multiplexers
  • 21. which select the source for the X1 and X2 inputs. The function unit is capable of generating any logic function of the two inputs, or of operating as a D-type latch. Not shown in the figure are four additional multiplexers which select the function output or one of the external inputs for routing to each of the four outputs (north, south, east, and west). A unique feature of the Algotronix I/O pad design is its capability to provide simultaneous input and output on the same pin when communicating with another Algotronix chip. This is done through a 3-level (ternary) logic signaling scheme in which the I/O pads sense, via a contention scheme, whenever two outputs are driving each other. Even during contention, the pad can deduce the correct input value and pass it along to the internal circuitry. This makes it easier to partition a single design across multiple FPGAs, because the increased connectivity reduces the pin limitations on communication bandwidth.

AMD Mach:

AMD offers a CPLD family comprising five subfamilies called Mach. Each Mach device consists of multiple PAL-like blocks (or optimized PALs). Mach 1 and 2 consist of optimized 22V16 PALs, Mach 3 and 4 consist of several optimized 34V16 PALs, and Mach 5 is similar to Mach 3 and 4 but offers enhanced speed performance. All Mach chips use EEPROM technology, and together the five subfamilies provide a wide range of selection, from small, inexpensive chips to larger, state-of-the-art ones. We will focus on Mach 4 because it represents the most advanced currently available parts in the family.
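Returning to the Algotronix function unit described on this slide: a unit that can generate any logic function of two inputs can be pictured as a 4-entry truth table. The sketch below illustrates that idea only. The 4-bit encoding and the name `make_function_unit` are assumptions made for illustration, not the actual CAL-1024 configuration format, and the D-type latch mode is not modeled.

```python
def make_function_unit(truth_table):
    """Return f(x1, x2) for a 4-bit truth table (hypothetical encoding).

    Bit number (2*x1 + x2) of truth_table holds the output for that input pair.
    """
    if not 0 <= truth_table < 16:
        raise ValueError("a two-input function needs exactly 4 truth-table bits")

    def f(x1, x2):
        index = (x1 << 1) | x2          # select one of the four table entries
        return (truth_table >> index) & 1

    return f


# Three of the 16 possible two-input functions.
AND = make_function_unit(0b1000)   # output 1 only for x1=1, x2=1
XOR = make_function_unit(0b0110)   # output 1 when the inputs differ
NOR = make_function_unit(0b0001)   # output 1 only for x1=0, x2=0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, AND(x1, x2), XOR(x1, x2), NOR(x1, x2))
```

Four configuration bits give 2^4 = 16 distinct functions, which is every possible function of two inputs.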
  • 22. Figure (a) below depicts a Mach 4 chip, showing the multiple 34V16 PAL-like blocks and the interconnect, called the central switch matrix. The in-circuit programmable chips range in size from 6 to 16 PAL-like blocks, corresponding roughly to 2,000 to 5,000 equivalent gates. All connections between PAL-like blocks (even from a PAL-like block to itself) pass through the central switch matrix. Thus, the device is not merely a collection of PAL-like blocks but a single, large device. Since all connections travel through the same path, circuit timing delays are predictable. Figure (b) illustrates a Mach 4 PAL-like block. It has 16 outputs and a total of 34 inputs (16 of which are the fed-back outputs), so it corresponds to a 34V16 PAL. However, there are two key differences between this block and a normal PAL: 1) a product term (PT) allocator between the AND plane and the macrocells (each macrocell comprises an OR gate, an EXOR gate, and a flip-flop), and 2) an output switch matrix between the OR gates and the I/O pins. These features make a Mach 4 chip easier to use because they decouple sections of the PAL-like block. More specifically, the product term allocator distributes and shares product terms from the AND plane among the OR gates that require them, allowing much more flexibility than the fixed-size OR gates in regular PALs. The output switch matrix enables any macrocell output (OR gate or flip-flop) to drive any I/O pin connected to the PAL-like block, again providing greater flexibility than a PAL, in which each macrocell can drive only one specific I/O pin. Mach 4's combination of in-system programmability and high flexibility allows easy hardware design changes.
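The role of the product term allocator can be illustrated with a small behavioral sketch: product terms are computed once in the AND plane, and an allocation table decides which terms feed each macrocell's OR gate. All names here (`eval_product_term`, `eval_block`, `mc0`, and so on) and the data layout are hypothetical and chosen for illustration. The sketch does not reproduce the actual Mach 4 allocator rules, and it omits the EXOR gate and flip-flop.

```python
def eval_product_term(term, inputs):
    """AND of literals; term maps a signal name to the value it must have."""
    return all(inputs[name] == want for name, want in term.items())


def eval_block(product_terms, allocation, inputs):
    """Evaluate each macrocell's OR gate over the product terms steered to it.

    allocation maps a macrocell name to a list of product-term indices.
    """
    pt_values = [eval_product_term(t, inputs) for t in product_terms]
    return {mc: any(pt_values[i] for i in idxs) for mc, idxs in allocation.items()}


# AND plane: three product terms over inputs a and b.
product_terms = [
    {"a": True, "b": False},   # a AND NOT b
    {"a": False, "b": True},   # NOT a AND b
    {"a": True, "b": True},    # a AND b
]

# Allocation: macrocells receive different numbers of terms, and terms are shared.
allocation = {
    "mc0": [0, 1],       # a XOR b
    "mc1": [2],          # a AND b
    "mc2": [0, 1, 2],    # a OR b
}

print(eval_block(product_terms, allocation, {"a": True, "b": False}))
```

In a fixed-size PAL each OR gate would own a fixed set of product terms. Here `mc2` takes three terms while `mc1` takes one, which is the flexibility the allocator is meant to provide.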
  • 23. AMD Mach 4 structure

FPGA Design Flow:
  • 24. Early PLD and FPGA designs were performed largely by hand, but today's complex programmable logic devices require the use of an integrated Computer-Aided Design (CAD) system. Both commercial CAD tool vendors and FPGA companies offer appropriate tools. For example, traditional Electronic Design Automation (EDA) vendors such as Cadence, Mentor Graphics, Synopsys, and Viewlogic offer tools to support FPGA design. These tools are typically used for front-end design entry and simulation, and they provide the necessary interfaces to vendor-specific back-end tools for chip placement and routing. Examples of vendor-specific tools are the Xilinx XACT system and the Altera MAX+PLUS II software. Altera's MAX+PLUS II software supports the entire design flow on either PC or workstation platforms.

The first step in the design process is the description of the logic circuit, which can be done either with a schematic capture tool (Cadence, Viewlogic, OrCAD, etc.) or with Boolean expressions. This is followed by a translation that converts the original circuit description into a standard format used by the target CAD tools (e.g., the Xilinx CAD tools). The circuit is then passed through CAD programs that partition it into appropriate logic blocks, select a specific location in the FPGA for each logic block, and form the required interconnections. The performance of the implemented circuit can then be checked and its functionality verified. Finally, a bitmap is generated and downloaded serially to configure the FPGA.

Initial Design Entry: The detailed description of the logic circuit is entered using a schematic capture program. In the design entry phase, RTL or schematic entry is used to create the logic to be implemented in the device. Pin assignments can also be made, including pin placement information, along with any timing constraints that might be necessary for building a functioning design.
In the design entry step, a schematic or Block Design File (.bdf) is created as the top-level design. Library of Parameterized Modules (LPM) functions are added, and Verilog HDL code is used to add a logic block. The library may be supplied either by the vendor of the schematic capture program or by an FPGA vendor (such as Xilinx or Altera). An alternative way to specify the logic circuit is to use Boolean expressions or a state machine language, which is done without the graphical interface. Sometimes it is possible to use a mixture of schematics and Boolean expressions.
  • 25. Translation to XNF Format: After the logic circuit has been successfully designed and merged into one circuit, it is translated into a special format that is understood by the CAD tools. For Xilinx, this format is called the Xilinx Netlist Format, or XNF. The translation utility is supplied either by Xilinx or by the vendor of the logic entry tool. The translation process may also involve automatic optimization of the circuit.

Partition: The XNF circuit is partitioned into logic cells (this step is also known as technology mapping). Technology mapping converts the XNF circuit, which is a netlist of basic logic gates, into a netlist of Xilinx logic cells. The logic cell used depends on which Xilinx product the circuit is to be implemented in. The XACT tools also attempt to optimize the circuit during this step. For example, circuitry associated with unused logic block inputs or outputs is eliminated from the design. In addition, the partitioning program attempts to minimize either the total number of logic cells (CLBs) required or the number of stages of logic cells in the time-critical paths.

Place and Route: This step is performed by the CAD tools, manually by the user, or by a mixture of the two. The first step is placement, in which each logic cell generated during the partition step is assigned to a specific location in the FPGA. Automatic placement can be done using the simulated annealing algorithm. After placement, the required interconnections among the logic cells must be realized by selecting wire segments and routing switches within the FPGA's interconnection resources. An automatic routing algorithm, based on the maze routing algorithm, is used for this task.
Generally, placement and routing are done automatically, but sometimes they are done manually by the user. With the physical placement and routing completed, exact timing values can now be used to determine chip performance. The XACT tools provide a critical path
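The maze routing idea mentioned above can be sketched as a breadth-first wave expansion on a grid (the classic Lee-style approach). This is a simplified illustration under stated assumptions: routing resources are modeled as grid cells that are either free (0) or already occupied (1), whereas a real FPGA router works on the device's actual wire segments and routing switches. The function name `maze_route` and the grid model are hypothetical and do not describe the XACT router.

```python
from collections import deque


def maze_route(grid, source, target):
    """Return a shortest path of (row, col) cells from source to target, or None.

    grid[r][c] == 1 marks a cell that is blocked (already used by another net).
    """
    rows, cols = len(grid), len(grid[0])
    prev = {source: None}              # visited cells and the cell each was reached from
    frontier = deque([source])
    while frontier:
        cell = frontier.popleft()
        if cell == target:
            # Backtrace from the target to the source.
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None                        # no free path exists


grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],   # a row of occupied cells forces a detour
    [0, 0, 0, 0],
]
path = maze_route(grid, (0, 0), (2, 0))
print(path)
```

Because the wave expands one step at a time, the first time the target is reached the path found is a shortest one over the free cells. After a net is routed, its cells would be marked as occupied before routing the next net.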
  • 26. timing analyzer which provides delay information for the paths through the chip, from the longest to the shortest. In addition, the physical layout timing information can be back-annotated to the schematics to obtain more accurate simulation results. The final step in the Xilinx design flow is the creation of the BIT file, which contains the binary programming data needed to configure the SRAM bits of the target chip. This file is then downloaded to configure the chip for final functional and timing tests of the programmed chip.

After the design has been created, it must be compiled. Compilation converts the design into a bitstream that can be downloaded into the FPGA. The most important output of compilation is an SRAM Object File (.sof), which is used to program the device. The software also generates other report files that provide information about the code as it compiles.

Simulation is a very important part of the design flow, and there are entire applications devoted to simulating hardware designs. There are two types of simulation, RTL and timing. RTL (or functional) simulation allows you to verify that your code is functionally correct. Timing (or post-place-and-route) simulation verifies that the design meets timing and functions appropriately in the device. After completion of the design, its performance is checked either by downloading the configuration bits into the FPGA or by using an interface to a timing simulation program. If the performance is not satisfactory, suitable modifications are made at the appropriate point in the design flow. Once the timing and functionality are verified, the implementation is complete.

References:
1. Field-Programmable Gate Arrays, S. D. Brown, R. J. Francis, et al.
2. FPGA and CPLD Architectures: A Tutorial, Stephen Brown and Jonathan Rose.
3. FPGA Architecture: Survey and Challenges, Ian Kuon, Russell Tessier, and Jonathan Rose.