Full report on memory map selection of a real-time SDRAM controller using Verilog, which was my project. If you want any help, email me at rahulverma2512@gmail.com.


Project Report on
Memory Map Selection of a Real-Time SDRAM Controller Using Verilog
By Rahul Verma (9015694258)
TABLE OF CONTENTS

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
CHAPTER 1 (INTRODUCTION)
    1.1 LITERATURE SURVEY
    1.2 GOAL OF THE PROJECT
CHAPTER 2 (BACKGROUND)
    2.1 RANDOM ACCESS MEMORY
    2.2 STATIC RANDOM ACCESS MEMORY
    2.3 DYNAMIC RANDOM ACCESS MEMORY
    2.4 DEVELOPMENT OF DRAM
        2.4.1 DRAM
        2.4.2 SYNCHRONOUS DRAM
        2.4.3 DDR1 SDRAM
        2.4.4 DDR2 SDRAM
        2.4.5 DDR3 SDRAM
    2.5 TIMELINE
CHAPTER 3 (METHODOLOGY)
    3.1 HARDWARE
        3.1.1 VIRTEX-6 FPGA
        3.1.2 ML605 BOARD
    3.2 TOOLS
        3.2.1 XILINX INTEGRATED SOFTWARE ENVIRONMENT (ISE)
        3.2.2 SYNTHESIS AND SIMULATION
        3.2.3 IMPLEMENTATION AND HARDWARE VALIDATION
        3.2.4 ANALYSIS OF TURN-AROUND TIMES
        3.2.5 XILINX CORE GENERATOR
CHAPTER 4 (ARCHITECTURE)
    4.1 CONTROL INTERFACE MODULE
    4.2 COMMAND MODULE
    4.3 DATA PATH MODULE
CHAPTER 5 (OPERATION)
    5.1 SDRAM OVERVIEW
    5.2 FUNCTIONAL DESCRIPTION
    5.3 SDRAM CONTROLLER COMMAND INTERFACE
        5.3.1 NOP COMMAND
        5.3.2 READA COMMAND
        5.3.3 WRITEA COMMAND
        5.3.4 REFRESH COMMAND
        5.3.5 PRECHARGE COMMAND
        5.3.6 LOAD_MODE COMMAND
        5.3.7 LOAD_REG1 COMMAND
        5.3.8 LOAD_REG2 COMMAND
CHAPTER 6 (ELEMENTS OF MEMORY BANK)
    6.1 DECODER
        6.1.1 A 2-TO-4 SINGLE-BIT DECODER
    6.2 DEMUX
    6.3 RAM
        6.3.1 TYPES OF RAM
    6.4 MUX
    6.5 BUFFER
        6.5.1 VOLTAGE BUFFER
        6.5.2 CURRENT BUFFER
    6.6 MEMORY BANK
CHAPTER 7 (RESULT AND CONCLUSIONS)
    7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON
        7.1.1 PROJECT
        7.1.2 DEVICE
        7.1.3 ENVIRONMENT
        7.1.4 DEFAULT ACTIVITY
        7.1.5 ON-CHIP POWER SUMMARY
        7.1.6 THERMAL SUMMARY
        7.1.7 POWER SUPPLY SUMMARY
        7.1.8 CONFIDENCE LEVEL
        7.1.9 BY HIERARCHY
    7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE
        7.2.1 PROJECT
        7.2.2 DEVICE
        7.2.3 ENVIRONMENT
        7.2.4 DEFAULT ACTIVITY RATES
        7.2.5 ON-CHIP POWER SUMMARY
        7.2.6 THERMAL SUMMARY
        7.2.7 POWER SUPPLY SUMMARY
        7.2.8 CONFIDENCE LEVEL
        7.2.9 BY HIERARCHY
    7.3 CONCLUSION
CHAPTER 8 (FUTURE SCOPE)
REFERENCES
LIST OF FIGURES

Figure 2.1 DRAM Row Access Latency vs. Year
Figure 2.2 DRAM Column Address Time vs. Year
Figure 3.1 Screenshot of ISE Project Navigator
Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation
Figure 3.3 ISim Screen Shot
Figure 3.4 ChipScope Screen Shot
Figure 4.0 Architecture of SDRAM Controller
Figure 4.1 Control Interface Module
Figure 4.2 Command Module Block Diagram
Figure 4.3 Data Path Module
Figure 5.0 SDR SDRAM Controller System-Level Diagram
Figure 5.1 Timing diagram for a READA command
Figure 5.2 Timing diagram for a WRITEA command
Figure 5.3 Timing diagram for a REFRESH command
Figure 5.4 Timing diagram for a PRECHARGE command
Figure 5.5 Timing diagram for a PRECHARGE command
Figure 6.1 RTL of Decoder
Figure 6.2 Simulation of Decoder
Figure 6.3 RTL of DEMUX
Figure 6.4 Simulation of DEMUX
Figure 6.5 RTL of RAM
Figure 6.6 Simulation of RAM
Figure 6.7 RTL of MUX
Figure 6.8 Simulation of MUX
Figure 6.9 RTL of Buffer
Figure 6.10 Simulation of Buffer
Figure 6.11 RTL of Memory Bank
Figure 6.12 Simulation of Memory Bank
LIST OF TABLES

Table 5.1 SDRAM Bus Commands
Table 5.2 Interface Signals
Table 5.3 Interface Commands
Table 5.4 REG1 Bit Definitions
Table 7.1 Project
Table 7.2 Device
Table 7.3 Environment
Table 7.4 Default Activity
Table 7.5 On-Chip Power Summary
Table 7.6 Thermal Summary
Table 7.7 Power Supply Summary
Table 7.8 Power Supply Current
Table 7.9 Confidence Level
Table 7.10 By Hierarchy
Table 7.11 Project
Table 7.12 Device
Table 7.13 Environment
Table 7.14 Default Activity
Table 7.15 On-Chip Power Summary
Table 7.16 Thermal Summary
Table 7.17 Power Supply Summary
Table 7.18 Power Supply Current
Table 7.19 Confidence Level
Table 7.20 By Hierarchy
LIST OF ABBREVIATIONS

A/D     Analog to Digital
CAS     Column Address Strobe
CLB     Configurable Logic Block
DRAM    Dynamic Random-Access Memory
FPGA    Field-Programmable Gate Array
ISE     Integrated Software Environment
I/O     Input/Output
LUT     Look-Up Table
NCD     Native Circuit Description
RAM     Random Access Memory
RAS     Row Address Strobe
ROM     Read-Only Memory
SDRAM   Synchronous Dynamic Random-Access Memory
SRAM    Static Random-Access Memory
XST     Xilinx Synthesis Technology
CHAPTER 1
INTRODUCTION

Embedded applications with real-time requirements are mapped to heterogeneous multiprocessor systems. The computational demands placed upon these systems are continuously increasing, while power and area budgets limit the amount of resources that can be expended. To reduce costs, applications are often forced to share hardware resources. Functional correctness of a real-time application is guaranteed only if its timing requirements are considered throughout the entire system; when the requirements are not met, the result may be an unacceptable loss of functionality or severe quality degradation. We focus on the real-time properties of the (off-chip) memory.

SDRAM is a commonly used memory type because it provides a large amount of storage space at low cost per bit. It comprises a hierarchical structure of banks and rows that have to be opened and closed explicitly by the memory controller, where only one row in each bank can be open at a time. Requests to the open row are served at low latency, while a request to a different row incurs a high latency, since it requires closing the open row and subsequently opening the requested row. Locality thus strongly influences the performance of the memory subsystem.

The worst-case (minimum) bandwidth and worst-case (maximum) latency are determined by the way requests are mapped to the memory. The worst-case latency can be optimized by accessing the memory at a small granularity (i.e., a few words), such that individual requests take a small amount of time to complete. This allows fine-grained sharing of the memory resource, at the expense of efficiency, since the overhead of opening and closing rows is amortized over only a small number of bits. Latency-sensitive requests, such as cache misses, favor this configuration. Conversely, to optimize for bandwidth, the memory has to be used as efficiently as possible, which requires memory maps that use a large access granularity.
Existing memory controllers offer only limited configurability of the memory map and are unable to balance this trade-off based on the application requirements. A memory controller must take the latency and bandwidth requirements of all of its applications into account, while staying within the given power budget. This requires an understanding of the effect that different memory maps have on the attainable worst-case bandwidth, latency and power.

1.1 LITERATURE SURVEY

Synchronous DRAM (SDRAM) has become a mainstream memory of choice in embedded system memory design due to its speed, burst access and pipeline features. For high-end applications using processors such as the Motorola MPC8260 or Intel StrongARM, the interface to the SDRAM is supported by the processor's built-in peripheral module. However, for other applications, the system designer must design a controller to provide the proper commands for SDRAM initialization, read/write accesses and memory refresh. In some cases, SDRAM is chosen because the previous generations of DRAM (FP and EDO) are either end-of-life or not recommended for new designs by the memory vendors. From the board design point of view, designing with earlier generations of DRAM is much easier and more straightforward than with SDRAM, unless the system bus master provides an SDRAM interface module as mentioned above. This SDRAM controller reference design, located between the SDRAM and the bus master, reduces the user's effort to deal with the SDRAM command interface by providing a simple, generic system interface to the bus master.

In today's SDRAM market, there are two major types of SDRAM, distinguished by their data transfer rates. The most common, single data rate (SDR) SDRAM, transfers data on the rising edge of the clock. The other is double data rate (DDR) SDRAM, which transfers data on both the rising and falling edges to double the data transfer throughput. Other than the data transfer scheme and the different power-on initialization and mode register definitions, these two SDRAMs share the same command set and basic design concepts. This reference design targets SDR SDRAM; however, due to the similarity of SDR and DDR SDRAM, it can also be modified into a DDR SDRAM controller.
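The shared command set mentioned above is encoded on the cs_n, ras_n, cas_n and we_n pins. As an illustrative sketch (these are the standard JEDEC truth-table values; the module and signal names here are assumptions, not taken from this report's source code), a controller's command output stage might look like:

```verilog
// Sketch of a command output stage using the standard JEDEC SDR SDRAM
// command truth table on {cs_n, ras_n, cas_n, we_n}.
module sdram_cmd_drive (
    input  [3:0] cmd,                 // selected by the controller state machine
    output       cs_n, ras_n, cas_n, we_n
);
    // JEDEC command encodings, listed {cs_n, ras_n, cas_n, we_n}:
    localparam CMD_LOAD_MODE = 4'b0000;  // load mode register
    localparam CMD_REFRESH   = 4'b0001;  // auto refresh
    localparam CMD_PRECHARGE = 4'b0010;  // precharge (A10 high = all banks)
    localparam CMD_ACTIVE    = 4'b0011;  // open (activate) a row in a bank
    localparam CMD_WRITE     = 4'b0100;  // column write (A10 high = auto-precharge)
    localparam CMD_READ      = 4'b0101;  // column read  (A10 high = auto-precharge)
    localparam CMD_NOP       = 4'b0111;  // no operation (device selected)

    assign {cs_n, ras_n, cas_n, we_n} = cmd;
endmodule
```

The same encodings apply to SDR and DDR devices, which is why the text notes that this design could be adapted into a DDR controller.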
For illustration purposes, the Micron SDR SDRAM MT48LC32M4A2 (8 Meg x 4 x 4 banks) is chosen for this design. Also, this design has been verified using Micron's simulation model. It is highly recommended to download the simulation model from the SDRAM vendor for timing simulation whenever any modifications are made to this design.

Several SDRAM controllers focusing on real-time applications have been proposed, all trying to maximize worst-case performance. One uses a static command schedule computed at design time; full knowledge of the application behavior is thus required, making it unable to deal with dynamism in the request streams. Another dynamically schedules pre-computed sequences of SDRAM commands according to a fixed set of scheduling rules. A third proposal follows a similar approach: it dynamically schedules commands at run time according to a set of rules from which an upper bound on the latency of a request is determined, and it uses a memory map that always interleaves requests over all banks in the SDRAM, which sets a high lower bound on the smallest request size that can be supported efficiently. One design supports multiple bursts to each bank in an access to increase the guaranteed bandwidth for large requests; another allows only single-burst accesses to all banks in a fixed sequential manner, although multiple banks can be clustered to create a single logical resource. None of the mentioned controllers take power into account, despite it being an increasingly important design constraint.

1.2 GOAL OF THE PROJECT

1) We explore the full memory map design space by allowing requests to be interleaved over a variable number of banks. This reduces the minimum access granularity and can thus be beneficial for applications with small requests or tight latency constraints.
2) We propose a configuration methodology that is aware of the real-time and power constraints, such that an optimal memory map can be selected.
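The first goal can be pictured as a choice between address maps. The module below is a hypothetical sketch (signal names and bit widths are assumptions, not the report's actual design): map A keeps a request inside a single bank, while map B places the bank bits just above the burst offset so consecutive bursts rotate over all four banks, enlarging the access granularity.

```verilog
// Hypothetical sketch of two memory maps for a 4-bank SDRAM with a 32-bit
// bus and burst length 8. Access granularity = banks * burst * bus width:
//   map A (1 bank)  : 1 * 8 * 4 bytes =  32 bytes (fine-grained, low latency)
//   map B (4 banks) : 4 * 8 * 4 bytes = 128 bytes (coarse, high efficiency)
module mem_map_demo (
    input  [23:0] addr,                  // 12-bit row + 2-bit bank + 10-bit col
    output [11:0] row_a, row_b,
    output [1:0]  bank_a, bank_b,
    output [9:0]  col_a, col_b
);
    // Map A - no interleaving: addr = {row, bank, col}
    assign {row_a, bank_a, col_a} = addr;

    // Map B - interleave over all 4 banks: the bank bits sit just above the
    // 3-bit burst offset, so each successive burst lands in the next bank.
    assign row_b  = addr[23:12];
    assign bank_b = addr[4:3];
    assign col_b  = {addr[11:5], addr[2:0]};
endmodule
```

Allowing the number of interleaved banks to vary between these two extremes is what opens up the full design space referred to above.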
CHAPTER 2
BACKGROUND

There are two different types of random access memory: static and dynamic. Static random access memory (SRAM) is used for high-speed, low-power applications, while dynamic random access memory (DRAM) is used for its low cost and high density. Designers have been working to make DRAM faster and more energy efficient. The following sections discuss the differences between these two types of RAM, as well as the progression of DRAM towards a faster, more energy-efficient design.

2.1 RANDOM ACCESS MEMORY

Today, the most common type of memory used in digital systems is random access memory (RAM). The time it takes to access RAM is not affected by the data's location in memory. RAM is volatile, meaning that if power is removed, the stored data is lost. As a result, RAM cannot be used for permanent storage. However, RAM is used during runtime to quickly store and retrieve data that is being operated on by a computer. In contrast, nonvolatile memory, such as a hard disk, can be used for storing data even when not powered on. Unfortunately, it takes much longer for the computer to store and access data in this kind of memory. There are two types of RAM: static and dynamic. The following sections discuss the differences between the two types and the evolution of DRAM.

2.2 STATIC RANDOM ACCESS MEMORY

Static random access memory (SRAM) stores data as long as power is being supplied to the chip.
Each SRAM memory cell stores one bit of data using six transistors: a four-transistor flip-flop plus two access transistors. SRAM is the faster of the two types of RAM because it does not involve capacitors, which require sense amplification of a small charge. For this reason, it is used in the cache memory of computers. Additionally, SRAM requires a very small amount of power to maintain its data in standby mode. Although SRAM is fast and energy efficient, it is also expensive due to the amount of silicon needed for its large cell size. This created the need for a denser memory cell, which brought about DRAM.

2.3 DYNAMIC RANDOM ACCESS MEMORY

According to Wakerly, "in order to build RAMs with higher density (more bits per chip), chip designers invented memory cells that use as little as one transistor per bit." Each DRAM cell consists of one transistor and a capacitor. Since capacitors "leak," or lose charge over time, DRAM must have a refresh cycle to prevent data loss. According to a high-performance DRAM study on earlier versions of DRAM, DRAM's refresh cycle is one reason DRAM is slower than SRAM. The cells of DRAM use sense amplifiers to transmit data to the output buffer in the case of a read, and to transmit data back to the memory cell in the case of a refresh. During a refresh cycle, the sense amplifier reads the degraded value on a capacitor into a D-latch and writes the same value back to the capacitor so that it is correctly charged for a 1 or a 0. Since all rows of memory must be refreshed, and the sense amplifier must determine the value of an already small, degraded charge, refresh takes a significant amount of time. A full refresh typically must occur about every 64 milliseconds; in the latest DRAM (DDR3), the interval between individual refresh operations is on the order of a microsecond.
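This refresh requirement is what a controller's refresh logic has to satisfy. As a hedged sketch (the row count and clock frequency are assumptions, not values from this report): with 8,192 rows to refresh every 64 ms, one refresh command is due roughly every 7.8 µs, i.e. about every 781 cycles at 100 MHz.

```verilog
// Hypothetical refresh-request generator. Assumes a 100 MHz controller
// clock and 8,192 rows refreshed within 64 ms (64 ms / 8192 = 7.8 us).
module refresh_timer (
    input      clk,
    input      rst,
    input      refresh_ack,            // controller has issued AUTO REFRESH
    output reg refresh_req
);
    localparam REFRESH_CYCLES = 781;   // 7.8125 us at 100 MHz, rounded down
    reg [9:0] count;

    always @(posedge clk) begin
        if (rst) begin
            count       <= 0;
            refresh_req <= 0;
        end else if (count == REFRESH_CYCLES - 1) begin
            count       <= 0;
            refresh_req <= 1;          // request an AUTO REFRESH command
        end else begin
            count <= count + 1;
            if (refresh_ack)
                refresh_req <= 0;      // cleared once the command is issued
        end
    end
endmodule
```

Rounding the interval down keeps the design on the safe side of the 64 ms requirement.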
Although refresh increases memory access time, according to a high-performance DRAM study on earlier versions of DRAM, the greatest amount of time is lost during row addressing, more specifically in "[extracting] the required data from the sense amps/row caches." During addressing, the memory controller first strobes the row address (RAS) onto the address bus. Once the RAS is sent, a sense amplifier (one for each cell in the row) determines whether a charge indicating a 1 or a 0 is loaded into each capacitor.
This step is long because "the sense amplifier has to read a very weak charge" and "the row is formed by the gates of memory cells." The controller then chooses a cell in the row from which to read by strobing the column address (CAS) onto the address bus. A write requires the enable signal to be asserted at the same time as the CAS, while a read requires the enable signal to be de-asserted. The time it takes the data to move onto the bus after the CAS is called the CAS latency.

Although recent generations of DRAM are still slower than SRAM, DRAM is used when a larger amount of memory is required, since it is less expensive. For example, in embedded systems, a small block of SRAM is used for the critical data path, and a large block of DRAM is used to satisfy all other needs. The following section discusses the development of DRAM into a faster, more energy-efficient memory.

2.4 DEVELOPMENT OF DRAM

Many factors are considered in the development of high-performance RAM. Ideally, the developer would always like memory to transfer more data and respond in less time; memory would have higher bandwidth and lower latency. However, improving one factor often involves sacrificing the other. Bandwidth is the amount of data transferred per second; it depends on the width of the data bus and the frequency at which data is being transferred. Latency is the time between when the address strobe is sent to memory and when the data is placed on the data bus. DRAM is slower than SRAM because it must periodically run a refresh cycle and because it takes a much longer time to extract data onto the memory bus. Advancements have been made, however, to several different aspects of DRAM to increase bandwidth and decrease latency. Over time, DRAM has evolved to become faster and more energy efficient by decreasing in cell size and increasing in capacity. In the following sections, we will look at different types of DRAM and how DDR3 memory has come to be.
2.4.1 DRAM

One of the reasons the original DRAM was very slow is its extensive addressing overhead. In the original DRAM, an address was required for every 64-bit access to memory, and each access took six clock cycles. For four 64-bit accesses to consecutive addresses in memory, the notation for the timing was 6-6-6-6: dashes separate memory accesses, and the numbers indicate how long each access takes. This DRAM timing example took 24 cycles to access the memory four times. In contrast, more recent DRAM implements burst technology, which can send many 64-bit words to consecutive addresses. While the first access still takes six clock cycles due to the addressing, each of the next three adjacent addresses can be accessed in as little as one clock cycle, since the addressing does not need to be repeated. During burst mode, the timing would be 6-1-1-1, a total of nine clock cycles.

The original DRAM is also slower than its descendants because it is asynchronous: there is no memory bus clock to synchronize the input and output signals of the memory chip. The timing specifications are not based on a clock edge, but rather on maximum and minimum timing values (in seconds). The user would need to design a state machine with idle states, which may be inconsistent when running the memory at different frequencies.

2.4.2 Synchronous DRAM

In order to decrease latency, SDRAM utilizes a memory bus clock to synchronize signals to and from the system and memory. Synchronization ensures that the memory controller does not need to follow strict timing; it simplifies the implemented logic and reduces memory access latency. With a synchronous bus, data is available at each clock cycle. SDRAM divides memory into two to four banks for concurrent access to different parts of memory. Simultaneous access allows continuous data flow by ensuring there will always be a memory bank ready for access.
The addition of banks adds another segment to the addressing, resulting in a bank, row and column address.
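This bank/row/column addressing is what lets a controller shortcut repeated accesses: if an incoming request targets the row already open in its bank, only a column strobe is needed. A minimal sketch of that check (module and signal names are assumed for illustration, not taken from this design):

```verilog
// Hypothetical open-row tracker for one bank. A row hit means READ/WRITE
// can be issued immediately; a miss requires PRECHARGE then ACTIVE first.
module open_row_tracker #(parameter ROW_BITS = 12) (
    input                 clk,
    input                 rst,
    input                 activate,    // controller issues ACTIVE this cycle
    input                 precharge,   // controller closes the bank this cycle
    input  [ROW_BITS-1:0] req_row,     // row of the incoming request
    output                row_hit
);
    reg [ROW_BITS-1:0] open_row;      // row currently open in this bank
    reg                row_valid;     // a row is open

    assign row_hit = row_valid && (req_row == open_row);

    always @(posedge clk) begin
        if (rst || precharge)
            row_valid <= 1'b0;        // bank closed, no open row
        else if (activate) begin
            open_row  <= req_row;     // ACTIVE opens the requested row
            row_valid <= 1'b1;
        end
    end
endmodule
```

One such tracker per bank is enough to decide, per request, whether the row overhead can be skipped.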
The memory controller determines whether an access addresses the same bank and row as the previous access, in which case only a column address strobe must be sent. This allows the access to occur much more quickly and can decrease overall latency.

2.4.3 DDR1 SDRAM

DDR1 SDRAM (the first generation of DDR SDRAM) doubles the data rate (hence the term DDR) of SDRAM without changing the clock speed or frequency. DDR transfers data on both the rising and falling edges of the clock, has a prefetch buffer and uses low-voltage signaling, which makes it more energy efficient than previous designs. Unlike SDRAM, which transfers 1 bit per clock cycle from the memory array to the data queue, DDR1 transfers 2 bits to the queue in two separate pipelines; the bits are released in order on the same output line. This is called a 2n-prefetch architecture. In addition, DDR1 utilizes double-transition clocking, triggering on both the rising and falling edges of the clock to transfer data. As a result, the bandwidth of DDR1 is doubled without an increase in the clock frequency. In addition to doubling the bandwidth, DDR1 made advances in energy efficiency: thanks to low-voltage signaling technology, DDR1 can operate at 2.5 V instead of the 3.3 V operating point of SDRAM.

2.4.4 DDR2 SDRAM

Data rates of DDR2 SDRAM are up to eight times those of the original SDRAM. At an operating voltage of 1.8 V, it achieves lower power consumption than DDR1. DDR2 SDRAM has a 4-bit prefetch buffer, an improvement over DDR1's 2-bit prefetch. This means that 4 bits are transferred per clock cycle from the memory array to the data bus, which increases bandwidth.
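The double-transition clocking described above cannot be expressed as a single posedge-triggered register. A common way to model it in simulation (illustrative only; real FPGA designs use dedicated DDR output primitives such as ODDR cells rather than this mux, which can glitch in hardware) is two registers selected by the clock level:

```verilog
// Illustrative dual-edge output model: d_rise appears while clk is high,
// d_fall while clk is low, so the output toggles on both clock edges.
module ddr_out_model (
    input        clk,
    input  [7:0] d_rise,   // data for the rising-edge half-cycle
    input  [7:0] d_fall,   // data for the falling-edge half-cycle
    output [7:0] q
);
    reg [7:0] q_rise, q_fall;

    always @(posedge clk) q_rise <= d_rise;
    always @(negedge clk) q_fall <= d_fall;

    assign q = clk ? q_rise : q_fall;  // select by clock level
endmodule
```

This is the sense in which DDR doubles bandwidth at the same clock frequency: two data words traverse the bus per clock period.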
2.4.5 DDR3 SDRAM

DDR3 provides two burst modes for both reading and writing: burst chop (BC4) and burst length eight (BL8). BC4 allows bursts of four by treating data as though half of it is masked, which smooths the transition when switching from DDR2 to DDR3 memory. However, BL8 is the primary burst mode. BL8 allows the most data to be transferred in the least amount of time; it transfers the greatest number of 64-bit data packets (eight) to or from consecutive addresses in memory, which means addressing occurs once for every eight data packets sent. In order to support a burst length of eight data packets, DDR3 SDRAM has an 8-bit prefetch buffer. DDR3, like its predecessors, improves not only bandwidth but also energy conservation. Power consumption of DDR3 can be up to 30 percent less than that of DDR2. The DDR3 operating voltage is the lowest yet, at 1.5 V, and low-voltage versions are supported at 1.35 V.

2.5 TIMELINE

Ideally, memory performance would improve at the same rate as central processing unit (CPU) performance. However, memory latency has only improved about five percent each year. The longest latency (RAS latency) of the newest release of DRAM for each year is shown in the plot in Figure 2.1.

Figure 2.1 DRAM Row Access Latency vs. Year
As seen in Figure 2.1, the row access latency decreases linearly with every new release of DRAM until 1996. Once SDRAM is released in 1996, the difference in latency from year to year is much smaller. With recent memory releases it is much more difficult to reduce RAS latency; this can be seen especially for the DDR2 and DDR3 memory releases from 2006 to 2012. CAS latency, unlike RAS latency, consistently decreases (bandwidth increases) with every memory release, and in the new DDR3 memory it is very close to 0 ns. Figure 2.2 shows the column access latency.

Figure 2.2 DRAM Column Address Time vs. Year

Looking at some prominent areas of the CAS graph, it can be seen in Figure 2.2 that bandwidth greatly increased (CAS decreased) from 1983 to 1986. This is due to the switch from NMOS DRAMs to CMOS DRAMs. In 1996 the first SDRAM was released; its CAS latency decreased (bandwidth increased) due to synchronization and banking. In later years, the CAS latency does not decrease by much, but this is expected since the latency is already much smaller. Comparing Figure 2.2 to Figure 2.1, CAS time decreases much more drastically than RAS time. This means the bandwidth greatly improves, while latency improves much more slowly. In 2010, when DDR2 was released, it can be seen that latency was sacrificed (Figure 2.1) for an increase in bandwidth (Figure 2.2).
CHAPTER 3
METHODOLOGY

In this section the ML605 board and Virtex-6 FPGA hardware are described, as well as the tools utilized for design and validation. The Xilinx Integrated Software Environment (ISE) was used for design, and iSim and ChipScope were used for validation in simulation and in hardware.

3.1 HARDWARE

3.1.1 Virtex-6 FPGA

The Virtex-6 FPGA (XC6VLX240T) is used to implement the arbiter. This FPGA has 241,152 logic cells, and its I/O pins are organized into banks (40 pins per bank). Its logic is organized into slices, each composed of four look-up tables (LUTs), multiplexers and arithmetic carry logic. LUTs implement Boolean functions, and multiplexers enable combinatorial logic. Two slices form a configurable logic block (CLB). In order to distribute a clock signal to all these logic blocks, the FPGA has five types of clock lines: BUFG, BUFR, BUFIO, BUFH, and high-performance clock. These lines satisfy "requirements of high fan out, short propagation delay, and extremely low skew." The clock lines are also split into categories depending on the sections of the FPGA and the components they drive. The three categories are global, regional, and I/O lines. Global clock lines drive all flip-flops, clock enables, and many logic inputs. Regional clock lines drive all clock destinations in their region and two bordering regions; there are six to eighteen regions in an FPGA. Finally, I/O clock lines are very fast and only drive I/O logic and serializer/deserializer circuits.
3.1.2 ML605 Board

The Virtex-6 FPGA is included on the ML605 Development Board. In addition to the FPGA, the development board includes a 512 MB DDR3 small outline dual inline memory module (SODIMM), to which our design arbitrates access. A SODIMM is the type of board the memory is manufactured on. The board also includes 32 MB of linear BPI Flash and 8 Kb of IIC EEPROM. Communication mechanisms provided on the board include Ethernet, an SFP transceiver connector, a GTX port, a USB-to-UART bridge, USB host and peripheral ports, and PCI Express. The only connection used during this project was the USB JTAG connector, which was used to program and debug the FPGA from the host computer.

There are three clock sources on the board: a 200 MHz differential oscillator, a 66 MHz single-ended oscillator and SMA connectors for an external clock. This project utilizes the 200 MHz oscillator.

Peripherals on the ML605 board were useful for debugging purposes. The push buttons were used to trigger sections of code execution in ChipScope, such as reading from and writing to memory. DIP switches acted as configuration inputs to our code. For example, they acted as a safety to ensure the buttons on the board were not automatically active when the code was downloaded to the board; in addition, the value on the switches indicated which system would begin writing first, for debugging purposes. LEDs were used to check the functionality of sections of code as well, and, for additional validation, they can be used to indicate whether an error has occurred. Although we did not use it, the ML605 board also provides an LCD.

3.2 TOOLS

Now that the hardware on which the design is placed has been described, the software used to manipulate the design can be described. The tools for design include those provided within the Xilinx Integrated Software Environment, and the tools used for validation include iSim and ChipScope. This section also looks at the turn-around time of both validation tools and what it means for the design process.
3.2.1 Xilinx Integrated Software Environment (ISE)

We designed the arbiter using the Verilog hardware description language in the Xilinx Integrated Software Environment (ISE). ISE is an environment in which the user can "take [their] design from design entry through Xilinx device programming." The main workbench for ISE is ISE Project Navigator. The Project Navigator tool allows the user to effectively manage their design and call upon development processes. Figure 3.1 shows a screenshot of ISE Project Navigator.

Figure 3.1 Screen Shot of ISE Project Navigator

Figure 3.1 shows some of the main windows in ISE Project Navigator. On the right-hand side is the window for code entry. The hierarchical view of the modules in the design appears on the left, and when implementation is selected at the top, the design implementation progress is shown in the bottom window. If simulation were selected instead of implementation, there would be an option to run the design in simulation. The main processes called upon by ISE are synthesis, implementation, and bit stream generation. During synthesis, Xilinx Synthesis Technology (XST) is called upon. XST synthesizes Verilog, VHDL or mixed-language designs and creates netlist files. Netlist files, or NGC files, contain the design logic and constraints.
They are saved for use in the implementation process. During synthesis, XST checks for synthesis errors (parsing) and infers macros from the code. When XST infers macros, it recognizes parts of the code that can be replaced with components from its library, such as MUXes and RAMs, and encodes them in the way that is best for reduced area and/or increased speed.

Implementation is the longest process performed on the design. The first step of implementation is to combine the netlists and constraints into a design (NGD) file; the NGD file is the design reduced to Xilinx primitives. This process is called translation. During the second step, mapping, the design is fitted into the target device. This involves turning the logic into FPGA elements such as configurable logic blocks. Mapping produces a native circuit description (NCD) file. The third step, place and route, uses the mapped NCD file to place the design and route it subject to the timing constraints. Finally, the program file is generated and, at the finish of this step, a bit stream is ready to be downloaded to the board.

3.2.2 Synthesis and Simulation

Once the design has been synthesized, simulation of the design is possible. Simulating a design enables verification of logic functionality and timing. We used the simulation tool in ISE (iSim) to view timing and signal values. In order to utilize iSim, we created a test bench to provide the design with stimulus. Since simulation only requires design synthesis, it is a relatively fast process. The short turn-around time of simulation meant we were able to iteratively test small changes to the design and, therefore, debug our code efficiently.

3.2.3 Implementation and Hardware Validation

Once the design was working in simulation, we still needed to test its functionality in hardware. Testing the design in hardware is the most reliable validation method. In order to download the design to the board, it first needs to be implemented in ISE.
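The test bench mentioned in Section 3.2.2 is ordinary Verilog. A minimal sketch of the kind of stimulus file meant there (the module and signal names are placeholders, not the project's actual bench):

```verilog
`timescale 1ns / 1ps

// Minimal iSim-style test bench sketch: drive a clock and reset, issue one
// request, and let the waveform viewer show the DUT's response.
module tb_sdram_ctrl;
    reg         clk = 1'b0;
    reg         rst = 1'b1;
    reg  [23:0] addr = 24'd0;
    reg         req  = 1'b0;

    // 200 MHz clock (5 ns period), matching the ML605 oscillator
    always #2.5 clk = ~clk;

    // The design under test would be instantiated here, e.g.:
    // sdram_ctrl dut (.clk(clk), .rst(rst), .addr(addr), .req(req) /* ... */);

    initial begin
        #20 rst = 1'b0;            // release reset
        addr = 24'h000123;         // placeholder address
        req  = 1'b1;               // issue a single request
        #10 req = 1'b0;
        #500 $finish;              // run long enough to observe the access
    end
endmodule
```

Because such a bench only needs the synthesized (or even just parsed) design, the edit-simulate loop stays short, which is the turn-around advantage discussed next.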
    • 15 Implementation has a much longer turn-around time than synthesis, so while functionality in hardware ensures the design is working, simulation is the practical choice for iterative verification. In order to test our design in hardware, we utilized ChipScope Pro Analyzer, a GUI which allows the user to “configure [their] device, choose triggers, setup the console, and view results of the capture on the fly”. In order to use ChipScope Pro, you may either insert ChipScope Pro cores into the design using the Core Generator, a tool that can be accessed in ISE Project Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation Navigator, or utilize the PlanAhead or Core Inserter tool, which automatically inserts cores into the design netlist for you. One method of inserting ChipScope cores into the design is by utilizing the PlanAhead software. The PlanAhead tool enables the creation of floorplans.
    • 16 Floorplans provide an initial view of “the design’s interconnect flow and logic module sizes”. This helps the designer to “avoid timing, utilization, and routing congestion issues”. PlanAhead also allows the designer to create and configure I/O ports and analyze implementation results, which aids in the discovery of bottlenecks in the design. For our project, however, we utilized PlanAhead only for its ability to automatically insert ChipScope cores. PlanAhead proved to be inefficient for our purposes since, many times, when a change was made in the design, the whole netlist would need to be selected again. In addition, there were bugs in the software that greatly affected the turn-around time of debugging, and it crashed several times. If PlanAhead were used for floor planning and the other design tools it offers, it might have proved to be much more useful. In place of PlanAhead, we utilized the Core Generator within ISE. The ChipScope cores provided by Xilinx include ICON, ILA, VIO, ATC2, and IBERT. The designer can choose which cores to insert by using the Core Generator in ISE. The ICON core provides communication between the different cores and the computer running ChipScope. It can connect up to fifteen ILA, VIO, and ATC2 cores. The ILA core is used to synchronously monitor internal signals. It contains logic to trigger on inputs and outputs and capture data. ILA cores allow up to sixteen trigger ports, which can be 1 to 256 bits wide. The VIO core can monitor signals like the ILA core, but can also drive internal FPGA signals in real time. The ATC2 core is similar to the ILA core, but was created for Agilent FPGA dynamic probe technology. Finally, the IBERT core contains “all the logic to control, monitor, and change transceiver parameters and perform bit error ratio tests”. The only ChipScope cores we were concerned with in this project were the ICON and ILA cores. We inserted one ChipScope ILA core and one ICON core using the Core Generator within ISE Project Navigator.
The ILA core allowed us to monitor internal signals in the FPGA. Instead of inserting a VIO core, which allows inputs to and outputs from ChipScope, we used buttons to trigger the execution of write and read logic.
    • 17 3.2.4 Analysis of Turn-Around Times As introduced in sections 3.2.2 and 3.2.3, implementation takes much longer than synthesis. Therefore, when it comes down to turn-around time, simulation is much more effective for iterative debugging. In Figure 3.2, the phases for simulation and hardware validation can be seen, as well as the time it takes to complete each phase. For simulation, the process starts at Verilog code, becomes synthesized logic, and, using a test bench, is run in iSim for viewing. This process takes about eight minutes total. A system’s simulation run-time is much longer than if it were running on hardware, but simulation is still faster than hardware validation because it does not have to undergo implementation. The bottleneck in our simulation process is the setup time for the DDR3 memory model, which accounts for most of the simulation time. Hardware validation starts at Verilog code, which is synthesized, implemented, and imported into ChipScope. This whole process takes about fifteen minutes. Most of the time spent on hardware validation is on implementation of the design. In addition, hardware validation requires more of the user’s attention. It is more difficult and takes more time to set up a ChipScope core than it does to create a test bench for simulation. While a test bench (green) involves writing some simple code, a ChipScope core (orange) involves setting up all the signals to be probed. Not only is simulation faster, but the iSim tool is easier to use than ChipScope. Figure 3.3 shows a screen shot of iSim.
    • 18 Figure 3.3 iSim Screen Shot The screen shot of iSim shows the instance names in the first column, all the signals to choose from in the second, and the signals and their waveforms in the third and fourth columns. The user can view any signal without having to port it out of the design and re-implement, as is required when using ChipScope. When adding an additional signal in iSim, only the simulation needs to be restarted. The iSim interface makes debugging much easier with collapsible signal viewing, grouping abilities, and a large window for viewing many signals at once. A screen shot of ChipScope is shown in Figure 3.4. In ChipScope, you can view the devices, signals, triggers, and waveforms windows. The amount of time ChipScope is able to capture is much less than iSim. For this reason, triggers are required to execute different parts of the code; this is where the buttons were utilized. If a signal could not fit into the allowable number of signal inputs or was forgotten, it would need to be added to the design and implemented all over again, a much longer turn-around time than simulation. Therefore, simulation is used for iterative debugging and functionality testing, while hardware validation is the next step to ensure design accuracy.
    • 19 Figure 3.4 ChipScope Screen Shot 3.2.5 Xilinx Core Generator One tool in ISE that was very important to our project was the CORE Generator. The CORE Generator provided us with not only the ChipScope cores, but the memory controller and FIFOs as well. The CORE Generator can be accessed within ISE Project Navigator and provides many additional functions for the designer. The options provided for creating FIFOs, for example, include common or independent clocks; first-word fall-through; a variety of flags to indicate the amount of data in the FIFO; and write width, read width, and depth. The different width capabilities allowed us to create asynchronous FIFOs. The memory controller was created using the Xilinx Memory Interface Generator (MIG). There were options to use an AXI4, native, or user interface, which is discussed in a following section on interfacing with the Xilinx MIG.
    • 20 CHAPTER 4 ARCHITECTURE The SDR SDRAM Controller consists of four main modules: the SDRAM controller, control interface, command, and data path modules. The SDRAM controller module is the top-level module that instantiates the three lower modules and brings the whole design together. The control interface module accepts commands and related memory addresses from the host, decoding the command and passing the request to the command module. The command module accepts commands and addresses from the control interface module, and generates the proper commands to the SDRAM. The data path module handles the data path operations during WRITEA and READA commands. The SDRAM controller module also instantiates a PLL that is used in the CLOCK_LOCK mode to improve I/O timing. This PLL is not essential to the operation of the SDR SDRAM Controller and can be easily removed. Figure 4 Architecture of SDRAM controller
    • 21 4.1 CONTROL INTERFACE MODULE The control interface module decodes and registers commands from the host, and passes the decoded NOP, WRITEA, READA, REFRESH, PRECHARGE, and LOAD_MODE commands, and ADDR to the command module. The LOAD_REG1 and LOAD_REG2 commands are decoded and used internally to load the REG1 and REG2 registers with values from ADDR. Figure 4.1 shows the control interface module block diagram. Figure 4.1 Control Interface Module
    • 22 The control interface module also contains a 16-bit down counter and control circuit that is used to generate periodic refresh commands to the command module. The 16-bit down counter is loaded with the value from REG2 and counts down to zero. The REFRESH_REQ output is asserted when the counter reaches zero and remains asserted until the command module acknowledges the request. The acknowledge from the command module causes the down counter to be reloaded with REG2, and the process repeats. REG2 is a 16-bit value that represents the period between the REFRESH commands that the SDR SDRAM Controller issues. The value is set by the equation int(refresh_period/clock_period). For example, if an SDRAM device that is connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and the SDR SDRAM Controller are clocked by a 100-MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562 (decimal). 4.2 COMMAND MODULE The command module accepts decoded commands from the control interface module and refresh requests from the refresh control logic, and generates the appropriate commands to the SDRAM. The module contains a simple arbiter that arbitrates between the commands from the host interface and the refresh requests from the refresh control logic. The refresh requests from the refresh control logic have priority over the commands from the host interface. If a command from the host arrives at the same time as, or during, a hidden refresh operation, the arbiter holds off the host by not asserting CMDACK until the hidden refresh operation is complete. If a hidden refresh command is received while a host operation is in progress, the hidden refresh is held off until the host operation is complete. Figure 4.2 shows the command module block diagram.
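The refresh request logic described above can be sketched in Verilog as follows. This is a hedged sketch, not the controller's actual source; the signal names (refresh_req, refresh_ack, reg2) are illustrative:

```verilog
// Hedged sketch of the 16-bit refresh down counter described above.
// Signal names are illustrative; the real controller's names may differ.
module refresh_timer (
    input  wire        clk,
    input  wire        rst_n,
    input  wire [15:0] reg2,        // refresh period in clocks, e.g. 1562 at 100 MHz
    input  wire        refresh_ack, // acknowledge from the command module
    output reg         refresh_req  // periodic refresh request
);
    reg [15:0] count;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            count       <= reg2;
            refresh_req <= 1'b0;
        end else if (refresh_ack) begin
            count       <= reg2;   // reload on acknowledge; the process repeats
            refresh_req <= 1'b0;
        end else if (count == 16'd0) begin
            refresh_req <= 1'b1;   // assert until the command module acknowledges
        end else begin
            count <= count - 16'd1;
        end
    end
endmodule
```

For the 64-ms, 4096-cycle example at 100 MHz, reg2 would hold int(15.625 µs / 0.01 µs) = 1562.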
    • 23 Figure 4.2 Command Module Block Diagram After the arbiter has accepted a command from the host, the command is passed on to the command generator portion of the command module. The command module uses three shift registers to generate the appropriate timing between the commands that are issued to the SDRAM. One shift register is used to control the timing of the ACTIVATE command; a second is used to control the positioning of the READA or WRITEA commands; a third is used to time command durations, which allows the arbiter to determine whether the last requested operation has been completed. The command module also performs the multiplexing of the address to the SDRAM. The row portion of the address is multiplexed out to the SDRAM address outputs A[11:0] during the ACTIVATE (RAS) command. The column portion is then multiplexed out to the SDRAM address outputs during a READA (CAS) or WRITEA command. The output signal OE is generated by the command module to control the tristate buffers in the last stage of the DATAIN path in the data path module.
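The shift-register timing technique described above can be sketched as follows. This is an illustrative sketch only; the parameter and port names are assumptions, and the actual controller instantiates three such registers for different purposes:

```verilog
// Hedged sketch: a shift register that positions a command a
// programmable number of clocks after a trigger, as the command
// module does for ACTIVATE and READA/WRITEA. Names are illustrative.
module cmd_delay #(
    parameter WIDTH = 8
)(
    input  wire             clk,
    input  wire             start,  // shifts a 1 into the register
    input  wire [WIDTH-1:0] delay,  // one-hot tap select, e.g. the tRCD position
    output wire             fire    // pulses when the 1 reaches the selected tap
);
    reg [WIDTH-1:0] shift;

    always @(posedge clk)
        shift <= {shift[WIDTH-2:0], start};

    // the selected tap marks when the delayed command should be issued
    assign fire = |(shift & delay);
endmodule
```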
    • 24 4.3 DATA PATH MODULE The data path module provides the SDRAM data interface to the host. Host data is accepted on DATAIN for WRITEA commands, and data is provided to the host on DATAOUT during READA commands. Figure 4.3 shows the data path module block diagram. Figure 4.3 Data Path Module The DATAIN path consists of a 2-stage pipeline to align data properly relative to CMDACK and the commands that are issued to the SDRAM. DATAOUT consists of a 2-stage pipeline that registers data from the SDRAM during a READA command. The DATAOUT pipeline delay can be reduced to one or even zero registers, with the only effect being that the relationship of DATAOUT to CMDACK changes.
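The 2-stage pipeline described above amounts to two registers in series; a hedged sketch follows, where the bus width and port names are assumptions:

```verilog
// Hedged sketch of the 2-stage DATAOUT pipeline: SDRAM read data is
// registered twice before reaching the host. Width and names assumed.
module dataout_pipe #(
    parameter WIDTH = 32
)(
    input  wire             clk,
    input  wire [WIDTH-1:0] sdram_dq,  // data from the SDRAM
    output reg  [WIDTH-1:0] dataout    // data to the host
);
    reg [WIDTH-1:0] stage1;

    always @(posedge clk) begin
        stage1  <= sdram_dq;  // first pipeline register
        dataout <= stage1;    // second pipeline register
    end
endmodule
```

Removing one or both registers only shifts the clock on which DATAOUT is valid relative to CMDACK.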
    • 25 CHAPTER 5 OPERATION The single data rate (SDR) synchronous dynamic random access memory (SDRAM) controller provides a simplified interface to industry-standard SDR SDRAM. The SDR SDRAM Controller is available in either Verilog HDL or VHDL and is optimized for the architecture. The SDR SDRAM Controller supports the following features:  Burst lengths of 1, 2, 4, or 8 data words.  CAS latency of 2 or 3 clock cycles.  16-bit programmable refresh counter used for automatic refresh.  2 chip selects for SDRAM devices.  Supports the NOP, READA, WRITEA, AUTO_REFRESH, PRECHARGE, ACTIVATE, BURST_STOP, and LOAD_MR commands.  Support for full-page mode operation.  Data mask line for write operations.  PLL to increase system performance. Figure 5 SDR SDRAM Controller System-Level Diagram
    • 26 5.1 SDRAM OVERVIEW SDRAM is high-speed dynamic random access memory (DRAM) with a synchronous interface. The synchronous interface and fully-pipelined internal architecture of SDRAM allow extremely fast data rates if used efficiently. Internally, SDRAM devices are organized into banks of memory, which are addressed by row and column. The number of row- and column-address bits and the number of banks depend on the size of the memory. SDRAM is controlled by bus commands that are formed using combinations of the RASN, CASN, and WEN signals. For instance, on a clock cycle where all three signals are high, the associated command is a no operation (NOP). A NOP is also indicated when the chip select is not asserted. Table 5.1 shows the standard SDRAM bus commands. Table 5.1 SDRAM Bus Commands SDRAM banks must be opened before a range of addresses can be written to or read from. The row and bank to be opened are registered coincident with the ACT command. When a bank is accessed for a read or a write, it may be necessary to close the bank and re-open it if the row to be accessed is different from the row that is currently open. Closing a bank is done with the PCH command.
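The command encoding in Table 5.1 follows the usual JEDEC convention for the active-low pins. The following is a hedged Verilog sketch of those encodings; the constants reflect the standard JEDEC truth table, but the particular device's data sheet should be consulted:

```verilog
// Hedged sketch of standard JEDEC SDRAM command encodings on the
// active-low {CSN, RASN, CASN, WEN} pins. Verify against the data sheet.
module sdram_cmd_pins (
    input  wire [3:0] cmd,
    output wire       csn, rasn, casn, wen
);
    localparam [3:0] CMD_NOP = 4'b0111,  // no operation
                     CMD_ACT = 4'b0011,  // activate (open) a row
                     CMD_RD  = 4'b0101,  // read
                     CMD_WR  = 4'b0100,  // write
                     CMD_PCH = 4'b0010,  // precharge (close) a bank
                     CMD_ARF = 4'b0001,  // auto refresh
                     CMD_LMR = 4'b0000,  // load mode register
                     CMD_BST = 4'b0110;  // burst stop

    assign {csn, rasn, casn, wen} = cmd;
endmodule
```

Note that any encoding with CSN high (chip select not asserted) is also a NOP, as stated above.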
    • 27 The primary commands used to access SDRAM are RD and WR. When the WR command is issued, the initial column address and the first data word are registered. When a RD command is issued, the initial address is registered. The initial data appears on the data bus 1 to 3 clock cycles later. This delay is known as the CAS latency and is due to the time required to physically read the internal DRAM core and register the data on the bus. The CAS latency depends on the speed grade of the SDRAM and the frequency of the memory clock. In general, the faster the clock, the more cycles of CAS latency are required. After the initial RD or WR command, sequential reads and writes continue until the burst length is reached or a BST (burst stop) command is issued. SDRAM memory devices support burst lengths of 1, 2, 4, or 8 data cycles. The ARF (auto refresh) command is issued periodically to ensure data retention. This function is performed by the SDR SDRAM Controller and is transparent to the user. The LMR (load mode register) command is used to configure the SDRAM mode register, which stores the CAS latency, burst length, burst type, and write burst mode. Consult the SDRAM specification for additional details. SDRAM comes in dual in-line memory modules (DIMMs), small-outline DIMMs (SO-DIMMs), and chips. To reduce pin count, SDRAM row and column addresses are multiplexed onto the same pins. SDRAM often includes more than one bank of memory internally, and DIMMs may require multiple chip selects. 5.2 FUNCTIONAL DESCRIPTION Table 5.2 shows the SDR SDRAM Controller interface signals. All signals are synchronous to the system clock, and outputs are registered at the SDR SDRAM Controller’s outputs.
    • 28 Table 5.2 Interface Signals 5.3 SDRAM CONTROLLER COMMAND INTERFACE The SDR SDRAM Controller provides a synchronous command interface to the SDRAM and several control registers. Table 5.3 shows the commands, which are described in the following sections. The following rules apply to the commands, with reference to Table 5.2:  All commands, except NOP, are driven by the user onto CMD[2:0]; ADDR and DATAIN are set appropriately for the requested command. The controller registers the command on the next rising clock edge.
    • 29  To acknowledge the command, the controller asserts CMDACK for one clock period.  For READA or WRITEA commands, the user should start receiving or writing data on DATAOUT and DATAIN.  The user must drive NOP onto CMD[2:0] by the next rising clock edge after CMDACK is asserted. Table 5.3 Interface Commands 5.3.1 NOP Command NOP is a no operation command to the controller. When NOP is detected by the controller, it performs a NOP in the following clock cycle. A NOP must be issued in the clock cycle following the controller’s acknowledgement of a command. The NOP command has no effect on SDRAM accesses that are already in progress.
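The handshake rules above can be sketched from the host's side as a small Verilog test-bench fragment. This is a hedged illustration only: the CMD encodings and the timing of the stubbed CMDACK are placeholders, not the controller's actual Table 5.3 values:

```verilog
// Hedged sketch of the host-side command handshake: drive a command,
// wait for CMDACK, then drive NOP on the next rising clock edge.
module host_handshake_tb;
    reg       clk = 0;
    reg [2:0] cmd;
    reg       cmdack = 0;  // stub standing in for the controller's CMDACK

    localparam NOP = 3'b000, READA = 3'b001;  // placeholder encodings

    always #5 clk = ~clk;

    initial #22 cmdack = 1;  // stub acknowledgement after a few clocks

    initial begin
        cmd = READA;    // request a read; ADDR would be driven here too
        wait (cmdack);  // controller asserts CMDACK for one clock period
        @(posedge clk);
        cmd = NOP;      // NOP must be driven by the next rising clock edge
        #20 $finish;
    end
endmodule
```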
    • 30 5.3.2 READA Command Figure 5.1 Timing diagram for a READA command The READA command instructs the SDR SDRAM Controller to perform a burst read with auto-precharge from the SDRAM at the memory address specified by ADDR. The SDR SDRAM Controller issues an ACTIVATE command to the SDRAM followed by a READA command. The read burst data first appears on DATAOUT (RCD + CL + 2) clocks after the SDR SDRAM Controller asserts CMDACK. During a READA command the user must keep DM low. When the controller is configured for full-page mode, the READA command becomes READ (read without auto-precharge). Figure 5.1 shows an example timing diagram for a READA command.
    • 31 The following sequence describes the general operation of the READA command:  The user asserts READA on CMD and drives ADDR and DM.  The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.  One clock after CMDACK is asserted, the user must assert NOP.  The controller presents the first read burst value on DATAOUT; the remainder of the read burst follows, one value every clock cycle. 5.3.3 WRITEA Command Figure 5.2 Timing diagram for a WRITEA command The WRITEA command instructs the SDR SDRAM Controller to perform a burst write with auto-precharge to the SDRAM at the memory address specified by ADDR.
    • 32 The SDR SDRAM Controller will issue an ACTIVATE command to the SDRAM followed by a WRITEA command. The first data value in the burst sequence must be presented with the WRITEA command and the ADDR address. The host must start clocking data, along with the desired DM values, into the SDR SDRAM Controller (tRCD – 2) clocks after the SDR SDRAM Controller has acknowledged the WRITEA command. See an SDRAM data sheet for how to use the data mask lines DM/DQM. When the SDR SDRAM Controller is in full-page mode, WRITEA becomes WRITE (write without auto-precharge). Figure 5.2 shows an example timing diagram for a WRITEA command. The following sequence describes the general operation of a WRITEA command:  The user asserts WRITEA and ADDR, the first write data value on DATAIN, and the desired data mask value on DM, with reference to Tables 5.2 and 5.3.  The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.  One clock after CMDACK is asserted, the user asserts NOP on CMD.  The user clocks data and data mask values into the SDR SDRAM Controller through DATAIN and DM. 5.3.4 REFRESH Command The REFRESH command instructs the SDR SDRAM Controller to issue an ARF command to the SDRAM. The SDR SDRAM Controller acknowledges the REFRESH command with CMDACK. Figure 5.3 shows an example timing diagram of the REFRESH command.
    • 33 Figure 5.3 Timing diagram for a REFRESH command The following sequence describes the general operation of a REFRESH command:  The user asserts REFRESH on the CMD input.  The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.  The user asserts NOP on CMD.
    • 34 5.3.5 PRECHARGE Command Figure 5.4 Timing diagram for a PRECHARGE command The PRECHARGE command instructs the SDR SDRAM Controller to issue a PCH command to the SDRAM. The SDR SDRAM Controller acknowledges the command with CMDACK. The PCH command is also used to generate a burst stop to the SDRAM. Using PRECHARGE to terminate a burst is only supported in full-page mode. Note that the SDR SDRAM Controller adds a latency of 4 clocks from when the host issues the command to when the SDRAM sees the PRECHARGE command. If a full-page read burst is to be stopped after 100 cycles, the PRECHARGE command must be asserted (4 + CL – 1) clocks before the desired end of the burst (the CL – 1 requirement is imposed by the SDRAM devices). So if the CAS latency is 3, the PRECHARGE command must be issued (100 – 3 – 1 – 4) = 92 clocks into the burst.
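The burst-termination arithmetic in the worked example above can be captured as compile-time parameters. This is a hedged sketch that mirrors the text's computation; the parameter names are illustrative:

```verilog
// Hedged sketch: computing when to issue PRECHARGE to end a
// full-page burst after BURST_CYCLES reads. Names are illustrative.
module precharge_point;
    localparam CONTROLLER_LATENCY = 4;    // host command to SDRAM pins
    localparam CL                 = 3;    // CAS latency
    localparam BURST_CYCLES       = 100;  // desired burst length

    // Following the worked example in the text:
    localparam STOP_AT = BURST_CYCLES - CL - 1 - CONTROLLER_LATENCY;  // 92

    initial $display("issue PRECHARGE at clock %0d of the burst", STOP_AT);
endmodule
```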
    • 35 Figure 5.4 shows an example timing diagram of the PRECHARGE command. The following sequence describes the general operation of a PRECHARGE command:  The user asserts PRECHARGE on CMD.  The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.  The user asserts NOP on CMD. 5.3.6 LOAD_MODE Command The LOAD_MODE command instructs the SDR SDRAM Controller to issue an LMR command to the SDRAM. The value that is to be written into the SDRAM mode register must be present on ADDR[11:0] with the LOAD_MODE command. The value on ADDR[11:0] is mapped directly to the SDRAM pins A11–A0 when the SDR SDRAM Controller issues the LMR command to the SDRAM. Figure 5.5 shows an example timing diagram. The following sequence describes the general operation of a LOAD_MODE command:  The user asserts LOAD_MODE on CMD.  The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.  One clock after the SDR SDRAM Controller asserts CMDACK, the user asserts NOP on CMD.
    • 36 Figure 5.5 Timing diagram for a LOAD_MODE Command 5.3.7 LOAD_REG1 Command The LOAD_REG1 command instructs the SDR SDRAM Controller to load the internal configuration register REG1. Table 5.4 shows the REG1 bit definitions. Table 5.4 REG1 Bit Definitions
    • 37 CL is the CAS latency of the SDRAM memory in clock periods and is dependent on the memory device speed grade and clock frequency. Consult the SDRAM data sheet for appropriate settings. CL must be set to the same value as the CL of the SDRAM memory devices. RCD is the RAS-to-CAS delay in clock periods and is dependent on the SDRAM speed grade and clock frequency. RCD = INT(tRCD/clock_period), where tRCD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock. RRD is the refresh-to-RAS delay in clock periods. RRD is dependent on the SDRAM speed grade and clock frequency. RRD = INT(tRRD/clock_period), where tRRD is the value from the SDRAM data sheet and clock_period is the clock period of the SDR SDRAM Controller and SDRAM clock. PM is the page mode bit. If PM = 0, the SDR SDRAM Controller operates in non-page mode. If PM = 1, the SDR SDRAM Controller operates in page mode. See the section “Full-Page Mode Operation” for more information. BL is the burst length for which the SDRAM devices have been configured. 5.3.8 LOAD_REG2 Command The LOAD_REG2 command instructs the SDR SDRAM Controller to load the internal configuration register REG2. REG2 is a 16-bit value that represents the period between the REFRESH commands that the SDR SDRAM Controller issues. The value is set by the equation int(refresh_period/clock_period). For example, if an SDRAM device connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least every 64 ms/4096 = 15.625 µs. If the SDRAM and the SDR SDRAM Controller are clocked by a 100-MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562 (decimal). The value that is to be written into REG2 must be presented on the ADDR input simultaneously with the assertion of the LOAD_REG2 command.
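The INT() formulas above can be evaluated at elaboration time. The following hedged sketch assumes illustrative data-sheet values for tRCD and tRRD at a 100-MHz controller clock; the actual figures must come from the SDRAM data sheet:

```verilog
// Hedged sketch: deriving the REG1/REG2 timing fields from data-sheet
// values at a 100 MHz clock. The tRCD/tRRD figures are assumptions.
module reg_values;
    localparam integer CLOCK_PERIOD_NS   = 10;     // 100 MHz
    localparam integer T_RCD_NS          = 20;     // assumed tRCD
    localparam integer T_RRD_NS          = 15;     // assumed tRRD
    localparam integer REFRESH_PERIOD_NS = 15625;  // 64 ms / 4096 rows

    // integer division truncates, matching the INT() in the formulas
    localparam integer RCD  = T_RCD_NS / CLOCK_PERIOD_NS;           // 2
    localparam integer RRD  = T_RRD_NS / CLOCK_PERIOD_NS;           // 1
    localparam integer REG2 = REFRESH_PERIOD_NS / CLOCK_PERIOD_NS;  // 1562

    initial $display("RCD=%0d RRD=%0d REG2=%0d", RCD, RRD, REG2);
endmodule
```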
    • 38 CHAPTER 6 ELEMENTS OF MEMORY BANK 6.1 DECODER A decoder is a device which performs the reverse operation of an encoder, undoing the encoding so that the original information can be retrieved. The same method used to encode is usually just reversed in order to decode. It is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines. 6.1.1 A 2-to-4 line single-bit decoder In digital electronics, a decoder can take the form of a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g. n-to-2^n decoders and binary-coded decimal decoders. Enable inputs must be on for the decoder to function; otherwise its outputs assume a single "disabled" output code word. Decoding is necessary in applications such as data multiplexing, 7-segment displays, and memory address decoding. The simplest example of a decoder circuit would be an AND gate, because the output of an AND gate is "high" (1) only when all its inputs are "high". Such an output is called an "active-high output". If a NAND gate is connected instead of the AND gate, the output will be "low" (0) only when all its inputs are "high". Such an output is called an "active-low output". A slightly more complex decoder would be the n-to-2^n type binary decoders. These types of decoders are combinational circuits that convert binary information from n coded inputs to a maximum of 2^n unique outputs. We say a maximum of 2^n outputs because, in case the n-bit coded information has unused bit combinations, the decoder may have fewer than 2^n outputs.
    • 39 We can have a 2-to-4 decoder, a 3-to-8 decoder, or a 4-to-16 decoder. We can form a 3-to-8 decoder from two 2-to-4 decoders (with enable signals). Figure 6.1 RTL of decoder Similarly, we can also form a 4-to-16 decoder by combining two 3-to-8 decoders. In this type of circuit design, the enable inputs of both 3-to-8 decoders originate from a 4th input, which acts as a selector between the two 3-to-8 decoders. This allows the 4th input to enable either the top or the bottom decoder, which produces outputs D(0) through D(7) for the first decoder, and D(8) through D(15) for the second decoder. Figure 6.2 Simulation Of Decoder
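A minimal Verilog sketch of the 2-to-4 decoder with an enable input discussed above. The port names are illustrative; the project's actual module may differ:

```verilog
// Hedged sketch of a 2-to-4 line decoder with enable. When en is low,
// all outputs assume the "disabled" code word (all zeros).
module decoder2to4 (
    input  wire [1:0] a,   // 2-bit coded input
    input  wire       en,  // enable
    output reg  [3:0] d    // one-hot decoded output
);
    always @(*) begin
        d = 4'b0000;
        if (en)
            d[a] = 1'b1;  // assert exactly one of the four lines
    end
endmodule
```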
    • 40 A decoder that contains enable inputs is also known as a decoder-demultiplexer. Thus, we have a 4-to-16 decoder produced by adding a 4th input shared among both decoders, producing 16 outputs. 6.2 DEMUX The data distributor, known more commonly as a demultiplexer or “demux” for short, is the exact opposite of the multiplexer. The demultiplexer converts a serial data signal at the input into parallel data at its outputs. Figure 6.3 RTL Of DEMUX
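A hedged Verilog sketch of a 1-to-4 demultiplexer matching the description above; names and width are assumptions:

```verilog
// Hedged sketch of a 1-to-4 demultiplexer: one data input is routed
// to exactly one of four output lines, chosen by the select input.
module demux1to4 (
    input  wire       din,  // single data input line
    input  wire [1:0] sel,  // selects which output receives din
    output reg  [3:0] dout  // individual output lines
);
    always @(*) begin
        dout      = 4'b0000;  // unselected outputs stay low
        dout[sel] = din;      // route the input to the selected line
    end
endmodule
```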
    • 41 The demultiplexer takes one single input data line and then switches it to any one of a number of individual output lines, one at a time. Figure 6.4 Simulation Of DEMUX 6.3 RAM Random-access memory (RAM) is a form of computer data storage. A random-access memory device allows data items to be read and written in roughly the same amount of time, regardless of the order in which the data items are accessed. In contrast, with other direct-access data storage media such as hard disks, CD-RWs, DVD-RWs, and the older drum memory, the time required to read and write data items varies significantly depending on their physical locations on the recording medium, due to mechanical limitations such as media rotation speeds and arm movement delays. Today, random-access memory takes the form of integrated circuits. Strictly speaking, modern types of DRAM are not random access, as data is read in bursts, although the name DRAM/RAM has stuck. However, many types of SRAM are still random access even in the strict sense.
    • 42 RAM is normally associated with volatile types of memory (such as DRAM memory modules), where stored information is lost if the power is removed, although many efforts have been made to develop non-volatile RAM chips. Other types of non-volatile memory exist that allow random access for read operations, but either do not allow write operations or have limitations on them. These include most types of ROM and a type of flash memory called NOR flash. 6.3.1 TYPES OF RAM The two main forms of modern RAM are static RAM (SRAM) and dynamic RAM (DRAM). In SRAM, a bit of data is stored using the state of a flip-flop. This form of RAM is more expensive to produce, but is generally faster and requires less power than DRAM and, in modern computers, is often used as cache memory for the CPU. DRAM stores a bit of data using a transistor and capacitor pair, which together comprise a memory cell. The capacitor holds a high or low charge (1 or 0, respectively), and the transistor acts as a switch that lets the control circuitry on the chip read the capacitor's state of charge or change it. As this form of memory is less expensive to produce than static RAM, it is the predominant form of computer memory used in modern computers. Figure 6.5 RTL of RAM
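A hedged sketch of a simple synchronous single-port RAM in Verilog, of the kind shown in the RTL view above; the depth, width, and port names are assumptions for illustration:

```verilog
// Hedged sketch of a 16x8 single-port synchronous RAM. Depth, width,
// and port names are assumed, not taken from the project source.
module ram16x8 (
    input  wire       clk,
    input  wire       we,    // write enable
    input  wire [3:0] addr,  // 16 locations
    input  wire [7:0] din,
    output reg  [7:0] dout
);
    reg [7:0] mem [0:15];

    always @(posedge clk) begin
        if (we)
            mem[addr] <= din;  // synchronous write
        dout <= mem[addr];     // registered read
    end
endmodule
```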
    • 43 Both static and dynamic RAM are considered volatile, as their state is lost or reset when power is removed from the system. By contrast, read-only memory (ROM) stores data by permanently enabling or disabling selected transistors, such that the memory cannot be altered. Writeable variants of ROM (such as EEPROM and flash memory) share properties of both ROM and RAM, enabling data to persist without power and to be updated without requiring special equipment. These persistent forms of semiconductor ROM include USB flash drives, memory cards for cameras and portable devices, etc. ECC memory (which can be either SRAM or DRAM) includes special circuitry to detect and/or correct random faults (memory errors) in the stored data, using parity bits or an error-correcting code. In general, the term RAM refers solely to solid-state memory devices (either DRAM or SRAM), and more specifically the main memory in most computers. In optical storage, the term DVD-RAM is somewhat of a misnomer since, unlike CD-RW or DVD-RW, it does not need to be erased before reuse. Nevertheless, a DVD-RAM behaves much like a hard disc drive, if somewhat slower. Figure 6.6 Simulation of RAM
    • 44 6.4 MUX In electronics, a multiplexer is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. A multiplexer of 2^n inputs has n select lines, which are used to select which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over a network within a certain amount of time and bandwidth. A multiplexer is also called a data selector. Figure 6.7 RTL of MUX An electronic multiplexer can be considered a multiple-input, single-output switch, and a demultiplexer a single-input, multiple-output switch. The schematic symbol for a multiplexer is an isosceles trapezoid with the longer parallel side containing the input pins and the shorter parallel side containing the output pin.
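A hedged Verilog sketch of a 4-to-1 multiplexer as described above (2^n = 4 inputs, n = 2 select lines; names are assumptions):

```verilog
// Hedged sketch of a 4-to-1 multiplexer: two select lines choose
// which of the four inputs is forwarded to the single output line.
module mux4to1 (
    input  wire [3:0] din,  // four data inputs
    input  wire [1:0] sel,  // n = 2 select lines for 2^n = 4 inputs
    output wire       dout  // the selected input
);
    assign dout = din[sel];
endmodule
```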
    • 45 The schematic shows a 2-to-1 multiplexer on the left and an equivalent switch on the right. The wire connects the desired input to the output. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal. Figure 6.8 Simulation Of MUX 6.5 BUFFER A buffer amplifier (sometimes simply called a buffer) is one that provides electrical impedance transformation from one circuit to another. Two main types of buffer exist: the voltage buffer and the current buffer.
    • 46 6.5.1 VOLTAGE BUFFER A voltage buffer amplifier is used to transfer a voltage from a first circuit, having a high output impedance level, to a second circuit with a low input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation. In the ideal voltage buffer, the input resistance is infinite and the output resistance is zero (the impedance of an ideal voltage source is zero). Other properties of the ideal buffer are perfect linearity, regardless of signal amplitudes, and instant output response, regardless of the speed of the input signal. If the voltage is transferred unchanged (the voltage gain Av is 1), the amplifier is a unity-gain buffer, also known as a voltage follower because the output voltage follows or tracks the input voltage. Although the voltage gain of a voltage buffer amplifier may be (approximately) unity, it usually provides considerable current gain and thus power gain. However, it is commonplace to say that it has a gain of 1 (or the equivalent 0 dB), referring to the voltage gain. As an example, consider a Thévenin source (voltage VA, series resistance RA) driving a resistor load RL. Because of voltage division (also referred to as "loading"), the voltage across the load is only VA RL / (RL + RA). However, if the Thévenin source drives a unity-gain buffer, the voltage input to the amplifier is VA, with no voltage division, because the amplifier input resistance is infinite. At the output, the dependent voltage source delivers voltage Av VA = VA to the load, again without voltage division, because the output resistance of the buffer is zero. A Thévenin equivalent circuit of the combined original Thévenin source and the buffer is an ideal voltage source VA with zero Thévenin resistance. Figure 6.9 RTL Of Buffer
6.5.2 CURRENT BUFFER

Typically a current buffer amplifier is used to transfer a current from a first circuit, having a low output impedance level, to a second circuit with a high input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation. In the ideal current buffer the input impedance is zero and the output impedance is infinite (the impedance of an ideal current source is infinite). Again, the other properties of the ideal buffer are perfect linearity, regardless of signal amplitude, and instant output response, regardless of the speed of the input signal.

If the current is transferred unchanged (the current gain βi is 1), the amplifier is again a unity-gain buffer, this time known as a current follower because the output current follows or tracks the input current.

Figure 6.10 Simulation Of Buffer
As an example, consider a Norton source (current IA, parallel resistance RA) driving a resistor load RL. Because of current division (also referred to as "loading"), the current delivered to the load is only IA RA / (RL + RA). However, if the Norton source drives a unity-gain buffer, the current input to the amplifier is the full IA, with no current division, because the amplifier input resistance is zero. At the output the dependent current source delivers current βi IA = IA to the load, again without current division, because the output resistance of the buffer is infinite. A Norton equivalent circuit of the combined original Norton source and the buffer is an ideal current source IA with infinite Norton resistance.

6.6 MEMORY BANK

A memory bank is a logical unit of storage in electronics, and its definition is hardware dependent. In a computer, the memory banks may be determined by the memory access controller along with the physical organization of the hardware memory slots. In a typical synchronous dynamic random-access memory (SDRAM) or double data rate SDRAM (DDR SDRAM), a bank consists of multiple rows and columns of storage units and is usually spread out across several chips. In a single read or write operation, only one bank is accessed; therefore the number of bits in a column or row, per bank and per chip, equals the memory bus width in bits (for a single channel). The size of a bank is further determined by the number of bits in a column and a row, per chip, multiplied by the number of chips in a bank.
Figure 6.11 RTL Of Memory Bank

Some computers have several identical memory banks of RAM and use bank switching to switch between them. Harvard-architecture computers have (at least) two very different banks of memory, one for program storage and one for data storage.
Figure 6.12 Simulation Of Memory Bank
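The bank organization described above is what the controller's memory map exploits: a flat address is split into bank, row, and column fields, and only the addressed bank is enabled. The sketch below illustrates this decode with a one-hot bank enable; the field widths, ordering, and names are assumptions for illustration, not the project's actual geometry:

```verilog
// Address split for a banked SDRAM-style memory: the incoming address
// selects one bank, and only that bank's storage array is accessed.
module bank_decode #(
    parameter BANK_BITS = 3,    // 2^3 = 8 banks
    parameter ROW_BITS  = 13,
    parameter COL_BITS  = 10
) (
    input  wire [BANK_BITS+ROW_BITS+COL_BITS-1:0] addr,
    output wire [BANK_BITS-1:0]      bank,
    output wire [ROW_BITS-1:0]       row,
    output wire [COL_BITS-1:0]       col,
    output wire [(1<<BANK_BITS)-1:0] bank_en  // one-hot bank enable
);
    // Carve the flat address into {bank, row, column} fields.
    assign {bank, row, col} = addr;

    // Assert exactly one enable line; all other banks stay idle.
    assign bank_en = 1'b1 << bank;
endmodule
```

Because `bank_en` is one-hot, only the addressed bank's array is active during an access; keeping the remaining banks idle is consistent with the power difference between the eight-bank and one-bank measurements reported in Chapter 7.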
CHAPTER 7
RESULTS AND CONCLUSIONS

7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON

7.1.1 Project
Table 7.1 Project

7.1.2 Device
Table 7.2 Device

7.1.3 Environment
Table 7.3 Environment

7.1.4 Default Activity
Table 7.4 Default Activity

7.1.5 On-Chip Power Summary
Table 7.5 On-Chip Power Summary

7.1.6 Thermal Summary
Table 7.6 Thermal Summary

7.1.7 Power Supply Summary
Table 7.7 Power Supply Summary
Table 7.8 Power Supply Current

7.1.8 Confidence Level
Table 7.9 Confidence Level

7.1.9 By Hierarchy
Table 7.10 By Hierarchy

7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE

7.2.1 Project
Table 7.11 Project

7.2.2 Device
Table 7.12 Device

7.2.3 Environment
Table 7.13 Environment

7.2.4 Default Activity Rates
Table 7.14 Default Activity

7.2.5 On-Chip Power Summary
Table 7.15 On-Chip Power Summary

7.2.6 Thermal Summary
Table 7.16 Thermal Summary

7.2.7 Power Supply Summary
Table 7.17 Power Supply Summary
Table 7.18 Power Supply Current

7.2.8 Confidence Level
Table 7.19 Confidence Level
7.2.9 By Hierarchy
Table 7.20 By Hierarchy

7.3 CONCLUSION

This project addresses the problem of finding a memory map for firm real-time workloads in the context of SDRAM memory controllers. Existing controllers either use a static memory map or provide only limited configurability. We use the number of banks over which requests are interleaved as a flexible configuration parameter, whereas previous work considers it a fixed part of the controller architecture. This degree of freedom lets us optimize the memory configuration to the mix of applications and their requirements, which benefits the worst-case performance in terms of bandwidth, latency and power.
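The key idea of the conclusion, treating the degree of bank interleaving as a configuration parameter, amounts to choosing where the bank bits sit in the physical address. Below is a minimal sketch contrasting an interleaved map with a contiguous one; all field widths and bit positions are assumptions chosen for illustration, not the project's actual mapping:

```verilog
// Two memory maps for the same SDRAM. An interleaved map takes the
// bank bits from low-order address bits just above the column, so
// consecutive accesses rotate through the banks (favoring bandwidth).
// A contiguous map takes them from the high-order bits, so a request
// stays within one bank (favoring power, since other banks stay idle).
module memory_map #(
    parameter INTERLEAVED = 1          // the configuration parameter
) (
    input  wire [25:0] addr,           // 3 bank + 13 row + 10 column bits
    output wire [2:0]  bank,
    output wire [12:0] row,
    output wire [9:0]  col
);
    // The column bits are the same in both maps.
    assign col  = addr[9:0];

    // Interleaved: bank bits directly above the column.
    // Contiguous:  bank bits at the top of the address.
    assign bank = INTERLEAVED ? addr[12:10] : addr[25:23];
    assign row  = INTERLEAVED ? addr[25:13] : addr[22:10];
endmodule
```

With INTERLEAVED = 1 consecutive bursts spread across banks, while with INTERLEAVED = 0 a request is served from a single bank, which matches the direction of the power comparison between Sections 7.1 and 7.2.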
CHAPTER 8
FUTURE SCOPE

Compared to SDR SDRAM, DDR1 SDRAM and DDR2 SDRAM, the advantage of this controller is that it synchronizes the data transfer and transfers data twice as fast as its predecessors, while the production cost is also very low. We have successfully designed the controller using Verilog HDL and synthesized it using the Xilinx tool.
1. DDR4 SDRAM is the 4th generation of DDR SDRAM.
2. DDR3 SDRAM improves on DDR SDRAM by using differential signalling and lower voltages to support significant performance advantages over DDR SDRAM.
3. DDR3 SDRAM standards are still being developed and improved.
REFERENCES

[1] C. van Berkel, "Multi-core for Mobile Phones," in Proc. DATE, 2009.
[2] "International Technology Roadmap for Semiconductors (ITRS)," 2009.
[3] P. Kollig et al., "Heterogeneous Multi-Core Platform for Consumer Multimedia Applications," in Proc. DATE, 2009.
[4] L. Steffens et al., "Real-Time Analysis for Memory Access in Media Processing SoCs: A Practical Approach," in Proc. ECRTS, 2008.
[5] S. Bayliss et al., "Methodology for designing statically scheduled application-specific SDRAM controllers using constrained local search," in Proc. FPT, 2009.
[6] B. Akesson et al., "Architectures and modelling of predictable memory controllers for improved system integration," in Proc. DATE, 2011.
[7] J. Reineke et al., "PRET DRAM Controller: Bank Privatization for Predictability and Temporal Isolation," in Proc. CODES+ISSS, 2011.
[8] M. Paolieri et al., "An Analyzable Memory Controller for Hard Real-Time CMPs," Embedded Systems Letters, IEEE, vol. 1, no. 4, 2009.
[9] Micron Technology Inc., "DDR3-800-1Gb SDRAM Datasheet, 02/10 EN edition," 2006.
[10] D. Stiliadis et al., "Latency-rate servers: a general model for analysis of traffic scheduling algorithms," IEEE/ACM Trans. Netw., 1998.
[11] B. Akesson et al., "Classification and Analysis of Predictable Memory Patterns," in Proc. RTCSA, 2010.
[12] DDR2 SDRAM Specification, JESD79-2E ed., JEDEC Solid State Technology Association, 2008.
[13] DDR3 SDRAM Specification, JESD79-3D ed., JEDEC Solid State Technology Association, 2009.
[14] K. Chandrasekar et al., "Improved Power Modelling of DDR SDRAMs," in Proc. DSD, 2011.
[15] B. Akesson et al., "Automatic Generation of Efficient Predictable Memory Patterns," in Proc. RTCSA, 2011.