Design of a 64-bit ultra low latency memory using 6T SRAM cells and PDK 45nm technology on CADENCE to simulate the results of our chosen implementation.
A 64-Bit Memory System Design
Group 13
Mamoon Ismail Khalid
N11965449
Haixiang Liu
N14227283
Mohammed El Massad
N19662628
Yifei Chen
N15390017
Linxuam Wang
N14518973
Abstract—This report discusses the design and implementation
of a 64-bit Static Random Access Memory (SRAM) system in
45 nm CMOS technology using six-transistor (6T) cells. The goal
of the project was to design a memory system that is optimal, in
terms of speed, in the target technology.
I. INTRODUCTION
The project was divided into four parts.
An SRAM memory system consists of four parts: (1) the
address registers, where addresses received from the CPU are
stored while the memory retrieves the requested data, (2) a row
decoder that takes the address stored in the address register
and selects the appropriate row of the memory array, (3) the
memory array itself, consisting of SRAM cells arranged in
an array, and (4) two data registers where the data to
be read from or written into the memory is stored.
II. METHODOLOGY
The 6T cell, which holds the data, is the basis of the SRAM.
The 64-bit SRAM is laid out as 1 column and 16 rows. The
column is 4 bits wide, i.e., it contains 4 individual cells
in each row, so the effective layout is 16x[1x4]. We followed
a modular design approach, in which each column and its
read/write circuits are designed individually. Address
registers store the address values and deliver them to the
circuit, and similarly data registers deliver the data to be
written to the circuit. A second set of data registers stores
the values read from the SRAM. A 4-to-16 decoder is used as
the row decoder to select the row of operation.
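The organization described above can be sketched as a small behavioural model. This is only an illustration of the 16x[1x4] addressing scheme, not the transistor-level design; the class name Sram16x4 and its methods are hypothetical and do not appear in the report.

```python
class Sram16x4:
    """Behavioural model of a 16-row by 4-bit array (64 bits in total)."""

    ROWS = 16   # one row per value of the 4-bit address
    WIDTH = 4   # bits stored in each row

    def __init__(self):
        # Every cell starts at 0; a real array powers up in an unknown state.
        self.rows = [[0] * self.WIDTH for _ in range(self.ROWS)]

    def _check(self, address):
        if not 0 <= address < self.ROWS:
            raise ValueError("address must fit in 4 bits")

    def write(self, address, data_bits):
        """Store a 4-bit word in the row selected by the address."""
        self._check(address)
        if len(data_bits) != self.WIDTH:
            raise ValueError("data word must be 4 bits wide")
        self.rows[address] = [int(bool(b)) for b in data_bits]

    def read(self, address):
        """Return a copy of the 4-bit word stored in the selected row."""
        self._check(address)
        return list(self.rows[address])


mem = Sram16x4()
mem.write(5, [1, 0, 1, 1])
word = mem.read(5)
total_bits = Sram16x4.ROWS * Sram16x4.WIDTH
print(word, total_bits)
```

The model only captures which row a given address selects and how wide a word is; timing, precharge, and sensing are ignored.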
A. Decoder Logic
The goal here was to design a 4-to-16 line decoder, i.e.,
a combinational logic circuit that activates one of sixteen
output bits for each input value from 0 to 15, the range of
integer values that can be expressed in four bits. The intended
functionality of the circuit is shown in Figure 1. The circuit
was to be designed such that the delay is at most 116.55 ps
(35% of 333 ps).
We implemented the 4-to-16 decoder using two 2-to-4
decoders and 16 AND gates. We used the static logic style
to implement the two 2-to-4 decoders. To implement the WL
signals, we simply fed each decoder output together with the
clock signal through an AND gate. To avoid glitches, we use
clock signals with different delays to control the decoder
and the WL generation (this part is used to synchronize WL
with clk), as shown in Figure 2. The delay from the rising
edge of the clock signal to the rising edge of the WL signal
was approximately 44 ps. The delay from the falling edge of
the clock to the falling edge of the WL signal was
approximately 46 ps. The average power dissipation of the
decoder was 24.5 µW (calculated assuming that all four inputs
transition from 0 to 1 and from 1 to 0 with the same
probability, i.e., 50%).
Fig. 1. Intended functionality of WL signals in the target design.
Fig. 2. Top-level Cadence schematic for the decoder design and WL
implementation in our memory system.
Fig. 3. Waveform showing the operation of WL signals in our memory system.
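The decoder structure described in this subsection can be sketched at the logic level as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the clock gating is folded into a single AND per output, and the delayed clocks used for glitch avoidance are not modelled.

```python
def decode_2to4(a1, a0):
    """Return the four one-hot outputs of a 2-to-4 decoder."""
    return [int(a1 == (i >> 1) and a0 == (i & 1)) for i in range(4)]


def word_lines(address, clk):
    """Logic-level 4-to-16 decoder with clock-gated word lines.

    Assumption: each word line is the AND of one output from each
    2-to-4 decoder and the clock, so every WL is low while clk is low.
    """
    a = [(address >> k) & 1 for k in range(4)]   # a[0] is the LSB
    low = decode_2to4(a[1], a[0])                # decodes address bits 1..0
    high = decode_2to4(a[3], a[2])               # decodes address bits 3..2
    return [high[i >> 2] & low[i & 3] & clk for i in range(16)]


# Delay budget quoted in the text: 35% of 333 ps.
delay_budget_ps = 0.35 * 333

wl_active = word_lines(9, clk=1)
wl_idle = word_lines(9, clk=0)
print(wl_active.index(1), round(delay_budget_ps, 2))
```

With the clock high, exactly one word line is asserted for each address; with the clock low, none are.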
B. Address and Data Registers
For the address and data registers, we used master–slave
edge-triggered D flip-flops, because such flip-flops are not
susceptible to race conditions, which makes them more stable
than other types of flip-flops. We used the flip-flops to
create the 4-bit address register and the two 4-bit data
registers.
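The behaviour of a master-slave edge-triggered register can be illustrated with a short model. This is a sketch only: it assumes a positive-edge-triggered flip-flop built from two level-sensitive latches, which the report does not state explicitly, and the class names are hypothetical.

```python
class MasterSlaveDFF:
    """D flip-flop modelled as two latches with opposite clock phases."""

    def __init__(self):
        self.master = 0   # output of the master latch
        self.q = 0        # output of the slave latch (flip-flop output)

    def step(self, clk, d):
        if clk == 0:
            self.master = d        # master is transparent, slave holds
        else:
            self.q = self.master   # master holds, slave is transparent
        return self.q


class Register4:
    """4-bit register made of four flip-flops sharing one clock."""

    def __init__(self):
        self.bits = [MasterSlaveDFF() for _ in range(4)]

    def step(self, clk, data):
        return [ff.step(clk, d) for ff, d in zip(self.bits, data)]


reg = Register4()
before_edge = reg.step(0, [1, 0, 1, 1])   # clock low: output unchanged
after_edge = reg.step(1, [0, 0, 0, 0])    # clock high: captured word appears
print(before_edge, after_edge)
```

Because the two latches are never transparent at the same time, a change on D while the clock is high cannot race through to Q, which is the property the text relies on.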
Fig. 4. Circuit-level schematic of address and data registers in our memory
system.
C. Read and Write Circuit
1) Write Circuits: We implemented the write circuit based
on the lecture notes, using two transmission gates per write
circuit that are controlled by the Write Enable (WE) signal.
We connect the outputs of the data register flip-flops to
a voltage-controlled switch that outputs 0 upon input of a LOW
voltage for BL and 1 upon input of a HIGH voltage for BL.
We also added an inverter chain between each of the two data
drivers and its corresponding transmission gate in order to
reduce the write delay: a 4-inverter chain for the complement
of the bit line, with a u factor of 2.39, and a 2-inverter
chain for the bit line, with a u factor of 2.98. Fig. 5 shows
the schematic of a single write circuit in our memory system.
Fig. 5. Schematic of a single write circuit in our memory system.
2) Read Circuit: The bit line of an SRAM cell takes a
relatively long time to discharge (after being charged to
VDD using the PRE signal and the activation of the word line).
To enable reading at higher speeds, we used a sense amplifier,
which senses small changes in the bit line of the SRAM cell
and generates a full-swing output. We then feed the output of
the sense amplifier to the input of the appropriate data register
flip-flop.
Fig. 6. Schematic of the read circuit of our memory system.
D. 6T SRAM Cell
The 6T SRAM cell is a bistable latching circuit. Fig. 7
shows the schematic of our memory cell, where M4 and
M5 are the access transistors, M0 and M1 are the pull-up
transistors, and M2 and M3 are the two pull-down transistors.
M6, M7, and M8 are the three precharge transistors. The
word lines are connected to the gate terminals of the access
transistors. Whenever a particular word line goes high, the
access transistors turn ON. During a write, the write enable
is ON and the cell stores the data from the write driver; in
the next clock cycle the read enable is ON, and the sense
amplifier reads the data stored in the 6T SRAM cell.
We chose the transistor sizes to meet the read margin and
write margin requirements. We tested five different sizing
configurations; the best one, which satisfies the performance
requirements, is listed in Table I.
Fig. 7. Schematic of a single memory cell in our system.
Transistor   Pull-up   Pull-down   Access
Width        90 nm     180 nm      145 nm
TABLE I
SIZING OF THE VARIOUS TRANSISTORS IN OUR SRAM CELLS AND THE
PRECHARGE CIRCUITRY.
Read Margin:
a) VTrip calculation: The trip voltage, taken between M1
and M3 with a VDD of 0.8 V, is calculated to be 380 mV.
b) VRead calculation: The read voltage, taken between M2 and
M4 with BLB (Bit Line Bar) held constant, is calculated to be
174 mV. Therefore, the read margin, VTrip - VRead, is 206 mV,
which is nearly 26% of VDD.
Write Margin: a) To calculate the write margin, BLB is held
constant while BL is swept from 0 to VDD, and we observe the
BL voltage at which the output goes high, which is
approximately 290 mV. b) Note that whenever we sized the
access transistors larger than the pull-down network, we were
not able to get a write margin greater than 36.25 percent of
VDD.
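The margin figures above follow from simple arithmetic on the quoted voltages, which can be checked with a few lines. The variable names are ours; the values are the ones reported in the text.

```python
VDD_MV = 800            # supply voltage, 0.8 V
V_TRIP_MV = 380         # trip voltage reported in the text
V_READ_MV = 174         # read voltage reported in the text
WRITE_MARGIN_MV = 290   # BL voltage at which the cell flips

# Read margin is the difference between trip and read voltages.
read_margin_mv = V_TRIP_MV - V_READ_MV
read_margin_pct = 100 * read_margin_mv / VDD_MV
write_margin_pct = 100 * WRITE_MARGIN_MV / VDD_MV

print(read_margin_mv, read_margin_pct, write_margin_pct)
```

The read margin works out to 206 mV, or 25.75% of VDD (quoted as "nearly 26%"), and 290 mV corresponds to the 36.25% of VDD quoted for the write margin.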
E. SRAM Layout
The 6T cell layout is done using two metal layers, and the
achieved area for a single SRAM cell is 0.81 µm².
The 16x1-bit SRAM layout, along with the precharge circuit,
is included in Fig. 9.
Fig. 8. Layout of a single memory cell in our system.
F. Complete Peripheral
1) Row Write Circuits: The write circuits are used to push
BL and BLB beyond the bistable point of the cell, to the
value that needs to be stored. Each row write circuit has a
write driver, controlled by the write enable signal, that
drives its output to the value of the data. Before BL and BLB
are pushed into a 6T cell, the row address decoder selects
the cell of operation.
2) Row Read Circuits: The row read circuits are controlled
by the address register outputs. The selected BL and BLB
values are read using a sense amplifier. The sense amplifier
also uses a precharge circuit to charge the bit lines, and a
latch is finally used to control, smooth, and filter the
final output.
Fig. 9. Layout of memory cell array.
3) Remaining Peripherals: The remaining peripherals are
split into three parts. The first part is the address
registers combined with the row decoder. This peripheral
generated a delay of about 118.5 + 16.5 + 20 = 155 ps
(decoder delay, register CLK-Q delay, and setup time). The
second peripheral consists of the write driver combined with
the data registers, which feed the data into the write
driver. The output of the write driver should reach the SRAM
cell a few picoseconds before the word-line output from the
row decoder reaches the cell, so the data from the write
driver is delayed by 70 ps using buffers. The third
peripheral consists of the sense amplifier and data latch,
whose combined delay amounted to approximately 225 ps.
(Please see the table at the end of the report for timing
delays and other details.)
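The address-path figure quoted above is a plain sum of three reported delays; the short check below reproduces it. The variable names are ours, and the grouping of the three terms as decoder delay, CLK-Q delay, and setup time is taken from the values listed in the performance table.

```python
# Reported component delays, in picoseconds.
decoder_delay_ps = 118.5    # row decoder delay (before array layout)
clk_to_q_ps = 16.5          # register CLK-Q delay
setup_time_ps = 20.0        # register setup time

# Address registers combined with the row decoder.
address_path_ps = decoder_delay_ps + clk_to_q_ps + setup_time_ps

# Buffer delay inserted on the write data, and the read-side delay.
write_data_buffer_ps = 70.0
sense_and_latch_ps = 225.0

print(address_path_ps, write_data_buffer_ps, sense_and_latch_ps)
```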
Component                   Performance characteristic    Value
Data and address registers  CLK-Q delay                   16.5 ps
                            Setup time                    20 ps
                            Hold time                     0 ps
                            Power dissipation             11.72 µW
Row decoder                 Delay (before array layout)   118.5 ps
                            Power dissipation             24.5 µW
SRAM array                  Read margin                   206 mV
                            Write margin                  > 290 mV
                            Area (of individual cell)     0.81 µm²
                            Cell access time              159 ps
                            Power dissipation
Sense amplifier             Delay                         225 ps
                            Power dissipation
Write circuit               Discharge time of bit line    105 ps
                            Power dissipation
Total read access time                                    96 ps
Total write delay                                         112 ps
TABLE II
PERFORMANCE CHARACTERISTICS OF DIFFERENT COMPONENTS IN OUR
MEMORY SYSTEM.
III. SIMULATION RESULTS
Final Output:
1) The read and write enable signals (we and re) are
complementary. We also plot the internal node Q of the cell
to make the result clearer.
2) Synchronization of signals: PRE = syn(RE AND2 complement
of WL), so PRE = 0 only when re = 1 and wl = 0. SAE = syn(PRE
AND2 RE), so SAE = 1 only when PRE = 1 and re = 1. (To avoid
glitches on SAE, we use "RE AND2 RE" in place of "RE" in
"RE AND2 PRE".)
3) Explanation of the process: In the initial state, Q = 0,
BL = 0, and BLB = 1. When the address (add) is 0, the word
line is asserted (wl = 1). Over the whole process, wl = 1
occurs four times. During the first, a 1 is written to the
cell (Q goes to 1, BL goes to 1, BLB goes to 0). During the
second, the 1 is read from the cell (BL and BLB are
precharged to 1 when PRE = 0, OUT sap is obtained when
SAE = 1, and the data in the read register is 1). A 0 is then
written to the cell and read back from it (read data = 0).
Fig. 10. Simulation results of our memory system. The waveforms show the
transitions of the different signals in our system corresponding to a set of read
and write operations.
IV. DISCUSSION
1) Synchronization: To generate the PRE signal, we
ANDed the RE signal with the complement of the WL
signal, i.e., PRE = RE · NOT(WL). To generate the SAE
signal, we ANDed the PRE signal with the RE signal, i.e.,
SAE = PRE · RE. To avoid glitches in the SAE signal, we
first AND the RE signal with itself and use the result as
the Read Enable signal when generating SAE. The extra gate
does not change the logic value of RE; it only delays it.
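The control logic can be tabulated with a short sketch. It follows the levels stated in Section III (PRE = 0 only when re = 1 and wl = 0, SAE = 1 only when PRE = 1 and re = 1), so it assumes that PRE is active-low, i.e., the complement of the RE AND NOT(WL) term. That polarity is our reading of the text rather than something the report states outright, and the function name is hypothetical.

```python
def control_signals(re, wl):
    """Return (PRE, SAE) for given Read Enable and word-line levels.

    Assumption: PRE is active-low, so it is the complement of the
    precharge request RE AND NOT(WL).
    """
    precharge_request = re & (1 - wl)   # RE AND (NOT WL)
    pre = 1 - precharge_request         # low only while precharging
    re_matched = re & re                # RE AND RE: same value, extra gate
    sae = pre & re_matched              # sense amplifier enable
    return pre, sae


truth_table = {
    (re, wl): control_signals(re, wl)
    for re in (0, 1)
    for wl in (0, 1)
}
for (re, wl), (pre, sae) in sorted(truth_table.items()):
    print(f"re={re} wl={wl} -> PRE={pre} SAE={sae}")
```

Under this reading, the bit lines are precharged while a read is enabled but the word line is still low, and the sense amplifier is enabled once the word line rises during the read.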
2) Layout Issues: The next problem was the layout area
constraint. The area requirement for a single SRAM cell is
0.8 µm². It is extremely hard to satisfy the DRC requirements
with many vias and connectors within such a small area. Our
solution was to reduce the number of connectors, since these
components take up a lot of space once the DRC requirements
are met. We also used upper metal layers to reduce conflicts
in the M1 layer: routing two metal paths at different
voltages in M1 alone demands more space to prevent
interference, whereas placing them on two different layers
lets them fit within a narrower space.
V. FILES IN CADENCE
ID: yc2389 PW: N15390017 Working Directory: /ca-
dence/vlsi proj
VI. CONCLUSION
This report demonstrated the design of a 64-bit memory
system along with the layout of its SRAM array. In
simulation, the whole system is fully functional, with a
reasonable timing sequence for the write and read
operations. The layout of the SRAM array is compact.