UNIT - IV
SUBSYSTEM DESIGN
VLSI
12/03/2024
CONTENTS
DATA PATH SUBSYSTEMS: Subsystem Design, Shifters, Adders, ALUs, Multipliers,
Parity generators, Comparators, Zero/One Detectors, Counters.
ARRAY SUBSYSTEMS:
SRAM, DRAM, ROM, Serial Access Memories.
Outline
UNIT IV
• DATA PATH SUBSYSTEMS
• Shifters, Adders
• ALUs
• Multipliers
• Parity generators
• Comparators
• Zero/One Detectors
• Counters
Shifters
• Logical shift: shifts the number left or right and fills the vacated positions with 0s
  – 1011 LSR 1 = 0101    1011 LSL 1 = 0110
• Arithmetic shift: shifts the number left or right; a right shift sign-extends
  – 1011 ASR 1 = 1101    1011 ASL 1 = 0110
• Rotate: shifts the number left or right and fills the vacated positions with the bits shifted out
  – 1011 ROR 1 = 1101    1011 ROL 1 = 0111
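The three shift types can be checked with a small Python sketch on 4-bit words. The function names and the `WIDTH` constant are illustrative assumptions, not part of the slides, and the shift amount is assumed to lie between 0 and `WIDTH`.

```python
WIDTH = 4                    # word size of the slide examples (assumed)
MASK = (1 << WIDTH) - 1      # 0b1111

def lsl(x, n=1):
    """Logical shift left: vacated low bits are filled with 0."""
    return (x << n) & MASK

def lsr(x, n=1):
    """Logical shift right: vacated high bits are filled with 0."""
    return (x & MASK) >> n

def asr(x, n=1):
    """Arithmetic shift right: vacated high bits copy the sign bit."""
    sign = (x >> (WIDTH - 1)) & 1
    fill = (MASK << (WIDTH - n)) & MASK if sign else 0
    return ((x & MASK) >> n) | fill

def asl(x, n=1):
    """Arithmetic shift left: same bit pattern as a logical shift left."""
    return lsl(x, n)

def ror(x, n=1):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= WIDTH
    x &= MASK
    return ((x >> n) | (x << (WIDTH - n))) & MASK

def rol(x, n=1):
    """Rotate left: bits leaving on the left re-enter on the right."""
    n %= WIDTH
    x &= MASK
    return ((x << n) | (x >> (WIDTH - n))) & MASK

for name, fn in [("LSR", lsr), ("LSL", lsl), ("ASR", asr),
                 ("ASL", asl), ("ROR", ror), ("ROL", rol)]:
    print(f"1011 {name} 1 = {fn(0b1011):04b}")
```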
Example: operand 1110, shifted by one position
• LSL: 1100
• LSR: 0111
• ASR: 1111
• ASL: 1100
• ROR: 0111
• ROL: 1101
4-Bit Barrel Shifter
• A rotate is a shift in which the bits shifted out are inserted into the vacated positions.
• The circuit rotates its contents left by 0 to 3 positions, depending on the selector S.
• Note that a left rotation by three positions is the same as a right rotation by one position in this 4-bit barrel shifter.
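As a behavioral sketch (not the transistor-level circuit), the barrel shifter can be modeled as one multiplexer per output bit that selects an input bit according to S. The list-of-bits representation (index 0 is the LSB) and the function names are assumptions made for illustration.

```python
def barrel_rotate_left(bits, s):
    """Rotate left by s. bits[0] is the LSB.
    Output bit i is taken from input bit (i - s) mod n,
    which is what a per-bit multiplexer controlled by s would select."""
    n = len(bits)
    return [bits[(i - s) % n] for i in range(n)]

def barrel_rotate_right(bits, s):
    """Rotate right by s: output bit i is taken from input bit (i + s) mod n."""
    n = len(bits)
    return [bits[(i + s) % n] for i in range(n)]

word = [1, 1, 0, 1]  # 1011 written LSB first
print(barrel_rotate_left(word, 1))   # 0111 written LSB first
print(barrel_rotate_left(word, 3) == barrel_rotate_right(word, 1))
```

For a 4-bit word, rotating left by 3 and rotating right by 1 select the same input bit for every output position, which is the equivalence noted on the slide.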
4*4 Barrel shifter
ADDERS
– Single-bit Addition
– Carry-Ripple Adder
– Carry-Skip Adder
– Carry-Lookahead Adder
– Carry-Select Adder
– Carry Save Adder
Single-Bit Addition
Half Adder

A B | Cout S
0 0 |  0   0
0 1 |  0   1
1 0 |  0   1
1 1 |  1   0

S = A ⊕ B
Cout = A · B

Full Adder

A B C | Cout S
0 0 0 |  0   0
0 0 1 |  0   1
0 1 0 |  0   1
0 1 1 |  1   0
1 0 0 |  0   1
1 0 1 |  1   0
1 1 0 |  1   0
1 1 1 |  1   1

S = A ⊕ B ⊕ C
Cout = MAJ(A, B, C)
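The truth tables above can be reproduced with a short Python sketch. The function names are illustrative, and the bit-list ripple-carry helper (index 0 is the LSB) is an assumed extension showing how full adders chain into a multi-bit adder.

```python
def half_adder(a, b):
    """Return (S, Cout) for two input bits."""
    return a ^ b, a & b

def full_adder(a, b, c):
    """Return (S, Cout): S is the XOR of the inputs, Cout their majority."""
    s = a ^ b ^ c
    cout = (a & b) | (b & c) | (a & c)
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by chaining full adders."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

# Print the full-adder truth table in the same column order as the slide.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            print(a, b, c, "|", cout, s)
```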
4-Bit Binary Parallel Adder
Carry-Lookahead Adder
Generate / Propagate
– Equations are often factored into generate (G) and propagate (P) signals
– Generate and propagate signals can also be defined for groups spanning several bits
G_i = a_i · b_i
P_i = a_i ⊕ b_i
c_{i+1} = G_i + P_i · c_i
s_i = P_i ⊕ c_i
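The generate/propagate equations can be exercised with a small sketch. It evaluates the carry recurrence bit by bit; a hardware lookahead unit would instead flatten the same recurrence into two-level logic so that all carries are available in parallel. The function name and the integer interface are assumptions.

```python
def gp_add(a, b, cin=0, width=4):
    """Add two unsigned integers using generate/propagate signals.
    Returns (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # G_i = a_i AND b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # P_i = a_i XOR b_i
    c = [cin]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))                      # c_{i+1} = G_i + P_i c_i
    s = 0
    for i in range(width):
        s |= (p[i] ^ c[i]) << i                             # s_i = P_i XOR c_i
    return s, c[width]

print(gp_add(0b0110, 0b0111))  # 6 + 7 = 13, no carry out of 4 bits
```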
CARRY SKIP ADDER
Example
• A = 010101
• B = 101010
• Cin = 0: Result = 111111, Cout = 0
• Cin = 1: Result = 000000, Cout = 1
• Every bit position propagates (A_i ⊕ B_i = 1), so the carry-in passes straight through the block: if Cin = 0 then Cout = 0, and if Cin = 1 then Cout = 1.
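A minimal behavioral sketch of one carry-skip block follows. When every bit of the block propagates, the block's carry-out is taken directly from its carry-in instead of waiting for the ripple chain. The function names and integer interface are assumptions for illustration.

```python
def block_propagate(a, b, width):
    """True when every bit position propagates (a_i XOR b_i = 1)."""
    mask = (1 << width) - 1
    return ((a ^ b) & mask) == mask

def carry_skip_block(a, b, cin, width):
    """Return (sum, cout) for one block of a carry-skip adder."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + cin
    ripple_cout = total >> width
    # The skip multiplexer forwards cin when the whole block propagates.
    cout = cin if block_propagate(a, b, width) else ripple_cout
    return total & mask, cout

for cin in (0, 1):
    s, cout = carry_skip_block(0b010101, 0b101010, cin, 6)
    print(f"Cin={cin}: Result={s:06b}, Cout={cout}")
```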
CARRY SELECT ADDER
CARRY SAVE ADDER
Array multiplier
A 4 × 4 Unsigned Array Multiplier
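(The array is skewed to obtain a rectangular layout.)

                            X3    X2    X1    X0
                       ×    Y3    Y2    Y1    Y0
------------------------------------------------
                          X3Y0  X2Y0  X1Y0  X0Y0
                    X3Y1  X2Y1  X1Y1  X0Y1
              X3Y2  X2Y2  X1Y2  X0Y2
        X3Y3  X2Y3  X1Y3  X0Y3
------------------------------------------------
    P7    P6    P5    P4    P3    P2    P1    P0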
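The partial-product layout can be checked numerically: each term X_i·Y_j is an AND of two bits and carries weight 2^(i+j). The sketch below is behavioral and sums the terms directly rather than modeling the adder cells; the function name is an assumption.

```python
def array_multiply(x, y, width=4):
    """Multiply two unsigned width-bit numbers by summing partial products."""
    product = 0
    for j in range(width):              # one row per multiplier bit Y_j
        for i in range(width):          # one term per multiplicand bit X_i
            pp = ((x >> i) & 1) & ((y >> j) & 1)   # X_i AND Y_j
            product += pp << (i + j)    # column weight 2^(i+j)
    return product

print(f"{array_multiply(0b1011, 0b1101):08b}")  # 11 * 13 = 143
```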
Array Multiplier
[Figure: 4 × 4 array multiplier with inputs x0–x3 and y0–y3 and outputs p0–p7, built from a carry-save adder (CSA) array followed by a carry-propagate adder (CPA). Each array cell has inputs A, B, Sin, and Cin and outputs Sout and Cout. The critical path is marked.]
Rectangular Array
– Squash the array to fit a rectangular floorplan
[Figure: rectangular array multiplier layout with inputs x0–x3 and y0–y3 and outputs p0–p7.]
Wallace Tree
– Reduces the number of partial products
– Built from carry-save adders:
  – Three inputs: a, b, c
  – Two outputs: y, z such that y + z = a + b + c
– Carry-save equations:
  – y_i = a_i ⊕ b_i ⊕ c_i
  – z_{i+1} = a_i b_i + b_i c_i + c_i a_i
Wallace Tree Structure
[Figure: a carry-ripple adder (three full adders on inputs a_i, b_i, c_i with chained carries, producing s0–s2) compared with a carry-save adder (three independent full adders on the same inputs, producing y0–y2 and z1–z3).]
Wallace Tree Operation
– Each level reduces n addends to about 2n/3 addends
– Sum of inputs = sum of outputs
– The reduction can be applied hierarchically
– A more efficient design uses 4-2 adders to reduce n addends to n/2 addends at each level
– A final adder is needed to add the last two numbers
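The reduction can be sketched in Python using unbounded integers as the operands. `carry_save_add` applies the carry-save equations bitwise, and `wallace_sum` groups operands in threes level by level until two remain. Both names and the grouping order are assumptions; a real Wallace tree works column by column on partial-product bits.

```python
def carry_save_add(a, b, c):
    """3:2 compression: return (y, z) with y + z == a + b + c."""
    y = a ^ b ^ c                               # bitwise sum, no carry propagation
    z = ((a & b) | (b & c) | (c & a)) << 1      # majority, shifted to weight i+1
    return y, z

def wallace_sum(operands):
    """Reduce the operand list with carry-save adders, then do one final add."""
    ops = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3          # operands that fit in groups of 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(*ops[i:i + 3]))
        nxt.extend(ops[full:])                  # leftovers pass to the next level
        ops = nxt
    return sum(ops)                             # final carry-propagate addition

print(wallace_sum([3, 5, 7, 9, 11]))
```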
Comparators
• 0's detector: A = 00…000
• 1's detector: A = 11…111
• Equality comparator: A = B
• Magnitude comparator: A < B
1’s & 0’s Detectors
• 1's detector: N-input AND gate
• 0's detector: NOTs + 1's detector (N-input NOR)
[Figure: all-ones detectors built from AND gates on inputs A0–A3 and A0–A7, and an all-zeros detector on the same inputs.]
Equality Comparator
• Check whether each pair of bits is equal (XNOR, also known as an equality gate)
• Apply a 1's detector to the bitwise equality results
[Figure: 4-bit equality comparator; XNOR gates on A[3:0] and B[3:0] feed a 1's detector that produces A = B.]
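Magnitude Comparator
• Compute B – A and look at the sign
• B – A = B + ~A + 1
• For unsigned numbers, the carry out acts as the sign bit: a carry out of 1 means B – A ≥ 0, that is, A ≤ B
[Figure: 4-bit magnitude comparator computing B – A from inputs A3–A0 and B3–B0, with flag outputs C, N, and Z and an A = B output.]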
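The detector and comparator slides can be summarized in one behavioral sketch. All function names are hypothetical. The magnitude comparison follows the B + ~A + 1 formulation for unsigned operands, reading the carry out and a zero detect on the result.

```python
def all_ones(a, width):
    """1's detector: behaves like an N-input AND of the bits."""
    return a & ((1 << width) - 1) == (1 << width) - 1

def all_zeros(a, width):
    """0's detector: behaves like an N-input NOR of the bits."""
    return a & ((1 << width) - 1) == 0

def equal(a, b, width):
    """Equality: bitwise XNOR followed by a 1's detector."""
    mask = (1 << width) - 1
    return all_ones(~(a ^ b) & mask, width)

def compare_unsigned(a, b, width):
    """Compute B - A as B + ~A + 1 and read the carry and zero flags."""
    mask = (1 << width) - 1
    total = (b & mask) + (~a & mask) + 1
    carry = (total >> width) & 1          # 1 means no borrow, so A <= B
    zero = all_zeros(total, width)        # 1 means A == B
    return {"A<=B": bool(carry), "A==B": zero, "A<B": bool(carry) and not zero}

print(compare_unsigned(3, 5, 4))
```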
Counters
Counters can be implemented using adder/subtractor circuits and registers (or, equivalently, D flip-flops).
The simplest counter circuits can be built using T flip-flops, because the toggle feature is naturally suited to the counting operation. Counters fall into two categories.
1. Asynchronous (ripple) counters: asynchronous counters, also known as ripple counters, are not clocked by a common pulse, and hence each flip-flop in the counter changes at a different time.
Examples: binary ripple counters, BCD ripple counters.
2. Synchronous counters: a synchronous counter, however, has an internal clock, and the external event is used to produce a pulse that is synchronized with this internal clock.
Examples: binary counter, up-down binary counter, BCD counter, ring counter, Johnson counter.
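The toggle rule for a synchronous binary up-counter built from T flip-flops can be sketched as follows: bit i toggles on a clock edge only when all lower-order bits are 1. The function name and the bit-list state (index 0 is the LSB) are assumptions for illustration.

```python
def up_counter_step(q):
    """One clock edge of a synchronous up-counter made of T flip-flops.
    q is the current state as a list of bits, LSB first."""
    # T input of stage i is the AND of all lower-order outputs (T0 is always 1).
    t = [int(all(q[:i])) for i in range(len(q))]
    # A T flip-flop toggles when T = 1 and holds when T = 0.
    return [qi ^ ti for qi, ti in zip(q, t)]

state = [0, 0, 0]
for _ in range(8):
    state = up_counter_step(state)
    print("".join(str(b) for b in reversed(state)))
```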
[Figure: a 3-bit up-counter]
[Figure: a 3-bit down-counter]
[Figure: a 4-bit synchronous up-counter]
[Figure: synchronous counter using adders and registers]
Linear-Feedback Shift Registers
A linear-feedback shift register (LFSR) consists of N registers configured as a shift register. The input to the shift register comes from the XOR of particular bits of the register, as shown in the figure for a 3-bit LFSR. On reset, the registers must be initialized to a nonzero value (e.g., all 1s). The pattern of outputs for the LFSR is shown in the table.
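Since the figure and table are not reproduced here, the following is only a sketch of a 3-bit LFSR under an assumed tap choice (the XOR of the two lowest bits feeds the top bit). The slide's actual taps may differ, but any maximal-length 3-bit LFSR cycles through all seven nonzero states.

```python
def lfsr3_step(state):
    """One shift of a 3-bit LFSR. Tap positions are an assumption."""
    feedback = (state ^ (state >> 1)) & 1      # XOR of bit 0 and bit 1
    return (state >> 1) | (feedback << 2)      # shift right, feed into bit 2

state = 0b111                                  # nonzero reset value (all 1s)
sequence = []
for _ in range(7):
    sequence.append(f"{state:03b}")
    state = lfsr3_step(state)
print(sequence, "-> back to", f"{state:03b}")
print("all-zero state stays stuck:", lfsr3_step(0) == 0)
```

The last line shows why the reset value must be nonzero: the XOR of zeros is zero, so an all-zero register never leaves that state.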
Array Subsystems
SRAM
DRAM
ROM
Serial Access Memories
Content Addressable Memory
Memory Arrays
• Random Access Memory
  – Read/Write Memory (RAM) (volatile)
    – Static RAM (SRAM)
    – Dynamic RAM (DRAM)
  – Read Only Memory (ROM) (nonvolatile)
    – Mask ROM
    – Programmable ROM (PROM)
    – Erasable Programmable ROM (EPROM)
    – Electrically Erasable Programmable ROM (EEPROM)
    – Flash ROM
• Serial Access Memory
  – Shift Registers
    – Serial In Parallel Out (SIPO)
    – Parallel In Serial Out (PISO)
  – Queues
    – First In First Out (FIFO)
    – Last In First Out (LIFO)
• Content Addressable Memory (CAM)
CLASSIFICATION
• Mask-programmed ROMs: data is written during chip fabrication using a photomask.
• Fused ROMs: data is written by blowing fuses electrically, and hence cannot be modified later.
• Programmable Read Only Memories (PROMs): data is written after chip fabrication.
• Erasable PROMs (EPROMs): the complete block is erased using UV light that passes through a transparent window in the package.
• Electrically Erasable PROMs (EEPROMs): data is erased 8 bits at a time, hence slower.
• Flash: programmed using a high voltage; data is erased in blocks, hence faster.
Architecture
• Stores a large number of bits
  – m × n: m words of n bits each
  – k = log2(m) address input signals
  – or m = 2^k words
  – e.g., 4,096 × 8 memory:
    – 32,768 bits
    – 12 address input signals
    – 8 input/output data signals
• Memory access
  – r/w: selects read or write
  – enable: read or write only when asserted
  – multiport: multiple accesses to different locations simultaneously
[Figure: external view of a 2^k × n read and write memory (m words of n bits per word), with address inputs A0 … A(k-1), r/w and enable controls, and data lines Q0 … Q(n-1).]
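The sizing arithmetic for an m × n memory can be checked with a few lines of Python. The helper name and the returned dictionary keys are illustrative assumptions.

```python
def memory_shape(m, n):
    """Return total bits, address lines, and data lines for an m x n memory."""
    k = (m - 1).bit_length()     # smallest k with 2**k >= m, i.e. ceil(log2(m))
    return {"bits": m * n, "address_lines": k, "data_lines": n}

print(memory_shape(4096, 8))
```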
Semiconductor Memory Types (Cont.)
• RAM: the stored data is volatile
  – DRAM
    – A capacitor stores the data, and a transistor accesses the capacitor
    – Needs a refresh operation
    – Low cost and high density, so it is used for main memory
  – SRAM
    – Consists of a latch
    – Does not need a refresh operation
    – High speed and low power consumption, so it is mainly used for cache memory and memory in hand-held devices
Memory
• Nonvolatile
• Can be read from, but not written to, by a processor in a microcomputer system
• Traditionally written to ("programmed") before being inserted into the microcomputer system
• Uses
  – Store software programs for a general-purpose processor
  – Store constant data (parameters) needed by the system
  – Implement combinational circuits (e.g., decoders)
[Figure: external view of a 2^k × n ROM, with address inputs A0 … A(k-1), an enable input, and data outputs Q0 … Q(n-1).]
Example: 8 x 4 ROM
• Horizontal lines = words
• Vertical lines = data
• Lines are connected only at circles
• The decoder sets word 2's line to 1 if the address input is 010
• Data lines Q3 and Q1 are set to 1 because there is a "programmed" connection with word 2's line
• Word 2 is not connected to data lines Q2 and Q0
• Output is 1010
[Figure: internal view of an 8 × 4 ROM. A 3 × 8 decoder with enable and address inputs A0–A2 drives the word lines (word 0, word 1, word 2, …), which cross data lines Q3–Q0 at programmable connections.]
Memory – ROM
• ROM arrays
  – There are two basic types of ROM arrays:
    1) NOR-based ROM
    2) NAND-based ROM
• NOR-based ROM
  – All column lines are pulled up using a PMOS transistor (or resistor).
  – The row lines are connected to the gates of NMOS transistors at the intersections of row and column lines.
  – The presence or absence of an NMOS transistor dictates whether a 1 or a 0 is stored.
  – If the NMOS transistor is present, it pulls the column line down when its gate is driven high by the row line.
  – If the NMOS transistor is absent, the column line is not pulled down.
ROM
• NOR-based ROM
  – To read from the array, the row line is asserted and the desired column line is observed.
  – A NOR-based ROM is similar to a hex keypad.
ROM
NAND-based ROM
• A NAND-based ROM is a different array architecture.
• It uses a depletion-load NMOS as the pull-up transistor.
• The column NMOS transistors are connected in series with the column lines (i.e., a NAND configuration).
• If an NMOS exists in the column line and the row line is asserted, the NMOS will pull the column line down and represent a stored '0'.
• If an NMOS is absent on the column line and the row line is asserted, the column line will remain pulled high by the depletion NMOS and represent a stored '1'.
• Since all of the NMOS transistors are in series, in order to read from a row, all other rows must be turned ON.
• This means that, in order to distinguish the row we are asserting, we must drive it differently from all the other rows.
ROM
• NAND-based ROM
  – In this configuration, if an NMOS is present, it represents a stored '1': in order to address its location, the row line is driven to '0' and the NMOS is not turned on. This leaves the column line pulled HIGH.
  – If an NMOS is absent, it represents a stored '0', since all of the other row NMOS transistors are turned on and will pull the column line LOW.
  – This gives the opposite behavior to a NOR-based ROM:

                    NOR   NAND
    NMOS present     0     1
    NMOS absent      1     0

  – It also gives a complementary addressing scheme:

                                      NOR   NAND
    Address row line by driving:       1     0
    All other row lines driven to:     0     1
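The two read schemes can be compared with a switch-level sketch. `present[r][c]` marks whether an NMOS transistor exists at row r, column c; this representation and the function names are assumptions for illustration, and an absent transistor in the NAND chain is modeled as a plain wire.

```python
def nor_rom_read(present, row):
    """NOR ROM: selected row driven to 1, all others to 0.
    A column goes low if any present transistor on it is turned on."""
    lines = [1 if r == row else 0 for r in range(len(present))]
    word = []
    for c in range(len(present[0])):
        pulled_low = any(present[r][c] and lines[r] == 1
                         for r in range(len(present)))
        word.append(0 if pulled_low else 1)
    return word

def nand_rom_read(present, row):
    """NAND ROM: selected row driven to 0, all others to 1.
    A column goes low only if the whole series chain conducts."""
    lines = [0 if r == row else 1 for r in range(len(present))]
    word = []
    for c in range(len(present[0])):
        conducts = all((not present[r][c]) or lines[r] == 1
                       for r in range(len(present)))
        word.append(0 if conducts else 1)
    return word

layout = [[True, False],
          [False, True]]
print(nor_rom_read(layout, 0), nand_rom_read(layout, 0))
```

With the same transistor layout, the two arrays return complementary words, matching the table above.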
Mask-programmed ROM
• Connections "programmed" at fabrication
  – set of masks
• Lowest write ability
  – only once
• Highest storage permanence
  – bits never change unless damaged
• Typically used for the final design of high-volume systems
  – spreads out the NRE (non-recurring engineering) cost for a low unit cost
EPROM
[Figure: sample EPROM components]
[Figure: sample EPROM programmers]
EEPROM: Electrically Erasable Programmable ROM
• Programmed and erased electronically
  – typically by using a higher-than-normal voltage
  – can program and erase individual words
• Better write ability
  – can be in-system programmable, with a built-in circuit to provide the higher-than-normal voltage
  – a built-in memory controller is commonly used to hide details from the memory user
  – writes are very slow due to erasing and programming
    – a "busy" pin indicates to the processor that the EEPROM is still writing
  – can be erased and programmed tens of thousands of times
• Similar storage permanence to EPROM (about 10 years)
• Far more convenient than EPROMs, but more expensive
FLASH
• Extension of EEPROM
  – Same write ability and storage permanence
• Fast erase
  – Large blocks of memory are erased at once, rather than one word at a time
  – Blocks are typically several thousand bytes large
• Writes to single words may be slower
  – The entire block must be read, the word updated, and then the entire block written back
• Used in embedded microcomputer systems that store large data items in nonvolatile memory
  – e.g., digital cameras, MP3 players, cell phones
DRAM
DRAMs store their contents as charge on a capacitor rather than in a feedback loop. The cell must be periodically read and refreshed so that its contents do not leak away. Like an SRAM cell, a DRAM cell is accessed by asserting the wordline to connect the capacitor to the bitline.
Serial Access Memories
• Serial access memories do not use an address
  – Shift registers
    – Serial In Parallel Out (SIPO)
    – Parallel In Serial Out (PISO)
  – Queues (FIFO, LIFO)
Shift Register
– Shift registers store and delay data
– Simple design: cascade of registers
– Watch your hold times!
[Figure: shift register built as a cascade of registers clocked by clk, from Din to Dout.]
Serial In Parallel Out
– A 1-bit shift register reads in serial data
– After N steps, it presents an N-bit parallel output
[Figure: SIPO shift register with serial input Sin, clock clk, and parallel outputs P0–P3.]
12/03/2024
– Load all N bits in parallel when
shift = 0
– Then shift one bit out per
cycle
shift/load
clk
P0 P1 P2 P3
Sout
95
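A cycle-level sketch of the two shift-register styles is given below. The function names, the list representation (index 0 is the first stage), and the choice of the last stage as the serial output are assumptions made for illustration.

```python
def sipo_clock(bits, sin):
    """Serial in, parallel out: shift one serial bit into the first stage."""
    return [sin] + bits[:-1]

def piso_clock(bits, shift, parallel_in=None):
    """Parallel in, serial out: load when shift = 0, otherwise shift by one.
    Returns (new_state, sout), where sout is the last stage."""
    if not shift:
        new_bits = list(parallel_in)
    else:
        new_bits = [0] + bits[:-1]
    return new_bits, new_bits[-1]

# Load 4 bits in parallel, stream them out serially, and collect them again.
piso, sout = piso_clock([0, 0, 0, 0], shift=0, parallel_in=[1, 0, 1, 1])
stream = [sout]
for _ in range(3):
    piso, sout = piso_clock(piso, shift=1)
    stream.append(sout)

sipo = [0, 0, 0, 0]
for bit in stream:
    sipo = sipo_clock(sipo, bit)
print(stream, sipo)
```

After N = 4 clocks the SIPO register holds the word that was loaded into the PISO register.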
Queues
– First In First Out (FIFO)
  – Initialize the read and write pointers to the first element
  – Queue is EMPTY
  – On write, increment the write pointer
  – If the write pointer almost catches the read pointer, the queue is FULL
  – On read, increment the read pointer
– Last In First Out (LIFO)
  – Also called a stack
  – Use a single stack pointer for read and write
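The FIFO pointer rules can be sketched as a circular buffer. The class name and the choice of keeping one slot unused (so that "write almost catches read" signals FULL while equal pointers signal EMPTY) are assumptions consistent with the bullets above.

```python
class Fifo:
    def __init__(self, size):
        self.mem = [None] * size
        self.rd = 0                  # read pointer
        self.wr = 0                  # write pointer

    def empty(self):
        return self.rd == self.wr

    def full(self):
        # The write pointer is one step behind the read pointer.
        return (self.wr + 1) % len(self.mem) == self.rd

    def write(self, value):
        if self.full():
            raise OverflowError("FIFO is full")
        self.mem[self.wr] = value
        self.wr = (self.wr + 1) % len(self.mem)

    def read(self):
        if self.empty():
            raise IndexError("FIFO is empty")
        value = self.mem[self.rd]
        self.rd = (self.rd + 1) % len(self.mem)
        return value

q = Fifo(4)
for v in ("a", "b", "c"):
    q.write(v)
print(q.full(), q.read(), q.read(), q.read(), q.empty())
```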
DRAM
READ
• On a read, the bitline is first precharged to VDD/2.
• When the wordline rises, the capacitor shares its charge with the bitline, causing a voltage change that can be sensed.
• Some DRAMs drive the wordline to VDDP = VDD + Vt to avoid a degraded level when writing a '1'.
• The DRAM capacitor must be as physically small as possible to achieve good density.
• The voltage swing on the bitline during readout follows from the charge-sharing equation.
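The equation itself did not survive on this slide. A commonly quoted form of the charge-sharing relation, assuming a cell capacitance C_cell that stores a full 0 or VDD level and a bitline capacitance C_bit precharged to VDD/2 (both symbols are assumptions here), is:

```latex
\Delta V = \frac{V_{DD}}{2} \cdot \frac{C_{\text{cell}}}{C_{\text{cell}} + C_{\text{bit}}}
```

Because the bitline capacitance is usually much larger than the cell capacitance, the swing is only a small fraction of VDD, which is why the change must be sensed and amplified.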
Content Addressable Memories
– Extension of ordinary memory (e.g., SRAM)
  – Read and write memory as usual
  – Also match to see which words contain a key
[Figure: CAM block with inputs adr, data/key, read, and write, and output match.]
What is CAM?
• Content addressable memory is a special kind of memory.
• Read operation in traditional memory:
  – The input is the address of the content we are interested in.
  – The output is the content at that address.
• In a CAM it is the reverse:
  – The input is associated with something stored in the memory.
  – The output is the location where the associated content is stored.

Stored contents (the same in both examples):

  Address   Content
  0 0       1 0 1 X X
  0 1       0 1 1 0 X
  1 0       0 1 1 X X
  1 1       1 0 0 1 1

• Content addressable memory: input 0 1 1 0 1, output 0 1
• Traditional memory: input 0 1, output 0 1 1 0 X
Simplified CAM Block Diagram
• The input to the system is the search word.
• The search word is broadcast on the search lines.
• Each match line indicates whether there is a match between the search word and the stored word.
• The encoder specifies the match location.
• If there are multiple matches, a priority encoder selects the first match.
• The hit signal indicates whether or not any match was found.
• Search words are long, ranging from 36 to 144 bits.
• Table sizes range from a few hundred to 32K entries.
• Address space: 7 to 15 bits.
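The search flow (match lines, priority encoder, hit signal) can be sketched in Python using the four stored words from the earlier CAM example, with "X" as a don't-care. The function name and the string representation are assumptions for illustration.

```python
def cam_search(entries, key):
    """Return (hit, address, match_lines) for a ternary CAM lookup."""
    # One match line per stored word: every symbol must equal the key bit
    # or be a don't-care.
    match_lines = [all(s in ("X", k) for s, k in zip(stored, key))
                   for stored in entries]
    hit = any(match_lines)
    # Priority encoder: the lowest matching address wins.
    address = match_lines.index(True) if hit else None
    return hit, address, match_lines

table = ["101XX", "0110X", "011XX", "10011"]
hit, address, lines = cam_search(table, "01101")
print(hit, format(address, "02b"), lines)
```

The key 01101 matches both the second and third entries, and the priority encoder reports address 01, as in the slide's example.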
CAMs
• Binary CAM (BCAM) stores only 0s and 1s
  – Applications: MAC table lookup, Layer 2 security-related VPN segregation.
• Ternary CAM (TCAM) stores 0s, 1s, and don't-cares
  – Applications: cases that need wildcards, such as Layer 3 and 4 classification for QoS and CoS purposes, and IP routing (longest prefix matching).
• Available sizes: 1 Mb, 2 Mb, 4.7 Mb, 9.4 Mb, and 18.8 Mb.
• CAM entries are structured as multiples of 36 bits rather than 32 bits.
Advantages
• They associate the input (comparand) with their memory contents in one clock cycle.
• They are configurable in multiple formats of width and depth of search data, which allows searches to be conducted in parallel.
• CAMs can be cascaded to increase the size of the lookup tables they can store.
• New entries can be added to the table, so a CAM can learn what it did not know before.
• They are one of the appropriate solutions for higher speeds.
Disadvantages
• They cost several hundred dollars per CAM, even in large quantities.
• They occupy a relatively large footprint on a card.
• They consume excessive power.
• Generic system engineering problems:
  – Interface with the network processor.
  – Simultaneous table update and lookup requests.
Booth multiplication