We present a Ternary Content-addressable Memory (TCAM) design based on floating-gate (flash) transistors.
TCAMs are used extensively in high-speed IP networking and are commonly found in routers in the internet core. Traditional TCAM ICs are built using CMOS devices, and a single TCAM cell uses 17 transistors. In contrast, our TCAM cell uses only 2 flash transistors, significantly reducing circuit area.
We focus mainly on the TCAM block, which performs fast parallel IP routing table lookup. Our flash-based TCAM (FTCAM) block is simulated in SPICE, and we show that it occupies significantly less area than a CMOS-based TCAM block while operating fast enough to meet the current (400 Gb/s) data rates found in the internet core.
1. An Area-efficient Ternary CAM Design using Floating Gate Transistors
Viacheslav Fedorov
Monther Abusultan
Sunil P. Khatri
2. Key Contributions
• First TCAM design using flash transistors
• 2 transistors per TCAM cell (17 for CMOS)
• 1 transistor per port cell (6 for CMOS)
• Layout and SPICE simulations
– 8 times denser than CMOS TCAM
– 1.6x lower power consumption
– Operates at today’s line rates (~400 Gb/s)
4. Motivation
• Internet backbone (core) operates at extreme
speeds
– 100s of Gb/s
• Fast IP routers crucial to sustain the internet
• Hardware Ternary Content-addressable Memory
used for core routers
– Enables lookup of IP addresses in parallel
– Increases routing speed dramatically
• Drawbacks: large area, high power consumption
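5. IP Routing
[Figure: Router 1 and Router 2 with interfaces labeled A, B, C, D and E; a packet addressed “To: 01001” is forwarded using the routing tables below. Addresses shown in the figure: 01000, 01001.]
First routing table:
Address Interface
01001 B
01010 C
01011 C
Second routing table:
Address Interface
01001 D
11000 E
11001 E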
6. TCAM operation
• Ternary (entries can have “0”, “1” or “X”)
Table with binary entries only:
Address Interface
01000 A
01001 A
01010 A
01011 A
10000 B
Same table with ternary entries:
Address Interface
010XX A
10000 B
• Content-addressable
– Example lookup key: 01000
• High-speed hardware-parallel lookups
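The ternary matching rule on this slide can be sketched in software as follows. This is only a behavioral illustration: the function names `ternary_match` and `tcam_lookup` are hypothetical, and the loop stands in for comparisons that the hardware performs in parallel.

```python
def ternary_match(entry: str, key: str) -> bool:
    """Return True if every non-X position of the entry equals the key bit."""
    if len(entry) != len(key):
        raise ValueError("entry and key must have the same width")
    return all(e in ("X", k) for e, k in zip(entry, key))


def tcam_lookup(table, key):
    """Return the interfaces of all matching entries.

    A hardware TCAM compares every row at once; this loop only models the result.
    """
    return [iface for entry, iface in table if ternary_match(entry, key)]


# The two tables from the slide.
binary_table = [("01000", "A"), ("01001", "A"), ("01010", "A"),
                ("01011", "A"), ("10000", "B")]
ternary_table = [("010XX", "A"), ("10000", "B")]

print(tcam_lookup(ternary_table, "01000"))
```

For every 5-bit key the two tables return the same interface, which is why the single “010XX” entry can replace four binary entries.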
7. Longest Prefix Matching
• “010XX” : “010” (prefix) U “XX” (mask)
• IP address might match more than one entry
– “01000” matches “0100X” and “010XX” below
• Select the entry with longest prefix (fewer “X”s)
• Longer prefix = more specific routing information
Address Interface
010XX A
0100X C
000XX D
1XXXX E
110XX B
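As a minimal sketch of longest prefix matching over the table above, the following scans all entries and keeps the match with the fewest “X” positions. The helper names are hypothetical, and the sequential scan is a software stand-in, not the hardware mechanism.

```python
def prefix_length(entry: str) -> int:
    """Number of specified (non-X) bits in a ternary entry."""
    return sum(1 for bit in entry if bit != "X")


def longest_prefix_match(table, key):
    """Return the (entry, interface) pair with the longest matching prefix, or None."""
    best = None
    for entry, iface in table:
        matches = all(e in ("X", k) for e, k in zip(entry, key))
        if matches and (best is None or prefix_length(entry) > prefix_length(best[0])):
            best = (entry, iface)
    return best


# Routing table from the slide.
table = [("010XX", "A"), ("0100X", "C"), ("000XX", "D"),
         ("1XXXX", "E"), ("110XX", "B")]

print(longest_prefix_match(table, "01000"))
```

Here “01000” matches both “010XX” and “0100X”; the second entry has the longer prefix, so interface C is selected.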
9. Previous work
• TCAM research largely done using CMOS
• Monolithically stacked TCAM
– 3D stacking memory array on top of search circuitry
– Programmable vias replace SRAM
– 4x cell density, 3.5x dynamic power reduction
– Orthogonal to our ideas
• Resistive TCAM cells
– Utilizing PCM and STT-MRAM technology
– Up to 20x cell density
– Relatively high latency (several nanoseconds)
– Early stages of design
10. Previous work
• Research on Flash devices
– Device characterization
– Cell program/erase optimization
– Wear leveling algorithms
– Do not consider using them in TCAM circuits
12. Our approach: Overview
• Routing entries stored in blocks
– Fixed number of blocks for each mask length
• Single LPM block
• Shadow blocks
– Control route flaps
– Control burst updates
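One way a shadow block could absorb route flaps and update bursts is sketched below. The slides do not describe the mechanism, so everything here is an assumption: the class `ShadowedBlock`, its methods, and the policy of writing only net changes to flash on a flush are all hypothetical.

```python
class ShadowedBlock:
    """Flash-backed routing block with a pending-update (shadow) buffer.

    Assumption: updates land in the shadow buffer first, and only the net
    change per prefix is written to flash when the buffer is flushed.
    """

    def __init__(self):
        self.flash = {}        # prefix -> interface, as stored in flash
        self.pending = {}      # prefix -> interface, or None for a withdrawal
        self.flash_writes = 0  # number of flash entries actually rewritten

    def update(self, prefix, iface):
        """Record an announcement (iface) or a withdrawal (iface=None)."""
        self.pending[prefix] = iface

    def flush(self):
        """Write only the entries whose final state differs from flash."""
        for prefix, iface in self.pending.items():
            if self.flash.get(prefix) == iface:
                continue  # flap cancelled out: no flash write needed
            if iface is None:
                self.flash.pop(prefix, None)
            else:
                self.flash[prefix] = iface
            self.flash_writes += 1
        self.pending.clear()


block = ShadowedBlock()
block.update("010XX", "A")
block.flush()                  # one real write
for _ in range(3):             # a route flap: withdraw, then re-announce
    block.update("010XX", None)
    block.update("010XX", "A")
block.flush()                  # net change is zero
print(block.flash_writes)
```

In this sketch the six flapping updates cause no additional flash writes, which is the kind of filtering that would extend flash lifetime.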
13. Our approach: TCAM Block
• Address is looked up in TCAM portion of the
block
– 256 entries looked up in parallel, at most one
matches (implemented using matchline)
• Matched entry has its port memory driven out
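The block organization can be modeled behaviorally as below. This is a sketch under stated assumptions: it uses one block per mask length (the slides say a fixed number), 5-bit addresses as in the earlier examples, and it assumes that "at most one matches" follows from all entries in a block sharing a mask length and being distinct. `TcamBlock` and `BlockedTcam` are hypothetical names.

```python
class TcamBlock:
    """One block holding entries that all share the same mask length."""

    def __init__(self, mask_len, width=5, capacity=256):
        self.mask_len = mask_len
        self.width = width
        self.capacity = capacity
        self.rows = []  # list of (ternary entry, port)

    def add(self, entry, port):
        well_formed = (
            len(entry) == self.width
            and set(entry) <= set("01X")
            and entry.count("X") == self.mask_len
            and entry.endswith("X" * self.mask_len)
        )
        if not well_formed:
            raise ValueError(f"{entry!r} does not fit mask length {self.mask_len}")
        if any(stored == entry for stored, _ in self.rows):
            raise ValueError(f"duplicate entry {entry!r}")
        if len(self.rows) >= self.capacity:
            raise ValueError("block is full")
        self.rows.append((entry, port))

    def lookup(self, key):
        """Evaluate every row (in hardware: in parallel) and drive out the port."""
        matchlines = [all(s in ("X", k) for s, k in zip(entry, key))
                      for entry, _ in self.rows]
        assert sum(matchlines) <= 1  # same mask length + distinct entries
        for hit, (_, port) in zip(matchlines, self.rows):
            if hit:
                return port
        return None


class BlockedTcam:
    """Blocks indexed by mask length, plus a longest-prefix selection step."""

    def __init__(self, width=5):
        self.width = width
        self.blocks = {}  # mask length -> TcamBlock

    def add(self, entry, port):
        mask_len = entry.count("X")
        if mask_len not in self.blocks:
            self.blocks[mask_len] = TcamBlock(mask_len, self.width)
        self.blocks[mask_len].add(entry, port)

    def lookup(self, key):
        hits = {m: b.lookup(key) for m, b in self.blocks.items()}
        for mask_len in sorted(hits):  # fewest X's first = longest prefix
            if hits[mask_len] is not None:
                return hits[mask_len]
        return None


tcam = BlockedTcam()
for entry, port in [("010XX", "A"), ("0100X", "C"), ("000XX", "D"),
                    ("1XXXX", "E"), ("110XX", "B")]:
    tcam.add(entry, port)
print(tcam.lookup("01000"))
```

Each block reports at most one port, so the final selection only has to pick the hit from the block with the shortest mask.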
14. Our approach: TCAM Row
• Matchline (precharged) spans 256 TCAM
cells horizontally
– Large delay for any row
• Split the matchline into smaller (8-bit) sections
– Cascade mismatch propagation
– Use keepers to speed up the lookup
[Figure: a matchline spanning a row of 256 TCAM cells]
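The effect of splitting the matchline can be illustrated with a purely logical model: each 8-bit section evaluates its own cells, and the row matches only if no section reports a mismatch. This sketch says nothing about timing or keepers, and the function names are hypothetical.

```python
SECTION_BITS = 8  # section width from the slide


def section_matches(stored: str, key: str) -> bool:
    """A section stays precharged only if none of its cells mismatches."""
    return all(s in ("X", k) for s, k in zip(stored, key))


def evaluate_matchline(stored_row: str, key: str, section_bits: int = SECTION_BITS):
    """Return (row_match, per_section_results) for one TCAM row."""
    if len(stored_row) != len(key):
        raise ValueError("row and key must have the same width")
    results = [
        section_matches(stored_row[i:i + section_bits], key[i:i + section_bits])
        for i in range(0, len(key), section_bits)
    ]
    # A mismatch in any section propagates through the cascade to the row output.
    return all(results), results


row = "0" * 8 + "X" * 248          # 256 cells: 8 specified bits, the rest masked
key = "0" * 8 + "1" * 248
match, sections = evaluate_matchline(row, key)
print(match, len(sections))
```

A 256-cell row yields 32 sections of 8 cells each, so each section's matchline is far shorter than one spanning the whole row.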
16. Our approach: Lookup “1”
For lookup of “1”:
a(i) = RH
b(i) = RL
• Stored “1”: match stays precharged
• Stored “0”: match pulled down
• Stored “X”: match stays precharged
17. Our approach: Lookup “0”
For lookup of “0”:
a(i) = RL
b(i) = RH
• Stored “1”: match pulled down
• Stored “0”: match stays precharged
• Stored “X”: match stays precharged
18. Flash versus CMOS TCAM Cells
[Figure: schematics of the flash TCAM cell and the CMOS TCAM cell, each connected to the match line; voltage labels in the figure: 0.2v, 0.7v, 0.7v]
19. Our approach: Proof of correctness
Threshold and read voltages
[Figure: threshold and read voltage diagram for the match line, with levels 0.6v, 0.21v, 0.76v and 1.1v and annotations Store “1”, Store “0”, Lookup “1” and Lookup “0”]
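The correctness argument can be checked with a small threshold model. Several things here are assumptions, not statements from the slides: the transcript does not say which of the four voltages is which, so the sketch takes RL = 0.21 V and RH = 0.76 V as read voltages and 0.6 V and 1.1 V as the low and high programmed thresholds; it also assumes the two flash transistors pull the match line down in parallel and that a transistor conducts when its gate voltage exceeds its threshold.

```python
# Assumed voltage assignment (volts); only the ordering RL < VT_LOW < RH < VT_HIGH matters.
R_LOW, R_HIGH = 0.21, 0.76
VT_LOW, VT_HIGH = 0.6, 1.1

# Assumed programming: (threshold of transistor a, threshold of transistor b).
PROGRAM = {
    "1": (VT_HIGH, VT_LOW),
    "0": (VT_LOW, VT_HIGH),
    "X": (VT_HIGH, VT_HIGH),
}

# Gate drive (a(i), b(i)) per looked-up bit, as given on the lookup slides.
DRIVE = {
    "1": (R_HIGH, R_LOW),
    "0": (R_LOW, R_HIGH),
}


def conducts(v_gate: float, v_threshold: float) -> bool:
    """Idealized transistor: on when the gate voltage exceeds the threshold."""
    return v_gate > v_threshold


def pulls_match_down(stored: str, lookup_bit: str) -> bool:
    """True if either transistor of the cell discharges the match line."""
    vt_a, vt_b = PROGRAM[stored]
    v_a, v_b = DRIVE[lookup_bit]
    return conducts(v_a, vt_a) or conducts(v_b, vt_b)


for stored in "01X":
    for bit in "01":
        state = "pulled down" if pulls_match_down(stored, bit) else "stays precharged"
        print(f"stored {stored}, lookup {bit}: match {state}")
```

Under these assumptions the match line is discharged only when a stored “0” or “1” differs from the looked-up bit, and a stored “X” never discharges it.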
29. Lifetime Estimation
• 535K UPDATES to flash blocks without CMOS shadow blocks
• 210K UPDATES to flash blocks with CMOS shadow blocks
• Observations:
– CMOS shadow blocks filter 61% of UPDATES
– Average time between flushes to flash blocks: ~5 min
– Several cases of 7 flushes within 1 second
• Can support this with double-buffering
– No packets are lost
• Estimated TCAM lifetime is 5 years (worst case)
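The 61% figure is consistent with the two UPDATE counts on this slide, as this short check shows. It uses only the rounded counts given above (535K and 210K); the variable names are illustrative.

```python
updates_without_shadow = 535_000  # UPDATES reaching flash without shadow blocks
updates_with_shadow = 210_000     # UPDATES reaching flash with shadow blocks

filtered = updates_without_shadow - updates_with_shadow
filtered_fraction = filtered / updates_without_shadow

print(f"{filtered} UPDATES filtered ({filtered_fraction:.1%})")
```

The fraction comes out just under 61%, matching the slide after rounding.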
30. Conclusion
• First to design a TCAM using flash transistors
• Extremely high density
– TCAM cell: 2 transistors vs 17 with CMOS
– Port memory cell: 1 transistor vs 6 with CMOS
• Area improvement 8x
• Power improvement 1.64x
• Exceeds current internet backbone data rates
(~400 Gb/s)
• > 5-year lifetime