poster_final

Simulation tools and the process design kit were donated by Synopsys Inc.
Funding for this research is provided through a grant by DARPA.
Design Optimization of Write Circuit for Spin Transfer Torque Magnetic Latches and Look-Up-Tables in 32/28nm CMOS Process
School of Engineering, San Francisco State University
Andrew Miller <amiller2@mail.sfsu.edu>, Hamid Mahmoodi <mahmoodi@sfsu.edu>
Contributors: Tyler Sheaves <tsheaves@mail.sfsu.edu>, Aliyar Attaran <aattaran@mail.sfsu.edu>
Canonical Issues
Moore’s law… blah blah blah. The problem for modern information storage reaches beyond density: memory is also the speed and power choke-point for many high-end products, where density is often the enemy. Processor performance gains are beginning to degrade as wire widths approach molecular sizes, and non-volatile caches need to be located on a non-CMOS chip. Similar multi-die issues arise in FPGA designs, since most FPGAs lack a non-volatile single-chip solution.
New Issues/Applications
What about the security of your design? Most designers
ship out their designs to a fab, relying on NDAs to
protect their IP. What if you could configure your
hardware with software, like an FPGA, but retain the
speed and power of custom gates? STT-based magnetic switching could offer this through non-volatile, low-power look-up tables (LUTs) that replace gates to obfuscate functionality.
How about a computer that turns on instantly? Integrating non-volatile memory into a CMOS process would enable true single-chip computer systems. STT-MTJ devices offer this flexibility.
What’s wrong with modern memory?
State of the Art & Spintronics
Goal of Spintronics: Manipulate the electron’s spin degree of freedom for interesting effects.
Existing Technology:
• Giant Magneto-Resistance (GMR), commercialized in modern Hard Disk Drives
• Tunneling Magneto-Resistance (TMR), realized in Magnetic Tunnel Junctions
The figure to the right shows two magnetic layers. One is permanently magnetized (the fixed layer) and the other is a ferromagnetic thin film (the free layer). When current passes through the two layers, electrons become spin polarized by one layer and apply a spin torque to the other. When enough torque is applied (i.e., at the critical current), magnetic domain switching can occur. The longer this current is applied, the more likely the flip is to occur. As the figure below shows, the energy has only two stable states; therefore the device makes a good “binary storage element.”
WARNING: PHYSICS DETOUR BELOW!
Structure of STT-MTJ, the “Spin Valve”, The STT Unit Bit Cell
An MTJ device is processed using thin-film deposition in a CMOS fabrication plant. The device controls electron tunneling, with no classical path of conduction between the two magnetic layers.
Barrier to Scalability: Writing States
STT-Latch Write Path Sizing Requirements
Δϴ≈π PASS
FAIL
The single latch topology uses two MTJs to enhance reliability.
• Current Reference mode converts the magnetic state (resistance) to a voltage.
• Differentially programmed MTJs yield better sensing reliability (no reference needed).
• The standard 1.05 Volts requires > 1.3 µm².
The relatively large write current is the present bottleneck that keeps STT-RAM technology from competing with other, denser topologies.
Much work has been done to optimize the write circuitry for memory arrays, primarily using the Spin-Valve topology. However, little has been explored for the write optimization of STT-based latches and look-up tables, which enable reconfigurability and IP security.
We target a generic, commercially available 32/28 nm process, currently one of the smallest processes available to fabless companies. The plot above illustrates the transistor sizing required in the conventional latch circuit presented to the left [1]. The area required to create a single bit of memory is on the order of 1.3 square microns (roughly 96 Kbytes/mm² areal density)!
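The conversion from cell area to areal density can be checked with a short calculation. This is a minimal sketch, assuming one bit per 1.3 µm² cell, no array overhead, and decimal kilobytes; the function names are illustrative and not from the poster.

```python
UM2_PER_MM2 = 1_000_000  # 1 mm = 1000 µm, so 1 mm² = 10^6 µm²

def bits_per_mm2(cell_area_um2: float) -> float:
    """Bits that fit in 1 mm² if every bit occupies cell_area_um2."""
    return UM2_PER_MM2 / cell_area_um2

def kbytes_per_mm2(cell_area_um2: float) -> float:
    """Same density in kilobytes (1 KB = 1000 bytes, an assumption here)."""
    return bits_per_mm2(cell_area_um2) / 8 / 1000

print(round(kbytes_per_mm2(1.3)))  # 96
```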
Parallel Mode, Single Transistors Access
Series Mode, Single Transistor Access
Write Circuit Topologies Investigated
Optimization (yellow),
offset added to absorb
process variations.
Expectation Sample Simulation Results
Results
Write circuit   Voltage mode
                1.05      1.05 (lvt)   1.8       2.5
FPPF_SA         FAILURE   FAILURE      6.02      0.300
PFFP_SA         FAILURE   FAILURE      FAILURE   0.285
FPPF_PP         FAILURE   0.6          1.21      0.424
PFFP_PP         FAILURE   0.6          1.21      0.424
PLL_SA          1.04      0.45         0.705     0.261
PLL_TG          0.171     0.09         0.396     0.288

Area of simulated circuits in square micrometers (µm²). FAILURE means the circuit fails to write with a transistor width of less than 300 microns. Each voltage mode represents a different transistor choice in the same process.
According to our simulations, the best circuit for building dense memory-block write circuitry uses the “Parallel” writing scheme with a transmission gate topology. From the graph above, we see that the target transistor is the Low Threshold Voltage (“LVT”) type, which operates at the nominal 1.05 Volts and has a minimum gate width of 30 nm.
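This selection can be reproduced from the tabulated areas. The following is a minimal sketch, with the Results values transcribed by hand and FAILURE entries stored as `None`; the data layout and function name are illustrative and not part of the original work.

```python
# Areas in µm² transcribed from the Results table; None marks a FAILURE entry.
MODES = ["1.05", "1.05 (lvt)", "1.8", "2.5"]
AREAS_UM2 = {
    "FPPF_SA": [None, None, 6.02, 0.300],
    "PFFP_SA": [None, None, None, 0.285],
    "FPPF_PP": [None, 0.6, 1.21, 0.424],
    "PFFP_PP": [None, 0.6, 1.21, 0.424],
    "PLL_SA": [1.04, 0.45, 0.705, 0.261],
    "PLL_TG": [0.171, 0.09, 0.396, 0.288],
}

def smallest_working_circuit(areas, modes):
    """Return (area, circuit, mode) for the smallest circuit that writes."""
    working = [
        (area, circuit, mode)
        for circuit, row in areas.items()
        for mode, area in zip(modes, row)
        if area is not None  # skip FAILURE entries
    ]
    return min(working)  # tuples compare by area first

area, circuit, mode = smallest_working_circuit(AREAS_UM2, MODES)
print(f"{circuit} at {mode}: {area} µm²")  # PLL_TG at 1.05 (lvt): 0.09 µm²
```

The comparison is by area alone; it does not account for write time, energy, or the sharing opportunities each topology offers.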
This configuration also has advantages beyond the unit bit cell. Unique to the “Parallel” writing scheme is the ability to share one of the gates between many bits, forming blocks of functions that can further share the read mode circuitry via a selection tree (already incorporated into the LUT designs [1]). Since the circuit must repeat two gates per differential bit but shares the third gate, the area can be recalculated for N bits sharing, like so:
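A minimal sketch of that recalculation follows. It assumes the single-latch write area splits evenly across its three gates. That split is an assumption, though it does reproduce the tabulated 8- and 16-bit PLL_SA and PLL_TG areas to within rounding. Function names are illustrative.

```python
def shared_write_area_um2(single_latch_area_um2: float, n_bits: int) -> float:
    """Write-circuit area for n_bits parallel-mode latches sharing one gate.

    Assumption: the single-latch area consists of three equal-area gates,
    two repeated per differential bit and one shared by the whole block.
    """
    gate_area = single_latch_area_um2 / 3
    return (2 * n_bits + 1) * gate_area

def series_write_area_um2(single_latch_area_um2: float, n_bits: int) -> float:
    """Series-mode circuits share nothing, so area scales directly with N."""
    return n_bits * single_latch_area_um2

def bits_per_um2(area_um2: float, n_bits: int) -> float:
    """Density in bits/µm², numerically equal to Mbit/mm²."""
    return n_bits / area_um2

# PLL_TG with LVT transistors: 0.09 µm² for a single latch.
area_16 = shared_write_area_um2(0.09, 16)
print(round(area_16, 2))                         # 0.99
print(round(bits_per_um2(area_16, 16), 2))       # 16.16
print(round(bits_per_um2(area_16, 16) / 8, 2))   # 2.02 (MB/mm²)
```

Under these assumptions the shared block approaches two-thirds of the unshared area per bit as N grows, which is where the roughly 2 MB per mm² figure for 16-bit sharing comes from.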
In comparison, series mode circuits do not share components between bits, so their area grows in direct proportion to the number of bits N.
Some estimated figures can be seen in the table below. Though these figures do not compete with the gigabyte-scale density of 3D ICs for memory, this technology can be integrated directly in CMOS with processors and FPGAs, or used in place of critical IP logic blocks, in a single-chip solution. 3D technology to date requires multiple chips.
[1] H. Mahmoodi, S. Srinivasan Lakshmipuram, M. Arora, Y. Asgarieh, H. Homayoun, B. Lin, and D. M. Tullsen, “Resistive Computation: A Critique,” IEEE Computer Architecture Letters, vol. 13, no. 2, July-December 2014.
[2] W. Zhao, E. Belhaire, and C. Chappert, “Spin-MTJ based Non-Volatile Flip-Flop,” Proceedings of the 7th IEEE International Conference on Nanotechnology, August 2-5, 2007, Hong Kong.
[3] I. Y. Loh, “Mechanism and Assessment of Spin Transfer Torque (STT) Based Memory,” Department of Materials Science and Engineering, Massachusetts Institute of Technology, 2009. Accessed from EBSCO 4/20/2015.
2 MB per mm²
CMOS-compatible reconfigurable logic!
Left: an example layout of the STT-Latch with “Parallel” mode write, using only single NMOS access transistors in the write path (top NMOS) and the Sense Amplifier (mid-to-bottom).
Right: a preliminary design of the transmission gate access control scheme, which shows the best density performance. An entire LUT design will be required to reap the benefits of its reduced size, as the picture on the right shows a cell of about 0.5 µm².
References
3. MTJ devices are grouped into Look-Up Tables (LUTs) to maximize density and provide a replacement for select static logic blocks.
How We Use MTJs
Series Mode, Push-Pull Access
• Acronym: “PLL_SA”
• 2-cycle write
• Vdd-Vth full swing
• Potential “WEN” sharing opportunity across latches

• Acronyms: “PFFP_SA” or “FPPF_SA”
• Layer orientation may affect performance (F = Free, P = Pinned)
• Less current due to series resistance
• Faster, single-cycle write
• No potential for transistor sharing across latches
D. Suzuki et al. (2009)
Iong Ying Loh (2009) [3]
Iong Ying Loh (2009) [3]
D. Suzuki et al. (2014)
Fong et al. (2016)
Kawahara et al. (2012)
• Acronyms: “PFFP_PP”, “FPPF_PP”, or “Series_PP”
• Improved voltage across MTJs
• Improved current performance
• Single-cycle write
• Two fewer transistors than Transmission Gate access

• Acronym: “PLL_TG”
• Same “parallel” topology
• Maximum available voltage drop across each MTJ = most current per area
• Common “WEN” transistor could be shared across latches
                         Area (µm²)                             Density (Mbit/mm²)
Sharing  Circuit      1.05     1.05(lvt)  1.8     2.5       1.05    1.05(lvt)  1.8    2.5
8 bit    PLL_SA       5.92     2.55       4.00    1.48      1.39    3.14       2.00   5.41
8 bit    PLL_TG       0.97     0.51       2.24    1.63      8.26    15.69      3.57   4.90
8 bit    Series_PP*   64.80    4.80       9.66    3.39      0.12    1.67       0.83   2.36
16 bit   PLL_SA       11.48    4.95       7.76    2.87      1.35    3.23       2.06   5.57
16 bit   PLL_TG       1.88     0.99       4.36    3.17      8.51    16.16      3.67   5.05
16 bit   Series_PP*   129.60   9.60       19.31   6.78      0.12    1.67       0.83   2.36
[Figure: circuit schematic. Labeled blocks: Sense Amplifier, Differential MTJs, Read Path, Write Path. Labeled signals: VDD, SEn, SEp, Q, Q’, WEN, EN1, EN2, BL, BL’.]
Parallel Mode, Transmission Gate Access
*The FPPF and PFFP configurations differ by less than 1% in any circuit, so they are grouped under “Series_PP” to represent a layer-independent topology.
[Figure: layout annotations. Labels: Sense Amp; MTJ Pre-Amp; MTJ Access & Control; optimally sized transmission gates for 2 bits. Dimensions: 3.2 µm and 0.9 µm.]
