Simulation tools and the process design kit were donated by Synopsys Inc.
Funding for this research was provided through a grant from DARPA.
Design Optimization of Write Circuit for Spin Transfer Torque Magnetic Latches and Look-Up-Tables in 32/28nm CMOS Process
School of Engineering, San Francisco State University
Andrew Miller <amiller2@mail.sfsu.edu>, Hamid Mahmoodi <mahmoodi@sfsu.edu>
Contributors: Tyler Sheaves <tsheaves@mail.sfsu.edu>, Aliyar Attaran <aattaran@mail.sfsu.edu>
Canonical Issues
Moore’s law is the familiar story. The problem with modern
information storage reaches beyond density: memory is also
the speed and power choke point for many high-end
products, where density is often the enemy. Processor
performance gains are beginning to degrade as wire
widths approach molecular sizes, and non-volatile
caches need to be located on a separate, non-CMOS chip. Similar
multi-die issues arise in FPGA designs: most FPGAs lack
a non-volatile single-chip solution.
New Issues/Applications
What about the security of your design? Most designers
ship their designs out to a fab, relying on NDAs to
protect their IP. What if you could configure your
hardware with software, like an FPGA, but retain the
speed and power of custom gates? STT-based magnetic
switching could offer this with non-volatile, low-power
look-up tables (LUTs) that replace gates to obfuscate
functionality.
How about a computer that turns on instantly?
Integrating non-volatile memory into a CMOS process
would enable true single-chip computer systems.
STT-MTJ devices offer this flexibility.
What’s wrong with modern memory?
State of the Art & Spintronics
Goal of Spintronics: Manipulate the electron’s spin degree of freedom for interesting effects.
Existing Technology:
• Giant Magneto-Resistance (GMR), commercialized in modern Hard Disk Drives
• Tunneling Magneto-Resistance (TMR), realized in Magnetic Tunnel Junctions
In the figure to the right, we have two magnetic
materials. One is permanently magnetized
(Fixed layer) and the other is a ferromagnetic thin
film (Free layer). If current is passed through the
two layers, electrons become spin polarized by
one layer and apply a spin torque to the other
layer. When enough torque is applied (i.e. at the
Critical Current), magnetic domain switching can occur.
The longer this current is applied, the more likely
the flip is to occur. As shown in the figure below,
the energy has only two stable states, so the device
makes a good “binary storage element”.
WARNING: PHYSICS DETOUR BELOW!
Structure of STT-MTJ, the “Spin Valve”, The STT Unit Bit Cell
An MTJ device is processed
using thin-film deposition
in a CMOS fabrication plant.
The device controls
electron tunneling; there is no
classical path of conduction
between the two magnetic
layers.
Barrier to Scalability: Writing States
Figure: STT-Latch Write Path Sizing Requirements (plot annotations: Δθ ≈ π, PASS, FAIL)
Single latch topology uses two MTJs to enhance reliability
• Current Reference mode converts magnetic state (resistance) to voltage.
• Differentially programmed MTJ yields better sensing reliability (no reference needed).
• Standard 1.05 Volts requires > 1.3 µm²
The relatively large write current is the
main bottleneck keeping STT-RAM
technology from competing with
other, denser topologies.
Much work has been done to
optimize the write circuitry for memory
arrays, primarily using the Spin-Valve
topology. However, little has been
explored for the write optimization of
STT-based latches and look-up tables,
which enable reconfigurability and IP
security.
We target a generic,
commercially available 32/28nm process,
currently one of the smallest
available to fabless companies. The plot
above illustrates the required transistor
sizing in the conventional latch circuit
presented to the left [1]. The area required
to create a single bit of memory is on the order of
1.3 square microns (roughly 96 Kbytes/mm²
areal density).
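The conversion from per-bit area to areal density quoted above can be checked with a short sketch. The function name is hypothetical, and the sketch assumes decimal units (1 Kbyte = 1000 bytes), which is the convention that reproduces the roughly 96 Kbytes/mm² figure.

```python
UM2_PER_MM2 = 1_000_000  # 1 mm = 1000 microns, so 1 mm^2 = 1e6 square microns


def areal_density_kbytes_per_mm2(bit_area_um2: float) -> float:
    """Convert the area of one stored bit (in square microns) to Kbytes/mm^2.

    Assumption: 1 Kbyte = 1000 bytes (decimal units).
    """
    bits_per_mm2 = UM2_PER_MM2 / bit_area_um2
    bytes_per_mm2 = bits_per_mm2 / 8
    return bytes_per_mm2 / 1000


if __name__ == "__main__":
    # 1.3 square microns per bit, as stated for the conventional latch
    print(f"{areal_density_kbytes_per_mm2(1.3):.1f} Kbytes/mm^2")
```

With 1.3 square microns per bit this gives about 96 Kbytes/mm², in line with the estimate in the text.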
Write Circuit Topologies Investigated
Parallel Mode, Single Transistor Access
Series Mode, Single Transistor Access
Figure panels: Expectation; Sample Simulation Results. Optimization region shown in yellow, with an offset added to absorb process variations.
Results
Area of simulated circuits in µm², by write circuit and voltage mode:
Write Circuit   1.05      1.05 (lvt)   1.8       2.5
FPPF_SA         FAILURE   FAILURE      6.02      0.300
PFFP_SA         FAILURE   FAILURE      FAILURE   0.285
FPPF_PP         FAILURE   0.6          1.21      0.424
PFFP_PP         FAILURE   0.6          1.21      0.424
PLL_SA          1.04      0.45         0.705     0.261
PLL_TG          0.171     0.09         0.396     0.288
FAILURE means the circuit failed to write with
any transistor width less than 300 microns. Each
voltage mode represents a different transistor
choice in the same process.
According to our simulations, the best circuit
for building dense memory-block write
circuitry is one that uses the “Parallel” writing
scheme with a transmission gate topology. From
the graph above, we see that the target transistor
is the Low Threshold Voltage (“LVT”) type, which
operates at the nominal 1.05 Volts and has a minimum
gate width of 30 nm.
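As a quick cross-check of this conclusion, the sketch below searches the simulated-area table for the smallest working configuration. The numbers are copied from the Results table. The dictionary layout and the function name are illustrative choices, and `None` stands in for a FAILURE entry.

```python
# Simulated areas in square microns, copied from the Results table.
# None marks a FAILURE entry (no successful write below 300 microns width).
AREAS_UM2 = {
    "FPPF_SA": {"1.05": None, "1.05 (lvt)": None, "1.8": 6.02, "2.5": 0.300},
    "PFFP_SA": {"1.05": None, "1.05 (lvt)": None, "1.8": None, "2.5": 0.285},
    "FPPF_PP": {"1.05": None, "1.05 (lvt)": 0.6, "1.8": 1.21, "2.5": 0.424},
    "PFFP_PP": {"1.05": None, "1.05 (lvt)": 0.6, "1.8": 1.21, "2.5": 0.424},
    "PLL_SA": {"1.05": 1.04, "1.05 (lvt)": 0.45, "1.8": 0.705, "2.5": 0.261},
    "PLL_TG": {"1.05": 0.171, "1.05 (lvt)": 0.09, "1.8": 0.396, "2.5": 0.288},
}


def smallest_area(table):
    """Return (area, circuit, voltage_mode) for the smallest non-failing entry."""
    candidates = [
        (area, circuit, mode)
        for circuit, row in table.items()
        for mode, area in row.items()
        if area is not None  # skip FAILURE entries
    ]
    return min(candidates)


if __name__ == "__main__":
    area, circuit, mode = smallest_area(AREAS_UM2)
    print(f"{circuit} at {mode}: {area} square microns")
```

The minimum falls on PLL_TG in the 1.05 (lvt) mode, which is the parallel, transmission gate, LVT combination named in the text.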
This configuration also has advantages beyond
the unit bit cell. Unique to the “Parallel” writing
scheme is the ability to share one of the gates
between many bits, forming blocks of functions
that can further share the read-mode
circuitry via a selection tree (already incorporated
into the LUT designs [1]). Since the circuit must repeat two
gates per differential bit but shares the third gate,
the area can be recalculated for N bits sharing.
Assuming the three gates are equally sized, so that each
occupies one third of the single-latch area A(1), this gives:
A(N) ≈ (2N + 1) × A(1) / 3
In comparison, series mode circuits do not share
components between bits, so their area grows
linearly with the number of bits N.
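The sharing argument above can be sketched numerically. This is a minimal sketch under a stated assumption, not the authors' exact calculation: it assumes the three write-path gates of a parallel-mode latch are equally sized, so each takes one third of the single-latch area. The function names are hypothetical.

```python
def parallel_shared_area(unit_area_um2: float, n_bits: int) -> float:
    """Area of N parallel-mode bits that share one gate.

    Assumption: the single-latch area splits evenly over three gates;
    two gates repeat per bit and the third is shared by all N bits.
    """
    gate_area = unit_area_um2 / 3.0
    return (2 * n_bits + 1) * gate_area


def series_area(unit_area_um2: float, n_bits: int) -> float:
    """Area of N series-mode bits: nothing is shared, so area is linear in N."""
    return n_bits * unit_area_um2


if __name__ == "__main__":
    # Unit areas from the Results table: PLL_TG at 1.05 V, Series_PP at 1.05 V (lvt)
    for n in (8, 16):
        print(n, round(parallel_shared_area(0.171, n), 2), round(series_area(0.6, n), 2))
```

For the PLL_TG unit area of 0.171 square microns, the sketch gives about 0.97 and 1.88 square microns for 8 and 16 bits, and 4.80 and 9.60 for Series_PP at 0.6 square microns per bit.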
Some estimated figures can be seen in the table
below. Though these figures do not compete with
the gigabyte-scale density of 3D ICs for memory,
this technology is CMOS compatible and can be
integrated directly with processors and FPGAs, or in
place of critical IP logic blocks, in a single-chip
solution. 3D technology to date requires multiple
chips.
2 MB per mm²
CMOS compatible reconfigurable logic!
Left: an example layout of the STT-Latch with
“Parallel” mode write, using only single NMOS
access transistors in the write path (top
NMOS) and the Sense Amplifier (mid-to-bottom).
Right: a preliminary design of the transmission
gate access control scheme, which shows
the best density performance. An entire LUT design
will be required to reap the benefits of its
reduced size; the picture on the right shows
a cell that is about 0.5 square microns.
References
[1] H. Mahmoodi, S. Srinivasan Lakshmipuram, M. Arora, Y. Asgarieh, H. Homayoun, B. Lin and D. M. Tullsen, “Resistive Computation: A Critique.” IEEE Computer Architecture Letters, vol. 13, no. 2, July-December 2014.
[2] W. Zhao, E. Belhaire and C. Chappert, “Spin-MTJ based Non-Volatile Flip-Flop.” Proceedings of the 7th IEEE International Conference on Nanotechnology, August 2-5, 2007, Hong Kong.
[3] I. Y. Loh, “Mechanism and Assessment of Spin Transfer Torque (STT) Based Memory.” Department of Materials Science and Engineering, Massachusetts Institute of Technology, 2009, accessed from EBSCO 4/20/2015.
How We Use MTJs
3. MTJ devices are grouped into Look-Up Tables (LUTs) to maximize density and provide
replacements for select static logic blocks.
Parallel Mode, Single Transistor Access
• Acronym: “PLL_SA”
• 2 cycle write
• Vdd-Vth Full Swing
• “WEN” potential sharing opportunity across latches
Series Mode, Single Transistor Access
• Acronyms: “PFFP_SA” or “FPPF_SA”
• Layer orientation may affect performance (F = Free, P = Pinned)
• Less current due to series resistance
• Faster, single cycle write
• No potential for transistor sharing across latches
Series Mode, Push-Pull Access
• Acronyms: “PFFP_PP”, “FPPF_PP”, or “Series_PP”
• Improved voltage across MTJs
• Improved current performance
• Single cycle write
• Two fewer transistors than Transmission Gate access
Parallel Mode, Transmission Gate Access
• Acronym: “PLL_TG”
• Same “parallel” topology
• Maximum available voltage drop across each MTJ = most current per area
• Common “WEN” transistor could be shared across latches
Cited works: D. Suzuki et al. (2009); Iong Ying Loh (2009) [3]; D. Suzuki et al. (2014); Fong et al. (2016); Kawahara et al. (2012)
                          Area (µm²)                             Density (MB/mm²)
Sharing   Circuit         1.05     1.05(lvt)   1.8     2.5       1.05   1.05(lvt)   1.8    2.5
8 Bit     PLL_SA          5.92     2.55        4.00    1.48      1.39   3.14        2.00   5.41
8 Bit     PLL_TG          0.97     0.51        2.24    1.63      8.26   15.69       3.57   4.90
8 Bit     Series_PP*      64.80    4.80        9.66    3.39      0.12   1.67        0.83   2.36
16 Bit    PLL_SA          11.48    4.95        7.76    2.87      1.35   3.23        2.06   5.57
16 Bit    PLL_TG          1.88     0.99        4.36    3.17      8.51   16.16       3.67   5.05
16 Bit    Series_PP*      129.60   9.60        19.31   6.78      0.12   1.67        0.83   2.36
Parallel Mode, Transmission Gate Access
Schematic with a Read Path and a Write Path. Blocks: Sense Amplifier, Differential MTJs. Labeled signals: VDD, SEn, SEp, Q, Q’, WEN, EN1, EN2, BL, BL’.
*FPPF and PFFP configurations differ by less than 1% in any circuit, so they are grouped under “Series_PP” to represent a layer-independent
topology.
Layout figure labels: Sense Amp; MTJ Pre-Amp; MTJ Access & Control; optimally sized transmission gates for 2 bits. Dimension labels: 3.2 µ and 0.9 µ.