Future Memory Technologies
Seminar WS2012/13
Benjamin Klenk
2013/02/08
Supervisor: Prof. Dr. Holger Fröning
Department of Computer Engineering
University of Heidelberg
1
Amdahl's rule of thumb
1 byte of memory and 1 byte per second of I/O
are required for each instruction per second
supported by a computer.
Gene Myron Amdahl
# | System | Performance | Memory | B/FLOPs
1 | Titan Cray XK7 (Oak Ridge, USA) | 17,590 TFLOP/s | 710 TB | 4.0 %
2 | Sequoia BlueGene/Q (Livermore, USA) | 16,325 TFLOP/s | 1,572 TB | 9.6 %
3 | K computer (Kobe, Japan) | 10,510 TFLOP/s | 1,410 TB | 13.4 %
4 | Mira BlueGene/Q (Argonne, USA) | 8,162 TFLOP/s | 768 TB | 9.4 %
5 | JUQUEEN BlueGene/Q (Juelich, GER) | 4,141 TFLOP/s | 393 TB | 9.4 %
[www.top500.org] November 2012
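The B/FLOPs column can be recomputed from the other two columns. The following is a minimal Python sketch, assuming the column means bytes of main memory per FLOP/s of performance, expressed as a percentage of Amdahl's 1 byte per instruction per second. The function name is illustrative and not taken from the slides.

```python
# Memory-to-compute ratio of the listed systems.
# Assumption: "B/FLOPs" = bytes of memory per FLOP/s, shown as a percentage.
SYSTEMS = {
    "Titan": (17_590, 710),        # (performance in TFLOP/s, memory in TB)
    "Sequoia": (16_325, 1_572),
    "K computer": (10_510, 1_410),
    "Mira": (8_162, 768),
    "JUQUEEN": (4_141, 393),
}


def bytes_per_flops_percent(tflops: float, memory_tb: float) -> float:
    """Return memory bytes per FLOP/s as a percentage (the tera prefixes cancel)."""
    return 100.0 * memory_tb / tflops


for name, (tflops, memory_tb) in SYSTEMS.items():
    print(f"{name:12s} {bytes_per_flops_percent(tflops, memory_tb):5.1f} %")
```

Every system stays far below the 100 % that the rule of thumb would ask for.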
2
Outline
Motivation
State of the art
• RAM
• FLASH
Alternative technologies
• PCM
• HMC
• Racetrack
• STTRAM
Conclusion
3
Why do we need other technologies?
Motivation
4
The memory system
Modern processors integrate the
memory controller (IMC)
Problem: Pin limitation
Intel i7-3770: 2 x DDR3 channel, max. 25.6 GB/s
e.g.: 4 x 8 GB = 32 GB
(typically one rank per module)
[Figure: four cores with private caches ($), a shared L3 cache and the IMC; the DDR3 channels connect to ranks 0-3, each with banks 0-2]
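The 25.6 GB/s figure follows from the channel parameters. A small sketch, assuming DDR3-1600 modules (1600 MT/s) and a 64-bit data bus per channel; both values are assumptions that are not stated on the slide, and the function name is illustrative.

```python
def peak_bandwidth_gbs(transfers_mt_s: float, bus_bits: int = 64, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s for a DDR memory interface."""
    bytes_per_transfer = bus_bits / 8          # 64-bit channel -> 8 bytes per transfer
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


# Assumed configuration: two channels of DDR3-1600
print(peak_bandwidth_gbs(1600, channels=2))
```

The pin limitation shows up directly in this formula: more bandwidth needs either more channels (more pins) or a higher transfer rate per pin.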
5
Performance and Power limitations
Memory Wall
[Figure: CPU vs. DRAM frequency [MHz], 1990-2002, y-axis 0-4000 MHz]
Power Wall
[Figure: Server Power Breakdown (pie chart). Legend: Processor, Memory, Planar, PCI, Drivers, Standby, Fans, DC/DC Loss, AC/DC Loss. Values as extracted: 31%, 11%, 3%, 3%, 6%, 2%, 9%, 10%, 25%]
[1] [Intel Whitepaper: Power Management in Intel Architecture Servers, April 2009]
6
Memory bandwidth is limited
• The working-set demand grows with the number of
cores
• Bandwidth and capacity must
scale linearly
• 1 GB/s memory bandwidth per
thread [1]
• Adding more cores doesn't
make sense unless there is
enough memory bandwidth!
[Figure: Normalized performance vs. number of threads (1-97), ideal scaling compared with bandwidth-limited (BW) scaling]
[1]
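The saturation effect can be sketched with a very simple model. It assumes that every thread needs the 1 GB/s quoted above and that performance grows linearly until the memory bandwidth is exhausted; the 25.6 GB/s system bandwidth is taken from the earlier memory system slide. Real workloads do not saturate this sharply, so this is only an illustration.

```python
def normalized_performance(threads: int, total_bw_gbs: float, bw_per_thread_gbs: float = 1.0) -> float:
    """Performance normalized to one thread, capped by the available memory bandwidth."""
    supported_threads = total_bw_gbs / bw_per_thread_gbs   # threads the memory system can feed
    return min(float(threads), supported_threads)


for n in (1, 8, 16, 32, 64):
    print(n, normalized_performance(n, total_bw_gbs=25.6))
```

Under these assumptions everything beyond roughly 25 threads adds no performance.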
7
DIMM count per channel is limited
Channel capacity does not
increase
Higher data rates result in
fewer DIMMs per channel
(to maintain signal
integrity)
High-capacity DIMMs are
expensive
[Figure: Number of DIMMs per channel (0-10) and channel capacity [GB] (0-40) vs. data rate [MHz] (0-800)]
[1]
8
Motivation
What are the problems?
• Memory Wall
• Power Wall
• DIMM count per channel decreases
• Capacity per DIMM grows slowly
What do we need?
• High memory bandwidth
• High bank count (concurrent execution of several threads)
• High capacity (fewer page faults and less swapping)
• Low latency (fewer stalls and less time waiting for data)
• Last but not least: Low power consumption
9
What are current memory technologies?
State of the art
10
Random Access Memory
SRAM
Fast access and no need
for refreshes
Each cell consists of six transistors
Low density results in
bigger chips with less
capacity than DRAM
→ Caches
DRAM
Each cell consists of just one
transistor and a capacitor
(high density)
Needs to be refreshed
frequently (leakage current)
Slower access than SRAM
Higher power consumption
→ Main Memory
11
DRAM
Organized like an array
(example 4x4)
Horizontal Line: Word Line
Vertical Line: Bit Line
Refresh every 64ms
Refresh logic is integrated
in DRAM controller
www.wikipedia.com
12
The history of DDR-DRAM
DDR SDRAM is state of the art for main memory
There are several versions of DDR SDRAM:
Version | Clock [MHz] | Transfer rate [MT/s] | Voltage [V] | DIMM pins
DDR1 | 100-200 | 200-400 | 2.5/2.6 | 184
DDR2 | 200-533 | 400-1066 | 1.8 | 240
DDR3 | 400-1066 | 800-2133 | 1.5 | 240
DDR4 | 1066-2133 | 2133-4266 | 1.2 | 284
[9] ExaScale Computing Study
13
Power consumption and the impact of refreshes
A refresh command is issued every 7.8 µs
(<85°C) / 3.9 µs (<95°C)
Every row is refreshed every 64 ms
Multiple banks enable
concurrent refreshes
Refresh commands flood the
command bus
Year | 1990 | Today
Bits/row | 4096 | 8192
Capacity | Tens of MB | Tens of GB
Refreshes | 10 per ms | 10,000 per ms
[1]
RAIDR: Retention-Aware Intelligent DRAM Refresh, Jamie Liu et al.
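The 7.8 µs and 3.9 µs values are consistent with spreading a fixed number of refresh commands evenly over the retention window. A minimal sketch, assuming 8192 refresh commands per window and a halved window of 32 ms at high temperature (both are assumptions in the style of DDR3 devices, not stated on the slide). The refresh command duration of 260 ns used for the overhead estimate is a hypothetical value.

```python
def refresh_interval_us(retention_ms: float, refresh_commands: int = 8192) -> float:
    """Average time between two refresh commands in microseconds."""
    return retention_ms * 1000.0 / refresh_commands


def refresh_overhead(refresh_duration_ns: float, interval_us: float) -> float:
    """Fraction of time a device is busy refreshing."""
    return refresh_duration_ns / (interval_us * 1000.0)


normal = refresh_interval_us(64)   # 64 ms retention window
hot = refresh_interval_us(32)      # assumed halved window at high temperature
print(normal, hot)

# Hypothetical refresh command duration of 260 ns
print(f"{refresh_overhead(260, normal):.1%} of the time spent refreshing")
```

The overhead grows with device capacity because larger devices need longer refresh commands for the same interval.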
14
Flash
Flash memory cells are based on floating-gate
transistors
MOSFET with two gates: Control Gate (CG) & Floating Gate
(FG)
FG is electrically isolated, so electrons are trapped there
(only capacitively coupled)
Programming by hot-electron injection
Erasing by quantum tunneling
http://en.wikipedia.org/wiki/Floating-gate_transistor
15
Problems to solve
DRAM
• Limited DIMM count → limits capacity for main memory
• Unnecessary power consumption of refreshes
• Low bandwidth
FLASH
• Slow access time
• Limited write cycles
• Pretty low bandwidth
16
Which technologies show promise for the future?
Alternative technologies
17
Outline
Phase Change Memory (PCM, PRAM, PCRAM)
Hybrid Memory Cube (HMC)
Racetrack Memory
Spin-Transfer Torque RAM (STTRAM)
18
Phase Change Memory (PCM)
Based on chalcogenide glasses (also
used for rewritable optical discs such as CD-RW)
PCM initially lost the competition with Flash and
DRAM because of power issues
PCM cells become smaller and smaller
and hence the power consumption
decreases
[Figure: PCM cell in the amorphous and the crystalline state, switched by SET and RESET]
[http://www.nano-ou.net/Applications/PRAM.aspx]
19
How to read and write
Resistance changes with state
(amorphous, crystalline)
Transition can be forced by
optical or electrical pulses
[Figure: Temperature T over time t for the RESET and SET pulses, relative to the melting temperature Tmelt and the crystallization temperature Tx]
http://agigatech.com/blog/pcm-phase-change-memory-basics-and-technology-advances/
20
Access time of common memory techniques
PRAM is still slower than DRAM
A PRAM-only main memory would perform worse (access time 2-10x
slower)
But: Density is much better! (4-5 F² compared to 6 F² for
DRAM)
We need to find a tradeoff
[Figure: Typical access time (cycles for a 4 GHz processor) on a log scale from 2^1 to 2^17 cycles; labels in order along the axis: L1 $, L3 $, DRAM, PRAM, FLASH]
[6]
21
Hybrid Memory: DRAM and PRAM
We still use DRAM as buffer / cache
Technique to hide higher latency of PRAM
[Figure: CPUs, DRAM buffer, PRAM main memory and disk, connected via a write queue (WRQ) and a bypass path]
[6]
22
Performance of a hybrid memory approach
Assume: Density: 4x higher, Latency: 4x slower (IBM in-house
simulator)
[Figure: Performance normalized to 8 GB DRAM (y-axis 0.00-1.60) for 32 GB PCM, 32 GB DRAM, and 1 GB DRAM + 32 GB PRAM]
[Scalable High Performance Main Memory System Using Phase-Change Memory Technology, Qureshi et al.]
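Why a small DRAM buffer can hide most of the PRAM latency can be sketched with the usual average-access-time formula. This is not the simulator used in the cited work. It only assumes the 4x latency ratio stated above, and the hit rates in the loop are hypothetical.

```python
def effective_latency(hit_rate: float, dram_latency: float = 1.0, pram_factor: float = 4.0) -> float:
    """Average access latency of a DRAM buffer in front of PRAM, in units of DRAM latency."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    pram_latency = dram_latency * pram_factor
    return hit_rate * dram_latency + (1.0 - hit_rate) * pram_latency


# Hypothetical DRAM buffer hit rates
for hit_rate in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {hit_rate:4.2f}: {effective_latency(hit_rate):.2f} x DRAM latency")
```

With a high hit rate the average latency approaches that of DRAM while the capacity is that of PRAM.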
23
Hybrid Memory Cube
Promising memory technology
Leading companies: Micron, Samsung, Intel
3D stacking of DRAM dies
Enables high concurrency
[3]
24
What has changed?
Before
CPU is directly connected
to DRAM (memory
controller)
Complex scheduler
(queues, reordering)
DRAM timing parameters
standardized across
vendors
Slow performance growth
HMC
Abstracted high-speed
interface
Only an abstracted protocol,
no timing constraints
(packet-based protocol)
Innovation inside HMC
HMC takes requests and
delivers results in the most
advantageous order
25
HMC architecture
DRAM logic is stripped
away
Common logic on the
Logic Die
Vertical Connection
through TSV
High speed processor
interface
[Figure: HMC stack with DRAM slices 1-8 (memory arrays) on top of a logic die, connected vertically by TSVs for command/address (CMD & ADD) and data; the logic die is attached to the CPU via a high speed interface (packet based protocol)]
[3]
[4]
26
More concurrency and bandwidth
Conventional DRAM:
• 8 devices and 8 banks/device results in 64 banks
HMC gen1:
• 4 DRAMs * 16 slices * 2 banks results in 128 banks
• If 8 DRAMs are used: 256 banks
Processor Interface:
• 16 transmit and 16 receive lanes: 32 x 10 Gbps per link
• 40 GB/s per link
• 8 links per cube: 320 GB/s per cube (compared to about 25.6 GB/s
for recent memory channels)
[3]
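The bank counts and bandwidth figures on this slide are plain products. The short sketch below recomputes them; the function names are illustrative.

```python
def bank_count(drams: int, slices: int, banks_per_slice: int) -> int:
    """Number of independently accessible banks in a cube."""
    return drams * slices * banks_per_slice


def link_bandwidth_gbytes(lanes: int, gbit_per_lane: float) -> float:
    """Aggregate link bandwidth in GB/s (8 bits per byte)."""
    return lanes * gbit_per_lane / 8


print(bank_count(4, 16, 2))            # HMC gen1 with 4 DRAMs
print(bank_count(8, 16, 2))            # with 8 DRAMs
per_link = link_bandwidth_gbytes(16 + 16, 10)
print(per_link, 8 * per_link)          # per link, per cube with 8 links
```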
27
Performance comparison
Technology | VDD [V] | IDD [A] | BW [GB/s] | Power [W] | mW/GBps | pJ/bit | Real pJ/bit
SDRAM PC133 1GB | 3.3 | 1.50 | 1.06 | 4.96 | 4664.97 | 583.12 | 762.0
DDR 333 1GB | 2.5 | 2.19 | 2.66 | 5.48 | 2057.06 | 257.13 | 245.0
DDR2 667 2GB | 1.8 | 2.88 | 5.34 | 5.18 | 971.51 | 121.44 | 139.0
DDR3 1333 2GB | 1.5 | 3.68 | 10.66 | 5.52 | 517.63 | 64.70 | 52.0
DDR4 2667 4GB | 1.2 | 5.50 | 21.34 | 6.60 | 309.34 | 38.67 | 39.0
HMC gen1 | 1.2 | 9.23 | 128.00 | 11.08 | 86.53 | 10.82 | 13.7
HMC is costly because of TSV and 3D stacking!
Further features of HMCgen1:
• 1 Gb 50 nm DRAM array
• 512 MB total DRAM cube
• 128 GB/s Bandwidth
[3]
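The derived columns of the comparison table follow from VDD, IDD and bandwidth. A sketch of the arithmetic, assuming Power = VDD x IDD and 8 bits per byte; recomputed values can differ from the table in the last digits because the table inputs are rounded. The "Real pJ/bit" column is measured data and cannot be derived this way.

```python
def efficiency(vdd_v: float, idd_a: float, bw_gbytes_s: float):
    """Return (power in W, mW per GB/s, pJ per bit) for one table row."""
    power_w = vdd_v * idd_a
    mw_per_gbps = power_w * 1000.0 / bw_gbytes_s
    # 1 mW per GB/s = 1e-3 J per 8e9 bit = 0.125 pJ/bit
    pj_per_bit = mw_per_gbps / 8.0
    return power_w, mw_per_gbps, pj_per_bit


ROWS = {
    "DDR3 1333": (1.5, 3.68, 10.66),
    "HMC gen1": (1.2, 9.23, 128.00),
}

for name, row in ROWS.items():
    power_w, mw_per_gbps, pj_per_bit = efficiency(*row)
    print(f"{name:10s} {power_w:6.2f} W {mw_per_gbps:8.2f} mW/GBps {pj_per_bit:6.2f} pJ/bit")
```

The HMC draws more power in total, but it moves far more data per joule.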
28
Electron spin and polarized current
Spin is another property of
particles (like mass and
charge)
Spin is either “up” or
“down”
Normal materials contain
equal populations of spin-up
and spin-down electrons
Ferromagnetic materials
contain unequal
populations
[Figure: An unpolarized current passes through a ferromagnetic material and leaves as a polarized current]
[5]
29
Magnetic Tunnel Junction (MTJ)
Discovered in 1975 by M. Jullière
Electrons become spin-polarized by the first
magnetic electrode
[Figure: MTJ stack under an applied voltage V: contact, ferromagnetic material, insulator barrier, ferromagnetic material, contact]
Two phenomena:
• Tunneling Magneto-Resistance
• Spin-Transfer Torque
30
Tunneling Magneto-Resistance (TMR)
Magnetic moments parallel:
Low resistance
Otherwise: High resistance
1995: Resistance difference
of 18% at room temperature
Nowadays: 70% can be
fabricated with reproducible
characteristics
[Figure: Two MTJs with an insulator barrier: parallel magnetic moments (low resistance) and non-parallel magnetic moments (high resistance)]
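The percentages above can be turned into resistances with the commonly used definition TMR = (R_AP - R_P) / R_P, where R_P and R_AP are the resistances for parallel and antiparallel moments. The slides do not state which definition they use, so this is an assumption. The resistance values and the mapping of high resistance to bit 1 are hypothetical.

```python
def tmr_ratio(r_parallel: float, r_antiparallel: float) -> float:
    """Tunneling magneto-resistance ratio, assuming TMR = (R_AP - R_P) / R_P."""
    return (r_antiparallel - r_parallel) / r_parallel


def read_bit(resistance: float, r_parallel: float, r_antiparallel: float) -> int:
    """Decide the stored bit by comparing against the midpoint resistance."""
    threshold = (r_parallel + r_antiparallel) / 2
    return 1 if resistance > threshold else 0   # hypothetical mapping: high resistance = 1


R_P, R_AP = 1000.0, 1700.0                      # hypothetical values in ohms
print(f"TMR = {tmr_ratio(R_P, R_AP):.0%}")
print(read_bit(1650.0, R_P, R_AP), read_bit(1020.0, R_P, R_AP))
```

A larger TMR widens the gap between the two resistance levels and makes the read decision more robust.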
31
Spin-Transfer Torque (STT)
Thick, pinned layer
(PL) → cannot be
changed
Thin, free layer
(FL) → can be
changed
FL magnetic structure
needs to be smaller
than 100-200 nm
[Figure: two panels, each showing a polarized current passing through PL and FL]
32
Racetrack Memory
Ferromagnetic
nanowire (racetrack)
Plenty of magnetic
domains, separated by domain walls (DW)
Domains are magnetized
either “up” or “down”
Racetrack operates
like a shift register
[Figure: Nanowire with magnetic domains and the domain walls between them]
http://www.tf.uni-kiel.de/matwis/amat/elmat_en/kap_4/backbone/r4_3_3.html
http://researcher.watson.ibm.com/researcher/view_project_subpage.php?id=3811
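The shift-register behaviour can be illustrated with a toy model. It is a sketch under simplifying assumptions: one fixed read/write head, one bit per domain, and an access cost counted in single-domain shift pulses. The class and method names are hypothetical.

```python
class Racetrack:
    """Toy model: a row of magnetic domains that is shifted past one fixed head."""

    def __init__(self, domains):
        self.domains = list(domains)   # 1 = "up", 0 = "down"
        self.offset = 0                # index of the domain currently under the head
        self.shift_count = 0           # number of one-domain shift pulses issued

    def shift_to(self, index: int) -> None:
        """Shift the track until domain `index` sits under the head."""
        if not 0 <= index < len(self.domains):
            raise IndexError("domain index out of range")
        self.shift_count += abs(index - self.offset)
        self.offset = index

    def read(self, index: int) -> int:
        self.shift_to(index)
        return self.domains[self.offset]

    def write(self, index: int, bit: int) -> None:
        self.shift_to(index)
        self.domains[self.offset] = bit


track = Racetrack([1, 0, 0, 1, 1, 0, 1, 0])
print(track.read(5), track.shift_count)
```

The model shows why the access time of a racetrack depends on how many domains share one head: distant bits need more shift pulses.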
33
Racetrack
DWs are shifted along the track by current pulses
(~100 m/s)
Principle of spin-momentum transfer
[Scientific American 300 (2009), Data in the Fast Lanes of Racetrack Memory]
34
Read & Write
Read
Resistance depends on the
magnetic moment of the
magnetic domain (TMR
effect)
Write
Multiple possibilities:
• Self-field of current from metallic
neighbor elements
• Spin-momentum transfer torque
from magnetic nano-elements
[Figure: Read amplifier connected to an MTJ; magnetic field of a current]
35
STTRAM
Memory cell based on
MTJ
Resistance changes
because of TMR
Spin-polarized current
instead of a magnetic
field programs the cell
[7]
36
STTRAM provides…
High scalability because write current scales with cell size
• 90 nm: 150 µA, 45 nm: 40 µA
Write current of about 100 µA and therefore low power
consumption
Nearly unlimited endurance (>10^16 write cycles)
Uses CMOS technology
• less than 3% additional cost
TMR of about 100%
Dual MTJ
• lower write current density
• higher TMR [7]
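The two write currents quoted above fit a simple area-scaling estimate. The sketch assumes a constant write current density and a cell area proportional to the square of the feature size; neither assumption is stated on the slide.

```python
def scaled_write_current_ua(ref_current_ua: float, ref_node_nm: float, target_node_nm: float) -> float:
    """Estimate the write current at another technology node by scaling with cell area."""
    area_ratio = (target_node_nm / ref_node_nm) ** 2   # cell area assumed proportional to F^2
    return ref_current_ua * area_ratio


estimate = scaled_write_current_ua(150.0, ref_node_nm=90.0, target_node_nm=45.0)
print(f"estimated write current at 45 nm: {estimate} µA")
```

Halving the feature size quarters the cell area, which gives an estimate close to the 40 µA quoted for 45 nm.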
37
What have we learned and what can we expect?
Conclusion
38
Characteristics
Technology | Cell size | State | Access time (W/R) | Energy/bit | Retention
DRAM | 6 F² | Product | 10/10 ns | 2 pJ/bit | 64 ms
PRAM | 4-5 F² | Prototype | 100/20 ns | 100 pJ/bit | years
Racetrack | 20 F² / DWs ≈ 5 F² | Research | 20-30 ns | 2 pJ/bit | years
STTRAM | 4 F² | Prototype | 2-10 ns | 0.02 pJ/bit | years
[3,6,7,10,11]
• HMC improves the architecture but still relies on DRAM as memory
technology
• Energy/bit is not the same as power consumption! (Interface and control
also need power)
• e.g. DRAM cells are very efficient but the interface is power hungry!
• Access time means access to the cell! Latency also depends on
access and control logic
39
Glance into the crystal ball
Technology | Benefits | Biggest challenges | Prediction
PRAM | High capacity | Access time, power | Only as hybrid approach or mass storage
HMC | Huge bandwidth, high capacity | Fabrication costs | Good chances in near future
Racetrack | High capacity | Fabrication; access time depends on density | Still a lot of research necessary
STTRAM | Fast access, high density | Tradeoff between thermal stability and write current density | Also needs more research
Prediction is hard
DRAM will certainly remain in use as a memory technology within
this decade
Every technology has its own challenges
40
[…] There is no holy grail of memory that
encapsulates every desired attribute […]
Dean Klein, VP of Micron's Memory System Development, 2012
[http://www.hpcwire.com/hpcwire/2012-07-10/hybrid_memory_cube_angles_for_exascale.html]
Thank you for your attention!
Questions?
41
References I
[1] Jacob, Bruce (2009): The Memory System: Morgan & Claypool Publishers
[2] Minas, Lauri (2012): The Problem of Power Consumption in Servers: Intel
Inc.
[3] Pawlowski, J. Thomas (2011): Hybrid Memory Cube (HMC): Micron
Technology, Inc.
[4] Jeddeloh, Joe and Keeth, Brent (2012): Hybrid Memory Cube: New DRAM
Architecture Increases Density and Performance: IEEE Symposium on VLSI
Technology Digest of Technical Papers
[5] Gao, Li (2009): Spin Polarized Current Phenomena In Magnetic Tunnel
Junctions: Dissertation, Stanford University
[6] Qureshi, Moinuddin K. and Gurumurthi, Sudhanva and Rajendran, Bipin
(2012): Phase Change Memory: Morgan & Claypool Publishers
42
References II
[7] Krounbi, Mohamad T. (2010): Status and Challenges for Non-Volatile Spin-Transfer
Torque RAM (STT-RAM): International Symposium on Advanced Gate
Stack Technology, Albany, NY
[8] Bez, Roberto et al. (2003): Introduction to Flash Memory: Invited Paper,
Proceedings of the IEEE Vol. 91, No. 4
[9] Kogge, Peter et al. (2008): ExaScale Computing Study: Public Report
[10] Kryder, Mark and Kim, Chang Soo (2009): After Hard Drives – What Comes
Next?: IEEE Transactions on Magnetics Vol. 45, No. 10
[11] Parkin, Stuart (2011): Magnetic Domain-Wall Racetrack Memory:
Scientific Magazine January 14, 2011
43

School management system project Report.pdfKamal Acharya
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessorAshwiniTodkar4
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 

Recently uploaded (20)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Introduction to Geographic Information Systems
Introduction to Geographic Information SystemsIntroduction to Geographic Information Systems
Introduction to Geographic Information Systems
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257Memory Interfacing of 8086 with DMA 8257
Memory Interfacing of 8086 with DMA 8257
 
Augmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptxAugmented Reality (AR) with Augin Software.pptx
Augmented Reality (AR) with Augin Software.pptx
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 

2012 benjamin klenk-future-memory_technologies-presentation

  • 1. Future Memory Technologies. Seminar WS2012/13. Benjamin Klenk, 2013/02/08. Supervisor: Prof. Dr. Holger Fröning, Department of Computer Engineering, University of Heidelberg
  • 2. Amdahl's rule of thumb: "1 byte of memory and 1 byte per second of I/O are required for each instruction per second supported by a computer." (Gene Myron Amdahl)
      #  System                                Performance      Memory     B/FLOPs
      1  Titan Cray XK7 (Oak Ridge, USA)       17,590 TFLOP/s   710 TB     4.0 %
      2  Sequoia BlueGene/Q (Livermore, USA)   16,325 TFLOP/s   1,572 TB   9.6 %
      3  K computer (Kobe, Japan)              10,510 TFLOP/s   1,410 TB   13.4 %
      4  Mira BlueGene/Q (Argonne, USA)        8,162 TFLOP/s    768 TB     9.4 %
      5  JUQUEEN BlueGene/Q (Juelich, GER)     4,141 TFLOP/s    393 TB     9.4 %
      [www.top500.org] November 2012
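The B/FLOPs column can be reproduced by dividing memory capacity by peak performance. The following is a minimal sketch that uses only the values from the table; the function name `bytes_per_flop` is chosen here for illustration, and equating one FLOP with one instruction is a simplification.

```python
# Memory capacity relative to compute rate for the systems in the table.
# Figures are copied from the table above (TOP500 list, November 2012).
SYSTEMS = [
    # (name, performance in TFLOP/s, memory in TB)
    ("Titan", 17_590, 710),
    ("Sequoia", 16_325, 1_572),
    ("K computer", 10_510, 1_410),
    ("Mira", 8_162, 768),
    ("JUQUEEN", 4_141, 393),
]


def bytes_per_flop(memory_tb: float, perf_tflops: float) -> float:
    """Bytes of memory per FLOP/s; the tera prefixes cancel."""
    return memory_tb / perf_tflops


ratios = {name: bytes_per_flop(mem, perf) for name, perf, mem in SYSTEMS}

for name, ratio in ratios.items():
    # Amdahl's rule of thumb corresponds to a ratio of 1.0 (100 %).
    print(f"{name:<12} {100 * ratio:5.1f} %")
```

All five systems stay well below the one byte per instruction per second that the rule of thumb asks for.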
  • 3. Outline Motivation State of the art • RAM • FLASH Alternative technologies • PCM • HMC • Racetrack • STTRAM Conclusion 3
  • 4. Why do we need other technologies? Motivation 4
  • 5. The memory system
      • Modern processors integrate the memory controller (IMC)
      • Problem: pin limitation
      [Figure: Intel i7-3770 with four cores, per-core caches, a shared L3 cache and the IMC; 2 x DDR3 channels, max. 25.6 GB/s; e.g. 4 x 8 GB = 32 GB (typically one rank per module); ranks 0 to 3, each with several banks]
  • 6. Performance and power limitations: Memory Wall and Power Wall
      [Figure: server power breakdown (pie chart); legend: Processor, Memory, Planar, PCI, Drivers, Standby, Fans, DC/DC Loss, AC/DC Loss; shares: 31%, 11%, 3%, 3%, 6%, 2%, 9%, 10%, 25%]
      [Figure: CPU and DRAM frequency in MHz, 1990 to 2002] [1]
      [Intel Whitepaper: Power Management in Intel Architecture Servers, April 2009]
  • 7. Memory bandwidth is limited
      • The working-set demand grows with the number of cores
      • Bandwidth and capacity must scale linearly
      • 1 GB/s memory bandwidth per thread [1]
      • Adding more cores doesn't make sense unless there is enough memory bandwidth!
      [Figure: normalized performance over the number of threads (1 to 97), ideal vs. bandwidth-limited (BW)] [1]
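The argument of this slide can be illustrated with a deliberately simple model: performance grows linearly with the thread count until the threads together demand more bandwidth than the memory system delivers. This is only a sketch under stated assumptions, not the model behind the figure; the 25.6 GB/s default is the channel bandwidth from slide 5, and `normalized_performance` is a name introduced here.

```python
def normalized_performance(threads: int,
                           total_bw_gbs: float = 25.6,
                           bw_per_thread_gbs: float = 1.0) -> float:
    """Speedup over one thread when every thread needs a fixed bandwidth.

    Assumption: the workload is purely bandwidth bound once the memory
    system is saturated, so the speedup is capped at total / per-thread.
    """
    ideal = float(threads)
    bandwidth_limit = total_bw_gbs / bw_per_thread_gbs
    return min(ideal, bandwidth_limit)


for threads in (1, 16, 33, 65, 97):
    print(threads, normalized_performance(threads))
```

With these numbers the curve flattens after roughly 25 threads, which is the point of the slide: more cores only help if the bandwidth grows with them.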
  • 8. DIMM count per channel is limited
      • Channel capacity does not increase
      • Higher data rates result in fewer DIMMs per channel (to maintain signal integrity)
      • High-capacity DIMMs are expensive
      [Figure: channel capacity [GB] and number of DIMMs per channel over data rate [MHz]] [1]
  • 9. Motivation What are the problems? • Memory Wall • Power Wall • DIMM count per channel decreases • Capacity per DIMM grows pretty slow What do we need? • High memory bandwidth • High bank count (concurrent execution of several threads) • High capacity (less page faults and less swapping) • Low latency (less stalls and less time waiting for data) • And at long last: Low power consumption 9
  • 10. State of the art: What are current memory technologies?
  • 11. Random Access Memory
      SRAM (used for caches)
      • Fast access and no refresh needed
      • Each cell consists of six transistors
      • Low density results in bigger chips with less capacity than DRAM
      DRAM (used for main memory)
      • Each cell consists of only one transistor and one capacitor (high density)
      • Must be refreshed frequently (leakage current)
      • Slower access than SRAM
      • Higher power consumption
  • 12. DRAM Organized like an array (example 4x4) Horizontal Line: Word Line Vertical Line: Bit Line Refresh every 64ms Refresh logic is integrated in DRAM controller www.wikipedia.com 12
  • 13. The history of DDR-DRAM
      • DDR SDRAM is state of the art for main memory
      • There are several versions of DDR SDRAM:
      Version   Clock [MHz]   Transfer rate [MT/s]   Voltage [V]   DIMM pins
      DDR1      100-200       200-400                2.5/2.6       184
      DDR2      200-533       400-1066               1.8           240
      DDR3      400-1066      800-2133               1.5           240
      DDR4      1066-2133     2133-4266              1.2           284
      [9] ExaScale Computing Study
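The transfer rates in the table translate into a theoretical peak bandwidth once a bus width is fixed. The sketch below assumes the usual 64-bit (8-byte) data bus per channel, which the slides do not state explicitly; the function name is illustrative. Under the further assumption of two DDR3-1600 channels it reproduces the 25.6 GB/s quoted on slide 5.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data bus per DIMM channel


def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_second = transfer_rate_mts * 1e6 * BUS_WIDTH_BYTES * channels
    return bytes_per_second / 1e9


# Upper end of each transfer-rate range from the table, in MT/s.
MAX_TRANSFER_RATE = {"DDR1": 400, "DDR2": 1066, "DDR3": 2133, "DDR4": 4266}

for version, rate in MAX_TRANSFER_RATE.items():
    print(f"{version}: {peak_bandwidth_gbs(rate):.1f} GB/s per channel")

# Assumption: two channels of DDR3-1600, which the slides do not state.
dual_channel_ddr3_1600 = peak_bandwidth_gbs(1600, channels=2)
print(f"2 x DDR3-1600: {dual_channel_ddr3_1600:.1f} GB/s")
```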
  • 14. Power consumption and the impact of refreshes
      • A refresh command is issued every 7.8 µs (<85°C) / 3.9 µs (<95°C)
      • Every row is refreshed every 64 ms
      • Multiple banks enable concurrent refreshes
      • Refresh commands flood the command bus
                  1990         Today
      Bits/row    4096         8192
      Capacity    Tens of MB   Tens of GB
      Refreshes   10 per ms    10,000 per ms
      [1] RAIDR: Retention-Aware Intelligent DRAM Refresh, Jamie Liu et al.
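The 7.8 µs and 3.9 µs figures follow from spreading the refresh commands evenly over the retention window. The sketch below assumes 8192 refresh commands per window and a halved window of 32 ms in the extended temperature range; both values are assumptions based on common DDR practice and are not given on the slide.

```python
REFRESH_COMMANDS_PER_WINDOW = 8192  # assumption: commands per retention window


def refresh_interval_us(retention_ms: float,
                        commands: int = REFRESH_COMMANDS_PER_WINDOW) -> float:
    """Average time between two refresh commands in microseconds."""
    return retention_ms * 1000.0 / commands


normal_interval = refresh_interval_us(64.0)    # below 85 degrees C
extended_interval = refresh_interval_us(32.0)  # assumption: 32 ms window below 95 degrees C

print(f"normal:   {normal_interval:.4f} us")
print(f"extended: {extended_interval:.4f} us")
```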
  • 15. Flash FLASH memory cells are based on floating gate transistors MOSFET with two gates: Control (CG) & Floating Gate (FG) FG is electrically isolated and electrons are trapped there (only capacitive connected) Programming by hot-electron injection Erasing by quantum tunneling http://en.wikipedia.org/wiki/Floating-gate_transistor 15
  • 16. Problems to solve
      DRAM
      • Limited DIMM count, which limits the capacity of main memory
      • Unnecessary power consumption of refreshes
      • Low bandwidth
      FLASH
      • Slow access time
      • Limited write cycles
      • Low bandwidth
  • 17. Alternative technologies: Which technologies show promise for the future?
  • 18. Outline: Phase Change Memory (PCM, PRAM, PCRAM); Hybrid Memory Cube (HMC); Racetrack Memory; Spin-Torque Transfer RAM (STTRAM)
  • 19. Phase Change Memory (PCM)
      • Based on chalcogenide glasses (also used for rewritable optical discs such as CD-RW)
      • PCM lost the competition with FLASH and DRAM because of power issues
      • PCM cells keep shrinking, and hence the power consumption decreases
      [Figure: SET and RESET transitions between the amorphous and the crystalline state]
      [http://www.nano-ou.net/Applications/PRAM.aspx]
  • 20. How to read and write
      • Resistance changes with the state (amorphous, crystalline)
      • The transition can be forced by optical or electrical pulses
      [Figure: temperature T over time t for the RESET and SET pulses, with the levels Tmelt and Tx marked]
      http://agigatech.com/blog/pcm-phase-change-memory-basics-and-technology-advances/
  • 21. Access time of common memory techniques
      • PRAM is still "slower" than DRAM
      • A PRAM-only main memory would perform worse (access time 2-10x slower)
      • But: density is much better (4-5 F² compared to 6 F² for DRAM)
      • We need to find a tradeoff
      [Figure: typical access time in cycles for a 4 GHz processor, logarithmic axis from 2^1 to 2^17, for L1 $, L3 $, DRAM, PRAM and FLASH] [6]
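The figure expresses access times in cycles of a 4 GHz processor. A small helper converts between cycles and nanoseconds so that the axis values can be compared with the times quoted elsewhere in the slides; the function names are illustrative and no per-technology values are assumed here.

```python
def cycles_to_ns(cycles: float, clock_ghz: float = 4.0) -> float:
    """Time in nanoseconds that the given number of cycles takes."""
    return cycles / clock_ghz


def ns_to_cycles(time_ns: float, clock_ghz: float = 4.0) -> float:
    """Number of cycles that elapse during the given time."""
    return time_ns * clock_ghz


# The axis ticks of the figure: 2^1, 2^3, ..., 2^17 cycles.
for exponent in range(1, 18, 2):
    print(f"2^{exponent:<2} cycles = {cycles_to_ns(2 ** exponent):10.1f} ns")
```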
  • 22. Hybrid Memory: DRAM and PRAM
      • DRAM is still used as a buffer / cache
      • Technique to hide the higher latency of PRAM
      [Figure: CPUs, DRAM buffer, PRAM main memory and disk, with a write queue (WRQ) and a bypass path] [6]
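How well the DRAM buffer hides the PRAM latency depends on its hit rate. The sketch below uses a simplified average-access-time model in which a miss costs one PRAM access; the latencies (10 ns and 40 ns, i.e. a factor of four) and the hit rates are hypothetical values chosen for illustration, not results from [6].

```python
def effective_latency_ns(hit_rate: float,
                         dram_ns: float,
                         pram_ns: float) -> float:
    """Average access time of a DRAM buffer in front of PRAM.

    Simplification: a hit costs one DRAM access, a miss one PRAM access;
    queueing, write-backs and the bypass path are ignored.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * dram_ns + (1.0 - hit_rate) * pram_ns


# Hypothetical latencies: PRAM four times slower than DRAM.
for hit_rate in (0.0, 0.5, 0.9, 0.99):
    latency = effective_latency_ns(hit_rate, dram_ns=10.0, pram_ns=40.0)
    print(f"hit rate {hit_rate:4.2f}: {latency:5.1f} ns")
```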
  • 23. Performance of a hybrid memory approach
      • Assumptions: density 4x higher, latency 4x slower (in-house simulator of IBM)
      [Figure: performance normalized to 8 GB DRAM (axis from 0.00 to 1.60) for 32 GB PCM, 32 GB DRAM and 1 GB DRAM + 32 GB PRAM]
      [Scalable High Performance Main Memory System Using Phase-Change Memory Technology, Qureshi et al.]
  • 24. Hybrid Memory Cube
      • Promising memory technology
      • Leading companies: Micron, Samsung, Intel
      • 3D stacking of DRAM dies
      • Enables high concurrency
      [3]
  • 25. What has changed?
      Formerly
      • CPU is directly connected to DRAM (memory controller)
      • Complex scheduler (queues, reordering)
      • DRAM timing parameters standardized across vendors
      • Slow performance growth
      HMC
      • Abstracted high-speed interface
      • Only an abstracted, packet-based protocol; no timing constraints
      • Innovation inside the HMC
      • HMC takes requests and delivers results in the most advantageous order
  • 26. HMC architecture
      • DRAM logic is stripped away
      • Common logic on the logic die
      • Vertical connections through TSVs
      • High-speed processor interface
      [Figure: DRAM slices 1 to 8 stacked on the logic die, each slice with memory arrays, CMD & ADD TSVs and DATA TSVs; the logic die connects to the CPU through the high-speed interface (packet-based protocol)] [3] [4]
  • 27. More concurrency and bandwidth
      Conventional DRAM:
      • 8 devices and 8 banks/device result in 64 banks
      HMC gen1:
      • 4 DRAMs * 16 slices * 2 banks result in 128 banks
      • If 8 DRAMs are used: 256 banks
      Processor interface:
      • 16 transmit and 16 receive lanes: 32 x 10 Gbps per link
      • 40 GBps per link
      • 8 links per cube: 320 GBps per cube (compared to about 25.6 GBps of recent memory channels)
      [3]
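The bank counts and link bandwidths on this slide are simple products, which the following sketch reproduces from the slide's own numbers. The function names are introduced here for illustration only.

```python
def bank_count(drams: int, slices: int, banks_per_slice: int) -> int:
    """Number of independently accessible banks in one cube."""
    return drams * slices * banks_per_slice


def link_bandwidth_gbytes(lanes: int, gbit_per_lane: float) -> float:
    """Aggregate link bandwidth in GB/s (8 bits per byte)."""
    return lanes * gbit_per_lane / 8.0


conventional_banks = 8 * 8                      # 8 devices x 8 banks/device
hmc_gen1_banks = bank_count(4, 16, 2)
hmc_8_dram_banks = bank_count(8, 16, 2)

per_link = link_bandwidth_gbytes(lanes=32, gbit_per_lane=10.0)
per_cube = 8 * per_link                         # 8 links per cube

print(conventional_banks, hmc_gen1_banks, hmc_8_dram_banks)
print(f"{per_link} GB/s per link, {per_cube} GB/s per cube")
print(f"{per_cube / 25.6:.1f}x a 25.6 GB/s memory channel pair")
```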
  • 28. Performance comparison
      Technology         VDD   IDD    BW GB/s   Power W   mW/GBps   pJ/bit   Real pJ/bit
      SDRAM PC133 1GB    3.3   1.50   1.06      4.96      4664.97   583.12   762.0
      DDR 333 1GB        2.5   2.19   2.66      5.48      2057.06   257.13   245.0
      DDR2 667 2GB       1.8   2.88   5.34      5.18      971.51    121.44   139.0
      DDR3 1333 2GB      1.5   3.68   10.66     5.52      517.63    64.70    52.0
      DDR4 2667 4GB      1.2   5.50   21.34     6.60      309.34    38.67    39.0
      HMCgen1            1.2   9.23   128.00    11.08     86.53     10.82    13.7
      HMC is costly because of TSVs and 3D stacking!
      Further features of HMCgen1:
      • 1 Gb 50 nm DRAM array
      • 512 MB total DRAM cube
      • 128 GB/s bandwidth
      [3]
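The derived columns of the table follow from the first three: power is VDD times IDD, the efficiency is power divided by bandwidth, and 1 mW/GBps equals 0.125 pJ/bit because a byte has 8 bits. A minimal sketch with illustrative function names; small deviations from the printed table come from the rounded inputs.

```python
def power_w(vdd_v: float, idd_a: float) -> float:
    """Electrical power in watts."""
    return vdd_v * idd_a


def mw_per_gbps(power: float, bandwidth_gbytes: float) -> float:
    """Power per unit of bandwidth in mW per GB/s."""
    return power * 1000.0 / bandwidth_gbytes


def pj_per_bit(efficiency_mw_per_gbps: float) -> float:
    """1 mW/(GB/s) = 1e-3 J / 8e9 bit = 0.125 pJ/bit."""
    return efficiency_mw_per_gbps / 8.0


# (technology, VDD [V], IDD [A], bandwidth [GB/s]) from the table.
ROWS = [
    ("DDR3 1333 2GB", 1.5, 3.68, 10.66),
    ("DDR4 2667 4GB", 1.2, 5.50, 21.34),
    ("HMCgen1", 1.2, 9.23, 128.00),
]

results = {}
for name, vdd, idd, bw in ROWS:
    efficiency = mw_per_gbps(power_w(vdd, idd), bw)
    results[name] = (efficiency, pj_per_bit(efficiency))
    print(f"{name:<14} {efficiency:8.2f} mW/GBps {results[name][1]:7.2f} pJ/bit")
```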
  • 29. Electron spin and polarized current
      • Spin is another property of particles (like mass and charge)
      • Spin is either "up" or "down"
      • Normal materials have equal populations of spin-up and spin-down electrons
      • Ferromagnetic materials have unequal populations
      [Figure: an unpolarized current passes through a ferromagnetic material and becomes a polarized current] [5]
  • 30. Magnetic Tunnel Junction (MTJ)
      • Discovered in 1975 by M. Jullière
      • Electrons become spin-polarized by the first magnetic electrode
      • Two phenomena: Tunnel Magneto-Resistance and Spin Torque Transfer
      [Figure: two ferromagnetic layers separated by an insulator barrier, with contacts on both sides and an applied voltage V]
  • 31. Tunneling Magneto-Resistance (TMR)
      • Magnetic moments parallel: low resistance
      • Otherwise: high resistance
      • 1995: resistance difference of 18% at room temperature
      • Nowadays: 70% can be fabricated with reproducible characteristics
      [Figure: insulator barrier between parallel layers (low resistance) and between antiparallel layers (high resistance)]
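The resistance difference quoted above is commonly expressed as the TMR ratio (R_AP - R_P) / R_P, and a stored bit can be read by comparing the measured resistance with a reference between the two states. The following sketch uses hypothetical resistance values and an arbitrary mapping of the antiparallel state to bit 1; neither is taken from the slides.

```python
def tmr_ratio(r_parallel: float, r_antiparallel: float) -> float:
    """Relative resistance change between the two magnetic states."""
    return (r_antiparallel - r_parallel) / r_parallel


def read_bit(resistance: float,
             r_parallel: float,
             r_antiparallel: float) -> int:
    """Decode a measured resistance against the midpoint reference.

    Assumption: antiparallel (high resistance) encodes 1, parallel 0.
    """
    reference = (r_parallel + r_antiparallel) / 2.0
    return 1 if resistance > reference else 0


# Hypothetical junction with a TMR of 70 %.
R_P, R_AP = 1000.0, 1700.0
print(f"TMR = {100 * tmr_ratio(R_P, R_AP):.0f} %")
print(read_bit(1020.0, R_P, R_AP), read_bit(1650.0, R_P, R_AP))
```

A higher TMR widens the gap between the two resistance levels, which makes the read decision more robust against noise and device variation.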
  • 32. Spin Torque Transfer (STT)
      • Thick, pinned layer (PL): its magnetization cannot be changed
      • Thin, free layer (FL): its magnetization can be changed
      • The magnetic structure of the FL needs to be smaller than 100-200 nm
      [Figure: a polarized current passing through PL and FL in both directions]
  • 33. Racetrack Memory
      • Ferromagnetic nanowire (racetrack)
      • Many magnetic domains, separated by domain walls (DW)
      • Domains are magnetized either "up" or "down"
      • The racetrack operates like a shift register
      [Figure: nanowire with magnetic domains and domain walls]
      http://www.tf.uni-kiel.de/matwis/amat/elmat_en/kap_4/backbone/r4_3_3.html
      http://researcher.watson.ibm.com/researcher/view_project_subpage.php?id=3811
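The shift-register behavior can be mimicked with a toy model: a fixed read head sees one domain at a time, and current pulses move the whole bit pattern past it. This is only an illustrative sketch; the class `Racetrack`, the wrap-around at the track ends, and the counting of one pulse per shifted position are assumptions made here, not properties of a real device.

```python
class Racetrack:
    """Toy model of a racetrack with a single fixed read head."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.offset = 0   # index of the domain currently under the head
        self.shifts = 0   # total number of shift steps issued so far

    def shift(self, steps: int) -> None:
        """Move the pattern by `steps` domains (negative = other direction)."""
        # Assumption: the pattern wraps around instead of leaving the wire.
        self.offset = (self.offset + steps) % len(self.bits)
        self.shifts += abs(steps)

    def read(self) -> int:
        """Read the domain under the head."""
        return self.bits[self.offset]

    def read_at(self, index: int) -> int:
        """Shift the requested domain under the head, then read it."""
        self.shift(index - self.offset)
        return self.read()


track = Racetrack([1, 0, 1, 1, 0])
print(track.read_at(3), track.shifts)
print(track.read_at(1), track.shifts)
```

The shift counter shows why the access time depends on how many domains share one track: the farther the requested domain is from the head, the more pulses are needed.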
  • 34. Racetrack DW are shifted along the track by current pulses (~100m/s) Principle of spin-momentum transfer [Scientific American 300 (2009), Data in the Fast Lanes of Racetrack Memory] 34
  • 35. Read & Write
      Read
      • Resistance depends on the magnetic moment of the magnetic domain (TMR effect)
      Write (multiple possibilities)
      • Self-field of a current in neighboring metallic elements
      • Spin-momentum transfer torque from magnetic nano-elements
      [Figure: read through an MTJ with a read amplifier; write through the magnetic field of a current]
  • 36. STTRAM Memory cell based on MTJ Resistance changed because of TMR Spin-polarized current instead of magnetic field to program cell [7] 36
  • 37. STTRAM provides…
      • High scalability because the write current scales with cell size (90 nm: 150 µA, 45 nm: 40 µA)
      • Write current of about 100 µA and therefore low power consumption
      • Nearly unlimited endurance (>10^16)
      • Uses CMOS technology (less than 3% additional cost)
      • TMR of about 100%
      • Dual MTJ: lower write current density, higher TMR
      [7]
  • 38. Conclusion: What have we learned and what can we expect?
  • 39. Characteristics
      Technology   Cell size              State       Access time (W/R)   Energy/bit    Retention
      DRAM         6 F²                   Product     10/10 ns            2 pJ/bit      64 ms
      PRAM         4-5 F²                 Prototype   100/20 ns           100 pJ/bit    years
      Racetrack    20 F² / DWs ~ 5 F²     Research    20-30 ns            2 pJ/bit      years
      STTRAM       4 F²                   Prototype   2-10 ns             0.02 pJ/bit   years
      • HMC improves the architecture but still relies on DRAM as memory technology
      • Energy/bit is not the same as power consumption! (Interface and control logic also need power)
      • E.g. DRAM cells are very efficient, but the interface is power hungry!
      • Access time means access to the cell! Latency also depends on access and control logic
      [3,6,7,10,11]
  • 40. Glance into the crystal ball
      PRAM
      • Benefits: high capacity
      • Biggest challenges: access time, power
      • Prediction: only as hybrid approach or mass storage
      HMC
      • Benefits: huge bandwidth, high capacity
      • Biggest challenges: fabrication costs
      • Prediction: good chances in the near future
      Racetrack
      • Benefits: high capacity
      • Biggest challenges: fabrication; access time depends on density
      • Prediction: still a lot of research necessary
      STTRAM
      • Benefits: fast access, high density
      • Biggest challenges: tradeoff between thermal stability and write current density
      • Prediction: also needs more research
      Prediction is hard. DRAM will certainly remain the main memory technology within this decade. Every technology has its own challenges.
  • 41. "[…] There is no holy grail of memory that encapsulates every desired attribute […]" Dean Klein, VP of Micron's Memory System Development, 2012 [http://www.hpcwire.com/hpcwire/2012-07-10/hybrid_memory_cube_angles_for_exascale.html] Thank you for your attention! Questions?
  • 42. References I
      [1] Jacob, Bruce (2009): The Memory System: Morgan & Claypool Publishers
      [2] Minas, Lauri (2012): The Problem of Power Consumption in Servers: Intel Inc.
      [3] Pawlowski, J. Thomas (2011): Hybrid Memory Cube (HMC): Micron Technology, Inc.
      [4] Jeddeloh, Joe and Keeth, Brent (2012): Hybrid Memory Cube: New DRAM Architecture Increases Density and Performance: IEEE Symposium on VLSI Technology Digest of Technical Papers
      [5] Gao, Li (2009): Spin Polarized Current Phenomena In Magnetic Tunnel Junctions: Dissertation, Stanford University
      [6] Qureshi, Moinuddin K. and Gurumurthi, Sudhanva and Rajendran, Bipin (2012): Phase Change Memory: Morgan & Claypool Publishers
  • 43. References II
      [7] Krounbi, Mohamad T. (2010): Status and Challenges for Non-Volatile Spin-Transfer Torque RAM (STT-RAM): International Symposium on Advanced Gate Stack Technology, Albany, NY
      [8] Bez, Roberto et al. (2003): Introduction to Flash Memory: Invited Paper, Proceedings of the IEEE Vol 91, No 4
      [9] Kogge, Peter et al. (2008): ExaScale Computing Study: Public Report
      [10] Kryder, Mark and Chang Soo, Kim (2009): After Hard Drives – What comes next?: IEEE Transactions On Magnetics Vol 45, No 10
      [11] Parkin, Stuart (2011): Magnetic Domain-Wall Racetrack Memory: Scientific Magazine January 14, 2011