Micron
Hynix
DDR SDRAM
• Not backward compatible
• Low frequency => reduced signal integrity requirements
• Column x Row x Bank x Rank = DIMM
• 8 chips x 8 bit = 64 bits (module)
Joint Electron Device Engineering Council (JEDEC)
DDR3 SDRAM
• Up to 16GB, 1GB/chip, 1.5V (2005)
• CL-RCD-RP: CAS (column) latency, RAS-to-CAS (row) delay, row precharge delay
• BW: mem CK x 4 bus mult. x 2 DDR x 64b
• eXtended Memory Profile (XMP, 2007)
• DDR3L: 1.35V (2010), DDR3U: 1.25V (2011)
• DDR3-800: 100MHz mem CK x 4 = 400MHz IO CK
• 400MHz x 2 DDR x 64b = 6.4 GB/s (5-5-5)
• DDR3-2133: 266MHz mem CK x 4 = 1066MHz IO CK
• 1066MHz x 2 DDR x 64b = 17 GB/s (11-11-11)
• Hynix 16GB: (16 x 1GB) 2133 MT/s x 64b = 17 GB/s
Transcend 4GB ECC DDR3 DRAM, 1600 MT/s (2013)
16GB DDR3 SO-DIMM, 1600 MT/s, 1.35/1.5V
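The bandwidth and timing arithmetic on this slide can be checked with a short script. This is a minimal sketch, not a vendor tool: the function names are made up for illustration, and it assumes that the CL figure is counted in I/O clock cycles and that the bus multiplier is 4 as stated in the bullets.

```python
def ddr_peak_bandwidth_gbs(mem_clock_mhz, bus_mult=4, bus_width_bits=64):
    """Peak bandwidth in GB/s: mem CK x bus multiplier x 2 (DDR) x bus width."""
    io_clock_mhz = mem_clock_mhz * bus_mult
    transfers_per_s = io_clock_mhz * 1e6 * 2       # two transfers per I/O clock
    return transfers_per_s * bus_width_bits / 8 / 1e9


def cas_latency_ns(cl_cycles, mem_clock_mhz, bus_mult=4):
    """CAS latency in ns, assuming CL is counted in I/O clock cycles."""
    io_clock_mhz = mem_clock_mhz * bus_mult
    return cl_cycles / io_clock_mhz * 1e3


# 100 MHz memory clock, 5-5-5 timings (the 6.4 GB/s case above)
print(ddr_peak_bandwidth_gbs(100), "GB/s")
print(cas_latency_ns(5, 100), "ns")

# 266.67 MHz memory clock, 11-11-11 timings (the 17 GB/s case above)
print(round(ddr_peak_bandwidth_gbs(266.67), 2), "GB/s")
print(round(cas_latency_ns(11, 266.67), 2), "ns")
```

Note that the faster part has a higher CL count but a similar absolute latency, because each cycle is shorter.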
DDR4 SDRAM
• Up to 64GB, 1.2V, 40-16nm (2011)
• Power: refresh strategies, data bus inv.
• Row hammer: large capacitors, ASLR (addr. rnd.)
• DIMM: 288 pins, 0.85mm spacing
• SO-DIMM: 260 pins, 0.5mm spacing (Skylake)
• DDR4-1600: 200MHz mem CK x 4 = 800MHz IO CK
• 800MHz x 2 DDR x 64b = 12.8 GB/s (10-10-10)
• DDR4-3200: 400MHz mem CK x 4 = 1600MHz IO CK
• 1600MHz x 2 DDR x 64b = 25.6 GB/s (20-20-20)
• Hynix 16GB: (16 x 1GB) 2133 MT/s x 64b = 17 GB/s
Samsung 2GB ECC DDR4 DRAM, 2133 MT/s, 1.2V (2011)
GOOD RAM 16GB DDR4 DRAM, 2133 MT/s, 1.2V
Registered Memory (RDIMM)
• For high-density chips (36)
• +1 cycle latency
• Normally features ECC
• Costlier, used in servers
• Motherboard must support
• RDIMM: +address, +command
• LRDIMM: +data (+capacity)
Two 8 GB DDR4-2133 ECC 1.2 V RDIMMs
ECC Memory
• Corrects 1-bit errors (per 64 bits)
• Servers: scientific, finance, L1-L2
• Error => crash, data corruption, security risk
• Row hammer: security exploit
• Electrical, magnetic interference
• Cosmic secondaries – neutrons
• 1.5km: 3.5x, 10km: 300x
• Cassini-Huygens: =112/Gb/day
• Google servers: 1.67/Gb/day
• Error rate doesn't increase with decreasing feature size
• Hamming code, Chipkill ECC (interleaved)
• Non-ECC can't correct errors
ECC DIMMs typically have 9 memory chips on each side
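The single-bit correction that a Hamming code provides can be illustrated on a small scale. The sketch below uses the textbook Hamming(7,4) code, chosen only for brevity; ECC DIMMs commonly use a wider code over 64 data bits plus 8 check bits, and the function names here are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Positions are 1-indexed; parity bits sit at positions 1, 2 and 4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4      # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(c):
    """Return (corrected codeword, 1-indexed error position or 0 if none)."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]    # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]    # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]    # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4   # position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return c, syndrome


def hamming74_decode(c):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
hit = list(word)
hit[4] ^= 1                            # simulate a single-bit upset at position 5
fixed, pos = hamming74_correct(hit)
print(pos, hamming74_decode(fixed))
```

This code corrects any single flipped bit but would miscorrect a double-bit error; detecting those needs an extra overall parity bit (SEC-DED).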
GDDR4 SDRAM
• Rival to Rambus XDR DRAM (2005)
• GDDR4: Based on DDR3, 32bit, 1.5V
• Data Bus Inversion (DBI): -PWR, -GND bounce
• 2.5 Gb/s x 32bit = 10 GB/s (256Mb, 2005)
• 4 Gb/s x 32bit = 16 GB/s (512Mb, 2007)
• AMD Radeon HD 3870 GDDR4: 512MB, 72 GB/s
• 1126MHz mem clock x 2 DDR = 2.25 Gb/s
• (8 x 512Mb) x 2.25 Gb/s x 32b = 72 GB/s, 256bit
ATi Radeon HD 3870, 512MB VRAM, 72 GB/s, 256bit (2007)
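Data Bus Inversion can be sketched in a few lines. This is an illustration of the DC-balancing variant under two loudly stated assumptions: that driving a 0 is the costly level on the bus, and that a byte is inverted when more than four of its eight bits are 0. The function names are hypothetical and the exact rule for a given memory type should be taken from its specification.

```python
def dbi_dc_encode(byte):
    """Return (bits_on_wire, dbi_flag) for one 8-bit data lane.

    Assumption: a 0 costs power, so invert when more than 4 bits are 0.
    """
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, True    # send inverted data, raise the DBI flag
    return byte, False


def dbi_dc_decode(wire, dbi_flag):
    """Recover the original byte from the wire bits and the DBI flag."""
    wire &= 0xFF
    return (~wire) & 0xFF if dbi_flag else wire


for value in (0x00, 0x01, 0x0F, 0xFF):
    wire, flag = dbi_dc_encode(value)
    print(f"{value:#04x} -> wire {wire:#04x}, DBI={flag}")
```

With this rule no byte on the wire ever carries more than four 0 bits, which bounds both the worst-case current and the number of lines switching together.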
GDDR5 SDRAM
• Used in GFX, consoles, HPC (2007)
• GDDR5: Based on DDR3, 32bit, 67 signal, 170 BGA
• Command CK: 1.25GHz, Data WCK: 2.5GHz (2x)
• 32b x 2 DDR = 64bit/WCK
• 32b x 2 DDR x 2 WCK x 2 CK = 256bit (transfer)
• GDDR5: 5 Gb/s x 32bit = 20 GB/s (1Gb, 2008)
• GDDR5: 8 Gb/s x 32bit = 32 GB/s (8Gb, 2015)
• GDDR5X: DDR / Quad Data Rate (QDR), 190 BGA
• GDDR5X: 11 Gb/s x 32bit = 44 GB/s (2016)
• NVIDIA Titan X GDDR5X: 12GB, 480 GB/s
• (12 x 1GB) x 40 GB/s = 480 GB/s, 384bit
NVIDIA Titan X, 16nm FinFET, 12GB VRAM, 480 GB/s at 10 Gb/s (2016)
GDDR6 SDRAM
• Used in GFX, consoles, HPC (2018)
• GDDR6: 18 Gb/s x 32bit = 72 GB/s (2GB, 2018)
• GDDR6X: 20 Gb/s x 32bit = 80 GB/s (2020)
• GDDR6: Quad Data Rate (QDR), 1.35V
• GDDR6X: PAM4, +PHY, -SNR, +BW, -PWR/bit
• GDDR6 RAM: 12GB, 768 GB/s
• (12 x 1GB) x 64 GB/s (16 Gb/s x 32b) = 768 GB/s, 384bit
• NVIDIA RTX 3090 GDDR6X: 24GB, 19.5 Gb/s
• (12 x 2GB) x 19.5 Gb/s x 32b = 936 GB/s, 384b
NVIDIA GeForce RTX 3090, Ampere, 24GB VRAM, 19.5 Gb/s (2020)
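The per-chip and per-card GDDR figures all follow from one formula: per-pin data rate times bus width, divided by 8 bits per byte. A minimal sketch, with hypothetical function names and the 32-bit chip interface from the bullets taken as the default:

```python
def chip_bandwidth_gbs(pin_rate_gbps, chip_width_bits=32):
    """Bandwidth of one memory chip in GB/s."""
    return pin_rate_gbps * chip_width_bits / 8


def card_bandwidth_gbs(pin_rate_gbps, num_chips, chip_width_bits=32):
    """Return (total bandwidth in GB/s, total bus width in bits)."""
    bus_width_bits = num_chips * chip_width_bits
    return pin_rate_gbps * bus_width_bits / 8, bus_width_bits


print(chip_bandwidth_gbs(18))          # one chip at 18 Gb/s per pin
print(card_bandwidth_gbs(2.25, 8))     # 8 chips at 2.25 Gb/s (256-bit bus)
print(card_bandwidth_gbs(10, 12))      # 12 chips at 10 Gb/s (384-bit bus)
print(card_bandwidth_gbs(16, 12))      # 12 chips at 16 Gb/s (384-bit bus)
print(card_bandwidth_gbs(19.5, 12))    # 12 chips at 19.5 Gb/s (384-bit bus)
```

Working backwards is just as useful: a 384-bit card quoted at 768 GB/s implies 16 Gb/s per pin.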
High Bandwidth Memory (HBM)
• 65nm silicon interposer, 1.3V
• Heat, engineering issues
• Costlier than PCB
• Used in high BW apps
• HBM: AMD Fiji Radeon R9 Fury X (2015)
• 128bit x 2 channels x 4 dies = 1024bit
• 500MHz x 2 DDR x 1024b = 128 GB/s (4GB, 2013)
• HBM2: Nvidia Tesla P100 (2016)
• HBM2: 2 GT/s x 1024b = 256 GB/s (8GB, 2016)
• HBM2E: 3.6 GT/s x 1024b = 460 GB/s (16GB, 2019)
• (independent channels)
AMD Fiji, the first GPU to use HBM (2015)
High Bandwidth Memory (HBM)
• GDDR5 chips not getting smaller
• Large no. of chips needed for high BW
• High PCB area needed, complex routing
• Higher power needed due to termination
• More signal integrity challenges
• Larger SoC PHY area needed
• HBM ~3x less power requirement than GDDR5
• 94% less chip area than GDDR5 (=capacity)
• 50% less PCB area than GDDR5 system
• 2.5D interposer costly, relatively new
Non-Volatile RAM (NVRAM)
• Floating-gate MOSFET
• EPROM, EEPROM, EEPROM+SRAM, Flash
• Phase-change RAM (PCRAM): 3D XPoint (Intel Optane, Micron QuantX)
• Ferroelectric RAM (FeRAM)
• Magnetoresistive RAM (MRAM)
• Electrochemical RAM (ECRAM)
• Resistive RAM (ReRAM)
• Can be fully-powered down
• For in-memory analog mat-mul (DNN)
Two Intel Optane 256GB DDR4 DIMMs with AES-256 (2019)
Serial Presence Detect (SPD)
• Standardized way to automatically access information about a memory module
• Tells the system what memory timings to use to access the memory
• Overclocking: override SPD data
• Accessed using SMBus, a variant of the I²C protocol
• SMBus: System Management Bus, also used to monitor power supply, CPU temp., fan speeds
• XMP uses bytes 176–255, unallocated by JEDEC, to encode higher-performance memory timings
Memory device on an SDRAM module, containing SPD data
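The SPD layout described above can be illustrated by splitting a raw dump into its regions. This is a sketch under stated assumptions: a 256-byte DDR3-style image already read out of the EEPROM (the SMBus read itself is not shown), byte 2 holding the DRAM device type code, and bytes 176–255 holding the XMP area as the slide states. The function name and the fake dump are hypothetical.

```python
# DRAM device type codes stored in SPD byte 2 (a subset)
DRAM_TYPES = {
    0x07: "DDR SDRAM",
    0x08: "DDR2 SDRAM",
    0x0B: "DDR3 SDRAM",
    0x0C: "DDR4 SDRAM",
}


def split_spd(spd):
    """Split a 256-byte SPD image into the JEDEC area and the XMP area."""
    if len(spd) < 256:
        raise ValueError("expected at least 256 bytes of SPD data")
    return {
        "dram_type": DRAM_TYPES.get(spd[2], "unknown"),
        "jedec": bytes(spd[:176]),      # bytes 0-175
        "xmp": bytes(spd[176:256]),     # bytes 176-255
    }


# Fake dump for illustration only; a real one comes from the module's EEPROM.
fake = bytearray(256)
fake[2] = 0x0B
fake[176:256] = b"\xAA" * 80

info = split_spd(fake)
print(info["dram_type"], len(info["jedec"]), len(info["xmp"]))
```

Firmware that ignores XMP reads only the JEDEC area, which is why a module falls back to its standard timings until a profile is explicitly enabled.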
DDR PHY Training
DDR, GDDR, HBM Memory : Presentation

  • 1. DDR SDRAM
      • Not backward compatible
      • Low frequency => reduced signal integrity requirements
      • Column x Row x Bank x Rank = DIMM
      • 8 chips x 8 bit = 64 bits (module)
      Micron, Hynix
      Joint Electron Device Engineering Council (JEDEC)
  • 2. DDR3 SDRAM
      • Up to 16GB, 1GB/chip, 1.5V (2005)
      • CL-RCD-RP: column delay, row delay, precharge delay
      • BW: mem CK x 4 bus mult. x 2 DDR x 64b
      • Extreme Memory Profile (XMP, 2007)
      • DDR3L: 1.35V (2010), DDR3U: 1.25V (2011)
      • DDR3-800: 100MHz mem CK x 4 = 400MHz IO CK
      • 400MHz x 2 DDR x 64b = 6.4 GB/s (5-5-5)
      • DDR3-2133: 266MHz mem CK x 4 = 1066MHz IO CK
      • 1066MHz x 2 DDR x 64b = 17 GB/s (11-11-11)
      • Hynix 16GB: (16 x 1GB) 2133 MT/s x 64b = 17 GB/s
      Transcend 4GB ECC DDR3 DRAM, 1600 MT/s (2013)
      16GB DDR3 SO-DIMM, 1600 MT/s, 1.35/1.5V
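The bandwidth formula on this slide (mem CK x 4 x 2 DDR x 64b) can be checked with a short calculation. This is a minimal sketch; the function name and its default arguments are illustrative choices, not part of the slides.

```python
def ddr_bandwidth_gb_s(mem_clock_mhz, bus_mult=4, bus_width_bits=64):
    """Peak module bandwidth in GB/s from the memory-array clock."""
    io_clock_mhz = mem_clock_mhz * bus_mult   # I/O bus clock
    transfers_mt_s = io_clock_mhz * 2         # DDR: two transfers per clock
    return transfers_mt_s * bus_width_bits / 8 / 1000  # bits -> bytes, MB -> GB


print(ddr_bandwidth_gb_s(100))        # 6.4
print(round(ddr_bandwidth_gb_s(800 / 3), 1))  # 266.67 MHz memory clock, about 17 GB/s
```

The 100 MHz case reproduces the 6.4 GB/s figure above, and the 266 MHz case gives roughly 17 GB/s.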
  • 3. DDR4 SDRAM
      • Up to 64GB, 1.2V, 40-16nm (2011)
      • Power: refresh strategies, data bus inversion
      • Row hammer: large capacitors, ASLR (address randomization)
      • DIMM: 288 pins, 0.85mm spacing
      • SO-DIMM: 260 pins, 0.5mm spacing (Skylake)
      • DDR4-1600: 200MHz mem CK x 4 = 800MHz IO CK
      • 800MHz x 2 DDR x 64b = 12.8 GB/s (10-10-10)
      • DDR4-3200: 400MHz mem CK x 4 = 1600MHz IO CK
      • 1600MHz x 2 DDR x 64b = 25.6 GB/s (20-20-20)
      • Hynix 16GB: (16 x 1GB) 2133 MT/s x 64b = 17 GB/s
      Samsung 2GB ECC DDR4 DRAM, 2133 MT/s, 1.2V (2011)
      GOODRAM 16GB DDR4 DRAM, 2133 MT/s, 1.2V
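The timing triples in parentheses are counted in I/O clock cycles, so a higher CL at a faster clock can mean the same delay in nanoseconds. A minimal sketch of that conversion follows; the function name is an assumption made for illustration.

```python
def cas_latency_ns(cl_cycles, io_clock_mhz):
    """Convert a CAS latency given in I/O clock cycles to nanoseconds."""
    return cl_cycles * 1000 / io_clock_mhz


print(cas_latency_ns(10, 800))    # 12.5
print(cas_latency_ns(20, 1600))   # 12.5
```

Both speed grades listed on the slide come out at 12.5 ns: the faster grade doubles bandwidth without lowering the absolute column access delay.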
  • 4. Registered Memory (RDIMM)
      • For high-density chips (36)
      • +1 cycle latency
      • Normally features ECC
      • Costlier, used in servers
      • Motherboard must support
      • RDIMM: +address, +command
      • LRDIMM: +data (+capacity)
      Two 8 GB DDR4-2133 ECC 1.2 V RDIMMs
  • 5. ECC Memory
      • Immune to 1-bit errors (per 64 bits)
      • Servers: scientific, finance, L1-L2
      • Error => crash, data corruption, security issues
      • Row hammer: security exploit
      • Electrical, magnetic interference
      • Cosmic secondaries: neutrons
      • 1.5km: 3.5x, 10km: 300x
      • Cassini-Huygens: =112/Gb/day
      • Google servers: 1.67/Gb/day
      • Doesn't increase with decreasing size
      • Hamming code, Chipkill ECC (interleaved)
      • Non-ECC can't correct errors
      ECC DIMMs typically have 9 memory chips on each side
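To illustrate how a Hamming code corrects a single flipped bit, here is a toy Hamming(7,4) sketch. It is only an illustration of the principle: ECC DIMMs protect 64-bit words with a wider code, and the function names here are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (data_bits, error_position); error_position is 0 if no error was seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # syndrome = 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1          # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                          # simulate a single-bit upset at position 5
print(hamming74_correct(word))        # ([1, 0, 1, 1], 5)
```

The syndrome directly names the position of the flipped bit, which is why a single error can be repaired without retransmission.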
  • 6. GDDR4 SDRAM
      • Rival to Rambus XDR DRAM (2005)
      • GDDR4: Based on DDR3, 32bit, 1.5V
      • Data Bus Inversion (DBI): -PWR, -GND bounce
      • 2.5 Gb/s x 32bit = 10 GB/s (256Mb, 2005)
      • 4 Gb/s x 32bit = 16 GB/s (512Mb, 2007)
      • AMD Radeon HD 3870 GDDR4: 512MB, 72 GB/s
      • 1126MHz mem clock x 2 DDR = 2.25 Gb/s
      • (8 x 512Mb) x 2.25 Gb/s x 32b = 72 GB/s, 256bit
      ATi Radeon HD 3870, 512MB VRAM, 72 GB/s, 256bit (2007)
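Data Bus Inversion can be sketched as follows. The threshold (invert when more than four of the eight bits are 0) and the choice of 0 as the costly level are assumptions made for illustration; the slide only states that DBI reduces power and ground bounce.

```python
def dbi_encode(byte):
    """Return (wire_byte, dbi_flag); inverts the byte if it has more than four 0 bits."""
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, 1   # send the inverted byte and raise the DBI flag
    return byte & 0xFF, 0


def dbi_decode(wire_byte, dbi_flag):
    """Undo the inversion on the receiving side."""
    return (~wire_byte) & 0xFF if dbi_flag else wire_byte


print(dbi_encode(0b00000001))      # (254, 1)
print(dbi_decode(254, 1))          # 1
```

With this scheme no transmitted byte carries more than four 0 bits, which bounds how many lines drive the costly level at once, at the price of one extra flag line per byte.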
  • 7. GDDR5 SDRAM
      • Used in GFX, consoles, HPC (2007)
      • GDDR5: Based on DDR3, 32bit, 67 signals, 170 BGA
      • Command CK: 1.25GHz, Data WCK: 2.5GHz (2x)
      • 32b x 2 DDR = 64bit/WCK
      • 32b x 2 DDR x 2 WCK x 2 CK = 256bit (transfer)
      • GDDR5: 5 Gb/s x 32bit = 20 GB/s (1Gb, 2008)
      • GDDR5: 8 Gb/s x 32bit = 32 GB/s (8Gb, 2015)
      • GDDR5X: DDR / Quad Data Rate (QDR), 190 BGA
      • GDDR5X: 11 Gb/s x 32bit = 44 GB/s (2016)
      • NVIDIA Titan X GDDR5X: 12GB, 480 GB/s
      • (12 x 1GB) x 40 GB/s = 480 GB/s, 384bit
      NVIDIA Titan X, 16nm FinFET, 12GB VRAM, 480 GB/s at 10 Gb/s per pin (2016)
  • 8. GDDR6 SDRAM
      • Used in GFX, consoles, HPC (2018)
      • GDDR6: 18 Gb/s x 32bit = 72 GB/s (2GB, 2018)
      • GDDR6X: 20 Gb/s x 32bit = 80 GB/s (2020)
      • GDDR6: Quad Data Rate (QDR), 1.35V
      • GDDR6X: PAM4, +PHY, -SNR, +BW, -PWR/bit
      • GDDR6 RAM: 12GB, 768 GB/s
      • (12 x 1GB) x 64 GB/s = 768 GB/s (16 Gb/s x 32bit per chip), 384bit
      • NVIDIA RTX 3090: 24GB, 19.5 Gb/s
      • (12 x 2GB) x 19.5 Gb/s x 32b = 936 GB/s, 384b
      NVIDIA GeForce RTX 3090, Ampere, 24GB VRAM, 19.5 Gb/s (2020)
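The graphics-card totals on the GDDR slides all follow one rule: per-pin data rate times total bus width, divided by 8 to convert bits to bytes. A minimal sketch, with a function name chosen here for illustration:

```python
def gddr_bandwidth_gb_s(pin_rate_gbps, bus_width_bits):
    """Aggregate bandwidth in GB/s for a given per-pin rate and total bus width."""
    return pin_rate_gbps * bus_width_bits / 8


print(gddr_bandwidth_gb_s(2.25, 256))   # 72.0  (Radeon HD 3870)
print(gddr_bandwidth_gb_s(10, 384))     # 480.0 (Titan X)
print(gddr_bandwidth_gb_s(19.5, 384))   # 936.0 (RTX 3090)
```

A 384-bit bus is twelve 32-bit chips side by side, so the same function with `bus_width_bits=32` gives the per-chip figures.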
  • 9. High Bandwidth Memory (HBM)
      • 65nm silicon interposer, 1.3V
      • Heat, engineering issues
      • Costlier than PCB
      • Used in high BW apps
      • HBM: AMD Fiji Radeon R9 Fury X (2015)
      • 128bit x 2 channels x 4 dies = 1024bit
      • 500MHz x 2 DDR x 1024b = 128 GB/s (4GB, 2013)
      • HBM2: Nvidia Tesla P100 (2016)
      • HBM2: 2 GT/s x 1024b = 256 GB/s (8GB, 2016)
      • HBM2E: 3.6 GT/s x 1024b = 460 GB/s (16GB, 2019)
      • (independent channels)
      AMD Fiji, the first GPU to use HBM (2015)
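The per-stack figures above come from a very wide interface rather than a fast one. The sketch below builds the interface width from the channel layout on the slide and derives the bandwidth; the function names and default arguments are illustrative assumptions.

```python
def hbm_interface_bits(channel_bits=128, channels_per_die=2, dies=4):
    """Total stack interface width, built up as on the slide."""
    return channel_bits * channels_per_die * dies


def hbm_stack_bandwidth_gb_s(transfer_rate_gt_s, interface_bits):
    """Per-stack bandwidth in GB/s from the transfer rate and interface width."""
    return transfer_rate_gt_s * interface_bits / 8


width = hbm_interface_bits()
print(width)                                  # 1024
print(hbm_stack_bandwidth_gb_s(1.0, width))   # 128.0 (500 MHz x 2 DDR)
print(hbm_stack_bandwidth_gb_s(2.0, width))   # 256.0
```

At 3.6 GT/s the same calculation gives about 460 GB/s per stack.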
  • 10. High Bandwidth Memory (HBM)
      • GDDR5 chips not getting smaller
      • Large no. of chips needed for high BW
      • High PCB area needed, complex routing
      • Higher power needed due to termination
      • More signal integrity challenges
      • Larger SoC PHY area needed
      • HBM ~3x less power requirement than GDDR5
      • 94% less chip area than GDDR5 (=capacity)
      • 50% less PCB area than GDDR5 system
      • 2.5D interposer costly, relatively new
  • 11. Non-Volatile RAM (NVRAM)
      • Floating-gate MOSFET
      • EPROM, EEPROM, EEPROM+SRAM, Flash
      • Phase-change RAM (PCRAM): 3D XPoint (Intel Optane, Micron QuantX)
      • Ferroelectric RAM (FeRAM)
      • Magnetoresistive RAM (MRAM)
      • Electrochemical RAM (ECRAM)
      • Resistive RAM (ReRAM)
      • Can be fully powered down
      • For in-memory analog mat-mul (DNN)
      Two Intel Optane 256GB DDR4 DIMMs with AES-256 (2019)
  • 12. Serial Presence Detect (SPD)
      • Standardized way to automatically access information about a memory module
      • Tells the system what memory timings to use to access the memory
      • Overclocking: override SPD data
      • Accessed using SMBus, a variant of the I²C protocol
      • SMBus (System Management Bus): also carries system monitoring data such as power supply, CPU temp., fan speeds
      • XMP uses bytes 176–255, unallocated by JEDEC, to encode higher-performance memory timings
      Memory device on an SDRAM module, containing SPD data
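Given a raw SPD dump, the byte range named above can be separated from the JEDEC-defined part with a simple slice. This is a sketch under assumptions: it takes a 256-byte SPD image already read into memory (how it is read over SMBus is not shown), and the helper names and the "all 0x00 or 0xFF means unprogrammed" heuristic are hypothetical, not taken from the slides.

```python
XMP_START = 176   # first byte of the range left unallocated by JEDEC
XMP_END = 255     # last byte of that range (inclusive)


def split_spd(spd):
    """Split a 256-byte SPD image into (jedec_region, xmp_region)."""
    if len(spd) < XMP_END + 1:
        raise ValueError("expected at least 256 bytes of SPD data")
    return spd[:XMP_START], spd[XMP_START:XMP_END + 1]


def region_looks_programmed(region):
    """Heuristic (assumption): erased EEPROM bytes read back as 0x00 or 0xFF."""
    return any(b not in (0x00, 0xFF) for b in region)


# Synthetic image for illustration only: JEDEC bytes zeroed, XMP area filled.
image = bytes(176) + bytes([0x2A] * 80)
jedec, xmp = split_spd(image)
print(len(jedec), len(xmp))              # 176 80
print(region_looks_programmed(xmp))      # True
```

Decoding the individual timing fields inside either region requires the field layout from the relevant specification, which is outside what the slide covers.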