In this presentation we'll be discussing HBM2 and HBM2E memory technology. There are many decisions to make when developing high-speed products for AI, and HBM has become the memory of choice. So we're going to go into detail on the selection criteria and implementation details for HBM2E memory.
Exponential Data Growth Mandates Increased Bandwidth
Source: Adapted from Jeff Dean, “Recent Advances in Artificial Intelligence and the
Implications for Computer System Design,” HotChips 29 Keynote, August 2017
[Chart: accuracy vs. scale (data size, model size) – with more compute, neural networks overtake other approaches; spans the 1980s–1990s to now]
• Exponential data growth is driving the need for new architectures
• Advances in computing have pushed the bottleneck to memory
• Faster compute and larger training sets are needed for AI applications
Annual Size of the Global Datasphere
Source: Adapted from Data Age 2025, sponsored by Seagate
with data from IDC Global DataSphere, Nov 2018
[Chart: annual size of the global datasphere in zettabytes, 2010–2025, reaching 175 ZB in 2025]
Two Important Memories for AI/ML
HBM2E: Extremely High Bandwidth and High Capacity
• AI Training
• HPC
• Network (NIC)
GDDR6: High Bandwidth, High Reliability and Low Latency
• AI Inference
• Graphics
• Automotive (ADAS)
Choosing the Correct Memory: Comparison Data
| Parameter                  | LPDDR4x          | LPDDR5          | DDR4             | GDDR6              | HBM2E            |
|----------------------------|------------------|-----------------|------------------|--------------------|------------------|
| Bandwidth (Gbps)           | Low-Medium (136) | Medium (204)    | Medium (200)     | High (512)         | Highest (3686)   |
| Data Rate (Gbps)           | 4.266            | 6.4             | 3.2              | 16                 | 3.6              |
| Interface width (bits)     | 32               | 32              | 64               | 32                 | 1024             |
| Board Area / System Design | Large / Medium   | Medium / Medium | Large / Easy     | Medium / Medium    | Small / Complex  |
| Efficiency (mW/Gbps)       | High (3)         | High (3)        | Moderate (10)    | Moderate (10)      | Highest (2)      |
| Cost ($)                   | Medium           | Medium          | Low              | Medium             | High             |
| Reliability/Yield          | Good             | Good            | Good             | Good               | Moderate         |
| Applications               | Mobile, AI       | Mobile, AI      | Compute, Network | AI, Graphics, Auto | AI, HPC, Network |
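As a sanity check on the bandwidth row: each aggregate figure is simply the per-pin data rate multiplied by the interface width. A minimal sketch, using the data rate and width values from the table:

```python
# Aggregate bandwidth (Gb/s) = per-pin data rate (Gb/s) x interface width (bits)
# Data rate and width values are taken from the comparison table.
memories = {
    "LPDDR4x": (4.266, 32),
    "LPDDR5":  (6.4,   32),
    "DDR4":    (3.2,   64),
    "GDDR6":   (16.0,  32),
    "HBM2E":   (3.6, 1024),
}

def aggregate_gbps(data_rate_gbps, width_bits):
    """Total interface bandwidth in Gb/s."""
    return data_rate_gbps * width_bits

for name, (rate, width) in memories.items():
    print(f"{name}: {aggregate_gbps(rate, width):.1f} Gb/s")
```

Note that HBM2E has the lowest per-pin data rate in the table yet the highest aggregate bandwidth, purely because of its very wide 1024-bit interface; the table rounds the computed values down.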
Rambus HBM2E Solution Summary
HBM2E Memory Interface Subsystem
Advantages:
• Production experience
• Hardened timing-closed PHY
• System design support: interposer and
package
• Lab Station development environment:
bring-up support
HBM2E Interface Summary
• Verified HBM2E PHY and controller
• 461 GB/s bandwidth (@ 3.6 Gbps)
• Speed bins: 2, 2.4, 2.8, 3.2, 3.6, 4.0 Gbps
• DRAMs: Stack height of 2, 4, 8, 12
• Channels: 8x 128 bits
• ASIC Interface: DFI style
• Lane repair
• IEEE 1500 test support
• PHY independent mode
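The 461 GB/s figure follows directly from the channel configuration listed above: 8 channels x 128 bits per channel at 3.6 Gb/s per pin. A quick check of that arithmetic:

```python
# HBM2E aggregate bandwidth from the interface summary:
# 8x 128-bit channels at a 3.6 Gb/s per-pin data rate.
channels = 8          # number of independent channels
width_bits = 128      # bits per channel
data_rate_gbps = 3.6  # per-pin data rate in Gb/s

per_channel_gbytes = width_bits * data_rate_gbps / 8  # bits -> bytes
total_gbytes = channels * per_channel_gbytes

print(per_channel_gbytes)  # GB/s per channel
print(total_gbytes)        # GB/s total, quoted above as 461 GB/s
```

At the top 4.0 Gbps speed bin, the same math over the full 1024-bit interface gives 512 GB/s.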
Interposer Reference Design
• Interposer design is a critical component of
2.5D system: Rambus provides reference
designs
• Support all foundries/OSAT 2.5D
manufacturing process
• Rambus works with customers to support
their interposer/package design for the
highest data rates
• Channel simulations
• Layout reviews and feedback
• Channel parameter optimization:
• Channel length, width, line spacing and
pitch, number of routing/ground layers
[Diagram: 2.5D cross-section – processor with HBM interface and HBM DRAM stack side by side on an interposer (1024-bit connection), mounted on a substrate and PCB]
Rambus HBM2/2E Memory Interface Solution: Controller
• Complete, configurable solution
• Handles all design, test and bring-up challenges
• Fully validated
[Diagram: Rambus integrated memory subsystem within the customer SoC/ASIC – a multi-port front end (AXI Interface 1/2, RMW, ECC, Memory Test and Mem Test Analyzer as optional blocks) feeding the HBM2E Memory Controller Core, which connects through the HBM2E PHY over 8x 128-bit channels to the HBM2E DRAM]
HBM2/2E Controller Core
• Supports HBM2 / HBM2E (features below listed as HBM2 / HBM2E)
• Stack heights: 4, 8 / 12-high
• Density per channel: 4, 8 / 6, 8, 12, 16, 24 Gbit
• Data rates: 2, 2.4 / 2.8, 3.2, 3.6, 4.0 Gbit/s/pin
• Modular, highly configurable solution
• Delivered configured to customer requirements to minimize size, power, latency
• Memory parameters are run-time programmable
• High performance
• High bus efficiency across a wide variety of configurations (AXI, native interface) and traffic scenarios (random
and sequential accesses; short and long bursts, etc.)
• Reliability, Availability, Serviceability (RAS) support including ECC, ECC scrubbing and data path parity protection
• Full featured Memory Test support
• Algorithmic, Arbitrary, Microcode Programmable address and data pattern options
• Delivered fully integrated and verified with the target HBM2/2E PHY
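To illustrate what an algorithmic address/data pattern looks like, here is a toy sketch of a classic walking-ones pattern run as a simple march (ascending writes, descending read/verify) over an idealized software memory model. This is purely illustrative and does not represent the Rambus microcode-programmable test engine:

```python
def walking_ones(width=16):
    """Yield data words with a single 1 bit walking across the bus width."""
    for bit in range(width):
        yield 1 << bit

def march_test(mem_size=8, width=16):
    """Write each pattern ascending, then read back descending and verify."""
    mem = [0] * mem_size  # idealized, error-free memory model
    errors = 0
    for pattern in walking_ones(width):
        for addr in range(mem_size):            # ascending writes
            mem[addr] = pattern
        for addr in reversed(range(mem_size)):  # descending read/verify
            if mem[addr] != pattern:
                errors += 1
    return errors

print(march_test())  # 0 errors on an ideal memory model
```

Real memory-test engines generate such address/data sequences in hardware at line rate; the point here is only the pattern structure, which stresses individual data lanes and address-order-dependent faults.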
HBM2/2E Controller Core Validation
• Simulation-Based Verification
• UVM Memory Testbench
• Over 100 Test Sequences
• Vendor (Samsung, SK hynix) and Avery Design Systems memory models
• Hardware-Based Validation
• Perform testing across a wide range of motherboards and plug-in FPGA-based boards
• Use the Rambus GUI and command-line app to drive tests
• Silicon-proven
• Memory system testchips
• Controller Core + PHY
• Deployed in multiple customer designs
Why Choose Rambus HBM2/2E
• #1 in market share: ~50 customer designs
• First-time silicon success (no re-spins)
• Multiple tier-1 networking and AI/ML customers in production
• Very mature solution used in a wide range of applications
• Performance
• Maximum throughput from both a bus-efficiency and a data-rate standpoint
• First to achieve 4.0 Gbps for an HBM2E memory interface
• Integrated and verified PHY and Controller solution
• PHY and Controller validated in both hardware and software
• PHY is a complete hardened macro including PHY, IO and decap
• Interposer and package reference designs provided – reduces customer effort and design risk
• Strong customer support
• Works closely with customers in all project phases (design, tapeout, bring-up)
• Lab Station development environment accelerates bring-up
[Photos: HBM2E hardware development board; memory write/read scope shot]
Why AI, Why Now?
Many of the techniques in use today were developed in the '80s and '90s during the last big wave of interest in neural networks. But they never took off back then, and conventional algorithms were used instead -- why is that?
There are two main reasons:
Compute wasn't fast enough
Memory performance and capacity weren't good enough, so conventional approaches performed better
Fast forward a few decades to today, and Moore’s Law has given us 5 orders of magnitude better compute, and 3-4 orders of magnitude of improvement in memory performance and capacity
Now neural networks can outperform conventional algorithms
Another big reason for the proliferation of neural networks is the large and growing amount of digital data available to train them and improve their performance
The world’s digital data is growing exponentially, at a rate faster than technology growth rates for processing, memory, and networks
There is an interesting dependence that is developing – neural networks are increasingly needed to make sense of the growing amount of digital data, and this data in turn is needed to train and improve the performance of neural networks
We believe this trend will continue in the future
The industry has recognized the importance of neural networks by honoring three distinguished researchers, Yann LeCun, Geoffrey Hinton, and Yoshua Bengio, with the 2018 ACM A.M. Turing Award (announced in 2019), often referred to as the Nobel Prize of computing
Neural networks are still in their infancy, and there is much more untapped potential, but much more performance is needed. The irony is that just when we need more performance, the tools we've relied on for the past several decades, Moore's Law and Dennard scaling, are either slowing or have ended.
The critical question is how, as an industry, do we move forward?