At the Virtual HPC User Forum Special Event, the CEO of MemVerge introduces the company and provides an overview of Big Memory Computing and the Memory Machine software.
4. Data Has Become Big & Fast
Capital Markets
3D Animation
Oil & Gas
Big Data Analytics
Virtual Servers
AI/ML Inference
Demanding Memory-Centric Infrastructure
[Chart: WW Real-Time Data Share, 2015-2024, IDC — real-time data volume (PB) and the share of real-time data within the Global Datasphere (%), both rising from 2015 through 2024.]
5. The Rise of Big Memory Computing
Apps run in DRAM and PMEM, managed by Big Memory Software (App → CPU → DRAM + PMEM).
Pros:
• Fast
• High-Capacity
• Low-Cost
• Non-Volatile
[Chart: Byte-Addressable PMEM Revenue, IDC ($M) — growing from 2019 to $2.6B in 2023, a 248% CAGR over 2019-2023.]
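A quick sanity check on the forecast's arithmetic. The implied 2019 base is our own back-calculation from the slide's $2.6B and 248% CAGR figures, not a number IDC states:

```python
# Sanity-check the slide's numbers: $2.6B in 2023 at a 248% CAGR from 2019
# implies a 2019 byte-addressable PMEM market of roughly $18M.
cagr = 2.48            # 248% compound annual growth rate
rev_2023 = 2.6e9       # $2.6B, 2023 forecast
years = 4              # 2019 -> 2023
rev_2019 = rev_2023 / (1 + cagr) ** years
print(f"implied 2019 revenue: ${rev_2019 / 1e6:.1f}M")  # -> $17.7M
```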
6. Memory Machine™: World’s First
Big Memory Software
Memory Machine™ Platform
DRAM
Bigger Memory at Lower Cost
without Performance Compromise
• Up to 9TB memory/2-way server
• 30-50% Memory Cost Savings
• DRAM-level performance
Persistence On-demand
• ZeroIO™ In-Memory Snapshot
• Fast Crash Recovery
• Thin-Clones
No Application Change!
8. Inference with Large Models and Feature Embeddings
• Motivation
o Large model and embedding table sizes: models reach the GB level, embedding tables the TB level
o Multiple models served on a single server
o Online inference is a real-time, low-latency service: results must return within tens of ms
• Ideal solution
o Put the models and embedding tables in DRAM
• Limitations
o High TCO
o Limited DRAM capacity
o DRAM is volatile
9. Inference on Memory Machine
• Our solution
o Keep models and embedding tables in DRAM + PMEM
• Benefits
o Big memory can hold all the embedding tables on one server
o Read performance similar to DRAM, well suited to read-heavy scenarios such as online inference
o Data persistence on PMEM
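Memory Machine itself requires no application change, but the mechanics of the scheme above can be sketched generically: PMEM is typically exposed through a DAX filesystem, so a memory-mapped file stands in for an embedding table that lives on persistent memory. The file path and sizes here are illustrative assumptions (a temp file substitutes for a PMEM mount so the sketch runs anywhere):

```python
import os
import tempfile
import numpy as np

# On real hardware this file would sit on a DAX-mounted PMEM filesystem;
# a temp file substitutes here so the sketch is runnable anywhere.
path = os.path.join(tempfile.gettempdir(), "emb_table.dat")
n_rows, dim = 100_000, 64  # 64 matches the embedding width in the evaluation

# Publish: write the table once; flush() persists it (on PMEM it would
# survive a process restart without any reload).
table = np.memmap(path, dtype=np.float32, mode="w+", shape=(n_rows, dim))
table[:] = np.random.rand(n_rows, dim).astype(np.float32)
table.flush()

# Serve: map read-only; each lookup is a plain memory load, not a read() call.
serving = np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, dim))
sparse_indices = [3, 17, 42]          # indices from a sparse feature
vectors = serving[sparse_indices]     # (3, 64) gather from mapped memory
```

Because the gathers are ordinary loads, the read-heavy inference path sees near-DRAM latency while the table itself is persistent and larger than DRAM allows.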
10. Example 1: Facebook’s DLRM
• Deep learning recommendation model for personalization and recommendation systems
• Consists of dense and sparse features
• Dense feature: a vector of floating-point values
• Sparse feature: a list of sparse indices into embedding tables
• Open source: https://github.com/facebookresearch/dlrm
M. Naumov et al., "Deep Learning Recommendation Model for Personalization and Recommendation Systems," 2019. https://arxiv.org/abs/1906.00091
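A structure-only NumPy sketch of the DLRM shape described above: a bottom MLP for dense features, per-table embedding lookups for sparse features, pairwise dot-product interactions, and a top layer producing a click probability. Table counts, sizes, and weights here are toy values with untrained random parameters; the real model lives in the repo linked above:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64  # embedding vector width, matching the evaluation setup

# Three toy embedding tables (a real DLRM uses many more, far larger ones).
tables = [rng.standard_normal((1000, dim)).astype(np.float32) for _ in range(3)]
# Untrained weights, scaled so the final sigmoid stays numerically tame.
w_bot = (rng.standard_normal((512, dim)) / np.sqrt(512)).astype(np.float32)
w_top = (rng.standard_normal(dim + 6) / np.sqrt(dim + 6)).astype(np.float32)

def dlrm_forward(dense, sparse_idx):
    """Structure-only DLRM forward: bottom MLP, embedding lookups,
    pairwise feature interactions, top layer -> click probability."""
    d = np.maximum(dense @ w_bot, 0.0)          # bottom MLP (single ReLU layer)
    embs = [tables[i][j] for i, j in enumerate(sparse_idx)]  # sparse lookups
    feats = np.stack([d] + embs)                # (4, dim)
    # Interactions: dot products between every pair of feature vectors.
    inter = (feats @ feats.T)[np.triu_indices(len(feats), k=1)]  # (6,)
    z = np.concatenate([d, inter])              # (dim + 6,)
    return 1.0 / (1.0 + np.exp(-(z @ w_top)))   # sigmoid

p = dlrm_forward(rng.standard_normal(512).astype(np.float32), [5, 42, 7])
```

The embedding lookups (`tables[i][j]`) are exactly the accesses that dominate memory footprint at TB scale, which is why they are the natural candidates for PMEM placement.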
11. Evaluation Setup
• Hardware:
• Intel(R) Xeon(R) Platinum 8280 CPU @ 2.70GHz (112 cores)
• 192 GB DRAM, 1.5TB PMEM, 400GB NVMe SSD
• Software
• RHEL 8.2
• Memory Machine v1.0
• Latest DLRM framework
• Test cases: model + embeddings
o In-memory data sizes: 26GB / 52GB / 104GB / 192GB
o Features: 100 sparse features (100 embedding tables, embedding vector dimension 64), 512 dense features
o Measured: inference time for 20,480 records in one batch (Criteo dataset)
12. Example 1: DLRM Inference Performance
Inference time (ms) for one 20,480-record batch, by in-memory data size:

Data size    All DRAM    DRAM+PMEM    DRAM+NVMe
26GB            3,592        5,487      180,778
52GB            4,965        6,961      187,072
104GB           8,429        7,740      199,472
192GB         174,721        8,556      203,846
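The ratios are the interesting part of these results. The series pairing below is our reading of the chart's bar labels; the takeaway it shows is that DRAM+PMEM stays within roughly 2x of all-DRAM while data fits in DRAM, and becomes ~20x faster once the working set exceeds the server's 192GB of DRAM:

```python
# Inference time (ms) per 20,480-record batch, as read from this slide's chart.
sizes     = ["26GB", "52GB", "104GB", "192GB"]
dram      = [3592, 4965, 8429, 174721]
dram_pmem = [5487, 6961, 7740, 8556]
dram_nvme = [180778, 187072, 199472, 203846]

for s, d, p, n in zip(sizes, dram, dram_pmem, dram_nvme):
    print(f"{s}: DRAM+PMEM = {p / d:.2f}x DRAM time, "
          f"DRAM+NVMe = {n / p:.1f}x DRAM+PMEM time")
```

The 192GB all-DRAM case presumably spills past physical DRAM into swap, which is why it collapses to NVMe-like times while DRAM+PMEM degrades only gracefully.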
14. Persistent Memory for Instant Model Rollback/Recovery
• How do we improve the fault tolerance of publishing a new model?
o Pushing a new model into production is risky
o If it fails, revert to the last workable version ASAP
o Rollback/model reloading takes time for large models due to slow I/O
• Leveraging PMEM's persistence
o Take a snapshot of the model-serving application
o Restore a snapshot without reloading from disk or remote storage
o Snapshots can be published to many serving nodes via memory-to-memory snapshot replication
• Solution
o Instantaneous snapshots without interrupting online inference
o Instantaneous rollback without loading and publishing time
o Snapshot, rollback, and recovery each complete within 1 second
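Memory Machine's ZeroIO snapshots are application-transparent and implemented in PMEM; the class below is only a toy illustration of the rollback *semantics* (names and structure are our own): a snapshot captured in memory can be restored without touching disk or remote storage, so revert time does not scale with model size the way a reload does.

```python
import copy
import numpy as np

class ModelStore:
    """Toy stand-in for in-memory snapshot/rollback of serving state.
    A real PMEM snapshot is copy-on-write and near-instant; we deep-copy
    here purely to keep the sketch simple and self-contained."""

    def __init__(self, weights):
        self.weights = weights
        self._snapshots = {}

    def snapshot(self, tag):
        # Capture current serving state under a version tag.
        self._snapshots[tag] = copy.deepcopy(self.weights)

    def rollback(self, tag):
        # Restore from memory: no disk read, no model re-publish.
        self.weights = copy.deepcopy(self._snapshots[tag])

store = ModelStore({"emb": np.zeros(4)})
store.snapshot("v1")                   # before publishing a new model
store.weights["emb"] = np.ones(4)      # risky new version goes live
store.rollback("v1")                   # bad rollout: instant revert
```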
15. Summary
• Memory Machine provides
o Larger and cheaper heterogeneous memory for faster inference
o Persistent memory for instant model snapshot and recovery
o No application change needed
• Humans reason entirely from memory
o So will machine learning in the era of Big Memory
16. Big Memory Software Will Be a $10B+ Market
Today's tiers: Compute → Memory → Performance Storage → Capacity Storage
With Big Memory: Compute → Big Memory → Capacity Storage