Optimizing SSD Architecture for Client Workloads

Santa Clara, CA, August 2016
Elad Baram
Sr. Director, SSD Product Management

Agenda
▪ Workloads and Locality Concept
▪ SSD Architecture and Performance Enablers
▪ Locality of Workloads Study
▪ Recommendations

Workload Key Attributes
▪ Three key characteristics of workloads
• Bandwidth over time
– MB/s
• Transactions over time
– IOPS
• Locality over time
– The degree of repetitiveness in host logical address accesses
– Expressed as the hit rate achieved with a given logical-to-physical (L2P) mapping table size (e.g., a workload with 4GB locality reaches a 90% hit rate on a device whose table maps 4GB of addresses)
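
The link between a mapped logical range and the table size can be sketched with simple arithmetic. The 4-byte entry width below is an assumption, not a figure stated on the slides; it is chosen because it reproduces the deck's pairing of a 4GB logical range with a 4MB L2P table at 4KB mapping granularity. The function and constant names are illustrative only.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def l2p_table_bytes(logical_range_bytes, map_unit_bytes=4 * KIB, entry_bytes=4):
    """Size of a flat L2P table that maps the whole logical range.

    entry_bytes=4 is an assumed entry width, not a value from the slides.
    """
    entries = logical_range_bytes // map_unit_bytes  # one entry per 4KB unit
    return entries * entry_bytes


# A 4GB logical range needs a 4MB table under these assumptions.
print(l2p_table_bytes(4 * GIB) // MIB)    # 4
# Full 1:1 mapping of a 256GB drive needs a 256MB table.
print(l2p_table_bytes(256 * GIB) // MIB)  # 256
```

Under these assumptions the table costs roughly 1/1024 of the range it maps, which is why full 1:1 mapping of a large drive needs DRAM while a 4GB window does not.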

Locality Overview
▪ SSD architectures use different sizes for L2P mapping tables
• From small tables in DRAM-less SSDs to full 1:1 4KB mapping with DRAM
▪ L2P table size is a cost/performance optimization decision
▪ A study was done to quantify the impact of L2P table size on SSD performance in different application environments
• Study the locality of real-life client workloads
• Study the locality of benchmarks
• Understand the optimal cost/performance point
▪ Main outcome: client workloads are highly localized

SSD Block Diagram
[Figure: block diagram. Host (with HMB) connected to the SSD; inside the SSD: Host Interface, CPU/Logic, Flash Interface, DDR, and NAND.]

SSD Block Diagram: Sequential Read/Write Enablers
[Figure: the same block diagram (Host with HMB; SSD with Host Interface, CPU/Logic, Flash Interface, DDR, and NAND), annotated with the factors limiting performance.]

SSD Block Diagram: IOPS Enablers
[Figure: the same block diagram (Host with HMB; SSD with Host Interface, CPU/Logic/FW, Flash Interface, DDR, and NAND), annotated with the factors limiting performance.]

SSD Block Diagram: What Is Enabled by DDR?
[Figure: the same block diagram (Host with HMB; SSD with Host Interface, CPU/Logic, Flash Interface, DDR, and NAND).]
▪ DDR usage
• L2P (logical-to-physical) translation tables (>90% of space)
• Buffering
• Code space

L2P Table Size Impact on SSD Performance
[Figure: RR IOPS versus workload LBA range (axis marks at 1GB, 128GB, and 256GB). Labels: 4KB 1:1 L2P mapping, maximum system IOPS, 'control read'* penalty, PCMark Vantage, Crystal Diskmark.]
▪ L2P table size does NOT define the maximum performance
• That is defined by NAND, CPU, and FW efficiencies
▪ L2P table size defines the envelope in which IOPS can be maintained
* A control read is an internal read command issued by the SSD to fetch metadata, such as a mapping table page
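
One way to picture the control read penalty is a toy throughput model in which every L2P miss adds one extra internal read before the host read can be served. This is a sketch under that assumption only: the function name, the cost ratio, and the serial-latency model are hypothetical and do not come from the study.

```python
def relative_iops(hit_rate, control_read_cost=1.0):
    """Fraction of maximum IOPS retained at a given L2P hit rate.

    Assumed model: a host read costs 1 unit of work; each L2P miss adds
    control_read_cost units for the extra internal (control) read.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss_rate = 1.0 - hit_rate
    return 1.0 / (1.0 + miss_rate * control_read_cost)


for rate in (1.0, 0.9, 0.5, 0.0):
    print(f"hit rate {rate:.0%}: {relative_iops(rate):.2f} of maximum IOPS")
```

In this model a 100% hit rate keeps the device at its maximum, and the loss grows smoothly as more reads fall outside the mapped range, which matches the idea of the table size defining an envelope and not the peak.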

Deeper look into workloads

Synthetic Benchmark: Crystal Disk Mark
[Figure: logical address accessed by the host over time, reads and writes. Labeled phases: Create file; Sequential Read (multiple IOs, threads); Random Read (multiple IOs, threads); Random Read (single IO, thread); Sequential Read (single IO, thread); Sequential Write (single IO, thread); Sequential Write (multiple IOs, threads); Random Write (multiple IOs, threads); Random Write (single IO, thread).]
CDM accesses ~1GB logical range

Synthetic Benchmark: Crystal Disk Mark
[Figure: read and write access panels. Read panels: Sequential Read (multiple threads), Random Read (multiple threads), Sequential Read (single thread), Random Read (single thread). Write panels: Sequential Write (multiple threads), Random Write (multiple threads), Sequential Write (single thread), Random Write (single thread).]

PCMark Vantage Workload
[Figure: read and write accesses for each use case: Windows Defender, Gaming, Importing Pictures, Windows Startup, Windows Media Center, Adding music to Windows Media, Video Editing, Applications Loading.]
▪ Different logical access pattern for each use case
▪ Bandwidth & IOPS are bursty

Real User Workload (10 Days)
[Figure: read and write accesses over the trace, with time marks at 3 days, 6 days, and 8 days.]
▪ Broad address spread
▪ Bandwidth & IOPS are bursty

Real User Workload
[Figure: read and write accesses of the real user workload.]
How can you translate access pattern raw data into insightful design decisions?

Locality of Client SSD Workloads

Research Flow
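[Flowchart, reconstructed from the slide labels; the slide also depicts the L2P tables and four NAND blocks.]
▪ A command trace is fed into the simulator
▪ For each read request: is the requested address stored in the L2P table?
• Yes: continue (hit)
• No: evacuate space (per a defined policy), then fetch the new address (miss)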
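
The flow above can be sketched as a small trace-driven simulator. This is a minimal illustration and not the study's tool: the eviction policy is assumed to be least-recently-used (the slide only says "defined policy"), each table entry is assumed to cover one mapping unit, and all names are hypothetical.

```python
from collections import OrderedDict


def simulate_hit_rate(read_addresses, table_entries):
    """Replay read addresses through an L2P table with LRU eviction.

    read_addresses: list of mapping-unit indexes taken from a command trace.
    table_entries:  number of entries the table can hold (must be >= 1).
    """
    if table_entries < 1:
        raise ValueError("table_entries must be at least 1")
    table = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for address in read_addresses:
        if address in table:
            hits += 1                      # hit: continue
            table.move_to_end(address)
        else:
            if len(table) >= table_entries:
                table.popitem(last=False)  # evacuate space (assumed LRU)
            table[address] = True          # fetch new address (miss)
    return hits / len(read_addresses) if read_addresses else 0.0


trace = [0, 1, 2, 0, 1, 2, 0, 1, 2, 3]
print(simulate_hit_rate(trace, 3))  # 0.6
print(simulate_hit_rate(trace, 2))  # 0.0
```

The two calls show the sensitivity the study measures: a table that holds the three-address working set serves 6 of 10 reads, while a table one entry smaller thrashes and serves none.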

SYSMark 2014 Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Figure: hit rate (%, 0 to 100) over time for L2P table sizes of 0.5MB, 4MB, and 512MB.]
Source: SanDisk Technical Marketing research lab; simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform

Real User (Developer Profile) Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Figure: hit rate (%, 0 to 100) over time for L2P table sizes of 0.5MB, 4MB, and 256MB.]
Trace period: 21 days.

Hit Rates in L2P Table Sizes for Various Workloads
4GB Logical Range Coverage (4MB L2P Table Size) Provides 95%+ Average Hit Rate for Benchmarks and Workloads
[Figure: hit rate (%, 0 to 100) versus L2P size, from 512MB down to 32KB. Series: PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile.]

Hit Rates in L2P Table Sizes for Various Workloads
[Figure: hit rate (%, 0 to 100) versus L2P size, from 512MB down to 32KB. Series: PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile, IOMeter.]
IOMeter: 40GB LBA range; SW, SR, RW, RR
Synthetic workload is not a reflection of typical client

Locality of Workloads
Locality is the L2P table size required to reach a 90% hit rate for the read pattern
[Figure: scatter plot of total reads (GB) against L2P size (MB), both axes spanning 1 to 1,000. Points: CDM 1GB, CDM 4GB, PCMark 8, Copy Files and Folders, IOMeter Full Range, PCMark Vantage, Office Productivity, Media Creation.]
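
Following the definition on this slide, locality can be estimated by sweeping candidate table sizes and taking the smallest one whose simulated read hit rate reaches 90%. The sketch below assumes LRU eviction and one table entry per mapping unit; neither detail is stated in the deck, and the function names are hypothetical.

```python
from collections import OrderedDict


def hit_rate(trace, capacity):
    """Read hit rate of an LRU-managed table holding `capacity` entries."""
    table, hits = OrderedDict(), 0
    for address in trace:
        if address in table:
            hits += 1
            table.move_to_end(address)
        else:
            if len(table) >= capacity:
                table.popitem(last=False)  # evict least recently used entry
            table[address] = True
    return hits / len(trace) if trace else 0.0


def locality(trace, candidate_sizes, target=0.90):
    """Smallest candidate table size (in entries) reaching the target hit rate.

    Returns None when no candidate reaches the target.
    """
    for size in sorted(candidate_sizes):
        if hit_rate(trace, size) >= target:
            return size
    return None


# A workload that keeps revisiting four addresses has a locality of 4 entries.
looping_trace = [0, 1, 2, 3] * 25
print(locality(looping_trace, [1, 2, 4, 8]))  # 4
```

A trace that never revisits an address yields None for every candidate, which is the synthetic full-range case the deck contrasts with client workloads.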

Additional Optimizations for Client SSD
• Eliminating the DRAM component enables
– Higher density on single-sided M.2
– Power savings
– Cost optimization
[Figure: two M.2 2280 board layouts showing the controller, NAND packages, and DDR, with the DDR crossed out.]

Summary
▪ Client workloads are bursty: SLC caching is appropriate
▪ Client workloads are highly localized
• Windows productivity applications
• PCMark and SYSMark are good representatives of the locality of user applications
• A full-range logical test area is not a reflection of client workloads
▪ A 4GB logical mapping range is the optimal cost/performance point
