Elad Baram, Sr. Director of SSD Product Management, shares insights on ways to optimize SSDs for client workloads. This presentation was originally shared during Flash Memory Summit 2016.
Optimizing SSD Architecture for Client Workloads
1. Santa Clara, CA—August 2016 1
Optimizing SSD Architecture for Client Workloads
Elad Baram
Sr. Director, SSD Product Management
2. Agenda
Workloads and Locality Concept
SSD Architecture and Performance Enablers
Locality of Workloads Study
Recommendations
3. Workload Key Attributes
Three key characteristics of workloads
• Bandwidth over time
– MB/s
• Transactions over time
– IOPS
• Locality over time
– The degree of repetitiveness in host logical address accesses
– Defined as the hit rate (%) achieved with a given Logical to Physical (L2P) mapping table size (e.g., a workload with 4GB locality is one for which a device whose L2P table maps 4GB of addresses will have a 90% hit rate)
4. Locality Overview
SSD architectures use different sizes for L2P mapping tables
• From small tables in DRAM-less SSDs, to full 1:1 4KB mapping with DRAM
L2P table size is a cost/performance optimization decision
A study was done to quantify the impact of L2P table size on SSD performance in different application environments
• Study the locality of real-life client workloads
• Locality of benchmarks
• Understand optimal cost/performance
Main outcome - client workloads are highly localized
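The cost side of this trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a flat table with one entry per 4KB logical page and 4 bytes per entry. The entry width is an assumption, chosen because it matches the deck's pairing of a 4GB logical range with a 4MB L2P table. The function name `l2p_table_bytes` is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def l2p_table_bytes(mapped_bytes, page_bytes=4096, entry_bytes=4):
    """Size of a flat L2P table covering `mapped_bytes` of logical space.

    Assumes one entry per 4KB logical page and 4 bytes per entry
    (both are illustrative assumptions, not figures from the deck).
    """
    entries = mapped_bytes // page_bytes
    return entries * entry_bytes


if __name__ == "__main__":
    # Partial coverage (4GB) versus full 1:1 mapping of larger ranges.
    for covered_gib in (4, 128, 256, 512):
        table = l2p_table_bytes(covered_gib * GIB)
        print(f"{covered_gib:>4} GB mapped -> {table // MIB} MB table")
```

Under these assumptions the table is about 1/1024 of the mapped range, which is why full 1:1 mapping of a large drive needs DRAM while a 4GB window fits in a few megabytes.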
8. SSD Block Diagram – What is Enabled by DDR?
[Block diagram: Host (with HMB); SSD containing Host Interface, CPU/Logic, Flash Interface, DDR, and NAND]
DDR usage
• L2P (logical-to-physical) translation tables (>90% of space)
• Buffering
• Code space
9. L2P Table Size Impact on SSD Performance
[Chart: random read (RR) IOPS vs. workload LBA range (1GB, 128GB, 256GB) for 4KB 1:1 L2P mapping, showing the maximum system IOPS level and the 'control read'* penalty, with PCMark Vantage and Crystal Disk Mark marked on the chart]
L2P table size does NOT define the maximum performance
• That is defined by NAND, CPU, and FW efficiencies
L2P table size defines the envelope in which IOPS can be maintained
* A control read is an internal read command issued by the SSD to bring in metadata, such as a mapping table page
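The envelope idea can be sketched with a toy model. Everything here is an assumption made for illustration rather than data from the presentation: reads are uniformly random over the workload LBA range, so the steady-state hit rate is the fraction of that range the table covers; each miss costs one extra control read; and `max_iops=100_000` and `control_read_cost=1.0` are placeholder values.

```python
def random_read_iops(lba_range_gb, mapped_gb, max_iops=100_000,
                     control_read_cost=1.0):
    """Toy estimate of random-read IOPS for a partially mapped L2P table.

    Assumes uniformly random reads, so the hit rate equals the covered
    fraction of the LBA range. A miss adds `control_read_cost` extra
    reads' worth of service time (hypothetical parameter).
    """
    hit_rate = min(1.0, mapped_gb / lba_range_gb)
    miss_rate = 1.0 - hit_rate
    return max_iops / (1.0 + miss_rate * control_read_cost)


if __name__ == "__main__":
    # A table mapping 4GB, tested over growing LBA ranges.
    for lba_range_gb in (1, 4, 8, 128, 256):
        iops = random_read_iops(lba_range_gb, mapped_gb=4)
        print(f"{lba_range_gb:>4} GB range -> {iops:,.0f} IOPS")
```

In this model the peak is set entirely by `max_iops`, and the table size only determines how wide an LBA range can be covered before the control-read penalty appears.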
11. Synthetic Benchmark: Crystal Disk Mark
[Figure: logical address accessed by the host over time, with reads and writes marked. Labeled phases: Createfile; Sequential Read (multiple IOs, threads); Random Read (multiple IOs, threads); Sequential Read (single IO, thread); Random Read (single IO, thread); Sequential Write (multiple IOs, threads); Random Write (multiple IOs, threads); Sequential Write (single IO, thread); Random Write (single IO, thread)]
CDM accesses ~1GB logical range
12. Synthetic Benchmark: Crystal Disk Mark
[Figure: read and write access patterns, one panel per test: Sequential Read (multiple threads), Random Read (multiple threads), Sequential Read (single thread), Random Read (single thread), Sequential Write (multiple threads), Random Write (multiple threads), Sequential Write (single thread), Random Write (single thread)]
13. PCMark Vantage Workload
[Figure: read and write accesses, one panel per use case: Windows Defender, Gaming, Importing Pictures, Windows Startup, Windows Media Center, Adding music to Windows Media, Video Editing, Applications Loading]
Different logical access pattern for each use case
Bandwidth & IOPS are bursty
14. Real User Workload (10 Days)
[Figure: read and write accesses over the trace, with time markers at 3, 6, and 8 days]
Broad address spread
Bandwidth & IOPS are bursty
15. Real User Workload
[Figure: read and write accesses]
How can you translate raw access pattern data into insightful design decisions?
17. Research Flow
• Command trace fed into simulator
• Read request: is the requested address stored in the L2P tables?
– Yes: continue (hit)
– No: evacuate space (defined policy) and fetch the new address (miss)
[Flowchart also shows the NAND devices behind the L2P tables]
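The flow above can be sketched as a small trace-replay simulator. This is not the tool used in the study. The deck only says the eviction follows a "defined policy", so LRU is an assumption here, as are the per-4KB-page entry granularity and the name `simulate_l2p`.

```python
from collections import OrderedDict


def simulate_l2p(trace, table_entries, page_bytes=4096):
    """Replay read addresses (byte offsets) through a bounded L2P table.

    Returns the hit rate. Eviction is LRU, which is an assumed stand-in
    for the unspecified "defined policy" in the research flow.
    """
    if table_entries < 1:
        raise ValueError("table_entries must be at least 1")
    table = OrderedDict()  # logical page -> placeholder for physical address
    hits = 0
    total = 0
    for byte_address in trace:
        page = byte_address // page_bytes
        total += 1
        if page in table:
            hits += 1                      # hit: continue
            table.move_to_end(page)        # mark as most recently used
        else:
            if len(table) >= table_entries:
                table.popitem(last=False)  # evacuate least recently used entry
            table[page] = True             # fetch new mapping (miss)
    return hits / total if total else 0.0


if __name__ == "__main__":
    sample_trace = [0, 4096, 0, 8192, 4096]
    for entries in (2, 3):
        print(entries, "entries ->", simulate_l2p(sample_trace, entries))
```

Running the same trace against several `table_entries` values gives hit-rate-versus-table-size curves of the kind the following slides report.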
18. SYSMark 2014 Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Chart: hit rate (%), 0 to 100, over time for L2P table sizes of 0.5MB, 4MB, and 512MB]
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
19. Real User (Developer Profile) Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Chart: hit rate (%), 0% to 100%, over time for L2P table sizes of 0.5MB, 4MB, and 256MB]
Trace period: 21 days.
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
20. Hit Rates in L2P Table Sizes for Various Workloads
4GB Logical Range Coverage (4MB L2P Table Size) Provides 95%+ Average Hit Rate for Benchmarks and Workloads
[Chart: hit rate (%), 0% to 100%, vs. L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB) for PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, and Developer Profile]
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
21. Hit Rates in L2P Table Sizes for Various Workloads
[Chart: hit rate (%), 0% to 100%, vs. L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB) for PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile, and IOMeter]
IOMeter 40GB LBA range, SW SR RW RR
Synthetic workload is not a reflection of typical client
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
22. Locality of Workloads
Locality is the L2P table size required to reach a 90% hit rate for the read pattern
[Chart: total reads (GB) vs. L2P size (MB), both axes logarithmic from 1 to 1,000. Plotted workloads: CDM 1GB, CDM 4GB, PCMark 8, Copy Files and Folders, IOMeter Full Range, PCMark Vantage, Office Productivity, Media Creation]
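The locality definition on this slide (the table size needed for a 90% read hit rate) can be turned into a simple search. The sketch below is a minimal illustration under stated assumptions: LRU eviction, one entry per logical page, 4 bytes per entry, and a synthetic looping trace instead of real workload data. The names `lru_hit_rate` and `locality_table_bytes` are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(pages, capacity):
    """Hit rate of a list of logical page numbers in an LRU table."""
    table, hits = OrderedDict(), 0
    for page in pages:
        if page in table:
            hits += 1
            table.move_to_end(page)
        else:
            if len(table) >= capacity:
                table.popitem(last=False)  # evict least recently used
            table[page] = True
    return hits / len(pages) if pages else 0.0


def locality_table_bytes(pages, candidate_entries, target=0.90, entry_bytes=4):
    """Smallest candidate table size (in bytes) that reaches `target` hit rate.

    `candidate_entries` must hold positive entry counts. Returns None when
    no candidate reaches the target. Entry width is an assumption.
    """
    for entries in sorted(candidate_entries):
        if lru_hit_rate(pages, entries) >= target:
            return entries * entry_bytes
    return None


if __name__ == "__main__":
    # Synthetic trace: 20 passes over the same 1,024 logical pages.
    looping_trace = list(range(1024)) * 20
    print(locality_table_bytes(looping_trace, [256, 512, 1024, 2048]))
```

A looping trace is a worst case for LRU: any table smaller than the loop misses on every access, so the hit rate jumps from 0% to 95% once the table holds all 1,024 pages. Real client traces would be expected to degrade more gradually.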
23. Additional Optimizations for Client SSD
• Eliminating the DRAM component enables:
– Higher density on single-sided M.2
– Power savings
– Cost optimization
[Figure: M.2 2280 board layouts showing controller, NAND, and DDR packages; the DDR package is crossed out in the DRAM-less layout]
24. Summary
Client workloads are bursty – SLC caching is appropriate
Client workloads are highly localized
• Windows productivity applications
• PCMark / SYSMark are good representatives of the locality of user applications
• Full range logical test area is not a reflection of client workloads
A 4GB logical mapping range is the optimal cost/performance point