Santa Clara, CA—August 2016 1
Optimizing SSD Architecture for
Client Workloads
Elad Baram
Sr. Director, SSD Product Management
Agenda
▪ Workloads and Locality Concept
▪ SSD Architecture and Performance Enablers
▪ Locality of Workloads Study
▪ Recommendations
Workload Key Attributes
▪ Three key characteristics of workloads
• Bandwidth over time
– MB/s
• Transactions over time
– IOPS
• Locality over time
– The degree of repetitiveness in host logical address accesses
– Defined as the hit rate (%) achieved with a given Logical-to-Physical (L2P) mapping table size (e.g., a workload with 4GB locality achieves a 90% hit rate on a device whose L2P table maps 4GB of logical addresses)
Locality Overview
▪ SSD architectures use different sizes for L2P mapping tables
• From small tables in DRAM-less SSDs to full 1:1 4KB mapping with DRAM
▪ L2P table size is a cost/performance optimization decision
▪ A study was done to quantify the impact of L2P table size on SSD performance in different application environments
• Study the locality of real-life client workloads
• Study the locality of benchmarks
• Understand the optimal cost/performance point
▪ Main outcome: client workloads are highly localized
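The relationship between mapped logical range and L2P table size can be illustrated with a little arithmetic. This is a sketch under an assumption: the slides pair a 4MB table with 4GB of logical coverage at 4KB mapping granularity, which implies roughly 4 bytes per entry. The entry size, the function name, and the example capacities are illustrative, not taken from any specific product.

```python
# Assumption: one 4-byte entry per 4KB logical page (implied by the
# 4MB table / 4GB coverage ratio quoted in the slides).
ENTRY_BYTES = 4
PAGE_BYTES = 4 * 1024
MB = 1024 ** 2
GB = 1024 ** 3


def l2p_table_bytes(logical_range_bytes, entry_bytes=ENTRY_BYTES,
                    page_bytes=PAGE_BYTES):
    """Return the table size needed to map a logical range 1:1."""
    pages = logical_range_bytes // page_bytes
    return pages * entry_bytes


# Hypothetical logical ranges, chosen only to show the 1:1024 scaling.
for range_gb in (4, 256, 512):
    size = l2p_table_bytes(range_gb * GB)
    print(f"{range_gb} GB logical range -> {size // MB} MB L2P table")
```

Under this assumption, a full 1:1 table grows linearly with drive capacity, which is why table size becomes a cost decision.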
SSD Block Diagram
[Figure: block diagram. Host side: Host, HMB. SSD: Host Interface, CPU/Logic, Flash Interface, DDR, NAND.]
SSD Block Diagram: Sequential Read/Write Enablers
[Figure: the same block diagram (Host, HMB, Host Interface, CPU/Logic, Flash Interface, DDR, NAND), marking the factors limiting performance.]
SSD Block Diagram: IOPS Enablers
[Figure: the same block diagram (Host, HMB, Host Interface, CPU/Logic/FW, Flash Interface, DDR, NAND), marking the factors limiting performance.]
SSD Block Diagram – What is Enabled by DDR?
▪ DDR usage
• L2P (logical-to-physical) translation tables (>90% of space)
• Buffering
• Code space
[Figure: the same block diagram (Host, HMB, Host Interface, CPU/Logic, Flash Interface, DDR, NAND).]
L2P Table Size Impact on SSD Performance
▪ L2P table size does NOT define the maximum performance
• Maximum performance is defined by NAND, CPU, and FW efficiency
▪ L2P table size defines the envelope in which IOPS can be maintained
[Figure: random read (RR) IOPS versus workload LBA range (1GB, 128GB, 256GB) for 4KB 1:1 L2P mapping. The chart marks the maximum system IOPS level, the 'control read'* penalty, and the positions of PCMark Vantage and Crystal Disk Mark.]
* A control read is an internal read command issued by the SSD to fetch metadata, such as a mapping table page
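The control read penalty can be pictured with a deliberately simple model. This is a rough sketch, not a result from the slides: it assumes random reads are bound by NAND read operations, that every L2P miss costs exactly one extra NAND read of the same cost as a data read, and that nothing overlaps. The function name and the numbers in the usage line are hypothetical.

```python
def effective_iops(max_iops, hit_rate, control_reads_per_miss=1):
    """Estimate host IOPS when each L2P miss adds extra NAND reads.

    Assumed model: average NAND reads per host read is
    1 + miss_rate * control_reads_per_miss.
    """
    miss_rate = 1.0 - hit_rate
    return max_iops / (1.0 + miss_rate * control_reads_per_miss)


# Hypothetical peak of 100,000 IOPS at three hit rates.
for hit_rate in (1.0, 0.9, 0.0):
    print(hit_rate, round(effective_iops(100_000, hit_rate)))
```

In this model the peak is set by the hardware, and the hit rate only determines how much of that peak is retained, which matches the "envelope" framing above.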
Deeper look into workloads
Synthetic Benchmark: Crystal Disk Mark
[Figure: logical address accessed by the host over time, with reads and writes plotted separately. Labeled phases: Createfile; Sequential Read (multiple IOs, threads); Random Read (multiple IOs, threads); Random Read (single IO, thread); Sequential Write (single IO, thread); Sequential Write (multiple IOs, threads); Random Write (multiple IOs, threads); Random Write (single IO, thread); Sequential Read (single IO, thread).]
CDM accesses a ~1GB logical range
Synthetic Benchmark: Crystal Disk Mark
[Figure: per-phase access plots, with reads and writes plotted separately. Panels: Sequential Read, Random Read, Sequential Write, and Random Write, each with multiple threads and with a single thread.]
PCMark Vantage Workload
[Figure: logical addresses read and written by the host over time. Labeled use cases: Windows Defender, Gaming, Importing Pictures, Windows Startup, Windows Media Center, Adding music to Windows Media, Video Editing, Applications Loading.]
Different logical access pattern for each use case
Bandwidth & IOPS are bursty
Real User Workload (10 Days)
[Figure: logical addresses read and written by the host over a 10-day trace, with time markers at 3, 6, and 8 days.]
Broad address spread
Bandwidth & IOPS are bursty
Real User Workload
[Figure: logical addresses read and written by the host over time.]
How can you translate raw access-pattern data into insightful design decisions?
Locality of Client SSD Workloads
Research Flow
▪ Command trace is fed into the simulator
▪ For each read request: is the requested address stored in the L2P table?
• Yes: continue (hit)
• No: evacuate space (defined policy) and fetch the new address (miss)
[Figure: flowchart of the simulator, showing the L2P tables and four NAND blocks.]
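The flow above can be sketched as a small trace-driven simulator. This is a minimal sketch, not the simulator used in the study: the slides do not name the eviction policy, so least-recently-used (LRU) is assumed here, along with 4-byte entries, 4KB mapping granularity, and caching of individual entries. The function name `simulate_hit_rate` and the tiny trace in the usage lines are hypothetical.

```python
from collections import OrderedDict

ENTRY_BYTES = 4       # assumed size of one L2P entry
PAGE_BYTES = 4096     # 4KB mapping granularity


def simulate_hit_rate(read_addresses, table_bytes):
    """Replay read byte addresses against an LRU-managed L2P table."""
    capacity = table_bytes // ENTRY_BYTES   # entries that fit in the table
    table = OrderedDict()                   # logical page -> cached marker
    hits = 0
    total = 0
    for address in read_addresses:
        page = address // PAGE_BYTES
        total += 1
        if page in table:
            hits += 1                       # hit: continue
            table.move_to_end(page)         # mark as most recently used
            continue
        if capacity == 0:
            continue                        # nothing can be cached
        if len(table) >= capacity:
            table.popitem(last=False)       # evacuate space (LRU, assumed)
        table[page] = True                  # fetch the new address (miss)
    return hits / total if total else 0.0


# Hypothetical trace touching pages 0, 1, 0, 1, 2, 0.
trace = [0, 4096, 0, 4096, 8192, 0]
print(simulate_hit_rate(trace, table_bytes=8))    # room for two entries
print(simulate_hit_rate(trace, table_bytes=12))   # room for three entries
```

Running the same trace against several table sizes gives one hit-rate value per size, which is the kind of comparison the following charts present.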
SYSMark 2014 Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Figure: hit rate (%) over time for L2P table sizes of 0.5MB, 4MB, and 512MB.]
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Real User (Developer Profile) Hit Rates for Various L2P Table Sizes
4MB L2P Table Size Drives Higher than 90% Hit Rates
[Figure: hit rate (%) over time for L2P table sizes of 0.5MB, 4MB, and 256MB. Trace period: 21 days.]
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Hit Rates in L2P Table Sizes for Various Workloads
4GB Logical Range Coverage (4MB L2P Table Size) Provides 95%+ Average Hit Rate for Benchmarks and Workloads
[Figure: hit rate (%) versus L2P size, from 512MB down to 32KB, for PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, and Developer Profile.]
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Hit Rates in L2P Table Sizes for Various Workloads
[Figure: hit rate (%) versus L2P size, from 512MB down to 32KB, for PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile, and IOMeter (40GB LBA range; SW, SR, RW, RR).]
Synthetic workload is not a reflection of typical client workloads
Source: SanDisk Technical Marketing research lab; Simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
Locality of Workloads
Locality represents the required L2P table size that enables a 90% hit rate for the read pattern
[Figure: log-log scatter plot of Total Reads (GB) versus L2P Size (MB), both axes spanning 1 to 1,000. Points: CDM 1GB, CDM 4GB, PCMark 8, Copy Files and Folders, IOMeter Full Range, PCMark Vantage, Office Productivity, Media Creation.]
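Given hit rates measured at several table sizes, the locality figure defined above is simply the smallest table that reaches the 90% target. The sketch below shows that lookup. The function name and the hit-rate values are made up for illustration and are not read off the charts.

```python
def locality_table_mb(hit_rate_by_table_mb, target=0.90):
    """Return the smallest table size (MB) whose hit rate meets the target.

    Returns None when no listed size reaches the target.
    """
    for size_mb in sorted(hit_rate_by_table_mb):
        if hit_rate_by_table_mb[size_mb] >= target:
            return size_mb
    return None


# Hypothetical measurements: table size in MB -> read hit rate.
example = {0.5: 0.62, 1: 0.74, 2: 0.85, 4: 0.93, 8: 0.97}
print(locality_table_mb(example))
```

A workload whose hit rate never reaches the target at any listed size has no locality value in this range, which is why the function returns None rather than the largest size.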
Additional Optimizations for Client SSD
• Eliminating the DRAM component enables
– Higher density on single-sided M.2
– Power savings
– Cost optimization
[Figure: two M.2 2280 layouts. One carries a controller, DDR, and NAND packages; the other has the DDR crossed out, leaving the controller and NAND packages.]
Summary
▪ Client workloads are bursty – SLC caching is appropriate
▪ Client workloads are highly localized
• Windows productivity applications
• PCMark and SYSMark are good representatives of the locality of user applications
• A full-range logical test area is not a reflection of client workloads
▪ A 4GB logical mapping range is the optimal cost/performance point

"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Optimizing SSD Architecture for Client Workloads

  • 1. Optimizing SSD Architecture for Client Workloads
      Elad Baram, Sr. Director, SSD Product Management
      Santa Clara, CA, August 2016
  • 2. Agenda
      • Workloads and Locality Concept
      • SSD Architecture and Performance Enablers
      • Locality of Workloads Study
      • Recommendations
  • 3. Workload Key Attributes
      Three key characteristics of workloads:
      • Bandwidth over time (MB/s)
      • Transactions over time (IOPS)
      • Locality over time
        – The degree of repetitiveness in host logical address accesses
        – Defined as the hit rate achieved with a given Logical to Physical (L2P) mapping table size (for example, a workload with 4GB locality means a device whose table maps 4GB of addresses will have a 90% hit rate)
  • 4. Locality Overview
      • SSD architectures use different sizes for L2P mapping tables
        – From small tables in DRAM-less SSDs to full 1:1 4KB mapping with DRAM
      • L2P table size is a cost/performance optimization decision
      • A study was done to quantify the impact of L2P table size on SSD performance in different application environments
        – Study the locality of real-life client workloads
        – Study the locality of benchmarks
        – Understand the optimal cost/performance point
      • Main outcome: client workloads are highly localized
  • 5. SSD Block Diagram
      [Diagram labels: Host, HMB, CPU/Logic, Host Interface, Flash Interface, DDR, NAND, SSD]
  • 6. SSD Block Diagram: Sequential Read/Write Enablers
      Factors limiting performance
      [Diagram labels: Host, HMB, CPU/Logic, Host Interface, Flash Interface, DDR, NAND, SSD]
  • 7. SSD Block Diagram: IOPS Enablers
      Factors limiting performance
      [Diagram labels: Host, HMB, CPU/Logic/FW, Host Interface, Flash Interface, DDR, NAND, SSD]
  • 8. SSD Block Diagram: What is Enabled by DDR?
      DDR usage:
      • L2P (logical-to-physical) translation tables (>90% of space)
      • Buffering
      • Code space
      [Diagram labels: Host, HMB, CPU/Logic, Host Interface, Flash Interface, DDR, NAND, SSD]
  • 9. L2P Table Size Impact on SSD Performance
      [Chart: RR (random read) IOPS vs. workload LBA range. Labels: 4KB 1:1 L2P mapping, 1GB, 128GB, 256GB, maximum system IOPS, "control read"* penalty, PCMark Vantage, Crystal Diskmark]
      • L2P table size does NOT define the maximum performance
        – That is defined by NAND, CPU, and FW efficiencies
      • L2P table size defines the envelope in which IOPS can be maintained
      * A control read is an internal read command issued by the SSD to fetch metadata, such as a mapping table page
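The envelope idea on slide 9 can be illustrated with a toy model. This is only a sketch under loud assumptions that are not in the slides: reads are uniformly random over the LBA range, so the hit rate is roughly the table coverage divided by the range, and every miss costs one serialized control read whose cost is expressed relative to a host read. The function name `relative_random_read_iops` and the parameter `control_read_cost` are hypothetical.

```python
def relative_random_read_iops(lba_range_gb, table_coverage_gb, control_read_cost=1.0):
    """Fraction of maximum random-read IOPS retained (toy model, not measured data)."""
    # Assumption: uniform random reads, so the chance that a mapping entry is
    # already cached is about coverage / range, capped at 100%.
    hit_rate = min(1.0, table_coverage_gb / lba_range_gb)
    miss_rate = 1.0 - hit_rate
    # Assumption: each miss adds one control read, serialized with the host read.
    return 1.0 / (1.0 + miss_rate * control_read_cost)


if __name__ == "__main__":
    # A table covering 4GB of logical range, swept over wider workload ranges.
    for lba_range in (1, 4, 8, 128, 256):
        print(lba_range, round(relative_random_read_iops(lba_range, 4), 3))
```

Inside the covered range the model stays at the maximum; beyond it, IOPS fall as misses grow, which is the shape the slide describes. The absolute numbers depend entirely on the assumed control read cost.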
  • 10. Deeper look into workloads
  • 11. Synthetic Benchmark: Crystal Disk Mark
      [Chart: logical address accessed by the host over time, reads and writes]
      Labeled phases: Createfile; Sequential Read (multiple IOs, threads); Random Read (multiple IOs, threads); Random Read (single IO, thread); Sequential Write (single IO, thread); Sequential Write (multiple IOs, threads); Random Write (multiple IOs, threads); Random Write (single IO, thread); Sequential Read (single IO, thread)
      • CDM accesses a ~1GB logical range
  • 12. Synthetic Benchmark: Crystal Disk Mark
      [Chart panels, reads and writes: Sequential Read (multiple threads); Random Read (multiple threads); Sequential Read (single thread); Random Read (single thread); Sequential Write (multiple threads); Random Write (multiple threads); Sequential Write (single thread); Random Write (single thread)]
  • 13. PCMark Vantage Workload
      [Chart: reads and writes. Use cases: Windows Defender, Gaming, Importing Pictures, Windows Startup, Windows Media Center, Adding music to Windows Media, Video Editing, Applications Loading]
      • Different logical access pattern for each use case
      • Bandwidth & IOPS are bursty
  • 14. Real User Workload (10 Days)
      [Chart: reads and writes, with time markers at 3 days, 6 days, and 8 days]
      • Broad address spread
      • Bandwidth & IOPS are bursty
  • 15. Real User Workload
      [Chart: reads and writes]
      • How can you translate raw access-pattern data into insightful design decisions?
  • 16. Locality of Client SSD Workloads
  • 17. Research Flow
      [Flow diagram labels: Command Trace, L2P Tables, NAND]
      • A command trace is fed into a simulator
      • For each read request: is the requested address stored in the table?
        – Yes: continue (hit)
        – No: evacuate space (defined policy) and fetch the new address (miss)
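The research flow on slide 17 can be sketched as a small trace-driven simulator. This is a minimal illustration, not the study's simulator: the slides only say a "defined policy" evacuates space, so LRU is assumed here, along with a 4-byte table entry per 4KB mapping unit and entry-level (rather than mapping-page-level) caching. The names `simulate_hit_rate`, `MAP_UNIT`, and `ENTRY_BYTES` are hypothetical.

```python
from collections import OrderedDict

MAP_UNIT = 4096    # logical bytes covered by one L2P entry (4KB mapping)
ENTRY_BYTES = 4    # assumed size of one L2P entry


def simulate_hit_rate(read_offsets, table_bytes):
    """Replay read byte offsets through an LRU-managed L2P table; return the hit rate."""
    capacity = table_bytes // ENTRY_BYTES   # number of entries that fit in the table
    cached = OrderedDict()                  # logical page -> placeholder, oldest first
    hits = 0
    total = 0
    for offset in read_offsets:
        total += 1
        page = offset // MAP_UNIT
        if page in cached:
            cached.move_to_end(page)        # hit: mark as most recently used
            hits += 1
            continue
        if capacity == 0:
            continue                        # nothing can be cached, every read misses
        if len(cached) >= capacity:
            cached.popitem(last=False)      # miss: evacuate the least recently used entry
        cached[page] = True                 # fetch the new address into the table
    return hits / total if total else 0.0


if __name__ == "__main__":
    trace = [0, 4096, 0, 4096, 8192, 0]     # hypothetical read offsets in bytes
    print(simulate_hit_rate(trace, table_bytes=12))   # room for three entries
```

Running the same trace against several table sizes yields a hit rate per size, which is the kind of comparison the following slides chart.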
  • 18. 4MB L2P Table Size Drives Higher than 90% Hit Rates
      [Chart: SYSMark 2014 Hit Rates for Various L2P Table Sizes. Hit rate (%) over time for L2P table sizes of 0.5MB, 4MB, and 512MB]
      Source: SanDisk Technical Marketing research lab; simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
  • 19. 4MB L2P Table Size Drives Higher than 90% Hit Rates
      [Chart: Real User (Developer Profile) Hit Rates for Various L2P Table Sizes. Hit rate (%) over time for L2P table sizes of 0.5MB, 4MB, and 256MB. Trace period: 21 days]
      Source: SanDisk Technical Marketing research lab; simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
  • 20. 4GB Logical Range Coverage (4MB L2P Table Size) Provides 95%+ Average Hit Rate for Benchmarks and Workloads
      [Chart: Hit Rates in L2P Table Sizes for Various Workloads. Hit rate (%) vs. L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB) for PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile]
      Source: SanDisk Technical Marketing research lab; simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
  • 21. Synthetic workload is not a reflection of typical client
      [Chart: Hit Rates in L2P Table Sizes for Various Workloads. Hit rate (%) vs. L2P size (512MB, 256MB, 128MB, 64MB, 32MB, 16MB, 8MB, 4MB, 2MB, 1MB, 512KB, 32KB) for PCMark Vantage, PCMark 8, SYSMark 2014, MobileMark 2014, Copy Files and Folders 11.6GB, Corporate Profile, Developer Profile, and IOMeter (40GB LBA range; SW, SR, RW, RR)]
      Source: SanDisk Technical Marketing research lab; simulation results based on traces from: Intel Skylake, Intel Core i7, 8GB RAM, Microsoft Windows 10 Pro x64 Platform
  • 22. Locality of Workloads
      Locality represents the required L2P table size that enables a 90% hit rate for the read pattern
      [Chart: total reads (GB) vs. L2P size (MB), both axes running from 1 to 1,000, for CDM 1GB, CDM 4GB, PCMark 8, Copy Files and Folders, IOMeter Full Range, PCMark Vantage, Office Productivity, Media Creation]
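The locality definition on slide 22 amounts to a threshold search over a hit-rate curve. The sketch below assumes the hit rates per table size have already been obtained (for instance from a trace simulation); the function name `locality_mb` and the sample curve are hypothetical and are not data from the study.

```python
def locality_mb(hit_rate_by_table_mb, target=0.90):
    """Return the smallest L2P table size (MB) whose hit rate meets the target, else None."""
    for size_mb in sorted(hit_rate_by_table_mb):
        if hit_rate_by_table_mb[size_mb] >= target:
            return size_mb
    return None


if __name__ == "__main__":
    # Made-up curve for illustration only: table size in MB -> read hit rate.
    curve = {0.5: 0.62, 1: 0.74, 2: 0.85, 4: 0.93, 8: 0.97}
    print(locality_mb(curve))   # 4
```

A workload whose curve never reaches the target within the tested sizes returns `None`, which would correspond to a workload without useful locality at those table sizes.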
  • 23. Additional Optimizations for Client SSD
      Eliminating the DRAM component enables:
      • Higher density on single-sided M.2
      • Power savings
      • Cost optimization
      [Diagram: two M.2 2280 layouts, one with NAND, controller, and DDR, and one with the DDR crossed out]
  • 24. Summary
      • Client workloads are bursty: SLC caching is appropriate
      • Client workloads are highly localized
        – Windows productivity applications
        – PCMark / SYSMark are good representatives for locality of user applications
        – A full-range logical test area is not a reflection of client workloads
      • A 4GB logical mapping range is the optimal cost/performance point
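The pairing of a 4GB logical range with a 4MB L2P table (slide 20) follows from simple arithmetic if each 4KB mapping unit needs one 4-byte entry. The entry size is an assumption chosen because it reproduces the slide's numbers; the deck does not state it. The helper name `l2p_table_bytes` is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def l2p_table_bytes(logical_range_bytes, map_unit=4096, entry_bytes=4):
    """Bytes of L2P table needed to map a logical range at the given granularity."""
    entries = -(-logical_range_bytes // map_unit)   # ceiling division
    return entries * entry_bytes                    # entry_bytes=4 is an assumption


if __name__ == "__main__":
    print(l2p_table_bytes(4 * GIB) / MIB)     # 4.0   (4GB range -> 4MB table)
    print(l2p_table_bytes(512 * GIB) / MIB)   # 512.0 (full 1:1 map of 512GB)
```

Under these assumptions the table is about 1/1024 of the logical range it maps, which is why a full 1:1 mapping needs DRAM while a 4GB window fits in a few megabytes.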

Editor's Notes

  1. Explain only the address according to operation chart
  2. Overview of CDM benchmark
  3. PCMV benchmark Locality Burst concept
  4. Locality – need to zoom in Burst behavior
  5. Locality – how do we proceed from here?