Overview of how Lightbits LightOS improves the latency of high-performance, low-latency NVMe-based storage accessed over a standard TCP/IP network.

Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storage System

Slide 1: Vanquishing Latency Outliers in the Lightbits LightOS Software Defined Storage System
Abel Gordon, Chief System Architect at Lightbits Labs
Slide 2: Agenda
■ Introduction
■ Storage disaggregation: why? Latency is king!
■ Latency challenges: flash, network and management
■ Keeping latency under control with Lightbits LightOS
■ Performance measurements
■ Conclusion
Slide 3: Introduction
■ Chief System Architect, Lightbits Labs
■ Storage, network, I/O performance, I/O virtualization, memory over-commitment @ Lightbits, Stratoscale and IBM Research
■ Curious about my previous work and publications? Click here
Slide 4: Storage disaggregation: why?
■ I/O-intensive applications
● Require high bandwidth, but 3 GB/s-5 GB/s is usually sufficient
● Low latency, a few hundred microseconds
■ Easy but wasteful (expensive) solution: local (NVMe) flash
● Servers are over-provisioned with NVMe flash to handle peak load
● Unused IOPS and/or capacity cannot be shared across servers/applications
■ Typical local-flash utilization: capacity 15-25%, performance 50%
Slide 5: Storage disaggregation: latency is king!
■ Real solution: storage disaggregation
● Pool NVMe flash together and share it across servers/applications
● Access over existing Ethernet/IP networks
■ Challenges
● Low latency, close to direct-attached storage
● High IOPS
● Scalable
Slide 6: Latency challenges
■ Flash (NAND) media
● Reads in the presence of writes
● Garbage collection
● Read-modify-writes for data protection (RAID) or high-capacity SSDs (QLC)
● Many (sequential) write streams from multiple clients lose context and behave like random writes
■ Network (HW and SW)
● Storage transport/protocol
● Software networking stack
● OS scheduling
● Interrupts
■ Management
● Management operations (create/delete/update volumes) can affect the data path and create latency spikes
Slide 7: Lightbits LightOS (architecture)
■ LightOS Frontend: NVMe/TCP target over TCP/IP, with NVMe replication service, cluster replication and write buffer
■ LightOS Backend: Intelligent Flash Management (resizable logical volumes, thin provisioning, data reduction/compression, SSD hot-swap add/remove, SSD-optimized I/O placement, endurance optimizer, flash error detect/fix/rebuild, erasure coding, automatic rebalancing, snapshots and thin clones)
■ Control Plane: scalable management and cluster services (etcd, REST API, Prometheus, automated management)
Slide 8: Lightbits LightOS: controlling flash latency
■ Intelligent Flash Management
○ Stripes writes across all local SSDs
○ Append-only write strategy: maximum bandwidth, lower latency
○ Software-based garbage collection
■ No garbage collection caused by the SSDs' FTL (WAF close to 1)
■ Software decides when and how to do garbage collection
○ No reads during writes for data protection, data reduction, or when using (QLC) SSDs with IU > 4KB
■ No writes in place
■ No read-modify-write operations
○ Single IOP per read
■ No need to read from flash to access metadata
○ Separated read/write pipelines
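The append-only, striped placement idea above can be sketched as a toy policy; everything here (the `StripeWriter` class, round-robin placement, 4KB chunks) is an illustrative assumption, not the actual LightOS implementation:

```python
# Illustrative sketch of an append-only, striped write placement policy.
# All names and parameters are hypothetical; LightOS internals differ.

class StripeWriter:
    def __init__(self, num_ssds, chunk_size=4096):
        self.num_ssds = num_ssds
        self.chunk_size = chunk_size
        self.write_heads = [0] * num_ssds  # current append offset per SSD
        self.next_ssd = 0                  # round-robin cursor

    def place(self, num_chunks):
        """Assign each 4KB chunk an (ssd, offset) append location.

        Writes are never done in place: each chunk goes to the current
        write head of the next SSD in round-robin order, so sequential
        and random client writes alike become sequential appends.
        """
        placements = []
        for _ in range(num_chunks):
            ssd = self.next_ssd
            placements.append((ssd, self.write_heads[ssd]))
            self.write_heads[ssd] += self.chunk_size
            self.next_ssd = (ssd + 1) % self.num_ssds
        return placements

w = StripeWriter(num_ssds=4)
print(w.place(6))  # chunks round-robin across SSDs 0..3 at append offsets
```

Because every chunk lands at a fresh append offset, the drives never see an overwrite, which is what keeps FTL-level garbage collection (and its latency spikes) out of the picture.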
Slide 9: Lightbits LightOS: controlling flash latency
■ Write buffer
○ Integrated within the front-end
○ Optionally persisted using NVDIMM or DCPMM (Optane DIMMs)
○ Ack to clients once data is in the write buffer: flash write latency is hidden from the application
○ Allows the back-end to control how and when data is written to flash
○ Flow control is a must-have (the write buffer can be faster than the SSDs)
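A minimal sketch of the write-buffer behavior described on this slide, assuming a simple byte-capacity bound; the class and method names are hypothetical, not LightOS APIs:

```python
# Toy sketch of the write-buffer idea: acknowledge a write as soon as it
# is in the (persistently backed) buffer, and apply flow control when the
# buffer outpaces the SSDs. Names and sizes are assumptions.

from collections import deque

class WriteBuffer:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.pending = deque()

    def submit(self, volume, offset, data):
        """Return True (ack to client) once the data is buffered.

        Returns False when the buffer is full: the front-end must
        back-pressure the client instead of queueing unboundedly.
        """
        if self.used + len(data) > self.capacity:
            return False  # flow control: buffer is faster than flash
        self.pending.append((volume, offset, data))
        self.used += len(data)
        return True  # flash write latency is now hidden from the client

    def drain_one(self, write_to_flash):
        """Back-end decides when and how buffered data reaches flash."""
        if not self.pending:
            return False
        volume, offset, data = self.pending.popleft()
        write_to_flash(volume, offset, data)
        self.used -= len(data)
        return True
```

The key design point is that `submit` and `drain_one` are decoupled: the client sees buffer-insert latency, while the back-end is free to batch and schedule flash writes in a NAND-friendly way.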
Slide 10: Lightbits LightOS: controlling network latency
■ Lightbits front-end implemented using Seastar
○ Fully sharded, dedicated CPUs, lock-less
■ NIC interrupt control
○ ADQ for E800 NICs (interrupt-less, using the standard Linux TCP/IP stack)
○ Interrupt affinity to specific CPUs
■ Transport: NVMe/TCP
○ Scalable with the number of cores and network queues
■ Each CPU shard manages a set of NVMe/TCP data queues (TCP/IP sockets)
○ Works on existing high-bandwidth Ethernet/IP networks
■ Separated write and read processing pipelines
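The socket-per-queue sharding model can be illustrated with a toy mapping; the modulo assignment and the `ShardedTarget` name are assumptions for illustration (Seastar and LightOS use their own scheduling):

```python
# Toy sketch of the sharded NVMe/TCP queue model: each data queue (a TCP
# socket) is owned by exactly one CPU shard, so a given queue's I/O is
# always processed on the same core, with no locks or cross-CPU handoff.

class ShardedTarget:
    def __init__(self, num_shards):
        self.num_shards = num_shards
        self.queues_per_shard = [[] for _ in range(num_shards)]

    def assign_queue(self, queue_id):
        """Statically map an NVMe/TCP queue to one shard for its lifetime."""
        shard = queue_id % self.num_shards
        self.queues_per_shard[shard].append(queue_id)
        return shard
```

Since each connection's queue is pinned to one shard, adding cores adds independent processing pipelines, which is what makes the socket-per-queue model scale with core count.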
Slide 11: Lightbits LightOS: isolated management
■ Tens of management operations per second are expected at scale
■ Management operations might create data-path latency spikes
■ LightOS isolates management from the data path
○ Separated management processes running on dedicated CPUs
○ Async communication with the data path via shared-memory queues
○ Lock-less interaction between management and data path
○ Lightweight, high-performance management code written in a high-level, multi-core, async-oriented programming language (Go)
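The lock-less, async management/data-path interaction is the kind of pattern a bounded single-producer/single-consumer ring provides; this toy Python version is purely illustrative (a real shared-memory implementation would use atomic head/tail indices, not Python objects):

```python
# Sketch of lock-free management <-> data-path interaction via a bounded
# single-producer, single-consumer ring: management enqueues operations,
# the data path drains them at its own pace, and neither side blocks the
# other. Purely illustrative; not the LightOS implementation.

class SpscRing:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0  # consumer (data path) position
        self.tail = 0  # producer (management) position

    def push(self, op):
        """Management enqueues an operation; never blocks the data path."""
        if self.tail - self.head == self.capacity:
            return False  # ring full; management retries asynchronously
        self.buf[self.tail % self.capacity] = op
        self.tail += 1
        return True

    def pop(self):
        """Data path drains one operation between I/O batches."""
        if self.head == self.tail:
            return None
        op = self.buf[self.head % self.capacity]
        self.head += 1
        return op
```

Because the producer only advances `tail` and the consumer only advances `head`, no lock is needed, so a burst of volume create/delete/update operations cannot stall I/O processing.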
Slide 12: Canonical latency
■ End-to-end, measured using FIO in the client
■ Canonical read latency (usecs): Avg 150, Tail (99) 190, Tail (99.99) 400
■ Canonical write latency (usecs): Avg 90, Tail (99) 105, Tail (99.99) 700
Slide 13: Latency at load
■ End-to-end, measured using FIO in the clients against a 3-server LightOS cluster
■ Read latency, > 1.5M 4KB random IOPS (usecs): Avg 170, Tail (99) 310, Tail (99.99) 995
■ Write latency, > 1.5M 4KB random IOPS (usecs): Avg 125, Tail (99) 390, Tail (99.99) 900
Slide 14: NVMe/TCP scalability
Slide 15: Management isolation
■ Comparing data-path performance with and without management operations
■ Different workloads
■ Tens of management operations per second
Slide 16: Conclusion
■ NAND flash latency is not predictable. A write buffer, a NAND-friendly write strategy and read/write isolation are required to reduce latency
■ The NVMe/TCP socket-per-queue model scales with the number of cores
■ CPU core dedication and sharding are important to reduce latency
■ NIC interrupts must be properly managed to maintain consistent latency
■ Management must interact efficiently with the data path to avoid latency spikes during management operations
Slide 17:
Abel Gordon
abel@lightbitslabs.com
https://www.lightbitslabs.com/careers/
