SlideShare a Scribd company logo
1 of 37
Download to read offline
Performance evaluation of Linux Discard
Support
(Overview, benchmark results, current status)

Red Hat
Luk´ˇ Czerner
   as
May 27, 2011
Copyright ©    2011 Luk´ˇ Czerner, Red Hat.
                         as
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation
License, Version 1.3 or any later version published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover
Texts, and no Back-Cover Texts. A copy of the license is included
in the COPYING file.
Part I
Discard Background
Agenda



1 SSD Description



2 Thinly Provisioned Storage



3 Introducing Linux Discard Support
SSD Description




Solid-State Drive




   Flash memory block device
   Wear-leveling needed
   Firmware = black box
SSD Description




ATA TRIM Command


  Helps handle garbage collection overhead
  Subsequent READ of TRIM’ed blocks
    1   Read data should NOT change between READ’s
    2   Read data should NOT be retrieved from data previously
        written to any other LBA.
  As long as the device has enough free pages to write to we do
  not necessarily need it.
  In a nutshell: TRIM command tells the device what LBA’a is
  not used by the OS anymore.
Thinly Provisioned Storage




Thin Provisioning




   Unlike in traditional storage, there is no fixed one-to-one
   logical bock to physical storage mapping
   More efficient use of storage space
   Block reclamation interface needed
Thinly Provisioned Storage




SCSI UNMAP / WRITE SAME



  Storage space reclamation interface
  Subsequent READ of unmapped blocks
    1   Read data should NOT change between READ’s
    2   Read data should NOT be retrieved from data previously
        written to any other LBA.
  Unlike with SSD’s we can not afford to wait until we run out
  of space for reclamation.
Introducing Linux Discard Support




Linux Discard Implementation



   Abstraction for the two underlying specifications:
     1   ATA TRIM Command
     2   SCSI UNMAP / WRITE SAME
   API for user-space
         BLKDISCARD ioctl
         Added with v2.6.27-rc9-30-gd30a260
   API for File Sytems
     1   sb issue discard()
     2   blkdev issue discard()
Part II
Discard Performance
Agenda




4 Testing Methodology



5 Results
Testing Methodology




What do we need to find out ?



   Does discard really work ? Is it reliable ?
   How fast/slow is it ?
   Is there any difference between devices from different vendors
   ?
   What is the ideal discard size ?
   SSD performance degradation
Testing Methodology




How do we test it ?


   BLKDISCARD ioctl()
   Automatic discard of different ranges
   Different discard patterns
     1   sequential performance
     2   random IO peformance
     3   discard already discarded blocks
   test-discard - discard benchmarking tool
         http://sourceforge.net/projects/test-discard/
   impression - filesystem aging tool
Results




Sequential discard performance
               300                                          1200
                                     Duration Summary
                                            Throughput

               250                                          1000


               200                                          800




                                                                   Throughput [MB/s]
Duration [s]




               150                                          600


               100                                          400


               50                                           200


                0                                            0
                     10              100                 1000
                          Record size [kB]
Results




Different modes comparison
                    1200
                                          Throughput (sequential)
                                          Throughput (random IO)
                                            Throughput (discard2)
                    1000


                     800
Throughput [MB/s]




                     600


                     400


                     200


                      0
                           10               100                     1000
                                Record size [kB]
Results




Difference between various vendors
                    1200
                                             Throughput Vendor 1
                                             Throughput Vendor 2
                                             Throughput Vendor 3
                    1000


                     800
Throughput [MB/s]




                     600


                     400


                     200


                      0
                           10               100                    1000
                                Record size [kB]
Results




SSD performance degradation
                            70
                                                                             Vendor1
                                                                             Vendor2
                                                                             Vendor3
                            60                                         Vendor3 discard
 Write performance [MB/s]




                            50


                            40


                            30


                            20


                            10



                                 0   50   100    150      200       250     300      350   400
                                                Filesystem saturation [%]
Part III
Discard Support for Linux File systems
Agenda



6 Periodic Discard



7 Discard Batching



8 Different Approach
Periodic Discard




Periodic discard



   Easy to implement
   File system support
     1   ext4 (v2.6.27-5185-g8a0aba7)
     2   btrfs (since upstream)
     3   gfs2 (v2.6.29-9-gf15ab56)
     4   fat, swap, nilfs
   mount -o discard /dev/sdc /mnt/test
   TRIM is non-queueable command - implications ?
Periodic Discard




Benchmarking periodic discard



   Expectations ?
   Testing methodology
     1   Metadata intensive load
     2   Load with removing files
     3   Reasonable file size distribution
   Discard-kit
     1   Using PostMark
     2   http://sourceforge.net/projects/test-discard/files/
Periodic Discard




Ext4 performance (18% hit)
               800
                                       Deleted/s
                                       Append/s
                                         Read/s
               700               Files-created/s
                                 Transactions/s
                                      Read[B/s]
               600                    Write[B/s]
 Opetation/s




               500


               400


               300


               200


               100
                     nodiscard       discard
Periodic Discard




Performance with various file systems
                   750
                                                                    ext4
                   700                                              btrfs

                   650                                              gfs2
  Transactions/s




                   600

                   550

                   500

                   450

                   400

                   350
                         no


                                   di




                                              no


                                                        di




                                                                   no


                                                                             di
                                     sc




                                                          sc




                                                                               sc
                          di




                                               di




                                                                    di
                                      ar




                                                           ar




                                                                                ar
                            sc




                                                 sc




                                                                      sc
                                          d




                                                               d




                                                                                    d
                              ar




                                                   ar




                                                                        ar
                                 d




                                                      d




                          -18%                  -7%                 -63%   d
Discard Batching




Discard Batching - The idea



   Fine-grained discard is not necessarily needed
   Small extents are slow
   With time, freed extents tends to coalesce
   Disadvantages
     1 There is a price for tracking freed extents
     2 Discarding already discarded blocks should be easy, but...
     3 Daemon (in-kernel, user-space) needed.
     4 File system independent solution would most likely be pain to
       do right (if possible).
Discard Batching




Batched discard support



   File system specific solution
   Provide ioctl() interface - FITRIM
   Do not disturb other ongoing IO too much
     1   Prevent allocations while trimming
     2   How to handle huge filesystem ?
   File system support
     1   ext4 (v2.6.36-rc6-35-g7360d17)
     2   ext3 (v2.6.37-11-g9c52749)
     3   xfs (v2.6.37-rc4-63-ga46db60)
Discard Batching




FITRIM ioctl


   Ioctl with one RW parameter defined in linux/fs.h
   struct fstrim range {
     u64 start;
     u64 len;
     u64 minlen;
   }
   fstrim tool
        http://sourceforge.net/projects/fstrim/
   util-linux-ng
        Since v2.18-165-gd9e2d0d
Discard Batching




Batched discard benchmark results
                 6
                                                      FITRIM on ext4
                                                       BLKDISCARD

                 5


                 4
  Duration [s]




                 3


                 2


                 1


                 0
                     0   20       40             60           80         100
                              Filesystem saturation [%]
Different Approach




Alternative approach




   It is always a compromise
   The future of SSD’s and thinly provisioned LUN’s (???)
Part IV
Discard Support in user-space
Agenda




9 e2fsprogs



10 Other utilities
e2fsprogs




Discard in e2fsprogs tools

   Using BLKDISCARD ioctl()
   mke2fs
     1   Refresh SSD’s garbage collector
     2   discard zeroes data - significant speed boost
     3   mkfs.ext4 -E discard /dev/sdc
   e2fsck
     1   After the last check discard free space
     2   Non detected file system errors ? oops
     3   fsck.ext4 -E discard /dev/sdc
   resize2fs
     1   Refresh SSD’s garbage collector
     2   discard zeroes data - significant speed boost
     3   resize2fs -E discard /dev/sdc
e2fsprogs




File system creation
               30
                                         nodiscard
                                           discard

               25


               20
Duration [s]




               15


               10


               5


               0
                    EXT4                 XFS
                           File system
Other utilities




Fstrim tool




   Very simple tool to invoke FITRIM ioctl on mounted file
   system
   Stand-alone tool
       http://sourceforge.net/projects/fstrim/
   Since v2.18-165-gd9e2d0d part of util-linux-ng
Part V
Summary
Summary
  Linux Discard support is a abstraction for underlying
  specification
  Exported via BLKDISCARD ioctl to user-space and
  blkdev issue discard() for filesystems
  Discard testing kit (Discard-kit)
    1   test-discard
    2   PostMark
  Filesystem support
    1   Fine grained (online) discard - mount -o discard
    2   Batched discard support - fstrim from util-linux-ng
  Support in user-space utilities
    1   Filesystem creation (mkfs)
    2   e2fsprogs - mkfs,e2fsck,resize2fs
    3   xfsprogs - mkfs
    4   fstrim
The end.
Thanks for listening.
Useful links




   http://sourceforge.net/projects/fstrim/
   http://sourceforge.net/projects/test-discard/
   http://people.redhat.com/lczerner/discard/

More Related Content

Similar to Linux Discard Support Performance Evaluation

Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...Enkitec
 
Presentation sun storage tek™ 2500 series
Presentation   sun storage tek™ 2500 seriesPresentation   sun storage tek™ 2500 series
Presentation sun storage tek™ 2500 seriesxKinAnx
 
Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...Alpen-Adria-Universität
 
Practical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iPractical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iCOMMON Europe
 
An introduction and evaluations of a wide area distributed storage system
An introduction and evaluations of  a wide area distributed storage systemAn introduction and evaluations of  a wide area distributed storage system
An introduction and evaluations of a wide area distributed storage systemHiroki Kashiwazaki
 
Présentation Aerys
Présentation Aerys Présentation Aerys
Présentation Aerys iCOMMUNITY
 
Diamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheetDiamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheetceed2013
 
Perf Storwize V7000 Eng
Perf Storwize V7000 EngPerf Storwize V7000 Eng
Perf Storwize V7000 EngOleg Korol
 
How to Create a High-Speed Template Engine in Python
How to Create a High-Speed Template Engine in PythonHow to Create a High-Speed Template Engine in Python
How to Create a High-Speed Template Engine in Pythonkwatch
 
Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014Mustafa Kuğu
 
Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)inside-BigData.com
 
M series technical presentation-part 1
M series technical presentation-part 1M series technical presentation-part 1
M series technical presentation-part 1xKinAnx
 
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...CAST, Inc.
 
IBM Solid State in eX5 servers
IBM Solid State in eX5 serversIBM Solid State in eX5 servers
IBM Solid State in eX5 serversTony Pearson
 
Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...jins0618
 

Similar to Linux Discard Support Performance Evaluation (20)

Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
 
Presentation sun storage tek™ 2500 series
Presentation   sun storage tek™ 2500 seriesPresentation   sun storage tek™ 2500 series
Presentation sun storage tek™ 2500 series
 
Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...
 
SPC BENCHMARK 2/ENERGY™ EXECUTIVE SUMMARY
SPC BENCHMARK 2/ENERGY™  EXECUTIVE SUMMARYSPC BENCHMARK 2/ENERGY™  EXECUTIVE SUMMARY
SPC BENCHMARK 2/ENERGY™ EXECUTIVE SUMMARY
 
VyattaCore TIPS2013
VyattaCore TIPS2013VyattaCore TIPS2013
VyattaCore TIPS2013
 
Windows Azure Storage – Architecture View
Windows Azure Storage – Architecture ViewWindows Azure Storage – Architecture View
Windows Azure Storage – Architecture View
 
Practical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iPractical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM i
 
An introduction and evaluations of a wide area distributed storage system
An introduction and evaluations of  a wide area distributed storage systemAn introduction and evaluations of  a wide area distributed storage system
An introduction and evaluations of a wide area distributed storage system
 
Présentation Aerys
Présentation Aerys Présentation Aerys
Présentation Aerys
 
Diamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheetDiamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheet
 
Perf Storwize V7000 Eng
Perf Storwize V7000 EngPerf Storwize V7000 Eng
Perf Storwize V7000 Eng
 
How to Create a High-Speed Template Engine in Python
How to Create a High-Speed Template Engine in PythonHow to Create a High-Speed Template Engine in Python
How to Create a High-Speed Template Engine in Python
 
Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014
 
Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)
 
M series technical presentation-part 1
M series technical presentation-part 1M series technical presentation-part 1
M series technical presentation-part 1
 
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
 
Sending Push Notifications using the Windows Push Notification Service and Wi...
Sending Push Notifications using the Windows Push Notification Service and Wi...Sending Push Notifications using the Windows Push Notification Service and Wi...
Sending Push Notifications using the Windows Push Notification Service and Wi...
 
IBM Solid State in eX5 servers
IBM Solid State in eX5 serversIBM Solid State in eX5 servers
IBM Solid State in eX5 servers
 
Linux on System z disk I/O performance
Linux on System z disk I/O performanceLinux on System z disk I/O performance
Linux on System z disk I/O performance
 
Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...
 

Recently uploaded

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Linux Discard Support Performance Evaluation

  • 1. Performance evaluation of Linux Discard Support (Overview, benchmark results, current status) Red Hat Luk´ˇ Czerner as May 27, 2011
  • 2. Copyright © 2011 Luk´ˇ Czerner, Red Hat. as Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the COPYING file.
  • 4. Agenda 1 SSD Description 2 Thinly Provisioned Storage 3 Introducing Linux Discard Support
  • 5. SSD Description Solid-State Drive Flash memory block device Wear-leveling needed Firmware = black box
  • 6. SSD Description ATA TRIM Command Helps handle garbage collection overhead Subsequent READ of TRIM’ed blocks 1 Read data should NOT change between READ’s 2 Read data should NOT be retrieved from data previously written to any other LBA. As long as the device has enough free pages to write to we do not necessarily need it. In a nutshell: TRIM command tells the device what LBA’a is not used by the OS anymore.
  • 7. Thinly Provisioned Storage Thin Provisioning Unlike in traditional storage, there is no fixed one-to-one logical bock to physical storage mapping More efficient use of storage space Block reclamation interface needed
  • 8. Thinly Provisioned Storage SCSI UNMAP / WRITE SAME Storage space reclamation interface Subsequent READ of unmapped blocks 1 Read data should NOT change between READ’s 2 Read data should NOT be retrieved from data previously written to any other LBA. Unlike with SSD’s we can not afford to wait until we run out of space for reclamation.
  • 9. Introducing Linux Discard Support Linux Discard Implementation Abstraction for the two underlying specifications: 1 ATA TRIM Command 2 SCSI UNMAP / WRITE SAME API for user-space BLKDISCARD ioctl Added with v2.6.27-rc9-30-gd30a260 API for File Sytems 1 sb issue discard() 2 blkdev issue discard()
  • 12. Testing Methodology What do we need to find out ? Does discard really work ? Is it reliable ? How fast/slow is it ? Is there any difference between devices from different vendors ? What is the ideal discard size ? SSD performance degradation
  • 13. Testing Methodology How do we test it ? BLKDISCARD ioctl() Automatic discard of different ranges Different discard patterns 1 sequential performance 2 random IO peformance 3 discard already discarded blocks test-discard - discard benchmarking tool http://sourceforge.net/projects/test-discard/ impression - filesystem aging tool
  • 14. Results Sequential discard performance 300 1200 Duration Summary Throughput 250 1000 200 800 Throughput [MB/s] Duration [s] 150 600 100 400 50 200 0 0 10 100 1000 Record size [kB]
  • 15. Results Different modes comparison 1200 Throughput (sequential) Throughput (random IO) Throughput (discard2) 1000 800 Throughput [MB/s] 600 400 200 0 10 100 1000 Record size [kB]
  • 16. Results Difference between various vendors 1200 Throughput Vendor 1 Throughput Vendor 2 Throughput Vendor 3 1000 800 Throughput [MB/s] 600 400 200 0 10 100 1000 Record size [kB]
  • 17. Results SSD performance degradation 70 Vendor1 Vendor2 Vendor3 60 Vendor3 discard Write performance [MB/s] 50 40 30 20 10 0 50 100 150 200 250 300 350 400 Filesystem saturation [%]
  • 18. Part III Discard Support for Linux File systems
  • 19. Agenda 6 Periodic Discard 7 Discard Batching 8 Different Approach
  • 20. Periodic Discard Periodic discard Easy to implement File system support 1 ext4 (v2.6.27-5185-g8a0aba7) 2 btrfs (since upstream) 3 gfs2 (v2.6.29-9-gf15ab56) 4 fat, swap, nilfs mount -o discard /dev/sdc /mnt/test TRIM is non-queueable command - implications ?
  • 21. Periodic Discard Benchmarking periodic discard Expectations ? Testing methodology 1 Metadata intensive load 2 Load with removing files 3 Reasonable file size distribution Discard-kit 1 Using PostMark 2 http://sourceforge.net/projects/test-discard/files/
  • 22. Periodic Discard Ext4 performance (18% hit) 800 Deleted/s Append/s Read/s 700 Files-created/s Transactions/s Read[B/s] 600 Write[B/s] Opetation/s 500 400 300 200 100 nodiscard discard
  • 23. Periodic Discard Performance with various file systems 750 ext4 700 btrfs 650 gfs2 Transactions/s 600 550 500 450 400 350 no di no di no di sc sc sc di di di ar ar ar sc sc sc d d d ar ar ar d d -18% -7% -63% d
  • 24. Discard Batching Discard Batching - The idea Fine-grained discard is not necessarily needed Small extents are slow With time, freed extents tends to coalesce Disadvantages 1 There is a price for tracking freed extents 2 Discarding already discarded blocks should be easy, but... 3 Daemon (in-kernel, user-space) needed. 4 File system independent solution would most likely be pain to do right (if possible).
  • 25. Discard Batching Batched discard support File system specific solution Provide ioctl() interface - FITRIM Do not disturb other ongoing IO too much 1 Prevent allocations while trimming 2 How to handle huge filesystem ? File system support 1 ext4 (v2.6.36-rc6-35-g7360d17) 2 ext3 (v2.6.37-11-g9c52749) 3 xfs (v2.6.37-rc4-63-ga46db60)
  • 26. Discard Batching FITRIM ioctl Ioctl with one RW parameter defined in linux/fs.h struct fstrim range { u64 start; u64 len; u64 minlen; } fstrim tool http://sourceforge.net/projects/fstrim/ util-linux-ng Since v2.18-165-gd9e2d0d
  • 27. Discard Batching Batched discard benchmark results 6 FITRIM on ext4 BLKDISCARD 5 4 Duration [s] 3 2 1 0 0 20 40 60 80 100 Filesystem saturation [%]
  • 28. Different Approach Alternative approach It is always a compromise The future of SSD’s and thinly provisioned LUN’s (???)
  • 29. Part IV Discard Support in user-space
  • 31. e2fsprogs Discard in e2fsprogs tools Using BLKDISCARD ioctl() mke2fs 1 Refresh SSD’s garbage collector 2 discard zeroes data - significant speed boost 3 mkfs.ext4 -E discard /dev/sdc e2fsck 1 After the last check discard free space 2 Non detected file system errors ? oops 3 fsck.ext4 -E discard /dev/sdc resize2fs 1 Refresh SSD’s garbage collector 2 discard zeroes data - significant speed boost 3 resize2fs -E discard /dev/sdc
  • 32. e2fsprogs File system creation 30 nodiscard discard 25 20 Duration [s] 15 10 5 0 EXT4 XFS File system
  • 33. Other utilities Fstrim tool Very simple tool to invoke FITRIM ioctl on mounted file system Stand-alone tool http://sourceforge.net/projects/fstrim/ Since v2.18-165-gd9e2d0d part of util-linux-ng
  • 35. Summary Linux Discard support is a abstraction for underlying specification Exported via BLKDISCARD ioctl to user-space and blkdev issue discard() for filesystems Discard testing kit (Discard-kit) 1 test-discard 2 PostMark Filesystem support 1 Fine grained (online) discard - mount -o discard 2 Batched discard support - fstrim from util-linux-ng Support in user-space utilities 1 Filesystem creation (mkfs) 2 e2fsprogs - mkfs,e2fsck,resize2fs 3 xfsprogs - mkfs 4 fstrim
  • 36. The end. Thanks for listening.
  • 37. Useful links http://sourceforge.net/projects/fstrim/ http://sourceforge.net/projects/test-discard/ http://people.redhat.com/lczerner/discard/