SlideShare a Scribd company logo
1 of 37
Download to read offline
Performance evaluation of Linux Discard
Support
(Overview, benchmark results, current status)

Red Hat
Luk´ˇ Czerner
   as
May 27, 2011
Copyright ©    2011 Luk´ˇ Czerner, Red Hat.
                         as
Permission is granted to copy, distribute and/or modify this
document under the terms of the GNU Free Documentation
License, Version 1.3 or any later version published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover
Texts, and no Back-Cover Texts. A copy of the license is included
in the COPYING file.
Part I
Discard Background
Agenda



1 SSD Description



2 Thinly Provisioned Storage



3 Introducing Linux Discard Support
SSD Description




Solid-State Drive




   Flash memory block device
   Wear-leveling needed
   Firmware = black box
SSD Description




ATA TRIM Command


  Helps handle garbage collection overhead
  Subsequent READ of TRIM’ed blocks
    1   Read data should NOT change between READ’s
    2   Read data should NOT be retrieved from data previously
        written to any other LBA.
  As long as the device has enough free pages to write to we do
  not necessarily need it.
  In a nutshell: TRIM command tells the device what LBA’a is
  not used by the OS anymore.
Thinly Provisioned Storage




Thin Provisioning




   Unlike in traditional storage, there is no fixed one-to-one
   logical bock to physical storage mapping
   More efficient use of storage space
   Block reclamation interface needed
Thinly Provisioned Storage




SCSI UNMAP / WRITE SAME



  Storage space reclamation interface
  Subsequent READ of unmapped blocks
    1   Read data should NOT change between READ’s
    2   Read data should NOT be retrieved from data previously
        written to any other LBA.
  Unlike with SSD’s we can not afford to wait until we run out
  of space for reclamation.
Introducing Linux Discard Support




Linux Discard Implementation



   Abstraction for the two underlying specifications:
     1   ATA TRIM Command
     2   SCSI UNMAP / WRITE SAME
   API for user-space
         BLKDISCARD ioctl
         Added with v2.6.27-rc9-30-gd30a260
   API for File Sytems
     1   sb issue discard()
     2   blkdev issue discard()
Part II
Discard Performance
Agenda




4 Testing Methodology



5 Results
Testing Methodology




What do we need to find out ?



   Does discard really work ? Is it reliable ?
   How fast/slow is it ?
   Is there any difference between devices from different vendors
   ?
   What is the ideal discard size ?
   SSD performance degradation
Testing Methodology




How do we test it ?


   BLKDISCARD ioctl()
   Automatic discard of different ranges
   Different discard patterns
     1   sequential performance
     2   random IO peformance
     3   discard already discarded blocks
   test-discard - discard benchmarking tool
         http://sourceforge.net/projects/test-discard/
   impression - filesystem aging tool
Results




Sequential discard performance
               300                                          1200
                                     Duration Summary
                                            Throughput

               250                                          1000


               200                                          800




                                                                   Throughput [MB/s]
Duration [s]




               150                                          600


               100                                          400


               50                                           200


                0                                            0
                     10              100                 1000
                          Record size [kB]
Results




Different modes comparison
                    1200
                                          Throughput (sequential)
                                          Throughput (random IO)
                                            Throughput (discard2)
                    1000


                     800
Throughput [MB/s]




                     600


                     400


                     200


                      0
                           10               100                     1000
                                Record size [kB]
Results




Difference between various vendors
                    1200
                                             Throughput Vendor 1
                                             Throughput Vendor 2
                                             Throughput Vendor 3
                    1000


                     800
Throughput [MB/s]




                     600


                     400


                     200


                      0
                           10               100                    1000
                                Record size [kB]
Results




SSD performance degradation
                            70
                                                                             Vendor1
                                                                             Vendor2
                                                                             Vendor3
                            60                                         Vendor3 discard
 Write performance [MB/s]




                            50


                            40


                            30


                            20


                            10



                                 0   50   100    150      200       250     300      350   400
                                                Filesystem saturation [%]
Part III
Discard Support for Linux File systems
Agenda



6 Periodic Discard



7 Discard Batching



8 Different Approach
Periodic Discard




Periodic discard



   Easy to implement
   File system support
     1   ext4 (v2.6.27-5185-g8a0aba7)
     2   btrfs (since upstream)
     3   gfs2 (v2.6.29-9-gf15ab56)
     4   fat, swap, nilfs
   mount -o discard /dev/sdc /mnt/test
   TRIM is non-queueable command - implications ?
Periodic Discard




Benchmarking periodic discard



   Expectations ?
   Testing methodology
     1   Metadata intensive load
     2   Load with removing files
     3   Reasonable file size distribution
   Discard-kit
     1   Using PostMark
     2   http://sourceforge.net/projects/test-discard/files/
Periodic Discard




Ext4 performance (18% hit)
               800
                                       Deleted/s
                                       Append/s
                                         Read/s
               700               Files-created/s
                                 Transactions/s
                                      Read[B/s]
               600                    Write[B/s]
 Opetation/s




               500


               400


               300


               200


               100
                     nodiscard       discard
Periodic Discard




Performance with various file systems
                   750
                                                                    ext4
                   700                                              btrfs

                   650                                              gfs2
  Transactions/s




                   600

                   550

                   500

                   450

                   400

                   350
                         no


                                   di




                                              no


                                                        di




                                                                   no


                                                                             di
                                     sc




                                                          sc




                                                                               sc
                          di




                                               di




                                                                    di
                                      ar




                                                           ar




                                                                                ar
                            sc




                                                 sc




                                                                      sc
                                          d




                                                               d




                                                                                    d
                              ar




                                                   ar




                                                                        ar
                                 d




                                                      d




                          -18%                  -7%                 -63%   d
Discard Batching




Discard Batching - The idea



   Fine-grained discard is not necessarily needed
   Small extents are slow
   With time, freed extents tends to coalesce
   Disadvantages
     1 There is a price for tracking freed extents
     2 Discarding already discarded blocks should be easy, but...
     3 Daemon (in-kernel, user-space) needed.
     4 File system independent solution would most likely be pain to
       do right (if possible).
Discard Batching




Batched discard support



   File system specific solution
   Provide ioctl() interface - FITRIM
   Do not disturb other ongoing IO too much
     1   Prevent allocations while trimming
     2   How to handle huge filesystem ?
   File system support
     1   ext4 (v2.6.36-rc6-35-g7360d17)
     2   ext3 (v2.6.37-11-g9c52749)
     3   xfs (v2.6.37-rc4-63-ga46db60)
Discard Batching




FITRIM ioctl


   Ioctl with one RW parameter defined in linux/fs.h
   struct fstrim range {
     u64 start;
     u64 len;
     u64 minlen;
   }
   fstrim tool
        http://sourceforge.net/projects/fstrim/
   util-linux-ng
        Since v2.18-165-gd9e2d0d
Discard Batching




Batched discard benchmark results
                 6
                                                      FITRIM on ext4
                                                       BLKDISCARD

                 5


                 4
  Duration [s]




                 3


                 2


                 1


                 0
                     0   20       40             60           80         100
                              Filesystem saturation [%]
Different Approach




Alternative approach




   It is always a compromise
   The future of SSD’s and thinly provisioned LUN’s (???)
Part IV
Discard Support in user-space
Agenda




9 e2fsprogs



10 Other utilities
e2fsprogs




Discard in e2fsprogs tools

   Using BLKDISCARD ioctl()
   mke2fs
     1   Refresh SSD’s garbage collector
     2   discard zeroes data - significant speed boost
     3   mkfs.ext4 -E discard /dev/sdc
   e2fsck
     1   After the last check discard free space
     2   Non detected file system errors ? oops
     3   fsck.ext4 -E discard /dev/sdc
   resize2fs
     1   Refresh SSD’s garbage collector
     2   discard zeroes data - significant speed boost
     3   resize2fs -E discard /dev/sdc
e2fsprogs




File system creation
               30
                                         nodiscard
                                           discard

               25


               20
Duration [s]




               15


               10


               5


               0
                    EXT4                 XFS
                           File system
Other utilities




Fstrim tool




   Very simple tool to invoke FITRIM ioctl on mounted file
   system
   Stand-alone tool
       http://sourceforge.net/projects/fstrim/
   Since v2.18-165-gd9e2d0d part of util-linux-ng
Part V
Summary
Summary
  Linux Discard support is a abstraction for underlying
  specification
  Exported via BLKDISCARD ioctl to user-space and
  blkdev issue discard() for filesystems
  Discard testing kit (Discard-kit)
    1   test-discard
    2   PostMark
  Filesystem support
    1   Fine grained (online) discard - mount -o discard
    2   Batched discard support - fstrim from util-linux-ng
  Support in user-space utilities
    1   Filesystem creation (mkfs)
    2   e2fsprogs - mkfs,e2fsck,resize2fs
    3   xfsprogs - mkfs
    4   fstrim
The end.
Thanks for listening.
Useful links




   http://sourceforge.net/projects/fstrim/
   http://sourceforge.net/projects/test-discard/
   http://people.redhat.com/lczerner/discard/

More Related Content

Similar to Performance evaluation of Linux Discard Support

Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Enkitec
 
Presentation sun storage tek™ 2500 series
Presentation   sun storage tek™ 2500 seriesPresentation   sun storage tek™ 2500 series
Presentation sun storage tek™ 2500 series
xKinAnx
 
Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...
Alpen-Adria-Universität
 
Présentation Aerys
Présentation Aerys Présentation Aerys
Présentation Aerys
iCOMMUNITY
 
Diamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheetDiamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheet
ceed2013
 

Similar to Performance evaluation of Linux Discard Support (20)

Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
Bottlenecks, Bottlenecks, and more Bottlenecks: Lessons Learned from 2 Years ...
 
Presentation sun storage tek™ 2500 series
Presentation   sun storage tek™ 2500 seriesPresentation   sun storage tek™ 2500 series
Presentation sun storage tek™ 2500 series
 
Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...Evaluating the networking performance of linux based home router platforms fo...
Evaluating the networking performance of linux based home router platforms fo...
 
SPC BENCHMARK 2/ENERGY™ EXECUTIVE SUMMARY
SPC BENCHMARK 2/ENERGY™  EXECUTIVE SUMMARYSPC BENCHMARK 2/ENERGY™  EXECUTIVE SUMMARY
SPC BENCHMARK 2/ENERGY™ EXECUTIVE SUMMARY
 
VyattaCore TIPS2013
VyattaCore TIPS2013VyattaCore TIPS2013
VyattaCore TIPS2013
 
Windows Azure Storage – Architecture View
Windows Azure Storage – Architecture ViewWindows Azure Storage – Architecture View
Windows Azure Storage – Architecture View
 
Practical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM iPractical experiences and best practices for SSD and IBM i
Practical experiences and best practices for SSD and IBM i
 
An introduction and evaluations of a wide area distributed storage system
An introduction and evaluations of  a wide area distributed storage systemAn introduction and evaluations of  a wide area distributed storage system
An introduction and evaluations of a wide area distributed storage system
 
Présentation Aerys
Présentation Aerys Présentation Aerys
Présentation Aerys
 
Diamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheetDiamondmax plus 8_data_sheet
Diamondmax plus 8_data_sheet
 
Perf Storwize V7000 Eng
Perf Storwize V7000 EngPerf Storwize V7000 Eng
Perf Storwize V7000 Eng
 
How to Create a High-Speed Template Engine in Python
How to Create a High-Speed Template Engine in PythonHow to Create a High-Speed Template Engine in Python
How to Create a High-Speed Template Engine in Python
 
Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014Toshiba Storage Protfolio 2014
Toshiba Storage Protfolio 2014
 
Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)Lustre client performance comparison and tuning (1.8.x to 2.x)
Lustre client performance comparison and tuning (1.8.x to 2.x)
 
M series technical presentation-part 1
M series technical presentation-part 1M series technical presentation-part 1
M series technical presentation-part 1
 
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
CPU Subsystem Total Power Consumption: Understanding the Factors and Selectin...
 
Sending Push Notifications using the Windows Push Notification Service and Wi...
Sending Push Notifications using the Windows Push Notification Service and Wi...Sending Push Notifications using the Windows Push Notification Service and Wi...
Sending Push Notifications using the Windows Push Notification Service and Wi...
 
IBM Solid State in eX5 servers
IBM Solid State in eX5 serversIBM Solid State in eX5 servers
IBM Solid State in eX5 servers
 
Linux on System z disk I/O performance
Linux on System z disk I/O performanceLinux on System z disk I/O performance
Linux on System z disk I/O performance
 
Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...Calton pu experimental methods on performance in cloud and accuracy in big da...
Calton pu experimental methods on performance in cloud and accuracy in big da...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Performance evaluation of Linux Discard Support

  • 1. Performance evaluation of Linux Discard Support (Overview, benchmark results, current status) Red Hat Luk´ˇ Czerner as May 27, 2011
  • 2. Copyright © 2011 Luk´ˇ Czerner, Red Hat. as Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the COPYING file.
  • 4. Agenda 1 SSD Description 2 Thinly Provisioned Storage 3 Introducing Linux Discard Support
  • 5. SSD Description Solid-State Drive Flash memory block device Wear-leveling needed Firmware = black box
  • 6. SSD Description ATA TRIM Command Helps handle garbage collection overhead Subsequent READ of TRIM’ed blocks 1 Read data should NOT change between READ’s 2 Read data should NOT be retrieved from data previously written to any other LBA. As long as the device has enough free pages to write to we do not necessarily need it. In a nutshell: TRIM command tells the device what LBA’a is not used by the OS anymore.
  • 7. Thinly Provisioned Storage Thin Provisioning Unlike in traditional storage, there is no fixed one-to-one logical bock to physical storage mapping More efficient use of storage space Block reclamation interface needed
  • 8. Thinly Provisioned Storage SCSI UNMAP / WRITE SAME Storage space reclamation interface Subsequent READ of unmapped blocks 1 Read data should NOT change between READ’s 2 Read data should NOT be retrieved from data previously written to any other LBA. Unlike with SSD’s we can not afford to wait until we run out of space for reclamation.
  • 9. Introducing Linux Discard Support Linux Discard Implementation Abstraction for the two underlying specifications: 1 ATA TRIM Command 2 SCSI UNMAP / WRITE SAME API for user-space BLKDISCARD ioctl Added with v2.6.27-rc9-30-gd30a260 API for File Sytems 1 sb issue discard() 2 blkdev issue discard()
  • 12. Testing Methodology What do we need to find out ? Does discard really work ? Is it reliable ? How fast/slow is it ? Is there any difference between devices from different vendors ? What is the ideal discard size ? SSD performance degradation
  • 13. Testing Methodology How do we test it ? BLKDISCARD ioctl() Automatic discard of different ranges Different discard patterns 1 sequential performance 2 random IO peformance 3 discard already discarded blocks test-discard - discard benchmarking tool http://sourceforge.net/projects/test-discard/ impression - filesystem aging tool
  • 14. Results Sequential discard performance 300 1200 Duration Summary Throughput 250 1000 200 800 Throughput [MB/s] Duration [s] 150 600 100 400 50 200 0 0 10 100 1000 Record size [kB]
  • 15. Results Different modes comparison 1200 Throughput (sequential) Throughput (random IO) Throughput (discard2) 1000 800 Throughput [MB/s] 600 400 200 0 10 100 1000 Record size [kB]
  • 16. Results Difference between various vendors 1200 Throughput Vendor 1 Throughput Vendor 2 Throughput Vendor 3 1000 800 Throughput [MB/s] 600 400 200 0 10 100 1000 Record size [kB]
  • 17. Results SSD performance degradation 70 Vendor1 Vendor2 Vendor3 60 Vendor3 discard Write performance [MB/s] 50 40 30 20 10 0 50 100 150 200 250 300 350 400 Filesystem saturation [%]
  • 18. Part III Discard Support for Linux File systems
  • 19. Agenda 6 Periodic Discard 7 Discard Batching 8 Different Approach
  • 20. Periodic Discard Periodic discard Easy to implement File system support 1 ext4 (v2.6.27-5185-g8a0aba7) 2 btrfs (since upstream) 3 gfs2 (v2.6.29-9-gf15ab56) 4 fat, swap, nilfs mount -o discard /dev/sdc /mnt/test TRIM is non-queueable command - implications ?
  • 21. Periodic Discard Benchmarking periodic discard Expectations ? Testing methodology 1 Metadata intensive load 2 Load with removing files 3 Reasonable file size distribution Discard-kit 1 Using PostMark 2 http://sourceforge.net/projects/test-discard/files/
  • 22. Periodic Discard Ext4 performance (18% hit) 800 Deleted/s Append/s Read/s 700 Files-created/s Transactions/s Read[B/s] 600 Write[B/s] Opetation/s 500 400 300 200 100 nodiscard discard
  • 23. Periodic Discard Performance with various file systems 750 ext4 700 btrfs 650 gfs2 Transactions/s 600 550 500 450 400 350 no di no di no di sc sc sc di di di ar ar ar sc sc sc d d d ar ar ar d d -18% -7% -63% d
  • 24. Discard Batching Discard Batching - The idea Fine-grained discard is not necessarily needed Small extents are slow With time, freed extents tends to coalesce Disadvantages 1 There is a price for tracking freed extents 2 Discarding already discarded blocks should be easy, but... 3 Daemon (in-kernel, user-space) needed. 4 File system independent solution would most likely be pain to do right (if possible).
  • 25. Discard Batching Batched discard support File system specific solution Provide ioctl() interface - FITRIM Do not disturb other ongoing IO too much 1 Prevent allocations while trimming 2 How to handle huge filesystem ? File system support 1 ext4 (v2.6.36-rc6-35-g7360d17) 2 ext3 (v2.6.37-11-g9c52749) 3 xfs (v2.6.37-rc4-63-ga46db60)
  • 26. Discard Batching FITRIM ioctl Ioctl with one RW parameter defined in linux/fs.h struct fstrim range { u64 start; u64 len; u64 minlen; } fstrim tool http://sourceforge.net/projects/fstrim/ util-linux-ng Since v2.18-165-gd9e2d0d
  • 27. Discard Batching Batched discard benchmark results 6 FITRIM on ext4 BLKDISCARD 5 4 Duration [s] 3 2 1 0 0 20 40 60 80 100 Filesystem saturation [%]
  • 28. Different Approach Alternative approach It is always a compromise The future of SSD’s and thinly provisioned LUN’s (???)
  • 29. Part IV Discard Support in user-space
  • 31. e2fsprogs Discard in e2fsprogs tools Using BLKDISCARD ioctl() mke2fs 1 Refresh SSD’s garbage collector 2 discard zeroes data - significant speed boost 3 mkfs.ext4 -E discard /dev/sdc e2fsck 1 After the last check discard free space 2 Non detected file system errors ? oops 3 fsck.ext4 -E discard /dev/sdc resize2fs 1 Refresh SSD’s garbage collector 2 discard zeroes data - significant speed boost 3 resize2fs -E discard /dev/sdc
  • 32. e2fsprogs File system creation 30 nodiscard discard 25 20 Duration [s] 15 10 5 0 EXT4 XFS File system
  • 33. Other utilities Fstrim tool Very simple tool to invoke FITRIM ioctl on mounted file system Stand-alone tool http://sourceforge.net/projects/fstrim/ Since v2.18-165-gd9e2d0d part of util-linux-ng
  • 35. Summary Linux Discard support is a abstraction for underlying specification Exported via BLKDISCARD ioctl to user-space and blkdev issue discard() for filesystems Discard testing kit (Discard-kit) 1 test-discard 2 PostMark Filesystem support 1 Fine grained (online) discard - mount -o discard 2 Batched discard support - fstrim from util-linux-ng Support in user-space utilities 1 Filesystem creation (mkfs) 2 e2fsprogs - mkfs,e2fsck,resize2fs 3 xfsprogs - mkfs 4 fstrim
  • 36. The end. Thanks for listening.
  • 37. Useful links http://sourceforge.net/projects/fstrim/ http://sourceforge.net/projects/test-discard/ http://people.redhat.com/lczerner/discard/