Optimal usage of SSDsunder Linux:Optimize your I/O SubsystemWerner Fischer,Technology Specialist Thomas-Krenn.AGLinuxCon E...
Introduction                  who I am                         who I am not Werner Fischer   from Australia    Linux user ...
Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips                     Source: maximumpc.com          ...
1) SSD layout: memory cell  • memory cells    – NAND memory cell =      MOS transistor with      floating gate    – perman...
1) SSD layout: memory cell  • memory cells    – SLC (Single Level Cell) → 1 bit per memory cell    – MLC (Multi Level Cell...
1) SSD layout: page        blackboard #1                     • one line = page within a SSD   1 Stiegl Bier   2 Budweiser ...
1) SSD layout: block        blackboard #1                     • one line = page within a SSD   1 Stiegl Bier   2 Budweiser...
1) SSD layout: change page      blackboard #1   • easier way to change lines?  1 Stiegl Bier  2 Budweiser  3 Hofstettner  ...
1) SSD layout: change page      blackboard #1   • easier way to change lines!  1 Stiegl Bier  2 Budweiser                 ...
1) SSD layout: change page      blackboard #1   • easier way to change lines!  1 Stiegl Bier  2 Budweiser                 ...
1) SSD layout: change page      blackboard #1   • easier way to change lines!  1 Stiegl Bier  2 Budweiser                 ...
1) SSD layout: spare area• SSDs need spare area                                  cont (LBA 14)       cont (LBA 13)    cont...
1) SSD layout: blocks→planes→dies→TSOPs • planes   – multiple blocks make up a plane   – e.g. 1.024 blocks = 1 plane • die...
Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips                     Source: maximumpc.com          ...
2) I/O performance metrics  • throughput    – MByte/s    – throughput analogy:       • # of persons/h from Berlin → Prague...
Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips    ●   AHCI (NCQ+DIPM)    ●   TRIM (discard)    ●  ...
3) Configuration tips: AHCI (NCQ+DIPM)  • NCQ (Native Command Queuing)    – allows SSD to execute multiple I/O requests in...
3) Configuration tips: AHCI (NCQ+DIPM)  • DIPM (Device Initiated Interface Power Management)     – reduces idle power down...
3) Configuration tips: TRIM (discard)  • ATA TRIM                             cont (LBA 14)       cont (LBA 13)    cont (L...
3) Configuration tips: TRIM (discard)  • ATA TRIM                             cont (LBA 14)       cont (LBA 13)    cont (L...
3) Configuration tips: TRIM (discard)  • ATA TRIM                                 cont (LBA 14)       cont (LBA 13)    con...
3) Configuration tips: TRIM (discard)  • ATA TRIM using discard infrastructure in Linux    – online discard       • Ext4: ...
3) Configuration tips: TRIM (discard)  • ATA TRIM using discard infrastructure in Linux    – I/O stack discard support (de...
3) Configuration tips: TRIM (discard)  • “does TRIM work?” howto                                        slide 24/30
3) Configuration tips: noatime & tmpfs  • noatime / relatime    – omits writes of metadata on every read  • tmpfs    – for...
4) Configuration tips: alignment  • align partition and file systems    – wrong alignment:    – use fdisk parameters: fdis...
4) Configuration tips: over-provisioning• do not use full normal  visible capacity                                 cont (L...
4) Configuration tips: over-provisioning  • over-provisioning is useful when discard cannot be    used yet (e.g. MD-RAID, ...
Review: 1) SSD layout 2) I/O performance metrics 3) Configurations tips    ●   AHCI (NCQ+DIPM)    ●   TRIM (discard)    ● ...
Thanks for your time!Image sources:© Robynmac | Dreamstime.com | http://www.dreamstime.com/stock-photography-uluru-austral...
Upcoming SlideShare
Loading in …5
×

20111026 optimal-usage-of-ssds-under-linux-updated

872 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
872
On SlideShare
0
From Embeds
0
Number of Embeds
478
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

20111026 optimal-usage-of-ssds-under-linux-updated

  1. 1. Optimal usage of SSDsunder Linux:Optimize your I/O SubsystemWerner Fischer,Technology Specialist Thomas-Krenn.AGLinuxCon Europe 2011,October 26th - 28th 2011,Prague, Czech Republic
  2. 2. Introduction who I am who I am not Werner Fischer from Australia Linux user Kernel since 2001 developer working for a freelancer Piano learner SSD Server vendor IT journalist developer slide 2/30
  3. 3. Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips Source: maximumpc.com slide 3/30
  4. 4. 1) SSD layout: memory cell • memory cells – NAND memory cell = MOS transistor with floating gate – permanently store charge Source: Intel • programming puts electrons on floating gate • erase takes them off – one program/erase (p/e) cycle is a round trip by the electrons – back-and-forth round trips gradually damage the tunnel oxide – endurance is limited, measured in number of p/e cycles: • 50nm MLC ~ 10.000 p/e cycles • 34nm/25nm/20nm MLC ~ 3.000 – 5.000 p/e cycles slide 4/30
  5. 5. 1) SSD layout: memory cell • memory cells – SLC (Single Level Cell) → 1 bit per memory cell – MLC (Multi Level Cell) → 2 bits per memory cell Source: anandtech.com – TLC (Triple Level Cell) → 3 bits per memory cell – 16LC (16 Level Cell) → 4 bits per memory cell • multiple memory cells (e.g. 16.384) build up a “page” – page = smallest area, which can be read/written slide 5/30
  6. 6. 1) SSD layout: page blackboard #1 • one line = page within a SSD 1 Stiegl Bier 2 Budweiser – 8.192 Bytes (8 kiB) 3 Hofstettner – can be read/written individually 4 Heinecken – cannot be changed/erased individually 5 6 7 8 Note: example sizes of pages and blocks are taken from Intels Series 320 SSDs (with IMFTs 25nm Flash chips) slide 6/30
  7. 7. 1) SSD layout: block blackboard #1 • one line = page within a SSD 1 Stiegl Bier 2 Budweiser – 8.192 Bytes (8 kiB) 3 Hofstettner – can be read/written individually 4 Heinecken – cannot be changed/erased individually 5 6 • one blackboard = block within a SSD 7 – consists of 256 lines (pages), 8 2.097.152 Bytes (2 MiB) – smallest area which can be individually erased (we have only watering-cans for that ;-) Note: example sizes of pages and blocks are taken from Intels Series 320 SSDs (with IMFTs 25nm Flash chips) slide 7/30
  8. 8. 1) SSD layout: change page blackboard #1 • easier way to change lines? 1 Stiegl Bier 2 Budweiser 3 Hofstettner 4 Heinecken 5 6 7 8 slide 8/30
  9. 9. 1) SSD layout: change page blackboard #1 • easier way to change lines! 1 Stiegl Bier 2 Budweiser – (1) mark old line as invalid 3 Hofstettner 4 Heinecken 5 6 7 8 slide 9/30
  10. 10. 1) SSD layout: change page blackboard #1 • easier way to change lines! 1 Stiegl Bier 2 Budweiser – (1) mark old line as invalid 3 Hofstettner – (2) store new content in a free line 4 Heinecken 5 Pilsener 6 7 8 slide 10/30
  11. 11. 1) SSD layout: change page blackboard #1 • easier way to change lines! 1 Stiegl Bier 2 Budweiser – (1) mark old line as invalid 3 Hofstettner – (2) store new content in a free line 4 Heinecken 5 Pilsener • this can be done as long as there are 6 enough free lines left... 7 8 we need spare blackboards to have enough free lines! slide 11/30
  12. 12. 1) SSD layout: spare area• SSDs need spare area cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – avoids the erasement cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) of a block when a single cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) page is changed cont (LBA 84) cont (LBA 77) cont (LBA 89) – after some time spare area will be filled up, too cont (LBA 02) cont (LBA 93) cont (LBA 26) – cleaning gets necessary cont (LBA 57) (garbage collection) visible capacity, spare area, e.g. 300 GB e.g. 44 GB slide 12/30
  13. 13. 1) SSD layout: blocks→planes→dies→TSOPs • planes – multiple blocks make up a plane – e.g. 1.024 blocks = 1 plane • dies – multiple planes make up a die – e.g. 4 planes = 1 die Source: http://newsroom.intel.com/community/intel_newsroom/blog/2011/04/14 • TSOP (thin small outline package) – multiple dies (e.g. 1 - 8) • SSDs – e.g. 10 TSOPs Source: maximumpc.com slide 13/30
  14. 14. Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips Source: maximumpc.com slide 14/30
  15. 15. 2) I/O performance metrics • throughput – MByte/s – throughput analogy: • # of persons/h from Berlin → Prague • # of I/O operations per second – IOPS analogy: • # of individual trips to Prague (from Berlin, Vienna, Paris, Rome, ...) • latency because of queue depth – queue depth analogy: • vehicles must use a ferry to reach destination • with how many vehicles does the ferry depart? slide 15/30
  16. 16. Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips ● AHCI (NCQ+DIPM) ● TRIM (discard) ● noatime ● tmpfs ● alignment ● over-provisioning Source: maximumpc.com slide 16/30
  17. 17. 3) Configuration tips: AHCI (NCQ+DIPM) • NCQ (Native Command Queuing) – allows SSD to execute multiple I/O requests in parallel – boosts throughput – configure queue depth to get your optimal balance between max. # of IOPS and lowest latency slide 17/30
  18. 18. 3) Configuration tips: AHCI (NCQ+DIPM) • DIPM (Device Initiated Interface Power Management) – reduces idle power down to 0,1 Watt • Aggressive Link Power Management – min_power – medium_power – max_performance More Details: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/ALPM.htmlupdated slide! (has been corrected after the talk) slide 18/30
  19. 19. 3) Configuration tips: TRIM (discard) • ATA TRIM cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – tells SSD which data cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) can be discarded cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) cont (LBA 84) cont (LBA 77) cont (LBA 89) cont (LBA 02) visible capacity, spare area, e.g. 300 GB e.g. 44 GB slide 19/30
  20. 20. 3) Configuration tips: TRIM (discard) • ATA TRIM cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – tells SSD which data cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) can be discarded cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) cont (LBA 84) cont (LBA 77) cont (LBA 89) cont (LBA 02) cont (LBA 93) visible capacity, spare area, e.g. 300 GB e.g. 44 GB slide 20/30
  21. 21. 3) Configuration tips: TRIM (discard) • ATA TRIM cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – tells SSD which data cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) can be discarded cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) cont (LBA 84) cont (LBA 77) cont (LBA 89) – without TRIM: • deleting a big file (e.g. 100 GB) would cont (LBA 02) cont (LBA 93) lead to keep unusable data • unusable data will be maintained during garbage collection! • more overhead → lower performance & visible capacity, spare area, lower endurance! e.g. 300 GB e.g. 44 GB slide 21/30
  22. 22. 3) Configuration tips: TRIM (discard) • ATA TRIM using discard infrastructure in Linux – online discard • Ext4: since Kernel 2.6.33 • XFS: since Kernel 3.0 • Btrfs: since Kernel 2.6.32 – batched discard (using fstrim command) • Ext4: since Kernel 2.6.37 • XFS: since Kernel 2.6.38 • Btrfs: since Kernel 2.6.39 – pre-discard on format • E2fsprogs >= 1.41.10 • Xfsprogs >= 3.1.0 slide 22/30
  23. 23. 3) Configuration tips: TRIM (discard) • ATA TRIM using discard infrastructure in Linux – I/O stack discard support (device mapper): • since Kernel 2.6.36: DM targets delay, linear, mpath, stripe • since Kernel 2.6.38: DM mirror target – no I/O stack discard support yet: • MD raid • alternatives to discard: wiper.sh / raid1ext4trim.sh – use hdparm --trim-sector-ranges-stdin – read warnings in the source of those scripts – cannot be used with device mapper slide 23/30
  24. 24. 3) Configuration tips: TRIM (discard) • “does TRIM work?” howto slide 24/30
  25. 25. 3) Configuration tips: noatime & tmpfs • noatime / relatime – omits writes of metadata on every read • tmpfs – for temporary data like /tmp/, /var/tmp/, /var/cache/, ... slide 25/30
  26. 26. 4) Configuration tips: alignment • align partition and file systems – wrong alignment: – use fdisk parameters: fdisk -c -u /dev/sda – correct alignment: slide 26/30
  27. 27. 4) Configuration tips: over-provisioning• do not use full normal visible capacity cont (LBA 14) cont (LBA 13) cont (LBA 93) cont (LBA 92) cont (LBA 15) cont (LBA 11) cont (LBA 27) cont (LBA 26) cont (LBA 04) cont (LBA 09) – activate HPA cont (LBA 43) cont (LBA 58) cont (LBA 84) cont (LBA 42) cont (LBA 57) cont (LBA 77) (host protected area) • ATA8-ACS SET MAX ADDRESS cont (LBA 02) cont (LBA 93) • use hdparm -N cont (LBA 26) cont (LBA 57) – or simply do not partition full visible capacity – in either case if SSD has been used before: HPA limited capacity e.g. 200 GB • do a secure erase to normal visible capacity, spare area, TRIM all blocks e.g. 300 GB e.g. 44 GB slide 27/30
  28. 28. 4) Configuration tips: over-provisioning • over-provisioning is useful when discard cannot be used yet (e.g. MD-RAID, hardware RAID, ...) • measurements by Intel: Source: Intel slide 28/30
  29. 29. Review: 1) SSD layout 2) I/O performance metrics 3) Configurations tips ● AHCI (NCQ+DIPM) ● TRIM (discard) ● noatime ● tmpfs ● alignment ● over-provisioning Source: maximumpc.com slide 29/30
  30. 30. Thanks for your time!Image sources:© Robynmac | Dreamstime.com | http://www.dreamstime.com/stock-photography-uluru-australia-kangaroo-sign-image19716692© stormpic | aboutpixel.de | http://www.aboutpixel.de/foto/im-wartezimmer/stormpic/77740© meisterleise | aboutpixel.de | http://www.aboutpixel.de/foto/tinnitus-g/meisterleise/113752© goenz | aboutpixel.de | http://www.aboutpixel.de/foto/computer-1/goenz/104076© Mr. Monk | aboutpixel.de | http://www.aboutpixel.de/foto/los-beschreib-mich%21/mr_-monk/39871© RancoR | aboutpixel.de | http://www.aboutpixel.de/foto/giesskanne/rancor/29812© Michael Hirschka | pixelio.de | http://www.pixelio.de/media/427783© Sommaruga Fabio | pixelio.de | http://www.pixelio.de/media/448050© Dieter Schütz | pixelio.de | http://www.pixelio.de/media/537686

×