SlideShare a Scribd company logo
1 of 30
Download to read offline
Optimal usage of SSDs
under Linux:
Optimize your I/O Subsystem

Werner Fischer,
Technology Specialist Thomas-Krenn.AG

LinuxCon Europe 2011,
October 26th - 28th 2011,
Prague, Czech Republic
Introduction

                  who I am                         who I am not
 Werner Fischer   from Australia    Linux user        Kernel
                                    since 2001       developer




 working for a      freelancer     Piano learner       SSD
 Server vendor     IT journalist                     developer




                                                      slide 2/30
Agenda

 1) SSD layout

 2) I/O performance metrics

 3) Configurations tips




                     Source: maximumpc.com


                                             slide 3/30
1) SSD layout: memory cell

  • memory cells
    – NAND memory cell =
      MOS transistor with
      floating gate
    – permanently store
      charge                         Source: Intel




        • programming puts electrons on floating gate
        • erase takes them off
    – one program/erase (p/e) cycle is a round trip by the electrons
    – back-and-forth round trips gradually damage the tunnel oxide
    – endurance is limited, measured in number of p/e cycles:
        • 50nm MLC ~ 10.000 p/e cycles
        • 34nm/25nm/20nm MLC ~ 3.000 – 5.000 p/e cycles

                                                           slide 4/30
1) SSD layout: memory cell

  • memory cells
    – SLC (Single Level Cell) → 1 bit per memory cell
    – MLC (Multi Level Cell) → 2 bits per memory cell




          Source: anandtech.com


    – TLC (Triple Level Cell) → 3 bits per memory cell
    – 16LC (16 Level Cell) → 4 bits per memory cell

  • multiple memory cells (e.g. 16.384) build up a “page”
    – page = smallest area, which can be read/written

                                                         slide 5/30
1) SSD layout: page

        blackboard #1                     • one line = page within a SSD
   1 Stiegl Bier
   2 Budweiser
                                               – 8.192 Bytes (8 kiB)
   3 Hofstettner                               – can be read/written individually
   4 Heinecken                                 – cannot be changed/erased individually
   5
   6
   7
   8




 Note: example sizes of pages and blocks are taken from Intel's Series 320 SSDs (with IMFT's 25nm Flash chips)


                                                                                                 slide 6/30
1) SSD layout: block

        blackboard #1                     • one line = page within a SSD
   1 Stiegl Bier
   2 Budweiser
                                               – 8.192 Bytes (8 kiB)
   3 Hofstettner                               – can be read/written individually
   4 Heinecken                                 – cannot be changed/erased individually
   5
   6                                      • one blackboard = block within a SSD
   7                                           – consists of 256 lines (pages),
   8                                             2.097.152 Bytes (2 MiB)
                                               – smallest area which
                                                 can be individually erased
                                                 (we have only
                                                 watering-cans for that ;-)
 Note: example sizes of pages and blocks are taken from Intel's Series 320 SSDs (with IMFT's 25nm Flash chips)


                                                                                                 slide 7/30
1) SSD layout: change page

      blackboard #1   • easier way to change lines?
  1 Stiegl Bier
  2 Budweiser
  3 Hofstettner
  4 Heinecken
  5
  6
  7
  8




                                                 slide 8/30
1) SSD layout: change page

      blackboard #1   • easier way to change lines!
  1 Stiegl Bier
  2 Budweiser
                        – (1) mark old line as invalid
  3 Hofstettner
  4 Heinecken
  5
  6
  7
  8




                                                         slide 9/30
1) SSD layout: change page

      blackboard #1   • easier way to change lines!
  1 Stiegl Bier
  2 Budweiser
                        – (1) mark old line as invalid
  3 Hofstettner         – (2) store new content in a free line
  4 Heinecken
  5 Pilsener
  6
  7
  8




                                                       slide 10/30
1) SSD layout: change page

      blackboard #1   • easier way to change lines!
  1 Stiegl Bier
  2 Budweiser
                        – (1) mark old line as invalid
  3 Hofstettner         – (2) store new content in a free line
  4 Heinecken
  5 Pilsener          • this can be done as long as there are
  6                     enough free lines left...
  7
  8                                        we need spare
                                           blackboards to have
                                           enough free lines!




                                                       slide 11/30
1) SSD layout: spare area

• SSDs need spare area
                                  cont (LBA 14)       cont (LBA 13)    cont (LBA 19)
                                  cont (LBA 93)       cont (LBA 92)    cont (LBA 91)


  – avoids the erasement          cont (LBA 15)
                                  cont (LBA 27)
                                  cont (LBA 04)
                                                      cont (LBA 11)
                                                      cont (LBA 26)
                                                      cont (LBA 09)
                                                                       cont (LBA 15)
                                                                       cont (LBA 29)
                                                                       cont (LBA 04)

    of a block when a single      cont (LBA 43)
                                  cont (LBA 58)
                                                       cont (LBA 42)
                                                       cont (LBA 57)
                                                                       cont (LBA 43)
                                                                       cont (LBA 59)


    page is changed
                                  cont (LBA 84)        cont (LBA 77)   cont (LBA 89)




  – after some time spare
    area will be filled up, too   cont (LBA 02)
                                  cont (LBA 93)
                                  cont (LBA 26)


  – cleaning gets necessary       cont (LBA 57)




    (garbage collection)




                                                  visible capacity,                       spare area,
                                                    e.g. 300 GB                           e.g. 44 GB



                                                                                       slide 12/30
1) SSD layout: blocks→planes→dies→TSOPs

 • planes
   – multiple blocks make up a plane
   – e.g. 1.024 blocks = 1 plane
 • dies
   – multiple planes make up a die
   – e.g. 4 planes = 1 die             Source: http://newsroom.intel.com/community/intel_newsroom/blog/2011/04/14




 • TSOP (thin small outline package)
   – multiple dies (e.g. 1 - 8)
 • SSDs
   – e.g. 10 TSOPs

                                       Source: maximumpc.com




                                                                               slide 13/30
Agenda

 1) SSD layout

 2) I/O performance metrics

 3) Configurations tips




                     Source: maximumpc.com


                                             slide 14/30
2) I/O performance metrics

  • throughput
    – MByte/s
    – throughput analogy:
       • # of persons/h from Berlin → Prague

  • # of I/O operations per second
    – IOPS analogy:
       • # of individual trips to Prague
         (from Berlin, Vienna, Paris, Rome, ...)

  • latency because of queue depth
    – queue depth analogy:
       • vehicles must use a ferry to reach destination
       • with how many vehicles does
         the ferry depart?
                                                          slide 15/30
Agenda

 1) SSD layout

 2) I/O performance metrics

 3) Configurations tips
    ●   AHCI (NCQ+DIPM)
    ●   TRIM (discard)
    ●   noatime
    ●   tmpfs
    ●   alignment
    ●   over-provisioning


                       Source: maximumpc.com


                                               slide 16/30
3) Configuration tips: AHCI (NCQ+DIPM)

  • NCQ (Native Command Queuing)
    – allows SSD to execute multiple I/O requests in parallel
    – boosts throughput
    – configure queue depth to get your optimal balance
      between max. # of IOPS and lowest latency




                                                          slide 17/30
3) Configuration tips: AHCI (NCQ+DIPM)

  • DIPM (Device Initiated Interface Power Management)
     – reduces idle power down to 0,1 Watt




  • Aggressive
    Link Power
    Management
     – min_power
     – medium_power
     – max_performance
     More Details: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/ALPM.html



updated slide! (has been corrected after the talk)                                                       slide 18/30
3) Configuration tips: TRIM (discard)

  • ATA TRIM
                             cont (LBA 14)       cont (LBA 13)    cont (LBA 19)
                             cont (LBA 93)       cont (LBA 92)    cont (LBA 91)


    – tells SSD which data   cont (LBA 15)
                             cont (LBA 27)
                             cont (LBA 04)
                                                 cont (LBA 11)
                                                 cont (LBA 26)
                                                  cont (LBA 09)
                                                                  cont (LBA 15)
                                                                  cont (LBA 29)
                                                                  cont (LBA 04)

      can be discarded       cont (LBA 43)
                             cont (LBA 58)
                                                  cont (LBA 42)
                                                  cont (LBA 57)
                                                                  cont (LBA 43)
                                                                  cont (LBA 59)
                             cont (LBA 84)        cont (LBA 77)   cont (LBA 89)




                             cont (LBA 02)




                                             visible capacity,                       spare area,
                                               e.g. 300 GB                           e.g. 44 GB



                                                                                  slide 19/30
3) Configuration tips: TRIM (discard)

  • ATA TRIM
                             cont (LBA 14)       cont (LBA 13)    cont (LBA 19)
                             cont (LBA 93)       cont (LBA 92)    cont (LBA 91)


    – tells SSD which data   cont (LBA 15)
                             cont (LBA 27)
                             cont (LBA 04)
                                                 cont (LBA 11)
                                                 cont (LBA 26)
                                                  cont (LBA 09)
                                                                  cont (LBA 15)
                                                                  cont (LBA 29)
                                                                  cont (LBA 04)

      can be discarded       cont (LBA 43)
                             cont (LBA 58)
                                                  cont (LBA 42)
                                                  cont (LBA 57)
                                                                  cont (LBA 43)
                                                                  cont (LBA 59)
                             cont (LBA 84)        cont (LBA 77)   cont (LBA 89)




                             cont (LBA 02)
                             cont (LBA 93)




                                             visible capacity,                       spare area,
                                               e.g. 300 GB                           e.g. 44 GB



                                                                                  slide 20/30
3) Configuration tips: TRIM (discard)

  • ATA TRIM
                                 cont (LBA 14)       cont (LBA 13)    cont (LBA 19)
                                 cont (LBA 93)       cont (LBA 92)    cont (LBA 91)


    – tells SSD which data       cont (LBA 15)
                                 cont (LBA 27)
                                 cont (LBA 04)
                                                     cont (LBA 11)
                                                     cont (LBA 26)
                                                      cont (LBA 09)
                                                                      cont (LBA 15)
                                                                      cont (LBA 29)
                                                                      cont (LBA 04)

      can be discarded           cont (LBA 43)
                                 cont (LBA 58)
                                                      cont (LBA 42)
                                                      cont (LBA 57)
                                                                      cont (LBA 43)
                                                                      cont (LBA 59)
                                 cont (LBA 84)        cont (LBA 77)   cont (LBA 89)


    – without TRIM:
       • deleting a big file
         (e.g. 100 GB) would     cont (LBA 02)
                                 cont (LBA 93)


         lead to keep
         unusable data
       • unusable data will be
         maintained during
         garbage collection!
       • more overhead →
         lower performance &                     visible capacity,                       spare area,
         lower endurance!                          e.g. 300 GB                           e.g. 44 GB



                                                                                      slide 21/30
3) Configuration tips: TRIM (discard)

  • ATA TRIM using discard infrastructure in Linux
    – online discard
       • Ext4: since Kernel 2.6.33
       • XFS: since Kernel 3.0
       • Btrfs: since Kernel 2.6.32
    – batched discard (using fstrim command)
       • Ext4: since Kernel 2.6.37
       • XFS: since Kernel 2.6.38
       • Btrfs: since Kernel 2.6.39
    – pre-discard on format
       • E2fsprogs >= 1.41.10
       • Xfsprogs >= 3.1.0



                                                 slide 22/30
3) Configuration tips: TRIM (discard)

  • ATA TRIM using discard infrastructure in Linux
    – I/O stack discard support (device mapper):
        • since Kernel 2.6.36: DM targets delay, linear, mpath, stripe
        • since Kernel 2.6.38: DM mirror target
    – no I/O stack discard support yet:
        • MD raid

  • alternatives to discard: wiper.sh / raid1ext4trim.sh
    – use hdparm --trim-sector-ranges-stdin
    – read warnings in the source of those scripts
    – cannot be used with device mapper



                                                                 slide 23/30
3) Configuration tips: TRIM (discard)

  • “does TRIM work?” howto




                                        slide 24/30
3) Configuration tips: noatime & tmpfs

  • noatime / relatime
    – omits writes of metadata on every read




  • tmpfs
    – for temporary data like /tmp/, /var/tmp/, /var/cache/, ...


                                                              slide 25/30
4) Configuration tips: alignment

  • align partition and file systems
    – wrong alignment:




    – use fdisk parameters: fdisk -c -u /dev/sda
    – correct alignment:




                                                   slide 26/30
4) Configuration tips: over-provisioning

• do not use full normal
  visible capacity
                                 cont (LBA 14)   cont (LBA 13)
                                 cont (LBA 93)   cont (LBA 92)
                                 cont (LBA 15)   cont (LBA 11)
                                 cont (LBA 27)   cont (LBA 26)
                                 cont (LBA 04)   cont (LBA 09)


  – activate HPA                 cont (LBA 43)
                                 cont (LBA 58)
                                 cont (LBA 84)
                                                 cont (LBA 42)
                                                 cont (LBA 57)
                                                 cont (LBA 77)

    (host protected area)
      • ATA8-ACS SET MAX
        ADDRESS                  cont (LBA 02)
                                 cont (LBA 93)


      • use hdparm -N
                                 cont (LBA 26)
                                 cont (LBA 57)




  – or simply do not partition
    full visible capacity
  – in either case if SSD has
    been used before:            HPA limited capacity
                                    e.g. 200 GB
      • do a secure erase to
                                         normal visible capacity,      spare area,
        TRIM all blocks                      e.g. 300 GB               e.g. 44 GB



                                                                    slide 27/30
4) Configuration tips: over-provisioning

  • over-provisioning is useful when discard cannot be
    used yet (e.g. MD-RAID, hardware RAID, ...)
  • measurements by Intel:




    Source: Intel




                                                 slide 28/30
Review:

 1) SSD layout

 2) I/O performance metrics

 3) Configurations tips
    ●   AHCI (NCQ+DIPM)
    ●   TRIM (discard)
    ●   noatime
    ●   tmpfs
    ●   alignment
    ●   over-provisioning


                       Source: maximumpc.com


                                               slide 29/30
Thanks for your time!




Image sources:
© Robynmac | Dreamstime.com | http://www.dreamstime.com/stock-photography-uluru-australia-kangaroo-sign-image19716692
© stormpic | aboutpixel.de | http://www.aboutpixel.de/foto/im-wartezimmer/stormpic/77740
© meisterleise | aboutpixel.de | http://www.aboutpixel.de/foto/tinnitus-g/meisterleise/113752
© goenz | aboutpixel.de | http://www.aboutpixel.de/foto/computer-1/goenz/104076
© Mr. Monk | aboutpixel.de | http://www.aboutpixel.de/foto/los-beschreib-mich%21/mr_-monk/39871
© RancoR | aboutpixel.de | http://www.aboutpixel.de/foto/giesskanne/rancor/29812
© Michael Hirschka | pixelio.de | http://www.pixelio.de/media/427783
© Sommaruga Fabio | pixelio.de | http://www.pixelio.de/media/448050
© Dieter Schütz | pixelio.de | http://www.pixelio.de/media/537686

More Related Content

Similar to 20111026 optimal-usage-of-ssds-under-linux-updated

TokuDB vs RocksDB
TokuDB vs RocksDBTokuDB vs RocksDB
TokuDB vs RocksDBVlad Lesin
 
Flash! (Modern File Systems)
Flash! (Modern File Systems)Flash! (Modern File Systems)
Flash! (Modern File Systems)David Evans
 
Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02Seshu Chakravarthy
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionSchubert Zhang
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensMatthew Ahrens
 
9_Storage_Devices.pptx
9_Storage_Devices.pptx9_Storage_Devices.pptx
9_Storage_Devices.pptxJawaharPrasad3
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBshimosawa
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State DrivesDataStax Academy
 
Reshaping Core Genomics Software Tools for the Manycore Era
Reshaping Core Genomics Software Tools for the Manycore EraReshaping Core Genomics Software Tools for the Manycore Era
Reshaping Core Genomics Software Tools for the Manycore EraIntel® Software
 
Things you should know about Oracle truncate
Things you should know about Oracle truncateThings you should know about Oracle truncate
Things you should know about Oracle truncateKazuhiro Takahashi
 
Secondary storage devices
Secondary storage devices Secondary storage devices
Secondary storage devices Slideshare
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TERyosuke IWANAGA
 
Improving the Performance of the qcow2 Format (KVM Forum 2017)
Improving the Performance of the qcow2 Format (KVM Forum 2017)Improving the Performance of the qcow2 Format (KVM Forum 2017)
Improving the Performance of the qcow2 Format (KVM Forum 2017)Igalia
 
W1.1 i os in database
W1.1   i os in databaseW1.1   i os in database
W1.1 i os in databasegafurov_x
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheDavid Grier
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, functionTeddyIswahyudi1
 

Similar to 20111026 optimal-usage-of-ssds-under-linux-updated (20)

TokuDB vs RocksDB
TokuDB vs RocksDBTokuDB vs RocksDB
TokuDB vs RocksDB
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
Flash! (Modern File Systems)
Flash! (Modern File Systems)Flash! (Modern File Systems)
Flash! (Modern File Systems)
 
Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02Secondarystoragedevices1 130119040144-phpapp02
Secondarystoragedevices1 130119040144-phpapp02
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
 
9_Storage_Devices.pptx
9_Storage_Devices.pptx9_Storage_Devices.pptx
9_Storage_Devices.pptx
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
 
9_Storage_Devices.pptx
9_Storage_Devices.pptx9_Storage_Devices.pptx
9_Storage_Devices.pptx
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Cassandra and Solid State Drives
Cassandra and Solid State DrivesCassandra and Solid State Drives
Cassandra and Solid State Drives
 
Reshaping Core Genomics Software Tools for the Manycore Era
Reshaping Core Genomics Software Tools for the Manycore EraReshaping Core Genomics Software Tools for the Manycore Era
Reshaping Core Genomics Software Tools for the Manycore Era
 
Tainted LOB
Tainted LOBTainted LOB
Tainted LOB
 
Things you should know about Oracle truncate
Things you should know about Oracle truncateThings you should know about Oracle truncate
Things you should know about Oracle truncate
 
Secondary storage devices
Secondary storage devices Secondary storage devices
Secondary storage devices
 
Flash 101
Flash 101Flash 101
Flash 101
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE
 
Improving the Performance of the qcow2 Format (KVM Forum 2017)
Improving the Performance of the qcow2 Format (KVM Forum 2017)Improving the Performance of the qcow2 Format (KVM Forum 2017)
Improving the Performance of the qcow2 Format (KVM Forum 2017)
 
W1.1 i os in database
W1.1   i os in databaseW1.1   i os in database
W1.1 i os in database
 
Accelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cacheAccelerating hbase with nvme and bucket cache
Accelerating hbase with nvme and bucket cache
 
cache memory introduction, level, function
cache memory introduction, level, functioncache memory introduction, level, function
cache memory introduction, level, function
 

Recently uploaded

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 

Recently uploaded (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 

20111026 optimal-usage-of-ssds-under-linux-updated

  • 1. Optimal usage of SSDs under Linux: Optimize your I/O Subsystem Werner Fischer, Technology Specialist Thomas-Krenn.AG LinuxCon Europe 2011, October 26th - 28th 2011, Prague, Czech Republic
  • 2. Introduction who I am who I am not Werner Fischer from Australia Linux user Kernel since 2001 developer working for a freelancer Piano learner SSD Server vendor IT journalist developer slide 2/30
  • 3. Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips Source: maximumpc.com slide 3/30
  • 4. 1) SSD layout: memory cell • memory cells – NAND memory cell = MOS transistor with floating gate – permanently store charge Source: Intel • programming puts electrons on floating gate • erase takes them off – one program/erase (p/e) cycle is a round trip by the electrons – back-and-forth round trips gradually damage the tunnel oxide – endurance is limited, measured in number of p/e cycles: • 50nm MLC ~ 10.000 p/e cycles • 34nm/25nm/20nm MLC ~ 3.000 – 5.000 p/e cycles slide 4/30
  • 5. 1) SSD layout: memory cell • memory cells – SLC (Single Level Cell) → 1 bit per memory cell – MLC (Multi Level Cell) → 2 bits per memory cell Source: anandtech.com – TLC (Triple Level Cell) → 3 bits per memory cell – 16LC (16 Level Cell) → 4 bits per memory cell • multiple memory cells (e.g. 16.384) build up a “page” – page = smallest area, which can be read/written slide 5/30
  • 6. 1) SSD layout: page blackboard #1 • one line = page within a SSD 1 Stiegl Bier 2 Budweiser – 8.192 Bytes (8 kiB) 3 Hofstettner – can be read/written individually 4 Heinecken – cannot be changed/erased individually 5 6 7 8 Note: example sizes of pages and blocks are taken from Intel's Series 320 SSDs (with IMFT's 25nm Flash chips) slide 6/30
  • 7. 1) SSD layout: block blackboard #1 • one line = page within a SSD 1 Stiegl Bier 2 Budweiser – 8.192 Bytes (8 kiB) 3 Hofstettner – can be read/written individually 4 Heinecken – cannot be changed/erased individually 5 6 • one blackboard = block within a SSD 7 – consists of 256 lines (pages), 8 2.097.152 Bytes (2 MiB) – smallest area which can be individually erased (we have only watering-cans for that ;-) Note: example sizes of pages and blocks are taken from Intel's Series 320 SSDs (with IMFT's 25nm Flash chips) slide 7/30
  • 8. 1) SSD layout: change page blackboard #1 • easier way to change lines? 1 Stiegl Bier 2 Budweiser 3 Hofstettner 4 Heinecken 5 6 7 8 slide 8/30
  • 9. 1) SSD layout: change page blackboard #1 • easier way to change lines! 1 Stiegl Bier 2 Budweiser – (1) mark old line as invalid 3 Hofstettner 4 Heinecken 5 6 7 8 slide 9/30
  • 10. 1) SSD layout: change page blackboard #1 • easier way to change lines! 1 Stiegl Bier 2 Budweiser – (1) mark old line as invalid 3 Hofstettner – (2) store new content in a free line 4 Heinecken 5 Pilsener 6 7 8 slide 10/30
  • 11. 1) SSD layout: change page blackboard #1 • easier way to change lines! 1 Stiegl Bier 2 Budweiser – (1) mark old line as invalid 3 Hofstettner – (2) store new content in a free line 4 Heinecken 5 Pilsener • this can be done as long as there are 6 enough free lines left... 7 8 we need spare blackboards to have enough free lines! slide 11/30
  • 12. 1) SSD layout: spare area • SSDs need spare area cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – avoids the erasement cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) of a block when a single cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) page is changed cont (LBA 84) cont (LBA 77) cont (LBA 89) – after some time spare area will be filled up, too cont (LBA 02) cont (LBA 93) cont (LBA 26) – cleaning gets necessary cont (LBA 57) (garbage collection) visible capacity, spare area, e.g. 300 GB e.g. 44 GB slide 12/30
  • 13. 1) SSD layout: blocks→planes→dies→TSOPs • planes – multiple blocks make up a plane – e.g. 1.024 blocks = 1 plane • dies – multiple planes make up a die – e.g. 4 planes = 1 die Source: http://newsroom.intel.com/community/intel_newsroom/blog/2011/04/14 • TSOP (thin small outline package) – multiple dies (e.g. 1 - 8) • SSDs – e.g. 10 TSOPs Source: maximumpc.com slide 13/30
  • 14. Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips Source: maximumpc.com slide 14/30
  • 15. 2) I/O performance metrics • throughput – MByte/s – throughput analogy: • # of persons/h from Berlin → Prague • # of I/O operations per second – IOPS analogy: • # of individual trips to Prague (from Berlin, Vienna, Paris, Rome, ...) • latency because of queue depth – queue depth analogy: • vehicles must use a ferry to reach destination • with how many vehicles does the ferry depart? slide 15/30
  • 16. Agenda 1) SSD layout 2) I/O performance metrics 3) Configurations tips ● AHCI (NCQ+DIPM) ● TRIM (discard) ● noatime ● tmpfs ● alignment ● over-provisioning Source: maximumpc.com slide 16/30
  • 17. 3) Configuration tips: AHCI (NCQ+DIPM) • NCQ (Native Command Queuing) – allows SSD to execute multiple I/O requests in parallel – boosts throughput – configure queue depth to get your optimal balance between max. # of IOPS and lowest latency slide 17/30
  • 18. 3) Configuration tips: AHCI (NCQ+DIPM) • DIPM (Device Initiated Interface Power Management) – reduces idle power down to 0,1 Watt • Aggressive Link Power Management – min_power – medium_power – max_performance More Details: http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/ALPM.html updated slide! (has been corrected after the talk) slide 18/30
  • 19. 3) Configuration tips: TRIM (discard) • ATA TRIM cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – tells SSD which data cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) can be discarded cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) cont (LBA 84) cont (LBA 77) cont (LBA 89) cont (LBA 02) visible capacity, spare area, e.g. 300 GB e.g. 44 GB slide 19/30
  • 20. 3) Configuration tips: TRIM (discard) • ATA TRIM cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – tells SSD which data cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) can be discarded cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) cont (LBA 84) cont (LBA 77) cont (LBA 89) cont (LBA 02) cont (LBA 93) visible capacity, spare area, e.g. 300 GB e.g. 44 GB slide 20/30
  • 21. 3) Configuration tips: TRIM (discard) • ATA TRIM cont (LBA 14) cont (LBA 13) cont (LBA 19) cont (LBA 93) cont (LBA 92) cont (LBA 91) – tells SSD which data cont (LBA 15) cont (LBA 27) cont (LBA 04) cont (LBA 11) cont (LBA 26) cont (LBA 09) cont (LBA 15) cont (LBA 29) cont (LBA 04) can be discarded cont (LBA 43) cont (LBA 58) cont (LBA 42) cont (LBA 57) cont (LBA 43) cont (LBA 59) cont (LBA 84) cont (LBA 77) cont (LBA 89) – without TRIM: • deleting a big file (e.g. 100 GB) would cont (LBA 02) cont (LBA 93) lead to keep unusable data • unusable data will be maintained during garbage collection! • more overhead → lower performance & visible capacity, spare area, lower endurance! e.g. 300 GB e.g. 44 GB slide 21/30
  • 22. 3) Configuration tips: TRIM (discard) • ATA TRIM using discard infrastructure in Linux – online discard • Ext4: since Kernel 2.6.33 • XFS: since Kernel 3.0 • Btrfs: since Kernel 2.6.32 – batched discard (using fstrim command) • Ext4: since Kernel 2.6.37 • XFS: since Kernel 2.6.38 • Btrfs: since Kernel 2.6.39 – pre-discard on format • E2fsprogs >= 1.41.10 • Xfsprogs >= 3.1.0 slide 22/30
  • 23. 3) Configuration tips: TRIM (discard) • ATA TRIM using discard infrastructure in Linux – I/O stack discard support (device mapper): • since Kernel 2.6.36: DM targets delay, linear, mpath, stripe • since Kernel 2.6.38: DM mirror target – no I/O stack discard support yet: • MD raid • alternatives to discard: wiper.sh / raid1ext4trim.sh – use hdparm --trim-sector-ranges-stdin – read warnings in the source of those scripts – cannot be used with device mapper slide 23/30
  • 24. 3) Configuration tips: TRIM (discard) • “does TRIM work?” howto slide 24/30
  • 25. 3) Configuration tips: noatime & tmpfs • noatime / relatime – omits writes of metadata on every read • tmpfs – for temporary data like /tmp/, /var/tmp/, /var/cache/, ... slide 25/30
  • 26. 4) Configuration tips: alignment • align partition and file systems – wrong alignment: – use fdisk parameters: fdisk -c -u /dev/sda – correct alignment: slide 26/30
  • 27. 4) Configuration tips: over-provisioning • do not use full normal visible capacity cont (LBA 14) cont (LBA 13) cont (LBA 93) cont (LBA 92) cont (LBA 15) cont (LBA 11) cont (LBA 27) cont (LBA 26) cont (LBA 04) cont (LBA 09) – activate HPA cont (LBA 43) cont (LBA 58) cont (LBA 84) cont (LBA 42) cont (LBA 57) cont (LBA 77) (host protected area) • ATA8-ACS SET MAX ADDRESS cont (LBA 02) cont (LBA 93) • use hdparm -N cont (LBA 26) cont (LBA 57) – or simply do not partition full visible capacity – in either case if SSD has been used before: HPA limited capacity e.g. 200 GB • do a secure erase to normal visible capacity, spare area, TRIM all blocks e.g. 300 GB e.g. 44 GB slide 27/30
  • 28. 4) Configuration tips: over-provisioning • over-provisioning is useful when discard cannot be used yet (e.g. MD-RAID, hardware RAID, ...) • measurements by Intel: Source: Intel slide 28/30
  • 29. Review: 1) SSD layout 2) I/O performance metrics 3) Configurations tips ● AHCI (NCQ+DIPM) ● TRIM (discard) ● noatime ● tmpfs ● alignment ● over-provisioning Source: maximumpc.com slide 29/30
  • 30. Thanks for your time! Image sources: © Robynmac | Dreamstime.com | http://www.dreamstime.com/stock-photography-uluru-australia-kangaroo-sign-image19716692 © stormpic | aboutpixel.de | http://www.aboutpixel.de/foto/im-wartezimmer/stormpic/77740 © meisterleise | aboutpixel.de | http://www.aboutpixel.de/foto/tinnitus-g/meisterleise/113752 © goenz | aboutpixel.de | http://www.aboutpixel.de/foto/computer-1/goenz/104076 © Mr. Monk | aboutpixel.de | http://www.aboutpixel.de/foto/los-beschreib-mich%21/mr_-monk/39871 © RancoR | aboutpixel.de | http://www.aboutpixel.de/foto/giesskanne/rancor/29812 © Michael Hirschka | pixelio.de | http://www.pixelio.de/media/427783 © Sommaruga Fabio | pixelio.de | http://www.pixelio.de/media/448050 © Dieter Schütz | pixelio.de | http://www.pixelio.de/media/537686