PostgreSQL Portland Performance Practice Project - Database Test 2 Filesystem Characterization
Fifth presentation in a speaker series sponsored by the Portland State University Computer Science Department. The series covers PostgreSQL performance with an OLTP (on-line transaction processing) workload called Database Test 2 (DBT-2). This presentation goes through results of different hardware RAID configurations to show why it is important to test your own hardware: it might be performing in a way you don't expect.


  1. PostgreSQL Portland Performance Practice Project. Database Test 2 (DBT-2): Characterizing Filesystems. Mark Wong <markwkm@postgresql.org>, Portland State University. April 9, 2009.
  2. Review. From last time:
     ◮ DBT-2 live demo.
     [ASCII-art elephant, the deck's recurring mascot: "Questions?"]
  3. This is not:
     ◮ A comprehensive Linux filesystem evaluation.
     ◮ A filesystem tuning guide (mkfs and mount options).
     ◮ Representative of your system (unless you have the exact same hardware).
     ◮ A test of reliability or durability.
     This is:
     ◮ A collection of reasons why you should test your own system.
     ◮ A guide for how to understand the i/o behavior of your own system.
     ◮ A model of simple i/o behavior based on PostgreSQL.
  4. Contents:
     ◮ Criteria
     ◮ Preconceptions
     ◮ fio Parameters
     ◮ Testing
     ◮ Test Results
  5. Some hardware details:
     ◮ HP DL380 G5
     ◮ 2 x Quad Core Intel(R) Xeon(R) E5405
     ◮ HP 32GB Fully Buffered DIMM PC2-5300 8x4GB DR LP Memory
     ◮ HP Smart Array P800 Controller
     ◮ MSA70 with 25 x HP 72GB Hot Plug 2.5 SAS 15,000 rpm Hard Drives
     [Elephant: "Thank you HP!"]
  6. Criteria. PostgreSQL dictates the following (even though some of them are actually configurable):
     ◮ 8KB block size
     ◮ No use of posix_fadvise
     ◮ No use of direct i/o
     ◮ No use of async i/o
     ◮ Uses processes, not threads.
     Based on the hardware we determine these criteria:
     ◮ 64GB data set - to minimize any caching in the 32GB of system memory.
     ◮ 8 processes performing i/o - one fio process per core.
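     The sizing arithmetic follows directly from the fio job parameters shown on the next slides:

        8 jobs x 8 GB per job (numjobs=8, size=8g) = 64 GB of test data
        64 GB = 2 x the 32 GB of installed memory

     so at most half of the working set can ever sit in the page cache.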
  7. Linux Filesystems Tested:
     ◮ ext2
     ◮ ext3
     ◮ ext4
     ◮ jfs
     ◮ reiserfs
     ◮ xfs
     [Elephant: "What is your favorite filesystem?"]
  8. Hardware RAID Configurations:
     ◮ RAID 0
     ◮ RAID 1
     ◮ RAID 1+0
     ◮ RAID 5
     ◮ RAID 6
     [Elephant: "Which RAID would you use?"]
  9. Preconceptions. [Elephant: "Everyone gets to participate."]
  10. [Elephant: "atime slows us down a lot."]
  11. [Elephant: "Partition alignment is huge."]
  12. [Elephant: "Doubling your disks doubles your performance."]
  13. [Elephant: "RAID-5 is awful at writes."]
  14. [Elephant: "RAID-6 improves on RAID-5."]
  15. [Elephant: "Journaling filesystems are the slowest."]
  16. fio[1]: Flexible I/O by Jens Axboe.
      ◮ Sequential Read
      ◮ Sequential Write
      ◮ Random Read
      ◮ Random Write
      ◮ Random Read/Write
      [1] http://brick.kernel.dk/snaps/
  17. Sequential Read

      [seq-read]
      rw=read
      size=8g
      directory=/test
      fadvise_hint=0
      blocksize=8k
      direct=0
      numjobs=8
      nrfiles=1
      runtime=1h
      exec_prerun=echo 3 > /proc/sys/vm/drop_caches
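      As a reading aid, the same job annotated line by line; the comments are ours, not the deck's, and assume fio's ini-style syntax where lines starting with ';' are ignored:

      [seq-read]
      ; stream each file from start to finish
      rw=read
      ; 8 GB of file data per job
      size=8g
      ; run against the filesystem mounted at /test
      directory=/test
      ; no posix_fadvise hints, matching PostgreSQL
      fadvise_hint=0
      ; 8 KB requests, PostgreSQL's block size
      blocksize=8k
      ; buffered (non-direct) i/o, as PostgreSQL uses
      direct=0
      ; eight processes, one per core
      numjobs=8
      ; each job works on a single file
      nrfiles=1
      ; cap each run at an hour
      runtime=1h
      ; flush the page cache first, so reads start cold
      exec_prerun=echo 3 > /proc/sys/vm/drop_caches

      The other four jobs below vary only the rw= (and rwmixread=) lines.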
  18. Sequential Write

      [seq-write]
      rw=write
      size=8g
      directory=/test
      fadvise_hint=0
      blocksize=8k
      direct=0
      numjobs=8
      nrfiles=1
      runtime=1h
      exec_prerun=echo 3 > /proc/sys/vm/drop_caches
  19. Random Read

      [random-read]
      rw=randread
      size=8g
      directory=/test
      fadvise_hint=0
      blocksize=8k
      direct=0
      numjobs=8
      nrfiles=1
      runtime=1h
      exec_prerun=echo 3 > /proc/sys/vm/drop_caches
  20. Random Write

      [random-write]
      rw=randwrite
      size=8g
      directory=/test
      fadvise_hint=0
      blocksize=8k
      direct=0
      numjobs=8
      nrfiles=1
      runtime=1h
      exec_prerun=echo 3 > /proc/sys/vm/drop_caches
  21. Random Read/Write

      [read-write]
      rw=rw
      rwmixread=50
      size=8g
      directory=/test
      fadvise_hint=0
      blocksize=8k
      direct=0
      numjobs=8
      nrfiles=1
      runtime=1h
      exec_prerun=echo 3 > /proc/sys/vm/drop_caches
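      To reproduce a run, a minimal invocation sketch (the job-file names are our assumption; root is needed because exec_prerun writes to /proc/sys/vm/drop_caches):

      $ sudo fio seq-read.fio
      $ sudo fio seq-write.fio
      $ sudo fio random-read.fio
      $ sudo fio random-write.fio
      $ sudo fio read-write.fio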
  22. Versions Tested:
      Linux 2.6.25-gentoo-r6 with fio v1.22, and
      Linux 2.6.28-gentoo with fio v1.23.
  23. [Elephant: "Results!"]
      There is more information than what is in this presentation; the raw data is at:
      http://207.173.203.223/~markwkm/community6/fio/2.6.28/
      Other things of interest there include, for example:
      ◮ Number of i/o operations.
      ◮ Processor utilization.
  24. A few words about statistics... Not enough data to calculate good statistics. Examine the following charts for a feel of how reliable the results may be.
  25. [Chart: Random Read with Linux 2.6.28-gentoo ext2 on 1 Disk]
  26. [Chart: Random Write with Linux 2.6.28-gentoo ext2 on 1 Disk]
  27. [Chart: Read/Write Mix with Linux 2.6.28-gentoo ext2 on 1 Disk]
  28. [Chart: Sequential Read with Linux 2.6.28-gentoo ext2 on 1 Disk]
  29. [Chart: Sequential Write with Linux 2.6.28-gentoo ext2 on 1 Disk]
  30. Results for all the filesystems when using only a single disk... (If you see missing data, it means the test didn't complete.)
  31. Results for all the filesystems when striping across four disks...
  32. Let's look at that step by step for RAID 0.
  33. Note that the trends might not be the same when adding disks to other RAID configurations.
  34. What benefits are there in mirroring?
  35. ◮ Should have the same write performance as a single disk.
      ◮ May be able to take advantage of using two disks for reading.
  36. What is the best use of 4 disks? Is RAID-6 an improvement over RAID-5?[2]
      ◮ RAID 5 (striped disks with parity) combines three or more disks in a way that protects data against the loss of any one disk; the storage capacity of the array is reduced by one disk.
      ◮ RAID 6 (striped disks with dual parity, less common) can recover from the loss of two disks.
      Do the answers depend on what filesystem is used?
      [2] http://en.wikipedia.org/wiki/RAID
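      As a worked example with this deck's 72GB drives, the standard capacity arithmetic for a four-disk array is:

      RAID 5: (4 - 1) x 72 GB = 216 GB usable, survives any 1 disk failure
      RAID 6: (4 - 2) x 72 GB = 144 GB usable, survives any 2 disk failures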
  37. Filesystem Readahead. Values are in number of 512-byte segments, set per device.
      To get the current readahead size:

      $ blockdev --getra /dev/sda
      256

      To set the readahead size to 8 MB:

      $ blockdev --setra 16384 /dev/sda
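      The unit conversion behind that example:

      16384 segments x 512 bytes = 8,388,608 bytes = 8 MB
      (the default of 256 segments is likewise 128 KB)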
  38. What does this all mean for my database system? It depends... Let's look at one sample of just using different filesystems.
  39. Raw data for comparing filesystems. Initial inspection of results suggests we are close to being bound by processor and storage performance:
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext2.1500.2/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext3.1500.2/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/ext4.1500.2/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/jfs.1500.2/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/reiserfs.1500.2/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-fs/xfs.1500.2/
  40. Linux I/O Elevator Algorithm. To get the current algorithm (shown in []'s):

      $ cat /sys/block/sda/queue/scheduler
      noop anticipatory [deadline] cfq

      To set a different algorithm:

      $ echo cfq > /sys/block/sda/queue/scheduler
      $ cat /sys/block/sda/queue/scheduler
      noop anticipatory deadline [cfq]
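      A setting made this way lasts only until reboot. To make it stick on kernels of this era, one common approach (our sketch, not from the deck; the kernel image name and root device are illustrative) is the elevator= boot parameter, e.g. in GRUB legacy:

      # /boot/grub/menu.lst (illustrative entry)
      title Gentoo Linux 2.6.28
      kernel /boot/vmlinuz-2.6.28-gentoo root=/dev/sda3 elevator=deadline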
  41. Raw data for comparing I/O elevator algorithms. Initial inspection of results suggests we are close to being bound by processor and storage performance:
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.anticipatory.1/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.cfs.1/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.deadline.1/
      ◮ http://207.173.203.223/~markwkm/community6/dbt2/m-elevator/ext2.noop.1/
  42. Why use deadline?[3]
      ◮ Attention! Database servers, especially those using "TCQ" disks, should investigate performance with the 'deadline' IO scheduler. Any system with high disk performance requirements should do so, in fact.
      ◮ Also, users with hardware RAID controllers, doing striping, may find highly variable performance results with using the as-iosched. The as-iosched anticipatory implementation is based on the notion that a disk device has only one physical seeking head. A striped RAID controller actually has a head for each physical device in the logical RAID device.
      [3] Linux source code: Documentation/block/as-iosched.txt
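      A small sketch for applying that advice to every SCSI-style disk at once (the sd* glob is an assumption about your device naming; run as root):

      for q in /sys/block/sd*/queue/scheduler; do
          echo deadline > "$q"
      done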
  43. Bonus Data - Preview of FreeBSD. [ASCII art: the BSD daemon]
  44. Final Notes
      ◮ You have to test your own system to see what it can do.
  45. Materials Are Freely Available - In a new location!
      ◮ PDF: http://www.slideshare.net/markwkm
      ◮ LaTeX Beamer (source): http://git.postgresql.org/git/performance-tuning.git
  46. Time and Location. When: 2nd Thursday of the month. Location: Portland State University. Room: FAB 86-01 (Fourth Avenue Building). Map: http://www.pdx.edu/map.html
  47. Coming up next time... Effects of tuning PostgreSQL Parameters. [Elephant: "Thank you!"]
  48. Acknowledgements. Haley Jane Wakenshaw. [Elephant ASCII art]
  49. License. This work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, (a) visit http://creativecommons.org/licenses/by/3.0/us/; or, (b) send a letter to Creative Commons, 171 2nd Street, Suite 300, San Francisco, California, 94105, USA.
