What Assumptions Make: Filesystem I/O from a database perspective

The Linux operating system provides a number of file systems, as well as volume management and hardware or software RAID. We are running performance benchmarks for database tuning and are curious whether the file systems really behave as we expect them to, especially when used in conjunction with RAID or volume management. Are these file systems being used in the manner for which they were designed? There is also more to a file system than how fast we can read from it or write to it. How reliable is the file system, and how do we prove it? We have collected data and will have a server available for development during the conference.

  1. What Assumptions Make: Performance Testing with P4 (Portland PostgreSQL Performance Pad). Selena Deckelmann, selena@endpoint.com, End Point Corporation, twitter: @selenamarie
  2. [Slide footer: SCALE 7x, Feb 21-22, 2009, www.endpoint.com]
  3. (No text on this slide.)
  4. (No text on this slide.)
  5. Do filesystems do what we expect?
  6. We are volunteers.
  7. We think you should run these tests.
  8. We are: DBAs, sysadmins, performance tuners
  9. How will this hardware perform?
  10. How will this filesystem perform?
  11. Why should you care about filesystem-specific performance?
  12. Expectations
  13. PERSONAL CONFESSION
  14. Where to start?
  15. The Defaults.
  16. (No text on this slide.)
  17. Not addressing reliability
  18. Very Narrow Use Case: A Relational Database
  19. Need for periodic testing. (And we've got some hardware!)
  20. ★ Kernel differences ★ FS patch-level differences ★ Mount options ★ mkfs options
  21. Focused on THROUGHPUT (because that's what people who buy large systems look for)
  22. Later: response time, operations per second
  23. No, we will not be testing ZFS.
  24. BtrFS (nope, not yet)
  25. What do we expect?
  26. Some conventional wisdom:
  27. “RAID5 is the worst choice for a database.”
  28. “LVM incurs too much overhead to use.”
  29. “Striping doubles performance.”
  30. “Turning off 'atime' is a big performance gain.”
  31. “Getting rid of atime updates would give us more everyday Linux performance than all the pagecache speedups of the last 10 years, _combined_.”
  32. “Journaling filesystems (ext3) will have worse performance than non-journaling filesystems (ext2).”
  33. “Your read-ahead buffer is big enough.”
  34. Now... on to the good stuff.
  35. (No text on this slide.)
  36. PostgreSQL's Portland Performance Pad. Hosted by CommandPrompt, Inc.
  37. Our machine: HP ProLiant DL380 G5; Smart Array P800; 72 GB 15,000 RPM SAS disks (up to 25 disks); 32 GB RAM; Linux 2.6.25-gentoo-r6 (*new tests being run with 2.6.28)
  38. Our machine: chosen because of its low, low price. Thank you, HP.
  39. Our tests: fio; 64 GB working set; 8 threads; no fadvise; no direct I/O; 8 KB block size; I/O elevator: deadline
  40. Our stats: sar, mpstat, iostat, vmstat, readprofile
  41. Our tests: chosen because of their relevance to PostgreSQL
  42. Filesystems tested: ext2, ext3, jfs, xfs, reiserfs, ext4 (but having trouble)
  43. Disk configs tested: single disk, RAID-0, RAID-1, RAID-5, RAID-10, RAID-6
  44. The data: http://moourl.com/fsperf
  45. Confessions: • Results may have a high standard deviation (don't know yet!) • No filesystem tuning: all default create and mount options • No software RAID or LVM (volume management) comparison for the 2.6.28 tests
  46. Confessions: • Some xfs runs had to be repeated, and some ext4 runs did not complete successfully • Only presenting throughput • Interested in system performance for a specific application, not code performance
  47. Confessions: • I/O profiles don't exhibit atime or partition alignment issues • Disk controller firmware was not at the latest version in the 2.6.25 tests • Software RAID is built on top of single-disk RAID 0 devices (the HP Smart Array has no JBOD option)
  48. AUDIENCE PARTICIPATION: Higher throughput: ext2 or ext3?
  49. (No text on this slide.)
  50. (No text on this slide.)
  51. (No text on this slide.)
  52. (No text on this slide.)
  53. Seek bundling/batching in ext3 is better?
  54. What if we add a disk?
  55. (No text on this slide.)
  56. (No text on this slide.)
  57. (No text on this slide.)
  58. (No text on this slide.)
  59. (No text on this slide.)
  60. AUDIENCE PARTICIPATION: RAID 0 (stripe) versus RAID 1 (mirroring) performance?
  61. (No text on this slide.)
  62. (No text on this slide.)
  63. (No text on this slide.)
  64. What happens when we add disks to a RAID 0 (stripe) LUN?
  65. (No text on this slide.)
  66. (No text on this slide.)
  67. (No text on this slide.)
  68. (No text on this slide.)
  69. Adding disks to a RAID 5 LUN
  70. (No text on this slide.)
  71. (No text on this slide.)
  72. (No text on this slide.)
  73. Only have 4 disks? What should you do?
  74. (No text on this slide.)
  75. (No text on this slide.)
  76. (No text on this slide.)
  77. (No text on this slide.)
  78. In most cases, RAID 5 outperforms on sequential writes (xlog). For random writes, it is an improvement only on xfs and reiserfs.
  79. Are software RAID and LVM slow?
  80. (No text on this slide.)
  81. (No text on this slide.)
  82. The read-ahead buffer
  83. AUDIENCE PARTICIPATION: Read-ahead buffer: the default is 128 KB. What do you think it should be?
  84. (No text on this slide.)
  85. And is there a cost to increasing the buffer that much?
  86. (No text on this slide.)
  87. http://moourl.com/readaheadconfirm
  88. Future work: • OLTP system characterization, sizing • Daily OLTP regression testing • More presentations • P5: PostgreSQL Portland Performance Pad PRACTICE
  89. MOAR hardware? Thanks again, HP! MSA70, DL380 in 2009??
  90. Let's recap...
  91. “RAID5 is the worst choice for a database.” Fast for sequential writes in our tests. “LVM incurs too much overhead to use. Software RAID is slower.” For reads, throughput is about the same, but we saw higher CPU usage. “Turning off 'atime' is a big performance gain.” Not in our tests, but 2-3% for “free”.
  92. “Journaling filesystems will have worse performance than non-journaling filesystems.” Turn data journaling off on ext3 and you do see better performance, but there are edge cases and performance differences we could not explain. “Striping doubles performance.” Performance is better, but nowhere near double. Why?
  93. “Your read-ahead buffer is big enough.” Your read-ahead buffer IS NOT big enough. Make it 8 MB. And can we make that the default?
  94. Thank you! Results: http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide and http://moourl.com/fsperf. Selena Deckelmann, selena@endpoint.com, twitter: @selenamarie
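Slide 39 lists the fio parameters but not the job files themselves. The following is a minimal Python sketch, under stated assumptions, that renders a fio job file resembling that setup. The option names (`directory`, `ioengine`, `direct`, `fadvise_hint`, `bs`, `thread`, `numjobs`, `size`, `rw`) are standard fio job options; the function name `fio_job`, the `/mnt/test` mount point, the `sync` engine, and splitting the 64 GB working set evenly over the 8 jobs (fio's `size` is per job) are assumptions, not the authors' configuration. The deadline elevator is not a fio option; it is chosen per device, outside fio, through `/sys/block/<dev>/queue/scheduler`.

```python
"""Hypothetical helper: render a fio job file resembling the slide 39 setup."""


def fio_job(rw: str, directory: str = "/mnt/test", total_size_gb: int = 64,
            jobs: int = 8, block_size: str = "8k") -> str:
    """Return fio job-file text for one workload (e.g. 'read', 'randwrite').

    fio's `size` applies to each job, so the total working set is divided
    evenly across `jobs` (an assumption about how the 64 GB was laid out).
    """
    if jobs < 1 or total_size_gb % jobs:
        raise ValueError("total size must divide evenly across jobs")
    per_job_gb = total_size_gb // jobs
    lines = [
        "[global]",
        f"directory={directory}",  # assumed mount point of the filesystem under test
        "ioengine=sync",           # assumed: plain read()/write() system calls
        "direct=0",                # no direct I/O (slide 39)
        "fadvise_hint=0",          # no fadvise (slide 39)
        f"bs={block_size}",        # 8 KB, PostgreSQL's default page size
        "thread",                  # run the jobs as threads, not processes
        f"numjobs={jobs}",
        f"size={per_job_gb}g",
        "",
        f"[{rw}]",
        f"rw={rw}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(fio_job("randwrite"))
```

Saving the output as, say, `randwrite.fio` and running `fio randwrite.fio` would start one workload; generating one file per `rw` mode (`read`, `write`, `randread`, `randwrite`) keeps the runs comparable across filesystems.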
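Slides 20, 30 and 91 point out that mount options, including the atime policy, differ between machines and are worth recording with every benchmark run. A small sketch of how one might capture them from `/proc/mounts`, whose whitespace-separated fields are device, mount point, filesystem type, options, dump, and pass. The helper names are hypothetical, and mount points containing spaces (escaped as `\040` in that file) are not decoded here. On older kernels such as the 2.6.25/2.6.28 ones tested, no atime flag means full atime updates; `relatime` became the default only in Linux 2.6.30.

```python
"""Hypothetical helpers: record mount options and atime policy per mount point."""
import os


def parse_mounts(text: str) -> dict:
    """Map mount point -> (fstype, set of options) from /proc/mounts-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        # Later lines win, matching the mount stacked on top of an earlier one.
        table[mountpoint] = (fstype, set(options.split(",")))
    return table


def atime_policy(options: set) -> str:
    """Report which atime flag, if any, a mount uses."""
    for flag in ("noatime", "relatime", "strictatime"):
        if flag in options:
            return flag
    return "atime"  # no flag listed: full atime updates on pre-2.6.30 kernels


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):  # Linux only
        with open("/proc/mounts") as f:
            for mp, (fstype, opts) in sorted(parse_mounts(f.read()).items()):
                print(mp, fstype, atime_policy(opts))
```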
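Slides 29 and 92 note that striping improved throughput but came nowhere near doubling it. One way to put a number on "how close to double" is scaling efficiency: measured array throughput divided by the single-disk throughput times the number of disks. This is a generic sketch, not a metric the slides define; the function name is hypothetical, and the numbers in the tests and in the note below are made up for illustration, not results from these benchmarks.

```python
"""Hypothetical helper: how close a striped array comes to linear scaling."""


def scaling_efficiency(single_disk_mb_s: float, array_mb_s: float,
                       disks: int) -> float:
    """Return measured throughput as a fraction of ideal linear scaling.

    1.0 means the array delivered exactly `disks` times the single-disk
    throughput; a two-disk stripe that "doubles performance" scores 1.0.
    """
    if disks < 1 or single_disk_mb_s <= 0:
        raise ValueError("need at least one disk and a positive baseline")
    ideal_mb_s = single_disk_mb_s * disks
    return array_mb_s / ideal_mb_s


if __name__ == "__main__":
    # Illustrative numbers only: 100 MB/s on one disk, 150 MB/s on two.
    print(f"{scaling_efficiency(100.0, 150.0, 2):.0%}")
```

With those illustrative inputs the script prints `75%`; plugging in measured single-disk and RAID-0 figures for each disk count shows where the curve flattens.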
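Slides 83 and 93 move the read-ahead buffer from the 128 KB default to 8 MB. The two usual knobs count in different units: `blockdev --setra` takes 512-byte sectors, while `/sys/block/<dev>/queue/read_ahead_kb` takes kilobytes. A minimal sketch of the conversion follows; the function name and the `sdb` device are hypothetical placeholders for the data volume, and settings made with `blockdev` do not persist across reboots.

```python
"""Hypothetical helper: express a read-ahead size in blockdev and sysfs units."""

SECTOR_BYTES = 512  # blockdev --setra / --getra count 512-byte sectors


def readahead_settings(size_kb: int, device: str = "sdb") -> dict:
    """Return the sector count, blockdev command and sysfs value for size_kb.

    `device` is a placeholder; substitute the block device holding the data.
    """
    if size_kb <= 0:
        raise ValueError("read-ahead size must be positive")
    sectors = size_kb * 1024 // SECTOR_BYTES
    return {
        "sectors": sectors,
        "blockdev_cmd": f"blockdev --setra {sectors} /dev/{device}",
        "sysfs_file": f"/sys/block/{device}/queue/read_ahead_kb",
        "sysfs_value": str(size_kb),
    }


if __name__ == "__main__":
    for label, kb in (("default", 128), ("slide 93", 8 * 1024)):
        s = readahead_settings(kb)
        print(f"{label}: {kb} KB = {s['sectors']} sectors -> {s['blockdev_cmd']}")
```

The 128 KB default works out to 256 sectors and 8 MB to 16384 sectors, so the slide 93 recommendation corresponds to `blockdev --setra 16384` on the data device.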