The Linux operating system provides a number of file systems, as well as volume management and hardware or software RAID. We are running performance benchmarks for database tuning and are curious whether the file systems really behave as we expect them to, especially when used in conjunction with RAID or volume management. Are these file systems being used in the manner for which they were designed? There is also more to file systems than how fast we can read from them or write to them. How reliable is the file system, and how do we prove it? We have collected data and will have a server available for development during the conference.
What Assumptions Make: Filesystem I/O from a database perspective
1. What Assumptions Make:
Performance Testing with P4
Portland PostgreSQL Performance Pad
Selena Deckelmann
selena@endpoint.com
End Point Corporation
twitter: @selenamarie
18. Very Narrow Use Case:
A Relational Database
SCALE 7x, Feb 21-22, 2009 | www.endpoint.com
19. Need for periodic testing.
(And we've got some
hardware!)
20. ★Kernel differences
★FS patch-level differences
★Mount options
★mkfs options
21. Focused on
THROUGHPUT
(Because that’s what people who
buy large systems look for)
22. Later:
Response Time
Operations per second
23. No, we will not
be testing ZFS.
24. FS
BtrFS
(nope, not yet)
25. What do we expect?
26. Some conventional
wisdom:
27. “RAID5 is the
worst choice
for a database.”
28. “LVM incurs
too much overhead
to use.”
29. “Striping doubles
performance.”
30. “Turning off 'atime'
is a big
performance gain.”
31. “Getting rid of atime
updates would give us
more everyday Linux
performance than all
the pagecache speedups
of the last 10 years,
_combined_.”
32. “Journaling filesystems
(ext3) will have worse
performance than non-
journaling filesystems
(ext2).”
33. “Your read-ahead
buffer
is big enough.”
34. Now... on to the good stuff.
37. Our machine:
HP ProLiant DL380G5
Smart Array p800
72GB 15,000 RPM SAS (up to 25 disks)
32GB RAM
Linux:
2.6.25-gentoo-r6
*New tests being run with 2.6.28
38. Our machine:
Chosen because
of its low, low price.
Thank you, HP.
39. Our tests:
fio
64 GB working set
8 threads
no fadvise
no direct i/o
8KB blocksize
I/O elevator: deadline
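The parameters above map fairly directly onto a fio job file. The sketch below renders one from Python. It is an approximation, not the job file used for these tests: the 8 GB per-thread size (8 threads x 8 GB = 64 GB), the `sync` I/O engine, the `randread` workload and the `/mnt/test` directory are all assumptions made for illustration. The I/O elevator is not a fio setting; on 2.6 kernels it is selected per device through `/sys/block/<dev>/queue/scheduler`.

```python
def fio_job(directory, rw="randread", total_gb=64, threads=8, block_kb=8):
    """Render a fio job file approximating the slide's test parameters.

    Assumptions (not from the slides): the working set is split evenly
    across threads, and the synchronous I/O engine is used.
    """
    per_job_gb = total_gb // threads  # 64 GB over 8 threads -> 8 GB each
    lines = [
        "[global]",
        f"directory={directory}",   # hypothetical mount point under test
        "ioengine=sync",            # assumed engine
        f"bs={block_kb}k",          # 8KB blocks, matching PostgreSQL's page size
        "direct=0",                 # no direct I/O: go through the page cache
        "fadvise_hint=0",           # no fadvise hints to the kernel
        f"numjobs={threads}",       # one fio job per thread
        f"size={per_job_gb}g",      # per-job file size
        "group_reporting",          # aggregate results across the jobs
        "",
        f"[{rw}]",
        f"rw={rw}",                 # e.g. read, write, randread, randwrite
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(fio_job("/mnt/test"))
```

Saving the output to a file such as `randread.fio` and running `fio randread.fio` would execute the job; swapping `rw` for `write` gives a rough stand-in for sequential xlog-style writes.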
40. Our stats:
sar
mpstat
iostat
vmstat
readprofile
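One way to capture these statistics alongside a benchmark run is to start each collector in the background and stop it when the run ends. This is a minimal sketch under assumptions: the 5-second interval, the output file names and the helper names are illustrative, not taken from the test harness. `readprofile` is left out because it reads kernel profiling data that is only available when the kernel is booted with profiling enabled.

```python
import subprocess


def collector_commands(interval=5):
    """Command lines for the sampling tools; the interval is an assumption."""
    i = str(interval)
    return {
        "sar": ["sar", "-u", i],             # CPU utilization
        "mpstat": ["mpstat", "-P", "ALL", i],  # per-CPU statistics
        "iostat": ["iostat", "-x", i],       # extended per-device I/O statistics
        "vmstat": ["vmstat", i],             # memory, swap and run-queue summary
    }


def start_collectors(prefix, interval=5):
    """Start every collector, logging to <prefix>.<tool>.log."""
    procs = []
    for name, cmd in collector_commands(interval).items():
        log = open(f"{prefix}.{name}.log", "w")
        procs.append((subprocess.Popen(cmd, stdout=log), log))
    return procs


def stop_collectors(procs):
    """Terminate the collectors and close their log files."""
    for proc, log in procs:
        proc.terminate()
        proc.wait()
        log.close()
```

A run would call `start_collectors("ext3-raid5")` before launching fio and `stop_collectors(...)` afterwards, leaving one log per tool for that configuration.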
41. Our tests:
Chosen because of their
relevance to PostgreSQL
42. Filesystems Tested:
ext2
ext3
jfs
xfs
reiserfs
ext4 (but having trouble)
43. Disk configs tested:
Single disk
RAID-0
RAID-1
RAID-5
RAID-10
RAID-6
44. The Data:
http://moourl.com/fsperf
45. Confessions:
• May be high standard deviation with
results (don’t know yet!)
•No filesystem tuning, all default create
and mount options
•No software raid comparison or lvm
(volume management test) for 2.6.28
tests
46. Confessions:
• Some xfs runs had to be repeated and
some ext4 runs did not complete
successfully
• Only presenting throughput
• Interested in system performance for a
specific application, not code
performance
47. Confessions:
•I/O profiles don’t exhibit atime or
partition alignment issues
•Disk controller firmware not at the
latest version in 2.6.25 tests
•Software RAID is on top of 1 disk RAID 0
devices (HP SmartArray doesn’t have
JBOD option)
48. AUDIENCE PARTICIPATION
Higher throughput:
ext2 or ext3?
78. In most cases, RAID 5 outperforms
on sequential writes (xlog).
Random writes are only an improvement
on xfs and reiserfs.
79. Are software RAID
and LVM slow?
88. Future Work
•OLTP system characterization,
sizing
•Daily OLTP regression testing
•More presentations
•P5 - PostgreSQL Portland
Performance Pad PRACTICE
89. MOAR Hardware?
Thanks again, HP!
MSA70, DL380 in 2009 ??
91. “RAID5 is the worst choice for a
database.” Fast for sequential writes in
our tests.
“LVM incurs too much overhead to use.
Software RAID is slower.” For reads –
throughput is about the same, but saw
higher CPU.
“Turning off 'atime' is a big performance
gain.” Not in our tests. But, 2-3% for
“free”.
92. “Journaling filesystems will have worse
performance than non-journaling
filesystems.” Turn the data journaling
off on ext3, and you do see better
performance, but there are edge cases
and performance differences we could
not explain.
“Striping doubles performance.”
Performance is better, but nowhere
near double. Why?
93. “Your read-ahead buffer is big enough.”
Your read-ahead buffer IS NOT big
enough. Make it 8MB. And can we make
that the default?
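For reference, `blockdev --setra` takes its value in 512-byte sectors, while `/sys/block/<dev>/queue/read_ahead_kb` takes kilobytes, so "8MB" has to be converted before it is applied. The sketch below does that arithmetic and builds the command. The helper names and the `/dev/sda` device are hypothetical, and nothing here changes a device setting on its own.

```python
SECTOR_BYTES = 512  # blockdev --setra counts 512-byte sectors


def readahead_settings(megabytes=8):
    """Convert a read-ahead size in MB to the units each interface expects."""
    total_bytes = megabytes * 1024 * 1024
    return {
        "sectors": total_bytes // SECTOR_BYTES,  # for blockdev --setra
        "kb": total_bytes // 1024,               # for queue/read_ahead_kb
    }


def blockdev_command(device, megabytes=8):
    """Build (but do not run) the blockdev command for the given device."""
    sectors = readahead_settings(megabytes)["sectors"]
    return ["blockdev", "--setra", str(sectors), device]


if __name__ == "__main__":
    print(readahead_settings())          # {'sectors': 16384, 'kb': 8192}
    print(" ".join(blockdev_command("/dev/sda")))
```

The setting does not persist across reboots, so it is usually reapplied from a boot script; `blockdev --getra <device>` reports the current value in the same sector units.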
94. Thank you!
Results:
http://wiki.postgresql.org/wiki/
HP_ProLiant_DL380_G5_Tuning_Guide
http://moourl.com/fsperf
Selena Deckelmann
selena@endpoint.com
twitter: @selenamarie