
PostgreSQL on EXT4, XFS, BTRFS and ZFS

A comparison of how PostgreSQL performs on current Linux file systems - ext4, XFS, BTRFS and ZFS, with pgbench and (a subset of) TPC-DS.

  • Why do people keep mounting with discard? I'm surprised they even got this good of performance. Just run trim in cron weekly at most.
  • Tuning on ZFS can be: A) Add L2ARC and ZIL; B) Enable compression
  • Thanks for sharing! CFQ or Deadline?
  • @Jeff R. Allen I don't think the talk was recorded. I don't think ZFS can't handle large databases - if you actually need those advanced features (snapshotting, ...) you can implement some of them even with the traditional file systems on top of LVM, for example. But it will definitely impact performance, so the "ZFS is slower" may not be true anymore as CoW filesystems have snapshotting "baked in".
  • This is really cool work. Thanks for posting it. I would love to watch a replay of your talk; is it available? It would also be helpful to get feedback on this question: "So ZFS is slower, but are the benefits for easy point in time recovery, pool expansion, etc so compelling that there are good use cases for Postgres on ZFS anyway?" It looks like one of them is for small databases, where FS performance is not a determining factor.
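One commenter above suggests running TRIM from cron instead of mounting with discard. A minimal sketch of that approach, assuming the filesystems in question support discard; the cron.weekly location is just one way to schedule it, and recent util-linux releases also ship an fstrim.timer systemd unit for the same purpose:

    #!/bin/sh
    # e.g. saved as /etc/cron.weekly/fstrim
    # Discard unused blocks on all mounted filesystems that support TRIM,
    # instead of paying for online discard on every unlink.
    fstrim -a -v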

PostgreSQL on EXT4, XFS, BTRFS and ZFS

  1. 1. PostgreSQL on EXT3/4, XFS, BTRFS and ZFS comparing modern (Linux) file systems Tomas Vondra <tomas@2ndquadrant.com>
  2. 2. Linux file systems ● plenty of choices, with different – goals, features, tuning options – maturity level, reliability ● ext3/4, XFS – traditional, design from the 90s – improving over time, reasonably “modern” ● BTRFS, ZFS – next-generation, new architecture / design ● other (not included in this talk) – log-organized file systems, distributed, clustered, ...
  3. 3. EXT3, EXT4, XFS
  4. 4. EXT3, EXT4, XFS - history ● ext3 (2001) / ext4 (2008) – evolution of original Linux filesystem (ext, ext2, ...) – continuous improvements / fixes ● XFS (2002) – originally from SGI Irix 5.3 (1994) – 2000 released under GPL – 2002 merged into 2.5.36 ● both are – reliable journaling file systems – proven by time on many deployments
  5. 5. EXT3, EXT4, XFS - features ● traditional design with journal ● not handling – multiple devices – volume management – snapshots – ... ● need additional layers for those things – hardware RAID – software RAID (dm) – LVM / LVM2
  6. 6. EXT3, EXT4, XFS - evolution ● conceived in times of rotational storage – mostly work with SSD – stop-gap for future storage (NVRAM, ...) ● evolution, not a revolution (mostly) – fixing bugs (some real, some imaginary) – adding features (e.g. TRIM, barriers, ...) – scalability improvements (metadata, ...) – be careful when reading old articles / benchmarks – be wary of anecdotal evidence (without context) – synthetic benchmarks are misleading
  7. 7. EXT3, EXT4, XFS - sources ● Linux Filesystems: Where did they come from? (Dave Chinner @ linux.conf.au 2014) https://www.youtube.com/watch?v=SMcVdZk7wV8 ● Ted Ts'o on the ext4 Filesystem (Ted Ts'o, NYLUG, 2013) https://www.youtube.com/watch?v=2mYDFr5T4tY ● XFS: There and Back … and There Again? (Dave Chinner @ Vault 2015) https://lwn.net/Articles/638546/ ● XFS: Recent and Future Adventures in Filesystem Scalability (Dave Chinner, linux.conf.au 2012) https://www.youtube.com/watch?v=FegjLbCnoBw ● XFS: the filesystem of the future? (Jonathan Corbet, Dave Chinner, LWN, 2012) http://lwn.net/Articles/476263/
  8. 8. BTRFS, ZFS
  9. 9. BTRFS, ZFS - goals ● ideas – integrate the layers – design for commodity hardware (expect failures) – design for huge data volumes ● so that we get … – flexible management – built-in snapshotting – compression, deduplication – checksums – ...
  10. 10. BTRFS, ZFS - history ● BTRFS – merged in 2009, but considered “experimental” – on-disk format “stable” (1.0) – some claim it’s “stable” but I doubt that … – (What are the criteria for filesystem to be “stable”?) ● ZFS – originally from Solaris, but got Oracled :-( – today a bit fragmented development – available on BSD systems (FreeBSD) – “ZFS on Linux” project (CDDL vs. GPL)
  11. 11. Tuning options
  12. 12. Generic tuning options ● TRIM (discard) – enable / disable TRIM on SSDs – impacts garbage collection / wear leveling ● write barriers – prevent disk from optimizing order of writes – still may lose data, but no filesystem corruption – write cache + battery => disable barriers ● SSD alignment – alignment on SSDs matters (pages, blocks, …) – no dedicated tuning options (can use stripe unit / width) (example mount / fstrim commands after the slides)
  13. 13. BTRFS, ZFS tuning options ● nodatacow (BTRFS) – disable copy on write – still can do snapshots (will do necessary COW) – disables checksums (needs full COW) ● zfs_arc_max (ZFS) – limit the size of ARC cache – should be released automatically, but ... (see the nodatacow / ARC sketch after the slides)
  14. 14. ZFS tuning options ● recordsize=8kB – match the fs page with PostgreSQL page ● ashift=13 (8kB) – align the writes to SSD pages ● primarycache=metadata – prevent double buffering (shared buffers) http://open-zfs.org/wiki/Performance_tuning (full zfs set example after the slides)
  15. 15. file systems
  16. 16. ● ext3 (default) – default ● ext4 – default – discard, nobarrier, stripe-width ● xfs – default – LVM – LVM + snapshot – discard, nobarrier – discard, nobarrier, agcount, sunit/swidth
  17. 17. ● btrfs – default – nodatacow – nodiscard (+fstrim) ● zfs – default – recordsize=8k, ashift=13, primarycache=metadata (open-zfs) – recordsize=8k, ashift=13, zfs_arc_max=5GB (custom)
  18. 18. benchmarks
  19. 19. pgbench (TPC-B) ● transactional benchmark – small queries (access by PK, ...) ● modes – read-only – read-write ● scales – small (~200MB) – medium (~50% RAM) – large (~200% RAM) (example pgbench invocations after the slides)
  20. 20. TPC-DS ● warehouse, analytical – large amounts of data – queries processing a lot of data ● complex queries – aggregations – joins – CTEs – … ● successor to TPC-H – more elaborate / realistic
  21. 21. System ● PostgreSQL 9.4.1 ● Gentoo with kernel 3.17 ● CPU: Intel i5-2500k – 4 cores @ 3.3 GHz (3.7GHz) – 6MB cache – 2011-2013 ● 8GB RAM (DDR3 1333) ● SSD Intel S3500 100GB (SATA)
  22. 22. pgbench read-only
  23. 23. [bar chart] pgbench / small (150MB) / read-only: transactions per second for each tested configuration (btrfs, btrfs-nodatacow, btrfs-nodiscard-fstrim, ext3, ext4, ext4-discard-nobarrier-stripe, xfs, xfs-discard-lvm-snapshot, xfs-discard-nobarrier, xfs-lvm, xfs-tuned-agcount-su-sw, zfs, zfs-tuned, zfs-tuned-2)
  24. 24. [bar chart] pgbench / medium (50% RAM) / read-only: transactions per second for the same configurations
  25. 25. [bar chart] pgbench / large (200% RAM) / read-only: transactions per second for the same configurations plus ext4-discard-lvm-snapshot
  26. 26. pgbench read-write
  27. 27. [bar chart] pgbench / small (150MB) / read-write: transactions per second for each tested configuration
  28. 28. [bar chart] pgbench / medium (50% RAM) / read-write: transactions per second for the same configurations
  29. 29. [bar chart] pgbench / large (200% RAM) / read-write: transactions per second for the same configurations plus ext4-discard-lvm-snapshot
  30. 30. performance variability
  31. 31. EXT / XFS conclusions EXT4 ● good “default” choice ● disable barriers (with protected write cache) ● tune alignment to match the SSD ● very “smooth” results XFS ● does not outperform ext4 (in this test) ● not much worse, if properly tuned ● disable write barriers, tune alignment to SSD ● more anomalies than ext4 (sudden performance drops, ...)
  32. 32. BTRFS & ZFS
  33. 33. TPC-DS
  34. 34. mkfs / mount options ● ext4, xfs – mkfs.ext4 -E stripe-width=256 /dev/sda1 – mkfs.xfs -d su=512k,sw=1 -l su=512k -f /dev/sda1 – mount: defaults,noatime,discard,nobarrier ● btrfs – mkfs.btrfs -l 8192 -L pgdata /dev/sda1 – mount: defaults,noatime,ssd,discard,nobarrier [compress=lzo] ● zfs – zpool create pgpool /dev/sda1 – zfs create pgpool/pgdata – zfs set recordsize=8k pgpool/pgdata – zfs set atime=off pgpool/pgdata
  35. 35. [bar chart] TPC-DS load duration on EXT4, XFS, BTRFS and ZFS: data load and index build times in seconds (ext4, xfs, btrfs, btrfs lzo, zfs, zfs lz4)
  36. 36. [bar chart] TPC-DS query performance on EXT4, XFS, BTRFS and ZFS: total query duration in seconds for the same configurations
  37. 37. [bar chart] TPC-DS space used on EXT4, XFS, BTRFS and ZFS: database size in GB for the same configurations
  38. 38. TPC-DS summary ● EXT4, XFS, BTRFS – about the same performance ● compression is nice – uncompressed: 60GB – compressed: ~30GB – mostly storage capacity, queries not faster ● ZFS much slower :-( (compression settings example after the slides)
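A few command-line sketches for the tuning discussed in the slides follow; device names, mount points and sizes are illustrative, not the exact setup used in the benchmark.

Slide 12 in practice: the generic knobs are mostly mount options. A minimal sketch, assuming /dev/sda1 holds the PostgreSQL data directory and the write cache is battery/flash protected (nobarrier applies to kernels of that era; newer kernels deprecate or remove the option):

    # ext4/xfs on a protected write cache: skip barriers, let the SSD see TRIM
    mount -o noatime,discard,nobarrier /dev/sda1 /var/lib/postgresql
    # alternative to online discard: mount without it and TRIM periodically
    mount -o noatime,nobarrier /dev/sda1 /var/lib/postgresql
    fstrim -v /var/lib/postgresql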
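For slide 13, a sketch of the two BTRFS/ZFS-specific knobs; the 5GB ARC cap matches the "custom" configuration on slide 17, while the mount point and directory are assumptions:

    # BTRFS: disable copy-on-write, either globally via a mount option
    # or per directory (applies only to files created afterwards)
    mount -o noatime,ssd,nodatacow /dev/sda1 /var/lib/postgresql
    chattr +C /var/lib/postgresql/data

    # ZFS on Linux: cap the ARC at 5GB through the module parameter
    echo "options zfs zfs_arc_max=5368709120" > /etc/modprobe.d/zfs.conf
    # or adjust it at runtime
    echo 5368709120 > /sys/module/zfs/parameters/zfs_arc_max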
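The ZFS settings from slide 14, applied to the pool layout shown on slide 34 (ashift cannot be changed later, so it has to be given at pool creation time):

    # 8kB alignment (2^13 bytes) is fixed when the pool is created
    zpool create -o ashift=13 pgpool /dev/sda1
    zfs create pgpool/pgdata
    # match ZFS records to PostgreSQL's 8kB pages
    zfs set recordsize=8k pgpool/pgdata
    # keep only metadata in ARC; shared_buffers already caches table data
    zfs set primarycache=metadata pgpool/pgdata
    zfs set atime=off pgpool/pgdata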
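The pgbench runs on slide 19 translate to invocations along these lines. One pgbench scale unit is roughly 15MB, so the scale factors below approximate the three data set sizes on an 8GB machine; the client/thread counts and run length are illustrative, not the values from the talk:

    pgbench -i -s 10 bench       # small: roughly 150MB
    pgbench -i -s 300 bench      # medium: roughly 50% of 8GB RAM
    pgbench -i -s 1000 bench     # large: roughly 200% of 8GB RAM

    pgbench -S -c 16 -j 4 -T 300 bench   # read-only (SELECT-only) mode
    pgbench -c 16 -j 4 -T 300 bench      # read-write (TPC-B-like) mode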
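The compressed TPC-DS variants (btrfs lzo, zfs lz4) correspond to settings like these, reusing the pool and mount layout from slide 34:

    # BTRFS: transparent LZO compression as a mount option
    mount -o noatime,ssd,compress=lzo /dev/sda1 /var/lib/postgresql
    # ZFS: LZ4 compression on the dataset holding the data directory
    zfs set compression=lz4 pgpool/pgdata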
