PostgreSQL on EXT4, XFS, BTRFS and ZFS
1. PostgreSQL on EXT3/4, XFS, BTRFS and ZFS
comparing modern (Linux) file systems
Tomas Vondra <tomas@2ndquadrant.com>
2. Linux file systems
● plenty of choices, with different
– goals, features, tuning options
– maturity level, reliability
● ext3/4, XFS
– traditional, design from the 90s
– improving over time, reasonably “modern”
● BTRFS, ZFS
– next-generation, new architecture / design
● other (not included in this talk)
– log-organized file systems, distributed, clustered, ...
4. EXT3, EXT4, XFS - history
● ext3 (2001) / ext4 (2008)
– evolution of original Linux filesystem (ext, ext2, ...)
– continuous improvements / fixes
● XFS (2002)
– originally from SGI Irix 5.3 (1994)
– 2000 released under GPL
– 2002 merged into Linux 2.5.36
● both are
– reliable journaling file systems
– proven by time on many deployments
5. EXT3, EXT4, XFS - features
● traditional design with journal
● not handling
– multiple devices
– volume management
– snapshots
– ...
● need additional layers for those things
– hardware RAID
– software RAID (dm)
– LVM / LVM2
6. EXT3, EXT4, XFS - evolution
● conceived in times of rotational storage
– mostly work with SSDs
– stop-gap for future storage (NVRAM, ...)
● evolution, not a revolution (mostly)
– fixing bugs (some real, some imaginary)
– adding features (e.g. TRIM, barriers, ...)
– scalability improvements (metadata, ...)
– be careful when reading old articles / benchmarks
– be wary of anecdotal evidence (without context)
– synthetic benchmarks are misleading
7. EXT3, EXT4, XFS - sources
● Linux Filesystems: Where did they come from?
(Dave Chinner @ linux.conf.au 2014)
https://www.youtube.com/watch?v=SMcVdZk7wV8
● Ted Ts'o on the ext4 Filesystem
(Ted Ts'o, NYLUG, 2013)
https://www.youtube.com/watch?v=2mYDFr5T4tY
● XFS: There and Back … and There Again?
(Dave Chinner @ Vault 2015)
https://lwn.net/Articles/638546/
● XFS: Recent and Future Adventures in Filesystem Scalability
(Dave Chinner, linux.conf.au 2012)
https://www.youtube.com/watch?v=FegjLbCnoBw
● XFS: the filesystem of the future?
(Jonathan Corbet, Dave Chinner, LWN, 2012)
http://lwn.net/Articles/476263/
9. BTRFS, ZFS - goals
● ideas
– integrate the layers
– design for commodity hardware (expect failures)
– design for huge data volumes
● so that we get …
– flexible management
– built-in snapshotting
– compression, deduplication
– checksums
– ...
10. BTRFS, ZFS - history
● BTRFS
– merged in 2009, but considered “experimental”
– on-disk format “stable” (1.0)
– some claim it’s “stable” but I doubt that …
– (What are the criteria for filesystem to be “stable”?)
● ZFS
– originally from Solaris, but got Oracled :-(
– development is a bit fragmented today
– available on other systems (e.g. FreeBSD)
– “ZFS on Linux” project (CDDL vs. GPL)
12. Generic tuning options
● TRIM (discard)
– enable / disable TRIM on SSDs
– impacts garbage collection / wear leveling
● write barriers
– prevent disk from optimizing order of writes
– may still lose data, but no filesystem corruption
– write cache + battery => disable barriers
● SSD alignment
– alignment on SSDs matters (pages, blocks, …)
– no dedicated tuning options (can use stripe unit / width)
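A quick way to see which of these options are active on a given machine is to look at the mount options. The sketch below is only an illustration: it parses text in the /proc/mounts format and reports whether discard is enabled and whether barriers are disabled. The function names and the sample data are made up for this example, and the option spellings (discard, nobarrier, barrier=0) are the ones common around the time of this talk; newer kernels may no longer accept all of them.

```python
def parse_mounts(text):
    """Return {mount_point: (fs_type, set_of_options)} from /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        mounts[mount_point] = (fs_type, set(options.split(",")))
    return mounts


def tuning_report(text, fs_types=("ext3", "ext4", "xfs")):
    """Report TRIM and barrier settings for the selected file system types."""
    report = {}
    for mount_point, (fs_type, options) in parse_mounts(text).items():
        if fs_type not in fs_types:
            continue
        report[mount_point] = {
            "fs": fs_type,
            "discard": "discard" in options,
            # assumption: barriers are disabled via one of these two spellings
            "barriers_disabled": bool(options & {"nobarrier", "barrier=0"}),
        }
    return report


# Hypothetical sample; on a real system read the text from /proc/mounts instead.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,discard 0 0
/dev/sdb1 /var/lib/postgresql xfs rw,noatime,nobarrier 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

if __name__ == "__main__":
    for mount_point, flags in tuning_report(SAMPLE).items():
        print(mount_point, flags)
```

To check a live system, pass the contents of /proc/mounts to tuning_report instead of SAMPLE.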
13. BTRFS, ZFS tuning options
● nodatacow (BTRFS)
– disable copy on write
– still can do snapshots (will do necessary COW)
– disables checksums (needs full COW)
● zfs_arc_max (ZFS)
– limit the size of ARC cache
– should be released automatically, but ...
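As a rough illustration of the ARC limit, the sketch below computes a byte value for zfs_arc_max and formats it as a kernel module option line, which is how "ZFS on Linux" usually takes the setting. The 16 GiB machine and the 25% share are arbitrary assumptions for the example, not recommendations from the talk, and the helper names are hypothetical.

```python
GIB = 1024 ** 3


def arc_max_bytes(total_ram_bytes, fraction):
    """Return the ARC limit in bytes as a fraction of total RAM."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_bytes * fraction)


def modprobe_line(limit_bytes):
    """Format the limit as a module option line (e.g. for /etc/modprobe.d/zfs.conf)."""
    return f"options zfs zfs_arc_max={limit_bytes}"


if __name__ == "__main__":
    # assumption: 16 GiB of RAM, a quarter of it left to the ARC
    limit = arc_max_bytes(16 * GIB, 0.25)
    print(modprobe_line(limit))
```

The remaining memory is then available for PostgreSQL shared buffers and the rest of the system.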
14. ZFS tuning options
● recordsize=8kB
– match the fs page with PostgreSQL page
● ashift=13 (8kB)
– align the writes to SSD pages
● primarycache=metadata
– prevent double buffering (shared buffers)
http://open-zfs.org/wiki/Performance_tuning
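The ashift value is the base-2 logarithm of the block size, so 8kB (8192 bytes) gives ashift=13. The sketch below derives that number and prints the commands that would apply the three settings above. The pool name "tank", the dataset name "pgdata" and the device path are hypothetical placeholders, and the script only builds the command strings without running them.

```python
def ashift_for(block_size_bytes):
    """Return log2 of the block size; ZFS expects a power of two."""
    if block_size_bytes <= 0 or block_size_bytes & (block_size_bytes - 1):
        raise ValueError("block size must be a power of two")
    return block_size_bytes.bit_length() - 1


def zfs_commands(pool, dataset, devices, page_size_bytes=8192):
    """Build (but do not run) commands matching ZFS to the PostgreSQL page size."""
    ashift = ashift_for(page_size_bytes)
    record_kib = page_size_bytes // 1024
    return [
        # ashift is fixed when the pool (vdev) is created
        f"zpool create -o ashift={ashift} {pool} {' '.join(devices)}",
        # recordsize and primarycache are per-dataset properties
        f"zfs create -o recordsize={record_kib}K "
        f"-o primarycache=metadata {pool}/{dataset}",
    ]


if __name__ == "__main__":
    # assumption: placeholder pool, dataset and device names
    for command in zfs_commands("tank", "pgdata", ["/dev/sdb"]):
        print(command)
```

Because ashift cannot be changed after the pool is created, it is worth checking the SSD page size before running the first command.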
19. pgbench (TPC-B)
● transactional benchmark
– small queries (access by PK, ...)
● modes
– read-only
– read-write
● scales
– small (~200MB)
– medium (~50% RAM)
– large (~200% RAM)
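The three data set sizes translate into pgbench scale factors roughly as sketched below. The figure of about 15 MB of data per scale unit is an approximation assumed for this example (the real size depends on the PostgreSQL version and fill factor), and the 16 GB of RAM is an arbitrary example machine, not the hardware used in the talk.

```python
import math

# assumption: one pgbench scale unit is roughly 15 MB on disk
MB_PER_SCALE_UNIT = 15


def scale_for(target_mb):
    """Return the smallest scale factor giving at least target_mb of data."""
    return max(1, math.ceil(target_mb / MB_PER_SCALE_UNIT))


def benchmark_scales(ram_mb):
    """Scale factors for the small / medium / large data sets."""
    return {
        "small": scale_for(200),            # ~200MB
        "medium": scale_for(0.5 * ram_mb),  # ~50% of RAM
        "large": scale_for(2.0 * ram_mb),   # ~200% of RAM
    }


if __name__ == "__main__":
    # assumption: a machine with 16 GB of RAM
    for name, scale in benchmark_scales(16 * 1024).items():
        print(f"{name}: pgbench -i -s {scale}")
```

The small set fits comfortably in memory, while the large one forces reads from storage, which is where the file system differences show up.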
20. TPC-DS
● warehouse, analytical
– large amounts of data
– queries processing a lot of data
● complex queries
– aggregations
– joins
– CTEs
– …
● successor to TPC-H
– more elaborate / realistic
39. EXT / XFS conclusions
EXT4
● good “default” choice
● disable barriers (with protected write cache)
● tune alignment to match the SSD
● very “smooth” results
XFS
● does not outperform ext4 (in this test)
● not much worse, if properly tuned
● disable write barriers, tune alignment to SSD
● more anomalies than ext4 (sudden performance drops, ...)