The document discusses myths and truths about using NFS to store PostgreSQL databases. It begins by describing some common beliefs that NFS is fast and that databases should not be built on NFS due to reliability issues. It then provides details on NFS protocols, configuration options, and challenges for synchronization and reliability when used with PostgreSQL. The document presents results of benchmarks measuring file system performance, synchronous write times, and database commit rates under different NFS mount options to evaluate reliability versus performance. It aims to determine safe NFS configuration options for PostgreSQL that do not severely impact performance while sufficiently guaranteeing data integrity in the event of crashes.
1. PostgreSQL & NFS: myths & truths Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it
PGConf EU 2016, 8th edition, Tallinn, Nov. 2nd, 2016
PostgreSQL & NFS: Myths & Truths
www.2ndquadrant.com
3. Myths & Truths
• What people say about NFS:
IT’S FAST!
7. Myths & Truths
• What people say about DBs based on NFS:
Don't do it!
Don't do it!
Don't do it!
8. ~$ whoami
Giuseppe Broccolo, PhD
PostgreSQL & PostGIS consultant
@giubro gbroccolo7
giuseppe.broccolo@2ndquadrant.it
gbroccolo gemini__81
9. A personal experience with NFS...
“The School of Athens” - Raffaello Sanzio (1509-1511), Vatican Museums
10. Part 1
Network File System: general characteristics
11. Network File System (Sun Microsystems®, 1984)
• A protocol for distributed file systems: v2, v3, v4
• servers export the drive, clients mount the export locally
• Many clients, one server
• High performance through a fast network (LAN)
12. NFS v3 vs. NFS v2
Strengths:
– NFS v3 (default since RHEL 5) adds some excellent features to NFS v2:
• Asynchronous communication through write and read caching
• TCP connections, but UDP can still be used if explicitly requested through mount options
• client connections to the NFS server are managed by a daemon which performs host authentication
• a daemon on the server side manages file locking
13. NFS v3 vs. NFS v2
Weaknesses:
– NFS v3 still has some weaknesses:
• the server is stateless
• authentication is weak, only host authentication is performed
• data is transmitted in clear text
• NFS depends on the portmapper service for dynamic port assignment
14. NFS v4 vs. NFS v3
Partial solutions:
– NFS v4 (default since RHEL 6) adds:
• no client/server dependency on the portmapper service: a single port (2049) is used
– easy to apply firewall rules to filter NFS traffic
• authentication and locking daemons are included in the NFS protocol
• connections are always stateful
• UDP is no longer allowed for connections
15. Transfer optimization
Block size:
– NFS requests are organised in several data blocks exchanged between client & server
– set the block size to optimise the transfer speed of NFS requests from the client side:
• wsize/rsize: size of the chunks of transferred data
• NFS v2 theoretical limit: 8kB
• NFS v3 & v4 theoretical limit: depends on the kernel version (from 32kB up to 64kB)
• Pay attention: a block size that is too large produces odd effects, especially during reads
– e.g. ls returns an incomplete list of the content of a file system
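As a rough illustration of why rsize/wsize matter, the sketch below counts how many NFS READ/WRITE requests are needed to move a given amount of data for a few block sizes. It is only arithmetic under a simplifying assumption (each request carries at most one full block); the function name `rpc_count` is hypothetical and not part of any NFS tool.

```python
import math


def rpc_count(transfer_bytes: int, block_size: int) -> int:
    """Requests needed to move transfer_bytes when each NFS request
    carries at most block_size bytes (the rsize or wsize value)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return math.ceil(transfer_bytes / block_size)


ONE_MIB = 1024 * 1024
for size in (8192, 32768, 65536):
    # Larger blocks mean fewer round trips for the same amount of data.
    print(f"block size {size}: {rpc_count(ONE_MIB, size)} requests per MiB")
```

Fewer requests per MiB generally means fewer round trips, which is the motivation for raising the block size toward the limit the kernel allows.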
16. Network settings
Packet size & MTU:
– packet size can be optimised to transfer blocks within the network → jumbo frames
– routers could decrease the size of transferred packets
• use tracepath to determine if the path’s MTU is different from the network card’s MTU
– packets that are too large could get dropped!
• NFS requests have to be retransmitted → drop in performance
• use netstat & nfsstat to determine the number of dropped packets
17. Choice of the protocol
NFS relies on the TCP or UDP protocols:
– rsize/wsize larger than the network’s MTU causes IP packet fragmentation over UDP
• IP packet fragmentation and reassembly require CPU resources
• consider using jumbo frames (high MTU values) in case of UDP on ~O(Gb/s) ethernet
• TCP automatically determines the proper IP packet size
– UDP is a stateless protocol / TCP is a stateful protocol
• UDP performs better than TCP in ideal networks
• TCP requires that just the dropped packets be retransmitted, and not the entire NFS request as for UDP
– TCP performs better than UDP in lossy networks
• in case of TCP, the exported volumes have to be remounted if the NFS server crashes
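To make the fragmentation point concrete, here is a minimal sketch that estimates how many IPv4 fragments a single UDP datagram carrying one NFS block would be split into. It assumes a 20-byte IPv4 header without options and an 8-byte UDP header, and it ignores the RPC/NFS headers, so real counts can be slightly higher; `udp_fragments` is a hypothetical helper name.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes


def udp_fragments(payload_bytes: int, mtu: int) -> int:
    """Estimated number of IPv4 fragments for one UDP datagram."""
    # Every fragment but the last must carry a multiple of 8 bytes.
    per_fragment = ((mtu - IPV4_HEADER) // 8) * 8
    return math.ceil((payload_bytes + UDP_HEADER) / per_fragment)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: 32kB block -> {udp_fragments(32768, mtu)} fragments")
```

With a standard 1500-byte MTU a 32kB block is split into many fragments, and losing any one of them forces the whole request to be resent; jumbo frames cut that number considerably.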
18. Main concerns about synchronisation
Time synchronisation
– NFS does not synchronise time between client and server
– NFS v3 allows clients to specify the time when updating a file
• this doesn’t help in cases where concurrent clients are used
– Use NTP!
19. Main concerns about synchronisation
File locking
– NFS v2 does not natively support file locking
• it’s prone to corruption in case of concurrent accesses
– locking daemon in NFS v3/v4, similar to Unix file locking
• performance drop
– the server is stateless for NFS v2/v3
• locks are lost in case of client failure, if the locking daemon is not used
• if the locking daemon is used:
– if the client is rebooted: the server is notified and locks are released
– if the client is not rebooted: locks are released if the daemon is restarted
20. Main concerns about synchronisation
Delayed write cache
– clients cache small writes, to minimise LAN traffic
– clients can hold different views of the same file for several seconds
– accesses from different concurrent clients can lead to corruption
– the write cache can be disabled, with a large impact on performance
21. Main concerns about synchronisation
Read (metadata) cache
– NFS clients cache file attributes
• last file access, last inode change, last file modification
– the NFS server may show stale (not yet updated) attributes
– any program that relies on file attributes may not work
– the read cache can be disabled, impacting read performance
22. Part 2
NFS & PostgreSQL
23. NFS & PostgreSQL
www.postgresql.org/docs/9.6/static/creating-cluster.html#CREATING-CLUSTER-NFS
• if the NFS client/server implementation does not provide standard file system semantics, this can cause reliability problems
• PostgreSQL advises:
– NFS mount options
• avoid soft-mounting: hard
– General mount options
• avoid asynchronous writes: sync
PostgreSQL assumes NFS behaves exactly like locally-connected drives
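The advice above can be turned into a small check on a mount option string. This is a hedged sketch: the helper `check_mount_options` and its strict policy (both `hard` and `sync` must be listed explicitly, `soft` and `async` must be absent) are assumptions made for illustration, not an official PostgreSQL or NFS tool.

```python
from typing import List

REQUIRED = {"hard", "sync"}    # options recommended for PostgreSQL
FORBIDDEN = {"soft", "async"}  # options the recommendation rules out


def check_mount_options(options: str) -> List[str]:
    """Return a list of problems found in a comma-separated option string."""
    # Keep only the option name, dropping any "=value" part (e.g. timeo=600).
    names = {o.strip().split("=")[0] for o in options.split(",") if o.strip()}
    problems = [f"missing option: {name}" for name in sorted(REQUIRED - names)]
    problems += [f"unsafe option: {name}" for name in sorted(FORBIDDEN & names)]
    return problems


print(check_mount_options("rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac"))
print(check_mount_options("rw,soft,proto=tcp"))
```

An empty list means the string satisfies both recommendations; the same function could be fed the option field read from the system's mount table.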
24. Sources of corruption
hard-mount option:
– NFS calls must be retried indefinitely
• the sequentiality of both data and WAL entries may not be preserved
write cache:
– WALs have to be flushed when a database action is committed
• many processes are involved when WALs are flushed to disk
→ several NFS clients
→ the sequentiality of WAL entries may not be preserved
25. (Other) sources of corruption
attribute cache:
– many processes are involved when WALs are flushed to disk
→ several NFS clients
→ file attributes may not have consistent views
26. Enhance reliability
protocol:
– TCP
• optimises the number of packets requested in case of lossy networks
• jumbo frames could be avoided (also for ~O(Gb/s) ethernet)
block size:
– Many processes contribute to the I/O load
→ optimise data transfer
→ reduce the number of client calls
27. Synchronous writes
sync vs. fsync=on:
– NFS v2:
• if sync is specified, the server will not complete a request until both data and metadata are written to disk
– NFS v3 / v4:
• if sync is specified, the server will complete a request returning the status of the write:
– NFS_FILE_SYNC, NFS_DATA_SYNC, NFS_UNSTABLE
• data is effectively forced to be flushed to disk once a sync method system call is issued
– make sure to set fsync=on
28. Reliability vs. Performance
• NFS exports must be mounted with safe options
• Are these options fine for a database?
– is performance deeply impacted?
– is data safely guaranteed in case of a crash?
29. Part 3
NFS & PostgreSQL: performance & reliability benchmarks
30. The customer
• network: Ethernet 10Gb/s, no routers
• physical host: 2 CPUs x 8GB RAM, RHEL 6, kernel v. 2.6.32, PostgreSQL 9.5.3, NO paravirtualised network driver
• storage: NetApp FAS 8080 Full Flash, Clustered Mode
• volumes mounted via NFS v3
31. What has been measured
• file system speeds – bonnie++
– sequential reads (block reading)
– sequential writes (block writing & block rewriting)
– seek rate
• test for sync’ed writes – pg_test_fsync
– µs needed by a single process to flush 8kB to disk
• database commit rate – pgbench
– SELECT, INSERT, TPC-B
– single & multi-client operations
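For readers who want a feel for the sync'ed-write measurement, the sketch below times repeated 8kB write + fsync cycles on a temporary file. It is a rough stand-in written for illustration, not pg_test_fsync itself; the function name `time_synced_writes` and the default of 20 writes are assumptions, and the directory passed in should be on the file system under test (for example an NFS mount).

```python
import os
import tempfile
import time


def time_synced_writes(directory: str, writes: int = 20, block_size: int = 8192) -> float:
    """Average microseconds taken by one write + fsync of block_size bytes."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same 8kB block each time
            os.write(fd, block)
            os.fsync(fd)                  # force the block to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / writes * 1e6


print(f"{time_synced_writes(tempfile.gettempdir(), writes=5):.1f} µs per synced 8kB write")
```

The absolute number depends entirely on the storage and mount options, so it is only meaningful when compared across configurations on the same machine.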
35. A first configuration
• perform benchmarks with different mount options:
– rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac
file system I/O speed:
– reads: ~50MB/s writes: ~12MB/s
file system I/O speed with ac:
– reads: ~700MB/s writes: ~16MB/s
forced flush on disk:
– sync’ed 8kB writes / non-sync’ed 8kB writes = ~40%
36. DB performance
• perform benchmarks with different mount options:
– rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac
DB commit rates for different operations:
– INSERT:
• noac/ac ~ 1 (single client & multi client)
– SELECT:
• noac ~ 16X ac
– TPC-B:
• noac ~ 2X ac
37. PostgreSQL & NFS: myths & truths Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it
PGConf EU 2016
8th
edition
Tallinn
Nov., 2nd
2016
A second configurationA second configuration
• increase block & packet size (using jumbo frames), and check the writes:
– rw,hard,bg,timeo=600,proto=tcp,sync,wsize=65536,nointr, noatime,noac
– rw,hard,bg,timeo=600,proto=tcp,sync,wsize=65536,nointr, noatime,ac
file system I/O speed:
– writes: ~25MB/s
[root@pgsql] ~# netstat -a | grep X.X.X.X
tcp 0 25020 Y.Y.Y.Y:ftps-data X.X.X.X:nfs ESTABLISHED
forced flush on disk:
– sync’ed 8kB writes are almost the same
→ no dependence from write cache
38. Reliability tests
• try different crash scenarios under high concurrency load, check if the instance recovers properly, then execute a VACUUM FULL:
– kill -9 to postmaster
– forced reboot through kernel SysRq:
• echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger
– power off of the VM from the VMware vSphere® remote panel
– kill the TCP/IP connections through tcpkill
[root@pgsql] ~# tcpkill host X.X.X.X
tcpkill: listening on eth0 [host X.X.X.X]
43. Reliability tests
• T. Vondra – test the persistence of recycled WALs in ext4
– 56583BDD.9060302@2ndquadrant.com
• update attributes of recycled WALs → flush them to disk through fdatasync
• fdatasync does not force the flush of metadata → the update may get lost after a crash
• logged changes are contained in a file “in the future” → data loss!
– github.com/2ndQuadrant/ext4-data-loss
• INSERT/UPDATE new records in parallel and synchronously on the db and on a file
• simulate a crash → compare db & file contents after the crash recovery
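The comparison step of such a test can be sketched as a set difference between the record IDs acknowledged in the side file and the IDs found in the database after recovery. This is a minimal illustration and not the code from the ext4-data-loss repository; the helper `lost_commits` and the sample ID lists are hypothetical.

```python
from typing import Iterable, List


def lost_commits(acknowledged_ids: Iterable[int], recovered_ids: Iterable[int]) -> List[int]:
    """IDs written to the side file (acknowledged as committed) that are
    missing from the database after crash recovery."""
    return sorted(set(acknowledged_ids) - set(recovered_ids))


# Hypothetical example: record 3 was acknowledged but did not survive the crash.
in_file = [1, 2, 3, 4]
in_db = [1, 2, 4]
print(lost_commits(in_file, in_db))
```

A non-empty result would indicate that a commit acknowledged to the client was lost, which is the failure the test is looking for.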
46. Reliability tests
• Check NFS behaviour for recycled WALs (file attributes caching)
– noac
• several tests, no data loss!
– ac
• several tests, no data loss!
48. Reliability tests
All tests passed... but this does not guarantee that it is totally safe!
49. The reaction of the customer
51. The reaction of the consultant
53. Enhance reliability for PostgreSQL DBs
• enable page checksums (9.3+): initdb --data-checksums ...
– a checksum is calculated for each 8kB data block (PostgreSQL uses a 16-bit checksum based on FNV-1a rather than CRC32)
• a checksum failure means that the data block is corrupted
– force wal_log_hints=true
• write the entire 8kB page to the WAL, even for hint bit modifications
– take into account the impact on performance:
• R: extra checksum calculation every 8kB
• W: increase in the amount of information logged into WALs
• BUG (9.3+)! FSM & VM truncation not persisted when wal_log_hints=true:
– https://wiki.postgresql.org/wiki/Free_Space_Map_Problems (P. Deolasee, H. Linnakangas)
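The idea behind per-block checksums can be illustrated with a few lines of Python: compute one checksum per 8kB block, store it, and recompute later to spot blocks whose content changed unexpectedly. The sketch uses `zlib.crc32` purely as a convenient stand-in; it is not PostgreSQL's page checksum code, and the helper names are hypothetical.

```python
import zlib
from typing import List

BLOCK_SIZE = 8192  # same size as a PostgreSQL data page


def block_checksums(data: bytes) -> List[int]:
    """One checksum per 8kB block of data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]


def corrupted_blocks(data: bytes, expected: List[int]) -> List[int]:
    """Indexes of blocks whose current checksum differs from the stored one."""
    current = block_checksums(data)
    return [i for i, (now, then) in enumerate(zip(current, expected)) if now != then]


original = bytes(3 * BLOCK_SIZE)
stored = block_checksums(original)
damaged = bytearray(original)
damaged[BLOCK_SIZE + 5] ^= 0xFF  # flip one byte inside the second block
print(corrupted_blocks(bytes(damaged), stored))
```

Detection is all this gives: a mismatch says the block is damaged but does not repair it, which matches the "at least able to detect" framing of the slide.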
54. Enhance performance for PostgreSQL DBs
• sync’ed writes are slow, even if NFS caching is enabled
• if possible, consider asynchronous commit:
– let the server return success as soon as the transaction is logically completed
• synchronous_commit=off – it can be set per user/session
• WAL entries will be flushed later, but no later than 3X wal_writer_delay
• in case of crashes, the DB can be recovered in a consistent state, but there could be a window of data loss
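The size of that window follows directly from the 3X wal_writer_delay bound. The sketch below works it out, assuming PostgreSQL's default wal_writer_delay of 200ms; the function names and the 1000 commits/s figure in the example are hypothetical.

```python
def max_loss_window_ms(wal_writer_delay_ms: float = 200.0) -> float:
    """Upper bound on the window of unflushed commits: 3 x wal_writer_delay."""
    return 3 * wal_writer_delay_ms


def commits_at_risk(commits_per_second: float, wal_writer_delay_ms: float = 200.0) -> float:
    """Rough number of acknowledged commits that could be lost in a crash."""
    return commits_per_second * max_loss_window_ms(wal_writer_delay_ms) / 1000.0


print(max_loss_window_ms())   # window in ms with the default setting
print(commits_at_risk(1000))  # commits exposed at a hypothetical 1000 commits/s
```

Lowering wal_writer_delay shrinks the window at the cost of more frequent WAL flushes, so the setting is a trade-off rather than a free gain.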
55. 3rd cfg: rw,bg,hard,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr
file system I/O speed:
– reads: ~50MB/s → 700MB/s
– writes: ~12MB/s → 700MB/s
DB commit rate and SELECT rate:
synchronous_commit=on:
– INSERT:
• noac/ac ~ 1 (single & multi client)
• 3rd vs noac: +10% (single & multi client)
– SELECT:
• noac ~ 16X ac
• 3rd ~ 2X ac
synchronous_commit=off:
– INSERT:
• noac/ac ~ 1 (single & multi client)
• 3rd ~ 8X noac (single & multi client)
– SELECT: same as synchronous_commit=on
– TPC-B:
• noac ~ 2X ac → 3rd ~ 1.8X ac
57. Conclusions
• NFS was not originally designed with reliability in mind
– the protocol is designed to enhance performance
– NFS v4 is preferable
• PostgreSQL allows many countermeasures to be adopted
– it is at least able to promptly detect data corruption
• PostgreSQL can be used with NFS
– if you are ready to accept minimal data loss