PostgreSQL & NFS: myths & truths Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it
PGConf EU 2016 (8th edition), Tallinn, Nov. 2nd, 2016
PostgreSQL & NFS: Myths & Truths
www.2ndquadrant.com
Myths & Truths
• What people say about NFS:
IT’S FAST!
• What people say about DBs based on NFS:
Don't do it!
Don't do it!
Don't do it!
~$ whoami
Giuseppe Broccolo, PhD
PostgreSQL & PostGIS consultant
@giubro gbroccolo7
giuseppe.broccolo@2ndquadrant.it
gbroccolo gemini__81
A personal experience with NFS...
“The School of Athens” - Raffaello Sanzio (1509-1511), Vatican Museums
Part 1
Network File System: general characteristics
Network File System (Sun Microsystems®, 1984)
• A protocol for distributed file systems: v2, v3, v4
• servers export the drive, clients mount the export locally
• Many clients, one server
• High performance through a fast network (LAN)
NFS v3 vs. NFS v2
Strengths:
– NFS v3 (default since RHEL 5) adds some excellent features to NFS v2:
• Asynchronous communication through write and read caching
• TCP connections, although UDP can still be used if explicitly requested through mount options
• client connections to the NFS server are managed by a daemon which performs host authentication
• a server-side daemon manages file locking
NFS v3 vs. NFS v2
Weaknesses:
– NFS v3 still has some weaknesses:
• the server is stateless
• authentication is weak: only host authentication is performed
• data is transmitted in clear text
• NFS depends on the portmapper service for dynamic port assignment
NFS v4 vs. NFS v3
Partial solutions:
– NFS v4 (default since RHEL 6):
• removes the client/server dependency on the portmapper service and uses a single port (2049)
– easy to apply firewall rules to filter NFS traffic
• includes authentication and locking in the NFS protocol itself
• connections are always stateful
• UDP is not even allowed as a transport
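Because NFS v4 needs only one well-known port, checking whether the firewall lets NFS traffic through reduces to a single TCP probe. The sketch below is a minimal illustration using Python's standard `socket` module; the function name `tcp_port_open` is hypothetical, and a successful connection only shows that something is listening on 2049, not that it is a healthy NFS server.

```python
import socket


def tcp_port_open(host: str, port: int = 2049, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    NFS v4 only needs one well-known port (2049 by default), so a single
    probe tells whether the network path between client and server is open.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all end up here.
        return False
```

For example, `tcp_port_open("nfs-server.example")` (a hypothetical host name) returns False if a firewall drops or rejects port 2049.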
Transfer optimization
Block size:
– NFS requests are split into several data blocks exchanged between client & server
– set the block size on the client side to optimise the transfer speed of NFS requests:
• rsize/wsize: size of the chunks of transferred data
• NFS v2 theoretical limit: 8kB
• NFS v3 & v4 theoretical limit: depends on the kernel version (up to 32kB to 64kB)
• Pay attention: a block size that is too large can produce odd effects, especially during reads
– e.g. ls returns an incomplete listing of the file system content
Network setting
Packet size & MTU:
– packet size can be optimised to carry whole blocks across the network → jumbo frames
– routers may reduce the size of transferred packets
• use tracepath to check whether the path MTU differs from the network card's MTU
– packets that are too large may get dropped!
• NFS requests have to be retransmitted → performance drops
• use netstat & nfsstat to measure the amount of dropped packets
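A quick way to turn `nfsstat` output into a single number is the ratio of retransmitted RPC calls to total calls. The sketch below assumes the classic layout of the "Client rpc stats" section printed by `nfsstat -rc` (a `calls retrans authrefrsh` header followed by the counters); other nfsstat versions may format it differently. The function name and the sample counters are made up for illustration.

```python
def rpc_retrans_ratio(nfsstat_output: str) -> float:
    """Return retrans/calls from the client RPC section of `nfsstat -rc`.

    Assumes a header line starting with 'calls retrans' followed by a
    line holding the matching counters.
    """
    rows = [line.split() for line in nfsstat_output.splitlines()]
    for header, values in zip(rows, rows[1:]):
        if header[:2] == ["calls", "retrans"]:
            calls, retrans = int(values[0]), int(values[1])
            return retrans / calls if calls else 0.0
    raise ValueError("no 'calls retrans' header found")


# Made-up counters in the assumed layout, not numbers from the talk.
SAMPLE = """Client rpc stats:
calls      retrans    authrefrsh
200000     1500       200000
"""
```

With `SAMPLE`, the ratio is 1500 / 200000 = 0.0075, i.e. 0.75% of the NFS calls had to be resent; a ratio that keeps growing is a hint that packets are being dropped along the path.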
Choosing the protocol
NFS relies on either the TCP or the UDP protocol:
– an rsize/wsize larger than the network's MTU causes IP packet fragmentation over UDP
• IP packet fragmentation and reassembly require CPU resources
• consider jumbo frames (high MTU values) when using UDP on Gb/s-class Ethernet
• TCP automatically determines the proper IP packet size
– UDP is a stateless protocol / TCP is a stateful protocol
• UDP performs better than TCP on ideal networks
• with TCP only the dropped packets have to be retransmitted, not the entire NFS request as with UDP
– TCP performs better than UDP on lossy networks
• with TCP, the exported volumes have to be remounted if the NFS server crashes
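The fragmentation argument can be made concrete with a little arithmetic. The sketch below assumes IPv4 without options (20-byte header), an 8-byte UDP header, and ignores the RPC/NFS headers that sit on top of rsize/wsize; every fragment except the last carries a multiple of 8 bytes of IP payload. It also assumes each fragment is lost independently, which is a simplification.

```python
import math

IP_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8   # bytes


def udp_fragments(payload: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry one UDP datagram of `payload` bytes."""
    # Non-final fragments must carry a multiple of 8 bytes of IP payload.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((payload + UDP_HEADER) / per_fragment)


def p_request_resent(fragments: int, loss: float) -> float:
    """Probability that a whole UDP NFS request must be resent, assuming
    each fragment is dropped independently with probability `loss`."""
    return 1 - (1 - loss) ** fragments
```

Under these assumptions a 32kB write needs `udp_fragments(32768, 1500) == 23` fragments on a standard 1500-byte MTU but only 4 with 9000-byte jumbo frames. With a 0.1% fragment loss rate, `p_request_resent(23, 0.001)` is about 2.3%, and each such loss costs all 23 fragments; over TCP only the lost segment would be retransmitted.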
Main concerns about synchronisation
Time synchronisation
– NFS does not synchronise time between client and server
– NFS v3 allows clients to specify the time when updating a file
• this doesn't help when concurrent clients are used
– Use NTP!
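One rough way to spot client/server clock skew is to write a scratch file on the mount and compare its modification time with the local clock: on an NFS mount the mtime of a write is typically assigned by the server's clock. This is a minimal sketch under that assumption (attribute caching or client-set timestamps can blur the result); the function name is hypothetical, and NTP remains the actual fix.

```python
import os
import tempfile
import time


def mtime_skew_seconds(directory: str) -> float:
    """Estimate (file mtime - local time) for a scratch file in `directory`.

    A large positive or negative value suggests the clock that stamps
    file times (usually the NFS server's) disagrees with this host's clock.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        before = time.time()
        os.write(fd, b"x")   # a write updates the file's mtime
        os.fsync(fd)         # push the write out before reading attributes back
        after = time.time()
        mtime = os.fstat(fd).st_mtime
    finally:
        os.close(fd)
        os.unlink(path)
    # Compare against the midpoint of the local time window around the write.
    return mtime - (before + after) / 2
```

On a local file system the result should be close to zero; pointing `directory` at an NFS mount shows the server-side offset, if any.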
Main concerns about synchronisation
File locking
– NFS v2 does not natively support file locking
• it is prone to corruption in case of concurrent access
– a locking daemon in NFS v3/v4 provides locking similar to Unix file locking
• performance drops
– the server is stateless for NFS v2/v3
• locks are lost if a client fails and the locking daemon is not used
• if the locking daemon is used:
– if the client is rebooted: the server is notified and the locks are released
– if the client is not rebooted: the locks are released when the daemon is restarted
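From an application's point of view, NFS locking is ordinary POSIX record locking: the client kernel forwards the request to the server's lock manager. The sketch below uses Python's `fcntl.lockf` with `LOCK_NB` so that a conflicting lock fails immediately instead of blocking; the helper name is hypothetical and the code says nothing about how a particular NFS server implements the lock.

```python
import fcntl
import os


def try_exclusive_lock(path: str):
    """Try to take an exclusive POSIX lock on `path` without blocking.

    Returns the open file descriptor on success (keep it open to hold
    the lock), or None if another process holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # On an NFS mount this request is forwarded to the server's lock manager.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # EACCES or EAGAIN: someone else holds the lock.
        os.close(fd)
        return None
    return fd
```

POSIX locks belong to the process, so closing the descriptor (or the process exiting) releases the lock. If the client dies without the locking daemon's help, the slide's point applies: the server may keep or lose that lock depending on the NFS version.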
Main concerns about synchronisation
Delayed write cache
– clients cache small writes to minimise LAN traffic
– different clients can keep different views of a file for several seconds
– concurrent access from different clients can cause corruption
– the write cache can be disabled, with a large impact on performance
Main concerns about synchronisation
Read (metadata) cache
– NFS clients cache file attributes
• last file access, last inode change, last file modification
– the attributes seen by a client may be out of date
– any program that relies on file attributes may not work correctly
– the read cache can be disabled, impacting read performance
Part 2
NFS & PostgreSQL
NFS & PostgreSQL
www.postgresql.org/docs/9.6/static/creating-cluster.html#CREATING-CLUSTER-NFS
• if the NFS client/server implementation does not provide standard file system semantics, this can cause reliability problems
• PostgreSQL advice:
– NFS mount options
• avoid soft-mounting: use hard
– General mount options
• avoid asynchronous writes: use sync
PostgreSQL assumes NFS behaves exactly like locally-connected drives
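The two recommendations above can be checked mechanically against a mount option string. This is a minimal sketch with hypothetical function names; it deliberately insists on `hard` and `sync` being spelled out, even though `hard` is already the Linux client default, and it covers only these two rules.

```python
def parse_mount_options(opts: str) -> dict:
    """Turn 'rw,hard,timeo=600' into {'rw': None, 'hard': None, 'timeo': '600'}."""
    result = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        result[key] = value if sep else None
    return result


def check_pg_nfs_options(opts: str) -> list:
    """Return warnings for options that contradict the PostgreSQL advice above."""
    parsed = parse_mount_options(opts)
    warnings = []
    if "soft" in parsed or "hard" not in parsed:
        warnings.append("use 'hard' instead of soft-mounting")
    if "async" in parsed or "sync" not in parsed:
        warnings.append("use 'sync' to avoid asynchronous writes")
    return warnings
```

The first option set benchmarked in this talk (`rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac`) passes both checks, while the NetApp-recommended string (`rw,bg,hard,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr`) is flagged for the missing `sync`.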
Sources of corruption
hard-mount option:
– NFS calls must be retried indefinitely
• otherwise (soft mount), the ordering of both data and WAL entries may not be preserved
write cache:
– WAL has to be flushed as soon as a database action is committed
• many processes are involved when WAL is flushed to disk
→ several NFS clients
→ the ordering of WAL entries may not be preserved
(Other) sources of corruption
attribute cache:
– many processes are involved when WAL is flushed to disk
→ several NFS clients
→ views of file attributes may be inconsistent
Enhance reliability
protocol:
– TCP
• minimises the number of retransmitted packets on lossy networks
• jumbo frames can be avoided (even on Gb/s-class Ethernet)
block size:
– many processes contribute to the I/O load
→ optimise data transfer
→ reduce the number of client calls
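The effect of the block size on the number of client calls is simple division. As an illustration, the sketch below counts the NFS WRITE calls needed to move one WAL segment, using PostgreSQL's default 16MB segment size; it ignores retransmissions and is an upper bound on the savings, since a WAL flush smaller than wsize still costs one call on its own.

```python
import math

WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # PostgreSQL's default WAL segment size


def rpc_calls(nbytes: int, block_size: int) -> int:
    """Number of NFS READ/WRITE calls needed to move `nbytes` with a given rsize/wsize."""
    return math.ceil(nbytes / block_size)


for wsize in (8192, 32768, 65536):
    # 8192 -> 2048 calls, 32768 -> 512 calls, 65536 -> 256 calls
    print(wsize, rpc_calls(WAL_SEGMENT_BYTES, wsize))
```

Going from the NFS v2 limit of 8kB to 64kB cuts the calls per segment by a factor of eight, which matters when many backends write at the same time.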
Synchronous writes
sync vs. fsync=on:
– NFS v2:
• if sync is specified, the server does not complete a request until both data and metadata are written to disk
– NFS v3 / v4:
• if sync is specified, the server completes a request by returning the status of the write:
– NFS_FILE_SYNC, NFS_DATA_SYNC, NFS_UNSTABLE
• data is effectively forced to disk once a sync system call (e.g. fsync) is issued
– be careful to set fsync=on
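What `fsync=on` buys can be shown with a plain file: data counts as safe only after `fsync()` returns, and a newly created file also needs its directory entry flushed. The sketch below is a generic durable-write pattern using only the `os` module, not PostgreSQL's actual code path; on NFS the client typically turns the `fsync()` into a COMMIT of any writes the server acknowledged as unstable.

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write `data` to `path` and force both the file and its directory entry to disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:                 # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                # data is durable only once this returns
    finally:
        os.close(fd)
    # fsync the parent directory so a newly created file's name is persisted too.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The directory fsync is the same kind of metadata flush that `fdatasync` alone does not guarantee, which is exactly the failure mode examined in the reliability tests later on.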
Reliability vs. Performance
• NFS exports must be mounted with safe options
• Are these options fine for a database?
– how deeply is performance impacted?
– is data safety guaranteed in case of a crash?
Part 3
NFS & PostgreSQL: performance & reliability benchmarks
The customer
• database server: 2 CPUs, 8GB RAM, RHEL 6 (kernel v. 2.6.32), PostgreSQL 9.5.3
– running on a physical host, with NO paravirtualised network driver
• storage: NetApp FAS 8080 Full Flash, Clustered Mode
– volumes mounted via NFS v3
• network: 10Gb/s Ethernet, no routers
What was measured
• file system speeds – bonnie++
– sequential reads (block reads)
– sequential writes (block writes & block rewrites)
– seek rate
• sync’ed writes – pg_test_fsync
– µs needed by a single process to flush 8kB to disk
• database commit rate – pgbench
– SELECT, INSERT, TPC-B
– single- & multi-client runs
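The pg_test_fsync figure (µs for one process to flush 8kB) can be approximated with a few lines of Python, which is handy for comparing a local directory with an NFS mount before installing PostgreSQL. This is a rough stand-in written for illustration, not pg_test_fsync itself: it rewrites one 8kB block in a scratch file and optionally calls `fsync` after each write.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8kB page


def avg_flush_us(directory: str, rounds: int = 200, sync: bool = True) -> float:
    """Average microseconds per 8kB write (plus fsync if `sync`) in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            os.pwrite(fd, BLOCK, 0)   # overwrite the same 8kB block each time
            if sync:
                os.fsync(fd)          # force it to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / rounds * 1e6
```

Running it twice on the same mount, with `sync=True` and `sync=False`, gives the kind of sync’ed vs. non-sync’ed comparison reported in the following slides.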
A first configuration
• run benchmarks with different mount options, starting from:
– rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac
file system I/O speed (noac):
– reads: ~50MB/s, writes: ~12MB/s
file system I/O speed with ac (attribute caching enabled):
– reads: ~700MB/s, writes: ~16MB/s
forced flush to disk:
– sync’ed 8kB writes / non-sync’ed 8kB writes = ~40%
DB performance
• run benchmarks with different mount options, starting from:
– rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac
DB commit rates for different operations:
– INSERT:
• noac/ac ~ 1 (single client & multi client)
– SELECT:
• noac ~ 16X ac
– TPC-B:
• noac ~ 2X ac
A second configuration
• increase block & packet size (using jumbo frames), and check the writes:
– rw,hard,bg,timeo=600,proto=tcp,sync,wsize=65536,nointr,noatime,noac
– rw,hard,bg,timeo=600,proto=tcp,sync,wsize=65536,nointr,noatime,ac
file system I/O speed:
– writes: ~25MB/s
[root@pgsql] ~# netstat -a | grep X.X.X.X
tcp 0 25020 Y.Y.Y.Y:ftps-data X.X.X.X:nfs ESTABLISHED
forced flush to disk:
– sync’ed 8kB writes are almost the same
→ no dependence on the write cache
Reliability tests
• try different crash scenarios under a high-concurrency load, check whether the instance recovers properly, then execute a VACUUM FULL:
– kill -9 the postmaster
– forced reboot through the kernel's magic SysRq key:
• echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger
– power off the VM from the VMware vSphere® remote panel
– kill the TCP/IP connections with tcpkill
[root@pgsql] ~# tcpkill host X.X.X.X
tcpkill: listening on eth0 [host X.X.X.X]
Reliability tests
• T. Vondra – test of the persistence of recycled WAL segments on ext4
– 56583BDD.9060302@2ndquadrant.com
• attributes of recycled WAL segments are updated → they are flushed to disk through fdatasync
• fdatasync does not force the flush of metadata → the update may get lost after a crash
• logged changes end up in a file “from the future” → data loss!
– github.com/2ndQuadrant/ext4-data-loss
• INSERT/UPDATE new records in parallel and synchronously in the db and in a file
• simulate a crash → compare db & file contents after crash recovery
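The final comparison step of such a test reduces to a set difference. The sketch below assumes the side file holds one record ID per line, written only after the database acknowledged the COMMIT; under that assumption, any ID in the file but missing from the recovered database is lost data. The helper names are hypothetical and not taken from the ext4-data-loss repository, whose scripts may work differently.

```python
def read_logged_ids(path: str) -> set:
    """Read the IDs written to the side file, one integer per line."""
    with open(path) as f:
        return {int(line) for line in f if line.strip()}


def lost_after_recovery(logged_ids, recovered_ids) -> list:
    """IDs acknowledged (and logged to the file) but missing from the recovered DB."""
    return sorted(set(logged_ids) - set(recovered_ids))
```

After recovery, `recovered_ids` would come from a query such as `SELECT id FROM test_table` (a hypothetical table); an empty result from `lost_after_recovery` is what "no data loss" means in the next slides.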
Reliability tests
• Check NFS behaviour for recycled WAL segments (file attribute caching)
– noac
• several tests, no data loss!
– ac
• several tests, no data loss!
Reliability tests
All tests passed... but this does not guarantee that it is totally safe!
The reaction of the customer
NetApp® recommendations
• J. Steiner (NetApp®) – Oracle® DB based on VMs & NetApp® storage (NFS v3)
– community.hpe.com/t5/Networking/Oracle-DB-over-NFS-on-Netapp-only-35MB-sec/td-p/4116343
• avoid paravirtualised network drivers, expose volumes via NFS
• allow default attribute caching (ac, acregmin=3, acregmax=60)
– actimeo=0 only for consistent views in Oracle RAC©
• set the max value for timeo (let the NFS client wait for the NFS response)
– but retry the request indefinitely
• use the max block size (rsize/wsize) allowed by the kernel
• do not allow interruptions during NFS file operations
rw,bg,hard,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr
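Comparing this option string with the first configuration tested earlier makes the trade-off explicit. The sketch below is a plain set difference over comma-separated options (the function name is hypothetical); it treats `key=value` items as whole strings, so a changed value shows up as one removal plus one addition.

```python
FIRST_CFG = "rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac"
NETAPP_CFG = "rw,bg,hard,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr"


def diff_mount_options(old: str, new: str):
    """Return (removed, added) option items going from `old` to `new`."""
    a = {o.strip() for o in old.split(",") if o.strip()}
    b = {o.strip() for o in new.split(",") if o.strip()}
    return sorted(a - b), sorted(b - a)


removed, added = diff_mount_options(FIRST_CFG, NETAPP_CFG)
print("removed:", removed)  # removed: ['noac', 'noatime', 'sync']
print("added:", added)      # added: ['rsize=65536', 'wsize=65536']
```

The NetApp set drops `sync` and `noac` and adds explicit 64kB blocks. It relies on default attribute caching and on the client not writing synchronously, so durability rests entirely on PostgreSQL's own fsync calls, which is why `fsync=on` is non-negotiable with it.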
The reaction of the consultant
Enhance reliability for PostgreSQL DBs
• enable page checksums (9.3+): initdb --data-checksums ...
– a 16-bit checksum is computed for each 8kB data page
• a checksum failure means that the data page is corrupted
– hint-bit changes are WAL-logged, as with wal_log_hints=on
• the entire 8kB page is written to the WAL, even for a hint-bit modification (the first one after each checkpoint)
– take into account the impact on performance:
• R: an extra checksum calculation every 8kB
• W: more information logged into the WAL
• BUG (9.3+)! FSM & VM truncation not persisted when wal_log_hints=true:
– https://wiki.postgresql.org/wiki/Free_Space_Map_Problems (P. Deolasee, H. Linnakangas)
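The idea behind page checksums is to store one checksum per 8kB page and recompute it on read; any mismatch pinpoints the corrupted page. The sketch below illustrates that idea only: it uses `zlib.crc32` as a convenient stand-in, which is not the algorithm PostgreSQL uses for its on-disk page checksums, and the function names are hypothetical.

```python
import zlib

PAGE_SIZE = 8192


def page_checksums(data: bytes) -> list:
    """One checksum per 8kB page (zlib.crc32 is only a stand-in algorithm here)."""
    return [zlib.crc32(data[i:i + PAGE_SIZE]) for i in range(0, len(data), PAGE_SIZE)]


def corrupted_pages(data: bytes, expected: list) -> list:
    """Indexes of pages whose recomputed checksum differs from the stored one."""
    return [i for i, (now, then) in enumerate(zip(page_checksums(data), expected))
            if now != then]
```

Flipping a single bit inside the second page of a three-page buffer makes `corrupted_pages` return `[1]`: the damage is detected and located, though not repaired, which matches the "promptly detect data corruption" point in the conclusions.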
Enhance performance for PostgreSQL DBs
• sync’ed writes are slow, even if NFS caching is enabled
• if possible, consider asynchronous commit:
– let the server return success as soon as the transaction is logically completed
• synchronous_commit=off – it can be set per user/session
• WAL entries are flushed later, but no later than 3X wal_writer_delay
• in case of a crash, the DB can be recovered to a consistent state, but the most recently committed transactions may be lost
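The size of that loss window follows directly from the 3X wal_writer_delay bound. The sketch below turns it into a worst-case number of acknowledged commits at risk, assuming the default `wal_writer_delay` of 200ms and a steady commit rate; the commit rate is an input you would take from your own pgbench runs, and the function names are hypothetical.

```python
def max_unflushed_ms(wal_writer_delay_ms: int = 200) -> int:
    """Upper bound on how long an asynchronously committed transaction
    can stay unflushed: three times wal_writer_delay (default 200ms)."""
    return 3 * wal_writer_delay_ms


def tx_at_risk(commits_per_second: float, wal_writer_delay_ms: int = 200) -> float:
    """Rough worst-case number of acknowledged commits lost in a crash."""
    return commits_per_second * max_unflushed_ms(wal_writer_delay_ms) / 1000
```

With the defaults the window is 600ms, so a workload committing 1000 transactions per second could lose up to about 600 of them; lowering `wal_writer_delay` to 100ms halves both numbers.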
3rd cfg: rw,bg,hard,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr
file system I/O speed:
– reads: ~50MB/s → 700MB/s
– writes: ~12MB/s → 700MB/s
DB commit rate and SELECT rate:
synchronous_commit=on:
– INSERT:
• noac/ac ~ 1 (single & multi client)
• 3rd vs noac: +10% (single & multi client)
– SELECT:
• noac ~ 16X ac
• 3rd ~ 2X ac
synchronous_commit=off:
– INSERT:
• noac/ac ~ 1 (single & multi client)
• 3rd ~ 8X noac (single & multi client)
– SELECT: same as synchronous_commit=on
– TPC-B:
• noac ~ 2X ac → 3rd ~ 1.8X ac
Conclusions
• NFS was not originally designed with reliability in mind
– the protocol is designed to enhance performance
– NFS v4 is preferable
• PostgreSQL offers many countermeasures
– at the very least it can promptly detect data corruption
• PostgreSQL can be used with NFS
– provided you are ready to accept minimal data loss
Creative Commons license
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
https://creativecommons.org/licenses/by-nc-sa/4.0/
© 2016 2ndQuadrant Italia – http://www.2ndquadrant.it

Ryan Jarvinen Open Shift Talk @ Postgres Open 2013Ryan Jarvinen Open Shift Talk @ Postgres Open 2013
Ryan Jarvinen Open Shift Talk @ Postgres Open 2013
 
Bruce Momjian - Inside PostgreSQL Shared Memory @ Postgres Open
Bruce Momjian - Inside PostgreSQL Shared Memory @ Postgres OpenBruce Momjian - Inside PostgreSQL Shared Memory @ Postgres Open
Bruce Momjian - Inside PostgreSQL Shared Memory @ Postgres Open
 
Kevin Kempter - PostgreSQL Backup and Recovery Methods @ Postgres Open
Kevin Kempter - PostgreSQL Backup and Recovery Methods @ Postgres OpenKevin Kempter - PostgreSQL Backup and Recovery Methods @ Postgres Open
Kevin Kempter - PostgreSQL Backup and Recovery Methods @ Postgres Open
 
Keith Paskett - Postgres on ZFS @ Postgres Open
Keith Paskett - Postgres on ZFS @ Postgres OpenKeith Paskett - Postgres on ZFS @ Postgres Open
Keith Paskett - Postgres on ZFS @ Postgres Open
 
Henrietta Dombrovskaya - A New Approach to Resolve Object-Relational Impedanc...
Henrietta Dombrovskaya - A New Approach to Resolve Object-Relational Impedanc...Henrietta Dombrovskaya - A New Approach to Resolve Object-Relational Impedanc...
Henrietta Dombrovskaya - A New Approach to Resolve Object-Relational Impedanc...
 
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...Selena Deckelmann - Sane Schema Management with  Alembic and SQLAlchemy @ Pos...
Selena Deckelmann - Sane Schema Management with Alembic and SQLAlchemy @ Pos...
 
Islamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuningIslamabad PUG - 7th Meetup - performance tuning
Islamabad PUG - 7th Meetup - performance tuning
 
Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)Out of the box replication in postgres 9.4(pg confus)
Out of the box replication in postgres 9.4(pg confus)
 
Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open
Robert Haas Query Planning Gone Wrong Presentation @ Postgres OpenRobert Haas Query Planning Gone Wrong Presentation @ Postgres Open
Robert Haas Query Planning Gone Wrong Presentation @ Postgres Open
 
Islamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuningIslamabad PUG - 7th meetup - performance tuning
Islamabad PUG - 7th meetup - performance tuning
 
Steve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres OpenSteve Singer - Managing PostgreSQL with Puppet @ Postgres Open
Steve Singer - Managing PostgreSQL with Puppet @ Postgres Open
 
Michael Bayer Introduction to SQLAlchemy @ Postgres Open
Michael Bayer Introduction to SQLAlchemy @ Postgres OpenMichael Bayer Introduction to SQLAlchemy @ Postgres Open
Michael Bayer Introduction to SQLAlchemy @ Postgres Open
 
PoPostgreSQL Web Projects: From Start to FinishStart To Finish
PoPostgreSQL Web Projects: From Start to FinishStart To FinishPoPostgreSQL Web Projects: From Start to FinishStart To Finish
PoPostgreSQL Web Projects: From Start to FinishStart To Finish
 
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
 

Similar to Gbroccolo pgconfeu2016 pgnfs

SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
Ali Ordoubadian
 
Spring sim 2010-riley
Spring sim 2010-rileySpring sim 2010-riley
Spring sim 2010-riley
Sopna Sumāto
 
OpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGO
OpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGOOpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGO
OpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGO
OpenNebula Project
 
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Igalia
 

Similar to Gbroccolo pgconfeu2016 pgnfs (20)

An FPGA for high end Open Networking
An FPGA for high end Open NetworkingAn FPGA for high end Open Networking
An FPGA for high end Open Networking
 
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
SNIA Europe - DCSEurope_April2013 (AOrdoubadian)
 
Spring sim 2010-riley
Spring sim 2010-rileySpring sim 2010-riley
Spring sim 2010-riley
 
Update on IPv6 activity in CERNET2
Update on IPv6 activity in CERNET2Update on IPv6 activity in CERNET2
Update on IPv6 activity in CERNET2
 
Archiving data from Durham to RAL using the File Transfer Service (FTS)
Archiving data from Durham to RAL using the File Transfer Service (FTS)Archiving data from Durham to RAL using the File Transfer Service (FTS)
Archiving data from Durham to RAL using the File Transfer Service (FTS)
 
OSMC 2021 | Handling 250K flows per second with OpenNMS: a case study
OSMC 2021 | Handling 250K flows per second with OpenNMS: a case studyOSMC 2021 | Handling 250K flows per second with OpenNMS: a case study
OSMC 2021 | Handling 250K flows per second with OpenNMS: a case study
 
An Optics Life
An Optics LifeAn Optics Life
An Optics Life
 
BGP Traffic Engineering / Routing Optimisation
BGP Traffic Engineering / Routing OptimisationBGP Traffic Engineering / Routing Optimisation
BGP Traffic Engineering / Routing Optimisation
 
Paper9250 implementation of an i pv6 stack for ns-3
Paper9250 implementation of an i pv6 stack for ns-3Paper9250 implementation of an i pv6 stack for ns-3
Paper9250 implementation of an i pv6 stack for ns-3
 
OpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGO
OpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGOOpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGO
OpenNebulaConf2017EU: FairShare Scheduling by Valentina Zaccolo, INDIGO
 
PhD Projects in GNS3 Network Research Help
PhD Projects in GNS3 Network Research HelpPhD Projects in GNS3 Network Research Help
PhD Projects in GNS3 Network Research Help
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
 
Network simulator 2 a simulation tool for linux
Network simulator 2 a simulation tool for linuxNetwork simulator 2 a simulation tool for linux
Network simulator 2 a simulation tool for linux
 
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSD...
 
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
How IBM's Massive POWER9 UNIX Servers Benefit from InfluxDB and Grafana Techn...
 
Ubuica: monitoring internet cersorship
Ubuica: monitoring internet cersorshipUbuica: monitoring internet cersorship
Ubuica: monitoring internet cersorship
 
guna_2015.DOC
guna_2015.DOCguna_2015.DOC
guna_2015.DOC
 
PLNOG 3: Jens Link - IPv6 - Migration Planning
PLNOG 3: Jens Link -  IPv6 - Migration PlanningPLNOG 3: Jens Link -  IPv6 - Migration Planning
PLNOG 3: Jens Link - IPv6 - Migration Planning
 
Configuration & Routing of Clos Networks
Configuration & Routing of Clos NetworksConfiguration & Routing of Clos Networks
Configuration & Routing of Clos Networks
 
Tstat conext compressed
Tstat conext compressedTstat conext compressed
Tstat conext compressed
 

More from Giuseppe Broccolo

More from Giuseppe Broccolo (10)

High scalable applications with Python
High scalable applications with PythonHigh scalable applications with Python
High scalable applications with Python
 
Indexes in PostgreSQL (10)
Indexes in PostgreSQL (10)Indexes in PostgreSQL (10)
Indexes in PostgreSQL (10)
 
Non-parametric regressions & Neural Networks
Non-parametric regressions & Neural NetworksNon-parametric regressions & Neural Networks
Non-parametric regressions & Neural Networks
 
GBroccolo JRouhaud pgconfeu2016_brin4postgis
GBroccolo JRouhaud pgconfeu2016_brin4postgisGBroccolo JRouhaud pgconfeu2016_brin4postgis
GBroccolo JRouhaud pgconfeu2016_brin4postgis
 
Gbroccolo foss4 guk2016_brin4postgis
Gbroccolo foss4 guk2016_brin4postgisGbroccolo foss4 guk2016_brin4postgis
Gbroccolo foss4 guk2016_brin4postgis
 
Relational approach with LiDAR data with PostgreSQL
Relational approach with LiDAR data with PostgreSQLRelational approach with LiDAR data with PostgreSQL
Relational approach with LiDAR data with PostgreSQL
 
BRIN indexes on geospatial databases - FOSS4G.NA 2016
BRIN indexes on geospatial databases - FOSS4G.NA 2016BRIN indexes on geospatial databases - FOSS4G.NA 2016
BRIN indexes on geospatial databases - FOSS4G.NA 2016
 
Gbroccolo itpug p_gday2015_geodbbrin
Gbroccolo itpug p_gday2015_geodbbrinGbroccolo itpug p_gday2015_geodbbrin
Gbroccolo itpug p_gday2015_geodbbrin
 
gbroccolo - Use of Indexes on geospatial databases with PostgreSQL - FOSS4G.E...
gbroccolo - Use of Indexes on geospatial databases with PostgreSQL - FOSS4G.E...gbroccolo - Use of Indexes on geospatial databases with PostgreSQL - FOSS4G.E...
gbroccolo - Use of Indexes on geospatial databases with PostgreSQL - FOSS4G.E...
 
Gbroccolo itpug p_gday2014_geodbindex
Gbroccolo itpug p_gday2014_geodbindexGbroccolo itpug p_gday2014_geodbindex
Gbroccolo itpug p_gday2014_geodbindex
 

Recently uploaded

Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
Kamal Acharya
 
RS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical SolutionsRS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
Atif Razi
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 

Recently uploaded (20)

Democratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek AryaDemocratizing Fuzzing at Scale by Abhishek Arya
Democratizing Fuzzing at Scale by Abhishek Arya
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdf
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answer
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
 
Toll tax management system project report..pdf
Toll tax management system project report..pdfToll tax management system project report..pdf
Toll tax management system project report..pdf
 
RS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical SolutionsRS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
RS Khurmi Machine Design Clutch and Brake Exercise Numerical Solutions
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Danfoss NeoCharge Technology -A Revolution in 2024.pdf
Danfoss NeoCharge Technology -A Revolution in 2024.pdfDanfoss NeoCharge Technology -A Revolution in 2024.pdf
Danfoss NeoCharge Technology -A Revolution in 2024.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
Introduction to Casting Processes in Manufacturing
Introduction to Casting Processes in ManufacturingIntroduction to Casting Processes in Manufacturing
Introduction to Casting Processes in Manufacturing
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 

Gbroccolo pgconfeu2016 pgnfs

PostgreSQL & NFS: Myths & Truths
Giuseppe Broccolo – giuseppe.broccolo@2ndquadrant.it
PGConf EU 2016, 8th edition, Tallinn, Nov. 2nd, 2016
www.2ndquadrant.com

Myths & Truths
• What people say about NFS:
  IT’S FAST!

Myths & Truths
• What people say about DBs based on NFS:
  Don't do it!
  Don't do it!
  Don't do it!

~$ whoami
Giuseppe Broccolo, PhD
PostgreSQL & PostGIS consultant
@giubro gbroccolo7
giuseppe.broccolo@2ndquadrant.it
gbroccolo gemini__81

A personal experience with NFS...
“The School of Athens” - Raffaello Sanzio (1509-1511), Vatican Museums

Part 1
Network File System: general characteristics

Network File System (Sun Microsystems®, 1984)
• A protocol for distributed file systems: v2, v3, v4
• Servers export a drive; clients mount the export locally
• Many clients, one server
• High performance through a fast LAN

NFS v3 vs. NFS v2
Strengths:
– NFS v3 (default since RHEL 5) adds some excellent features to NFS v2:
  • asynchronous communication through write and read caching
  • TCP transport, although UDP can still be used if explicitly requested through mount options
  • client connections to the NFS server are managed by a daemon that performs host authentication
  • a daemon on the server side manages file locking

NFS v3 vs. NFS v2
Weaknesses:
– NFS v3 still has some weaknesses:
  • the server is stateless
  • authentication is weak: only host authentication is performed
  • data is transmitted in clear text
  • NFS depends on the portmapper service for dynamic port assignment

NFS v4 vs. NFS v3
Partial solutions:
– NFS v4 (default since RHEL 6) adds:
  • no client/server dependency on the portmapper service: a single port (2049) is used
    – this makes it easy to write firewall rules that filter NFS traffic
  • authentication and locking are part of the NFS protocol itself, not separate daemons
  • connections are always stateful
  • UDP is not allowed as a transport at all

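Because NFS v4 needs only one TCP port, a quick reachability check from the database host is enough to confirm that the firewall lets NFS traffic through. The sketch below is illustrative: the function name is hypothetical, and it only tests whether TCP port 2049 accepts connections, without speaking the NFS protocol.

```python
import socket

NFS_V4_PORT = 2049  # the single well-known port used by NFS v4


def nfs_port_reachable(host: str, port: int = NFS_V4_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Success only shows that the port is not filtered; it does not prove
    that an NFS server is answering on it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out or unresolvable host
        return False
```

For example, `nfs_port_reachable("nfs-server.example")` (a placeholder host name) returns `False` when the port is filtered or closed.
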
Transfer optimization
Block size:
– NFS requests are split into data blocks exchanged between client and server
– set the block size on the client side to optimise the transfer speed of NFS requests:
  • wsize/rsize: size of the chunks of transferred data
  • NFS v2 theoretical limit: 8 kB
  • NFS v3 & v4 theoretical limit: depends on the kernel version (up to 32 kB to 64 kB)
  • Pay attention: too large a block size produces odd effects, especially during reads
    – e.g. ls returns an incomplete list of the contents of a file system

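The effect of wsize on the number of requests is plain arithmetic: each WRITE call carries at most wsize bytes. This sketch assumes full-size requests and ignores client-side coalescing and RPC header overhead; the 16 MB WAL segment is PostgreSQL's default segment size, used here only as a convenient example, and `write_rpcs` is a hypothetical helper.

```python
import math

KIB = 1024


def write_rpcs(nbytes: int, wsize: int) -> int:
    """WRITE calls needed for `nbytes` when each call carries at most `wsize` bytes."""
    if wsize <= 0:
        raise ValueError("wsize must be positive")
    return math.ceil(nbytes / wsize)


if __name__ == "__main__":
    wal_segment = 16 * KIB * KIB  # default PostgreSQL WAL segment size (16 MB)
    for wsize in (8 * KIB, 32 * KIB, 64 * KIB):
        print(f"wsize={wsize // KIB} kB -> {write_rpcs(wal_segment, wsize)} WRITE calls")
```

Under these assumptions, one segment needs 2048, 512 and 256 calls for 8, 32 and 64 kB respectively.
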
Network setting
Packet size & MTU:
– packet size can be optimised to carry whole blocks across the network → jumbo frames
– routers could decrease the size of transferred packets
  • use tracepath to determine whether the path MTU differs from the network card's MTU
– packets that are too large could get dropped!
  • NFS requests have to be retransmitted → drop in performance
  • use netstat & nfsstat to determine the amount of dropped packets

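The client RPC counters reported by nfsstat give a quick retransmission ratio. The parser below assumes the `calls retrans authrefrsh` header/value layout printed by `nfsstat -rc` on Linux; other versions may format it differently, and the sample numbers are invented for illustration.

```python
def parse_client_rpc_stats(text: str) -> dict:
    """Parse the header line containing 'retrans' and the line of integers below it."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines, lines[1:]):
        if "retrans" in header and len(header) == len(values):
            return {name: int(v) for name, v in zip(header, values)}
    raise ValueError("no 'retrans' counter found")


def retrans_percent(stats: dict) -> float:
    """Percentage of RPC calls that had to be retransmitted."""
    calls = stats.get("calls", 0)
    return 100.0 * stats["retrans"] / calls if calls else 0.0


sample = """Client rpc stats:
calls      retrans    authrefrsh
200000     300        200000
"""  # illustrative numbers, not measured data

stats = parse_client_rpc_stats(sample)
print(f"{retrans_percent(stats):.2f}% of RPC calls were retransmitted")
```

A ratio that keeps growing while the MTU or block size is changed is a hint that packets are being dropped along the path.
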
Choosing the protocol
NFS relies on either TCP or UDP:
– rsize/wsize larger than the network's MTU causes IP packet fragmentation over UDP
  • IP packet fragmentation and reassembly require CPU resources
  • consider using jumbo frames (high MTU values) with UDP on ~O(Gb/s) Ethernet
  • TCP automatically determines the proper IP packet size
– UDP is a stateless protocol; TCP is a stateful protocol
  • UDP performs better than TCP on ideal networks
  • with TCP only the dropped packets have to be retransmitted, not the entire NFS request as with UDP
– TCP performs better than UDP on lossy networks
  • with TCP, the exported volumes have to be remounted if the NFS server crashes

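The fragmentation and retransmission argument can be made concrete with a deliberately simple model: IPv4 without options, independent per-packet loss, a UDP request that must be resent whole if any fragment is lost, and an idealised TCP that resends only the lost segment. Real TCP behaviour (congestion control, timeouts) and the RPC header are ignored, and the function names are hypothetical, so treat the numbers as a qualitative comparison only.

```python
import math

IPV4_HEADER = 20       # bytes, IPv4 header without options
UDP_HEADER = 8         # bytes
TCP_IPV4_HEADERS = 40  # bytes, IPv4 + TCP headers without options


def udp_fragments(payload: int, mtu: int) -> int:
    """IPv4 fragments needed for one UDP datagram carrying `payload` bytes."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil((payload + UDP_HEADER) / per_fragment)


def tcp_segments(payload: int, mtu: int) -> int:
    """TCP segments needed for `payload` bytes when the MSS follows from the MTU."""
    return math.ceil(payload / (mtu - TCP_IPV4_HEADERS))


def expected_packets_udp(n: int, loss: float) -> float:
    """Expected packets sent when losing any of n fragments forces a full resend."""
    # Attempts are geometric with success probability (1 - loss) ** n,
    # and every attempt sends all n fragments again.
    return n / (1.0 - loss) ** n


def expected_packets_tcp(n: int, loss: float) -> float:
    """Expected packets sent when only each lost segment is resent."""
    return n / (1.0 - loss)


if __name__ == "__main__":
    request = 32 * 1024  # one 32 kB NFS request
    for mtu in (1500, 9000):
        frags = udp_fragments(request, mtu)
        segs = tcp_segments(request, mtu)
        for loss in (0.0, 0.01):
            print(f"MTU {mtu}, loss {loss:.0%}: "
                  f"UDP ~{expected_packets_udp(frags, loss):.1f} packets, "
                  f"TCP ~{expected_packets_tcp(segs, loss):.1f} packets")
```

With a 1500-byte MTU a 32 kB request becomes 23 fragments, so even 1% loss makes UDP resend noticeably more than TCP; jumbo frames shrink the fragment count and narrow the gap.
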
Main concerns about synchronisation
Time synchronisation:
– NFS does not synchronise time between client and server
– NFS v3 allows clients to specify the time when updating a file
  • this doesn't help when concurrent clients are used
– Use NTP!

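One way to spot clock skew between an NFS client and its server is to compare the modification time stamped on a freshly written file with the client's own clock. This rough sketch assumes that the server, not the client, sets mtime when the data is written (as is typical for NFS WRITE) and that the attributes read after fsync are fresh; the helper name is hypothetical, and NTP remains the actual fix.

```python
import os
import tempfile
import time


def estimate_mtime_skew(directory: str) -> float:
    """Rough server-minus-client clock offset (seconds) seen through `directory`.

    The estimate also absorbs write latency and timestamp granularity,
    so only large values are meaningful.
    """
    before = time.time()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"x")
        os.fsync(fd)  # push the write to the server before reading attributes
        mtime = os.fstat(fd).st_mtime
    finally:
        os.close(fd)
        os.unlink(path)
    after = time.time()
    return mtime - (before + after) / 2.0


if __name__ == "__main__":
    # Point this at a directory on the NFS mount instead of the local temp dir.
    print(f"estimated skew: {estimate_mtime_skew(tempfile.gettempdir()):+.3f} s")
```

On a local file system the result should be close to zero; an offset of several seconds on the NFS mount suggests the clocks are not synchronised.
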
Main concerns about synchronisation
File locking:
– NFS v2 does not natively support file locking
  • it is prone to corruption in case of concurrent accesses
– locking daemon in NFS v3/v4, similar to Unix file locking
  • performance drop
– the server is stateless for NFS v2/v3
  • locks are lost in case of client failure, if the locking daemon is not used
  • if the locking daemon is used:
    – if the client is rebooted: the server is notified and locks are released
    – if the client is not rebooted: locks are released when the daemon is restarted

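From an application's point of view, NFS locking is ordinary POSIX record locking: on an NFS v3 mount the request is forwarded to the lock daemon, while on NFS v4 it is handled by the protocol itself. A minimal sketch using Python's `fcntl.lockf` (Unix only); the helper name and the lock file path are hypothetical.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path: str):
    """Take a non-blocking exclusive POSIX lock on `path`.

    Returns the file descriptor that holds the lock, or None if another
    process already holds a conflicting lock. Closing the descriptor
    releases the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # EACCES or EAGAIN when the lock is held elsewhere
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "nfs-lock-demo.lock")  # hypothetical path
    fd = try_exclusive_lock(lock_path)
    print("lock acquired" if fd is not None else "lock held by another process")
    if fd is not None:
        os.close(fd)
```

POSIX locks belong to the process, so a second call from the same process succeeds; the conflict only shows up between different processes (or different clients).
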
Main concerns about synchronisation
Delayed write cache:
– clients cache small writes, to minimise LAN traffic
– clients can keep different views of a file for several seconds
– accesses from different concurrent clients can cause corruption
– the write cache can be disabled, with a large impact on performance

Main concerns about synchronisation
Read (metadata) cache:
– NFS clients cache file attributes
  • last file access, last inode change, last file modification
– attributes seen through the NFS mount may be out of date
– any program that relies on file attributes may not work
– the read cache can be disabled, impacting read performance

Part 2
NFS & PostgreSQL

NFS & PostgreSQL
www.postgresql.org/docs/9.6/static/creating-cluster.html#CREATING-CLUSTER-NFS
• if the NFS client/server implementation does not provide standard file system semantics, this can cause reliability problems
• PostgreSQL advice:
  – NFS mount options
    • avoid soft-mounting: hard
  – General mount options
    • avoid asynchronous writes: sync
PostgreSQL assumes NFS behaves exactly like locally-connected drives

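The two pieces of advice above can be checked mechanically against a mount option string. This sketch only knows about soft/hard and sync/async and is not an exhaustive validator; the function names are hypothetical. On Linux, hard is already the default when neither option is given, so its absence is reported as a reminder rather than an error.

```python
def parse_mount_options(opts: str) -> dict:
    """Turn 'rw,hard,timeo=600' into {'rw': True, 'hard': True, 'timeo': '600'}."""
    parsed = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def check_postgres_nfs_advice(opts: str) -> list:
    """Return warnings where `opts` departs from the hard/sync advice."""
    o = parse_mount_options(opts)
    warnings = []
    if "soft" in o:
        warnings.append("soft mount: NFS calls can fail after a timeout; use 'hard'")
    elif "hard" not in o:
        warnings.append("'hard' not stated explicitly (Linux default, but worth spelling out)")
    if "async" in o or "sync" not in o:
        warnings.append("asynchronous writes allowed; add 'sync'")
    return warnings


if __name__ == "__main__":
    tested = "rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac"  # options used in the benchmarks
    print(check_postgres_nfs_advice(tested) or "no warnings")
```
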
Sources of corruption
hard mount option:
– NFS calls must be retried indefinitely
  • otherwise the sequentiality of both data and WAL entries may not be preserved
write cache:
– WAL has to be flushed as soon as a database action is committed
  • many processes are involved in flushing WAL to disk → several NFS clients → the sequentiality of WAL entries may not be preserved

(Other) sources of corruption
attribute cache:
– many processes are involved in flushing WAL to disk → several NFS clients → file attributes may not have consistent views

Enhance reliability
protocol:
– TCP
  • optimises the number of retransmitted packets on lossy networks
  • jumbo frames could be avoided (also on ~O(Gb/s) Ethernet)
block size:
– many processes contribute to the I/O load → optimise data transfer → reduce the number of client calls

Synchronous writes
sync vs. fsync=on:
– NFS v2:
  • if sync is specified, the server will not complete a request until both data and metadata are written to disk
– NFS v3 / v4:
  • if sync is specified, the server completes a request by returning the status of the write:
    – NFS_FILE_SYNC, NFS_DATA_SYNC, NFS_UNSTABLE
  • data is effectively flushed to disk once a sync-method system call is issued
    – be careful to set fsync=on

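On the client side, a durable write means either opening the file with O_DSYNC or following write() with fdatasync()/fsync(). PostgreSQL exposes the same choice through wal_sync_method (open_datasync, fdatasync, ...); the Python functions below only mirror that idea, are hypothetical names, and are not PostgreSQL's implementation. They assume a Unix platform.

```python
import os
import tempfile

BLCKSZ = 8192  # PostgreSQL's default block size


def write_open_datasync(path: str, data: bytes) -> None:
    """Like wal_sync_method=open_datasync: write() returns once data is durable."""
    # O_DSYNC is missing on some platforms; O_SYNC is the stricter fallback.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_DSYNC", os.O_SYNC)
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def write_then_fdatasync(path: str, data: bytes) -> None:
    """Like wal_sync_method=fdatasync: buffered write() followed by an explicit flush."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, data)
        flush = getattr(os, "fdatasync", os.fsync)  # fdatasync is not available everywhere
        flush(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    block = b"\0" * BLCKSZ
    # Point this at the NFS-mounted directory to exercise the mount options.
    with tempfile.TemporaryDirectory() as target_dir:
        write_open_datasync(os.path.join(target_dir, "dsync.demo"), block)
        write_then_fdatasync(os.path.join(target_dir, "fdatasync.demo"), block)
```

Either way the flush request reaches the NFS server; with fsync=off PostgreSQL skips these calls and durability depends entirely on the server's caching.
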
Reliability vs. Performance
• NFS exports must be mounted with safe options
• Are these options fine for a database?
  – is performance deeply impacted?
  – is data safely guaranteed in case of a crash?

Part 3
NFS & PostgreSQL performance & reliability benchmarks

The customer
(diagram: physical host, storage)
• Ethernet 10Gb/s, no routers
• 2 CPUs x 8GB RAM, RHEL 6, kernel v. 2.6.32, PostgreSQL 9.5.3
• NetApp FAS 8080, Full Flash, Clustered Mode
• volumes mounted via NFS v3
• NO paravirtualised network driver

What has been measured
• file system speeds: bonnie++
  – sequential reads (block reading)
  – sequential writes (block writing & block rewriting)
  – seek rate
• test for sync'ed writes: pg_test_fsync
  – µs needed by a single process to flush 8kB to disk
• database commit rate: pgbench
  – SELECT, INSERT, TPC-B
  – single & multi-client operations

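For a first impression before running pg_test_fsync itself, the sync'ed-write measurement can be approximated in a few lines: rewrite one 8 kB block repeatedly, with and without fsync, and report the average microseconds per operation. This is a rough stand-in under the assumption of a Unix platform (`os.pwrite`); `usec_per_write` is a hypothetical helper, and it does not compare the alternative sync methods that pg_test_fsync tests.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8 kB block


def usec_per_write(directory: str, iterations: int = 200, flush: bool = True) -> float:
    """Average microseconds to write (and optionally fsync) one 8 kB block.

    The same block is rewritten at offset 0 of a scratch file in `directory`.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.pwrite(fd, BLOCK, 0)
            if flush:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    target = tempfile.gettempdir()  # replace with a directory on the NFS mount
    synced = usec_per_write(target, flush=True)
    buffered = usec_per_write(target, flush=False)
    print(f"fsync'ed: {synced:.1f} us/op, buffered: {buffered:.1f} us/op")
```
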
A first configuration
• perform benchmarks with different mount options:
  – rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac
file system I/O speed:
– reads: ~50MB/s, writes: ~12MB/s
file system I/O speed with ac:
– reads: ~700MB/s, writes: ~16MB/s
forced flush on disk:
– sync'ed 8kB writes / non-sync'ed 8kB writes = ~40%

  • 36. DB performance
    • perform benchmarks with different mount options:
      – rw,hard,bg,timeo=600,proto=tcp,sync,nointr,noatime,noac
    • DB commit rates for different operations:
      – INSERT: noac/ac ~ 1 (single client & multi client)
      – SELECT: noac ~ 16X ac
      – TPC-B: noac ~ 2X ac
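The slides do not report the exact pgbench invocations. A minimal sketch of how the three workloads could be launched, assuming a database named "bench" already initialized with `pgbench -i -s <scale> bench`, a hypothetical custom script "insert.sql" for the INSERT test (pgbench has no built-in insert-only workload), and assumed client counts and duration:

```python
from __future__ import annotations

import shlex

# Hypothetical file name: a custom script (e.g. a single INSERT statement)
# passed to pgbench with -f.
INSERT_SCRIPT = "insert.sql"

WORKLOAD_FLAGS: dict[str, list[str]] = {
    "SELECT": ["-S"],                 # built-in select-only script
    "INSERT": ["-f", INSERT_SCRIPT],  # custom script
    "TPC-B": [],                      # built-in TPC-B-like script (the default)
}


def pgbench_command(workload: str, clients: int, dbname: str,
                    threads: int = 1, duration_s: int = 60) -> list[str]:
    """Build the pgbench argv for one workload and one number of clients."""
    return ["pgbench", *WORKLOAD_FLAGS[workload],
            "-c", str(clients),     # concurrent client sessions
            "-j", str(threads),     # pgbench worker threads
            "-T", str(duration_s),  # run for a fixed number of seconds
            dbname]


if __name__ == "__main__":
    for workload in WORKLOAD_FLAGS:
        for clients in (1, 16):  # single-client and multi-client runs (assumed counts)
            print(shlex.join(pgbench_command(workload, clients, "bench")))
```

Running the same command list once per mount configuration (noac, ac) keeps the comparison fair: only the NFS options change between runs.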
  • 37. A second configuration
    • increase block & packet size (using jumbo frames), and check the writes:
      – rw,hard,bg,timeo=600,proto=tcp,sync,wsize=65536,nointr,noatime,noac
      – rw,hard,bg,timeo=600,proto=tcp,sync,wsize=65536,nointr,noatime,ac
    • file system I/O speed:
      – writes: ~25MB/s
      [root@pgsql] ~# netstat -a | grep X.X.X.X
      tcp        0  25020 Y.Y.Y.Y:ftps-data     X.X.X.X:nfs     ESTABLISHED
    • forced flush to disk:
      – sync'ed 8kB writes are almost unchanged → no dependence on the write cache
  • 38. Reliability tests
    • try different crash scenarios under high concurrency load, check whether the instance recovers properly, then execute a VACUUM FULL:
      – kill -9 to the postmaster
      – forced reboot through the kernel's magic SysRq key:
        • echo 1 > /proc/sys/kernel/sysrq ; echo b > /proc/sysrq-trigger
      – power off of the VM from the VMware vSphere® remote panel
      – kill the TCP/IP connections through tcpkill:
        [root@pgsql] ~# tcpkill host X.X.X.X
        tcpkill: listening on eth0 [host X.X.X.X]
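The slide does not say how the postmaster PID was obtained for the kill -9 scenario. One way, sketched below under the assumption that the data directory path is known, is to read it from the first line of $PGDATA/postmaster.pid and send SIGKILL; the helper names are hypothetical.

```python
from __future__ import annotations

import os
import signal
from pathlib import Path


def postmaster_pid(pgdata: str | os.PathLike[str]) -> int:
    """The first line of $PGDATA/postmaster.pid holds the postmaster's PID."""
    lines = Path(pgdata, "postmaster.pid").read_text().splitlines()
    return int(lines[0].strip())


def crash_postmaster(pgdata: str | os.PathLike[str]) -> int:
    """Same effect as `kill -9 <pid>`: SIGKILL cannot be caught, so no shutdown
    checkpoint is written and the next start has to run crash recovery."""
    pid = postmaster_pid(pgdata)
    os.kill(pid, signal.SIGKILL)
    return pid
```

A test driver would call something like crash_postmaster("/var/lib/pgsql/data") (hypothetical path) while pgbench is running, restart the instance, wait for recovery to finish and then run VACUUM FULL, which reads and rewrites every table and so surfaces unreadable pages.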
  • 43. Reliability tests
    • T. Vondra – test of the persistence of recycled WAL segments on ext4
      – 56583BDD.9060302@2ndquadrant.com#56583BDD.9060302@2ndquadrant.com
        • the attributes of recycled WAL segments are updated → they are flushed to disk through fdatasync
        • fdatasync does not force the metadata to be flushed → the update may get lost after a crash
        • the logged changes end up in a file "in the future" → data loss!
      – github.com/2ndQuadrant/ext4-data-loss
        • INSERT/UPDATE new records in parallel and synchronously on the db and on a file
        • simulate a crash → compare db & file contents after crash recovery
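The comparison step of such a test can be sketched in Python. This is not the code in github.com/2ndQuadrant/ext4-data-loss: the helper names and the side-file layout are hypothetical, and db_ids stands for the ids read back from the table after crash recovery (for example with a SELECT). Because an id is written to the file only after the database has acknowledged the COMMIT, an id found in the file but not in the database is a committed transaction that was lost; the opposite case only means the crash hit between the COMMIT and the file write.

```python
from __future__ import annotations

import os
from collections.abc import Iterable


def log_committed_id(path: str, record_id: int) -> None:
    """Append an id to the side file only after the DB acknowledged COMMIT,
    and force it to disk so the file itself survives the crash."""
    with open(path, "a", encoding="ascii") as f:
        f.write(f"{record_id}\n")
        f.flush()             # Python buffer -> kernel
        os.fsync(f.fileno())  # kernel -> stable storage


def read_logged_ids(path: str) -> set[int]:
    """Read back every id recorded in the side file."""
    with open(path, encoding="ascii") as f:
        return {int(line) for line in f if line.strip()}


def lost_after_recovery(logged_ids: Iterable[int], db_ids: Iterable[int]) -> set[int]:
    """Ids acknowledged as committed (present in the file) but missing from the DB."""
    return set(logged_ids) - set(db_ids)
```

With synchronous_commit=on and a storage stack that honours flushes, lost_after_recovery() should always return an empty set.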
  • 46. Reliability tests
    • check NFS behaviour for recycled WAL segments (file attribute caching):
      – noac
        • several tests, no data loss!
      – ac
        • several tests, no data loss!
  • 48. Reliability tests
    All tests passed... but this does not guarantee that the setup is completely safe!
  • 49. The reaction of the customer
  • 50. NetApp® indications
    • J. Steiner (NetApp®)
      – Oracle® DB on VMs & NetApp® storage (NFS v3)
      – community.hpe.com/t5/Networking/Oracle-DB-over-NFS-on-Netapp-only-35MB-sec/td-p/4116343
        • avoid paravirtualised network drivers, expose volumes via NFS
        • allow default attribute caching (ac, acregmin=3, acregmax=60)
          – actimeo=0 only for consistent views in Oracle RAC
        • set a high timeo (timeo=600, i.e. 60 s) so the NFS client waits for the server's response
          – and retry the request indefinitely (hard)
        • use the largest transfer size (rsize/wsize) allowed by the kernel
        • do not allow interruptions during NFS file operations
    rw,bg,hard,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr
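To make the recommended options persistent they would typically go into /etc/fstab. A minimal sketch that renders such a line, where the server name, export path and mount point are hypothetical and vers= (a real nfs(5) option) is added only when requested. Note that nfs(5) documents intr/nointr as deprecated after kernel 2.6.25: on newer kernels only SIGKILL can interrupt a pending NFS operation and the option is ignored.

```python
from __future__ import annotations

# Options from the slide, in the same order.
NETAPP_OPTIONS = ("rw", "bg", "hard", "proto=tcp", "timeo=600",
                  "rsize=65536", "wsize=65536", "nointr")


def fstab_entry(server: str, export: str, mountpoint: str,
                options: tuple[str, ...] = NETAPP_OPTIONS,
                nfs_version: int | None = None) -> str:
    """Return one /etc/fstab line: spec, mount point, type, options, dump, pass."""
    opts = list(options)
    if nfs_version is not None:
        opts.append(f"vers={nfs_version}")  # e.g. 3, as in the NetApp setup
    return f"{server}:{export} {mountpoint} nfs {','.join(opts)} 0 0"


if __name__ == "__main__":
    # Hypothetical server, export and mount point.
    print(fstab_entry("nfs-server", "/vol/pgdata", "/var/lib/pgsql", nfs_version=3))
```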
  • 51. The reaction of the consultant
  • 53. Enhance reliability for PostgreSQL DBs
    • enable page checksums (9.3+): initdb --data-checksums ...
      – a checksum is computed for each 8kB data page (a 16-bit, FNV-1a-based checksum stored in the page header, not a CRC32)
        • a checksum failure means that the data page is corrupted
      – hint-bit changes are WAL-logged, as with wal_log_hints=on (9.4+)
        • the entire 8kB page is written to the WAL on its first modification after a checkpoint, even when only hint bits change
      – take the performance impact into account:
        • R: extra checksum calculation for every 8kB page
        • W: more information logged into the WAL
    • BUG (9.3+)! FSM & VM truncation not persisted when wal_log_hints=true:
      – https://wiki.postgresql.org/wiki/Free_Space_Map_Problems (P. Deolasee, H. Linnakangas)
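The detect-on-read principle behind page checksums can be sketched in a few lines. PostgreSQL's real page checksum is a 16-bit value computed with its own FNV-1a-based algorithm and stored in the page header; the sketch below swaps in zlib.crc32 purely as a stand-in, keeps the checksums in a separate list, and uses hypothetical helper names. As in PostgreSQL, the block number is mixed into the checksum.

```python
import struct
import zlib

PAGE_SIZE = 8192  # default PostgreSQL block size


def page_checksum(page: bytes, blkno: int) -> int:
    """Stand-in checksum (CRC32 via zlib), NOT PostgreSQL's algorithm.

    The block number is mixed in, so a correct page written at the wrong
    offset is also detected.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("expected an 8kB page")
    crc = zlib.crc32(page)
    return zlib.crc32(struct.pack("<I", blkno), crc)


def find_corrupted(pages: list[bytes], stored: list[int]) -> list[int]:
    """Recompute each page's checksum on read and report mismatching blocks."""
    return [blkno for blkno, (page, checksum) in enumerate(zip(pages, stored))
            if page_checksum(page, blkno) != checksum]


if __name__ == "__main__":
    pages = [bytes([i]) * PAGE_SIZE for i in range(4)]
    stored = [page_checksum(p, n) for n, p in enumerate(pages)]  # computed at write time
    pages[2] = b"\xff" + pages[2][1:]                             # simulate a corrupted byte
    print(find_corrupted(pages, stored))                          # -> [2]
```

Detection is all a checksum buys: the corrupted page still has to be restored from a backup or a standby.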
  • 54. Enhance performance for PostgreSQL DBs
    • sync'ed writes are slow, even if NFS caching is enabled
    • if possible, consider asynchronous commit (synchronous_commit=off):
      – the server returns success as soon as the transaction is logically completed
      – it can be set per user/session (or even per transaction)
      – WAL entries are flushed later, but no later than 3X wal_writer_delay
      – in case of a crash, the DB can be recovered to a consistent state, but there may be a window of data loss
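The size of the data-loss window follows directly from the rule above: with synchronous_commit=off, the PostgreSQL documentation bounds the risk window at three times wal_writer_delay, whose default is 200ms. A minimal Python sketch of that arithmetic, plus the SQL for each scope at which the setting can be changed (the helper names and the role name "app_user" are hypothetical):

```python
from __future__ import annotations

DEFAULT_WAL_WRITER_DELAY_MS = 200  # PostgreSQL's default wal_writer_delay


def max_loss_window_ms(wal_writer_delay_ms: int = DEFAULT_WAL_WRITER_DELAY_MS) -> int:
    """Worst-case age of committed-but-unflushed WAL with synchronous_commit=off."""
    return 3 * wal_writer_delay_ms


def disable_sync_commit_sql(scope: str, role: str | None = None) -> str:
    """SQL that turns synchronous_commit off for the given scope."""
    if scope == "role":
        if role is None:
            raise ValueError("a role name is required")
        quoted = '"' + role.replace('"', '""') + '"'  # quote as an SQL identifier
        return f"ALTER ROLE {quoted} SET synchronous_commit = off;"
    if scope == "session":
        return "SET synchronous_commit = off;"
    if scope == "transaction":
        return "SET LOCAL synchronous_commit = off;"  # only inside BEGIN ... COMMIT
    raise ValueError(f"unknown scope: {scope}")


if __name__ == "__main__":
    print(max_loss_window_ms())  # 600 with the default 200ms
    print(disable_sync_commit_sql("role", "app_user"))  # hypothetical role name
```

The per-role setting takes effect for new sessions of that role, which makes it a convenient way to relax durability only for workloads that can tolerate losing the last few hundred milliseconds of commits.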
  • 55. 3rd cfg: rw,bg,hard,proto=tcp,timeo=600,rsize=65536,wsize=65536,nointr
    • file system I/O speed:
      – reads: ~50MB/s → 700MB/s
      – writes: ~12MB/s → 700MB/s
    • DB commit rate and SELECT rate:
      – synchronous_commit=on:
        • INSERT: noac/ac ~ 1 (single & multi client); 3rd vs noac: +10% (single & multi client)
        • SELECT: noac ~ 16X ac; 3rd ~ 2X ac
      – synchronous_commit=off:
        • INSERT: noac/ac ~ 1 (single & multi client); 3rd ~ 8X noac (single & multi client)
        • SELECT: same as synchronous_commit=on
        • TPC-B: noac ~ 2X ac → 3rd ~ 1.8X ac
  • 57. Conclusions
    • NFS was not originally designed with reliability in mind
      – the protocol is designed to enhance performance
      – NFS v4 is preferable
    • PostgreSQL offers several countermeasures
      – at the very least, it is able to promptly detect data corruption
    • PostgreSQL can be used with NFS
      – provided you are ready to accept minimal data loss
  • 58. Creative Commons license
    This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License https://creativecommons.org/licenses/by-nc-sa/4.0/ © 2016 2ndQuadrant Italia – http://www.2ndquadrant.it