SlideShare a Scribd company logo
Data corruption
Where does it come from and what
can you do about it?
Tomas Vondra
tomas@2ndquadrant.com / tomas@pgaddict.com
data_checksums = on
Well, not quite ...
The larger and older a database is, the
more likely it's corrupted in some way.
Agenda
● What is data corruption?
● Sources of data corruption
● What to do about it
Data corruption
● many possible causes
● hardware
○ disks / memory
○ environment (cosmic rays, temperature, …)
● software
○ bugs in OS (kernel, fs, libc, ...)
○ bugs in database systems / applications
● administrator mistake
○ remove pg_xlog ...
naturally one-off events
cosmic rays = no clue
storage
storage corruption
● this used to be pretty common
○ crappy disks, RAID controllers
○ insufficient power-loss protection
● it got better over time
○ better disks, better RAID controllers
○ when it failed, it failed totally
● now it's getting worse, again :-(
○ crappy disks connected over network
(SAN, EBS, NFS, …)
○ layers of virtualization everywhere
software issues
Operating System
● PostgreSQL is very trusting
○ relies on a lot of stuff
○ assumes it's perfect
● nothing is perfect
○ kernel bugs
○ filesystem bugs
○ glibc bugs (collation updates, …)
○ ...
Collations
● rules for language-specific text sort
○ defined in glibc
○ change rarely, poor versioning
● indexes require text ordering to be stable
● what if you build index and then
○ upgrade to diffferent collations
○ replica has different collations
● ICU solves this
○ reliable versioning
#fsyncgate
● CHECKPOINT
○ flush data from shared_buffers to disc
○ discard old part of WAL
● in case of fsync error, retry
● assumes two things
○ reliable kernel error reporting
○ dirty data are kept in page cache
#fsyncgate
● fact #1: error reporting unreliable
○ behavior depends on kernel version
○ errors may be consumed by someone else
○ fixed in new kernel (>= 4.13)
● fact #2: dirty data are discarded after error
○ retry never really retries the write
○ replaced with elog(PANIC) forcing recovery
● a bunch of bugs in the error handling ;-)
○ error branches are the least tested code
NFS == cosmic rays
database systems
● broken indexes
○ violated constraints
○ index scan misses data, seq scan finds it
● bogus data
○ insufficient UTF-8 validation
○ corrupted varlena headers
● data loss
○ multixact bug
○ inappropriate removal of data
○ data made visible/invisible inappropriately
database bugs
ERROR: invalid memory alloc request size 1073741824
pilot error
DBA mistakes ....
● drop incorrect table
● free space by removing "logs" from pg_xlog
● let's compress everything in DATADIR
● take backups incorrectly
What can you do about it?
Preventing data corruption
● use good hardware
○ good hardware is cheaper than outages
○ no, desktop machines are not good choice
○ ECC RAM is a must
● update regularly
○ minor releases exist for a reason
● test it
○ Are the numbers way too good?
○ What happens in case of power-loss?
○ Does the virtualization honor fsyncs etc.?
Preventing data corruption
● make sure you have backups
○ take them and test them
○ consider higher retention periods
● consider doing extra checks on backups
○ pg_verify_checksums
○ application tests
● "prophylactic" pg_dump is a good idea
○ pg_dump > /dev/null
○ especially when using pg_basebackup
fixing data corruption
So you want to fix in-place ...
There's no universal recipe :-(
Recipe
1) Try restoring from a backup.
○ The backup may be corrupted too.
○ Maybe it'd take too long. (Well, …)
○ ...
2) If you need to fix a corrupted cluster …
○ Always make sure you have a copy.
○ Take detailed notes about each step.
○ Proceed methodically.
Recipe
3) Asses the extent of data corruption
○ Is it just a single object? What object?
○ How many pages/rows/...?
○ ...
4) Ad-hoc recovery steps
○ Rebuild corrupted indexes.
○ Extract as much data as possible. Select "around", use
loop with exception block, …
○ pageinspect is your friend
○ Zero corrupted pages using "dd" etc.
http://thebuild.com/presentations/corruption-war-stories-fosdem-2017.pdf
So, what about data checksums?
data checksums
checksum(page number, page contents)
● available since PostgreSQL 9.3
○ … so all supported versions have them
○ disabled by default
● protects against (some) storage system issues
○ changes to existing pages / torn pages
● has some overhead
○ a couple of %, depends on HW / workload
● correctness vs. availability trade-off
data checksums are not perfect
● can't detect various types of corruption
○ pages written to different files
○ "forgotten" writes
○ truncated files
○ PostgreSQL bugs (before the write)
○ table vs. index mismatch
● may detect (some) memory issues
How do you verify checksums?
● you have to read all the data
● regular SQL queries
○ only active set (but not "hot" data)
● pg_dump
○ no checks for indexes
● pg_basebackup
○ since PostgreSQL 11
● pg_verify_checksums
○ since PostgreSQL 11, offline only
Questions?

More Related Content

Similar to Data corruption

Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)
Ontico
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
RichardWarburton
 
If the data cannot come to the algorithm...
If the data cannot come to the algorithm...If the data cannot come to the algorithm...
If the data cannot come to the algorithm...
Robert Burrell Donkin
 
Pen Testing Development
Pen Testing DevelopmentPen Testing Development
Pen Testing Development
CTruncer
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
Scott Hernandez
 
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
OpenNebula Project
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC
 
Mongodb meetup
Mongodb meetupMongodb meetup
Mongodb meetup
Eytan Daniyalzade
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
DataStax Academy
 
Caching in
Caching inCaching in
Caching in
RichardWarburton
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
Red Hat Developers
 
Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodb
Deep Kapadia
 
Cloud accounting software uk
Cloud accounting software ukCloud accounting software uk
Cloud accounting software uk
Arcus Universe Ltd
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
Demi Ben-Ari
 
Understand and optimize Linux I/O
Understand and optimize Linux I/OUnderstand and optimize Linux I/O
Understand and optimize Linux I/O
Andrea Righi
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
Mukesh Singh
 
Piano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingPiano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processing
MartinStrycek
 
Case study of BtrFS: A fault tolerant File system
Case study of BtrFS: A fault tolerant File systemCase study of BtrFS: A fault tolerant File system
Case study of BtrFS: A fault tolerant File system
Kumar Amit Mehta
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
Tamas K Lengyel
 

Similar to Data corruption (20)

Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)Logging and ranting / Vytis Valentinavičius (Lamoda)
Logging and ranting / Vytis Valentinavičius (Lamoda)
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
 
If the data cannot come to the algorithm...
If the data cannot come to the algorithm...If the data cannot come to the algorithm...
If the data cannot come to the algorithm...
 
Pen Testing Development
Pen Testing DevelopmentPen Testing Development
Pen Testing Development
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
 
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
OpenNebulaConf2017EU: Torturing OpenNebula for Fun and Profit by Carlo Daffar...
 
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json  postgre-sql vs. mongodbPGConf APAC 2018 - High performance json  postgre-sql vs. mongodb
PGConf APAC 2018 - High performance json postgre-sql vs. mongodb
 
Mongodb meetup
Mongodb meetupMongodb meetup
Mongodb meetup
 
Scaling Cassandra for Big Data
Scaling Cassandra for Big DataScaling Cassandra for Big Data
Scaling Cassandra for Big Data
 
Caching in
Caching inCaching in
Caching in
 
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
How To Get The Most Out Of Your Hibernate, JBoss EAP 7 Application (Ståle Ped...
 
Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodb
 
Cloud accounting software uk
Cloud accounting software ukCloud accounting software uk
Cloud accounting software uk
 
Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"Monitoring Big Data Systems - "The Simple Way"
Monitoring Big Data Systems - "The Simple Way"
 
Understand and optimize Linux I/O
Understand and optimize Linux I/OUnderstand and optimize Linux I/O
Understand and optimize Linux I/O
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Piano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processingPiano Media - approach to data gathering and processing
Piano Media - approach to data gathering and processing
 
Case study of BtrFS: A fault tolerant File system
Case study of BtrFS: A fault tolerant File systemCase study of BtrFS: A fault tolerant File system
Case study of BtrFS: A fault tolerant File system
 
Dfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshopDfrws eu 2014 rekall workshop
Dfrws eu 2014 rekall workshop
 

More from Tomas Vondra

CREATE STATISTICS - What is it for? (PostgresLondon)
CREATE STATISTICS - What is it for? (PostgresLondon)CREATE STATISTICS - What is it for? (PostgresLondon)
CREATE STATISTICS - What is it for? (PostgresLondon)
Tomas Vondra
 
CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?
Tomas Vondra
 
DB vs. encryption
DB vs. encryptionDB vs. encryption
DB vs. encryption
Tomas Vondra
 
PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6
Tomas Vondra
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
Tomas Vondra
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAltPostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
Tomas Vondra
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
Tomas Vondra
 
Performance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyondPerformance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyond
Tomas Vondra
 
Postgresql na EXT3/4, XFS, BTRFS a ZFS
Postgresql na EXT3/4, XFS, BTRFS a ZFSPostgresql na EXT3/4, XFS, BTRFS a ZFS
Postgresql na EXT3/4, XFS, BTRFS a ZFS
Tomas Vondra
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
Tomas Vondra
 
Novinky v PostgreSQL 9.4 a JSONB
Novinky v PostgreSQL 9.4 a JSONBNovinky v PostgreSQL 9.4 a JSONB
Novinky v PostgreSQL 9.4 a JSONB
Tomas Vondra
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
Tomas Vondra
 
Výkonnostní archeologie
Výkonnostní archeologieVýkonnostní archeologie
Výkonnostní archeologie
Tomas Vondra
 
Český fulltext a sdílené slovníky
Český fulltext a sdílené slovníkyČeský fulltext a sdílené slovníky
Český fulltext a sdílené slovníky
Tomas Vondra
 
SSD vs HDD / WAL, indexes and fsync
SSD vs HDD / WAL, indexes and fsyncSSD vs HDD / WAL, indexes and fsync
SSD vs HDD / WAL, indexes and fsync
Tomas Vondra
 
Checkpoint (CSPUG 22.11.2011)
Checkpoint (CSPUG 22.11.2011)Checkpoint (CSPUG 22.11.2011)
Checkpoint (CSPUG 22.11.2011)
Tomas Vondra
 
Čtení explain planu (CSPUG 21.6.2011)
Čtení explain planu (CSPUG 21.6.2011)Čtení explain planu (CSPUG 21.6.2011)
Čtení explain planu (CSPUG 21.6.2011)
Tomas Vondra
 
Replikace (CSPUG 19.4.2011)
Replikace (CSPUG 19.4.2011)Replikace (CSPUG 19.4.2011)
Replikace (CSPUG 19.4.2011)
Tomas Vondra
 
PostgreSQL / Performance monitoring
PostgreSQL / Performance monitoringPostgreSQL / Performance monitoring
PostgreSQL / Performance monitoring
Tomas Vondra
 

More from Tomas Vondra (19)

CREATE STATISTICS - What is it for? (PostgresLondon)
CREATE STATISTICS - What is it for? (PostgresLondon)CREATE STATISTICS - What is it for? (PostgresLondon)
CREATE STATISTICS - What is it for? (PostgresLondon)
 
CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?CREATE STATISTICS - what is it for?
CREATE STATISTICS - what is it for?
 
DB vs. encryption
DB vs. encryptionDB vs. encryption
DB vs. encryption
 
PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAltPostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
PostgreSQL na EXT4, XFS, BTRFS a ZFS / OpenAlt
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Performance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyondPerformance improvements in PostgreSQL 9.5 and beyond
Performance improvements in PostgreSQL 9.5 and beyond
 
Postgresql na EXT3/4, XFS, BTRFS a ZFS
Postgresql na EXT3/4, XFS, BTRFS a ZFSPostgresql na EXT3/4, XFS, BTRFS a ZFS
Postgresql na EXT3/4, XFS, BTRFS a ZFS
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Novinky v PostgreSQL 9.4 a JSONB
Novinky v PostgreSQL 9.4 a JSONBNovinky v PostgreSQL 9.4 a JSONB
Novinky v PostgreSQL 9.4 a JSONB
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
 
Výkonnostní archeologie
Výkonnostní archeologieVýkonnostní archeologie
Výkonnostní archeologie
 
Český fulltext a sdílené slovníky
Český fulltext a sdílené slovníkyČeský fulltext a sdílené slovníky
Český fulltext a sdílené slovníky
 
SSD vs HDD / WAL, indexes and fsync
SSD vs HDD / WAL, indexes and fsyncSSD vs HDD / WAL, indexes and fsync
SSD vs HDD / WAL, indexes and fsync
 
Checkpoint (CSPUG 22.11.2011)
Checkpoint (CSPUG 22.11.2011)Checkpoint (CSPUG 22.11.2011)
Checkpoint (CSPUG 22.11.2011)
 
Čtení explain planu (CSPUG 21.6.2011)
Čtení explain planu (CSPUG 21.6.2011)Čtení explain planu (CSPUG 21.6.2011)
Čtení explain planu (CSPUG 21.6.2011)
 
Replikace (CSPUG 19.4.2011)
Replikace (CSPUG 19.4.2011)Replikace (CSPUG 19.4.2011)
Replikace (CSPUG 19.4.2011)
 
PostgreSQL / Performance monitoring
PostgreSQL / Performance monitoringPostgreSQL / Performance monitoring
PostgreSQL / Performance monitoring
 

Recently uploaded

The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
kalichargn70th171
 
The Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdfThe Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdf
mohitd6
 
Upturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in NashikUpturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in Nashik
Upturn India Technologies
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom KittEnhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Peter Caitens
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
dakas1
 
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
kgyxske
 
Boost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management AppsBoost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management Apps
Jhone kinadey
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
kalichargn70th171
 
Orca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container OrchestrationOrca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container Orchestration
Pedro J. Molina
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
Drona Infotech
 
Migration From CH 1.0 to CH 2.0 and Mule 4.6 & Java 17 Upgrade.pptx
Migration From CH 1.0 to CH 2.0 and  Mule 4.6 & Java 17 Upgrade.pptxMigration From CH 1.0 to CH 2.0 and  Mule 4.6 & Java 17 Upgrade.pptx
Migration From CH 1.0 to CH 2.0 and Mule 4.6 & Java 17 Upgrade.pptx
ervikas4
 
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
Luigi Fugaro
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptxOperational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
sandeepmenon62
 
Transforming Product Development using OnePlan To Boost Efficiency and Innova...
Transforming Product Development using OnePlan To Boost Efficiency and Innova...Transforming Product Development using OnePlan To Boost Efficiency and Innova...
Transforming Product Development using OnePlan To Boost Efficiency and Innova...
OnePlan Solutions
 
How GenAI Can Improve Supplier Performance Management.pdf
How GenAI Can Improve Supplier Performance Management.pdfHow GenAI Can Improve Supplier Performance Management.pdf
How GenAI Can Improve Supplier Performance Management.pdf
Zycus
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
Yara Milbes
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
gapen1
 

Recently uploaded (20)

The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
 
The Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdfThe Role of DevOps in Digital Transformation.pdf
The Role of DevOps in Digital Transformation.pdf
 
Upturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in NashikUpturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in Nashik
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom KittEnhanced Screen Flows UI/UX using SLDS with Tom Kitt
Enhanced Screen Flows UI/UX using SLDS with Tom Kitt
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
 
Boost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management AppsBoost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management Apps
 
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
 
Orca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container OrchestrationOrca: Nocode Graphical Editor for Container Orchestration
Orca: Nocode Graphical Editor for Container Orchestration
 
Mobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona InfotechMobile App Development Company In Noida | Drona Infotech
Mobile App Development Company In Noida | Drona Infotech
 
Migration From CH 1.0 to CH 2.0 and Mule 4.6 & Java 17 Upgrade.pptx
Migration From CH 1.0 to CH 2.0 and  Mule 4.6 & Java 17 Upgrade.pptxMigration From CH 1.0 to CH 2.0 and  Mule 4.6 & Java 17 Upgrade.pptx
Migration From CH 1.0 to CH 2.0 and Mule 4.6 & Java 17 Upgrade.pptx
 
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Data...
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
 
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptxOperational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
Operational ease MuleSoft and Salesforce Service Cloud Solution v1.0.pptx
 
Transforming Product Development using OnePlan To Boost Efficiency and Innova...
Transforming Product Development using OnePlan To Boost Efficiency and Innova...Transforming Product Development using OnePlan To Boost Efficiency and Innova...
Transforming Product Development using OnePlan To Boost Efficiency and Innova...
 
How GenAI Can Improve Supplier Performance Management.pdf
How GenAI Can Improve Supplier Performance Management.pdfHow GenAI Can Improve Supplier Performance Management.pdf
How GenAI Can Improve Supplier Performance Management.pdf
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
 
bgiolcb
bgiolcbbgiolcb
bgiolcb
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 

Data corruption

  • 1. Data corruption Where does it come from and what can you do about it? Tomas Vondra tomas@2ndquadrant.com / tomas@pgaddict.com
  • 4. The larger and older a database is, the more likely it's corrupted in some way.
  • 5. Agenda ● What is data corruption? ● Sources of data corruption ● What to do about it
  • 6. Data corruption ● many possible causes ● hardware ○ disks / memory ○ environment (cosmic rays, temperature, …) ● software ○ bugs in OS (kernel, fs, libc, ...) ○ bugs in database systems / applications ● administrator mistake ○ remove pg_xlog ...
  • 8. cosmic rays = no clue
  • 10. storage corruption ● this used to be pretty common ○ crappy disks, RAID controllers ○ insufficient power-loss protection ● it got better over time ○ better disks, better RAID controllers ○ when it failed, it failed totally ● now it's getting worse, again :-( ○ crappy disks connected over network (SAN, EBS, NFS, …) ○ layers of virtualization everywhere
  • 12. Operating System ● PostgreSQL is very trusting ○ relies on a lot of stuff ○ assumes it's perfect ● nothing is perfect ○ kernel bugs ○ filesystem bugs ○ glibc bugs (collation updates, …) ○ ...
  • 13. Collations ● rules for language-specific text sort ○ defined in glibc ○ change rarely, poor versioning ● indexes require text ordering to be stable ● what if you build index and then ○ upgrade to diffferent collations ○ replica has different collations ● ICU solves this ○ reliable versioning
  • 14. #fsyncgate ● CHECKPOINT ○ flush data from shared_buffers to disc ○ discard old part of WAL ● in case of fsync error, retry ● assumes two things ○ reliable kernel error reporting ○ dirty data are kept in page cache
  • 15. #fsyncgate ● fact #1: error reporting unreliable ○ behavior depends on kernel version ○ errors may be consumed by someone else ○ fixed in new kernel (>= 4.13) ● fact #2: dirty data are discarded after error ○ retry never really retries the write ○ replaced with elog(PANIC) forcing recovery ● a bunch of bugs in the error handling ;-) ○ error branches are the least tested code
  • 17. database systems ● broken indexes ○ violated constraints ○ index scan misses data, seq scan finds it ● bogus data ○ insufficient UTF-8 validation ○ corrupted varlena headers ● data loss ○ multixact bug ○ inappropriate removal of data ○ data made visible/invisible inappropriately
  • 18. database bugs ERROR: invalid memory alloc request size 1073741824
  • 20. DBA mistakes .... ● drop incorrect table ● free space by removing "logs" from pg_xlog ● let's compress everything in DATADIR ● take backups incorrectly
  • 21. What can you do about it?
  • 22. Preventing data corruption ● use good hardware ○ good hardware is cheaper than outages ○ no, desktop machines are not good choice ○ ECC RAM is a must ● update regularly ○ minor releases exist for a reason ● test it ○ Are the numbers way too good? ○ What happens in case of power-loss? ○ Does the virtualization honor fsyncs etc.?
  • 23. Preventing data corruption ● make sure you have backups ○ take them and test them ○ consider higher retention periods ● consider doing extra checks on backups ○ pg_verify_checksums ○ application tests ● "prophylactic" pg_dump is a good idea ○ pg_dump > /dev/null ○ especially when using pg_basebackup
  • 25. So you want to fix in-place ...
  • 26. There's no universal recipe :-(
  • 27. Recipe 1) Try restoring from a backup. ○ The backup may be corrupted too. ○ Maybe it'd take too long. (Well, …) ○ ... 2) If you need to fix a corrupted cluster … ○ Always make sure you have a copy. ○ Take detailed notes about each step. ○ Proceed methodically.
  • 28. Recipe 3) Asses the extent of data corruption ○ Is it just a single object? What object? ○ How many pages/rows/...? ○ ... 4) Ad-hoc recovery steps ○ Rebuild corrupted indexes. ○ Extract as much data as possible. Select "around", use loop with exception block, … ○ pageinspect is your friend ○ Zero corrupted pages using "dd" etc.
  • 30. So, what about data checksums?
  • 31. data checksums checksum(page number, page contents) ● available since PostgreSQL 9.3 ○ … so all supported versions have them ○ disabled by default ● protects against (some) storage system issues ○ changes to existing pages / torn pages ● has some overhead ○ a couple of %, depends on HW / workload ● correctness vs. availability trade-off
  • 32. data checksums are not perfect ● can't detect various types of corruption ○ pages written to different files ○ "forgotten" writes ○ truncated files ○ PostgreSQL bugs (before the write) ○ table vs. index mismatch ● may detect (some) memory issues
  • 33. How do you verify checksums? ● you have to read all the data ● regular SQL queries ○ only active set (but not "hot" data) ● pg_dump ○ no checks for indexes ● pg_basebackup ○ since PostgreSQL 11 ● pg_verify_checksums ○ since PostgreSQL 11, offline only