Database Recovery
Creating an Automation Plan for Restoration
2
Preparation
+ Note database size, Postgres configuration
+ Enable archiving of database transactions
+ Continuous archive of WAL segments
+ Optional: Create restore points for PITR
+ Backup control function:
pg_create_restore_point(name)
+ Can be done on each deploy
3
Initial Preparation
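A minimal sketch of creating a restore point on each deploy, assuming the pg gem and a superuser connection opened elsewhere. The naming scheme and both helper names are hypothetical; only pg_create_restore_point(name) comes from the slide.

```ruby
# Hypothetical naming scheme: deploy tag plus a UTC timestamp keeps names unique.
def restore_point_name(deploy_tag, now = Time.now.utc)
  "deploy-#{deploy_tag}-#{now.strftime('%Y%m%dT%H%M%SZ')}"
end

# `conn` is any object responding to exec_params(sql, params), for example a
# PG::Connection from the pg gem (assumed to be opened by the caller).
def create_restore_point(conn, name)
  # pg_create_restore_point(name) returns the WAL location of the new point.
  result = conn.exec_params('SELECT pg_create_restore_point($1)', [name])
  result[0]['pg_create_restore_point']
end
```

The returned name can later be used as recovery_target_name when restoring.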
+ Default logging depends on the packages used
+ Likely to be syslog or stderr
+ Have to use log_line_prefix to specify what’s included
+ Can specify CSV format
+ Import to a table if needed
+ Don’t need to specify what’s reported: all information is output
4
Logging
+ In postgresql.conf:
+ logging_collector = on (requires restart)
+ log_destination = 'csvlog'
+ log_directory = '/var/log/postgresql'
+ log_filename = 'postgresql-%a.log'
5
Logging
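With csvlog enabled, PostgreSQL replaces a trailing ".log" in log_filename with ".csv" for the CSV file. A small sketch of how a script might compute today's CSV log path from the settings above. The helper name is hypothetical, and it assumes the server writes English weekday abbreviations for %a.

```ruby
require 'date'

LOG_DIRECTORY = '/var/log/postgresql' # log_directory from the slide
LOG_FILENAME  = 'postgresql-%a.log'   # log_filename from the slide

# Expands the strftime pattern for the given date and swaps the ".log"
# suffix for ".csv", which is where the csvlog output ends up.
def csv_log_path(date = Date.today)
  name = date.strftime(LOG_FILENAME).sub(/\.log\z/, '.csv')
  File.join(LOG_DIRECTORY, name)
end
```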
+ Records of every change made to the database's data files
+ Postgres maintains a write-ahead log in the pg_xlog/ subdirectory of the cluster’s data directory (renamed pg_wal/ in PostgreSQL 10)
+ Can "replay" the log entries
6
Write Ahead Log (WAL) Files
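WAL segment file names are 24 hexadecimal digits: 8 for the timeline, 8 for the log file number, and 8 for the segment. A small sketch for decoding them when reading archive listings; the struct and function names are hypothetical.

```ruby
WalSegment = Struct.new(:timeline, :log, :segment)

# Splits a 24-hex-digit WAL segment name into its three numeric parts.
def parse_wal_segment(name)
  match = /\A(\h{8})(\h{8})(\h{8})\z/.match(name)
  raise ArgumentError, "not a WAL segment name: #{name}" unless match

  WalSegment.new(*match.captures.map { |hex| hex.to_i(16) })
end
```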
+ https://github.com/wal-e/wal-e
+ Continuous WAL archiving Python tool
+ sudo python3 -m pip install wal-e[aws,azure,google,swift]
+ Works on most operating systems
+ Can push to S3, Azure Blob Store, Google Storage, Swift
7
Archiving WAL segments
+ If using cloud-based solution, ensure proper roles and
permissions for storing and retrieving
+ S3: IAM user roles and bucket policies
+ Azure: Custom Role-Based Access Control
+ Google Cloud Storage: Access Control Lists
+ Ensure master can access and write to bucket, backup
can access and read
+ Don’t use your root keys!
8
Storing WAL Files
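For S3, the write/read split between master and backup could be sketched as two IAM policy documents. The bucket and prefix names are hypothetical, and the action lists are a starting assumption, likely incomplete (wal-e delete, for instance, would also need s3:DeleteObject).

```ruby
require 'json'

# Builds an IAM policy document (as a Hash) scoped to one bucket prefix.
# writable: true adds s3:PutObject for the server that pushes backups.
def wal_bucket_policy(bucket:, prefix:, writable:)
  object_actions = ['s3:GetObject']
  object_actions << 's3:PutObject' if writable
  {
    'Version' => '2012-10-17',
    'Statement' => [
      { 'Effect' => 'Allow', 'Action' => ['s3:ListBucket'],
        'Resource' => "arn:aws:s3:::#{bucket}" },
      { 'Effect' => 'Allow', 'Action' => object_actions,
        'Resource' => "arn:aws:s3:::#{bucket}/#{prefix}/*" }
    ]
  }
end

# Master writes and reads; the backup (restore) server only reads.
master_policy = wal_bucket_policy(bucket: 'example-wal-archive', prefix: 'production-database', writable: true)
backup_policy = wal_bucket_policy(bucket: 'example-wal-archive', prefix: 'production-database', writable: false)
puts JSON.pretty_generate(backup_policy)
```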
Key commands:
backup-fetch
backup-push
wal-fetch
wal-push
delete
wal-e continuous archiving tool setup
9
/etc/wal-e.d/env environment
variables (for S3):
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
WALE_S3_PREFIX
10
wal-e key commands
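envdir reads one file per variable: the file name is the variable name and the first line is the value. A sketch of populating a directory such as /etc/wal-e.d/env from a hash; the helper name and the permission choices are assumptions.

```ruby
require 'fileutils'

# Writes one file per environment variable into `dir`.
# Files are created owner-only because they hold credentials.
def write_envdir(dir, vars)
  FileUtils.mkdir_p(dir, mode: 0o750)
  vars.each do |name, value|
    File.write(File.join(dir, name), "#{value}\n", perm: 0o600)
  end
end
```

A call might look like `write_envdir('/etc/wal-e.d/env', 'AWS_REGION' => 'us-east-1', 'WALE_S3_PREFIX' => 's3://example-bucket/production-database')`, with the secret values supplied from wherever credentials are managed.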
+ Pushes a base backup to storage
+ Point to Postgres directory
+ envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e --terse backup-push /var/lib/pg/9.6/main
+ Recommend adding to a daily cron job
11
backup-push
+ List base backups
+ Should be able to run as the Postgres user
+ Useful to test out wal-e configuration
12
backup-list
13
+ Restores a base backup from storage
+ Allows keyword LATEST for latest base
backup
+ Can specify a backup from backup-list
+ envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e backup-fetch /var/lib/postgresql/9.6/main LATEST
14
backup-fetch
+ Delete data from storage
+ Needs --confirm flag
+ Also accepts --dry-run
+ Accepts 'before', 'retain', 'everything'
+ wal-e delete [--confirm] retain 5
+ Delete all backups and segment files older
than the 5 most recent
15
delete
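Because delete is destructive, a retention job might default to --dry-run and require an explicit opt-in for --confirm. A sketch that only builds the argument list; the helper name is hypothetical and the paths are the ones used on the slides.

```ruby
WAL_E_BIN   = '/usr/local/wal-e/bin/wal-e'
WAL_E_ENV   = '/etc/wal-e.d/env'

# Returns the argv for `wal-e delete ... retain N`.
# Without confirm: true the command is a dry run and deletes nothing.
def delete_retain_command(keep, confirm: false)
  raise ArgumentError, 'keep must be a positive integer' unless keep.is_a?(Integer) && keep.positive?

  flag = confirm ? '--confirm' : '--dry-run'
  ['envdir', WAL_E_ENV, WAL_E_BIN, 'delete', flag, 'retain', keep.to_s]
end
```

The array can be passed to `system(*delete_retain_command(5))`, which avoids shell quoting issues.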
+ Use in the backup database’s recovery.conf file (as restore_command) to fetch WAL files
+ Accepts --prefetch parameter
+ Downloads upcoming WAL files in the background while earlier ones are being replayed
+ 8 WAL files by default, can increase
16
wal-fetch
+ Set as archive_command in master database
server configuration
+ Increase throughput by pooling WAL segments
together to send in groups
+ --pool-size parameter available (defaults to 8
as of version 0.7)
17
wal-push
+ archive_mode = on
+ Defaults to off; a database restart is required for a change to take effect.
+ archive_command = 'envdir /etc/wal-e.d/env/ /usr/local/wal-e/bin/wal-e --terse wal-push %p'
+ %p = path (relative to the data directory), including the file name, of the WAL segment to be archived
18
Archiving WAL segments using wal-e
+ Avoid storing secret information in postgresql.conf
+ PostgreSQL users can query the pg_settings view and see archive_command
+ envdir as alternative
+ Allows command to use files as environment variables with
the name as the key
+ Part of daemontools
+ Available in Debian, can write a wrapper script if not easily
installable
19
envdir
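Where envdir is not easily installable, a wrapper might look like the sketch below. It is a simplified stand-in, not a full reimplementation (real envdir, for example, unsets a variable when its file is empty), and both function names are hypothetical.

```ruby
# Reads each regular file in `dir` into a Hash: file name => first line.
def load_envdir(dir)
  Dir.children(dir).sort.each_with_object({}) do |name, env|
    path = File.join(dir, name)
    next unless File.file?(path)

    first_line = File.open(path, &:gets).to_s
    env[name] = first_line.chomp
  end
end

# Replaces the current process with `command`, adding the variables from
# `dir` to its environment. Does not return on success.
def exec_with_envdir(dir, *command)
  exec(load_envdir(dir), *command)
end
```

A script built on this could be called as `exec_with_envdir('/etc/wal-e.d/env', '/usr/local/wal-e/bin/wal-e', 'backup-list')`.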
S3 Archive
20
21
Restoring the Database
+ Spin up a server
+ Configure PostgreSQL settings
+ Create a recovery.conf file
+ Begin backup fetch
+ Start Postgres
+ Perform sample queries
+ Notify on success
22
Automated Restoration Script
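The step list above could be driven by a small runner that logs each step and stops at the first failure. This is a sketch, not the script from the talk; the step bodies are placeholders to be replaced with real work.

```ruby
require 'logger'

# Runs [description, callable] pairs in order. A step succeeds when its
# callable returns a truthy value; the first failure aborts the run.
def run_restore(steps, logger: Logger.new($stdout))
  steps.each do |description, action|
    logger.info(description)
    next if action.call

    logger.error("Step failed: #{description}")
    return false
  end
  true
end

# Placeholder steps mirroring the slide; each lambda would do the real work.
RESTORE_STEPS = [
  ['Setting up configuration files', -> { true }],
  ['Beginning backup fetch',         -> { true }],
  ['Starting postgres',              -> { true }],
  ['Running sample queries',         -> { true }],
  ['Reporting results',              -> { true }]
].freeze
```

`run_restore(RESTORE_STEPS)` returns true only when every step succeeds, which makes it easy to notify on success or failure.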
23
+ Script starts up an EC2 instance in AWS
+ Loads a custom AMI containing the environment variables and the scripts for setting up Postgres and starting the restoration
24
Spinning up a server
25
Configure PostgreSQL settings
Create a recovery.conf file
Start backup fetch
Start Postgres
Perform sample queries
Notify on success
Automated Restoration Script
26
I, [2016-08-17T20:54:16.516658 #9196] INFO -- : Setting up configuration files
I, [2016-08-17T20:55:30.782533 #9300] INFO -- : Setup complete. Beginning backup fetch.
I, [2016-08-18T21:12:05.646145 #29825] INFO -- : Backup fetch complete.
I, [2016-08-18T22:20:06.445003 #29825] INFO -- : Starting postgres.
I, [2016-08-18T22:12:07.082780 #29825] INFO -- : Postgres started. Restore under way
I, [2016-08-18T24:12:07.082855 #29825] INFO -- : Restore complete. Reporting to Datadog
+ Install Postgres, tune postgresql.conf
+ Create recovery.conf
+ Done with script or configuration
management/orchestration tool
+ May be quicker to start up with script
27
Configure Postgres Settings
cat /var/lib/postgresql/9.6/main/recovery.conf
restore_command = 'envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e --terse wal-fetch "%f" "%p"'
recovery_target_timeline = 'latest'
+ If point in time: recovery_target_time = '2017-01-13 13:00:00'
recovery_target_name = 'deploy tag'
28
recovery.conf setup
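A restore script might render recovery.conf from a template so the recovery target can vary per run. A sketch for the 9.x-era recovery.conf (the file was removed in PostgreSQL 12); the function name is hypothetical, and it uses the lowercase `latest` spelling from the PostgreSQL documentation.

```ruby
WAL_FETCH_COMMAND =
  'envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e --terse wal-fetch "%f" "%p"'

# Returns recovery.conf contents. At most one recovery target may be given:
# a timestamp (target_time) or a named restore point (target_name).
def recovery_conf(target_time: nil, target_name: nil)
  raise ArgumentError, 'give target_time or target_name, not both' if target_time && target_name

  settings = {
    'restore_command' => WAL_FETCH_COMMAND,
    'recovery_target_timeline' => 'latest'
  }
  settings['recovery_target_time'] = target_time if target_time
  settings['recovery_target_name'] = target_name if target_name

  # Single quotes inside a value are escaped by doubling them.
  settings.map { |key, value| "#{key} = '#{value.gsub("'", "''")}'" }.join("\n") + "\n"
end
```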
wal_e.main INFO MSG: starting WAL-E
DETAIL: The subcommand is "backup-fetch".
STRUCTURED: time=2017-02-16T16:22:33.088767-00 pid=5444
wal_e.worker.s3.s3_worker INFO MSG: beginning partition download
DETAIL: The partition being downloaded is part_00000000.tar.lzo.
HINT: The absolute S3 key is production-database/basebackups_005/base_000000010000230C00000039_00010808/tar_partitions/part_00000000.tar.lzo.
29
fetch log output
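The DETAIL lines in this output can serve as a rough progress signal during backup-fetch. A sketch that extracts partition numbers from log lines; the names are hypothetical, and since the total partition count is not in the log, this shows activity, not percent complete.

```ruby
PARTITION_LINE = /The partition being downloaded is part_(\d+)\.tar\.lzo/

# Returns the partition numbers mentioned in the given log lines, in order.
def downloaded_partitions(log_lines)
  log_lines.filter_map { |line| line[PARTITION_LINE, 1]&.to_i }
end
```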
30
+ "archive recovery complete" text in csv log
+ recovery.conf file -> recovery.done
31
Checking for Completion
def restore_complete?
day = Date.today.strftime('%a')
! `less /var/log/postgresql/postgresql-#{day}.csv | grep "archive r
end
+ 2017-03-02 21:52:44.282 UTC,,,5292,,58b89426.14ac,12,,2017-03-02 21:52:38 UTC,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""
+ 2017-03-02 21:52:44.386 UTC,,,5292,,58b89426.14ac,13,,2017-03-02 21:52:38 UTC,1/0,0,LOG,00000,"MultiXact member wraparound protections are now enabled",,,,,,,,,""
+ 2017-03-02 21:52:44.389 UTC,,,5290,,58b89426.14aa,3,,2017-03-02 21:52:38 UTC,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
+ 2017-03-02 21:52:44.389 UTC,,,5592,,58b8942c.15d8,1,,2017-03-02 21:52:44 UTC,,0,LOG,00000,"autovacuum launcher started",,,,,,,,,""
32
Checking for Completion
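Both completion signals (the log message and the recovery.conf rename) can be checked without shelling out. This is a sketch under assumed paths and with hypothetical helper names, not the code from the slide, which is truncated in this transcript.

```ruby
COMPLETION_MESSAGE = 'archive recovery complete'

# True when the CSV log contains the completion message.
def recovery_logged_complete?(csv_log_path)
  return false unless File.exist?(csv_log_path)

  File.foreach(csv_log_path).any? { |line| line.include?(COMPLETION_MESSAGE) }
end

# True when PostgreSQL (9.x) has renamed recovery.conf to recovery.done.
def recovery_conf_done?(data_dir)
  File.exist?(File.join(data_dir, 'recovery.done')) &&
    !File.exist?(File.join(data_dir, 'recovery.conf'))
end

def restore_complete?(csv_log_path, data_dir)
  recovery_logged_complete?(csv_log_path) && recovery_conf_done?(data_dir)
end
```

Requiring both signals guards against a stale message left in a weekday-rotated log file from an earlier run.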
+ Run queries against database
+ Timestamps of frequently updated tables
33
Checking for Completion
34
Checking for Completion
def latest_session_page_timestamp
  PG.connect(dbname: 'procore', user: 'postgres').e
  DESC LIMIT 1;")[0]["created_at"]
end
35
Checking for Completion
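Once the newest timestamp of a frequently updated table has been fetched, it can be compared with the point the restore was expected to reach. A sketch with a hypothetical helper and an assumed 15-minute tolerance.

```ruby
require 'time'

# True when the newest row is no older than `tolerance_seconds` before the
# expected recovery point. Both arguments may be Time objects or strings.
def fresh_enough?(latest_created_at, expected_at, tolerance_seconds: 900)
  latest = Time.parse(latest_created_at.to_s)
  expected = Time.parse(expected_at.to_s)
  latest >= expected - tolerance_seconds
end
```

A stale timestamp suggests that WAL replay stopped earlier than intended, even if the server accepts connections.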
DETAIL: The partition being downloaded is part_000000
`cat /var/log/syslog | grep "The partition being down
36
Reporting Completion
def report_back_results
  Datadog::Statsd.new('localhost', 8125).event("Re
end
37
Reporting Completion
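The slide uses the Datadog::Statsd client. As an alternative for hosts without that gem, the sketch below formats a DogStatsD event datagram by hand and sends it over UDP to a local agent. The helper names are hypothetical, and text containing newlines would need escaping, which is omitted here.

```ruby
require 'socket'

# Formats an event datagram: _e{title_bytes,text_bytes}:title|text|t:alert_type
def dogstatsd_event(title, text, alert_type: 'success')
  "_e{#{title.bytesize},#{text.bytesize}}:#{title}|#{text}|t:#{alert_type}"
end

# Sends one datagram to the agent; UDP is fire-and-forget, so a missing
# agent is not detected here.
def send_event(datagram, host: 'localhost', port: 8125)
  socket = UDPSocket.new
  socket.send(datagram, 0, host, port)
ensure
  socket&.close
end
```

For example, `send_event(dogstatsd_event('Restore complete', 'Sample queries passed'))` on success, and the same call with `alert_type: 'error'` on failure.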
38
Things to look out for
+ Incompatible configurations for Postgres recovery
server vs master db server
+ Instance not large enough to hold recovered db
+ Incorrect keys for wal-e configuration
+ Check Postgres logs for troubleshooting!
39
Things to look out for
40
+ Run through script, ssh to server periodically to
check in on logs
+ Double-check final recorded transaction log,
frequently updated table timestamp
+ Don’t wait for something to go wrong to test this!
+ Untested backups are not backups!
41
Testing Notes
42
Questions?
(Also, hi, yes, Procore is hiring!)
Tweet at me @enkei9
Email at:
sre@procore.com
nina@procore.com

Automating Disaster Recovery PostgreSQL

  • 1. Database Recovery Creating an Automation Plan for Restoration
  • 3. + Note database size, Postgres configuration + Enable archiving of database transactions + Continuous archive of WAL segments + Optional: Create restore points for PITR + Backup control function: pg_create_restore_point(name) + Can be done on each deploy 3 Initial Preparation
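Creating a restore point on each deploy could be scripted. The following is a minimal sketch: the helper name `restore_point_sql` and the `deploy-<date>` naming scheme are assumptions for illustration. It only builds the SQL text and leaves execution to whichever client the deploy tooling already uses.

```ruby
# Builds the SQL that creates a named restore point.
# The helper name and the "deploy-<date>" naming scheme are assumptions
# for illustration, not part of the original deck.
def restore_point_sql(name)
  raise ArgumentError, 'restore point name must not be empty' if name.to_s.strip.empty?

  # Double any single quotes so the name is a valid SQL string literal.
  escaped = name.to_s.gsub("'", "''")
  "SELECT pg_create_restore_point('#{escaped}');"
end

puts restore_point_sql('deploy-2017-01-13')
```

The same name can later be given as `recovery_target_name` when restoring to that deploy.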
  • 4. + Default logging depends on used packages + Likely to be syslog or stderr + Have to use log_line_prefix to specify what’s included + Can specify CSV format + Import to a table if needed + Don’t need to specify what’s reported — all information outputted 4 Logging
  • 5. + In postgresql.conf: + logging_collector = on (requires restart) + log_destination = 'csvlog' + log_directory = '/var/log/postgresql' + log_filename = 'postgresql-%a.log' 5 Logging
  • 6. + Records of every change made to the database's data files + Postgres maintains a write ahead log in the pg_xlog/ subdirectory of cluster’s data directory + Can "replay" the log entries 6 Write Ahead Log (WAL) Files
  • 7. + https://github.com/wal-e/wal-e + Continuous WAL archiving Python tool + sudo python3 -m pip install wal-e[aws,azure,google,swift] + Works on most operating systems + Can push to S3, Azure Blob Store, Google Storage, Swift 7 Archiving WAL segments
  • 8. + If using cloud-based solution, ensure proper roles and permissions for storing and retrieving + S3: IAM user roles and bucket policies + Azure: Custom Role-Based Access Control + Google Cloud Store: Access Control Lists + Ensure master can access and write to bucket, backup can access and read + Don’t use your root keys! 8 Storing WAL Files
  • 9. Key commands: backup-fetch backup-push wal-fetch wal-push delete wal-e continuous archiving tool setup 9 /etc/wal-e.d/env environment variables (for S3): AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION WALE_S3_PREFIX
  • 11. + Pushes a base backup to storage + Point to Postgres directory + envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e --terse backup-push /var/lib/pg/9.6/main + Recommend adding to a daily cron job 11 backup-push
  • 12. + List base backups + Should be able to run as the Postgres user + Useful to test out wal-e configuration 12 backup-list
  • 14. + Restores a base backup from storage + Allows keyword LATEST for latest base backup + Can specify a backup from backup-list + envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e backup-fetch /var/lib/postgresql/9.6/main LATEST 14 backup-fetch
  • 15. + Delete data from storage + Needs --confirm flag + Also accepts --dry-run + Accepts 'before', 'retain', 'everything' + wal-e delete [--confirm] retain 5 + Delete all backups and segment files older than the 5 most recent 15 delete
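Because `delete` is destructive, a retention job might preview with `--dry-run` before running with `--confirm`. The sketch below only builds the argument list. It uses the flags named on the slide and the envdir paths from the other slides; the helper name `wal_e_delete_retain` is hypothetical.

```ruby
# Builds argv for `wal-e delete ... retain N`, wrapped in envdir as in the
# deck's other commands. Paths are the ones used on the slides; the helper
# name is hypothetical.
ENVDIR_PREFIX = ['envdir', '/etc/wal-e.d/env', '/usr/local/wal-e/bin/wal-e'].freeze

def wal_e_delete_retain(count, confirm: false)
  raise ArgumentError, 'count must be a positive integer' unless count.is_a?(Integer) && count.positive?

  # Without confirm: true we pass --dry-run so nothing is deleted.
  mode = confirm ? '--confirm' : '--dry-run'
  ENVDIR_PREFIX + ['delete', mode, 'retain', count.to_s]
end

puts wal_e_delete_retain(5).join(' ')
puts wal_e_delete_retain(5, confirm: true).join(' ')
```

Returning an array rather than a string lets the caller pass it to `system(*argv)` without shell quoting concerns.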
  • 16. + Use in backup db’s recovery.conf file to fetch WAL files + Accepts --prefetch parameter + Download more WAL files as time is spent recovering + 8 WAL files by default, can increase 16 wal-fetch
  • 17. + Set as archive_command in master database server configuration + Increase throughput by pooling WAL segments together to send in groups + --pool-size parameter available (defaults to 8 as of version 0.7) 17 wal-push
  • 18. + archive_mode = on + Defaults to off; the database must be restarted for the change to take effect. + archive_command = 'envdir /etc/wal-e.d/env/ /usr/local/wal-e/bin/wal-e --terse wal-push %p' + %p = relative path and the filename of the WAL segment to be archived 18 Archiving WAL segments using wal-e
  • 19. + Avoid storing secret information in postgresql.conf + PostgreSQL users can check pg_settings table and see archive_command + envdir as alternative + Allows command to use files as environment variables with the name as the key + Part of daemontools + Available in Debian, can write a wrapper script if not easily installable 19 envdir
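Where envdir is not easily installable, the wrapper-script idea could look roughly like this. It is a simplified sketch, not a drop-in replacement: the helper name `load_envdir` is hypothetical, and real envdir has extra rules (for example, an empty file unsets the variable) that are not reproduced here.

```ruby
require 'tmpdir'

# Reads a directory in the spirit of envdir: each file name becomes a
# variable name and the file's first line becomes its value.
# Simplified sketch: envdir's extra rules are not reproduced here.
def load_envdir(dir)
  Dir.children(dir).sort.each_with_object({}) do |name, env|
    path = File.join(dir, name)
    next unless File.file?(path)

    first_line = File.open(path, &:gets) || ''
    env[name] = first_line.rstrip
  end
end

# Usage with a throwaway directory and a placeholder value.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'WALE_S3_PREFIX'), "s3://example-bucket/pg\n")
  env = load_envdir(dir)
  puts env.inspect
  # A wrapper script could then run: exec(env, 'wal-e', *ARGV)
end
```

Keeping the secrets in files means they never appear in `archive_command`, which is the point the slide makes about `pg_settings`.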
  • 22. + Spin up a server + Configure Postgresql settings + Create a recovery.conf file + Begin backup fetch + Start Postgres + Perform sample queries + Notify on success 22 Automated Restoration Script
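The on-server steps listed above can be sketched as a small runner that logs each step and stops at the first failure. The step names mirror the slide; the lambdas are placeholders (assumptions) standing in for the real work, and `run_restore` is a hypothetical name.

```ruby
require 'logger'

# Runs named steps in order and stops at the first one that fails.
# Each step is a callable that returns true on success.
def run_restore(steps, logger)
  steps.each do |name, action|
    logger.info("Starting: #{name}")
    unless action.call
      logger.error("Failed: #{name}")
      return false
    end
  end
  logger.info('Restore complete')
  true
end

# Placeholder steps; real ones would shell out to wal-e, pg_ctl, and so on.
steps = {
  'Configure PostgreSQL settings' => -> { true },
  'Create recovery.conf'          => -> { true },
  'Backup fetch'                  => -> { true },
  'Start Postgres'                => -> { true },
  'Sample queries'                => -> { true }
}

run_restore(steps, Logger.new($stdout))
```

The return value gives the caller a single place to hook in the success notification.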
  • 24. + Script starts up EC2 instance in AWS + Loads custom AMI with scripts for setting up Postgres and starting the restoration, environment variables 24 Spinning up a server
  • 26. Configure Postgresql settings Create a recovery.conf file Start backup fetch Start Postgres Perform sample queries Notify on success Automated Restoration Script 26 I, [2016-08-17T20:54:16.516658 #9196] INFO -- : Setting up configuration files I, [2016-08-17T20:55:30.782533 #9300] INFO -- : Setup complete. Beginning backup fetch. I, [2016-08-18T21:12:05.646145 #29825] INFO -- : Backup fetch complete. I, [2016-08-18T22:20:06.445003 #29825] INFO -- : Starting postgres. I, [2016-08-18T22:12:07.082780 #29825] INFO -- : Postgres started. Restore under way I, [2016-08-18T24:12:07.082855 #29825] INFO -- : Restore complete. Reporting to Datadog
  • 27. + Install Postgres, tune postgresql.conf + Create recovery.conf + Done with script or configuration management/orchestration tool + May be quicker to start up with script 27 Configure Postgres Settings
  • 28. cat /var/lib/postgresql/9.6/main/recovery.conf restore_command = 'envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e --terse wal-fetch "%f" "%p"' recovery_target_timeline = 'LATEST' + If point in time: recovery_target_time = '2017-01-13 13:00:00' recovery_target_name = 'deploy tag' 28 recovery.conf setup
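An automation script might generate this file rather than copy it. The sketch below renders the same settings as the slide, with values copied verbatim. The helper name `recovery_conf` and its keyword arguments are hypothetical. It accepts at most one recovery target, since PostgreSQL acts on only one of them.

```ruby
# Renders recovery.conf text from the settings shown on the slide.
# The helper name and keyword arguments are hypothetical.
RESTORE_COMMAND =
  'envdir /etc/wal-e.d/env /usr/local/wal-e/bin/wal-e --terse wal-fetch "%f" "%p"'

def recovery_conf(target_time: nil, target_name: nil)
  raise ArgumentError, 'give a target time or a target name, not both' if target_time && target_name

  lines = [
    "restore_command = '#{RESTORE_COMMAND}'",
    "recovery_target_timeline = 'LATEST'"
  ]
  # Point-in-time settings are only written when requested.
  lines << "recovery_target_time = '#{target_time}'" if target_time
  lines << "recovery_target_name = '#{target_name}'" if target_name
  lines.join("\n") + "\n"
end

puts recovery_conf(target_name: 'deploy tag')
```

Called with no arguments, it produces the two-line file for a full restore to the latest timeline.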
  • 29. wal_e.main INFO MSG: starting WAL-E DETAIL: The subcommand is "backup-fetch". STRUCTURED: time=2017-02-16T16:22:33.088767-00 pid=5444 wal_e.worker.s3.s3_worker INFO MSG: beginning partition download DETAIL: The partition being downloaded is part_00000000.tar.lzo. HINT: The absolute S3 key is production-database/basebackups_005/base_000000010000230C00000039_00010808/tar_partitions/part_00000000.tar.lzo. 29 fetch log output
  • 31. + "archive recovery complete" text in csv log + recovery.conf file -> recovery.done 31 Checking for Completion def restore_complete? day = Date.today.strftime('%a') ! `less /var/log/postgresql/postgresql-#{day}.csv | grep "archive r end
  • 32. + 2017-03-02 21:52:44.282 UTC,,,5292,,58b89426.14ac,12,,2017-03-02 21:52:38 UTC,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,"" + 2017-03-02 21:52:44.386 UTC,,,5292,,58b89426.14ac,13,,2017-03-02 21:52:38 UTC,1/0,0,LOG,00000,"MultiXact member wraparound protections are now enabled",,,,,,,,,"" + 2017-03-02 21:52:44.389 UTC,,,5290,,58b89426.14aa,3,,2017-03-02 21:52:38 UTC,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,"" + 2017-03-02 21:52:44.389 UTC,,,5592,,58b8942c.15d8,1,,2017-03-02 21:52:44 UTC,,0,LOG,00000,"autovacuum launcher started",,,,,,,,,"" 32 Checking for Completion
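Since the log is CSV, the completion check can parse fields instead of grepping raw text. This is a sketch under assumptions: it takes the log contents as a string, and it relies on the field positions visible in the sample lines above (severity in the 12th field, message in the 14th). It reuses the name `restore_complete?` from the earlier slide but is not the author's implementation.

```ruby
require 'csv'

# Zero-based column positions, taken from the sample csvlog lines:
# error severity is the 12th field and the message is the 14th.
SEVERITY_INDEX = 11
MESSAGE_INDEX = 13

# Returns true if any csvlog row carries the recovery-complete message.
def restore_complete?(csv_text)
  CSV.parse(csv_text).any? do |row|
    row[SEVERITY_INDEX] == 'LOG' && row[MESSAGE_INDEX] == 'archive recovery complete'
  end
end

sample = <<~LOG
  2017-03-02 21:52:44.282 UTC,,,5292,,58b89426.14ac,12,,2017-03-02 21:52:38 UTC,1/0,0,LOG,00000,"archive recovery complete",,,,,,,,,""
  2017-03-02 21:52:44.389 UTC,,,5290,,58b89426.14aa,3,,2017-03-02 21:52:38 UTC,,0,LOG,00000,"database system is ready to accept connections",,,,,,,,,""
LOG

puts restore_complete?(sample)
```

Matching on the message field avoids false positives from the phrase appearing inside a logged query or detail column.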
  • 33. + Run queries against database + Timestamps of frequently updated tables 33 Checking for Completion
  • 34. 34 Checking for Completion def latest_session_page_timestamp end PG.connect(dbname: 'procore', user: 'postgres').e DESC LIMIT 1;")[0]["created_at"]
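Once the newest timestamp has been queried, the script still has to decide whether it is recent enough. A minimal sketch of that comparison follows; the helper name `fresh_enough?` and the one-hour default are assumptions, and the query itself is left out.

```ruby
require 'time'

# Decides whether the newest row in a frequently updated table is recent
# enough for the restore to count as current. Accepts a Time or a
# timestamp string. The name and the one-hour default are assumptions.
def fresh_enough?(latest_created_at, now: Time.now, max_lag_seconds: 3600)
  latest = latest_created_at.is_a?(Time) ? latest_created_at : Time.parse(latest_created_at.to_s)
  (now - latest) <= max_lag_seconds
end

now = Time.utc(2017, 3, 2, 22, 0, 0)
puts fresh_enough?('2017-03-02 21:52:38 UTC', now: now)
puts fresh_enough?('2017-03-01 09:00:00 UTC', now: now)
```

The acceptable lag depends on how often the chosen table is written to and on how old the newest archived WAL segment can be.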
  • 35. 35 Checking for Completion DETAIL: The partition being downloaded is part_000000 `cat /var/log/syslog | grep "The partition being down
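Progress of the base backup fetch can be estimated from the same log lines. The sketch below extracts partition numbers from text in the format shown in the fetch log output. The helper name `downloaded_partitions` is hypothetical, and the log is passed in as a string rather than read from syslog.

```ruby
# Extracts the partition numbers that wal-e reports while fetching a base
# backup, based on the "partition being downloaded" DETAIL lines.
# The helper name is hypothetical.
PARTITION_PATTERN = /The partition being downloaded is part_(\d+)\.tar\.lzo/

def downloaded_partitions(log_text)
  log_text.scan(PARTITION_PATTERN).flatten.map(&:to_i).uniq.sort
end

log = <<~LOG
  DETAIL: The partition being downloaded is part_00000000.tar.lzo.
  DETAIL: The partition being downloaded is part_00000001.tar.lzo.
LOG

puts downloaded_partitions(log).inspect
```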
  • 38. 38 Things to look out for
  • 39. + Incompatible configurations for Postgres recovery server vs master db server + Instance not large enough to hold recovered db + Incorrect keys for wal-e configuration + Check Postgres logs for troubleshooting! 39 Things to look out for
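One common source of incompatible configuration is a recovery server with lower limits than the master: a hot standby needs certain settings to be at least as large as on the master. A sketch of a pre-flight comparison is below. The helper name `undersized_settings` and the sample values are assumptions, and the list may not be exhaustive for every PostgreSQL version.

```ruby
# Settings that PostgreSQL expects to be at least as large on a hot standby
# as on the master. The list may not be exhaustive for every version.
MIN_AS_MASTER = %w[
  max_connections
  max_prepared_transactions
  max_locks_per_transaction
  max_worker_processes
].freeze

# Returns the names of settings where the recovery server is set lower.
# Both arguments are hashes of setting name to value.
def undersized_settings(master, recovery)
  MIN_AS_MASTER.select do |name|
    master.key?(name) && recovery.fetch(name, 0).to_i < master[name].to_i
  end
end

# Sample values are made up for illustration.
master   = { 'max_connections' => '500', 'max_worker_processes' => '8' }
recovery = { 'max_connections' => '100', 'max_worker_processes' => '8' }
puts undersized_settings(master, recovery).inspect
```

Running a check like this before starting Postgres surfaces the mismatch earlier than waiting for it to appear in the Postgres logs.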
  • 41. + Run through script, ssh to server periodically to check in on logs + Double-check final recorded transaction log, frequently updated table timestamp + Don’t wait for something to go wrong to test this! + Untested backups are not backups! 41 Testing Notes
  • 42. 42 Questions? (Also, hi, yes, Procore is hiring!) Tweet at me @enkei9 Email at: sre@procore.com nina@procore.com