SlideShare a Scribd company logo
1 of 26
Download to read offline
GitLab PostgresMortem:
Lessons Learned
Alexey Lesovsky
alexey.lesovsky@dataegret.com
dataegret.com
31 January Events
Failure's key points
Preventative measures
https://goo.gl/GO5rYJ
02
03
01
31 January
events
01
17:20 - an LVM snapshot of the production db was taken.
19:00 - database load increased due to spam.
23:00 - secondary's replication process started to lag behind.
23:30 - PostgreSQL database directory was wiped.
31 January events01
dataegret.com
Failure's
key points
02
1.LVM snapshots and staging provisioning.
2.When a replica start to lag.
3.Do pg_basebackup properly – part 1.
4.max_wal_senders was exceeded, but how?
5.max_connections = 8000.
6.pg_basebackup «stuck» – do pg_basebackup properly – part 2.
7.strace: good thing in wrong place.
8.rm or not rm?
9.A bit about backup.
10.Different PG versions on the production.
11.Broken mail.
31 January events02
dataegret.com
Snapshot impact on underlying storage.
Provisioning from backup.
Staging based on LVM snapshots02
dataegret.com
Re-initialize the standby.
Monitoring with pg_stat_replication.
Use wal_keep_segments while troubleshooting.
Use WAL archive.
When a replica started to lag02
dataegret.com
Do pg_basebackup into clean directory.
Remove «unnecessary» directory.
Use mv instead of rm.
Do pg_basebackup properly. Part 102
dataegret.com
There was only one standby (which was failed).
Increase max_wal_senders.
Check who has stolen connections.
The limit was exceeded by concurrent pg_basebackups.
max_wal_senders was exceeded.02
dataegret.com
More than 500 is bad idea.
Use pgbouncer to reduce the number of server connections.
max_connections = 800002
dataegret.com
Don't run more than one pg_basebackups.
It didn't stuck, it waited for the checkpoint.
Use «-c» option to make fast checkpoint.
Do pg_basebackup properly. Part 202
dataegret.com
Strace isn't a good tool in that case.
Use strace for system errors tracing.
Check stack trace from /proc/<pid>/stack or GDB.
Good things in wrong place.02
dataegret.com
Data directory was cleaned with rm.
Use mv instead of rm.
rm or not rm02
dataegret.com
Daily pg_dump.
Daily LVM snapshot.
Daily Azure snapshot.
PostgreSQL streaming replication.
Basebackup with WAL archive.
A bit about backup02
dataegret.com
Clean out old packages after major upgrade.
Different versions on a production02
dataegret.com
Setup cron, but forgot notifications.
Use reliable notification systems.
Different versions on a production02
dataegret.com
Preventative
measures
03
1. Update PS1 across all hosts to more clearly differentiate between hosts and environments.
2. Prometheus monitoring for backups.
3. Set PostgreSQL's max_connections to a sane value.
4. Investigate Point in time recovery & continuous archiving for PostgreSQL.
5. Hourly LVM snapshots of the production databases.
6. Azure disk snapshots of production databases.
7. Move staging to the ARM environment.
8. Recover production replica(s).
9. Automated testing of recovering PostgreSQL database backups.
10.Improve PostgreSQL replication documentation/runbooks.
11.Investigate pgbarman for creating PostgreSQL backups.
12.Investigate using WAL-E as a means of Database Backup and Realtime Replication.
13.Build Streaming Database Restore.
14.Assign an owner for data durability.
Different versions on a production03
dataegret.com
1. Update PS1 across all hosts.
Looks OK.
2. Prometheus monitoring for backups.
Size, number, age and recovery status.
3. Set PostgreSQL's max_connections to a sane value.
Better use pgbouncer.
4. Investigate PITR & continuous archiving for PostgreSQL.
Yes, as the part of the backup.
Preventative measures03
dataegret.com
5. Hourly LVM snapshots of the production databases.
Looks unnecessary.
6. Azure disk snapshots of production databases.
Looks unnecessary.
7. Move staging to the ARM environment.
Very and very suspicious.
8. Recover production replica(s).
Do that asap.
Preventative measures03
dataegret.com
9. Automated testing of recovering database backups.
YES!
10. Improve documentation/runbooks.
You need a bureaucrat.
11. Investigate pgbarman.
Looks OK, Barman is stable and reliable.
12. Investigate using WAL-E.
Looks OK, WAL-E is the «setup and forget».
Preventative measures03
dataegret.com
13. Build Streaming Database Restore.
Corresponds with p.9.
14. Assign an owner for data durability.
Hire a DBA.
Preventative measures03
dataegret.com
Check and monitor backups.
Create an emergency instructions.
Learn to use tools properly.
Lessons learned03
dataegret.com
Postmortem of database outage of January 31
https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/
PostgreSQL Statistics Collector: pg_stat_replication view
https://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-REPLICATION-VIEW
pg_basebackup utility
https://www.postgresql.org/docs/current/static/app-pgbasebackup.html
PostgreSQL Replication
https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html
PgBouncer
https://pgbouncer.github.io/
https://wiki.postgresql.org/wiki/PgBouncer
Barman
http://www.pgbarman.org/
Links03
dataegret.com
Thanks for watching!
dataegret.com alexey.lesovsky@dataegret.com

More Related Content

What's hot

Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с дискомPostgreSQL-Consulting
 
Managing PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterManaging PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterAlexey Lesovsky
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres MonitoringDenish Patel
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationAlexey Lesovsky
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaPostgreSQL-Consulting
 
Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...
Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...
Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...Ontico
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogicalUmair Shahid
 
Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3Sameer Kumar
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)Wei Shan Ang
 
Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)Anastasia Lubennikova
 
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...Ontico
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Denish Patel
 

What's hot (20)

Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с диском
 
Managing PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterManaging PostgreSQL with PgCenter
Managing PostgreSQL with PgCenter
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres Monitoring
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
 
Pgcenter overview
Pgcenter overviewPgcenter overview
Pgcenter overview
 
Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...
Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...
Peeking into the Black Hole Called PL/PGSQL - the New PL Profiler / Jan Wieck...
 
PostgreSQL and RAM usage
PostgreSQL and RAM usagePostgreSQL and RAM usage
PostgreSQL and RAM usage
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
GUC Tutorial Package (9.0)
GUC Tutorial Package (9.0)GUC Tutorial Package (9.0)
GUC Tutorial Package (9.0)
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Logical replication with pglogical
Logical replication with pglogicalLogical replication with pglogical
Logical replication with pglogical
 
Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3
 
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
pgDay Asia 2016 - Swapping Pacemaker-Corosync for repmgr (1)
 
Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)Advanced backup methods (Postgres@CERN)
Advanced backup methods (Postgres@CERN)
 
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
pg / shardman: шардинг в PostgreSQL на основе postgres / fdw, pg / pathman и ...
 
Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)Out of the Box Replication in Postgres 9.4(pgconfsf)
Out of the Box Replication in Postgres 9.4(pgconfsf)
 

Viewers also liked

PostgreSQL Vacuum: Nine Circles of Hell
PostgreSQL Vacuum: Nine Circles of HellPostgreSQL Vacuum: Nine Circles of Hell
PostgreSQL Vacuum: Nine Circles of HellAlexey Lesovsky
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Linux tuning for PostgreSQL at Secon 2015
Linux tuning for PostgreSQL at Secon 2015Linux tuning for PostgreSQL at Secon 2015
Linux tuning for PostgreSQL at Secon 2015Alexey Lesovsky
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Colin Charles
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA EDB
 
Tuning Linux for Databases.
Tuning Linux for Databases.Tuning Linux for Databases.
Tuning Linux for Databases.Alexey Lesovsky
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practiceAlexey Lesovsky
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
Use Case: PostGIS and Agribotics
Use Case: PostGIS and AgriboticsUse Case: PostGIS and Agribotics
Use Case: PostGIS and AgriboticsPGConf APAC
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsJignesh Shah
 
Opslag van long tail producten in e-warehouse
Opslag van long tail producten in e-warehouseOpslag van long tail producten in e-warehouse
Opslag van long tail producten in e-warehouseStoreganizer
 
Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...
Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...
Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...Ontico
 
5 мифов о производительности баз данных и Python
5 мифов о производительности баз данных и Python5 мифов о производительности баз данных и Python
5 мифов о производительности баз данных и PythonMax Klymyshyn
 
Python и высокая нагрузка
Python и высокая нагрузкаPython и высокая нагрузка
Python и высокая нагрузкаAlexander Shigin
 
Gtd Dev Labs2010 Part Ii
Gtd Dev Labs2010 Part IiGtd Dev Labs2010 Part Ii
Gtd Dev Labs2010 Part IiMaxim Dorofeev
 
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...Андрей Шорин
 
Gtd Dev Labs2010 Part I
Gtd Dev Labs2010 Part IGtd Dev Labs2010 Part I
Gtd Dev Labs2010 Part IMaxim Dorofeev
 

Viewers also liked (19)

PostgreSQL Vacuum: Nine Circles of Hell
PostgreSQL Vacuum: Nine Circles of HellPostgreSQL Vacuum: Nine Circles of Hell
PostgreSQL Vacuum: Nine Circles of Hell
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Linux tuning for PostgreSQL at Secon 2015
Linux tuning for PostgreSQL at Secon 2015Linux tuning for PostgreSQL at Secon 2015
Linux tuning for PostgreSQL at Secon 2015
 
Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016Tuning Linux for your database FLOSSUK 2016
Tuning Linux for your database FLOSSUK 2016
 
Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA Best Practices for Becoming an Exceptional Postgres DBA
Best Practices for Becoming an Exceptional Postgres DBA
 
Tuning Linux for Databases.
Tuning Linux for Databases.Tuning Linux for Databases.
Tuning Linux for Databases.
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Use Case: PostGIS and Agribotics
Use Case: PostGIS and AgriboticsUse Case: PostGIS and Agribotics
Use Case: PostGIS and Agribotics
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
Opslag van long tail producten in e-warehouse
Opslag van long tail producten in e-warehouseOpslag van long tail producten in e-warehouse
Opslag van long tail producten in e-warehouse
 
Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...
Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...
Расширяемость PostgreSQL для хакеров и архитекторов / Олег Бартунов, Александ...
 
5 мифов о производительности баз данных и Python
5 мифов о производительности баз данных и Python5 мифов о производительности баз данных и Python
5 мифов о производительности баз данных и Python
 
Python и высокая нагрузка
Python и высокая нагрузкаPython и высокая нагрузка
Python и высокая нагрузка
 
Big Data aggregation techniques
Big Data aggregation techniquesBig Data aggregation techniques
Big Data aggregation techniques
 
Gtd Dev Labs2010 Part Ii
Gtd Dev Labs2010 Part IiGtd Dev Labs2010 Part Ii
Gtd Dev Labs2010 Part Ii
 
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
Как HeadHunter удалось безопасно нарушить RFC 793 (TCP) и обойти сетевые лову...
 
Gtd Dev Labs2010 Part I
Gtd Dev Labs2010 Part IGtd Dev Labs2010 Part I
Gtd Dev Labs2010 Part I
 

Similar to GitLab PostgresMortem: Lessons Learned

Deploying postgre sql on amazon ec2
Deploying postgre sql on amazon ec2 Deploying postgre sql on amazon ec2
Deploying postgre sql on amazon ec2 Denish Patel
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterSrihari Sriraman
 
advanced Google file System
advanced Google file Systemadvanced Google file System
advanced Google file Systemdiptipan
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelDaniel Coupal
 
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...VMware Tanzu
 
Advance google file system
Advance google file systemAdvance google file system
Advance google file systemLalit Rastogi
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.comRenzo Tomà
 
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...PostgreSQL-Consulting
 
Let the Tiger Roar! - MongoDB 3.0 + WiredTiger
Let the Tiger Roar! - MongoDB 3.0 + WiredTigerLet the Tiger Roar! - MongoDB 3.0 + WiredTiger
Let the Tiger Roar! - MongoDB 3.0 + WiredTigerJon Rangel
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALEPostgreSQL Experts, Inc.
 
MySQL Galera 集群
MySQL Galera 集群MySQL Galera 集群
MySQL Galera 集群YUCHENG HU
 
Mysql replication @ gnugroup
Mysql replication @ gnugroupMysql replication @ gnugroup
Mysql replication @ gnugroupJayant Chutke
 
MongoDB World 2019: Becoming an Ops Manager Backup Superhero!
MongoDB World 2019: Becoming an Ops Manager Backup Superhero!MongoDB World 2019: Becoming an Ops Manager Backup Superhero!
MongoDB World 2019: Becoming an Ops Manager Backup Superhero!MongoDB
 
Upgrade to 2008 Best of PASS
Upgrade to 2008 Best of PASSUpgrade to 2008 Best of PASS
Upgrade to 2008 Best of PASSsqlserver.co.il
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2PgTraining
 

Similar to GitLab PostgresMortem: Lessons Learned (20)

Deploying postgre sql on amazon ec2
Deploying postgre sql on amazon ec2 Deploying postgre sql on amazon ec2
Deploying postgre sql on amazon ec2
 
Google file system
Google file systemGoogle file system
Google file system
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
 
advanced Google file System
advanced Google file Systemadvanced Google file System
advanced Google file System
 
Google File System
Google File SystemGoogle File System
Google File System
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
Start Counting: How We Unlocked Platform Efficiency and Reliability While Sav...
 
Advance google file system
Advance google file systemAdvance google file system
Advance google file system
 
SQL Server On SANs
SQL Server On SANsSQL Server On SANs
SQL Server On SANs
 
Scaling an ELK stack at bol.com
Scaling an ELK stack at bol.comScaling an ELK stack at bol.com
Scaling an ELK stack at bol.com
 
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
Ilya Kosmodemiansky - An ultimate guide to upgrading your PostgreSQL installa...
 
Let the Tiger Roar! - MongoDB 3.0 + WiredTiger
Let the Tiger Roar! - MongoDB 3.0 + WiredTigerLet the Tiger Roar! - MongoDB 3.0 + WiredTiger
Let the Tiger Roar! - MongoDB 3.0 + WiredTiger
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
 
MySQL Galera 集群
MySQL Galera 集群MySQL Galera 集群
MySQL Galera 集群
 
Mysql replication @ gnugroup
Mysql replication @ gnugroupMysql replication @ gnugroup
Mysql replication @ gnugroup
 
MongoDB World 2019: Becoming an Ops Manager Backup Superhero!
MongoDB World 2019: Becoming an Ops Manager Backup Superhero!MongoDB World 2019: Becoming an Ops Manager Backup Superhero!
MongoDB World 2019: Becoming an Ops Manager Backup Superhero!
 
Upgrade to 2008 Best of PASS
Upgrade to 2008 Best of PASSUpgrade to 2008 Best of PASS
Upgrade to 2008 Best of PASS
 
Hadoop availability
Hadoop availabilityHadoop availability
Hadoop availability
 
Lxbrand
LxbrandLxbrand
Lxbrand
 
Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2Oracle to Postgres Migration - part 2
Oracle to Postgres Migration - part 2
 

More from Alexey Lesovsky

Отладка и устранение проблем в PostgreSQL Streaming Replication.
Отладка и устранение проблем в PostgreSQL Streaming Replication.Отладка и устранение проблем в PostgreSQL Streaming Replication.
Отладка и устранение проблем в PostgreSQL Streaming Replication.Alexey Lesovsky
 
Call of Postgres: Advanced Operations (part 5)
Call of Postgres: Advanced Operations (part 5)Call of Postgres: Advanced Operations (part 5)
Call of Postgres: Advanced Operations (part 5)Alexey Lesovsky
 
Call of Postgres: Advanced Operations (part 4)
Call of Postgres: Advanced Operations (part 4)Call of Postgres: Advanced Operations (part 4)
Call of Postgres: Advanced Operations (part 4)Alexey Lesovsky
 
Call of Postgres: Advanced Operations (part 3)
Call of Postgres: Advanced Operations (part 3)Call of Postgres: Advanced Operations (part 3)
Call of Postgres: Advanced Operations (part 3)Alexey Lesovsky
 
Call of Postgres: Advanced Operations (part 2)
Call of Postgres: Advanced Operations (part 2)Call of Postgres: Advanced Operations (part 2)
Call of Postgres: Advanced Operations (part 2)Alexey Lesovsky
 
Call of Postgres: Advanced Operations (part 1)
Call of Postgres: Advanced Operations (part 1)Call of Postgres: Advanced Operations (part 1)
Call of Postgres: Advanced Operations (part 1)Alexey Lesovsky
 
PostgreSQL Streaming Replication
PostgreSQL Streaming ReplicationPostgreSQL Streaming Replication
PostgreSQL Streaming ReplicationAlexey Lesovsky
 
Highload 2014. PostgreSQL: ups, DevOps.
Highload 2014. PostgreSQL: ups, DevOps.Highload 2014. PostgreSQL: ups, DevOps.
Highload 2014. PostgreSQL: ups, DevOps.Alexey Lesovsky
 

More from Alexey Lesovsky (8)

Отладка и устранение проблем в PostgreSQL Streaming Replication.
Отладка и устранение проблем в PostgreSQL Streaming Replication.Отладка и устранение проблем в PostgreSQL Streaming Replication.
Отладка и устранение проблем в PostgreSQL Streaming Replication.
 
Call of Postgres: Advanced Operations (part 5)
Call of Postgres: Advanced Operations (part 5)Call of Postgres: Advanced Operations (part 5)
Call of Postgres: Advanced Operations (part 5)
 
Call of Postgres: Advanced Operations (part 4)
Call of Postgres: Advanced Operations (part 4)Call of Postgres: Advanced Operations (part 4)
Call of Postgres: Advanced Operations (part 4)
 
Call of Postgres: Advanced Operations (part 3)
Call of Postgres: Advanced Operations (part 3)Call of Postgres: Advanced Operations (part 3)
Call of Postgres: Advanced Operations (part 3)
 
Call of Postgres: Advanced Operations (part 2)
Call of Postgres: Advanced Operations (part 2)Call of Postgres: Advanced Operations (part 2)
Call of Postgres: Advanced Operations (part 2)
 
Call of Postgres: Advanced Operations (part 1)
Call of Postgres: Advanced Operations (part 1)Call of Postgres: Advanced Operations (part 1)
Call of Postgres: Advanced Operations (part 1)
 
PostgreSQL Streaming Replication
PostgreSQL Streaming ReplicationPostgreSQL Streaming Replication
PostgreSQL Streaming Replication
 
Highload 2014. PostgreSQL: ups, DevOps.
Highload 2014. PostgreSQL: ups, DevOps.Highload 2014. PostgreSQL: ups, DevOps.
Highload 2014. PostgreSQL: ups, DevOps.
 

Recently uploaded

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Recently uploaded (20)

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 

GitLab PostgresMortem: Lessons Learned

  • 1. GitLab PostgresMortem: Lessons Learned Alexey Lesovsky alexey.lesovsky@dataegret.com
  • 2. dataegret.com 31 January Events Failure's key points Preventative measures https://goo.gl/GO5rYJ 02 03 01
  • 4. 17:20 - an LVM snapshot of the production db was taken. 19:00 - database load increased due to spam. 23:00 - secondary's replication process started to lag behind. 23:30 - PostgreSQL database directory was wiped. 31 January events01 dataegret.com
  • 6. 1.LVM snapshots and staging provisioning. 2.When a replica start to lag. 3.Do pg_basebackup properly – part 1. 4.max_wal_senders was exceeded, but how? 5.max_connections = 8000. 6.pg_basebackup «stuck» – do pg_basebackup properly – part 2. 7.strace: good thing in wrong place. 8.rm or not rm? 9.A bit about backup. 10.Different PG versions on the production. 11.Broken mail. 31 January events02 dataegret.com
  • 7. Snapshot impact on underlying storage. Provisioning from backup. Staging based on LVM snapshots02 dataegret.com
  • 8. Re-initialize the standby. Monitoring with pg_stat_replication. Use wal_keep_segments while troubleshooting. Use WAL archive. When a replica started to lag02 dataegret.com
  • 9. Do pg_basebackup into clean directory. Remove «unnecessary» directory. Use mv instead of rm. Do pg_basebackup properly. Part 102 dataegret.com
  • 10. There was only one standby (which was failed). Increase max_wal_senders. Check who has stolen connections. The limit was exceeded by concurrent pg_basebackups. max_wal_senders was exceeded.02 dataegret.com
  • 11. More than 500 is bad idea. Use pgbouncer to reduce the number of server connections. max_connections = 800002 dataegret.com
  • 12. Don't run more than one pg_basebackups. It didn't stuck, it waited for the checkpoint. Use «-c» option to make fast checkpoint. Do pg_basebackup properly. Part 202 dataegret.com
  • 13. Strace isn't a good tool in that case. Use strace for system errors tracing. Check stack trace from /proc/<pid>/stack or GDB. Good things in wrong place.02 dataegret.com
  • 14. Data directory was cleaned with rm. Use mv instead of rm. rm or not rm02 dataegret.com
  • 15. Daily pg_dump. Daily LVM snapshot. Daily Azure snapshot. PostgreSQL streaming replication. Basebackup with WAL archive. A bit about backup02 dataegret.com
  • 16. Clean out old packages after major upgrade. Different versions on a production02 dataegret.com
  • 17. Setup cron, but forgot notifications. Use reliable notification systems. Different versions on a production02 dataegret.com
  • 19. 1. Update PS1 across all hosts to more clearly differentiate between hosts and environments. 2. Prometheus monitoring for backups. 3. Set PostgreSQL's max_connections to a sane value. 4. Investigate Point in time recovery & continuous archiving for PostgreSQL. 5. Hourly LVM snapshots of the production databases. 6. Azure disk snapshots of production databases. 7. Move staging to the ARM environment. 8. Recover production replica(s). 9. Automated testing of recovering PostgreSQL database backups. 10.Improve PostgreSQL replication documentation/runbooks. 11.Investigate pgbarman for creating PostgreSQL backups. 12.Investigate using WAL-E as a means of Database Backup and Realtime Replication. 13.Build Streaming Database Restore. 14.Assign an owner for data durability. Different versions on a production03 dataegret.com
  • 20. 1. Update PS1 across all hosts. Looks OK. 2. Prometheus monitoring for backups. Size, number, age and recovery status. 3. Set PostgreSQL's max_connections to a sane value. Better use pgbouncer. 4. Investigate PITR & continuous archiving for PostgreSQL. Yes, as the part of the backup. Preventative measures03 dataegret.com
  • 21. 5. Hourly LVM snapshots of the production databases. Looks unnecessary. 6. Azure disk snapshots of production databases. Looks unnecessary. 7. Move staging to the ARM environment. Very and very suspicious. 8. Recover production replica(s). Do that asap. Preventative measures03 dataegret.com
  • 22. 9. Automated testing of recovering database backups. YES! 10. Improve documentation/runbooks. You need a bureaucrat. 11. Investigate pgbarman. Looks OK, Barman is stable and reliable. 12. Investigate using WAL-E. Looks OK, WAL-E is the «setup and forget». Preventative measures03 dataegret.com
  • 23. 13. Build Streaming Database Restore. Corresponds with p.9. 14. Assign an owner for data durability. Hire a DBA. Preventative measures03 dataegret.com
  • 24. Check and monitor backups. Create an emergency instructions. Learn to use tools properly. Lessons learned03 dataegret.com
  • 25. Postmortem of database outage of January 31 https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/ PostgreSQL Statistics Collector: pg_stat_replication view https://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-REPLICATION-VIEW pg_basebackup utility https://www.postgresql.org/docs/current/static/app-pgbasebackup.html PostgreSQL Replication https://www.postgresql.org/docs/9.6/static/runtime-config-replication.html PgBouncer https://pgbouncer.github.io/ https://wiki.postgresql.org/wiki/PgBouncer Barman http://www.pgbarman.org/ Links03 dataegret.com
  • 26. Thanks for watching! dataegret.com alexey.lesovsky@dataegret.com