SlideShare a Scribd company logo
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Keisuke Suzuki
Software engineer
How to Upgrade Major Version of
Your Production PostgreSQL
2018/12/12 PGConf.ASIA
1
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Who am I?
Keisuke Suzuki
• Backend Engineer @ Treasure Data K.K.
– PlazmaDB: distributed storage
– Datatank: data mart
Both uses PostgreSQL internally
• Interest: DB / Distributed system / Performance
optimization
• Twitter: @yajilobee
2
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
PostgreSQL Versions
3
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
PostgreSQL Versions
• Major version: released yearly - new features
– DB data are not compatible between different major versions
• Minor version: every 3 months at least - bug & security fixes
– DB data are compatible if major versions are the same
9.6.11 10.6
major minor major minor
4
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
EOL of Major Versions
https://www.postgresql.org/support/versioning/
The PostgreSQL Global Development Group supports a
major version for 5 years after its initial release.
5
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Q: Should we follow the latest Major Version?
A: Depends on the case
-> Upgrading requires downtime and may cause incompatibility
• Extended support is provided by venders
– Mission critical systems
• SaaS venders may stop providing old major versions
– e.g.) 9.3 retirement on Amazon RDS
✓ Stop Deployment: Aug. 2018
✓ Force Major Version Upgrade: Nov. 2018
6
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Our Version
We were here until May 2018...
7
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
PostgreSQL Usage in
Treasure Data
8
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Arm Treasure Data eCDP
9
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
PlazmaDB: Core Storage Layer of TD
Streaming data
Bulk load
PlazmaDB
Metadata
(PostgreSQL)
AWS S3 /
Riak CS
10
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Daily Workload & Storage Size of PlazmaDB
Import Analytical Query Storage size
500 Billion Records / day
~ 5.8 Million Records / sec
5 PB (+5~10 TB / day)
55 Trillion Records
600,000 Queries / day
15 Trillion Records / day
11
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
AWS RDS
Server Structure of Meta DB
Master
r3.8xlarge
(32 vcores, 244GB RAM)
Multi-AZ
Read replica
(asynchronous streaming
replication)
m3.large
(2 vcores, 7.5GB RAM)
No production workloadConnection from
applications
12
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Data Volume
PlazmaDB
Meta DB (PostgreSQL)
Realtime Storage Archive Storage
AWS S3 / Riak CS
5 PB
GiST GiST
Partition
Metadata
Partition
Metadata
1 TB
13
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
CPU & IO utilization of PostgreSQL (Meta DB)
AVG: 13%
MAX: 25%
AVG: 700 read IOPS
1100 write IOPS
MAX: 2500 read IOPS
3000 read IOPS
14
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
TPS & WAL Throughput
AVG: 1.2k TPS
MAX: 1.5k TPS
AVG: 4MB/sec
MAX: 20MB/sec
15
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
If PlazmaDB was down..
Streaming data
Bulk load
PlazmaDB
Metadata
(PostgreSQL)
AWS S3 /
Riak CS
16
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
If PlazmaDB was down..
17
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Planning Major Version Upgrade
18
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Long Way to Complete Upgrade
• Choose upgrade strategy
• Choose the next major version
• Plan operation and evaluation (include Rollback ops)
• Regression Test on the new major version
19
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Major Version Upgrade Strategies
• pg_dump/pg_dumpall
• pg_upgrade
• Logical replication
Old ver New ver
SQL
dump restore
Old ver New ver
Log shipping
Convert data
Old ver New ver
20
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Upgrade with pg_dump/pg_dumpall
Old ver New ver
SQL
2. pg_dump
/pg_dumpall
3. psql
/ pg_restore
1. Stop DB 4. Sart DB
Downtime: Step 1 to 4
Include Full data copy
21
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Upgrade with pg_upgrade Link mode
1. Stop DB
Old ver New ver
System table
User data User data
System table
3. Start DB
2.2. Create symbolic link
2. pg_upgrade
2.1. Rebuild
Downtime: Step 1 to 3
Not include user data copy 22
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Upgrade with Logical Replication
Old ver New ver
1. Start replication
& wait synchronization
2. Stop connections
Downtime: Step 2 to 3
DBs don’t stop during upgrade
Applications
3. Start connections
23
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Downtime
pg_dump/pg_dumpall >> pg_upgrade > Logical Replication
Full data copy Rebuild system tables Switch servers
Downtime
comes from...
Not acceptable
24
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Downtime
pg_dump/pg_dumpall >> pg_upgrade > Logical Replication
Full data copy Rebuild system tables Switch servers
Downtime
comes from...
Not acceptable
How Long?
25
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Downtime: pg_upgrade in RDS
• Operation
– Upgrade major version by modify-db-instance API
Perform pg-upgrade link mode internally
• Downtime -> 7-8 mins
• Note
– Downtime happens several minutes after invoking API
Exact time to happen cannot be specified.
26
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Upgrade of Replicas
Old ver New ver Old ver New ver
• Downtime will be longer if upgrade of replicas is required
– 1. Stop DBs
– 2. Upgrade master
– 3. Upgrade slave
– 4. Start DBs
27
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Logical Replication: How to switch servers?
Update application configuration Overwrite DB endpoint
Old ver New ver
App App App App App
Distribute new DB configuration
Release new DB after connections of
old DB are closed to avoid write conflict
App App App App App
IP / DNS / proxy etc..
Old ver New ver
Update only 1 point
28
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Downtime: Logical Replication
• Operation
– Change DB endpoints (DNS) by modify-db-instance API
• Downtime -> 1-2 mins
Old ver New ver
Applications
1. plazma.***.com
-> plazma-old.***.com
2. plazma-new.***.com
-> plazma.***.com
29
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Feasibility of each Strategy
pg_upgrade Logical Replication
• Pros
– Easier operation
Especially in AWS RDS
• Cons
– Longer downtime
– Upgrade only 1 major
version at once (AWS RDS)
• Pros
– Shorter downtime
• Cons
– PostgreSQL 9.3 doesn’t
have native logical
replication
30
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Logical Replication on PostgreSQL
• Before 9.3: 3rd party tool based on trigger
– Bucardo, Londiste, Slony
– Write amplification happens to record delta
– For AWS RDS, replication server is required outside of DB servers
• 9.4 - 9.6: 3rd party tool based on logical decoding
– pglogical, AWS Database Migration Service (DMS)
– DMS provides managed replication server
• After 10: Native logical replication
– No external process is required
31
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Our Plan
• Step 1: 9.3 -> 9.4 by pg_upgrade
– Operation simplicity and manageable downtime
– Done on May 2018
• Step 2: 9.4 -> latest by DMS
– Less downtime and managed service is available
– Coming soon..
32
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Things to Consider on Upgrade
33
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
How to handle downtime?
8 minutes downtime is expected
Endpoint Queue PlazmaDBWorker
Import
Query
Sync Async Retry Retry
Has capacity for 8 mins
Requests were delayed but no error was returned
34
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Impact of Cold Start
• Shared buffers is empty after booting DB
– In our env, Shared buffer = 160GB
– PostgreSQL page size = 8kB
– Pages to load into buffer = 160GB / 8kB = 20M pages
• IOs are issued based on some factors
– Number of queries
– Number of pages read by a query
– Locality
35
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Bonus of pg_upgrade: Page Cache is Hot
• Page cache of OS remains during upgrade
– Binary format of user data file is compatible
Old ver New ver
System table
User data User data
System table
symbolic link
Rebuild
Page cache
36
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Mitigate Impact of Cold Start
• Adjust RDS Preserved IOPS to 10k
– Our major select workload scans ~10k pages/sec
– RAM = 244GB, Buffer = 160GB -> Page cache ~ 80GB
✓ 50% is filled by page cache -> expected IOPS ~ 5k
✓ +5k just in case
• Page cache isn’t hot if you need to change servers
– pg_prewarm
37
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Rollback Plan
1. Promote Replica: 1-2 mins
2. Restore from Backup (if No.1 doesn’t work): 30-60 mins
Master Replica
Applications
1. plazma.***.com
-> plazma-old.***.com
2. plazma-rr.***.com
-> plazma.***.com
3. promote
38
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Evaluate New Major Version
39
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Importance of Evaluation
• Rollback is the worst case
– Longer downtime
– Data created after upgrade should be recovered manually
• Detect degradation by test
– Interface incompatibility
✓ System test on staging environment
– Performance degradation
✓ production != staging in terms of server resources
40
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Real World Application is Complicated..
Many Performance Related Factors
• Type of Workload
• Users’ behaviour
– # of import requests / analytical query requests
– Data skew
• Metadata Storage size
• Server Size (CPU cores, RAM, Storage, Network)
• etc..
Modeling workload and Define target performance
41
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Use Metrics for Modeling Workload
• Metrics Collectors
– Arm Treasure Data:
detailed analyze
– DataDog: visualization
• Metrics
– CPU / IOPS
– Table size
– # of Query / Query Types
– Cache hit ratio
– SEL / INS / DEL / UPD rows
– … 50+ metrics
42
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Created Benchmark Tool based on Metrics
• Enable production size perf measurement without integration
– Iteratively refined benchmark workload by adding lacked parameters
● Number of queries / Types of queries
● DB size
● Data access locality
● Data skew
• Goal of Measurement
– Check if pg 9.4 is capable of processing current production workload (+α)
– Detect different behavior of metrics
43
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Workload of DB
Plazma
Meta DB
Streaming
Import
Bulk
Load
Merge
Presto
Select
Presto
Insert /
delete
Hive
Select
GC
Hive
Insert
44
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Workload of DB
Plazma
Meta DB
Streaming
Import
Bulk
Load
Merge
Presto
Select
Presto
Insert /
delete
Hive
Select
GC
Hive
Insert
80% of query workload
Select 10k rows/sec
93% of import workload
Insert 1k rows/sec
45
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Selectivity
• Random sampling from actual
selectivity distribution
e.g.)
– 40% queries: sl = 1
– 5% queries: sl = [0.01, 0.5]
– 5% queries: sl = [0.5, 0.99]
Percentile of total # of queriesSelectivity
46
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Data Access Locality
• Metadata size = 1TB
• Shared Buffer size = 160GB
• But, Hot Data size is smaller
than Shared Buffer
e.g.)
– 85% of workload comes
from 1% data sets
– 95% of workload comes
from 5% data sets
# of Read Requested on Data Set /day
#ofPartitionsRead/day
47
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Result of Performance Measurement
Observed no performance degradation
48
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Operation in Production
49
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Final Upgrade Plan
• Preparation
– Increase Preserved IOPS to 10k for cold start
– Scale up replica server to the same as master for rollback
• Operation include downtime
– Invoke pg_upgrade via modify-db-instance API in scheduled
maintenance window
✓ ~8 mins downtime starts several minutes after invoking API
✓ Expected Read IOPS after upgrade ~5k
50
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Actual Impact
Requests were delayed 9 minutes (~ DB downtime)
51
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Request Queue Depth
Max 38k reqs
* 12 queues
= 450k reqs
Consumed in
11 mins
Normally < 1
Alert if > 200
52
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Actual Impact of Cold Start
IOPS Buffer cache misses
This could be stopped..
5k IOPS was expected
9k / 19k ~ 47% was hit on page cache
53
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
Summary
• Choose appropriate strategy for your environment
– Downtime: pg_dump >> pg_upgrade > logical replication
– Operation easiness: pg_dump < pg_upgrade < logical replication
– Consider not only DB impact but also entire system impact
– Think of Downtime, Cold Start, Rollback
• Metrics is very important
– Estimate impact, Model workload, Evaluate performance, Measure system
healthiness
54
Thank You!
Danke!
Merci!
谢谢!
Gracias!
Kiitos!
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.5
5

More Related Content

What's hot

HBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at Xiaomi
HBaseCon
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
Yoni Farin
 
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Severalnines
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC
 
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
StreamNative
 
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQLAWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
Grant McAlister
 
45 ways to speed up firebird database
45 ways to speed up firebird database45 ways to speed up firebird database
45 ways to speed up firebird database
Fabio Codebue
 
How to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech Talks
How to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech TalksHow to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech Talks
How to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech Talks
Amazon Web Services
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
DataWorks Summit/Hadoop Summit
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
Yahoo Developer Network
 
HBaseCon 2013: Apache HBase Replication
HBaseCon 2013: Apache HBase ReplicationHBaseCon 2013: Apache HBase Replication
HBaseCon 2013: Apache HBase Replication
Cloudera, Inc.
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
DataWorks Summit/Hadoop Summit
 
Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017
Grant McAlister
 
Business-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirBusiness-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud Air
Continuent
 
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Grant McAlister
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
Databricks
 
Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5
Alexey Kovyazin
 
Pg conf 2017 HIPAA Compliant and HA DB architecture on AWS
Pg conf 2017  HIPAA Compliant and HA DB architecture on AWSPg conf 2017  HIPAA Compliant and HA DB architecture on AWS
Pg conf 2017 HIPAA Compliant and HA DB architecture on AWS
Glenn Poston
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
datamantra
 

What's hot (20)

HBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at XiaomiHBaseCon 2015: HBase Operations at Xiaomi
HBaseCon 2015: HBase Operations at Xiaomi
 
Stateful streaming and the challenge of state
Stateful streaming and the challenge of stateStateful streaming and the challenge of state
Stateful streaming and the challenge of state
 
HBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond PanelHBaseCon 2015: HBase 2.0 and Beyond Panel
HBaseCon 2015: HBase 2.0 and Beyond Panel
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
 
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at ScalePGConf APAC 2018 - Monitoring PostgreSQL at Scale
PGConf APAC 2018 - Monitoring PostgreSQL at Scale
 
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
Pulsar in the Lakehouse: Apache Pulsar™ with Apache Spark™ and Delta Lake - P...
 
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQLAWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
AWS re:Invent 2019 - DAT328 Deep Dive on Amazon Aurora PostgreSQL
 
45 ways to speed up firebird database
45 ways to speed up firebird database45 ways to speed up firebird database
45 ways to speed up firebird database
 
How to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech Talks
How to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech TalksHow to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech Talks
How to Migrate from Cassandra to Amazon DynamoDB - AWS Online Tech Talks
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
HBaseCon 2013: Apache HBase Replication
HBaseCon 2013: Apache HBase ReplicationHBaseCon 2013: Apache HBase Replication
HBaseCon 2013: Apache HBase Replication
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017Deep dive into the Rds PostgreSQL Universe Austin 2017
Deep dive into the Rds PostgreSQL Universe Austin 2017
 
Business-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirBusiness-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud Air
 
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
Amazon RDS for PostgreSQL - Postgres Open 2016 - New Features and Lessons Lea...
 
Parallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta LakeParallelization of Structured Streaming Jobs Using Delta Lake
Parallelization of Structured Streaming Jobs Using Delta Lake
 
Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5Firebird migration: from Firebird 1.5 to Firebird 2.5
Firebird migration: from Firebird 1.5 to Firebird 2.5
 
Pg conf 2017 HIPAA Compliant and HA DB architecture on AWS
Pg conf 2017  HIPAA Compliant and HA DB architecture on AWSPg conf 2017  HIPAA Compliant and HA DB architecture on AWS
Pg conf 2017 HIPAA Compliant and HA DB architecture on AWS
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 

Similar to How to Upgrade Major Version of Your Production PostgreSQL

PostgreSQL
PostgreSQL PostgreSQL
PostgreSQL
Amazon Web Services
 
PostgreSQL
PostgreSQLPostgreSQL
201810 td tech_talk
201810 td tech_talk201810 td tech_talk
201810 td tech_talk
Keisuke Suzuki
 
PostgreSQL
PostgreSQLPostgreSQL
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on Kubernetes
Yugabyte
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
HostedbyConfluent
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
Alexey Grishchenko
 
Real Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case StudyReal Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case Study
Nati Shalom
 
MySQL and MariaDB
MySQL and MariaDBMySQL and MariaDB
MySQL and MariaDB
Amazon Web Services
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
Amazon Web Services
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
Amazon Web Services
 
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Web Services
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
Amazon Web Services
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
Amazon Web Services
 
DAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL PerformanceDAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL Performance
Amazon Web Services
 
Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...
Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...
Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...
Amazon Web Services
 
Oracle & SQL Server on the Cloud: Database Week San Francisco
Oracle & SQL Server on the Cloud: Database Week San FranciscoOracle & SQL Server on the Cloud: Database Week San Francisco
Oracle & SQL Server on the Cloud: Database Week San Francisco
Amazon Web Services
 
Oracle & SQL Server on the Cloud: Database Week SF
Oracle & SQL Server on the Cloud: Database Week SFOracle & SQL Server on the Cloud: Database Week SF
Oracle & SQL Server on the Cloud: Database Week SF
Amazon Web Services
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
Dori Waldman
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL server
EDB
 

Similar to How to Upgrade Major Version of Your Production PostgreSQL (20)

PostgreSQL
PostgreSQL PostgreSQL
PostgreSQL
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
201810 td tech_talk
201810 td tech_talk201810 td tech_talk
201810 td tech_talk
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Running Stateful Apps on Kubernetes
Running Stateful Apps on KubernetesRunning Stateful Apps on Kubernetes
Running Stateful Apps on Kubernetes
 
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
Running Production CDC Ingestion Pipelines With Balaji Varadarajan and Pritam...
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Real Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case StudyReal Time Analytics for Big Data a Twitter Case Study
Real Time Analytics for Big Data a Twitter Case Study
 
MySQL and MariaDB
MySQL and MariaDBMySQL and MariaDB
MySQL and MariaDB
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
Amazon Redshift 與 Amazon Redshift Spectrum 幫您建立現代化資料倉儲 (Level 300)
 
Amazon Kinesis
Amazon KinesisAmazon Kinesis
Amazon Kinesis
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
DAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL PerformanceDAT316_Report from the field on Aurora PostgreSQL Performance
DAT316_Report from the field on Aurora PostgreSQL Performance
 
Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...
Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...
Report from the Field on the PostgreSQL-compatible Edition of Amazon Aurora -...
 
Oracle & SQL Server on the Cloud: Database Week San Francisco
Oracle & SQL Server on the Cloud: Database Week San FranciscoOracle & SQL Server on the Cloud: Database Week San Francisco
Oracle & SQL Server on the Cloud: Database Week San Francisco
 
Oracle & SQL Server on the Cloud: Database Week SF
Oracle & SQL Server on the Cloud: Database Week SFOracle & SQL Server on the Cloud: Database Week SF
Oracle & SQL Server on the Cloud: Database Week SF
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
How to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL serverHow to use postgresql.conf to configure and tune the PostgreSQL server
How to use postgresql.conf to configure and tune the PostgreSQL server
 

Recently uploaded

Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Globus
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
Cyanic lab
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
QuickwayInfoSystems3
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 

Recently uploaded (20)

Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Enterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptxEnterprise Software Development with No Code Solutions.pptx
Enterprise Software Development with No Code Solutions.pptx
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 

How to Upgrade Major Version of Your Production PostgreSQL

  • 1. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Keisuke Suzuki Software engineer How to Upgrade Major Version of Your Production PostgreSQL 2018/12/12 PGConf.ASIA 1
  • 2. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Who am I? Keisuke Suzuki • Backend Engineer @ Treasure Data K.K. – PlazmaDB: distributed storage – Datatank: data mart Both uses PostgreSQL internally • Interest: DB / Distributed system / Performance optimization • Twitter: @yajilobee 2
  • 3. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. PostgreSQL Versions 3
  • 4. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. PostgreSQL Versions • Major version: released yearly - new features – DB data are not compatible between different major versions • Minor version: every 3 months at least - bug & security fixes – DB data are compatible if major versions are the same 9.6.11 10.6 major minor major minor 4
  • 5. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. EOL of Major Versions https://www.postgresql.org/support/versioning/ The PostgreSQL Global Development Group supports a major version for 5 years after its initial release. 5
  • 6. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Q: Should we follow the latest Major Version? A: Depends on the case -> Upgrading requires downtime and may cause incompatibility • Extended support is provided by venders – Mission critical systems • SaaS venders may stop providing old major versions – e.g.) 9.3 retirement on Amazon RDS ✓ Stop Deployment: Aug. 2018 ✓ Force Major Version Upgrade: Nov. 2018 6
  • 7. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Our Version We were here until May 2018... 7
  • 8. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. PostgreSQL Usage in Treasure Data 8
  • 9. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Arm Treasure Data eCDP 9
  • 10. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. PlazmaDB: Core Storage Layer of TD Streaming data Bulk load PlazmaDB Metadata (PostgreSQL) AWS S3 / Riak CS 10
  • 11. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Daily Workload & Storage Size of PlazmaDB Import Analytical Query Storage size 500 Billion Records / day ~ 5.8 Million Records / sec 5 PB (+5~10 TB / day) 55 Trillion Records 600,000 Queries / day 15 Trillion Records / day 11
  • 12. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. AWS RDS Server Structure of Meta DB Master r3.8xlarge (32 vcores, 244GB RAM) Multi-AZ Read replica (asynchronous streaming replication) m3.large (2 vcores, 7.5GB RAM) No production workloadConnection from applications 12
  • 13. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Data Volume PlazmaDB Meta DB (PostgreSQL) Realtime Storage Archive Storage AWS S3 / Riak CS 5 PB GiST GiST Partition Metadata Partition Metadata 1 TB 13
  • 14. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. CPU & IO utilization of PostgreSQL (Meta DB) AVG: 13% MAX: 25% AVG: 700 read IOPS 1100 write IOPS MAX: 2500 read IOPS 3000 read IOPS 14
  • 15. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. TPS & WAL Throughput AVG: 1.2k TPS MAX: 1.5k TPS AVG: 4MB/sec MAX: 20MB/sec 15
  • 16. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. If PlazmaDB was down.. Streaming data Bulk load PlazmaDB Metadata (PostgreSQL) AWS S3 / Riak CS 16
  • 17. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. If PlazmaDB was down.. 17
  • 18. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Planning Major Version Upgrade 18
  • 19. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Long Way to Complete Upgrade • Choose upgrade strategy • Choose the next major version • Plan operation and evaluation (include Rollback ops) • Regression Test on the new major version 19
  • 20. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Major Version Upgrade Strategies • pg_dump/pg_dumpall • pg_upgrade • Logical replication Old ver New ver SQL dump restore Old ver New ver Log shipping Convert data Old ver New ver 20
  • 21. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Upgrade with pg_dump/pg_dumpall Old ver New ver SQL 2. pg_dump /pg_dumpall 3. psql / pg_restore 1. Stop DB 4. Sart DB Downtime: Step 1 to 4 Include Full data copy 21
  • 22. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Upgrade with pg_upgrade Link mode 1. Stop DB Old ver New ver System table User data User data System table 3. Start DB 2.2. Create symbolic link 2. pg_upgrade 2.1. Rebuild Downtime: Step 1 to 3 Not include user data copy 22
  • 23. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Upgrade with Logical Replication Old ver New ver 1. Start replication & wait synchronization 2. Stop connections Downtime: Step 2 to 3 DBs don’t stop during upgrade Applications 3. Start connections 23
  • 24. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Downtime pg_dump/pg_dumpall >> pg_upgrade > Logical Replication Full data copy Rebuild system tables Switch servers Downtime comes from... Not acceptable 24
  • 25. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Downtime pg_dump/pg_dumpall >> pg_upgrade > Logical Replication Full data copy Rebuild system tables Switch servers Downtime comes from... Not acceptable How Long? 25
  • 26. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Downtime: pg_upgrade in RDS • Operation – Upgrade major version by modify-db-instance API Perform pg-upgrade link mode internally • Downtime -> 7-8 mins • Note – Downtime happens several minutes after invoking API Exact time to happen cannot be specified. 26
  • 27. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Upgrade of Replicas Old ver New ver Old ver New ver • Downtime will be longer if upgrade of replicas is required – 1. Stop DBs – 2. Upgrade master – 3. Upgrade slave – 4. Start DBs 27
  • 28. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Logical Replication: How to switch servers? Update application configuration Overwrite DB endpoint Old ver New ver App App App App App Distribute new DB configuration Release new DB after connections of old DB are closed to avoid write conflict App App App App App IP / DNS / proxy etc.. Old ver New ver Update only 1 point 28
  • 29. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Downtime: Logical Replication • Operation – Change DB endpoints (DNS) by modify-db-instance API • Downtime -> 1-2 mins Old ver New ver Applications 1. plazma.***.com -> plazma-old.***.com 2. plazma-new.***.com -> plazma.***.com 29
  • 30. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Feasibility of each Strategy pg_upgrade Logical Replication • Pros – Easier operation Especially in AWS RDS • Cons – Longer downtime – Upgrade only 1 major version at once (AWS RDS) • Pros – Shorter downtime • Cons – PostgreSQL 9.3 doesn’t have native logical replication 30
  • 31. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Logical Replication on PostgreSQL • Before 9.3: 3rd party tool based on trigger – Bucardo, Londiste, Slony – Write amplification happens to record delta – For AWS RDS, replication server is required outside of DB servers • 9.4 - 9.6: 3rd party tool based on logical decoding – pglogical, AWS Database Migration Service (DMS) – DMS provides managed replication server • After 10: Native logical replication – No external process is required 31
  • 32. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Our Plan • Step 1: 9.3 -> 9.4 by pg_upgrade – Operation simplicity and manageable downtime – Done on May 2018 • Step 2: 9.4 -> latest by DMS – Less downtime and managed service is available – Coming soon.. 32
  • 33. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Things to Consider on Upgrade 33
  • 34. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. How to handle downtime? 8 minutes downtime is expected Endpoint Queue PlazmaDBWorker Import Query Sync Async Retry Retry Has capacity for 8 mins Requests were delayed but no error was returned 34
  • 35. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Impact of Cold Start • Shared buffers is empty after booting DB – In our env, Shared buffer = 160GB – PostgreSQL page size = 8kB – Pages to load into buffer = 160GB / 8kB = 20M pages • IOs are issued based on some factors – Number of queries – Number of pages read by a query – Locality 35
  • 36. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Bonus of pg_upgrade: Page Cache is Hot • Page cache of OS remains during upgrade – Binary format of user data file is compatible Old ver New ver System table User data User data System table symbolic link Rebuild Page cache 36
  • 37. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Mitigate Impact of Cold Start • Adjust RDS Preserved IOPS to 10k – Our major select workload scans ~10k pages/sec – RAM = 244GB, Buffer = 160GB -> Page cache ~ 80GB ✓ 50% is filled by page cache -> expected IOPS ~ 5k ✓ +5k just in case • Page cache isn’t hot if you need to change servers – pg_prewarm 37
  • 38. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Rollback Plan 1. Promote Replica: 1-2 mins 2. Restore from Backup (if No.1 doesn’t work): 30-60 mins Master Replica Applications 1. plazma.***.com -> plazma-old.***.com 2. plazma-rr.***.com -> plazma.***.com 3. promote 38
  • 39. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Evaluate New Major Version 39
  • 40. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Importance of Evaluation • Rollback is the worst case – Longer downtime – Data created after upgrade should be recovered manually • Detect degradation by test – Interface incompatibility ✓ System test on staging environment – Performance degradation ✓ production != staging in terms of server resources 40
  • 41. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Real World Application is Complicated.. Many Performance Related Factors • Type of Workload • Users’ behaviour – # of import requests / analytical query requests – Data skew • Metadata Storage size • Server Size (CPU cores, RAM, Storage, Network) • etc.. Modeling workload and Define target performance 41
  • 42. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Use Metrics for Modeling Workload • Metrics Collectors – Arm Treasure Data: detailed analyze – DataDog: visualization • Metrics – CPU / IOPS – Table size – # of Query / Query Types – Cache hit ratio – SEL / INS / DEL / UPD rows – … 50+ metrics 42
  • 43. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Created Benchmark Tool based on Metrics • Enable production size perf measurement without integration – Iteratively refined benchmark workload by adding lacked parameters ● Number of queries / Types of queries ● DB size ● Data access locality ● Data skew • Goal of Measurement – Check if pg 9.4 is capable of processing current production workload (+α) – Detect different behavior of metrics 43
  • 44. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Workload of DB Plazma Meta DB Streaming Import Bulk Load Merge Presto Select Presto Insert / delete Hive Select GC Hive Insert 44
  • 45. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Workload of DB Plazma Meta DB Streaming Import Bulk Load Merge Presto Select Presto Insert / delete Hive Select GC Hive Insert 80% of query workload Select 10k rows/sec 93% of import workload Insert 1k rows/sec 45
  • 46. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Selectivity • Random sampling from actual selectivity distribution e.g.) – 40% queries: sl = 1 – 5% queries: sl = [0.01, 0.5] – 5% queries: sl = [0.5, 0.99] Percentile of total # of queriesSelectivity 46
  • 47. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Data Access Locality • Metadata size = 1TB • Shared Buffer size = 160GB • But, Hot Data size is smaller than Shared Buffer e.g.) – 85% of workload comes from 1% data sets – 95% of workload comes from 5% data sets # of Read Requested on Data Set /day #ofPartitionsRead/day 47
  • 48. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Result of Performance Measurement Observed no performance degradation 48
  • 49. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Operation in Production 49
  • 50. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Final Upgrade Plan • Preparation – Increase Preserved IOPS to 10k for cold start – Scale up replica server to the same as master for rollback • Operation include downtime – Invoke pg_upgrade via modify-db-instance API in scheduled maintenance window ✓ ~8 mins downtime starts several minutes after invoking API ✓ Expected Read IOPS after upgrade ~5k 50
  • 51. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Actual Impact Requests were delayed 9 minutes (~ DB downtime) 51
  • 52. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Request Queue Depth Max 38k reqs * 12 queues = 450k reqs Consumed in 11 mins Normally < 1 Alert if > 200 52
  • 53. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Actual Impact of Cold Start IOPS Buffer cache misses This could be stopped.. 5k IOPS was expected 9k / 19k ~ 47% was hit on page cache 53
  • 54. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Summary • Choose appropriate strategy for your environment – Downtime: pg_dump >> pg_upgrade > logical replication – Operation easiness: pg_dump < pg_upgrade < logical replication – Consider not only DB impact but also entire system impact – Think of Downtime, Cold Start, Rollback • Metrics is very important – Estimate impact, Model workload, Evaluate performance, Measure system healthiness 54
  • 55. Thank You! Danke! Merci! 谢谢! Gracias! Kiitos! Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.5 5