Copyright©2017 NTT corp. All Rights Reserved.
Multi-Master PostgreSQL Cluster
on Kubernetes
PGConf.Asia 2017
NTT OSS Center  Masanori Oyama
Who Am I
Masanori Oyama
twitter: @ooyamams1987
slideshare: https://www.slideshare.net/ooyamams
Work
• PostgreSQL engineering support and consultation.
• Extensive quality verification of PostgreSQL releases.
• PostgreSQL security
  • PostgreSQL Security. How Do We Think? at PGCon 2017
    https://www.slideshare.net/ooyamams/postgresql-security-how-do-we-think-at-pgcon-2017
Previous work
• Hadoop engineering support and consultation at NTT DATA Corp.
  • I managed a big Hadoop cluster (1,000 nodes, 200 PB!).
Today's Topic
I will talk about ...
• Introducing the next-generation multi-master PostgreSQL cluster, with some details about a PoC of an IoT log platform.
• Sharing knowledge of PostgreSQL cluster management using Kubernetes.
Please read this morning's presentation material, "Built-in Sharding"!
I will focus on the second topic today.
1. Introduction
2. Crunchy PostgreSQL Containers on OpenShift
3. Multi-Master PostgreSQL Cluster on OpenShift
4. Conclusion
Table of Contents
Postgres-XC
• A multi-master PostgreSQL cluster forked from PostgreSQL by NTT
• Scales out read/write workloads
• Development ended in 2014
1. Introduction
(Diagram: Clients connect to Coordinators, which access DataNodes, coordinated by a Global Transaction Manager)
Now we are contributing this experience back to the PostgreSQL core.
Please see “PostgreSQL Built-in Sharding” from this morning's presentations.
Why scale out?
It is hard to improve write performance by scaling up (purchasing higher-performance hardware).
1. Introduction
Major bottleneck points of PostgreSQL:
• Getting the transaction snapshot
• Writing WAL
There is a limit to performance improvement even if you use a many-core server and NVMe storage.
To solve these bottlenecks, drastic improvements are necessary:
• A new transaction mechanism for many-core machines
• A parallel WAL mechanism or a new WAL writer mechanism for NVMe
Scaling out PostgreSQL is one workaround to avoid these bottlenecks.
“Instagram” Way
Tantan talked about their homemade version of the “Instagram” way at last year's PGConf.Asia.
1. Introduction
http://www.pgconf.asia/JP/wp-content/uploads/2016/12/From-0-to-350bn-rows-in-2-years-PgConfAsia2016-v1.pdf
Applications need to control which databases are accessed and to satisfy the ACID properties with transactions.
• 22 TB / 350bn rows in the biggest table
• 270k tuple writes / sec
• 1.3M TPS aggregated over all databases
PostgreSQL Built-in Sharding
1. Introduction
The PostgreSQL community is working very hard to realize PostgreSQL built-in sharding.
(Diagram: a coordinator distributing data across worker nodes 1-3)
Even if we could get a perfect PostgreSQL cluster, we would still have to spend a lot of time managing it.
For	example,
• Orchestration
• HA	and	Recovery
• Backup
• Monitoring
• Security
• Resource	Management
• Load	Balancing
• Scale	out		(Add/Delete	Coordinator/Shard)
1. Introduction
Distributed systems are hard!
The Difficulty of Scale-Out
Consider high availability, for example.
With ordinary PostgreSQL, we have to use middleware to achieve HA.
1. Introduction
PG-REX: a PostgreSQL HA solution with Pacemaker
https://www.slideshare.net/kazuhcurry/pgrexpacemaker (Japanese)
How about a PostgreSQL cluster?
HA of Postgres-XC
1. Introduction
https://wiki.postgresql.org/images/4/44/Pgxc_HA_20121024.pdf
(Diagram: HA configuration of the GTM server and the Coordinator and DataNode servers)
HA of Postgres-XC with 10 shards
How many HA settings are needed!?
1. Introduction
(Diagram: a GTM server plus Coordinator and DataNode servers, each needing its own HA pair)
I don’t want to manage this by myself!
How do people solve this?
PostgreSQL Related Slides and Presentations
https://wiki.postgresql.org/wiki/PostgreSQL_Related_Slides_and_Presentations
PGConf.EU 2017
• Using Kubernetes, Docker, and Helm to Deploy On-Demand PostgreSQL Streaming Replicas
https://www.postgresql.eu/events/sessions/pgconfeu2017/session/1559/slides/35/Using%20Kubernetes,%20Docker,%20and%20Helm%20to%20Deploy%20On-Demand%20PostgreSQL%20Streaming%20Replicas.pdf
Postgres Open 2017
• A Kubernetes Operator for PostgreSQL - Architecture and Design
https://www.sarahconway.com/slides/postgres-operator.pdf
• Containerized Clustered PostgreSQL
http://jberkus.github.io/container_cluster_pg/#23
PGConf 2017
• Patroni - HA PostgreSQL made easy
https://www.slideshare.net/AlexanderKukushkin1/patroni-ha-postgresql-made-easy
• PostgreSQL High Availability in a Containerized World
http://jkshah.blogspot.jp/2017/03/pgconf-2017-postgresql-high.html
1. Introduction
The common thread: containers and Kubernetes.
PostgreSQL on Kubernetes
“Standing on the shoulders of giants”: harnessing the power of open source!
https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants
PostgreSQL	on	Kubernetes Projects
• AtomicDB
https://hackernoon.com/postgresql-cluster-into-kubernetes-cluster-f353cde212de
• Crunchy	PostgreSQL	Containers (CPC)
https://github.com/CrunchyData/crunchy-containers
• Patroni
https://github.com/zalando/patroni
• Stolon
https://github.com/sorintlab/stolon
We are trying CPC on OpenShift Origin!
1. Introduction
1. Introduction
2. Crunchy PostgreSQL Containers on OpenShift
3. Multi-Master PostgreSQL Cluster on OpenShift
4. Conclusion
Table of Contents
What is Kubernetes?
• Kubernetes is an open-source system for managing containerized applications.
• A Pod is the basic scheduling unit; it consists of one or more containers.
• Kubernetes follows a master-slave architecture.
1. Introduction
https://en.wikipedia.org/wiki/Kubernetes
2.Crunchy PostgreSQL Containers on OpenShift
https://blog.openshift.com/enterprise-ready-kubernetes/
2.Crunchy PostgreSQL Containers on OpenShift
Crunchy	PostgreSQL	Containers
• Docker	1.12	and	above
• OpenShift 3.4 and above
• Kubernetes	1.5	and	above
https://github.com/CrunchyData/crunchy-containers
How to run a simple PostgreSQL
Write the Pod manifest file. Declare the Pod's desired state (as with SQL), not the procedure.
2.Crunchy PostgreSQL Containers on OpenShift
• simple_postgres_pod.yml
kind: Pod
apiVersion: v1
metadata:
  name: simple-pg
  labels:
    name: simple-pg
spec:
  containers:                  # PostgreSQL container settings
  - name: postgres
    # the container image is pulled from Docker Hub
    image: crunchydata/crunchy-postgres:centos7-10.1-1.7.0
    ports:
    - containerPort: 5432
      protocol: TCP
    env:                       # PostgreSQL settings
    - name: PGHOST
      value: /tmp
    - name: PG_PRIMARY_USER
      value: primaryuser
    - name: PG_PRIMARY_PORT
      value: '5432'
    - name: PG_MODE
      value: primary
    - name: PG_PRIMARY_PASSWORD
      value: password
    - name: PG_USER
      value: testuser
    - name: PG_PASSWORD
      value: password
    - name: PG_DATABASE
      value: userdb
    - name: PG_ROOT_PASSWORD
      value: password
    volumeMounts:
    - mountPath: /pgdata
      name: pgdata
      readOnly: false
  volumes:                     # Pod storage settings
  - name: pgdata
    emptyDir: {}
How to run a simple PostgreSQL
Write the Service manifest file.
• simple_postgres_svc.yml
2.Crunchy PostgreSQL Containers on OpenShift
kind: Service
apiVersion: v1
metadata:
  name: simple-pg
  labels:
    name: simple-pg
spec:
  ports:
  - protocol: TCP
    port: 5432
    targetPort: 5432
    nodePort: 0
  selector:
    name: simple-pg
  type: ClusterIP
  sessionAffinity: None
status:
  loadBalancer: {}
This Service sends packets to TCP port 5432 on any Pod with the "name: simple-pg" label.
How to run a simple PostgreSQL
Execute “oc create” to create the simple PostgreSQL Pod and Service.
2.Crunchy PostgreSQL Containers on OpenShift
$ oc create -f simple_postgres_svc.yml
service "simple-pg" created
$ oc create -f simple_postgres_pod.yml
pod "simple-pg" created
The simple-pg Service routes SQL traffic to the simple-pg Pod.
(Diagram: Clients reach the simple-pg Service at simple-pg.mypj01.svc.cluster.local, which routes to the simple-pg Pod labeled "name: simple-pg")
How to run a simple PostgreSQL
Connect to the hostname “simple-pg.mypj01.svc.cluster.local”.
2.Crunchy PostgreSQL Containers on OpenShift
$ psql -h simple-pg.mypj01.svc.cluster.local -U testuser -d userdb
Password for user testuser:
userdb=> \d
List of relations
Schema   | Name               | Type  | Owner
---------+--------------------+-------+----------
public   | pg_stat_statements | view  | postgres
testuser | testtable          | table | testuser
Let’s try to add a replication!
2.Crunchy PostgreSQL Containers on OpenShift
Add a replica Pod and Service.
(Diagram: Clients reach the simple-pg Service and Pod as before; a new simple-pg-rep Service at simple-pg-rep.mypj01.svc.cluster.local fronts the simple-pg-rep Pod, which follows the primary via streaming replication)
Let’s try to add a replication!
Write the Pod and Service manifest files.
2.Crunchy PostgreSQL Containers on OpenShift
• simple_postgres_pod-rep.yml
kind: Pod
apiVersion: v1
metadata:
  name: simple-pg-rep
  labels:
    name: simple-pg-rep
spec:
  containers:
  - name: postgres
    image: crunchydata/crunchy-postgres:centos7-10.1-1.7.0
    ports:
    - containerPort: 5432
      protocol: TCP
    env:
    - name: PGHOST
      value: /tmp
    - name: PG_PRIMARY_USER
      value: primaryuser
    - name: PG_PRIMARY_PORT
      value: '5432'
    - name: PG_MODE
      value: replica
    - name: PG_PRIMARY_PASSWORD
      value: password
    - name: PG_USER
      value: testuser
    - name: PG_PASSWORD
      value: password
    - name: PG_DATABASE
      value: userdb
    - name: PG_ROOT_PASSWORD
      value: password
    - name: PG_PRIMARY_HOST
      value: simple-pg
    volumeMounts:
    - mountPath: /pgdata
      name: pgdata
      readOnly: false
  volumes:
  - name: pgdata
    emptyDir: {}
crunchy-postgres runs a shell script for initialization when the container starts.
The PG_MODE variable determines the role (primary PostgreSQL or replica PostgreSQL).
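The role selection described above can be sketched in shell. This is an illustrative sketch only, not crunchy-postgres's actual initialization script; select_role and the messages it prints are assumptions:

```shell
#!/bin/sh
# Illustrative sketch of branching on PG_MODE at container start.
# select_role is a hypothetical helper, not crunchy-postgres's real code.
select_role() {
  case "$1" in
    primary) echo "run initdb and start as primary" ;;
    replica) echo "clone from the primary with pg_basebackup, then start as standby" ;;
    *)       echo "unknown PG_MODE: $1" >&2; return 1 ;;
  esac
}
```

In the real container, a start script reads the PG_MODE environment variable set in the manifest and performs the corresponding initialization.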
Let’s try to add a replication!
Execute “oc create” to create the replica Pod and Service.
2.Crunchy PostgreSQL Containers on OpenShift
$ oc create -f simple_postgres_rep_svc.yml
service "simple-pg-rep" created
$ oc create -f simple_postgres_rep_pod.yml
pod "simple-pg-rep" created
Let’s try to add a replication!
Check the replication status.
2.Crunchy PostgreSQL Containers on OpenShift
$ psql -h simple-pg.mypj01.svc.cluster.local -U postgres -d userdb
userdb=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 329
usesysid         | 16391
usename          | primaryuser
application_name | simple-pg-rep
client_addr      | 10.131.0.1
...
sync_state       | async
Where is PostgreSQL running on the OpenShift cluster?
2.Crunchy PostgreSQL Containers on OpenShift
(Diagram: OpenShift cluster of 8 virtual machines (master01, infra01, worker01 to worker06), with simple-pg on worker04 and simple-pg-rep on worker05)
$ oc get pod -o wide
NAME          READY  STATUS   RESTARTS  AGE  IP           NODE
simple-pg     1/1    Running  0         7m   10.129.2.20  worker04
simple-pg-rep 1/1    Running  0         5m   10.131.0.16  worker05
We don’t have to manage these IP addresses or where these containers are located!
At first, what should we think about when running PostgreSQL on OpenShift (Kubernetes)?
• What should we use for persistent storage?
  • Local disk
  • (Shared) network storage
• What should we do in case of server/container failures?
  • Failover
  • Restart PostgreSQL
2.Crunchy PostgreSQL Containers on OpenShift
What should we use for storage?
2.Crunchy PostgreSQL Containers on OpenShift
Local disk: each worker node has several HDDs/SSDs.
(Diagram: worker01 and worker02, each with its own local disks)
What should we use for storage?
2.Crunchy PostgreSQL Containers on OpenShift
(Shared) network storage: each worker node connects to one or more network storage systems.
(Diagram: worker01 and worker02 both connected to shared network storage)
PersistentVolume plug-ins
2.Crunchy PostgreSQL Containers on OpenShift
OpenShift Origin supports the following PersistentVolume plug-ins:
• NFS
• HostPath
• GlusterFS
• Ceph RBD
• OpenStack Cinder
• AWS Elastic Block Store (EBS)
• GCE Persistent Disk
• iSCSI
• Fibre Channel
• Azure Disk
• Azure File
• VMware vSphere
• Local
https://docs.openshift.org/latest/architecture/additional_concepts/storage.html
NFS example:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0003
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /tmp
    server: 172.17.0.2
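A Pod consumes a PersistentVolume like the NFS example above through a PersistentVolumeClaim. A minimal sketch; the claim name crunchy-pvc matches the one referenced later in this deck, and the size is illustrative:

```yaml
# Sketch of a claim that would bind to the NFS PV above; size is illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: crunchy-pvc
spec:
  accessModes:
  - ReadWriteOnce        # must be offered by the PV's accessModes
  resources:
    requests:
      storage: 5Gi       # binds to a PV with at least this capacity
```

The Pod then mounts it via a volume with persistentVolumeClaim.claimName: crunchy-pvc instead of emptyDir.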
What should we do in case of server/container failures?
2.Crunchy PostgreSQL Containers on OpenShift
• Failover
CPC provides a Watch Container to manage failover.
(Diagram: Primary on worker01, Secondary on worker02 with streaming replication, Watch Container on worker03)
1. The Watch Container executes pg_isready against the Primary repeatedly; the Secondary checks for a trigger file repeatedly.
2. When the Primary fails, the Watch Container sets the trigger file.
3. The Secondary detects the trigger file and promotes itself to Primary.
4. The Watch Container changes the Pod label so traffic is routed to the new Primary.
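The watch loop above can be sketched in shell. This is an illustrative sketch, not CPC's actual watch script; check_primary, TRIGGER_FILE, MAX_FAILURES, and INTERVAL are assumed names:

```shell
#!/bin/sh
# Illustrative sketch of a failover watch loop; not CPC's actual code.
# All names here (check_primary, TRIGGER_FILE, ...) are assumptions.

TRIGGER_FILE="${TRIGGER_FILE:-/tmp/pg-failover-trigger}"
MAX_FAILURES="${MAX_FAILURES:-3}"
INTERVAL="${INTERVAL:-5}"

# Health check against the primary, as the slides describe.
check_primary() {
  pg_isready -h "${PG_PRIMARY_HOST:-primary}" -p "${PG_PRIMARY_PORT:-5432}" >/dev/null 2>&1
}

watch_primary() {
  failures=0
  while :; do
    if check_primary; then
      failures=0                    # healthy: reset the counter
    else
      failures=$((failures + 1))    # count consecutive failures
      if [ "$failures" -ge "$MAX_FAILURES" ]; then
        # The replica's recovery configuration names this trigger file,
        # so creating it makes the replica promote itself.
        touch "$TRIGGER_FILE"
        return 1
      fi
    fi
    sleep "$INTERVAL"
  done
}
```

Relabeling the promoted Pod (step 4) would be a separate API call against Kubernetes after the promotion.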
What should we do in case of server/container failures?
2.Crunchy PostgreSQL Containers on OpenShift
• Restart PostgreSQL
OpenShift (Kubernetes) provides a Liveness Probe to confirm whether the container is alive.
(Diagram: Primary Pod on worker01 mounting network storage; OpenShift Master on master01)
1. The Liveness Probe executes pg_isready repeatedly inside the PostgreSQL Pod.
2. The OpenShift Master checks the Liveness Probe results.
3. When the probe keeps failing, the Master restarts the PostgreSQL Pod on another worker node.
4. The restarted Pod mounts the same network storage, so it keeps its data.
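The restart behavior above is driven by a liveness probe declared in the Pod spec. A minimal sketch; the liveness.sh path follows the CPC template shown later in this deck, and the timing values are illustrative:

```yaml
# Sketch of a PostgreSQL liveness probe; timing values are illustrative.
livenessProbe:
  exec:
    command:
    - /opt/cpm/bin/liveness.sh   # wraps a pg_isready check inside the container
  initialDelaySeconds: 90        # give initdb/recovery time to finish first
  periodSeconds: 10              # probe interval
  timeoutSeconds: 1
```

When the probe fails repeatedly, Kubernetes restarts the container according to the Pod's restartPolicy.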
Choose a suitable method for your environment.
Persistent storage:
• Local disk
• (Shared) network storage
In case of server/container failures:
• Failover
• Restart PostgreSQL
2.Crunchy PostgreSQL Containers on OpenShift
CPC other examples
2.Crunchy PostgreSQL Containers on OpenShift
crunchy-collect
The crunchy-collect container gathers various metrics from the crunchy-postgres container and pushes them to the Prometheus Promgateway.
https://github.com/CrunchyData/crunchy-containers/blob/master/docs/metrics.adoc
Advantages of PostgreSQL on Kubernetes
Easy to run PostgreSQL (including HA, monitoring, etc.).
• Define the state, not the procedure. The procedure is packed into the container, with portability.
• The network is abstracted. This makes it easier to define the state.
Disadvantages of PostgreSQL on Kubernetes
Hard to manage the Kubernetes cluster itself (maybe...).
• It seems to me that the complexity of the PostgreSQL layer is shifted to the Kubernetes layer.
• But if you are already running a Kubernetes cluster (and that day will come soon?), I believe this will not be a problem.
2.Crunchy PostgreSQL Containers on OpenShift
1. Introduction
2. Crunchy PostgreSQL Containers on OpenShift
3. Multi-Master PostgreSQL Cluster on OpenShift
4. Conclusion
Table of Contents
Try to install the Built-in-Sharding-based Multi-Master PostgreSQL Cluster on OpenShift.
3.Multi-Master PostgreSQL Cluster on OpenShift
(Diagram: Clients reach pg-coordinators holding the parent table, which access pg-shards holding the partition tables via FDW)
• The PostgreSQL container contains the same source code that was demonstrated in this morning's session "Built-in Sharding Special Version", plus modified scripts from the crunchy-postgres container.
$ docker pull ooyamams/postgres-dev
• NFS is used as shared network storage, for test purposes.
• The pg-coordinator Pod and pg-shard Pod are created from Templates and controlled by a DeploymentConfig, not a StatefulSet.
pg-coordinator.yml
3.Multi-Master PostgreSQL Cluster on OpenShift
kind: Template
apiVersion: v1
metadata:
  name: pg-coordinator
  creationTimestamp: null
  annotations:
    description: PostgreSQL Multi-Master Build in Sharding Example
    iconClass: icon-database
    tags: database,postgresql
parameters:
- name: PG_PRIMARY_USER
  description: The username used for primary / replica replication
  value: primaryuser
...(omit)...
objects:
- kind: Service
  apiVersion: v1
  ...(omit)...
- kind: DeploymentConfig
  apiVersion: v1
  metadata:
    name: ${PG_PRIMARY_SERVICE_NAME}
    creationTimestamp: null
  spec:
    strategy:
      type: Recreate
      resources: {}
    triggers:
    - type: ConfigChange
    replicas: 2
    selector:
      name: ${PG_PRIMARY_SERVICE_NAME}
    template:
      metadata:
        creationTimestamp: null
        labels:
          name: ${PG_PRIMARY_SERVICE_NAME}
      spec:
        serviceAccount: pg-cluster-sa
        containers:
        - name: server
          image: 172.30.81.49:5000/mypj01/postgres-dev:build-in-sharding-4
          livenessProbe:
            exec:
              command:
              - /opt/cpm/bin/liveness.sh
            initialDelaySeconds: 90
            timeoutSeconds: 1
          ports:
          - containerPort: 5432
            protocol: TCP
          env:
          - name: PG_PRIMARY_HOST
            value: ${PG_PRIMARY_SERVICE_NAME}
          ...(omit)...
          resources: {}
          terminationMessagePath: /dev/termination-log
          securityContext:
            privileged: false
          volumeMounts:
          - mountPath: /pgdata
            name: pgdata
            readOnly: false
        volumes:
        - name: pgdata
          emptyDir: {}
        restartPolicy: Always
        dnsPolicy: ClusterFirst
  status: {}
Try to install the Built-in-Sharding-based Multi-Master PostgreSQL Cluster on OpenShift.
3.Multi-Master PostgreSQL Cluster on OpenShift
• Create the Template for pg-coordinator.
$ oc create -f pg-coordinator.yml
• Create the Service and DeploymentConfig from the Template.
$ oc process pg-coordinator | oc create -f -
(Diagram: clients reach pg-coordinator.mypj01.svc.cluster.local, which load-balances round robin across the pg-coordinator Pods)
pg-shard.yml
3.Multi-Master PostgreSQL Cluster on OpenShift
kind: Template
apiVersion: v1
metadata:
  name: pg-shard
  creationTimestamp: null
  annotations:
    description: PostgreSQL Multi-Master Build in Sharding Example
    iconClass: icon-database
    tags: database,postgresql
parameters:
- name: PG_PRIMARY_USER
  description: The username used for primary / replica replication
  value: primaryuser
...(omit)...
- name: PGDATA_PATH_OVERRIDE
  value: shared-01
objects:
- kind: Service
  apiVersion: v1
  metadata:
    name: ${PG_PRIMARY_SERVICE_NAME}
    labels:
      name: ${PG_PRIMARY_SERVICE_NAME}
  spec:
    ports:
    - protocol: TCP
      port: 5432
      targetPort: 5432
      nodePort: 0
    selector:
      name: ${PG_PRIMARY_SERVICE_NAME}
    clusterIP: None
- kind: DeploymentConfig
  apiVersion: v1
  metadata:
    name: ${PG_PRIMARY_SERVICE_NAME}
    creationTimestamp: null
  spec:
    serviceName: ${PG_PRIMARY_SERVICE_NAME}
    strategy:
    ...(omit)...
        labels:
          name: ${PG_PRIMARY_SERVICE_NAME}
      spec:
        serviceAccount: pg-sa
        containers:
        - name: ${PG_PRIMARY_SERVICE_NAME}
          image: 172.30.81.49:5000/mypj01/postgres-dev:build-in-sharding-1
          livenessProbe:
            exec:
              command:
              - /opt/cpm/bin/liveness.sh
            initialDelaySeconds: 90
            timeoutSeconds: 1
          ports:
          - containerPort: 5432
            protocol: TCP
          env:
          - name: PG_PRIMARY_HOST
            value: ${PG_PRIMARY_SERVICE_NAME}
          ...(omit)...
          resources: {}
          terminationMessagePath: /dev/termination-log
          securityContext:
            privileged: false
          volumeMounts:
          - name: pgdata
            mountPath: /pgdata
            readOnly: false
        volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: crunchy-pvc
Try to install the Built-in-Sharding-based Multi-Master PostgreSQL Cluster on OpenShift.
3.Multi-Master PostgreSQL Cluster on OpenShift
• Create the Template for pg-shard.
$ oc create -f pg-shard.yml
• Create the Services and DeploymentConfigs from the Template.
$ oc process pg-shard -p PG_PRIMARY_SERVICE_NAME=pg-shard-00 -p PGDATA_PATH_OVERRIDE=pg-shard-00 | oc create -f -
$ oc process pg-shard -p PG_PRIMARY_SERVICE_NAME=pg-shard-01 -p PGDATA_PATH_OVERRIDE=pg-shard-01 | oc create -f -
(Diagram: clients reach the round-robin pg-coordinator Service; the pg-coordinator Pods access the pg-shard-00 and pg-shard-01 Services, each fronting its shard Pods)
70Copyright©2017 NTT corp. All Rights Reserved.
Check	Physical	cluster	state.
3.Multi-Master PostgreSQL Cluster on OpenShift
infra01	 worker02
$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
pg-coordinator-1-drtsv 1/1 Running 0 4m 10.131.0.37 worker05
pg-coordinator-1-rmkb8 1/1 Running 0 4m 10.130.0.50 worker03
pg-shard-00-1-qqwtr 1/1 Running 0 4m 10.129.0.68 worker02
pg-shard-01-1-ht4jn 1/1 Running 0 3m 10.130.0.51 worker03
[Diagram: OpenShift cluster of 8 virtual machines (master01, infra01, worker01-06) plus an NFS server; the pg-coordinator and pg-shard pods run on the worker nodes listed above.]
71Copyright©2017 NTT corp. All Rights Reserved.
Execute SQL against the cluster.
The schema and data are the same as in the morning session "Built-in Sharding".
3.Multi-Master PostgreSQL Cluster on OpenShift
$ psql -h pg-coordinator.mypj01.svc.cluster.local -U postgres -d postgres ¥
-c "explain analyze select * from flight_bookings";
QUERY PLAN
-----------------------------------------------------------------------------------------
Append (cost=100.00..254.09 rows=974 width=144) (actual time=1.945..40.289 rows=7499
loops=1)
-> Foreign Scan on flight_bookings1 (cost=...) (actual time=...)
-> Foreign Scan on flight_bookings0 (cost=...) (actual time=...)
Planning time: 1.930 ms
Execution time: 60.319 ms
(5 rows)
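Because pg-coordinator is a Service backed by two pods, each fresh connection may land on a different coordinator. A small sketch to observe the round-robin balancing (a hypothetical check, assuming the service DNS name from the slides is reachable; `inet_server_addr()` reports which backend answered):

```shell
# Sketch: open several new connections through the pg-coordinator Service
# and print the backend IP that served each one; with round-robin balancing
# the two coordinator pod IPs should alternate.
show_backends() {
  for i in 1 2 3 4; do
    psql -h pg-coordinator.mypj01.svc.cluster.local -U postgres -d postgres \
         -t -A -c "select inet_server_addr();"
  done
}
```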
[Diagram: clients -> round-robin load balancing -> pg-coordinator Service -> pg-shard-00 and pg-shard-01.]
72Copyright©2017 NTT corp. All Rights Reserved.
Crash worker03.
3.Multi-Master PostgreSQL Cluster on OpenShift
Execute this command on worker03 to generate a kernel panic.
$ sudo sh -c "echo c > /proc/sysrq-trigger"
The status of worker03 changes to "NotReady".
$ oc get node
NAME STATUS AGE VERSION
infra01 Ready 26d v1.7.6+a08f5eeb62
master01 Ready,SchedulingDisabled 26d v1.7.6+a08f5eeb62
worker01 Ready 26d v1.7.6+a08f5eeb62
worker02 Ready 26d v1.7.6+a08f5eeb62
worker03 NotReady 26d v1.7.6+a08f5eeb62
worker04 Ready 26d v1.7.6+a08f5eeb62
worker05 Ready 26d v1.7.6+a08f5eeb62
worker06 Ready 26d v1.7.6+a08f5eeb62
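To script this failure test instead of polling `oc get node` by hand, one can wait for the node to leave Ready state. A sketch (a hypothetical helper; it assumes `oc` is on PATH, and the 10-second interval and retry count are arbitrary):

```shell
# Sketch: wait until a node reports NotReady, or give up after N tries.
# Returns 0 once the node has left Ready state.
wait_not_ready() {
  node="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # --no-headers keeps the output parseable; column 2 is STATUS.
    status=$(oc get node "$node" --no-headers | awk '{print $2}')
    if [ "$status" = "NotReady" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}
```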
73Copyright©2017 NTT corp. All Rights Reserved.
The containers are being re-created.
3.Multi-Master PostgreSQL Cluster on OpenShift
$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
pg-coordinator-1-drtsv 1/1 Running 0 8m 10.131.0.37 worker05
pg-coordinator-1-rmkb8 1/1 Terminating 0 8m 10.130.0.50 worker03
pg-coordinator-1-wk4wx 0/1 ContainerCreating 0 2s <none> worker01
pg-shard-00-1-qqwtr 1/1 Running 0 8m 10.129.0.68 worker02
pg-shard-01-1-ht4jn 1/1 Terminating 0 8m 10.130.0.51 worker03
pg-shard-01-1-ztdn4 0/1 ContainerCreating 0 2s <none> worker01
[Diagram: OpenShift cluster (8 virtual machines); the pods that ran on the failed worker03 are re-created on worker01, while the pg-shard data remains on the NFS server.]
74Copyright©2017 NTT corp. All Rights Reserved.
Re-execute	SQL.
3.Multi-Master PostgreSQL Cluster on OpenShift
$ psql -h pg-coordinator.mypj01.svc.cluster.local -U postgres -d postgres ¥
-c "explain analyze select * from flight_bookings";
QUERY PLAN
---------------------------------------------------------------------------------------
Append (cost=100.00..254.09 rows=974 width=144) (actual time=15.531..63.278 rows=7499
loops=1)
-> Foreign Scan on flight_bookings1 (cost=...) (actual time=...)
-> Foreign Scan on flight_bookings0 (cost=...) (actual time=...)
Planning time: 1.716 ms
Execution time: 116.989 ms
(5 rows)
[Diagram: clients -> round-robin load balancing -> pg-coordinator Service -> pg-shard-00 and pg-shard-01.]
75Copyright©2017 NTT corp. All Rights Reserved.
3.Multi-Master PostgreSQL Cluster on OpenShift
pg-coordinators don't have persistent storage, so they lose the
cluster configuration when restarted.
[Diagram: the pg-coordinators on worker01 and worker03 hold the parent tables and FDW definitions with no persistent storage; pg-shard-00 and pg-shard-01 mount their data from the NFS server.]
79Copyright©2017 NTT corp. All Rights Reserved.
3.Multi-Master PostgreSQL Cluster on OpenShift
pg-coordinators don't have persistent storage, so they lose the
cluster configuration when restarted.
[Diagram: the re-created pg-coordinator comes up empty (the parent tables and FDW definitions are lost), while pg-shard-00 and pg-shard-01 keep their data on the NFS server.]
80Copyright©2017 NTT corp. All Rights Reserved.
Custom Resource	Definitions	(CRD)
We use CRDs (CustomResourceDefinitions) to share the cluster
configuration.
3.Multi-Master PostgreSQL Cluster on OpenShift
• An image of the HTTP API space of Kubernetes
/api/v1/mypj01
/api/v1/mypj01/pods/...
/api/v1/mypj01/services/...
/api/v1/mypj01/....
81Copyright©2017 NTT corp. All Rights Reserved.
Custom Resource	Definitions	(CRD)
We use CRDs (CustomResourceDefinitions) to share the cluster
configuration.
3.Multi-Master PostgreSQL Cluster on OpenShift
-> CRDs can extend the Kubernetes core API.
• An image of the HTTP API space of Kubernetes
/api/v1/mypj01
/api/v1/mypj01/pods/...
/api/v1/mypj01/services/...
/api/v1/mypj01/....
/api/v1/mypj01/pg-cluster/...
82Copyright©2017 NTT corp. All Rights Reserved.
Write the CRD manifest file.
3.Multi-Master PostgreSQL Cluster on OpenShift
$ cat pg-cluster-shard-information.yml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
name: shard-informations.pg-cluster.example.com
spec:
group: pg-cluster.example.com
version: v1
scope: Namespaced
names:
plural: shard-information
singular: shard-information
kind: Shard-Info
shortNames:
- si
This CRD can be accessed with "oc get shard-information" or "oc get si".
$ oc get shard-information
No resources found.
83Copyright©2017 NTT corp. All Rights Reserved.
Write	the	shard-information	manifest	file.
3.Multi-Master PostgreSQL Cluster on OpenShift
$ cat defines/sharding_info_coordinator.yml
apiVersion: "pg-cluster.example.com/v1"
kind: Shard-Info
metadata:
name: multi-master-demo-pgconfasia
spec:
coordinator:
"create extension if not exists postgres_fdw;
create server shard0 foreign data wrapper postgres_fdw options (host 'pg-shard-00.mypj01.svc.cluster.local', dbname 'postgres', port '5432');
create server shard1 foreign data wrapper postgres_fdw options (host 'pg-shard-01.mypj01.svc.cluster.local', dbname 'postgres', port '5432');
create user mapping for postgres server shard0 OPTIONS (user 'postgres', password 'password');
create user mapping for postgres server shard1 OPTIONS (user 'postgres', password 'password');
create table hotel_bookings (id serial, user_id int, booked_at timestamp, city_name text, continent text, flight_id int) partition by list (continent);
create table flight_bookings (id serial, user_id int, booked_at timestamp, from_city text, from_continent text, to_city text, to_continent text) partition
by list (to_continent);
create table users (id serial, name text, age int);
create foreign table flight_bookings0 partition of flight_bookings for values in ('Asia', 'Oceania') server shard0;
create foreign table hotel_bookings0 partition of hotel_bookings for values in ('Asia', 'Oceania') server shard0;
create foreign table flight_bookings1 partition of flight_bookings for values in ('Europe', 'Africa') server shard1;
create foreign table hotel_bookings1 partition of hotel_bookings for values in ('Europe', 'Africa') server shard1;
These are the SQL statements for setting up pg-coordinator.
85Copyright©2017 NTT corp. All Rights Reserved.
We can retrieve these SQL statements with the "oc" command.
3.Multi-Master PostgreSQL Cluster on OpenShift
$ oc get si multi-master-demo-pgconfasia -o json | jq -r '.spec.coordinator'
"create extension if not exists postgres_fdw;
create server shard0 foreign data wrapper postgres_fdw options (host 'pg-shard-00.mypj01.svc.cluster.local', dbname 'postgres', port '5432');
create server shard1 foreign data wrapper postgres_fdw options (host 'pg-shard-01.mypj01.svc.cluster.local', dbname 'postgres', port '5432');
create user mapping for postgres server shard0 OPTIONS (user 'postgres', password 'password');
create user mapping for postgres server shard1 OPTIONS (user 'postgres', password 'password');
create table hotel_bookings (id serial, user_id int, booked_at timestamp, city_name text, continent text, flight_id int) partition by list (continent);
create table flight_bookings (id serial, user_id int, booked_at timestamp, from_city text, from_continent text, to_city text, to_continent text) partition
by list (to_continent);
create table users (id serial, name text, age int);
create foreign table flight_bookings0 partition of flight_bookings for values in ('Asia', 'Oceania') server shard0;
create foreign table hotel_bookings0 partition of hotel_bookings for values in ('Asia', 'Oceania') server shard0;
create foreign table flight_bookings1 partition of flight_bookings for values in ('Europe', 'Africa') server shard1;
create foreign table hotel_bookings1 partition of hotel_bookings for values in ('Europe', 'Africa') server shard1;
pg-coordinator can also retrieve these SQL statements from the Shard-Info CRD.
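The same `oc get si ... | jq -r '.spec.coordinator'` extraction shown above can serve as the coordinator's startup hook. A minimal sketch (a hypothetical entrypoint helper, not the actual container code; it assumes `oc`, `jq`, and `psql` are available inside the image, and uses the object name from the slides):

```shell
# Sketch: pull the setup SQL out of the Shard-Info object and apply it
# before the coordinator starts accepting queries.
setup_from_shard_info() {
  si_name="$1"
  oc get si "$si_name" -o json | jq -r '.spec.coordinator' > /tmp/setup.sql
  psql -U postgres -d postgres -f /tmp/setup.sql
}
```

Called as `setup_from_shard_info multi-master-demo-pgconfasia`, this rebuilds the parent tables and FDW definitions on every restart.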
86Copyright©2017 NTT corp. All Rights Reserved.
3.Multi-Master PostgreSQL Cluster on OpenShift
pg-coordinator retrieves the setup SQL from the Shard-Info CRD when it starts.
[Diagram: the re-created pg-coordinator has lost its configuration; the Shard-Info object (pg-cluster.example.com/Shard-Info, multi-master-demo-pgconfasia) is stored on the OpenShift Master, and the pg-shards keep their data on the NFS server.]
87Copyright©2017 NTT corp. All Rights Reserved.
3.Multi-Master PostgreSQL Cluster on OpenShift
pg-coordinator retrieves the setup SQL from the Shard-Info CRD when it starts.
$ oc get si ... > /tmp/setup.sql
$ psql -U postgres < /tmp/setup.sql
[Diagram: on startup, the new pg-coordinator takes the SQL out of the Shard-Info object on the OpenShift Master and applies it with psql.]
88Copyright©2017 NTT corp. All Rights Reserved.
3.Multi-Master PostgreSQL Cluster on OpenShift
pg-coordinator retrieves the setup SQL from the Shard-Info CRD when it starts.
[Diagram: the parent tables and FDW definitions are restored on the new pg-coordinator: configured!]
89Copyright©2017 NTT corp. All Rights Reserved.
Let’s	scale	out	pg-coordinator.
3.Multi-Master PostgreSQL Cluster on OpenShift
$ oc scale dc pg-coordinator --replicas=3
deploymentconfig "pg-coordinator" scaled
• Set	“replicas”	2	->	3
$ oc get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
pg-coordinator-1-drtsv 1/1 Running 0 20m 10.131.0.37 worker05
pg-coordinator-1-wk4wx 0/1 Running 0 11m 10.128.0.76 worker01
pg-coordinator-1-87ddq 0/1 Running 0 1m 10.129.2.45 worker04
pg-shard-00-1-qqwtr 1/1 Running 0 20m 10.129.0.68 worker02
pg-shard-01-1-ztdn4 0/1 Running 0 11m 10.128.0.77 worker01
• A new pg-coordinator container is created.
[Diagram: OpenShift cluster (8 virtual machines); the third pg-coordinator container starts on worker04.]
90Copyright©2017 NTT corp. All Rights Reserved.
4.Conclusion
• PostgreSQL on Kubernetes makes it easy to run PostgreSQL, even a Multi-Master
PostgreSQL Cluster.
  - including HA, monitoring, etc.
  - easy to scale out.
• Custom Resource Definitions are useful for sharing the cluster
configurations.
but you have to consider ...
  - Persistent storage.
  - Server/container failure.
  - Kubernetes cluster management.
Please try PostgreSQL on Kubernetes, too!
And share the knowledge.
Future	Plan
We plan to use Red Hat Ceph Storage as distributed block storage instead of NFS. This
can make the storage layer scalable! We will also evaluate the performance.
91Copyright©2017 NTT corp. All Rights Reserved.
2	examples	of	Multi-Master	PostgreSQL	Cluster
Appendix
• For High availability: BDR (Bi-Directional Replication), using Logical Replication
• For Performance: the Postgres-XC way, with Coordinators and DataNodes
92Copyright©2017 NTT corp. All Rights Reserved.
OpenShift provides a Web Console.
Appendix
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 

Recently uploaded (20)

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 

Multi Master PostgreSQL Cluster on Kubernetes

  • 1. Copyright©2017 NTT corp. All Rights Reserved. Multi-Master PostgreSQL Cluster on Kubernetes PGConf.Asia 2017 NTT OSS Center Masanori Oyama/⼤⼭真実
  • 2. 2Copyright©2017 NTT corp. All Rights Reserved. Who Am I Masanori Oyama / ⼤⼭ 真実 twitter @ooyamams1987 slideshare https://www.slideshare.net/ooyamams Work • PostgreSQL engineering support and consultation. • Extensive quality verification of PostgreSQL releases. • PostgreSQL Security › PostgreSQL Security. How Do We Think? at PGCon 2017 https://www.slideshare.net/ooyamams/postgresql-security-how-do-we-think-at-pgcon-2017 Previous work • Hadoop engineering support and consultation at NTT DATA Corp. › I had managed a big Hadoop cluster (1,000 nodes, 200PB!).
  • 3. 3Copyright©2017 NTT corp. All Rights Reserved. I will talk about ... • Introduce Next Generation Multi Master PostgreSQL Cluster and share some details about a PoC of IoT log platform. • Share the knowledge of PostgreSQL cluster management using Kubernetes. Today’sTopic
  • 4. 4Copyright©2017 NTT corp. All Rights Reserved. I will talk about ... • Introduce Next Generation Multi Master PostgreSQL Cluster and share some details about a PoC of IoT log platform. • Share the knowledge of PostgreSQL cluster management using Kubernetes. Today’sTopic Please read the today's morning presentation material "Built-in Sharding“! I will focus on this today.
  • 5. 5Copyright©2017 NTT corp. All Rights Reserved. I will talk about ... • Introduce Next Generation Multi Master PostgreSQL Cluster and share some details about a PoC of IoT log platform. • Share the knowledge of PostgreSQL cluster management using Kubernetes. Today’sTopic Please read the today's morning presentation material "Built-in Sharding“! I will focus on this today.
  • 6. 6Copyright©2017 NTT corp. All Rights Reserved. 1. Introduction 2. Crunchy PostgreSQL Containers on OpenShift 3. Multi-Master PostgreSQL Cluster on OpenShift 4. Conclusion Table of Contents
  • 7. 7Copyright©2017 NTT corp. All Rights Reserved. 1. Introduction 2. Crunchy PostgreSQL Containers on OpenShift 3. Multi-Master PostgreSQL Cluster on OpenShift 4. Conclusion Table of Contents
  • 8. 8Copyright©2017 NTT corp. All Rights Reserved. Postgres-XC • Multi-Master PostgreSQL Cluster forked from PostgreSQL by NTT • Scale out for Read/Write workloads • The development ended in 2014 1. Introduction Coordinators DataNodes Global Transaction Manager Clients Now, we are contributing these experiences back to the PostgreSQL core. Please see the “PostgreSQL Built-in Sharding” presentation from this morning.
  • 9. 9Copyright©2017 NTT corp. All Rights Reserved. Why scale out? It is hard to improve write performance by scaling up (purchasing high-performance hardware). 1. Introduction Major bottleneck points of PostgreSQL • Get transaction snapshot • Write WAL There is a limit to performance improvement even if you use a many-core server and NVMe. To solve these bottlenecks, a drastic improvement is necessary. • New Transaction Mechanism for many-core machines • Parallel WAL Mechanism or New WAL Writer Mechanism for NVMe Scaling out PostgreSQL is one of the workarounds to avoid these bottlenecks.
  • 10. 10Copyright©2017 NTT corp. All Rights Reserved. “Instagram” Way Tantan presented a homemade approach similar to the “Instagram” way at last year's PGConf.Asia. 1. Introduction http://www.pgconf.asia/JP/wp-content/uploads/2016/12/From-0-to-350bn-rows-in-2-years-PgConfAsia2016-v1.pdf Applications need to control which databases are accessed and satisfy the ACID properties with transactions. • 22TB / 350bn rows biggest table. • 270k tuple writes / sec. • 1.3M TPS aggregated over all databases
  • 11. 11Copyright©2017 NTT corp. All Rights Reserved. PostgreSQL Built-in Sharding 1. Introduction Worker node 3 The PostgreSQL community is working very hard to realize PostgreSQL Built-in Sharding. Worker node 2 Worker node 1
  • 12. 12Copyright©2017 NTT corp. All Rights Reserved. Even if we could get a Perfect PostgreSQL Cluster, we have to spend a lot of time to manage the cluster. For example, • Orchestration • HA and Recovery • Backup • Monitoring • Security • Resource Management • Load Balancing • Scale out (Add/Delete Coordinator/Shard) 1. Introduction
  • 13. 13Copyright©2017 NTT corp. All Rights Reserved. Distributed systems are hard!
  • 14. 14Copyright©2017 NTT corp. All Rights Reserved. The Difficulty of Scale Out Consider High Availability, for example. With an ordinary PostgreSQL, we have to use middleware to achieve HA. 1. Introduction PG-REX PostgreSQL HA solution with Pacemaker How about the PostgreSQL cluster? https://www.slideshare.net/kazuhcurry/pgrexpacemaker (Japanese)
  • 15. 15Copyright©2017 NTT corp. All Rights Reserved. HA of Postgres-XC 1. Introduction https://wiki.postgresql.org/images/4/44/Pgxc_HA_20121024.pdf
  • 16. 16Copyright©2017 NTT corp. All Rights Reserved. HA of Postgres-XC 1. Introduction GTM Server Coordinator and DataNode Servers
  • 17. 17Copyright©2017 NTT corp. All Rights Reserved. HA of Postgres-XC 10 shards How many HA settings are needed!? 1. Introduction GTM Server Coordinator and DataNode Servers
  • 18. 18Copyright©2017 NTT corp. All Rights Reserved. HA of Postgres-XC 10 shards How many HA settings are needed!? 1. Introduction GTM Server Coordinator and DataNode Servers I don’t want to manage this by myself!
  • 19. 19Copyright©2017 NTT corp. All Rights Reserved. How do people solve this? PostgreSQL Related Slides and Presentations https://wiki.postgresql.org/wiki/PostgreSQL_Related_Slides_and_Presentations PGCONF.EU 2017 • USING KUBERNETES, DOCKER, AND HELM TO DEPLOY ON-DEMAND POSTGRESQL STREAMING REPLICAS https://www.postgresql.eu/events/sessions/pgconfeu2017/session/1559/slides/35/Using%20Kubernetes,% 20Docker,%20and%20Helm%20to%20Deploy%20On-Demand%20PostgreSQL%20Streaming%20Replicas.pdf Postgres Open 2017 • A Kubernetes Operator for PostgreSQL - Architecture and Design https://www.sarahconway.com/slides/postgres-operator.pdf • Containerized Clustered PostgreSQL http://jberkus.github.io/container_cluster_pg/#23 PGConf 2017 • Patroni - HA PostgreSQL made easy https://www.slideshare.net/AlexanderKukushkin1/patroni-ha-postgresql-made-easy • PostgreSQL High Availability in a Containerized World http://jkshah.blogspot.jp/2017/03/pgconf-2017-postgresql-high.html 1. Introduction
  • 20. 20Copyright©2017 NTT corp. All Rights Reserved. How do people solve this? PostgreSQL Related Slides and Presentations https://wiki.postgresql.org/wiki/PostgreSQL_Related_Slides_and_Presentations PGCONF.EU 2017 • USING KUBERNETES, DOCKER, AND HELM TO DEPLOY ON-DEMAND POSTGRESQL STREAMING REPLICAS https://www.postgresql.eu/events/sessions/pgconfeu2017/session/1559/slides/35/Using%20Kubernetes,% 20Docker,%20and%20Helm%20to%20Deploy%20On-Demand%20PostgreSQL%20Streaming%20Replicas.pdf Postgres Open 2017 • A Kubernetes Operator for PostgreSQL - Architecture and Design https://www.sarahconway.com/slides/postgres-operator.pdf • Containerized Clustered PostgreSQL http://jberkus.github.io/container_cluster_pg/#23 PGConf 2017 • Patroni - HA PostgreSQL made easy https://www.slideshare.net/AlexanderKukushkin1/patroni-ha-postgresql-made-easy • PostgreSQL High Availability in a Containerized World http://jkshah.blogspot.jp/2017/03/pgconf-2017-postgresql-high.html 1. Introduction Container and Kubernetes
  • 21. 21Copyright©2017 NTT corp. All Rights Reserved. 3.PostgreSQL on Kubernetes “standing on the shoulders of giants” for getting the Open Source Power! https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants
  • 22. 22Copyright©2017 NTT corp. All Rights Reserved. PostgreSQL on Kubernetes Projects • AtomicDB https://hackernoon.com/postgresql-cluster-into-kubernetes-cluster-f353cde212de • Crunchy PostgreSQL Containers (CPC) https://github.com/CrunchyData/crunchy-containers • Patroni https://github.com/zalando/patroni • Stolon https://github.com/sorintlab/stolon We are trying CPC on OpenShift origin! 1. Introduction
  • 23. 23Copyright©2017 NTT corp. All Rights Reserved. 1. Introduction 2. Crunchy PostgreSQL Containers on OpenShift 3. Multi-Master PostgreSQL Cluster on OpenShift 4. Conclusion Table of Contents
  • 24. 24Copyright©2017 NTT corp. All Rights Reserved. What is Kubernetes? • Kubernetes is an open-source system for managing containerized applications. • A Pod is the basic scheduling unit, which consists of one or more containers. • Kubernetes follows the master-slave architecture. 1. Introduction https://en.wikipedia.org/wiki/Kubernetes
  • 25. 25Copyright©2017 NTT corp. All Rights Reserved. 2.Crunchy PostgreSQL Containers on OpenShift https://blog.openshift.com/enterprise-ready-kubernetes/
  • 26. 26Copyright©2017 NTT corp. All Rights Reserved. 2.Crunchy PostgreSQL Containers on OpenShift Crunchy PostgreSQL Containers • Docker 1.12 and above • OpenShift 3.4 and above • Kubernetes 1.5 and above https://github.com/CrunchyData/crunchy-containers
  • 27. 27Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Write the Pod manifest file. 2.Crunchy PostgreSQL Containers on OpenShift kind: Pod apiVersion: v1 metadata: name: simple-pg labels: name: simple-pg spec: containers: - name: postgres image: crunchydata/crunchy-postgres:centos7- 10.1-1.7.0 ports: - containerPort: 5432 protocol: TCP env: - name: PGHOST value: /tmp - name: PG_PRIMARY_USER value: primaryuser - name: PG_PRIMARY_PORT value: '5432' - name: PG_MODE value: primary - name: PG_PRIMARY_PASSWORD value: password - name: PG_USER value: testuser - name: PG_PASSWORD value: password - name: PG_DATABASE value: userdb - name: PG_ROOT_PASSWORD value: password volumeMounts: - mountPath: /pgdata name: pgdata readonly: false volumes: - name: pgdata emptyDir: {} Define the Pod state like SQL, not the procedure. • simple_postgres_pod.yml
  • 28. 28Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Write the Pod manifest file. 2.Crunchy PostgreSQL Containers on OpenShift kind: Pod apiVersion: v1 metadata: name: simple-pg labels: name: simple-pg spec: containers: - name: postgres image: crunchydata/crunchy-postgres:centos7- 10.1-1.7.0 ports: - containerPort: 5432 protocol: TCP env: - name: PGHOST value: /tmp - name: PG_PRIMARY_USER value: primaryuser - name: PG_PRIMARY_PORT value: '5432' - name: PG_MODE value: primary - name: PG_PRIMARY_PASSWORD value: password - name: PG_USER value: testuser - name: PG_PASSWORD value: password - name: PG_DATABASE value: userdb - name: PG_ROOT_PASSWORD value: password volumeMounts: - mountPath: /pgdata name: pgdata readonly: false volumes: - name: pgdata emptyDir: {} Define the Pod state like SQL, not the procedure. • simple_postgres_pod.yml PostgreSQL Container settings
  • 29. 29Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Write the Pod manifest file. 2.Crunchy PostgreSQL Containers on OpenShift kind: Pod apiVersion: v1 metadata: name: simple-pg labels: name: simple-pg spec: containers: - name: postgres image: crunchydata/crunchy-postgres:centos7- 10.1-1.7.0 ports: - containerPort: 5432 protocol: TCP env: - name: PGHOST value: /tmp - name: PG_PRIMARY_USER value: primaryuser - name: PG_PRIMARY_PORT value: '5432' - name: PG_MODE value: primary - name: PG_PRIMARY_PASSWORD value: password - name: PG_USER value: testuser - name: PG_PASSWORD value: password - name: PG_DATABASE value: userdb - name: PG_ROOT_PASSWORD value: password volumeMounts: - mountPath: /pgdata name: pgdata readonly: false volumes: - name: pgdata emptyDir: {} Define the Pod state like SQL, not the procedure. • simple_postgres_pod.yml pull the container image from DockerHub PostgreSQL Container settings
  • 30. 30Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Write the Pod manifest file. 2.Crunchy PostgreSQL Containers on OpenShift kind: Pod apiVersion: v1 metadata: name: simple-pg labels: name: simple-pg spec: containers: - name: postgres image: crunchydata/crunchy-postgres:centos7- 10.1-1.7.0 ports: - containerPort: 5432 protocol: TCP env: - name: PGHOST value: /tmp - name: PG_PRIMARY_USER value: primaryuser - name: PG_PRIMARY_PORT value: '5432' - name: PG_MODE value: primary - name: PG_PRIMARY_PASSWORD value: password - name: PG_USER value: testuser - name: PG_PASSWORD value: password - name: PG_DATABASE value: userdb - name: PG_ROOT_PASSWORD value: password volumeMounts: - mountPath: /pgdata name: pgdata readonly: false volumes: - name: pgdata emptyDir: {} Define the Pod state like SQL, not the procedure. PostgreSQL Container settings pull the container image from DockerHub PostgreSQL settings • simple_postgres_pod.yml
  • 31. 31Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Write the Pod manifest file. 2.Crunchy PostgreSQL Containers on OpenShift kind: Pod apiVersion: v1 metadata: name: simple-pg labels: name: simple-pg spec: containers: - name: postgres image: crunchydata/crunchy-postgres:centos7- 10.1-1.7.0 ports: - containerPort: 5432 protocol: TCP env: - name: PGHOST value: /tmp - name: PG_PRIMARY_USER value: primaryuser - name: PG_PRIMARY_PORT value: '5432' - name: PG_MODE value: primary - name: PG_PRIMARY_PASSWORD value: password - name: PG_USER value: testuser - name: PG_PASSWORD value: password - name: PG_DATABASE value: userdb - name: PG_ROOT_PASSWORD value: password volumeMounts: - mountPath: /pgdata name: pgdata readonly: false volumes: - name: pgdata emptyDir: {} Define the Pod state like SQL, not the procedure. PostgreSQL Container settings pull the container image from DockerHub PostgreSQL settings Pod storage settings • simple_postgres_pod.yml
  • 32. 32Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Write the Service manifest file. • simple_postgres_svc.yml 2.Crunchy PostgreSQL Containers on OpenShift kind: Service apiVersion: v1 metadata: name: simple-pg labels: name: simple-pg spec: ports: - protocol: TCP port: 5432 targetPort: 5432 nodePort: 0 selector: name: simple-pg type: ClusterIP sessionAffinity: None status: loadBalancer: {}
  • 33. 33Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Write the Service manifest file. • simple_postgres_svc.yml 2.Crunchy PostgreSQL Containers on OpenShift kind: Service apiVersion: v1 metadata: name: simple-pg labels: name: simple-pg spec: ports: - protocol: TCP port: 5432 targetPort: 5432 nodePort: 0 selector: name: simple-pg type: ClusterIP sessionAffinity: None status: loadBalancer: {} This Service sends packets to TCP port 5432 on any Pod with the "name: simple-pg" label.
  • 34. 34Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Execute “oc create” to create the Simple PostgreSQL pod and service. 2.Crunchy PostgreSQL Containers on OpenShift $ oc create -f simple_postgres_svc.yml service "simple-pg" created $ oc create -f simple_postgres_pod.yml pod "simple-pg" created service: simple-pg The simple-pg service sends SQLs to the simple-pg pod. pod: simple-pg labels: name: simple-pg simple-pg.mypj01.svc.cluster.local Clients
  • 35. 35Copyright©2017 NTT corp. All Rights Reserved. How to run a simple PostgreSQL Log in to the hostname “simple-pg.mypj01.svc.cluster.local”. 2.Crunchy PostgreSQL Containers on OpenShift $ psql -h simple-pg.mypj01.svc.cluster.local -U testuser -d userdb Password for user testuser: userdb=> \d List of relations Schema | Name | Type | Owner ----------+--------------------+-------+---------- public | pg_stat_statements | view | postgres testuser | testtable | table | testuser service: simple-pg pod: simple-pg labels: name: simple-pg simple-pg.mypj01.svc.cluster.local Clients
  • 36. 36Copyright©2017 NTT corp. All Rights Reserved. Let’s try to add a replication! 2.Crunchy PostgreSQL Containers on OpenShift simple-pg.mypj01.svc.cluster.local service: simple-pg-rep pod: simple-pg-rep simple-pg-rep.mypj01.svc.cluster.local streaming replication Add replication Pod and Service Clients service: simple-pg pod: simple-pg
  • 37. 37Copyright©2017 NTT corp. All Rights Reserved. Let’s try to add a replication! Write the Pod and Service manifest file. 2.Crunchy PostgreSQL Containers on OpenShift kind: Pod apiVersion: v1 metadata: name: simple-pg-rep labels: name: simple-pg-rep spec: containers: - name: postgres image: crunchydata/crunchy-postgres:centos7- 10.1-1.7.0 ports: - containerPort: 5432 protocol: TCP env: - name: PGHOST value: /tmp - name: PG_PRIMARY_USER value: primaryuser - name: PG_PRIMARY_PORT value: '5432' - name: PG_MODE value: replica - name: PG_PRIMARY_PASSWORD value: password - name: PG_USER value: testuser - name: PG_PASSWORD value: password - name: PG_DATABASE value: userdb - name: PG_ROOT_PASSWORD value: password - name: PG_PRIMARY_HOST value: simple-pg - name: PG_PRIMARY_PORT value: '5432' volumeMounts: - mountPath: /pgdata name: pgdata readOnly: false volumes: - name: pgdata emptyDir: {} • simple_postgres_pod-rep.yml Crunchy-postgres runs a shell script for initialization when the container starts. The PG_MODE variable is used to determine the role (Primary Postgres or Replica Postgres).
  • 38. 38Copyright©2017 NTT corp. All Rights Reserved. Let’s try to add a replication! Execute “oc create” to create the Replication pod and service. 2.Crunchy PostgreSQL Containers on OpenShift $ oc create -f simple_postgres_rep_svc.yml service "simple-pg-rep" created $ oc create -f simple_postgres_rep_pod.yml pod "simple-pg-rep" created
  • 39. 39Copyright©2017 NTT corp. All Rights Reserved. Let’s try to add a replication! Check replication status 2.Crunchy PostgreSQL Containers on OpenShift $ psql -h simple-pg.mypj01.svc.cluster.local -U postgres -d userdb userdb=# select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 329 usesysid | 16391 usename | primaryuser application_name | simple-pg-rep client_addr | 10.131.0.1 ... sync_state | async
  • 40. 40Copyright©2017 NTT corp. All Rights Reserved. Where is PostgreSQL running on the OpenShift Cluster? 2.Crunchy PostgreSQL Containers on OpenShift master01 infra01 worker01 worker02 worker03 worker04 worker05 worker06 OpenShift Cluster (8 virtual machines) simple-pg simple-pg-rep $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE simple-pg 1/1 Running 0 7m 10.129.2.20 worker04 simple-pg-rep 1/1 Running 0 5m 10.131.0.16 worker05 We don’t have to manage these IP addresses and where these containers are located!
  • 41. 41Copyright©2017 NTT corp. All Rights Reserved. At first, what should we think about when running PostgreSQL on OpenShift(Kubernetes)? • What should we use for a Persistent storage? • Local disk • (Shared) Network storage • What should we do in case of server/container failures? • Failover • Restart PostgreSQL 2.Crunchy PostgreSQL Containers on OpenShift
  • 42. 42Copyright©2017 NTT corp. All Rights Reserved. At first, what should we think about when running PostgreSQL on OpenShift(Kubernetes)? • What should we use for a Persistent storage? • Local disk • (Shared) Network storage • What should we do in case of server/container failures? • Failover • Restart PostgreSQL 2.Crunchy PostgreSQL Containers on OpenShift
  • 43. 43Copyright©2017 NTT corp. All Rights Reserved. worker01 worker02 What should we use for storage? 2.Crunchy PostgreSQL Containers on OpenShift Local disk Each worker node has several HDD/SSD.
  • 44. 44Copyright©2017 NTT corp. All Rights Reserved. What should we use for storage? 2.Crunchy PostgreSQL Containers on OpenShift (Shared) Network storage Each worker node connects to one or more network storages. worker01 worker02
  • 45. 45Copyright©2017 NTT corp. All Rights Reserved. PersistentVolume plug-in 2.Crunchy PostgreSQL Containers on OpenShift OpenShift Origin supports the following PersistentVolume plug-ins: • NFS • HostPath • GlusterFS • Ceph RBD • OpenStack Cinder • AWS Elastic Block Store (EBS) • GCE Persistent Disk • iSCSI • Fibre Channel • Azure Disk • Azure File • VMWare vSphere • Local https://docs.openshift.org/latest/architecture/additional_concepts/storage.html apiVersion: v1 kind: PersistentVolume metadata: name: pv0003 spec: capacity: storage: 5Gi accessModes: - ReadWriteOnce persistentVolumeReclaimPolicy: Recycle nfs: path: /tmp server: 172.17.0.2 NFS example
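To consume a PersistentVolume such as the NFS example on this slide, a Pod requests storage through a PersistentVolumeClaim instead of the `emptyDir` used in the earlier manifests. A minimal sketch; the claim name `pgdata-claim` is illustrative, not part of the CPC examples:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata-claim        # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi          # matches the 5Gi NFS PV above
---
# In the Pod spec, the pgdata volume would then reference the claim:
# volumes:
# - name: pgdata
#   persistentVolumeClaim:
#     claimName: pgdata-claim
```

With a claim bound to a persistent volume, the PostgreSQL data directory survives Pod rescheduling, which an `emptyDir` volume does not guarantee.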
  • 46. 46Copyright©2017 NTT corp. All Rights Reserved. At first, what should we think about when running PostgreSQL on OpenShift(Kubernetes)? • What should we use for a Persistent storage? • Local disk • (Shared) Network storage • What should we do in case of server/container failures? • Failover • Restart PostgreSQL 2.Crunchy PostgreSQL Containers on OpenShift
  • 47. 47Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift • Failover CPC provides a Watch Container to manage failover. worker01 worker02 replication Primary Secondary worker03 Watch Container
  • 48. 48Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift • Failover CPC provides a Watch Container to manage failover. worker01 worker02 replication Primary Secondary worker03 Watch Container Execute pg_isready repeatedly Check a trigger file repeatedly
  • 49. 49Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift • Failover CPC provides a Watch Container to manage failover. worker01 worker02 replication Primary Secondary worker03 Watch Container Execute pg_isready repeatedly Check a trigger file repeatedly
  • 50. 50Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift • Failover CPC provides a Watch Container to manage failover. worker01 worker02 Primary Secondary worker03 Watch Container Check a trigger file repeatedly Set a trigger file
  • 51. 51Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift • Failover CPC provides a Watch Container to manage failover. worker01 worker02 Primary Secondary worker03 Watch Container Set a trigger file Promote!
  • 52. 52Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift • Failover CPC provides a Watch Container to manage failover. worker01 worker02 Primary worker03 Watch Container Set a trigger file Promote! Secondary -> Primary Change the label
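The watch loop in the slides above boils down to a small decision: promote only after the primary has failed pg_isready several times in a row. A minimal Python sketch of that logic (the function name and the threshold are our own assumptions, not Crunchy's actual code):

```python
def failover_decision(health_checks, max_failures=3):
    """Model of the Watch Container loop: `health_checks` is the sequence
    of pg_isready results (True = primary answered). Returns the index at
    which the watcher would set the trigger file on the secondary
    (causing promotion), or None if the primary never stays down for
    `max_failures` consecutive checks."""
    consecutive = 0
    for i, ok in enumerate(health_checks):
        if ok:
            consecutive = 0  # primary is healthy again, reset the counter
        else:
            consecutive += 1
            if consecutive >= max_failures:
                return i  # here the trigger file would be written
    return None
```

Requiring consecutive failures means a single transient blip does not cause an unnecessary promotion.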
  • 53. 53Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift Primary • Restart PostgreSQL OpenShift(Kubernetes) provides Liveness Probe to confirm whether the container is alive. worker01 worker02 master01 OpenShift Master Mount the network storage
  • 54. 54Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift Primary • Restart PostgreSQL OpenShift(Kubernetes) provides Liveness Probe to confirm whether the container is alive. worker01 worker02 master01 OpenShift Master Execute pg_isready repeatedly Liveness Probe Mount the network storage
  • 55. 55Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift Primary • Restart PostgreSQL OpenShift(Kubernetes) provides Liveness Probe to confirm whether the container is alive. worker01 worker02 master01 OpenShift Master Execute pg_isready repeatedly Liveness Probe Mount the network storage Check Liveness Probe results.
  • 56. 56Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift Primary • Restart PostgreSQL OpenShift(Kubernetes) provides Liveness Probe to confirm whether the container is alive. worker01 worker02 master01 OpenShift Master Liveness Probe Check Liveness Probe results.
  • 57. 57Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift Primary • Restart PostgreSQL OpenShift(Kubernetes) provides Liveness Probe to confirm whether the container is alive. worker01 worker02 master01 OpenShift Master Restart the PostgreSQL pod on other worker nodes
  • 58. 58Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift Primary • Restart PostgreSQL OpenShift(Kubernetes) provides Liveness Probe to confirm whether the container is alive. worker01 worker02 master01 OpenShift Master Restart the PostgreSQL pod on other worker nodes
  • 59. 59Copyright©2017 NTT corp. All Rights Reserved. What should we do in case of server/container failures? 2.Crunchy PostgreSQL Containers on OpenShift Primary • Restart PostgreSQL OpenShift(Kubernetes) provides Liveness Probe to confirm whether the container is alive. worker01 worker02 master01 OpenShift Master Restart the PostgreSQL pod on other worker nodes Mount the same network storage
  • 60. 60Copyright©2017 NTT corp. All Rights Reserved. Choose a suitable method for your environment. Persistent storage • Local disk • (Shared) Network storage In case of server/container failures • Failover • Restart PostgreSQL 2.Crunchy PostgreSQL Containers on OpenShift
  • 61. 61Copyright©2017 NTT corp. All Rights Reserved. CPC other examples 2.Crunchy PostgreSQL Containers on OpenShift crunchy-collect The crunchy-collect container gathers different metrics from the crunchy-postgres container and pushes these to the Prometheus Promgateway. https://github.com/CrunchyData/crunchy-containers/blob/master/docs/metrics.adoc
  • 62. 62Copyright©2017 NTT corp. All Rights Reserved. Advantages of PostgreSQL on Kubernetes Easy to run PostgreSQL (including HA, monitoring, etc.). ØDefine the state, not the procedure. The procedure is packed into a portable container. ØThe network is abstracted. This makes it easier to define the state. Disadvantages of PostgreSQL on Kubernetes Hard to manage the Kubernetes cluster. (maybe...) ØIt seems to me that the complexity of the PostgreSQL layer is shifted to the Kubernetes layer. ØBut if you are already running a Kubernetes cluster (and that day will come soon?), I believe this will not be a problem. 2.Crunchy PostgreSQL Containers on OpenShift
  • 63. 63Copyright©2017 NTT corp. All Rights Reserved. 1. Introduction 2. Crunchy PostgreSQL Containers on OpenShift 3. Multi-Master PostgreSQL Cluster on OpenShift 4. Conclusion Table of Contents
  • 64. 64Copyright©2017 NTT corp. All Rights Reserved. Try to install the Built-in Sharding-based Multi-Master PostgreSQL Cluster on OpenShift. 3.Multi-Master PostgreSQL Cluster on OpenShift Clients pg-coordinators FDW FDW pg-shards parent table partition table • The PostgreSQL container contains the same source code that was demonstrated in this morning's session "Built-in Sharding Special Version" and the modified scripts of the crunchy-postgres container. $ docker pull ooyamams/postgres-dev • Using NFS as shared network storage for test purposes. • The pg-coordinator Pod and pg-shard Pod are created by Templates and controlled by a DeploymentConfig, not a StatefulSet.
  • 65. 65Copyright©2017 NTT corp. All Rights Reserved. kind: Template apiVersion: v1 metadata: name: pg-coordinator creationTimestamp: null annotations: description: PostgreSQL Multi-Master Build in Sharding Example iconClass: icon-database tags: database,postgresql parameters: - name: PG_PRIMARY_USER description: The username used for primary / replica replication value: primaryuser ...(omit)... objects: - kind: Service apiVersion: v1 ...(omit)... - kind: DeploymentConfig apiVersion: v1 metadata: name: ${PG_PRIMARY_SERVICE_NAME} creationTimestamp: null spec: strategy: type: Recreate resources: {} triggers: - type: ConfigChange replicas: 2 3.Multi-Master PostgreSQL Cluster on OpenShift selector: name: ${PG_PRIMARY_SERVICE_NAME} template: metadata: creationTimestamp: null labels: name: ${PG_PRIMARY_SERVICE_NAME} spec: serviceAccount: pg-cluster-sa containers: - name: server image: 172.30.81.49:5000/mypj01/postgres- dev:build-in-sharding-4 livenessProbe: exec: command: - /opt/cpm/bin/liveness.sh initialDelaySeconds: 90 timeoutSeconds: 1 ports: - containerPort: 5432 protocol: TCP env: - name: PG_PRIMARY_HOST value: ${PG_PRIMARY_SERVICE_NAME} ...(omit)... resources: {} terminationMessagePath: /dev/termination-log securityContext: privileged: false volumeMounts: - mountPath: /pgdata name: pgdata readOnly: false volumes: - name: pgdata emptyDir: {} restartPolicy: Always dnsPolicy: ClusterFirst status: {} pg-coordinator.yml
  • 66. 66Copyright©2017 NTT corp. All Rights Reserved. Try to install the Built-in Sharding-based Multi-Master PostgreSQL Cluster on OpenShift. 3.Multi-Master PostgreSQL Cluster on OpenShift $ oc create -f pg-coordinator.yml • Create the Template for pg-coordinator. $ oc process pg-coordinator | oc create -f - • Create the Service and DeploymentConfig from the Template. pg-coordinator pg-coordinator pg-coordinator.mypj01 .svc.cluster.local pg-coordinator clients
  • 67. 67Copyright©2017 NTT corp. All Rights Reserved. Try to install the Built-in Sharding-based Multi-Master PostgreSQL Cluster on OpenShift. 3.Multi-Master PostgreSQL Cluster on OpenShift $ oc create -f pg-coordinator.yml • Create the Template for pg-coordinator. $ oc process pg-coordinator | oc create -f - • Create the Service and DeploymentConfig from the Template. Round robin load balancing pg-coordinator pg-coordinator pg-coordinator.mypj01 .svc.cluster.local pg-coordinator clients
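The Service name pg-coordinator.mypj01.svc.cluster.local fronts both coordinator pods, and new connections are spread across them. A simplified model of that dispatching (real kube-proxy behavior depends on the proxy mode, and the endpoint addresses here are illustrative):

```python
import itertools

def coordinator_dispatcher(endpoints):
    """Simplified model of the pg-coordinator Service: each new client
    connection is handed the next pod endpoint in turn."""
    return itertools.cycle(endpoints)

picker = coordinator_dispatcher(["10.131.0.37:5432", "10.130.0.50:5432"])
targets = [next(picker) for _ in range(4)]  # four incoming connections
```

Since every coordinator serves the same parent tables over FDW, it does not matter which pod a client lands on.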
  • 68. 68Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift kind: Template apiVersion: v1 metadata: name: pg-shard creationTimestamp: null annotations: description: PostgreSQL Multi-Master Build in Sharding Example iconClass: icon-database tags: database,postgresql parameters: - name: PG_PRIMARY_USER description: The username used for primary / replica replication value: primaryuser ...(omit)... - name: PGDATA_PATH_OVERRIDE value: shared-01 objects: - kind: Service apiVersion: v1 metadata: name: ${PG_PRIMARY_SERVICE_NAME} labels: name: ${PG_PRIMARY_SERVICE_NAME} spec: ports: - protocol: TCP port: 5432 targetPort: 5432 nodePort: 0 selector: name: ${PG_PRIMARY_SERVICE_NAME} clusterIP: None - kind: DeploymentConfig apiVersion: v1 metadata: name: ${PG_PRIMARY_SERVICE_NAME} creationTimestamp: null spec: serviceName: ${PG_PRIMARY_SERVICE_NAME} strategy: ...(omit)... labels: name: ${PG_PRIMARY_SERVICE_NAME} spec: serviceAccount: pg-sa containers: - name: ${PG_PRIMARY_SERVICE_NAME} image: 172.30.81.49:5000/mypj01/postgres- dev:build-in-sharding-1 livenessProbe: exec: command: - /opt/cpm/bin/liveness.sh initialDelaySeconds: 90 timeoutSeconds: 1 ports: - containerPort: 5432 protocol: TCP env: - name: PG_PRIMARY_HOST value: ${PG_PRIMARY_SERVICE_NAME} ...(omit)... resources: {} terminationMessagePath: /dev/termination-log securityContext: privileged: false volumeMounts: - name: pgdata mountPath: /pgdata readOnly: false volumes: - name: pgdata persistentVolumeClaim: claimName: crunchy-pvc pg-shard.yml
  • 69. 69Copyright©2017 NTT corp. All Rights Reserved. Try to install the Built-in Sharding-based Multi-Master PostgreSQL Cluster on OpenShift. 3.Multi-Master PostgreSQL Cluster on OpenShift $ oc create -f pg-shard.yml • Create the Template for pg-shard. $ oc process pg-shard -p PG_PRIMARY_SERVICE_NAME=pg-shard-00 -p PGDATA_PATH_OVERRIDE=pg-shard-00 | oc create -f - $ oc process pg-shard -p PG_PRIMARY_SERVICE_NAME=pg-shard-01 -p PGDATA_PATH_OVERRIDE=pg-shard-01 | oc create -f - • Create the Service and DeploymentConfig from the Template. pg-shard-00 pg-shard-00.mypj01 .svc.cluster.local pg-shard-01 pg-shard-01.mypj01 .svc.cluster.local pg-shard-00 pg-shard-01 pg-shard-01 pg-shard-00 pg-coordinator pg-coordinator pg-coordinator.mypj01 .svc.cluster.local pg-coordinator clients Round robin load balancing
  • 70. 70Copyright©2017 NTT corp. All Rights Reserved. Check the physical cluster state. 3.Multi-Master PostgreSQL Cluster on OpenShift infra01 worker02 $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE pg-coordinator-1-drtsv 1/1 Running 0 4m 10.131.0.37 worker05 pg-coordinator-1-rmkb8 1/1 Running 0 4m 10.130.0.50 worker03 pg-shard-00-1-qqwtr 1/1 Running 0 4m 10.129.0.68 worker02 pg-shard-01-1-ht4jn 1/1 Running 0 3m 10.130.0.51 worker03 master01 worker01 worker04 worker03 worker06 worker05 NFS server OpenShift Cluster (8 virtual machines) pg-coordinator pg-shard legends pg-shard-01 pg-shard-00 OpenShift Master
  • 71. 71Copyright©2017 NTT corp. All Rights Reserved. Execute SQL against the cluster. The same schema and data as "Built-in Sharding" in the morning session. 3.Multi-Master PostgreSQL Cluster on OpenShift $ psql -h pg-coordinator.mypj01.svc.cluster.local -U postgres -d postgres \ -c "explain analyze select * from flight_bookings"; QUERY PLAN ----------------------------------------------------------------------------------------- Append (cost=100.00..254.09 rows=974 width=144) (actual time=1.945..40.289 rows=7499 loops=1) -> Foreign Scan on flight_bookings1 (cost=...) (actual time=...) -> Foreign Scan on flight_bookings0 (cost=...) (actual time=...) Planning time: 1.930 ms Execution time: 60.319 ms (5 rows) pg-shard-00 pg-shard-00.mypj01 .svc.cluster.local pg-shard-01 pg-shard-01.mypj01 .svc.cluster.local pg-shard-00 pg-shard-01 pg-shard-01 pg-shard-00 pg-coordinator pg-coordinator pg-coordinator.mypj01 .svc.cluster.local pg-coordinator clients Round robin load balancing
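The plan above shows an Append node on top of one Foreign Scan per shard: the coordinator concatenates the row streams it pulls from each foreign partition. A toy model of that behavior (not PostgreSQL's actual executor):

```python
def append_node(foreign_scans):
    """Toy model of the Append plan node: each "Foreign Scan" is
    represented as a list of rows already fetched from one shard;
    the node simply concatenates them into a single result set."""
    result = []
    for rows in foreign_scans:
        result.extend(rows)
    return result

shard1_rows = [{"id": 1, "to_continent": "Europe"}]           # flight_bookings1
shard0_rows = [{"id": 2, "to_continent": "Asia"},
               {"id": 3, "to_continent": "Oceania"}]          # flight_bookings0
all_rows = append_node([shard1_rows, shard0_rows])
```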
  • 72. 72Copyright©2017 NTT corp. All Rights Reserved. Make worker03 crash. 3.Multi-Master PostgreSQL Cluster on OpenShift Execute this command to generate a kernel panic on worker03. $ oc get node NAME STATUS AGE VERSION infra01 Ready 26d v1.7.6+a08f5eeb62 master01 Ready,SchedulingDisabled 26d v1.7.6+a08f5eeb62 worker01 Ready 26d v1.7.6+a08f5eeb62 worker02 Ready 26d v1.7.6+a08f5eeb62 worker03 NotReady 26d v1.7.6+a08f5eeb62 worker04 Ready 26d v1.7.6+a08f5eeb62 worker05 Ready 26d v1.7.6+a08f5eeb62 worker06 Ready 26d v1.7.6+a08f5eeb62 $ sudo sh -c "echo c > /proc/sysrq-trigger" worker03's status changes to "NotReady".
  • 73. 73Copyright©2017 NTT corp. All Rights Reserved. The containers are being re-created. 3.Multi-Master PostgreSQL Cluster on OpenShift $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE pg-coordinator-1-drtsv 1/1 Running 0 8m 10.131.0.37 worker05 pg-coordinator-1-rmkb8 1/1 Terminating 0 8m 10.130.0.50 worker03 pg-coordinator-1-wk4wx 0/1 ContainerCreating 0 2s <none> worker01 pg-shard-00-1-qqwtr 1/1 Running 0 8m 10.129.0.68 worker02 pg-shard-01-1-ht4jn 1/1 Terminating 0 8m 10.130.0.51 worker03 pg-shard-01-1-ztdn4 0/1 ContainerCreating 0 2s <none> worker01 infra01 worker02 master01 worker01 worker04 worker03 worker06 worker05 NFS server OpenShift Cluster (8 virtual machines) pg-coordinator pg-shard legends pg-shard-01 pg-shard-00 OpenShift Master New container
  • 74. 74Copyright©2017 NTT corp. All Rights Reserved. Re-execute SQL. 3.Multi-Master PostgreSQL Cluster on OpenShift $ psql -h pg-coordinator.mypj01.svc.cluster.local -U postgres -d postgres \ -c "explain analyze select * from flight_bookings"; QUERY PLAN --------------------------------------------------------------------------------------- Append (cost=100.00..254.09 rows=974 width=144) (actual time=15.531..63.278 rows=7499 loops=1) -> Foreign Scan on flight_bookings1 (cost=...) (actual time=...) -> Foreign Scan on flight_bookings0 (cost=...) (actual time=...) Planning time: 1.716 ms Execution time: 116.989 ms (5 rows) pg-shard-00 pg-shard-00.mypj01 .svc.cluster.local pg-shard-01 pg-shard-01.mypj01 .svc.cluster.local pg-shard-00 pg-shard-01 pg-shard-01 pg-shard-00 pg-coordinator pg-coordinator pg-coordinator.mypj01 .svc.cluster.local pg-coordinator clients Round robin load balancing
  • 75. 75Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift pg-coordinators don't have persistent storage. So they lose the cluster configuration if restarted. worker01 worker03 pg-coordinators worker02 pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 mount parent table FDW parent table FDW
  • 76. 76Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift pg-coordinators don't have persistent storage. So they lose the cluster configuration if restarted. worker01 worker03 pg-coordinators worker02 pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 mount parent table FDW parent table FDW
  • 77. 77Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift pg-coordinators don't have persistent storage. So they lose the cluster configuration if restarted. worker01 worker03 pg-coordinators worker02 pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 re-mount parent table FDW parent table FDW
  • 78. 78Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift pg-coordinators don't have persistent storage. So they lose the cluster configuration if restarted. worker01 worker03 pg-coordinators worker02 pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 re-mount parent table FDW parent table FDW
  • 79. 79Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift pg-coordinators don't have persistent storage. So they lose the cluster configuration if restarted. worker01 worker03 pg-coordinators worker02 pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 re-mount FDW parent table FDW empty Lost configurations! ? parent table
  • 80. 80Copyright©2017 NTT corp. All Rights Reserved. Custom Resource Definitions (CRD) We try to use CRDs (Custom Resource Definitions) for sharing the cluster configuration. 3.Multi-Master PostgreSQL Cluster on OpenShift • An image of the HTTP API space of Kubernetes /api/v1/mypj01 /api/v1/mypj01/pods/... /api/v1/mypj01/services/... /api/v1/mypj01/....
  • 81. 81Copyright©2017 NTT corp. All Rights Reserved. Custom Resource Definitions (CRD) We try to use CRDs (Custom Resource Definitions) for sharing the cluster configuration. 3.Multi-Master PostgreSQL Cluster on OpenShift -> A CRD can extend the Kubernetes core API. • An image of the HTTP API space of Kubernetes /api/v1/mypj01 /api/v1/mypj01/pods/... /api/v1/mypj01/services/... /api/v1/mypj01/.... /api/v1/mypj01/pg-cluster/...
  • 82. 82Copyright©2017 NTT corp. All Rights Reserved. Write the CRD manifest file. 3.Multi-Master PostgreSQL Cluster on OpenShift $ cat pg-cluster-shard-information.yml apiVersion: apiextensions.k8s.io/v1beta1 kind: CustomResourceDefinition metadata: name: shard-informations.pg-cluster.example.com spec: group: pg-cluster.example.com version: v1 scope: Namespaced names: plural: shard-information singular: shard-information kind: Shard-Info shortNames: - si This CRD can be accessed via "oc get shard-information" or "oc get si" $ oc get shard-information No resources found.
  • 83. 83Copyright©2017 NTT corp. All Rights Reserved. Write the shard-information manifest file. 3.Multi-Master PostgreSQL Cluster on OpenShift $ cat defines/sharding_info_coordinator.yml apiVersion: "pg-cluster.example.com/v1" kind: Shard-Info metadata: name: multi-master-demo-pgconfasia spec: coordinator: "create extension if not exists postgres_fdw; create server shard0 foreign data wrapper postgres_fdw options (host 'pg-shard-00.mypj01.svc.cluster.local', dbname 'postgres', port '5432'); create server shard1 foreign data wrapper postgres_fdw options (host 'pg-shard-01.mypj01.svc.cluster.local', dbname 'postgres', port '5432'); create user mapping for postgres server shard0 OPTIONS (user 'postgres', password 'password'); create user mapping for postgres server shard1 OPTIONS (user 'postgres', password 'password'); create table hotel_bookings (id serial, user_id int, booked_at timestamp, city_name text, continent text, flight_id int) partition by list (continent); create table flight_bookings (id serial, user_id int, booked_at timestamp, from_city text, from_continent text, to_city text, to_continent text) partition by list (to_continent); create table users (id serial, name text, age int); create foreign table flight_bookings0 partition of flight_bookings for values in ('Asia', 'Oceania') server shard0; create foreign table hotel_bookings0 partition of hotel_bookings for values in ('Asia', 'Oceania') server shard0; create foreign table flight_bookings1 partition of flight_bookings for values in ('Europe', 'Africa') server shard1; create foreign table hotel_bookings1 partition of hotel_bookings for values in ('Europe', 'Africa') server shard1;
  • 84. 84Copyright©2017 NTT corp. All Rights Reserved. Write the shard-information manifest file. 3.Multi-Master PostgreSQL Cluster on OpenShift $ cat defines/sharding_info_coordinator.yml apiVersion: "pg-cluster.example.com/v1" kind: Shard-Info metadata: name: multi-master-demo-pgconfasia spec: coordinator: "create extension if not exists postgres_fdw; create server shard0 foreign data wrapper postgres_fdw options (host 'pg-shard-00.mypj01.svc.cluster.local', dbname 'postgres', port '5432'); create server shard1 foreign data wrapper postgres_fdw options (host 'pg-shard-01.mypj01.svc.cluster.local', dbname 'postgres', port '5432'); create user mapping for postgres server shard0 OPTIONS (user 'postgres', password 'password'); create user mapping for postgres server shard1 OPTIONS (user 'postgres', password 'password'); create table hotel_bookings (id serial, user_id int, booked_at timestamp, city_name text, continent text, flight_id int) partition by list (continent); create table flight_bookings (id serial, user_id int, booked_at timestamp, from_city text, from_continent text, to_city text, to_continent text) partition by list (to_continent); create table users (id serial, name text, age int); create foreign table flight_bookings0 partition of flight_bookings for values in ('Asia', 'Oceania') server shard0; create foreign table hotel_bookings0 partition of hotel_bookings for values in ('Asia', 'Oceania') server shard0; create foreign table flight_bookings1 partition of flight_bookings for values in ('Europe', 'Africa') server shard1; create foreign table hotel_bookings1 partition of hotel_bookings for values in ('Europe', 'Africa') server shard1; These are the SQL statements for setting up pg-coordinator.
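The `for values in (...)` clauses above decide which shard stores each row: list partitioning on the continent column. The routing can be pictured as a lookup table (a sketch mirroring those DDL statements, not code from the PoC):

```python
# Mirrors the "for values in (...)" clauses of the foreign tables:
# shard0 holds Asia/Oceania rows, shard1 holds Europe/Africa rows.
PARTITION_MAP = {
    "Asia": "shard0",
    "Oceania": "shard0",
    "Europe": "shard1",
    "Africa": "shard1",
}

def shard_for(continent):
    """Return the foreign server a row is routed to; raises KeyError
    for a value no list partition accepts (PostgreSQL would likewise
    reject such an insert)."""
    return PARTITION_MAP[continent]
```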
  • 85. 85Copyright©2017 NTT corp. All Rights Reserved. We can get these SQL statements through the "oc" command. 3.Multi-Master PostgreSQL Cluster on OpenShift $ oc get si multi-master-demo-pgconfasia -o json | jq -r '.spec.coordinator' "create extension if not exists postgres_fdw; create server shard0 foreign data wrapper postgres_fdw options (host 'pg-shard-00.mypj01.svc.cluster.local', dbname 'postgres', port '5432'); create server shard1 foreign data wrapper postgres_fdw options (host 'pg-shard-01.mypj01.svc.cluster.local', dbname 'postgres', port '5432'); create user mapping for postgres server shard0 OPTIONS (user 'postgres', password 'password'); create user mapping for postgres server shard1 OPTIONS (user 'postgres', password 'password'); create table hotel_bookings (id serial, user_id int, booked_at timestamp, city_name text, continent text, flight_id int) partition by list (continent); create table flight_bookings (id serial, user_id int, booked_at timestamp, from_city text, from_continent text, to_city text, to_continent text) partition by list (to_continent); create table users (id serial, name text, age int); create foreign table flight_bookings0 partition of flight_bookings for values in ('Asia', 'Oceania') server shard0; create foreign table hotel_bookings0 partition of hotel_bookings for values in ('Asia', 'Oceania') server shard0; create foreign table flight_bookings1 partition of flight_bookings for values in ('Europe', 'Africa') server shard1; create foreign table hotel_bookings1 partition of hotel_bookings for values in ('Europe', 'Africa') server shard1; pg-coordinator can also retrieve these SQL statements from the Shard-Info CRD.
  • 86. 86Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift worker03 pg-coordinators pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 mount mount parent table FDW empty Lost configurations! ? master01 OpenShift Master pg-shards pg-cluster.example.com/Shard-Info multi-master-demo-pgconfasia: spec: coordinator: create... pg-coordinator retrieves the setup SQL from the Shard-Info CRD when it starts.
  • 87. 87Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift worker03 pg-coordinators pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 mount mount parent table FDW empty Lost configurations! ? master01 OpenShift Master pg-shards oc get si ... > /tmp/setup.sql psql -U postgres < /tmp/setup.sql pg-cluster.example.com/Shard-Info multi-master-demo-pgconfasia: spec: coordinator: create... pg-coordinator retrieves the setup SQL from the Shard-Info CRD when it starts. Retrieve the SQL from the Shard-Info.
  • 88. 88Copyright©2017 NTT corp. All Rights Reserved. 3.Multi-Master PostgreSQL Cluster on OpenShift worker03 pg-coordinators pg-shards worker04 NFS server pg-shard-01 pg-shard-00 worker05 worker06 mount mount parent table FDW master01 OpenShift Master pg-shards parent table FDW configured ! pg-cluster.example.com/Shard-Info multi-master-demo-pgconfasia: spec: coordinator: create... pg-coordinator retrieves the setup SQL from the Shard-Info CRD when it starts.
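The bootstrap step shown above (`oc get si ... | jq -r '.spec.coordinator'`) reduces to extracting one field from the JSON object the API server returns for the Shard-Info resource. A minimal Python equivalent (the helper name is ours, and the JSON below is a trimmed-down example object):

```python
import json

def extract_setup_sql(shard_info_json):
    """Equivalent of `jq -r '.spec.coordinator'`: pull the bootstrap SQL
    out of the Shard-Info object so it can be piped into psql."""
    return json.loads(shard_info_json)["spec"]["coordinator"]

# a trimmed-down Shard-Info object as `oc get si -o json` might return it
doc = '{"spec": {"coordinator": "create extension if not exists postgres_fdw;"}}'
setup_sql = extract_setup_sql(doc)
```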
  • 89. 89Copyright©2017 NTT corp. All Rights Reserved. Let's scale out pg-coordinator. 3.Multi-Master PostgreSQL Cluster on OpenShift $ oc scale dc pg-coordinator --replicas=3 deploymentconfig "pg-coordinator" scaled • Set "replicas" from 2 to 3 $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE pg-coordinator-1-drtsv 1/1 Running 0 20m 10.131.0.37 worker05 pg-coordinator-1-wk4wx 0/1 Running 0 11m 10.128.0.76 worker01 pg-coordinator-1-87ddq 0/1 Running 0 1m 10.129.2.45 worker04 pg-shard-00-1-qqwtr 1/1 Running 0 20m 10.129.0.68 worker02 pg-shard-01-1-ztdn4 0/1 Running 0 11m 10.128.0.77 worker01 infra01 worker02 master01 worker01 worker04 worker03 worker06 worker05 NFS server OpenShift Cluster (8 virtual machines) pg-coordinator pg-shard legends pg-shard-01 pg-shard-00 OpenShift Master New container
  • 90. 90Copyright©2017 NTT corp. All Rights Reserved. 4.Conclusion l PostgreSQL on Kubernetes makes it easy to run PostgreSQL, even a Multi-Master PostgreSQL Cluster. • including HA, monitoring, etc. • easy to scale out. l Custom Resource Definitions are useful for sharing the cluster configuration. But you have to consider ... Ø Persistent storage. Ø What to do in case of server/container failure. Ø Kubernetes cluster management. Please try PostgreSQL on Kubernetes! You too! And share the knowledge. Future Plan We will try Red Hat Ceph Storage as distributed block storage instead of NFS. This can make the storage layer scalable! And we will evaluate the performance.
  • 91. 91Copyright©2017 NTT corp. All Rights Reserved. 2 examples of Multi-Master PostgreSQL Cluster Appendix • For High availability • For Performance BDR (Bi-Directional Replication) Postgres-XC way Logical Replication Coordinators DataNodes
  • 92. 92Copyright©2017 NTT corp. All Rights Reserved. OpenShift provides Web Console. Appendix