walbouncer: Filtering WAL
Hans-Jürgen Schönig
www.postgresql-support.de
Current status
WAL streaming is the basis for replication
Limitations:
Currently an entire database instance has to be replicated
There is no way to replicate single databases
WAL used to be hard to read
The goal
Create a “WAL-server” to filter the transaction log
Put walbouncer between the PostgreSQL master and the
“partial” slave
How is it done?
The basic structure of the WAL is actually fairly easy to
filter
Each WAL record that accesses a database has a RelFileNode:
database OID
tablespace OID
data file OID
What more do we need?
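As a rough illustration of how a RelFileNode can drive the filtering decision, here is a minimal Python sketch. The class, the function name and the OID values are hypothetical and are not taken from the walbouncer source. It assumes that shared relations carry database OID 0 and, since the shared catalog has to be replicated, never filters them.

```python
from typing import FrozenSet, NamedTuple


class RelFileNode(NamedTuple):
    """Physical address of a relation as carried in a WAL record."""
    tablespace_oid: int
    database_oid: int
    relation_oid: int


def should_filter(node: RelFileNode,
                  excluded_databases: FrozenSet[int],
                  excluded_tablespaces: FrozenSet[int]) -> bool:
    """Return True if a record touching `node` should be filtered out."""
    # Assumption: database OID 0 marks shared relations, which are always kept.
    if node.database_oid == 0:
        return False
    return (node.database_oid in excluded_databases
            or node.tablespace_oid in excluded_tablespaces)
```

For example, `should_filter(RelFileNode(1663, 16384, 16390), frozenset({16384}), frozenset())` returns `True`, because the record belongs to an excluded database.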
WAL format
WAL logically is a stream of records.
Each record is identified by position in this stream.
WAL is stored in 16MB files called segments.
Each segment is composed of 8KB WAL pages.
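The arithmetic behind positions, segments and pages can be sketched in a few lines of Python. The function names are made up for this example. The segment file naming (three 8-digit hex fields) is stated from memory of PostgreSQL 9.3 and later and should be treated as an assumption.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB segments
WAL_PAGE_SIZE = 8 * 1024             # 8KB WAL pages


def parse_lsn(text: str) -> int:
    """Convert the 'X/Y' hex notation into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def locate(lsn: int):
    """Return (segment number, page number in segment, offset in page)."""
    segment, offset_in_segment = divmod(lsn, WAL_SEGMENT_SIZE)
    page, offset_in_page = divmod(offset_in_segment, WAL_PAGE_SIZE)
    return segment, page, offset_in_page


def segment_file_name(timeline: int, segment: int) -> str:
    """Assumed file name layout: timeline, then the segment number split in two."""
    segments_per_id = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline,
                             segment // segments_per_id,
                             segment % segments_per_id)
```

With these definitions, `locate(parse_lsn("0/3B7E910"))` gives `(3, 1471, 2320)`, and `segment_file_name(2, 3)` gives `000000020000000000000003`.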
WAL pages
WAL pages have a header containing:
16bit “magic” value
Flag bits
Timeline ID
Xlog position of this page
Length of data remaining from last record on previous page
Additionally, the first page of each segment has the following
information for correctness validation:
System identifier
WAL segment and block sizes
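The header fields listed above can be decoded with Python's struct module. This is only a sketch: the exact field widths, the little-endian byte order, the 8-byte alignment padding and the value of the long-header flag bit are assumptions recalled from the PostgreSQL 9.4 sources, and the magic value used in the example page is a placeholder.

```python
import struct

# Assumed layout: magic (16 bit), flags (16 bit), timeline ID,
# xlog position of the page, remaining length of the previous record.
SHORT_HEADER = struct.Struct("<HHIQI")
# Assumed extra fields on the first page of a segment:
# system identifier, segment size, block size.
LONG_EXTRA = struct.Struct("<QII")
XLP_LONG_HEADER = 0x0002  # assumed flag bit marking the long header


def parse_page_header(page: bytes) -> dict:
    """Decode the header fields at the start of one WAL page."""
    magic, flags, timeline, page_addr, rem_len = SHORT_HEADER.unpack_from(page, 0)
    header = {
        "magic": magic,
        "flags": flags,
        "timeline": timeline,
        "page_addr": page_addr,
        "rem_len": rem_len,
    }
    if flags & XLP_LONG_HEADER:
        # Assumption: the short header is padded to an 8-byte boundary.
        offset = (SHORT_HEADER.size + 7) & ~7
        system_id, segment_size, block_size = LONG_EXTRA.unpack_from(page, offset)
        header.update(system_id=system_id,
                      segment_size=segment_size,
                      block_size=block_size)
    return header


def build_example_page() -> bytes:
    """Synthetic first page of a segment, for demonstration only."""
    short = SHORT_HEADER.pack(0xD07E, XLP_LONG_HEADER, 2, 0x3000000, 0)
    extra = LONG_EXTRA.pack(6069865756165247251, 16 * 1024 * 1024, 8192)
    return short + b"\x00" * 4 + extra
```

Calling `parse_page_header(build_example_page())` returns the timeline, page address and, because the long-header flag is set, the system identifier and the segment and block sizes.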
Xlog record structure
Total length of record
Transaction ID that produced the record
Length of record specific data excluding header and backup
blocks
Flags
Record type (e.g. Xlog checkpoint, transaction commit, btree
insert)
Start position of previous record
Checksum of this record
Record specific data
Full page images
Handling WAL positions
WAL positions are highly critical
WAL addresses must not be changed
Addresses are stored in data page headers to decide whether replay is
necessary.
The solution:
inject dummy records into the WAL
Dummy records
PostgreSQL has infrastructure for dummy WAL entries
(basically “zero” values)
Valid WAL records can therefore be replaced with dummy
ones quite easily.
The slave will consume and ignore them
Question: What to filter?
What about the shared catalog?
We have to replicate the shared catalog
This has some consequences:
The catalog might think that a database called X is around
but in fact files are missing.
This is totally desirable
Catalog issues
There is no reasonably simple way to filter the content of the
shared catalog (skip rows or so).
It is hardly possible to add semantics to the filtering
But, this should be fine for most users
If you want to access a missing element, you will simply get
an error (missing file, etc.).
Replication protocol
Streaming replication
Replication connections use the same wire protocol as regular
client connections.
However they are serviced by special backend processes called
walsenders
Walsenders are started by including “replication=true” in the
startup packet.
Replication connections support different commands from a
regular backend.
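To make the startup packet concrete, the following Python sketch builds one by hand in the version 3.0 wire format: a length word, the protocol version, then null-terminated key/value pairs and a final null byte. The function name and the user name are illustrative only. A real client would normally let a driver such as libpq do this.

```python
import struct

PROTOCOL_VERSION_3_0 = 3 << 16  # major version 3, minor version 0


def build_startup_packet(params: dict) -> bytes:
    """Serialize a startup packet containing the given parameters."""
    body = struct.pack("!I", PROTOCOL_VERSION_3_0)
    for key, value in params.items():
        body += key.encode("ascii") + b"\x00" + value.encode("ascii") + b"\x00"
    body += b"\x00"  # terminator after the last key/value pair
    # The length field counts itself (4 bytes) plus the body.
    return struct.pack("!I", len(body) + 4) + body


# "replicator" is a hypothetical role name.
packet = build_startup_packet({"user": "replicator", "replication": "true"})
```

The `replication=true` pair is what makes the server hand the connection to a walsender instead of a regular backend.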
Starting up replication 1/2
When a slave connects, it first executes IDENTIFY_SYSTEM:
postgres=# IDENTIFY_SYSTEM;
      systemid       | timeline |  xlogpos  | dbname
---------------------+----------+-----------+--------
 6069865756165247251 |        2 | 0/3B7E910 |
Then any necessary timeline history files are fetched:
postgres=# TIMELINE_HISTORY 2;
     filename     | content
------------------+----------------------------------
 00000002.history | 1 0/3000090 no recovery target specified+
                  |
Starting up replication 2/2
Streaming of the write-ahead log is started by executing:
START_REPLICATION [SLOT slot_name] [PHYSICAL] XXX/XXX [TIMELINE tli]
START_REPLICATION switches the connection to a
bidirectional COPY mode.
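A small helper can assemble the command from a numeric position. This is a sketch with hypothetical function names. It only formats the string following the syntax shown above and does not talk to a server.

```python
def format_lsn(lsn: int) -> str:
    """Render a 64-bit WAL position in the usual 'X/Y' hex notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def start_replication_command(lsn: int, timeline=None, slot=None) -> str:
    """Build a physical START_REPLICATION command string."""
    parts = ["START_REPLICATION"]
    if slot is not None:
        parts += ["SLOT", slot]
    parts += ["PHYSICAL", format_lsn(lsn)]
    if timeline is not None:
        parts += ["TIMELINE", str(timeline)]
    return " ".join(parts)
```

For the position reported by IDENTIFY_SYSTEM in the earlier example, `start_replication_command(0x3B7E910, timeline=2)` yields `START_REPLICATION PHYSICAL 0/3B7E910 TIMELINE 2`.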
Replication messages
Replication messages are embedded in protocol level copy
messages. 4 types of messages:
XLogData (server -> client)
Keepalive message (server -> client)
Standby status update (client -> server)
Hot Standby feedback (client -> server)
To end replication either end can close the copying with a
CopyDone protocol message.
If the WAL stream was from an old timeline the server sends a
result set with the next timeline ID and start position upon
completion.
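The four message types can be told apart by the first byte of the CopyData payload. The sketch below decodes them with Python's struct module. The field layouts follow the streaming replication protocol as documented for PostgreSQL 9.4, as far as recalled here; the function name and dictionary keys are invented for the example.

```python
import struct


def parse_replication_message(payload: bytes) -> dict:
    """Decode the payload of one CopyData message on a replication connection."""
    kind = payload[:1]
    if kind == b"w":  # XLogData: start, end, send time, then raw WAL bytes
        start, end, sent = struct.unpack_from("!QQq", payload, 1)
        return {"type": "XLogData", "start": start, "end": end,
                "sent": sent, "data": payload[25:]}
    if kind == b"k":  # keepalive: WAL end, send time, reply-requested flag
        end, sent, reply = struct.unpack_from("!Qq?", payload, 1)
        return {"type": "Keepalive", "end": end, "sent": sent, "reply": reply}
    if kind == b"r":  # standby status update: write, flush, apply positions
        written, flushed, applied, clock, reply = struct.unpack_from("!QQQq?", payload, 1)
        return {"type": "StandbyStatusUpdate", "written": written,
                "flushed": flushed, "applied": applied,
                "clock": clock, "reply": reply}
    if kind == b"h":  # hot standby feedback: clock, xmin, epoch
        clock, xmin, epoch = struct.unpack_from("!qII", payload, 1)
        return {"type": "HotStandbyFeedback", "clock": clock,
                "xmin": xmin, "epoch": epoch}
    raise ValueError("unknown replication message type: %r" % kind)
```

A proxy sitting between master and slave only has to look inside XLogData messages; the other three types can be relayed unchanged.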
Other replication commands
Other replication commands:
START_REPLICATION ... LOGICAL ...
CREATE_REPLICATION_SLOT
DROP_REPLICATION_SLOT
BASE_BACKUP
Not supported by walbouncer yet.
Software design
Client connects to the WAL bouncer instead of the master.
WAL bouncer forks, connects to the master and streams xlog
to the client.
At this point the WAL proxy does not buffer data.
One connection to the master per client.
Software design - filtering
WAL stream is split into records.
Splitting works as a state machine consuming input and
immediately forwarding it. We only buffer when we need to
wait for a RelFileNode to decide whether we need to filter.
Based on record type we extract the RelFileNode from the
WAL record and decide if we want to filter it out.
If we want to filter, we replace the record with an XLOG_NOOP
that has the same total length.
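The key property, that a filtered record is replaced by a dummy of identical length so that all later positions stay put, can be shown with a toy record format. The 9-byte header used here (total length, record type, database OID) is invented for the example and is not PostgreSQL's record layout. A real implementation would presumably also have to produce a valid record checksum.

```python
import struct

HEADER = struct.Struct("<IBI")  # toy header: total length, type, database OID
NOOP = 0                        # hypothetical type code for a dummy record


def make_record(rec_type: int, db_oid: int, payload: bytes) -> bytes:
    """Build one toy record: header followed by the payload."""
    return HEADER.pack(HEADER.size + len(payload), rec_type, db_oid) + payload


def filter_stream(stream: bytes, excluded_databases: set) -> bytes:
    """Replace records of excluded databases with same-length dummy records."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        total_len, rec_type, db_oid = HEADER.unpack_from(stream, pos)
        if total_len < HEADER.size:
            raise ValueError("corrupt record at offset %d" % pos)
        record = stream[pos:pos + total_len]
        if db_oid in excluded_databases:
            # Same total length, so every later record keeps its position.
            record = HEADER.pack(total_len, NOOP, 0) + bytes(total_len - HEADER.size)
        out += record
        pos += total_len
    return bytes(out)
```

Filtering a stream of three records where the middle one belongs to an excluded database leaves the first and last record byte-for-byte identical and at the same offsets, while the middle one turns into zeros behind a dummy header.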
Synchronizing record decoding
Starting streaming is hard. Client may request streaming from
the middle of a record.
We have a state machine for synchronization.
Determine if we are in a continuation record from WAL page
header.
If we are, stream data until we have buffered the next record
header.
From next record header we read the previous record link, then
restart decoding from that position.
Once we hit the requested position, stream the filtered data out
to the client.
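The first step of this synchronization, working out from the page header where the next record header begins, might look like the sketch below. The header sizes, flag values and 8-byte alignment are assumptions recalled from PostgreSQL 9.4 on a 64-bit platform, and the function name is made up.

```python
WAL_PAGE_SIZE = 8192
SHORT_HEADER_SIZE = 24            # assumed size of a normal page header
LONG_HEADER_SIZE = 40             # assumed size on the first page of a segment
XLP_FIRST_IS_CONTRECORD = 0x0001  # assumed flag: page starts mid-record
XLP_LONG_HEADER = 0x0002          # assumed flag: long header present


def max_align(length: int) -> int:
    """Round up to the next multiple of 8."""
    return (length + 7) & ~7


def first_record_offset(flags: int, rem_len: int):
    """Offset in the page of the first record header that starts on it.

    Returns None when the continued record fills the rest of the page.
    """
    header = LONG_HEADER_SIZE if flags & XLP_LONG_HEADER else SHORT_HEADER_SIZE
    if not flags & XLP_FIRST_IS_CONTRECORD:
        return header
    offset = header + max_align(rem_len)
    return offset if offset < WAL_PAGE_SIZE else None
```

When the function returns None, the state machine has to keep streaming and look at the header of the following page instead.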
Using WAL bouncer
Streaming
At this point we did not want to take on the complexity of
implementing buffering.
Getting right what we have is already an important step
How to set things up
First of all, an initial base backup is needed
The easiest thing here is rsync
Skip all directories in “base” which do not belong to your
desired setup
pg_database is needed to figure out what to skip.
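Turning the pg_database contents into rsync exclusions can be sketched as follows. The function name, the database names and the OIDs are hypothetical. The mapping is assumed to come from a query such as `SELECT datname, oid FROM pg_database`, and the exclude pattern may need adjusting to the way the source path is given to rsync.

```python
def rsync_excludes(databases: dict, wanted: set) -> list:
    """Build --exclude options for every database directory not wanted.

    `databases` maps datname to oid; directories under base/ are
    named after the database OID.
    """
    return ["--exclude=base/%d" % oid
            for name, oid in sorted(databases.items())
            if name not in wanted]


# Hypothetical cluster contents; "test" is the database to leave out.
options = rsync_excludes(
    {"postgres": 12141, "template1": 1, "app": 16384, "test": 16385},
    wanted={"postgres", "template1", "app"},
)
```

Here `options` ends up as `['--exclude=base/16385']`, which can be appended to the rsync command line used for the base backup.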
Start streaming
Once you have the initial copy, set up streaming replication just
as if you had a “normal” master.
Use the address of the walbouncer in your primary_conninfo
You will not notice any difference during replication
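For a PostgreSQL 9.4 slave this means a recovery.conf whose primary_conninfo points at walbouncer rather than at the master. The sketch below renders such a file. The host name, port, user and application name are placeholders, and the helper assumes that none of the values contain spaces or quotes.

```python
def primary_conninfo(host: str, port: int, user: str, application_name: str) -> str:
    """Build a libpq connection string pointing at walbouncer."""
    settings = {
        "host": host,
        "port": port,
        "user": user,
        "application_name": application_name,
    }
    return " ".join("%s=%s" % (key, value) for key, value in settings.items())


def recovery_conf(conninfo: str) -> str:
    """Render a minimal recovery.conf for a streaming slave."""
    return "standby_mode = 'on'\nprimary_conninfo = '%s'\n" % conninfo


# All values below are hypothetical.
conninfo = primary_conninfo("walbouncer.example.com", 5433, "replicator", "slave1")
print(recovery_conf(conninfo))
```

Setting application_name in the connection string matters later on, because walbouncer can use it to decide which filter configuration applies to the connecting slave.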
Important aspects
If you use walbouncer, take the usual streaming replication
precautions
enough wal_keep_segments or use
take care of conflicts (hot_standby_feedback)
etc.
You cannot promote a slave that has filtered data to master.
Adding databases
List of database OIDs to filter out is only fetched at
walbouncer backend startup.
Disconnect any streaming slaves and reconfigure filtering while
you are creating the database.
If you try to do it online, the slaves will not know to filter
the new database.
Dropping databases
Have all slaves that want to filter out the dropped database
actively streaming before you execute the drop.
Otherwise the slaves will not know to skip the drop record and
xlog replay will fail with an error.
Can we filter on individual tables?
For filtering individual tables you can use tablespace filtering
functionality.
The same caveats apply to adding/removing tablespaces as to
databases.
You can safely add/remove tables in filtered tablespaces and
even move tables between filtered/non-filtered tablespaces.
Bonus feature
You can use walbouncer to switch the master server without
restarting the slave.
Limitations
Currently PostgreSQL 9.4 only.
No SSL support yet.
Simple configuration
A sample config file (1)
listen_port: 5433
master:
    host: localhost
    port: 5432
configurations:
    - slave1:
        match:
            application_name: slave1
        filter:
            include_tablespaces: [spc_slave1]
            exclude_databases: [test]
A sample config file (2)
    - slave2:
        match:
            application_name: slave2
        filter:
            include_tablespaces: [spc_slave2]
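How a connecting slave might be mapped to one of these configuration entries can be sketched with plain Python data structures. The list below mirrors the sample config by hand instead of parsing the file, and the first-match-wins rule is an assumption of this sketch, not documented walbouncer behaviour.

```python
CONFIGURATIONS = [
    {
        "name": "slave1",
        "match": {"application_name": "slave1"},
        "filter": {"include_tablespaces": ["spc_slave1"],
                   "exclude_databases": ["test"]},
    },
    {
        "name": "slave2",
        "match": {"application_name": "slave2"},
        "filter": {"include_tablespaces": ["spc_slave2"]},
    },
]


def select_filter(configurations: list, connection_params: dict):
    """Return the filter of the first entry whose match section fits."""
    for entry in configurations:
        if all(connection_params.get(key) == value
               for key, value in entry["match"].items()):
            return entry["filter"]
    return None  # no entry matched the connecting slave
```

A slave connecting with `application_name=slave2` would get `{"include_tablespaces": ["spc_slave2"]}`, while a connection with an unknown application name matches nothing.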
application_name
The application name is needed to support synchronous
replication as well as better monitoring
include_tablespaces / exclude_tablespaces
A good option to exclude entire groups of databases
In a perverted way this can be used to filter on tables
No need to mention that you should not
Finally
Where can we download the stuff?
For download and more information visit:
www.postgresql-support.de/walbouncer/
Thank you for your attention
Any questions?
Contact
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
www.postgresql-support.de
Hans-J¨urgen Sch¨onig
www.postgresql-support.de

More Related Content

What's hot

Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationAlexey Lesovsky
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
Managing PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterManaging PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterAlexey Lesovsky
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL-Consulting
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql Soumya Ranjan Subudhi
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 
PostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with groupingPostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with groupingAlexey Bashtanov
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQLMark Wong
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres MonitoringDenish Patel
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internalsAlexey Ermakov
 
Best Practices in Handling Performance Issues
Best Practices in Handling Performance IssuesBest Practices in Handling Performance Issues
Best Practices in Handling Performance IssuesOdoo
 
Understanding Autovacuum
Understanding AutovacuumUnderstanding Autovacuum
Understanding AutovacuumDan Robinson
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
Common Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsCommon Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsOdoo
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollPGConf APAC
 
12c SQL Plan Directives
12c SQL Plan Directives12c SQL Plan Directives
12c SQL Plan DirectivesFranck Pachot
 
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeSCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeJeff Frost
 

What's hot (20)

Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Managing PostgreSQL with PgCenter
Managing PostgreSQL with PgCenterManaging PostgreSQL with PgCenter
Managing PostgreSQL with PgCenter
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql Managing a 14 TB reporting datawarehouse with postgresql
Managing a 14 TB reporting datawarehouse with postgresql
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
PostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with groupingPostgreSQL, performance for queries with grouping
PostgreSQL, performance for queries with grouping
 
PostgreSQL and RAM usage
PostgreSQL and RAM usagePostgreSQL and RAM usage
PostgreSQL and RAM usage
 
collectd & PostgreSQL
collectd & PostgreSQLcollectd & PostgreSQL
collectd & PostgreSQL
 
Advanced Postgres Monitoring
Advanced Postgres MonitoringAdvanced Postgres Monitoring
Advanced Postgres Monitoring
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internals
 
Best Practices in Handling Performance Issues
Best Practices in Handling Performance IssuesBest Practices in Handling Performance Issues
Best Practices in Handling Performance Issues
 
Understanding Autovacuum
Understanding AutovacuumUnderstanding Autovacuum
Understanding Autovacuum
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
Common Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo appsCommon Performance Pitfalls in Odoo apps
Common Performance Pitfalls in Odoo apps
 
Strategic autovacuum
Strategic autovacuumStrategic autovacuum
Strategic autovacuum
 
How to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'rollHow to teach an elephant to rock'n'roll
How to teach an elephant to rock'n'roll
 
12c SQL Plan Directives
12c SQL Plan Directives12c SQL Plan Directives
12c SQL Plan Directives
 
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade DowntimeSCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
SCALE 15x Minimizing PostgreSQL Major Version Upgrade Downtime
 

Similar to Walbouncer: Filtering PostgreSQL transaction log

Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxnadirpervez2
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecFIRAT GULEC
 
Challenges when building high profile editorial sites
Challenges when building high profile editorial sitesChallenges when building high profile editorial sites
Challenges when building high profile editorial sitesYann Malet
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features" Anar Godjaev
 
Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...
Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...
Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...Command Prompt., Inc
 
MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011Mike Willbanks
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterSrihari Sriraman
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsCeph Community
 
The Magic of Hot Streaming Replication, Bruce Momjian
The Magic of Hot Streaming Replication, Bruce MomjianThe Magic of Hot Streaming Replication, Bruce Momjian
The Magic of Hot Streaming Replication, Bruce MomjianFuenteovejuna
 
还原Oracle中真实的cache recovery
还原Oracle中真实的cache recovery还原Oracle中真实的cache recovery
还原Oracle中真实的cache recoverymaclean liu
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Introduction to XtraDB Cluster
Introduction to XtraDB ClusterIntroduction to XtraDB Cluster
Introduction to XtraDB Clusteryoku0825
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualizationFranck Pachot
 
pg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL PerformanceSean Chittenden
 
OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...
OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...
OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...NETWAYS
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Lviv Startup Club
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Prajal Kulkarni
 

Similar to Walbouncer: Filtering PostgreSQL transaction log (20)

Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptxBuilt-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
Built-in-Physical-and-Logical-Replication-in-Postgresql-Firat-Gulec.pptx
 
Oracle NOLOGGING
Oracle NOLOGGINGOracle NOLOGGING
Oracle NOLOGGING
 
Built in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat GulecBuilt in physical and logical replication in postgresql-Firat Gulec
Built in physical and logical replication in postgresql-Firat Gulec
 
Challenges when building high profile editorial sites
Challenges when building high profile editorial sitesChallenges when building high profile editorial sites
Challenges when building high profile editorial sites
 
Performance
PerformancePerformance
Performance
 
Oracle Database 12c "New features"
Oracle Database 12c "New features" Oracle Database 12c "New features"
Oracle Database 12c "New features"
 
Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...
Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...
Building tungsten-clusters-with-postgre sql-hot-standby-and-streaming-replica...
 
MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011MNPHP Scalable Architecture 101 - Feb 3 2011
MNPHP Scalable Architecture 101 - Feb 3 2011
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
The Magic of Hot Streaming Replication, Bruce Momjian
The Magic of Hot Streaming Replication, Bruce MomjianThe Magic of Hot Streaming Replication, Bruce Momjian
The Magic of Hot Streaming Replication, Bruce Momjian
 
还原Oracle中真实的cache recovery
还原Oracle中真实的cache recovery还原Oracle中真实的cache recovery
还原Oracle中真实的cache recovery
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Introduction to XtraDB Cluster
Introduction to XtraDB ClusterIntroduction to XtraDB Cluster
Introduction to XtraDB Cluster
 
Postgres clusters
Postgres clustersPostgres clusters
Postgres clusters
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualization
 
pg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performancepg_prefaulter: Scaling WAL Performance
pg_prefaulter: Scaling WAL Performance
 
OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...
OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...
OSBConf 2015 | Using aws virtual tape library as storage for bacula bareos by...
 
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
Artur Fejklowicz - “Data Lake architecture” AI&BigDataDay 2017
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.
 

Recently uploaded

GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 

Recently uploaded (20)

GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...

Walbouncer: Filtering PostgreSQL transaction log

  • 1. walbouncer: Filtering WAL. Hans-Jürgen Schönig, www.postgresql-support.de
  • 2. walbouncer: Filtering WAL
  • 3. Current status: WAL streaming is the basis for replication. Limitations: currently an entire database instance has to be replicated; there is no way to replicate single databases; WAL used to be hard to read.
  • 4. The goal: Create a “WAL-server” to filter the transaction log. Put walbouncer between the PostgreSQL master and the “partial” slave.
  • 5. How is it done? The basic structure of the WAL is actually fairly easy to filter. Each WAL record that accesses a database has a RelFileNode: database OID, tablespace OID, data file OID. What more do we need?
  • 7. WAL format: WAL is logically a stream of records. Each record is identified by its position in this stream. WAL is stored in 16MB files called segments. Each segment is composed of 8KB WAL pages.
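
The segment and page arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration, not walbouncer code: the function names are made up here, and the file naming scheme (timeline, then the segment number split into two 8-digit hex fields) is assumed to be the standard PostgreSQL one for 16MB segments.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB segments, as stated on the slide
WAL_PAGE_SIZE = 8 * 1024             # 8KB WAL pages, as stated on the slide


def parse_lsn(text):
    """Convert a position written as 'XXX/XXX' (two hex numbers) to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def locate(lsn, timeline):
    """Return (segment file name, page number in segment, byte offset in page)."""
    segno, seg_offset = divmod(lsn, WAL_SEGMENT_SIZE)
    # Assumption: segment file names split the segment number into two
    # 8-digit hex fields, with 4GB worth of segments per "high" value.
    segs_per_high = 0x100000000 // WAL_SEGMENT_SIZE
    name = "%08X%08X%08X" % (timeline, segno // segs_per_high, segno % segs_per_high)
    page, page_offset = divmod(seg_offset, WAL_PAGE_SIZE)
    return name, page, page_offset
```

For example, `locate(parse_lsn("0/3B7E910"), 2)` places the position from the IDENTIFY_SYSTEM output later in the deck in the fourth segment of timeline 2.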
  • 8. WAL pages: WAL pages have a header containing: a 16-bit “magic” value; flag bits; the timeline ID; the xlog position of this page; the length of data remaining from the last record on the previous page. Additionally, the first page of each segment has the following information for correctness validation: the system identifier; the WAL segment and block sizes.
  • 9. Xlog record structure: total length of the record; transaction ID that produced the record; length of record-specific data excluding header and backup blocks; flags; record type (e.g. xlog checkpoint, transaction commit, btree insert); start position of the previous record; checksum of this record; record-specific data; full page images.
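
As a rough illustration of how such a fixed-size record header might be decoded, here is a Python sketch. The byte layout is an assumption made for this example (one field per header item on the slide, little-endian, two padding bytes after the single-byte fields); the slides do not give exact field widths, so treat the format string and field names as hypothetical.

```python
import struct
from collections import namedtuple

# Hypothetical layout, not taken from the slides:
#   uint32 total length, uint32 transaction ID, uint32 data length,
#   uint8 flags, uint8 record type, 2 padding bytes,
#   uint64 previous record position, uint32 checksum
HEADER = struct.Struct("<IIIBB2xQI")

RecordHeader = namedtuple(
    "RecordHeader",
    "total_len xid data_len flags record_type prev_pos checksum",
)


def parse_header(buf, offset=0):
    """Decode one record header from a bytes-like buffer."""
    return RecordHeader(*HEADER.unpack_from(buf, offset))
```

The total length is the field a filter needs most: it says how many bytes to skip to reach the next record, whatever the record contains.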
  • 10. Handling WAL positions: WAL positions are highly critical. WAL addresses must not be changed, because addresses are stored in data page headers to decide whether replay is necessary. The solution: inject dummy records into the WAL.
  • 11. Dummy records: PostgreSQL has infrastructure for dummy WAL entries (basically “zero” values). Valid WAL records can therefore be replaced with dummy ones quite easily. The slave will consume and ignore them.
  • 12. Question: What to filter? What about the shared catalog? We have to replicate the shared catalog. This has some consequences: the catalog might think that a database called X is around, but in fact its files are missing. This is totally desirable.
  • 13. Catalog issues: There is no reasonably simple way to filter the content of the shared catalog (skip rows or so). It is hardly possible to add semantics to the filtering. But this should be fine for most users. If you want to access a missing element, you will simply get an error (missing file, etc.).
  • 15. Streaming replication: Replication connections use the same wire protocol as regular client connections. However, they are serviced by special backend processes called walsenders. Walsenders are started by including “replication=true” in the startup packet. Replication connections support different commands from a regular backend.
  • 16. Starting up replication 1/2: When a slave connects, it first executes IDENTIFY_SYSTEM:

        postgres=# IDENTIFY_SYSTEM;
              systemid       | timeline |  xlogpos  | dbname
        ---------------------+----------+-----------+--------
         6069865756165247251 |        2 | 0/3B7E910 |

    Then any necessary timeline history files are fetched:

        postgres=# TIMELINE_HISTORY 2;
             filename     |             content
        ------------------+----------------------------------
         00000002.history | 1 0/3000090 no recovery target specified+
                          |

  • 17. Starting up replication 2/2: Streaming of the write-ahead log is started by executing: START_REPLICATION [SLOT slot_name] [PHYSICAL] XXX/XXX [TIMELINE tli]. START_REPLICATION switches the connection to a bidirectional COPY mode.
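
To make the command syntax concrete, the following Python sketch assembles the command string from the grammar shown on the slide. The helper names are invented for this example, and the sketch only builds text; it does not open a replication connection.

```python
def format_lsn(lsn):
    """Render an integer WAL position in the 'XXX/XXX' form used by the protocol."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def start_replication_command(lsn, timeline=None, slot=None):
    """Build: START_REPLICATION [SLOT slot_name] [PHYSICAL] XXX/XXX [TIMELINE tli]."""
    parts = ["START_REPLICATION"]
    if slot is not None:
        parts += ["SLOT", slot]
    parts += ["PHYSICAL", format_lsn(lsn)]
    if timeline is not None:
        parts += ["TIMELINE", str(timeline)]
    return " ".join(parts)
```

With the values from the IDENTIFY_SYSTEM output on the previous slide, `start_replication_command(0x3B7E910, timeline=2)` yields the command a slave would send to start streaming from the master's current position.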
  • 18. Replication messages: Replication messages are embedded in protocol-level copy messages. There are 4 types of messages: XLogData (server -> client); keepalive message (server -> client); standby status update (client -> server); hot standby feedback (client -> server). To end replication, either end can close the copying with a CopyDone protocol message. If the WAL stream was from an old timeline, the server sends a result set with the next timeline ID and start position upon completion.
  • 19. Other replication commands: START_REPLICATION ... LOGICAL ...; CREATE_REPLICATION_SLOT; DROP_REPLICATION_SLOT; BASE_BACKUP. These are not supported by walbouncer yet.
  • 20. Software design: The client connects to the WAL bouncer instead of the master. The WAL bouncer forks, connects to the master and streams xlog to the client. At this point the WAL proxy does not buffer anything. There is one connection to the master per client.
  • 21. Software design - filtering: The WAL stream is split into records. Splitting works as a state machine consuming input and immediately forwarding it. We only buffer when we need to wait for a RelFileNode to decide whether we need to filter. Based on the record type we extract the RelFileNode from the WAL record and decide whether we want to filter it out. If we want to filter, we replace the record with an XLOG_NOOP that has the same total length.
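
The key property of the filtering step, that a filtered record is swapped for a no-op of identical length so that every later WAL position stays unchanged, can be sketched as follows. The `Record` type is a heavily simplified stand-in invented for this example; real WAL records carry far more than these four fields.

```python
from collections import namedtuple

# Simplified, hypothetical view of a decoded record: only what the filter needs.
Record = namedtuple("Record", "kind db_oid tablespace_oid total_len")


def filter_record(record, excluded_databases):
    """Replace a record of an excluded database with a no-op of equal length."""
    if record.db_oid in excluded_databases:
        # Same total length, so the stream position of every following
        # record is exactly what it was before filtering.
        return Record("XLOG_NOOP", None, None, record.total_len)
    return record


def filter_stream(records, excluded_databases):
    """Filter records one at a time, forwarding each immediately."""
    for record in records:
        yield filter_record(record, excluded_databases)
```

Records without a database OID (modelled here as `None`) pass through untouched, which matches the point made earlier that the shared catalog is always replicated.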
  • 22. Synchronizing record decoding: Starting streaming is hard. The client may request streaming from the middle of a record. We have a state machine for synchronization. Determine from the WAL page header whether we are in a continuation record. If we are, stream data until we have buffered the next record header. From the next record header we read the previous-record link, then restart decoding from that position. Once we hit the requested position, stream the filtered data out to the client.
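
The first step of this synchronization, finding where the next record header begins on a page that starts in the middle of a record, might look roughly like this in Python. The 8-byte record alignment and the idea of passing the page header size in as a parameter are assumptions made for this sketch, not details given in the slides.

```python
PAGE_SIZE = 8 * 1024  # 8KB WAL pages
ALIGNMENT = 8         # assumption: record headers start on 8-byte boundaries


def align_up(n, alignment=ALIGNMENT):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def first_record_offset(header_size, remaining_len, page_size=PAGE_SIZE):
    """Offset within the page of the first record header that starts on it.

    remaining_len is the page header's "length of data remaining from the
    last record on the previous page". Returns None when the continued
    record fills the rest of the page, so no new record starts here.
    """
    start = align_up(header_size + remaining_len)
    if start >= page_size:
        return None
    return start
```

A remaining length of zero means the page begins at a record boundary, so decoding can start right after the page header.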
  • 23. Using WAL bouncer
  • 24. Streaming: At this point we did not want to take on the complexity of implementing buffering. Getting right what we have is already an important step.
  • 25. How to set things up: First of all, an initial base backup is needed. The easiest thing here is rsync. Skip all directories in “base” which do not belong to your desired setup. pg_database is needed to figure out what to skip.
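
One way to turn the pg_database lookup into rsync arguments is sketched below. The function and its inputs are hypothetical: it assumes you have already fetched a name-to-OID mapping from pg_database, that the data directory itself is the root of the rsync transfer, and that each database lives in “base/<oid>”.

```python
def rsync_excludes(databases, wanted):
    """Build rsync --exclude options for databases that should be skipped.

    databases: dict mapping database name to OID (as read from pg_database)
    wanted:    set of database names that the partial slave should keep
    """
    return [
        "--exclude=/base/%d/" % oid
        for name, oid in sorted(databases.items())
        if name not in wanted
    ]
```

The resulting list can be appended to an rsync command line. Which databases are safe to skip, template databases included, is a decision for your own setup.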
  • 26. Start streaming: Once you have the initial copy, set up streaming replication just as if you had a “normal” master. Use the address of the walbouncer in your primary_conninfo. You will not notice any difference during replication.
  • 27. Important aspects: If you use walbouncer, take the usual streaming replication precautions: enough wal_keep_segments, take care of conflicts (hot_standby_feedback), etc. You cannot promote a slave that has filtered data to master.
  • 28. Adding databases: The list of database OIDs to filter out is only fetched at walbouncer backend startup. Disconnect any streaming slaves and reconfigure filtering while you are creating the database. If you try to do it online, the slaves will not know to filter the new database.
  • 29. Dropping databases: Have all slaves that want to filter out the dropped database actively streaming before you execute the drop. Otherwise the slaves will not know to skip the drop record, and xlog replay will fail with an error.
  • 30. Can we filter on individual tables? For filtering individual tables you can use the tablespace filtering functionality. The same caveats apply for adding/removing tablespaces as for databases. You can safely add/remove tables in filtered tablespaces and even move tables between filtered and non-filtered tablespaces.
  • 31. Bonus feature: You can use walbouncer to switch the master server without restarting the slave.
  • 32. Limitations: Currently PostgreSQL 9.4 only. No SSL support yet.
  • 34. A sample config file (1):

        listen_port: 5433
        master:
            host: localhost
            port: 5432
        configurations:
        - slave1:
            match:
                application_name: slave1
            filter:
                include_tablespaces: [spc_slave1]
                exclude_databases: [test]

  • 35. A sample config file (2):

        - slave2:
            match:
                application_name: slave2
            filter:
                include_tablespaces: [spc_slave2]

  • 36. application_name: The application name is needed to support synchronous replication as well as better monitoring.
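
How a connecting slave might be matched to one of the configuration sections can be sketched as below. The dictionary mirrors the sample config from the two slides above; the lookup function is an assumption about the matching behaviour (first section whose application_name equals the client's wins), not a description of walbouncer's actual implementation.

```python
# The sample configuration from the slides, written as Python data.
CONFIG = {
    "listen_port": 5433,
    "master": {"host": "localhost", "port": 5432},
    "configurations": [
        {"slave1": {
            "match": {"application_name": "slave1"},
            "filter": {
                "include_tablespaces": ["spc_slave1"],
                "exclude_databases": ["test"],
            },
        }},
        {"slave2": {
            "match": {"application_name": "slave2"},
            "filter": {"include_tablespaces": ["spc_slave2"]},
        }},
    ],
}


def select_filter(config, application_name):
    """Return (section name, filter settings) for the first matching section."""
    for entry in config["configurations"]:
        for name, section in entry.items():
            if section["match"].get("application_name") == application_name:
                return name, section["filter"]
    return None  # no section matches this client
```

A slave that sets `application_name=slave2` in its primary_conninfo would get the second section's filter under this scheme.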
  • 37. include_tablespaces / exclude_tablespaces: A good option to exclude entire groups of databases. In a hacky way this can be used to filter on tables. Needless to say, you should not.
  • 39. Where can we download the stuff? For download and more information visit: www.postgresql-support.de/walbouncer/
  • 40. Thank you for your attention. Any questions?
  • 41. Contact: Cybertec Schönig & Schönig GmbH, Gröhrmühlgasse 26, A-2700 Wiener Neustadt, www.postgresql-support.de