pg_chameleon
MySQL to PostgreSQL replica
Federico Campoli
Transferwise
30 May 2017
Federico Campoli (Transferwise) pg chameleon 30 May 2017 1 / 49
Few words about the speaker
Born in 1972
Passionate about IT since 1982
mostly because of the TRON movie
Joined the Oracle DBA secret society in 2004
In love with PostgreSQL since 2006
Currently runs the Brighton PostgreSQL User Group
Works at Transferwise as a Data Engineer
Table of contents
1 Some history
2 MySQL Replica in a nutshell
3 A chameleon in the middle
4 Replica in action
5 Lessons learned
6 Wrap up
The beginnings
Years 2006/2012
neo_my2pg.py
I wrote the script because of a struggling phpbb on MySQL
The database migration was successful
However phpbb didn’t work very well with PostgreSQL
The script is written in Python 2.6
It’s a monolithic script
And it’s slow, very slow
It’s a good checklist of things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
I’m not scared of using the ORMs
Years 2013/2015
First attempt at pg_chameleon
Developed in Python 2.7
Used SQLAlchemy to extract MySQL’s metadata
Proof of concept only
It was built during the years of the roller coaster
Therefore it was just a way to discharge frustration
Abandoned after a while
SQLAlchemy’s limitations were frustrating as well
And there was already pgloader doing the same job
pg_chameleon reborn
Year 2016
I revamped the project because I needed to replicate data from MySQL
to PostgreSQL.
The library python-mysql-replication looked very promising for reading the
MySQL replication protocol.
Trying won’t harm, they said.
pg_chameleon v1
Compatible with CPython 2.7/3.3+
Removed SQLAlchemy
Replaced the MySQLdb driver with PyMySQL
Added a command line helper
Installs in a virtualenv or system wide
Shipped via PyPI for easy installation
MySQL Replica
MySQL replica is logical
When configured, the data changes are stored in the master’s binary log files
The slave fetches the data changes from the master
The data changes are saved in the slave’s relay logs
The relay logs are used to replay the changes on the slave
Log formats
STATEMENT: It logs the statements, which are replayed on the slave.
It’s the best solution for performance. However, when replaying statements
with non-deterministic functions this format generates different values on the
slave (e.g. an insert using the uuid function).
ROW: It’s deterministic. This format logs the row image and the DDL
queries.
This format is compulsory for using pg_chameleon.
MIXED: It takes the best of both worlds. The master logs the statements unless
a non-deterministic function is used. In that case it logs the row image.
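The difference between STATEMENT and ROW can be illustrated with a toy simulation. The sketch below is not MySQL code: master_insert, replay_statement and replay_row are hypothetical stand-ins, and Python's uuid.uuid4 plays the role of the uuid function.

```python
import uuid


def master_insert():
    """Simulate an insert on the master that calls a non-deterministic function."""
    statement = "INSERT INTO t (id) VALUES (uuid())"
    row_image = {"id": str(uuid.uuid4())}  # value actually stored on the master
    return statement, row_image


def replay_statement(statement):
    """STATEMENT format: the slave runs the statement again, so uuid() is re-evaluated."""
    return {"id": str(uuid.uuid4())}


def replay_row(row_image):
    """ROW format: the slave applies the logged row image as it is."""
    return dict(row_image)


statement, master_row = master_insert()
print(replay_row(master_row) == master_row)        # row image: same data as the master
print(replay_statement(statement) == master_row)   # statement: a freshly generated uuid
```

Because the row image carries the values themselves, a reader of the binary log such as pg_chameleon never has to evaluate MySQL functions.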
[Figure: MySQL Replica]
pg_chameleon
pg_chameleon mimics a MySQL slave’s behaviour
Reads the replica stream
Stores the decoded rows into a PostgreSQL table
PostgreSQL acts as relay log and replication slave
A PL/pgSQL function decodes the rows and replays the changes
[Figure: MySQL replica + pg_chameleon]
Features
Reads the schema and data from MySQL and restores them into a target
PostgreSQL schema
Sets up PostgreSQL to act as a MySQL slave
Basic DDL support (CREATE/DROP/ALTER TABLE, DROP PRIMARY
KEY/TRUNCATE)
Handles the rubbish data coming from the replica stream and saves the
problematic rows in sch_chameleon.t_discarded_rows
Supports multiple MySQL sources for replica
Provides basic replica monitoring
Can detach the replica from MySQL, leaving PostgreSQL ready to work as a
standalone server
MySQL configuration
The MySQL configuration file is usually stored in /etc/mysql/my.cnf
To enable binary logging, find the section [mysqld] and check that the following
parameters are set.
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
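A quick way to verify these settings is to parse the file and compare it with the values on the slide. The following is a minimal sketch using Python's configparser; the function check_mysqld_settings and the SAMPLE text are illustrative assumptions, not part of pg_chameleon, and a real my.cnf may need more careful handling.

```python
import configparser

# Settings from the slide above; keys are normalised to use underscores.
EXPECTED_VALUES = {"binlog_format": "ROW", "binlog_row_image": "FULL"}
MUST_BE_PRESENT = ("log_bin", "server_id")


def check_mysqld_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section of cnf_text."""
    # Directives such as "!includedir" are not INI syntax, so skip them.
    lines = [line for line in cnf_text.splitlines()
             if not line.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None)
    parser.read_string("\n".join(lines))
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    # MySQL accepts both log-bin and log_bin, so compare on one spelling.
    settings = {key.replace("-", "_"): (value or "").strip()
                for key, value in parser.items("mysqld")}
    problems = []
    for key, expected in EXPECTED_VALUES.items():
        if settings.get(key, "").upper() != expected:
            problems.append(f"{key} should be {expected}")
    for key in MUST_BE_PRESENT:
        if key not in settings:
            problems.append(f"{key} is not set")
    return problems


SAMPLE = """
[mysqld]
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
"""
print(check_mysqld_settings(SAMPLE))
```

An empty list means the section matches the slide. The check only looks at the file, so the running server should still be restarted and inspected after a change.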
MySQL user for replica
Set up a replication user on MySQL
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica = PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
In our example we are using the sakila test database.
https://dev.mysql.com/doc/sakila/en/
PostgreSQL setup
Add a user on PostgreSQL able to create schemas and relations in the
destination database
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
Install pg_chameleon
The simplest way to install pg_chameleon is in a virtual environment. However,
if you have root access on your system the installation can be system wide.
It’s important to upgrade pip before installing the package.
python3 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon
Execute chameleon.py to create the configuration directory
$HOME/.pg_chameleon/
chameleon.py
Replica setup
cd into $HOME/.pg_chameleon/ and copy config-yaml.example to default.yaml, then
edit the file, adding the connection settings and the source and destination
schemas.
my_database: sakila
pg_database: db_replica
dest_schema: 'my_schema'
mysql_conn:
    host: derpy
    port: 3306
    user: usr_replica
    passwd: replica
pg_conn:
    host: derpy
    port: 5432
    user: usr_replica
    password: replica
Init replica
Activate the virtualenv and run
chameleon.py create_schema --config default
chameleon.py add_source --config default
chameleon.py init_replica --config default
Wait for init_replica to complete. If the database is large, consider running
init_replica in a screen or tmux session.
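The three steps can also be scripted. The sketch below only builds the command lines shown above and stops at the first failure when they are run; build_commands and run_commands are hypothetical helper names, and the sketch assumes chameleon.py is on the PATH (for example with the virtualenv activated).

```python
import subprocess

# The initialisation steps, in the order given on the slide.
INIT_STEPS = ("create_schema", "add_source", "init_replica")


def build_commands(config="default", steps=INIT_STEPS):
    """Build the argument list for each chameleon.py step, in order."""
    return [["chameleon.py", step, "--config", config] for step in steps]


def run_commands(commands):
    """Run each command in turn; check=True raises on the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Print the commands instead of running them, so the sketch has no side effects.
for command in build_commands():
    print(" ".join(command))
```

Calling run_commands(build_commands()) would execute the steps for real, so it is left out of the sketch on purpose.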
Start replica
Start the replica with
chameleon.py start_replica --config default
Project structure
project directory
    scripts
        chameleon.py
    pg_chameleon
        lib
            global_lib.py
            mysql_lib.py
            pg_lib.py
            sqlutil_lib.py
chameleon.py
Command line wrapper
Uses argparse to execute the commands
Supports several commands
After the installation, executing chameleon.py creates the configuration directory
$HOME/.pg_chameleon/ with three subdirectories
pid
logs
config
chameleon.py
Commands
drop_schema: Drops the service schema sch_chameleon with the cascade option.
create_schema: Creates the service schema sch_chameleon.
upgrade_schema: Upgrades an existing schema sch_chameleon to a newer
version.
init_replica: Creates the table structure from MySQL in PostgreSQL. The
MySQL tables are locked in read-only mode and the data is copied into the
PostgreSQL database. The master’s coordinates are stored in the
PostgreSQL replica catalogue.
start_replica: Starts the replication from MySQL to PostgreSQL using the
master data stored in sch_chameleon.t_replica_batch. The master’s position is
updated when a new batch is processed.
list_config: Lists the available configurations and their status ('ready',
'initialising', 'initialised', 'stopped', 'running', 'error')
chameleon.py
add_source: Registers a new configuration file as a source
drop_source: Removes the configuration from the registered sources
stop_replica: Ends the replica process gracefully
disable_replica: Ends the replica process and disables the restart
enable_replica: Enables the replica process
sync_replica: Syncs the data between MySQL and PostgreSQL without dropping
the tables
show_status: Displays the replication status for each source, with the lag in
seconds and the last received event
detach_replica: Stops the replica stream, discards the replica setup and resets
the sequences in PostgreSQL so it can work as a standalone db.
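A command line wrapper of this kind can be sketched with argparse as follows. This is not the actual chameleon.py source: the handler mapping, the dispatch function and the default value for --config are assumptions made for illustration; only the command names come from the slides.

```python
import argparse

# Command names as listed on the slides.
COMMANDS = [
    "create_schema", "drop_schema", "upgrade_schema", "init_replica",
    "start_replica", "stop_replica", "disable_replica", "enable_replica",
    "sync_replica", "show_status", "detach_replica", "add_source",
    "drop_source", "list_config",
]


def build_parser():
    """Build a parser with one positional command and a --config option."""
    parser = argparse.ArgumentParser(description="Command line wrapper (sketch).")
    parser.add_argument("command", choices=COMMANDS, help="action to execute")
    # Assumption: the configuration is referenced by name and defaults to "default".
    parser.add_argument("--config", default="default", help="configuration name")
    return parser


def dispatch(args, handlers):
    """Call the handler registered for the chosen command."""
    return handlers[args.command](args.config)


args = build_parser().parse_args(["init_replica", "--config", "default"])
# Hypothetical handler; a real wrapper would call into the replica engine here.
handlers = {"init_replica": lambda config: f"init_replica on {config}"}
print(dispatch(args, handlers))
```

Restricting the positional argument with choices makes argparse reject unknown commands before any handler runs.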
global_lib.py
class global_config: loads the configuration parameters into the class
attributes
class replica_engine: wraps the classes mysql_engine and pgsql_engine and
sets up the logging method. The global_config instance is used to track the
configuration settings.
mysql_lib.py
class mysql_connection: connects to MySQL using the parameters provided by
replica_engine
class mysql_engine: does all the magic for the replication setup and execution
pg_lib.py
class pg_encoder: extends the JSON encoder class and adds some special handling for
types like decimal and datetime
class pgsql_connection: connects to the PostgreSQL database
class pgsql_engine: does all the magic for rebuilding the data structure,
loading data and migrating the schema
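The kind of special handling described for pg_encoder can be sketched by subclassing json.JSONEncoder from the standard library. The class name SketchEncoder and the choice of string formats are assumptions; the real class may serialise these types differently.

```python
import datetime
import decimal
import json


class SketchEncoder(json.JSONEncoder):
    """JSON encoder that also serialises decimal and date/time values."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            return str(obj)  # keep the exact digits instead of casting to float
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        return super().default(obj)  # raises TypeError for unsupported types


row = {
    "amount": decimal.Decimal("4.99"),
    "payment_date": datetime.datetime(2017, 5, 30, 12, 0, 0),
}
print(json.dumps(row, cls=SketchEncoder, sort_keys=True))
```

Serialising decimals as strings avoids the rounding a float conversion could introduce before the value reaches PostgreSQL.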
sqlutil_lib.py
Consists of just one class, sql_token, which tokenises the MySQL queries using
regular expressions.
Yes, I have two problems now!
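To give an idea of what tokenising DDL with regular expressions looks like, here is a minimal sketch that recognises two of the supported statements. The patterns and the tokenise function are illustrative assumptions and are much simpler than what a real parser for MySQL DDL needs.

```python
import re

# Simplified patterns: optional backticks around the name, no schema prefix.
DROP_TABLE = re.compile(
    r"^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?`?(?P<name>\w+)`?", re.IGNORECASE)
TRUNCATE = re.compile(
    r"^\s*TRUNCATE\s+(?:TABLE\s+)?`?(?P<name>\w+)`?", re.IGNORECASE)


def tokenise(statement):
    """Return the command and table name, or None if the statement is not recognised."""
    for command, pattern in (("DROP TABLE", DROP_TABLE), ("TRUNCATE", TRUNCATE)):
        match = pattern.match(statement)
        if match:
            return {"command": command, "name": match.group("name")}
    return None


print(tokenise("drop table if exists `film`;"))
print(tokenise("TRUNCATE TABLE actor"))
print(tokenise("SELECT 1"))
```

Each additional statement type needs its own pattern, which is where the "two problems" tend to come from.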
Limitations
Tables require a primary key to be replicated
No daemonisation
Binary data are hexified to avoid issues with PostgreSQL
The future
pg_chameleon v2 development has already started. The first alpha will come out
soon.
The new version is a reorganisation of version 1 with several improvements.
Reorganised configuration files
Background copy with parallel processes
Separate daemon for read and replay
Improved monitoring
Python 3 only
init_replica tuning
The replica initialisation required several improvements.
The OOM killer is always happy to kill processes using large amounts of
memory
Using a general slice size doesn’t work well because with large rows the
process crashes
Estimating the total rows for the user’s feedback is faster but the output can be
odd.
Using unbuffered cursors improves the speed and the memory usage.
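The slides do not say how the slice size was eventually tuned. One possible approach, shown here purely as an assumption, is to bound each slice by an estimated byte size instead of a fixed row count, so that a few very large rows cannot blow up the memory of a single slice. The function slices_by_size and the size estimate are hypothetical.

```python
def slices_by_size(rows, max_bytes):
    """Yield lists of rows whose estimated size stays within max_bytes.

    A single row larger than max_bytes is still emitted, alone in its slice.
    """
    batch, batch_bytes = [], 0
    for row in rows:
        # Rough estimate: length of the text representation of each value.
        row_bytes = sum(len(str(value)) for value in row)
        if batch and batch_bytes + row_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch


rows = [("a" * 10,), ("b" * 10,), ("c" * 100,), ("d" * 5,)]
for batch in slices_by_size(rows, max_bytes=25):
    print(len(batch))
```

Because the function is a generator, it consumes rows one at a time, which pairs naturally with an unbuffered cursor that streams rows from the server.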
Strictness is an illusion. MySQL doubly so
MySQL’s lack of strictness is not a mystery.
The funny way MySQL manages defaults with NOT NULL can break the
replica.
Therefore any field with NOT NULL added after the initialisation is always
created as NULLable in PostgreSQL.
I feel your lack of constraint disturbing
MySQL can store rubbish data without the DBMS raising an error.
When this happens, the replicator traps the error when the change is replayed on
PostgreSQL and discards the problematic row.
The value is stored hexified in the table t_discarded_rows for later analysis.
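The trap-and-discard pattern can be sketched in plain Python. In pg_chameleon the replay happens inside PostgreSQL; here an in-memory list stands in for t_discarded_rows, and replay_batch and apply_row are hypothetical names used only for illustration.

```python
import binascii
import datetime
import json


def replay_batch(rows, apply_row):
    """Apply each row; keep a hexified copy of every row that fails."""
    discarded = []
    for row in rows:
        try:
            apply_row(row)
        except Exception as error:
            payload = json.dumps(row, default=str, sort_keys=True).encode("utf-8")
            discarded.append({
                "error": str(error),
                "row_hex": binascii.hexlify(payload).decode("ascii"),
            })
    return discarded


def apply_row(row):
    """Stand-in for the replay: rejects dates that a strict parser cannot accept."""
    datetime.datetime.strptime(row["d"], "%Y-%m-%d")


rows = [{"id": 1, "d": "2017-05-30"}, {"id": 2, "d": "0000-00-00"}]
discarded = replay_batch(rows, apply_row)
print(len(discarded))
```

Storing the row as hexadecimal text keeps the original bytes recoverable even when they contain data the target column would reject.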
Igor, the green little guy
The chameleon logo was designed by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman’s Igor, portrayed in the movie Young
Frankenstein.
Some numbers
Lines of code
global_lib.py   327
mysql_lib.py    401
pg_lib.py       670
sql_util.py     228
chameleon.py     58
Total lines of code 1684
pg_chameleon’s license
2-clause BSD License
Copyright (c) 2016,2017 Federico Campoli
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Feedback please!
Please report any issue on GitHub!
https://github.com/the4thdoctor/pg_chameleon
Boring legal stuff
MySQL Image source WikiCommons
Hard Disk image source WikiCommons
Tron image source Tron Wikia
Did you say hire?
WE ARE HIRING!
https://transferwise.com/jobs/
Questions
Any questions?
Contacts and license
Twitter: 4thdoctor_scarf
Blog: http://www.pgdba.co.uk
Brighton PostgreSQL Meetup:
http://www.meetup.com/Brighton-PostgreSQL-Meetup/
This document is distributed under the terms of the Creative Commons
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
Alex Pruden
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 

Recently uploaded (20)

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex ProofszkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 

pg_chameleon a MySQL to PostgreSQL replica

  • 1. pg chameleon MySQL to PostgreSQL replica Federico Campoli Transferwise 30 May 2017 Federico Campoli (Transferwise) pg chameleon 30 May 2017 1 / 49
  • 3. Few words about the speaker: Born in 1972. Passionate about IT since 1982, mostly because of the TRON movie. Joined the Oracle DBA secret society in 2004. In love with PostgreSQL since 2006. Currently runs the Brighton PostgreSQL User Group. Works at Transferwise as a Data Engineer.
  • 4. Table of contents: 1 Some history; 2 MySQL Replica in a nutshell; 3 A chameleon in the middle; 4 Replica in action; 5 Lessons learned; 6 Wrap up
  • 8. The beginnings (years 2006/2012): neo_my2pg.py. I wrote the script because of a struggling phpBB on MySQL. The database migration was successful; however, phpBB didn’t work very well with PostgreSQL. The script is written in Python 2.6. It’s a monolithic script, and it’s slow, very slow. It’s a good checklist of things to avoid when coding. https://github.com/the4thdoctor/neo_my2pg
  • 11. I’m not scared of using the ORMs (years 2013/2015): First attempt at pg_chameleon. Developed in Python 2.7. Used SQLAlchemy for extracting MySQL’s metadata. Proof of concept only. It was built during the years of the roller coaster, so it was just a way to discharge frustration. Abandoned after a while. SQLAlchemy’s limitations were frustrating as well, and pgloader was already doing the same job.
  • 12. pg_chameleon reborn (year 2016): I revamped the project because I needed to replicate data from MySQL to PostgreSQL, and the library python-mysql-replication looked very promising for reading the MySQL replication protocol. "Trying won’t harm", they said.
  • 13. pg_chameleon v1: Compatible with CPython 2.7/3.3+. Removed SQLAlchemy. Replaced the mysqldb driver with PyMySQL. Added a command line helper. Installs in a virtualenv or system wide. Shipped via PyPI for easy installation.
  • 15. MySQL Replica: MySQL replication is logical. When it is configured, the data changes are stored in the master’s binary log files. The slave gets the data changes from the master and saves them in its relay logs. The relay logs are then used to replay the data changes on the slave.
  • 16. Log formats. STATEMENT: logs the statements, which are replayed on the slave. It’s the best solution for performance; however, when replaying statements with non-deterministic functions this format generates different values on the slave (e.g. an insert using the uuid function). ROW: it’s deterministic. This format logs the row image and the DDL queries, and it is compulsory for using pg_chameleon. MIXED: takes the best of both worlds. The master logs the statements unless a non-deterministic function is used; in that case it logs the row image.
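The difference between the STATEMENT and ROW formats can be illustrated with a small simulation. This is only a sketch in plain Python, not MySQL code: the lists stand in for tables, and `run_statement` is a hypothetical helper that mimics an insert calling a uuid function.

```python
import uuid


def run_statement(table, make_id):
    """Mimic an INSERT whose id comes from a non-deterministic function."""
    table.append({"id": make_id(), "name": "alice"})


# STATEMENT format: the statement text is shipped, so every server evaluates
# the uuid function on its own and stores a different value.
master = []
statement_slave = []
run_statement(master, lambda: str(uuid.uuid4()))
run_statement(statement_slave, lambda: str(uuid.uuid4()))

# ROW format: the row image produced on the master is shipped as it is.
row_slave = [dict(row) for row in master]

print("statement replay matches master:", master == statement_slave)
print("row replay matches master:", master == row_slave)
```

Only the row image guarantees identical data on both sides, which is consistent with the slide's statement that the ROW format is required.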
  • 17. MySQL Replica
  • 19. pg_chameleon: pg_chameleon mimics a MySQL slave’s behaviour. It reads the replica stream and stores the decoded rows into a PostgreSQL table. PostgreSQL acts as both relay log and replication slave. A PL/pgSQL function decodes the rows and replays the changes.
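The read, store and replay flow described on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not pg_chameleon's code: `relay_log` stands in for the PostgreSQL table holding the decoded rows, `target` for a replicated table, and the JSON event layout (`action`, `pkey`, `row`) is invented for the illustration.

```python
import json

relay_log = []  # stands in for the PostgreSQL table holding decoded rows
target = {}     # stands in for a replicated table, keyed by primary key


def store_event(action, pkey, row=None):
    """Reader side: save a decoded row change as a JSON document."""
    relay_log.append(json.dumps({"action": action, "pkey": pkey, "row": row}))


def replay():
    """Replay side: decode the stored events in order and apply them."""
    while relay_log:
        event = json.loads(relay_log.pop(0))
        if event["action"] == "delete":
            target.pop(event["pkey"], None)
        else:
            # insert and update both carry the full row image
            target[event["pkey"]] = event["row"]


store_event("insert", 1, {"id": 1, "title": "ACADEMY DINOSAUR"})
store_event("update", 1, {"id": 1, "title": "ACE GOLDFINGER"})
store_event("insert", 2, {"id": 2, "title": "ADAPTATION HOLES"})
store_event("delete", 2)
replay()
print(target)
```

Keeping the stored events separate from the replay step is what lets the target database play both roles, relay log and slave.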
  • 20. MySQL replica + pg_chameleon
  • 21. Features: Reads the schema and data from MySQL and restores them into a target PostgreSQL schema. Sets up PostgreSQL to act as a MySQL slave. Basic DDL support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY, TRUNCATE). Handles the rubbish data coming from the replica stream and saves the problematic rows in sch_chameleon.t_discarded_rows. Supports multiple MySQL sources for the replica. Provides basic replica monitoring. Can detach the replica from MySQL, leaving PostgreSQL ready to work as a standalone server.
  • 23. MySQL configuration: The MySQL configuration file is usually stored in /etc/mysql/my.cnf. To enable binary logging, find the section [mysqld] and check that the following parameters are set:
      binlog_format = ROW
      log-bin = mysql-bin
      server-id = 1
      binlog-row-image = FULL
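A quick way to verify those four parameters is to parse the file and inspect the [mysqld] section. The sketch below uses Python's standard `configparser`; the function name `check_binlog_settings` and the sample text are made up for the illustration, and a real my.cnf may contain directives (such as include lines) that this simple parser does not understand.

```python
import configparser

# Settings that must have a specific value; dashes are normalised to underscores.
REQUIRED_VALUES = {"binlog_format": "ROW", "binlog_row_image": "FULL"}
# Settings that only need to be present.
REQUIRED_PRESENT = ("log_bin", "server_id")


def check_binlog_settings(cnf_text):
    """Return a list of problems found in the [mysqld] section of cnf_text."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    # MySQL accepts both log-bin and log_bin, so compare a normalised form.
    parser.optionxform = lambda key: key.lower().replace("-", "_")
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    mysqld = parser["mysqld"]
    problems = []
    for key, expected in REQUIRED_VALUES.items():
        value = mysqld.get(key)
        if value is None or value.upper() != expected:
            problems.append(f"{key} should be {expected}, found {value}")
    for key in REQUIRED_PRESENT:
        if key not in mysqld:
            problems.append(f"{key} is not set")
    return problems


sample = """
[mysqld]
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
"""
print(check_binlog_settings(sample))
```

An empty list means the sample satisfies the settings shown on the slide.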
  • 24. MySQL user for replica: Set up a replication user on MySQL:
      CREATE USER usr_replica;
      SET PASSWORD FOR usr_replica = PASSWORD('replica');
      GRANT ALL ON sakila.* TO 'usr_replica';
      GRANT RELOAD ON *.* TO 'usr_replica';
      GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
      GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
      FLUSH PRIVILEGES;
    In our example we are using the sakila test database: https://dev.mysql.com/doc/sakila/en/
  • 25. PostgreSQL setup: Add a user on PostgreSQL capable of creating schemas and relations in the destination database:
      CREATE USER usr_replica WITH PASSWORD 'replica';
      CREATE DATABASE db_replica WITH OWNER usr_replica;
  • 26. Install pg_chameleon: The simplest way to install pg_chameleon is in a virtual environment. However, if you have root access on your system the installation can be system wide. It’s important to upgrade pip before installing the package:
      python3 -m venv venv
      source venv/bin/activate
      pip install pip --upgrade
      pip install pg_chameleon
    Execute chameleon.py to create the configuration directory $HOME/.pg_chameleon/:
      chameleon.py
  • 27. Replica setup: cd into $HOME/.pg_chameleon/ and copy config-yaml.example to default.yaml, then edit the file, adding the connection settings and the source and destination schemas:
      my_database: sakila
      pg_database: db_replica
      dest_schema: 'my_schema'
      mysql_conn:
          host: derpy
          port: 3306
          user: usr_replica
          passwd: replica
      pg_conn:
          host: derpy
          port: 5432
          user: usr_replica
          password: replica
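Before starting the replica it can help to check that no setting was forgotten. The sketch below writes the slide's settings as the dictionary a YAML parser would return and looks for missing keys. The helper `missing_settings` and the idea that exactly these keys are mandatory are assumptions made for the illustration, not pg_chameleon behaviour. Note that the slide uses `passwd` for MySQL and `password` for PostgreSQL.

```python
# The settings from the slide, as the dictionary a YAML parser would return.
config = {
    "my_database": "sakila",
    "pg_database": "db_replica",
    "dest_schema": "my_schema",
    "mysql_conn": {
        "host": "derpy", "port": 3306, "user": "usr_replica", "passwd": "replica",
    },
    "pg_conn": {
        "host": "derpy", "port": 5432, "user": "usr_replica", "password": "replica",
    },
}

# Keys assumed to be mandatory for this sketch.
REQUIRED_TOP = ("my_database", "pg_database", "dest_schema")
REQUIRED_CONN = {
    "mysql_conn": ("host", "port", "user", "passwd"),
    "pg_conn": ("host", "port", "user", "password"),
}


def missing_settings(cfg):
    """Return the names of the settings that are absent from cfg."""
    missing = [key for key in REQUIRED_TOP if key not in cfg]
    for section, keys in REQUIRED_CONN.items():
        conn = cfg.get(section, {})
        missing += [f"{section}.{key}" for key in keys if key not in conn]
    return missing


print(missing_settings(config))
```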
  • 28. Init replica: Activate the virtualenv and run:
      chameleon.py create_schema --config default
      chameleon.py add_source --config default
      chameleon.py init_replica --config default
    Wait for init_replica to complete. If the database is large, consider running init_replica in a screen or tmux session.
  • 29. Start replica: Start the replica with:
      chameleon.py start_replica --config default
  • 30. Project structure:
      project directory
        scripts
          chameleon.py
        pg_chameleon
          lib
            global_lib.py
            mysql_lib.py
            pg_lib.py
            sqlutil_lib.py
  • 32. chameleon.py: Command line wrapper. Uses argparse to execute the commands. Supports several commands. After the installation, executing chameleon.py creates the configuration directory $HOME/.pg_chameleon/ with three subdirectories: pid, logs, config.
  • 33. chameleon.py commands:
      drop_schema: drops the service schema sch_chameleon with the cascade option.
      create_schema: creates the service schema sch_chameleon.
      upgrade_schema: upgrades an existing schema sch_chameleon to a newer version.
      init_replica: creates the table structure from MySQL in PostgreSQL. The MySQL tables are locked in read-only mode and the data is copied into the PostgreSQL database. The master’s coordinates are stored in the PostgreSQL replica catalogue.
      start_replica: starts the replication from MySQL to PostgreSQL using the master data stored in sch_chameleon.t_replica_batch. The master’s position is updated when a new batch is processed.
      list_config: lists the available configurations and their status ('ready', 'initialising', 'initialised', 'stopped', 'running', 'error').
  • 34. chameleon.py commands (continued):
      add_source: registers a new configuration file as a source.
      drop_source: removes the configuration from the registered sources.
      stop_replica: ends the replica process gracefully.
      disable_replica: ends the replica process and disables the restart.
      enable_replica: enables the replica process.
      sync_replica: syncs the data between MySQL and PostgreSQL without dropping the tables.
      show_status: displays the replication status for each source, with the lag in seconds and the last received event.
      detach_replica: stops the replica stream, discards the replica setup and resets the sequences in PostgreSQL so it can work as a standalone database.
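A command line wrapper of the kind described on the chameleon.py slides could be sketched with argparse as follows. This is a minimal illustration, not the real script: the command names come from the slides, while `build_parser`, `dispatch`, the help texts and the fallback configuration name `default` are assumptions.

```python
import argparse

# Command names as listed on the slides.
COMMANDS = [
    "drop_schema", "create_schema", "upgrade_schema", "init_replica",
    "start_replica", "list_config", "add_source", "drop_source",
    "stop_replica", "disable_replica", "enable_replica", "sync_replica",
    "show_status", "detach_replica",
]


def build_parser():
    """Build a parser taking one command and an optional configuration name."""
    parser = argparse.ArgumentParser(description="Sketch of a replica command wrapper")
    parser.add_argument("command", choices=COMMANDS, help="action to execute")
    parser.add_argument("--config", default="default", help="configuration name")
    return parser


def dispatch(argv):
    """Parse argv and report what would be executed."""
    args = build_parser().parse_args(argv)
    # A real wrapper would call the replica engine here.
    return f"{args.command} using configuration {args.config}"


print(dispatch(["init_replica", "--config", "default"]))
```

Restricting the positional argument with `choices` makes argparse reject unknown commands before any replica code runs.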
  • 35. global_lib.py: The class global_config loads the configuration parameters into the class attributes. The class replica_engine wraps the classes mysql_engine and pgsql_engine and sets up the logging method. The global_config instance is used to track the configuration settings.
  • 36. mysql_lib.py: The class mysql_connection connects to MySQL using the parameters provided by replica_engine. The class mysql_engine does all the magic for the replication setup and execution.
  • 37. pg_lib.py: The class pg_encoder extends the JSON encoder class and adds some special handling for types like decimal and datetime. The class pgsql_connection connects to the PostgreSQL database. The class pgsql_engine does all the magic for rebuilding the data structure, loading the data and migrating the schema.
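The special handling of decimal and datetime values can be sketched with a subclass of the standard `json.JSONEncoder`. The class name `SketchEncoder` and the chosen output formats (strings, ISO 8601 dates, hexadecimal for bytes) are assumptions for the illustration and may differ from what pg_encoder really emits.

```python
import datetime
import decimal
import json


class SketchEncoder(json.JSONEncoder):
    """JSON encoder that also accepts types the standard encoder rejects."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # Keep the exact decimal digits by encoding as a string.
            return str(obj)
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        if isinstance(obj, bytes):
            # Binary data is hexified so it survives as text.
            return obj.hex()
        return super().default(obj)


row = {
    "amount": decimal.Decimal("4.99"),
    "paid_at": datetime.datetime(2017, 5, 30, 12, 0, 0),
}
encoded = json.dumps(row, cls=SketchEncoder)
print(encoded)
```

Without the `default` override, `json.dumps` would raise a `TypeError` for both values in the example row.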
  • 39. sqlutil_lib.py: Consists of just one class, sql_token, which tokenises the MySQL queries using regular expressions. Yes, I have two problems now!
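Tokenising DDL with regular expressions can be sketched as below. The two patterns and the `tokenise` function are invented for the illustration and cover only simple DROP TABLE and TRUNCATE statements; they are not the expressions used by sql_token.

```python
import re

# Patterns for two simple statement types; backticks around names are optional.
DROP_TABLE = re.compile(
    r"^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?`?(?P<name>\w+)`?", re.IGNORECASE
)
TRUNCATE = re.compile(
    r"^\s*TRUNCATE\s+(?:TABLE\s+)?`?(?P<name>\w+)`?", re.IGNORECASE
)


def tokenise(statement):
    """Return the command and table name, or None if nothing matches."""
    for command, pattern in (("DROP TABLE", DROP_TABLE), ("TRUNCATE", TRUNCATE)):
        match = pattern.match(statement)
        if match:
            return {"command": command, "name": match.group("name")}
    return None


print(tokenise("DROP TABLE IF EXISTS `film_text`;"))
print(tokenise("truncate table actor"))
```

Even this small sketch hints at why the author jokes about having two problems: every SQL variation needs its own pattern or optional group.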
  • 40. Limitations: Tables require primary keys in order to be replicated. No daemonisation. Binary data are hexified to avoid issues with PostgreSQL.
  • 41. The future: pg_chameleon v2 development has already started. The first alpha will come out soon. The new version is a reorganisation of version 1 with several improvements: reorganised configuration files; background copy with parallel processes; separate daemons for read and replay; improved monitoring; Python 3 only.
  • 43. init_replica tune: The replica initialisation required several improvements. The OOM killer is always happy to kill processes using large amounts of memory. Using a general slice size doesn’t work well because with large rows the process crashes. Estimating the total rows for the user’s feedback is faster but the output can be odd. Using unbuffered cursors improves the speed and the memory usage.
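The memory argument can be illustrated by reading a result set in slices instead of fetching everything at once. The sketch uses the standard `sqlite3` module purely as a stand-in for MySQL; with PyMySQL the comparable choice would be an unbuffered cursor such as `pymysql.cursors.SSCursor`. The table, the slice size and the helper `copy_in_slices` are made up for the example.

```python
import sqlite3


def copy_in_slices(cursor, slice_size):
    """Yield lists of at most slice_size rows without loading the full result."""
    while True:
        rows = cursor.fetchmany(slice_size)
        if not rows:
            break
        yield rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO film (title) VALUES (?)", [(f"film {i}",) for i in range(10)]
)

cursor = conn.execute("SELECT film_id, title FROM film ORDER BY film_id")
slice_lengths = [len(rows) for rows in copy_in_slices(cursor, 4)]
print(slice_lengths)  # [4, 4, 2]
```

Only one slice is held in memory at a time, so the peak usage depends on the slice size and the row width rather than on the table size.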
  • 44. Strictness is an illusion. MySQL doubly so: MySQL’s lack of strictness is no mystery. The funny way MySQL manages defaults with NOT NULL can break the replica. Therefore any field with NOT NULL added after the initialisation is always created as NULLable in PostgreSQL.
  • 45. I feel your lack of constraint disturbing: Rubbish data can be stored in MySQL without the DBMS raising errors. When this happens the replicator traps the error raised when the change is replayed on PostgreSQL and discards the problematic row. The value is stored hexified in the table t_discarded_rows for later analysis.
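The trap and discard behaviour can be sketched as follows. The standard `sqlite3` module stands in for PostgreSQL, and the table layout, the constraint and the function `replay_payment` are assumptions made for the illustration; only the idea of hexifying the rejected row into t_discarded_rows comes from the slide.

```python
import binascii
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment ("
    "payment_id INTEGER PRIMARY KEY, "
    "amount REAL NOT NULL CHECK (amount >= 0))"
)
conn.execute("CREATE TABLE t_discarded_rows (table_name TEXT, row_hex TEXT)")


def replay_payment(row):
    """Apply one row change; on a constraint error store the row hexified."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO payment (payment_id, amount) "
                "VALUES (:payment_id, :amount)",
                row,
            )
        return True
    except sqlite3.IntegrityError:
        row_hex = binascii.hexlify(json.dumps(row).encode("utf-8")).decode("ascii")
        with conn:
            conn.execute(
                "INSERT INTO t_discarded_rows VALUES (?, ?)", ("payment", row_hex)
            )
        return False


print(replay_payment({"payment_id": 1, "amount": 4.99}))
print(replay_payment({"payment_id": 2, "amount": None}))
```

Because the rejected row is kept in hexadecimal form, it can be decoded later for analysis without risking a second failure on odd characters.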
  • 47. Igor, the green little guy: The chameleon logo was designed by Elena Toma, a talented Italian lady. https://www.facebook.com/Tonkipapperoart/ The name Igor is inspired by Martin Feldman’s Igor, portrayed in the movie Young Frankenstein.
  • 48. Some numbers (lines of code):
      global_lib.py    327
      mysql_lib.py     401
      pg_lib.py        670
      sql_util.py      228
      chameleon.py      58
      Total           1684
  • 49. pg_chameleon’s license: 2-clause BSD License. Copyright (c) 2016, 2017 Federico Campoli. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • 50. Feedback please! Please report any issue on GitHub: https://github.com/the4thdoctor/pg_chameleon
  • 51. Boring legal stuff: MySQL image source: WikiCommons. Hard disk image source: WikiCommons. Tron image source: Tron Wikia.
  • 52. Did you say hire? WE ARE HIRING! https://transferwise.com/jobs/
  • 53. Questions: Any questions?
  • 54. Contacts and license: Twitter: 4thdoctor_scarf; Blog: http://www.pgdba.co.uk ; Brighton PostgreSQL Meetup: http://www.meetup.com/Brighton-PostgreSQL-Meetup/ This document is distributed under the terms of the Creative Commons