
pg_chameleon: MySQL to PostgreSQL replica

pg_chameleon is a lightweight replication system written in Python. The tool connects to the MySQL replication protocol and replicates the data in PostgreSQL.
The tool's author will talk about its history and the logic behind the available functions, and will give an interactive usage example.

  1. pg_chameleon: MySQL to PostgreSQL lightweight replica
     Federico Campoli (Brighton PostgreSQL Meetup), 18 November 2016
  2. Table of contents
     1. Some history
     2. MySQL Replica in a nutshell
     3. The pg_chameleon library
     4. Caveats, traps, the usual political stuff...
     5. Wrap up
  3. Some history
  6. Some history: The beginnings
     • Years 2006/2012: neo_my2pg.py
     • Developed to help a struggling phpBB
     • The database was successfully migrated from MySQL to PostgreSQL
     • The migration failed for other reasons
     • It's written in Python 2.6
     • It's a monolithic script
     • And it's slow, very slow
     • You can use it as a checklist of things to avoid when coding: https://github.com/the4thdoctor/neo_my2pg
  9. Some history: I'm not scared of using the ORMs
     • Years 2013/2015: first attempt at pg_chameleon
     • Developed in Python 2.7
     • SQLAlchemy was used to extract MySQL's metadata
     • Good proof of concept, but no real hope of becoming usable
     • Built during the roller-coaster years
     • It was just a way to discharge frustration
     • Abandoned because pgloader did the same, and better
     • The ORM's limitations didn't help keep the project alive
  10. Some history: pg_chameleon reborn
      • Year 2016
      • The project's revamp was triggered by a specific need: what if it were possible to replicate data from MySQL to PostgreSQL?
      • The python-mysql-replication library can decode the MySQL replica when ROW-based logging is used.
      • Trying won't harm, they said.
  11. Some history: pg_chameleon reborn
      • Still on Python 2.7
      • SQLAlchemy removed
      • MySQL driver switched to PyMySQL
      • The python-mysql-replication library reads the MySQL replica
      • Provides a basic command line
  12. MySQL Replica in a nutshell
  13. MySQL Replica in a nutshell: MySQL Replica
      • MySQL replicates the logical data rather than the physical data
      • The data changes are stored in a local binary log
      • The slave saves the replication data pulled from the master in its local relay logs
      • The slave reads the local relay logs and replays the data
  14. MySQL Replica in a nutshell: MySQL Replica (diagram)
  15. MySQL Replica in a nutshell: Log formats
      • STATEMENT logs the statements, which are replayed on the slave. It seems the best solution for performance, but replaying non-deterministic functions (e.g. UUID()) generates inconsistent slaves.
      • ROW is deterministic. It logs the changed rows and the DDL queries. This format is required for pg_chameleon to work.
      • MIXED takes the best of both worlds. The master logs the statements unless a non-deterministic function is used; in that case it logs the row image.
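To make the STATEMENT versus ROW difference concrete, here is a toy Python simulation (not MySQL code). Python lists stand in for the master and slave tables, and the hypothetical `new_token` function stands in for MySQL's UUID(). Replaying the statement re-evaluates the function on the slave, while replaying the row image copies the value the master computed.

```python
import uuid

def new_token():
    """Stand-in for MySQL's non-deterministic UUID() function."""
    return str(uuid.uuid4())

def execute_insert(table, value_fn):
    """Toy 'INSERT INTO t (token) VALUES (value_fn())'; returns the stored row."""
    row = {"id": len(table) + 1, "token": value_fn()}
    table.append(row)
    return row

def replay_statement(slave, value_fn):
    # STATEMENT format: the slave re-executes the statement, so value_fn runs again.
    execute_insert(slave, value_fn)

def replay_row(slave, row_image):
    # ROW format: the slave applies the row image logged by the master as is.
    slave.append(dict(row_image))

master, statement_slave, row_slave = [], [], []
row_image = execute_insert(master, new_token)
replay_statement(statement_slave, new_token)
replay_row(row_slave, row_image)

print(master == row_slave)        # True: same row on master and slave
print(master == statement_slave)  # False: the slave generated its own UUID
```

This is also why pg_chameleon needs ROW: a row image carries the actual values, which can be written to PostgreSQL without re-running any MySQL function.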
  18. MySQL Replica in a nutshell: A chameleon in the middle
      • pg_chameleon mimics a MySQL slave's behaviour
      • Reads the replica
      • Stores the decoded rows in a PostgreSQL table
      • PostgreSQL acts as relay log and replication slave
      • A PL/pgSQL function decodes the rows and replays the changes
      • With an extra cool feature: initialise the PostgreSQL replica schema in just one command
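A minimal in-memory sketch of this flow, assuming row events shaped like those produced by python-mysql-replication (each row carries `values`, or `before_values`/`after_values` for updates). A Python list stands in for the PostgreSQL log table and a dict keyed by primary key stands in for the replicated table. In pg_chameleon the replay is done by a PL/pgSQL function, not by Python; the function names here are hypothetical, and routing by schema/table is omitted.

```python
import json

def store_event(log_table, schema, table, action, rows):
    """Serialise decoded row changes as JSON, like rows of a PostgreSQL log table."""
    for row in rows:
        log_table.append(json.dumps(
            {"schema": schema, "table": table, "action": action, "row": row}))

def replay(log_table, target, pkey):
    """Apply the logged changes in order to `target`, a dict keyed by primary key."""
    for raw in log_table:
        event = json.loads(raw)
        row = event["row"]
        if event["action"] == "insert":
            target[row["values"][pkey]] = row["values"]
        elif event["action"] == "update":
            # The primary key itself may change, so drop the old image first.
            target.pop(row["before_values"][pkey], None)
            target[row["after_values"][pkey]] = row["after_values"]
        elif event["action"] == "delete":
            target.pop(row["values"][pkey], None)
    log_table.clear()  # the batch has been consumed

log, actor = [], {}
store_event(log, "sakila", "actor", "insert",
            [{"values": {"actor_id": 1, "first_name": "PENELOPE"}}])
store_event(log, "sakila", "actor", "update",
            [{"before_values": {"actor_id": 1, "first_name": "PENELOPE"},
              "after_values": {"actor_id": 1, "first_name": "NICK"}}])
replay(log, actor, "actor_id")
print(actor)  # {1: {'actor_id': 1, 'first_name': 'NICK'}}
```

Real rows often contain decimals and datetimes, which plain `json.dumps` rejects; that is the job of an encoder like the pg_encoder class described in the library slides.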
  19. MySQL Replica in a nutshell: MySQL replica + pg_chameleon (diagram)
  20. The pg_chameleon library
  21. The pg_chameleon library: Project structure
      • project directory
        • pg_chameleon.py
        • config/
          • config.yaml
        • logs/
        • pg_chameleon/
          • lib/
            • global_lib.py
            • mysql_lib.py
            • pg_lib.py
            • sqlutil_lib.py
          • sql/
            • upgrade/
            • create_schema.sql
  22. The pg_chameleon library: pg_chameleon.py
      • Command line wrapper
      • Uses argparse to execute the commands
      • Can be easily extended with more commands
  23. The pg_chameleon library: pg_chameleon.py
      • init_replica: copies the data from MySQL and saves the master coordinates in PostgreSQL. This command locks the MySQL tables in read-only mode during the copy.
      • start_replica: connects to the MySQL master and replays the changes in PostgreSQL.
      • create_schema, drop_schema, upgrade_schema: manual actions on the PostgreSQL service schema. Not required in general, because init_replica recreates the service schema from scratch, and start_replica runs the schema migrations, if required, before starting the program loop.
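The command line wrapper can be sketched with argparse and a single positional argument restricted by `choices`. This is a hypothetical reconstruction, not pg_chameleon's actual code: `COMMANDS` and `dispatch` are illustrative names, and the handler only returns a description of what each real command does.

```python
import argparse

# Adding a command is just a new entry here (plus its real implementation).
COMMANDS = {
    "init_replica": "copy the MySQL data and save the master coordinates",
    "start_replica": "stream the MySQL replica and replay it in PostgreSQL",
    "create_schema": "create the service schema",
    "drop_schema": "drop the service schema",
    "upgrade_schema": "run the service schema migrations",
}

def build_parser():
    parser = argparse.ArgumentParser(prog="pg_chameleon.py")
    # argparse rejects anything outside the known commands with a usage error.
    parser.add_argument("command", choices=sorted(COMMANDS))
    return parser

def dispatch(argv):
    """Parse argv (a list of strings) and describe the selected command."""
    args = build_parser().parse_args(argv)
    return f"{args.command}: {COMMANDS[args.command]}"
```

For example, `dispatch(["init_replica"])` returns `"init_replica: copy the MySQL data and save the master coordinates"`, while an unknown command makes argparse print a usage error and raise `SystemExit`.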
  24. The pg_chameleon library: global_lib.py
      • class global_config: loads config.yaml into the class attributes
      • class replica_engine: wraps the MySQL and PostgreSQL class methods and sets up the logging. A global_config instance is created to get the configuration settings.
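A sketch of the "config file into attributes" idea, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). The class name `GlobalConfig` and the list of required keys are hypothetical; only the key names come from the config.yaml slides. The slide mentions class attributes, while this sketch sets instance attributes, the more common Python idiom.

```python
class GlobalConfig:
    """Expose the top-level keys of a parsed config.yaml as attributes."""

    # Hypothetical choice of mandatory keys, named after the config.yaml slides.
    REQUIRED = ("my_server_id", "my_database", "pg_database", "mysql_conn", "pg_conn")

    def __init__(self, settings):
        missing = [key for key in self.REQUIRED if key not in settings]
        if missing:
            raise KeyError(f"missing configuration keys: {', '.join(missing)}")
        for key, value in settings.items():
            setattr(self, key, value)

settings = {
    "my_server_id": 100,
    "my_database": "sakila",
    "pg_database": "db_replica",
    "mysql_conn": {"host": "localhost", "port": 3306},
    "pg_conn": {"host": "localhost", "port": 5432},
    "tables_limit": [],
}
config = GlobalConfig(settings)
print(config.my_database, config.pg_conn["port"])  # sakila 5432
```

Failing early on a missing key gives a clear error before any connection to MySQL or PostgreSQL is attempted.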
  25. The pg_chameleon library: mysql_lib.py
      • class mysql_connection: connects to MySQL using the parameters provided by replica_engine
      • class mysql_engine: does all the magic for the replication setup and execution
  26. The pg_chameleon library: mysql_lib.py, class mysql_engine
      • locks and releases the tables for the init_replica command
      • pulls the data out of MySQL in CSV format or as insert statements
      • extracts the metadata from MySQL's information_schema
      • copies the data into PostgreSQL using the class pg_engine
      • falls back to inserts if the copy fails for any reason
      • starts the replica stream using python-mysql-replication
      • decodes the replica events into a data dictionary, which is saved by pg_engine
      • when a replica binlog has been read, executes the PostgreSQL replay via pg_engine
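The copy-then-fallback behaviour can be sketched independently of any driver. Here `copy_batch` and `insert_row` are hypothetical callables (in a real setup they might wrap a PostgreSQL COPY and a single-row INSERT); the sketch only shows the control flow: try the fast bulk path, and on failure insert row by row, collecting the rows PostgreSQL refuses. Catching `Exception` keeps the sketch generic; real code would catch the driver's error class.

```python
def load_table(rows, copy_batch, insert_row):
    """Try a bulk copy; if it fails, insert row by row and collect the rejects."""
    try:
        copy_batch(rows)
        return {"copied": len(rows), "inserted": 0, "rejected": []}
    except Exception:
        inserted, rejected = 0, []
        for row in rows:
            try:
                insert_row(row)
                inserted += 1
            except Exception:
                rejected.append(row)  # e.g. rubbish accepted by MySQL
        return {"copied": 0, "inserted": inserted, "rejected": rejected}

# Fake target that, like PostgreSQL, refuses MySQL's zero date.
target = []

def copy_batch(rows):
    if any(row[1] == "0000-00-00" for row in rows):
        raise ValueError("invalid date in batch")  # the whole batch fails
    target.extend(rows)

def insert_row(row):
    if row[1] == "0000-00-00":
        raise ValueError("invalid date")
    target.append(row)

result = load_table([(1, "2016-11-18"), (2, "0000-00-00")], copy_batch, insert_row)
print(result)  # {'copied': 0, 'inserted': 1, 'rejected': [(2, '0000-00-00')]}
```

The row-by-row path is much slower, which matches the "(slow)" note on the fallback in the caveats slides.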
  27. The pg_chameleon library: pg_lib.py
      • class pg_encoder: extends the JSON encoder class and adds special handling for types like decimal and datetime
      • class pgsql_connection: connects to the PostgreSQL database
      • class pgsql_engine: does all the magic for rebuilding the data structure, loading the data and migrating the schema
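An encoder in the spirit of pg_encoder can be written by subclassing `json.JSONEncoder` and overriding `default()`. The chosen representations are assumptions: decimals become strings to avoid float rounding, dates and datetimes become ISO 8601 strings, and timedeltas (which PyMySQL returns for MySQL TIME columns) become their string form.

```python
import datetime
import decimal
import json

class PgEncoder(json.JSONEncoder):
    """JSON encoder for values decoded from MySQL row events."""

    def default(self, obj):
        if isinstance(obj, decimal.Decimal):
            # A string keeps the exact value; a float could lose precision.
            return str(obj)
        if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
            return obj.isoformat()
        if isinstance(obj, datetime.timedelta):
            return str(obj)
        # Anything else: let the base class raise TypeError.
        return super().default(obj)

row = {"amount": decimal.Decimal("9.99"),
       "rental_date": datetime.datetime(2016, 11, 18, 19, 30)}
print(json.dumps(row, cls=PgEncoder))
# {"amount": "9.99", "rental_date": "2016-11-18T19:30:00"}
```

PostgreSQL can cast both string forms back to `numeric` and `timestamp` when the JSON rows are replayed.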
  28. The pg_chameleon library: pg_lib.py, class pgsql_engine
      • creates and upgrades the service schema sch_chameleon
      • builds the CREATE statements for tables and indices using the metadata provided by mysql_engine
      • executes the CREATE statements and registers the MySQL tables in sch_chameleon
      • copies the data into the tables and falls back to inserts if the copy fails
      • builds the primary keys and indices using the metadata provided by mysql_engine
      • stores the JSON data from the replica and executes the replay
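Building the CREATE statements boils down to mapping MySQL column metadata to PostgreSQL column definitions. The sketch assumes column dicts with the `COLUMN_NAME`, `DATA_TYPE`, `CHARACTER_MAXIMUM_LENGTH` and `IS_NULLABLE` keys of `information_schema.COLUMNS`, and a deliberately incomplete, hypothetical type map. It is not pg_chameleon's real mapping. Keys are left out because, as the slide says, primary keys and indices are built separately.

```python
# Hypothetical, deliberately incomplete MySQL -> PostgreSQL type map.
TYPE_MAP = {
    "tinyint": "smallint", "smallint": "smallint", "int": "integer",
    "bigint": "bigint", "decimal": "numeric",
    "char": "character", "varchar": "character varying", "text": "text",
    "date": "date", "datetime": "timestamp without time zone",
    "timestamp": "timestamp without time zone", "blob": "bytea",
}

def quote_ident(name):
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def build_create_table(schema, table, columns):
    definitions = []
    for col in columns:
        # Unknown types fall back to text (an assumption of this sketch).
        pg_type = TYPE_MAP.get(col["DATA_TYPE"], "text")
        length = col.get("CHARACTER_MAXIMUM_LENGTH")
        if length and col["DATA_TYPE"] in ("char", "varchar"):
            pg_type += f"({length})"
        nullable = "" if col["IS_NULLABLE"] == "YES" else " NOT NULL"
        definitions.append(f"{quote_ident(col['COLUMN_NAME'])} {pg_type}{nullable}")
    return (f"CREATE TABLE {quote_ident(schema)}.{quote_ident(table)} "
            f"({', '.join(definitions)});")

columns = [
    {"COLUMN_NAME": "actor_id", "DATA_TYPE": "smallint", "IS_NULLABLE": "NO"},
    {"COLUMN_NAME": "first_name", "DATA_TYPE": "varchar",
     "CHARACTER_MAXIMUM_LENGTH": 45, "IS_NULLABLE": "NO"},
    {"COLUMN_NAME": "last_update", "DATA_TYPE": "timestamp", "IS_NULLABLE": "YES"},
]
print(build_create_table("sakila", "actor", columns))
```

The schema name follows the config.yaml convention that a PostgreSQL schema is named after the replicated MySQL database.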
  29. The pg_chameleon library: sqlutil_lib.py
      • Consists of just one class, sql_token, which tokenises the MySQL queries so that pgsql_engine can build the DDL in PostgreSQL's dialect
      • Currently under development
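A toy version of the tokenising idea, limited to the two DDL statements the caveats slides list as replicated (CREATE TABLE and DROP TABLE). It uses regular expressions to extract the command, the table name and the raw column list. The function name and the returned dict layout are hypothetical; a real tokeniser must also parse column definitions, keys and quoting rules.

```python
import re

# MySQL allows optional backquotes around identifiers.
CREATE_RE = re.compile(
    r"^\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?\s*\((.*)\)",
    re.IGNORECASE | re.DOTALL)
DROP_RE = re.compile(r"^\s*DROP\s+TABLE\s+(?:IF\s+EXISTS\s+)?`?(\w+)`?",
                     re.IGNORECASE)

def tokenise(query):
    """Describe a CREATE/DROP TABLE statement as a dict, or return None."""
    match = CREATE_RE.match(query)
    if match:
        # The greedy (.*) stops at the last ')', i.e. the end of the column list.
        return {"command": "CREATE TABLE", "name": match.group(1),
                "columns": match.group(2).strip()}
    match = DROP_RE.match(query)
    if match:
        return {"command": "DROP TABLE", "name": match.group(1)}
    return None

print(tokenise("DROP TABLE IF EXISTS `film`;"))
# {'command': 'DROP TABLE', 'name': 'film'}
```

The extracted column list would still need translating into PostgreSQL's dialect, which is where the real work of such a class lies.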
  30. The pg_chameleon library: config.yaml
      • my_server_id: the server id for the MySQL replica. Must be unique within the replica cluster.
      • copy_max_memory: the maximum amount of memory to use when copying a table into PostgreSQL. The value can be specified in (k)ilobytes, (M)egabytes or (G)igabytes by adding the suffix (e.g. 300M).
      • my_database: the MySQL database to replicate. A schema with the same name will be initialised in the PostgreSQL database.
      • pg_database: the destination database in PostgreSQL.
      • copy_mode: the allowed values are 'file' and 'direct'. With direct, the copy happens on the fly. With file, the table is first dumped to a CSV file, then reloaded into PostgreSQL.
      • hexify: a YAML list of the data types that require conversion to hex (e.g. blob, binary). The conversion happens both during the copy and on the replica.
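Turning a copy_max_memory value such as 300M into bytes is a small parsing exercise. This sketch assumes binary multiples (1k = 1024 bytes) and that a bare number means bytes; the slides only specify the k, M and G suffixes, so both choices are assumptions. The `rows_per_slice` helper is also hypothetical: it shows one way such a budget could size the copy, given an estimated average row length.

```python
import re

MULTIPLIERS = {"": 1, "k": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_memory(value):
    """Turn '300M', '512k', '1G' or 1048576 into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kMG]?)\s*", str(value))
    if not match:
        raise ValueError(f"invalid copy_max_memory value: {value!r}")
    number, suffix = match.groups()
    return int(number) * MULTIPLIERS[suffix]

def rows_per_slice(max_memory, avg_row_length):
    """How many rows fit in the memory budget (at least one)."""
    return max(1, parse_memory(max_memory) // max(1, avg_row_length))

print(parse_memory("300M"))          # 314572800
print(rows_per_slice("300M", 1024))  # 307200
```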
  31. The pg_chameleon library: config.yaml
      • log_dir: directory where the logs are stored
      • log_level: logging verbosity. Allowed values are debug, info, warning, error.
      • log_dest: log destination. stdout for debugging purposes, file for normal activity.
      • my_charset: MySQL charset for the copy (please note the replica is always in utf8)
      • pg_charset: the PostgreSQL connection's charset
      • tables_limit: YAML list of the tables to replicate. If empty, the entire MySQL database is replicated.
      • sleep_loop: seconds between replica batch attempts
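The three logging keys map naturally onto Python's standard logging module. A sketch, assuming the log file is called pg_chameleon.log (a hypothetical name) and that log_dest accepts exactly 'stdout' or 'file', as the slide describes:

```python
import logging
import os
import sys

LEVELS = {"debug": logging.DEBUG, "info": logging.INFO,
          "warning": logging.WARNING, "error": logging.ERROR}

def setup_logging(log_dir, log_level, log_dest, name="pg_chameleon"):
    """Configure a logger from the log_dir, log_level and log_dest settings."""
    logger = logging.getLogger(name)
    logger.setLevel(LEVELS[log_level])
    logger.handlers.clear()  # avoid duplicated handlers if called twice
    if log_dest == "stdout":
        handler = logging.StreamHandler(sys.stdout)
    elif log_dest == "file":
        os.makedirs(log_dir, exist_ok=True)
        handler = logging.FileHandler(os.path.join(log_dir, f"{name}.log"))
    else:
        raise ValueError(f"log_dest must be 'stdout' or 'file', not {log_dest!r}")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger

logger = setup_logging("logs", "debug", "stdout")
logger.debug("replica batch attempt")
```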
  32. The pg_chameleon library: config.yaml, MySQL connection parameters
      mysql_conn:
        host: localhost
        port: 3306
        user: replication_username
        passwd: never_commit_passwords
  33. The pg_chameleon library: config.yaml, PostgreSQL connection parameters
      pg_conn:
        host: localhost
        port: 5432
        user: replication_username
        password: never_commit_passwords
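The two connection blocks use different key names for the password (passwd for MySQL, password for PostgreSQL). A sketch of turning them into driver arguments: keyword arguments suitable for `pymysql.connect()`, and a libpq keyword/value connection string (usable, for example, with `psycopg2.connect(dsn)`) with pg_database as dbname. The helper names and the utf8 default charset are assumptions. The quoting follows libpq's rules: empty values or values containing whitespace are single-quoted, and quotes and backslashes are backslash-escaped.

```python
def mysql_kwargs(mysql_conn, charset="utf8"):
    """Keyword arguments for pymysql.connect() built from the mysql_conn block."""
    kwargs = {key: mysql_conn[key] for key in ("host", "port", "user")}
    kwargs["password"] = mysql_conn["passwd"]  # the config key is passwd
    kwargs["charset"] = charset
    return kwargs

def libpq_quote(value):
    """Quote a value for a libpq keyword/value connection string."""
    text = str(value)
    if text and not any(c.isspace() or c in "'\\" for c in text):
        return text
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def pg_dsn(pg_conn, pg_database):
    """libpq connection string built from the pg_conn block and pg_database."""
    params = {"host": pg_conn["host"], "port": pg_conn["port"],
              "user": pg_conn["user"], "password": pg_conn["password"],
              "dbname": pg_database}
    return " ".join(f"{key}={libpq_quote(value)}" for key, value in params.items())

pg_conn = {"host": "localhost", "port": 5432,
           "user": "usr_replica", "password": "replica"}
print(pg_dsn(pg_conn, "db_replica"))
# host=localhost port=5432 user=usr_replica password=replica dbname=db_replica
```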
  34. The pg_chameleon library: MySQL replica configuration
      The MySQL configuration file is usually stored in /etc/mysql/my.cnf. To enable binary logging, find the [mysqld] section and check that the following parameters are set:
      • binlog_format: has to be ROW to capture the DML events
      • log-bin: any name is fine (e.g. mysql-bin)
      • server-id: has to be a numerical value, unique across the replication cluster. The value 1 is used for the master.
      • binlog_row_image: has to be FULL, as required by the python-mysql-replication library
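Before running init_replica it can help to check the running server rather than the file, since my.cnf may not be the only configuration source. This sketch takes the result of `SHOW GLOBAL VARIABLES` as a dict (fetching it, e.g. with PyMySQL, is left out) and reports which of the requirements above are not met. The function name is hypothetical, and treating a server_id of 0 as "not set" is an assumption.

```python
def check_binlog_settings(variables):
    """Return a list of problems found in a SHOW GLOBAL VARIABLES result."""
    problems = []
    if variables.get("log_bin", "OFF").upper() != "ON":
        problems.append("binary logging is disabled (set log-bin)")
    if variables.get("binlog_format", "").upper() != "ROW":
        problems.append("binlog_format must be ROW")
    if variables.get("binlog_row_image", "").upper() != "FULL":
        problems.append("binlog_row_image must be FULL")
    if int(variables.get("server_id", 0)) == 0:
        problems.append("server_id must be set to a non-zero value")
    return problems

variables = {"log_bin": "ON", "binlog_format": "STATEMENT",
             "binlog_row_image": "FULL", "server_id": "1"}
print(check_binlog_settings(variables))  # ['binlog_format must be ROW']
```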
  35. The pg_chameleon library: MySQL setup
      CREATE USER usr_replica;
      SET PASSWORD FOR usr_replica = PASSWORD('replica');
      GRANT ALL ON sakila.* TO 'usr_replica';
      GRANT RELOAD ON *.* TO 'usr_replica';
      GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
      GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
      FLUSH PRIVILEGES;
  36. The pg_chameleon library: PostgreSQL setup
      CREATE USER usr_replica WITH PASSWORD 'replica';
      CREATE DATABASE db_replica WITH OWNER usr_replica;
  37. The pg_chameleon library: Replica setup
      • Copy config-yaml.example to config.yaml and set the configuration parameters
      • Run ./pg_chameleon.py init_replica
      • Wait for init_replica to complete, then start the replica with ./pg_chameleon.py start_replica
  38. Caveats, traps, the usual political stuff...
  39. Caveats, traps, the usual political stuff...: Limitations
      • Tables require a primary key to be replicated
      • There is no cleanup of the rubbish accepted by MySQL (e.g. nulls implicitly converted to 0)
      • No daemonisation yet
      • Binary data are hexified to avoid issues with PostgreSQL
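Because tables without a primary key cannot be replicated, the set of tables to copy is effectively the tables_limit list (or the whole database when it is empty) minus the tables without a key. A sketch, assuming the primary keys have been read into a mapping from every table name to its key columns (an empty list when there is none), e.g. from `information_schema.KEY_COLUMN_USAGE` where `CONSTRAINT_NAME = 'PRIMARY'`. The function name is hypothetical.

```python
def replicable_tables(primary_keys, tables_limit):
    """Split the candidate tables into (replicable, skipped_without_pk)."""
    # An empty tables_limit means "the entire database".
    candidates = tables_limit or sorted(primary_keys)
    replicable, skipped = [], []
    for table in candidates:
        if primary_keys.get(table):
            replicable.append(table)
        else:
            skipped.append(table)  # no primary key, or unknown table
    return replicable, skipped

keys = {"actor": ["actor_id"], "film": ["film_id"], "log_no_pk": []}
print(replicable_tables(keys, []))  # (['actor', 'film'], ['log_no_pk'])
```

Reporting the skipped tables, rather than silently dropping them, makes the limitation visible to whoever runs init_replica.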
  40. Caveats, traps, the usual political stuff...: What works
      • Replication of the MySQL schema into PostgreSQL
      • Locking the tables in MySQL and getting the master coordinates
      • Creation of primary keys and indices in PostgreSQL
      • Writing MySQL row events in PostgreSQL
      • Replay of the replicated data in PostgreSQL
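A common way to take a consistent copy together with the binlog coordinates on MySQL is FLUSH TABLES WITH READ LOCK, then SHOW MASTER STATUS, then the copy, then UNLOCK TABLES. The slides do not say whether pg_chameleon uses exactly these statements, so treat this as an assumption; it is consistent with the RELOAD and REPLICATION CLIENT grants in the MySQL setup slide, which these statements need. Newer MySQL releases replace SHOW MASTER STATUS with SHOW BINARY LOG STATUS. The sketch works with any DB-API cursor that returns tuples; a fake cursor is used so it runs without a server.

```python
def snapshot_with_coordinates(cursor, copy_tables):
    """Lock, read the binlog coordinates, copy, and always release the lock."""
    cursor.execute("FLUSH TABLES WITH READ LOCK")
    try:
        cursor.execute("SHOW MASTER STATUS")
        # The first two columns are File and Position.
        log_file, log_pos = cursor.fetchone()[:2]
        copy_tables()
    finally:
        cursor.execute("UNLOCK TABLES")
    return log_file, log_pos

class FakeCursor:
    """Stand-in for a DB-API cursor that records the executed statements."""

    def __init__(self):
        self.executed = []

    def execute(self, sql):
        self.executed.append(sql)

    def fetchone(self):
        return ("mysql-bin.000002", 154, "", "", "")

cursor = FakeCursor()
print(snapshot_with_coordinates(cursor, lambda: None))  # ('mysql-bin.000002', 154)
```

The saved file and position are the point from which start_replica can later resume reading the binary log.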
  41. Caveats, traps, the usual political stuff...: What seems to work
      • Enum support
      • Binary import into bytea (hex conversion)
      • Initial copy based on copy to file or in memory
      • Fallback to inserts in case of rubbish data (slow)
      • Replication of CREATE and DROP TABLE statements
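The hex conversion for binary columns can be illustrated in pure Python: bytes become a hex string (on the MySQL side, the HEX() function does the same job), and PostgreSQL can turn the string back into bytea with `decode(value, 'hex')`. One reason this helps is that raw bytes cannot be stored in JSON, which is how replica rows are stored. The slides do not detail where pg_chameleon performs each step; the helper names are hypothetical.

```python
import binascii

HEX_DIGITS = set("0123456789abcdefABCDEF")

def hexify(value):
    """Encode a binary value as an uppercase hex string, like MySQL's HEX()."""
    return binascii.hexlify(value).decode("ascii").upper()

def bytea_expression(hex_value):
    """SQL expression turning the hex text back into bytea on PostgreSQL."""
    # Only hex digits are allowed, so the literal cannot break out of its quotes.
    if not set(hex_value) <= HEX_DIGITS:
        raise ValueError("not a hex string")
    return f"decode('{hex_value}', 'hex')"

blob = b"\x00\xffchameleon"
encoded = hexify(blob)
print(encoded)                              # 00FF6368616D656C656F6E
print(bytea_expression(encoded))            # decode('00FF6368616D656C656F6E', 'hex')
print(binascii.unhexlify(encoded) == blob)  # True
```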
  42. Caveats, traps, the usual political stuff...: What doesn't work
      • Replication of ALTER TABLE statements
      • Materialisation of the MySQL views
      • Import of foreign keys into PostgreSQL
      • Daemonisation, background workers for replay, PostgreSQL extension
  43. Wrap up
  44. Wrap up: Igor, the little green guy
      • The chameleon logo was created by Elena Toma, a talented Italian lady: https://www.facebook.com/Tonkipapperoart/
      • The name Igor is inspired by Martin Feldman's Igor, portrayed in the movie Young Frankenstein.
  45. Wrap up: Some numbers (lines of code)
      • global_lib.py: 163
      • mysql_lib.py: 521
      • pg_lib.py: 557
      • sql_util.py: 208
      • create_schema.sql: 354
      • Total lines in libraries: 1449
      • Total lines including SQL: 1803
  46. Wrap up: pg_chameleon's license
      Plain old 2-clause BSD License
      Copyright (c) 2016, Federico Campoli
      All rights reserved.
      Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
      * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
      * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
      THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  47. Wrap up: Please test!
      • That's all!
      • Please clone the repository, test and break the tool!
      • Report issues! https://github.com/the4thdoctor/pg_chameleon
  48. Wrap up: Boring legal stuff
      • MySQL image source: WikiCommons
      • Hard disk image source: WikiCommons
      • Slonik logo: copyright PostgreSQL Global Development Group
  49. Wrap up: Contacts and license
      • Twitter: 4thdoctor_scarf
      • Blog: http://www.pgdba.co.uk
      • Brighton PostgreSQL Meetup: http://www.meetup.com/Brighton-PostgreSQL-Meetup/
      • This document is distributed under the terms of the Creative Commons
