pg_chameleon is a lightweight replication system written in Python. The tool connects to the MySQL replication protocol and replicates the data in PostgreSQL.
The talk covers the history, the logic and the future of the tool.
The ninja elephant, scaling the analytics database in Transferwise - Federico Campoli
Business intelligence and analytics are at the core of any great company, and Transferwise is no exception.
The talk will start with a brief history of the legacy analytics implemented with MySQL and how we scaled up the performance using PostgreSQL. In order to get fresh data from the core MySQL databases in real time we used a modified version of pg_chameleon which also obfuscated the PII data.
The talk will also cover the challenges and the lessons learned by the developers and analysts when bridging MySQL with PostgreSQL.
pg_chameleon MySQL to PostgreSQL replica made easy - Federico Campoli
pg_chameleon is a lightweight replication system written in Python. The tool can connect to the MySQL replication protocol and replicate the data changes in PostgreSQL.
Whether the user needs to set up a permanent replica between MySQL and PostgreSQL or perform an engine migration, pg_chameleon is the perfect tool for the job.
The talk will cover the history, the current implementation and the future releases.
The audience will learn how to set up a replica from MySQL to PostgreSQL in a few easy steps. It will also cover the lessons learned during the tool’s development cycle.
pg_chameleon is a lightweight replication system written in Python. The tool connects to the MySQL replication protocol and replicates the data in PostgreSQL.
The author will talk about the history, the logic behind the available functions, and give an interactive usage example.
PostgreSQL is one of the finest database systems available.
The talk will cover the history, the basic concepts of PostgreSQL's architecture, and how the community behind "the most advanced open source database" works.
The document discusses backup and recovery strategies in PostgreSQL. It describes logical backups using pg_dump, which takes a snapshot of the database and outputs SQL scripts or custom files. It also describes physical backups using write-ahead logging (WAL) archiving and point-in-time recovery (PITR). With WAL archiving enabled, PostgreSQL archives WAL files, allowing recovery to any point between backups by restoring the backup files and replaying the WAL logs. The document provides steps for performing PITR backups, including starting the backup, copying files, stopping the backup, and recovery by restoring files and using a recovery.conf file.
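As a rough sketch of the PITR workflow summarised above (the paths are placeholders, and the recovery.conf mechanism applies to PostgreSQL releases older than 12, the era this document covers):

-- postgresql.conf: enable WAL archiving before taking the base backup
-- archive_mode = on
-- archive_command = 'cp %p /mnt/backup/wal/%f'
SELECT pg_start_backup('base_backup');
-- copy the data area with a file-level tool, e.g. rsync -a $PGDATA/ /mnt/backup/base/
SELECT pg_stop_backup();
-- to recover: restore the base backup into $PGDATA, then create recovery.conf with
-- restore_command = 'cp /mnt/backup/wal/%f %p'
-- recovery_target_time = '2017-05-30 12:00:00'  -- optional point-in-time target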
PostgreSQL - backup and recovery with large databases - Federico Campoli
Life on a rollercoaster, backup and recovery with large databases
Dealing with large databases is always a challenge.
The backup and HA procedures evolve as the database installation grows over time.
The talk will cover the problems solved by the DBA over four years of working with large databases, whose size increased from a 1.7 TB single cluster up to 40 TB in a multi-shard environment.
The talk will cover both disaster recovery with pg_dump and high availability with log shipping/streaming replication.
The presentation is based on a real story. The names have been changed to protect the innocent.
The document discusses PostgreSQL's internal architecture and components. It describes the data area, which stores data files on disk, and key directories like pg_xlog for write-ahead logs. It explains the buffer cache and clock sweep algorithm for managing memory, and covers the multi-version concurrency control (MVCC) which allows simultaneous transactions. TOAST storage is also summarized, which stores large data values externally.
The document discusses PostgreSQL and its capabilities. It describes how PostgreSQL was created in 1982 and became open source in 1996. It discusses PostgreSQL's support for large databases, high-performance transactions using MVCC, ACID compliance, and its ability to run on most operating systems. The document also covers PostgreSQL's JSON and NoSQL capabilities and provides performance comparisons of JSON, JSONB and text fields.
Slides from the Brighton PostgreSQL meetup presentation. An all around PostgreSQL exploration. The rocky physical layer, the treacherous MVCC’s swamp and the buffer manager’s garden.
The document discusses PostgreSQL's physical storage structure. It describes the various directories within the PGDATA directory that stores the database, including the global directory containing shared objects and the critical pg_control file, the base directory containing numeric files for each database, the pg_tblspc directory containing symbolic links to tablespaces, and the pg_xlog directory which contains write-ahead log (WAL) segments that are critical for database writes and recovery. It notes that tablespaces allow spreading database objects across different storage devices to optimize performance.
This document is an introduction to PostgreSQL presented by Federico Campoli to the Brighton PostgreSQL Users Group. It covers the history and development of PostgreSQL, its features including data types, JSON/JSONB support, and performance comparisons. The presentation includes sections on the history of PostgreSQL, its features and capabilities, NOSQL support using JSON/JSONB, and concludes with a wrap up on PostgreSQL and related projects.
This document discusses JPQL join fetch queries in JPA. It explains that join fetch queries can result in a cartesian product between tables if not written properly. It provides examples of order, member, and delivery entities to illustrate this issue. It also discusses using the distinct keyword in JPQL queries, clarifying that distinct applies to root entities, not rows in the SQL query. Finally, it mentions that fetch joins can be optimized using batch fetching.
- Covenant Ko is a founder and maintainer of the Github organization 'brave-people' and has a technical blog that has received over 410,000 visits
- He explains Anderson's Formula for calculating expected value, which takes into account probability of success (P) and payoff for success (S)
- Some tips he provides include making the common case fast, using static fields for caches but not databases, and that StringBuffer is thread-safe while StringBuilder is not
JPA Week3 Entity Mapping / Hexagonal Architecture - Covenant Ko
The document discusses Hexagonal Architecture and its principles. It explains that the core domain layer should not depend on other layers like the data layer. It provides examples of package structures for Hexagonal Architecture and sample code that separates ports and adapters. Case studies are presented on how companies have implemented Hexagonal Architecture for microservices and APIs.
Uber’s blog post about its migration from PostgreSQL to MySQL made a lot of buzz in the PostgreSQL community. Many developers in the PostgreSQL community realized the shortcomings of our table engine (which is still the only one). As a result, many patches were developed in order to overcome the shortcomings mentioned by Uber. Some of those patches overlap, and some even contradict each other. Those patches include: indirect indexes (indexes which reference the primary key value), WARM (write-amplification reduction method), and RDS (recently dead storage). There are also discussions about pluggable table engines and an undo log.
In this talk I’ll consider the points of Uber’s blog post from a PostgreSQL developer’s point of view. I’ll explain which points I agree with, which I disagree with, and which I partially agree with. I’ll also consider the developments of the PostgreSQL community, and how they can overcome the mentioned shortcomings from my point of view.
1. The document discusses writing test code and outlines a multi-week plan for learning testing techniques like JPA, ORM, TDD, DDD, and MVC.
2. It recommends practices like using Mockito to mock dependencies, avoiding unnecessary mocking, and following FIRST principles of tests being fast, independent, repeatable, self-validating, and timely.
3. The document also provides resources on functional decomposition, abstract data types, object-oriented design, and troubleshooting services.
In-memory OLTP storage with persistence and transaction support - Alexander Korotkov
Nowadays it is becoming evident that a single storage engine can't be "one size fits all". The PostgreSQL community is starting its movement towards pluggable storages. A significant restriction imposed by the current approach is compatibility: we consider pluggable storages to be compatible with (at least some) existing index access methods. That means we have a long way to go, because we have to extend our index AMs before we can add the corresponding features to the pluggable storages themselves.
In this talk we would like to look at this problem from another angle, and see what we can achieve if we try to build a storage completely from scratch (using the FDW interface for prototyping). Thus, we will show you a prototype of an in-memory OLTP storage with transaction support and snapshot isolation. Internally it is implemented as an index-organized table (B-tree) with an undo log and optional persistence. That means it is quite different from what we have in PostgreSQL now.
The advantages of this in-memory storage, proven by benchmarks, are: better multicore scalability (thanks to having no buffer manager), reduced bloat (thanks to the undo log) and optimized I/O (thanks to logical WAL logging).
Not Less, Not More: Exactly Once, Large-Scale Stream Processing in Action - Paris Carbone
Large-scale data stream processing has come a long way to where it is today. It combines all the essential requirements of modern data analytics: subsecond latency, high throughput and, impressively, strong consistency. Apache Flink is a system that serves as a proof of concept of these characteristics, and it is mainly well known for its lightweight fault tolerance. Data engineers and analysts can now let the system handle terabytes of computational state without worrying about failures that can potentially occur.
This presentation describes all the fundamental challenges behind exactly-once processing guarantees in large-scale streaming in a simple and intuitive way. Furthermore, it demonstrates the basic and extended versions of Flink's state-of-the-art snapshotting algorithm, tailored to the needs of a dataflow graph.
This document contains information about Covenant Ko, including his background and links to his Github and blog. It discusses ORM and JPA, with JPA standing for Java Persistence API. It includes a pop quiz question asking about the relationships between ORM, JPA, and JPA providers like Hibernate. The document also references a case study on using Hazelcast for Hibernate second-level caching to improve performance.
- Beans in Spring are objects that are managed by the Spring IoC container. They form the backbone of a Spring application.
- Beans and their dependencies are configured in metadata that is used by the container to manage the complete lifecycle of a bean - instantiating, assembling, and configuring beans.
- There are two main ways to configure beans - using XML configuration or using annotations in Java code. Common annotations used include @Component, @Controller, @Service and @Repository.
The document discusses Covenant Ko and chapter 5. It mentions Covenant Ko's name and company (11번가) and links to his Github and tech blog. It then covers the GRASP pattern for software design, including information expert, controller and other patterns. It discusses the need for responsibility-driven design and applying the GRASP patterns.
The document discusses the life cycle of Spring beans. It begins with an overview of how beans are defined using XML, stereotype annotations, and configuration. It then covers the key stages in the bean life cycle: 1) Context loading where the configuration is merged and validated, 2) Dependency injection where dependencies are injected either through constructors, setters or fields, and 3) A pop quiz about the final bean definition and how it can be modified through a BeanFactoryPostProcessor.
This document provides an overview of how to run, debug, and tune Apache Flink applications. It discusses:
- Writing and testing Flink jobs locally and submitting them to a cluster for execution
- Debugging techniques like logs, accumulators, and remote debugging
- Tuning jobs by configuring parallelism, memory settings, and I/O directories
- Common issues like OutOfMemoryErrors and how to resolve them
JPA Study Week2 - Object Relational Mapping - Covenant Ko
This document discusses Object-Relational Mapping (ORM) and JPA. It begins with an introduction to the author and their background and credentials. It then poses some questions about ORM and defines it as mapping between object-oriented programming languages and relational databases. It discusses some of the challenges with ORM including differences in granularity, inheritance, identity, associations, and data navigation between objects and relational databases. It also covers ORM patterns like Active Record and Data Mapper and compares JPA, Hibernate, and Spring Data JPA. Finally, it provides a case study example of applying ORM and architectural patterns to an online ordering system.
pg_chameleon, MySQL to PostgreSQL replica made easy - Federico Campoli
Federico Campoli developed pg_chameleon to replicate data from MySQL to PostgreSQL. He has been passionate about IT since 1982 and loves PostgreSQL. pg_chameleon version 2.0 allows replication of multiple MySQL schemas to a PostgreSQL database. It uses two subprocesses to concurrently read the data changes from MySQL and replay them into PostgreSQL. The presentation covered pg_chameleon's history, how it works as a replica, setup instructions, and a demo. Future development plans include parallelizing the initial load to speed it up and adding logical replication from PostgreSQL.
EuroPython 2011 High Performance Python - Ian Ozsvald
I ran this as a 4 hour tutorial at EuroPython 2011 to teach High Performance Python coding.
Techniques covered include bottleneck analysis by profiling, bytecode analysis, converting to C using Cython and ShedSkin, use of the numerical numpy library and numexpr, multi-core and multi-machine parallelisation and using CUDA GPUs.
Write-up with 49 page PDF report: http://ianozsvald.com/2011/06/29/high-performance-python-tutorial-v0-1-from-my-4-hour-tutorial-at-europython-2011/
MySQL InnoDB Cluster and Group Replication in a Nutshell - Frederic Descamps
This document outlines the agenda and steps for a hands-on tutorial on MySQL InnoDB Cluster and Group Replication. The agenda includes preparing the workstation by setting up virtual machines, an overview of MySQL InnoDB Cluster and Group Replication, migrating from a master-slave topology to Group Replication, monitoring Group Replication, and application interaction with Group Replication. The first lab demonstrates the current master-slave setup. The migration plan involves installing MySQL InnoDB Cluster on a new server, restoring a backup, setting up asynchronous replication on the new server, adding it to the Group Replication group, pointing the application to a new node, and stopping asynchronous replication after catch up.
MySQL Group Replication in a nutshell - MySQL InnoDB Cluster - Frederic Descamps
Group Replication is a plugin that provides multi-master replication for MySQL. It allows transactions to be executed on any node and replicated in a synchronous manner to all other nodes. The changes are delivered in total order to each node using GTIDs to ensure strong consistency across the cluster. Certification and application of the changes occurs asynchronously on each node after the writeset has been synchronously delivered.
The document discusses whether the PyPy implementation of Python is ready for production use. It provides an overview of PyPy, benchmarks various workloads against CPython, and evaluates PyPy based on common criteria for determining if a software project is production-ready. While some workloads are slower on PyPy and it fails with some Python modules, it meets most criteria and provides performance improvements for CPU-bound tasks. Overall, the document concludes PyPy could be considered for production use, especially given its advantages in scalability and upcoming improvements to its just-in-time compiler and Python 3 support.
The PHP mysqlnd plugin talk - plugins an alternative to MySQL Proxy - Ulf Wendel
The document discusses PHP mysqlnd plugins as an alternative to MySQL Proxy for extending the functionality of the MySQL native driver (mysqlnd) in PHP. It describes how mysqlnd plugins can hook into and replace mysqlnd C API calls to add capabilities like load balancing, read/write splitting, and query logging, without needing additional software like MySQL Proxy. The speaker explains that mysqlnd plugins are written in C or PHP and work by overriding mysqlnd method functions at initialization to intercept and modify behavior.
Percona Live 2022 - PBM - The Backup Open Source Tool for MongoDB - Jean Da Silva
Backup and restore are two of the most important things for databases. We don't often use backups, but in a disaster situation it is crucial that they work.
In this session, we will discuss Percona Backup for MongoDB (PBM for short).
We will walk through the process of taking backups and executing restores. We will also introduce the newest backup method that PBM offers, the physical backup in addition to the logical backup. After the introduction of the backup methods, we will evaluate the backup and restore times, and how to store the backup on remote backup storage.
OSDC 2017 | Mgmt Config: Autonomous systems by James Shubin - NETWAYS
Mgmt is a next gen config management tool that takes a fresh look at existing automation problems. Three of the main design features of the tool include:
* Parallel execution
* Event driven mechanism
* Distributed architecture
This presentation will briefly introduce the tool and spend most of the time presenting and demoing some of the newer features in the project. We'll present some of the new resources (virt, password, etc) new features (libified mgmt, send/recv, DSL) and how these can be used to build autonomous systems. Finally we'll talk about some of the future designs we're planning and make it easy for new users to get involved and help shape the project.
This document contains summaries of various projects and analyses posted on GitHub by Takayoshi Iitsuka. It describes analyzing the intermediate layers of VAE models, approximating the distribution of MNIST data in high-dimensional spaces, using preemptible VMs on GCP, applying k-means clustering to MNIST, creating executable Python tutorial scripts, scraping templates, developing a "Pico-OS" for MicroPython, and speeding up recalculations in large Excel sheets. Links to the relevant GitHub repositories are provided.
OpenNebula, the foreman and CentOS play nice, too - inovex GmbH
This document discusses setting up a private cloud using OpenNebula and the Foreman. It begins with an introduction and agenda. It then covers installing CentOS, setting up a local YUM repository using Pulp, installing the Foreman for bare metal provisioning, and using Puppet modules. It demonstrates deploying OpenNebula nodes using the Foreman and provides an overview of accessing the new cloud. It notes there are some rough edges to address but the modules are minor. It concludes by thanking the audience and providing contact information.
This document provides instructions for setting up MySQL for Python (MySQLdb) on Mac OS X. It describes downloading MySQL from mysql.com and installing it, then downloading and building MySQLdb from sourceforge.net. It notes potential issues like missing header files and explains how to fix them by installing additional developer packages or changing symbolic links. Comments provide corrections and additional troubleshooting tips for issues users encountered.
Jakob Lorberblatt is an open source database consultant who loves to talk about software and MySQL. The document discusses the confusion around MySQL versions, potential issues when upgrading versions like deprecated parameters or syntax, and strategies for upgrading versions safely such as backing up data, testing on a clone, and using tools like Percona Toolkit to analyze differences. It also covers techniques for gradually moving to a newer version like using ProxySQL for real-time mirroring or black hole relays for multi-version replication.
This document discusses various profiling tools that can be used to analyze MySQL performance, including Oprofile, perf, pt-pmp, and the MySQL Performance Schema. It provides examples of how these tools have been used to identify and resolve specific MySQL performance bugs. While the Performance Schema is useful, it does not always provide sufficient detail and other system-wide profilers like Oprofile and perf are still needed in some cases to pinpoint performance issues.
2018 data engineering for ml asset management for features and modelsGe Org
This document discusses asset management challenges for machine learning models. It notes that ML can increase productivity by 50x but achieving this requires addressing hidden complexities, such as data and model dependencies. The document reviews several open source solutions for managing ML workflows, models, code, data and deployments. These include Seldon.io for serving ML, Vespa for low latency serving with data, DVC for versioning code and data, and PipelineAI for reproducible model pipelines and serving. However, the document notes there is no single solution and integrating components remains challenging due to the interconnected nature of ML systems.
This document discusses Android custom kernel and ROM design. It provides information on the speaker's custom kernel projects for the Nexus 4 and Nexus 5 devices, including the features and modifications made. It also covers the process of developing a custom kernel, including cloning the source code, adding features via patching or cherry-picking, and compiling the kernel. The document briefly discusses custom ROMs and the process for syncing ROM sources.
The document discusses using mRuby and lightweight APIs for microservices. It introduces mRuby as a lightweight Ruby implementation that can be embedded into applications. It then demonstrates using mRuby with Nginx through the ngx_mruby module to build a simple microservice for handling API requests. Benchmark tests show the mRuby implementation serving requests faster than a standard Rack implementation in Ruby. However, some downsides of mRuby are also noted, such as needing to recompile when adding dependencies and lack of features like require that make code less dry.
MySQL Group Replication - Hands-On Tutorial - Kenny Gryp
Group Replication is a plugin for MySQL that provides multi-master replication. It works by having each node send write transactions to other nodes through a group communication system. The writes are certified locally in an asynchronous manner to ensure total order of transactions across all nodes. Group Replication uses optimistic locking where local locks are released right after commit, and conflict detection happens during certification rather than at the start of transactions.
MySQL Document Store - when SQL & NoSQL live together... in peace! - Frederic Descamps
Frédéric Descamps gave a demonstration of MySQL Document Store, showing how it allows both SQL and NoSQL functionality. He migrated sample data from MongoDB to MySQL Document Store and performed queries and CRUD operations. The conclusion is that MySQL Document Store provides the best of both worlds by combining schemaless and flexible data with ACID compliance, SQL capabilities, and data integrity.
Similar to pg_chameleon a MySQL to PostgreSQL replica (20)
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Securing your Kubernetes cluster: a step-by-step guide to success! - KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI - Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 5 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides from me and Rik Marselis at the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also had a lovely workshop with the participants, trying to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, and then measured continuously. Test environments can be used less, at a smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
pg_chameleon a MySQL to PostgreSQL replica
1. pg_chameleon
MySQL to PostgreSQL replica
Federico Campoli
Transferwise
30 May 2017
2. Few words about the speaker
Born in 1972
Passionate about IT since 1982
3. Few words about the speaker
Born in 1972
Passionate about IT since 1982
mostly because of the TRON movie
Joined the Oracle DBA secret society in 2004
In love with PostgreSQL since 2006
Currently runs the Brighton PostgreSQL User group
Works at Transferwise as Data Engineer
4. Table of contents
1 Some history
2 MySQL Replica in a nutshell
3 A chameleon in the middle
4 Replica in action
5 Lessons learned
6 Wrap up
5. Table of contents
1 Some history
2 MySQL Replica in a nutshell
3 A chameleon in the middle
4 Replica in action
5 Lessons learned
6 Wrap up
6. The beginnings
Years 2006/2012
neo_my2pg.py
I wrote the script because of a struggling phpBB on MySQL
The database migration was successful
However phpBB didn’t work very well with PostgreSQL
7. The beginnings
Years 2006/2012
neo_my2pg.py
I wrote the script because of a struggling phpBB on MySQL
The database migration was successful
However phpBB didn’t work very well with PostgreSQL
The script is in Python 2.6
It’s a monolithic script
And it’s slow, very slow
8. The beginnings
Years 2006/2012
neo_my2pg.py
I wrote the script because of a struggling phpBB on MySQL
The database migration was successful
However phpBB didn’t work very well with PostgreSQL
The script is in Python 2.6
It’s a monolithic script
And it’s slow, very slow
It’s a good checklist for things to avoid when coding
https://github.com/the4thdoctor/neo_my2pg
9. I’m not scared of using the ORMs
Years 2013/2015
First attempt of pg_chameleon
Developed in Python 2.7
Used SQLAlchemy for extracting MySQL’s metadata
Proof of concept only
10. I’m not scared of using the ORMs
Years 2013/2015
First attempt of pg_chameleon
Developed in Python 2.7
Used SQLAlchemy for extracting MySQL’s metadata
Proof of concept only
It was built during the years of the roller coaster
Therefore it was just a way to discharge frustration
Abandoned after a while
11. I’m not scared of using the ORMs
Years 2013/2015
First attempt of pg_chameleon
Developed in Python 2.7
Used SQLAlchemy for extracting MySQL’s metadata
Proof of concept only
It was built during the years of the roller coaster
Therefore it was just a way to discharge frustration
Abandoned after a while
SQLAlchemy’s limitations were frustrating as well
And there was already pgloader doing the same job
12. pg_chameleon reborn
Year 2016
I revamped the project because I needed to replicate the data from MySQL to PostgreSQL.
And the library python-mysql-replication looked very promising for reading the MySQL replica protocol.
Trying won’t harm they said.
13. pg_chameleon v1
Compatible with CPython 2.7/3.3+
Removed SQLAlchemy
Replaced the mysqldb driver with PyMySQL
Added a command line helper
Installs in virtualenv and system wide
Shipped via pypi for easy installation
14. Table of contents
1 Some history
2 MySQL Replica in a nutshell
3 A chameleon in the middle
4 Replica in action
5 Lessons learned
6 Wrap up
15. MySQL Replica
MySQL replica is logical
When configured, the data changes are stored in the master’s binary log files
The slave gets the data changes from the master
The data changes are saved in the slave’s relay logs
The relay logs are used to replay the data on the slave
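For instance, both ends of this flow can be inspected from the MySQL client (standard commands, assuming replication is already configured):

-- on the master: the binary log file and position currently being written
SHOW MASTER STATUS;
-- on the slave: relay log coordinates and replication lag
SHOW SLAVE STATUS\G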
16. Log formats
STATEMENT: It logs the statements which are replayed on the slave.
It’s the best solution for performance; however, when replaying statements with non-deterministic functions this format generates different values on the slave (e.g. an insert with the uuid function).
ROW: It’s deterministic. This format logs the row image and the DDL queries.
This format is compulsory for using pg_chameleon.
MIXED: takes the best of both worlds. The master logs the statements unless a non-deterministic function is used. In that case it logs the row image.
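A minimal illustration of the non-deterministic problem (a hypothetical table, run with binlog_format=STATEMENT):

CREATE TABLE t_uuid (id VARCHAR(36));
-- only the statement text reaches the binary log; the slave re-executes uuid()
-- and stores a value different from the master's
INSERT INTO t_uuid VALUES (uuid());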
19. pg_chameleon
pg_chameleon mimics a MySQL slave’s behaviour
Reads the replica stream
Stores the decoded rows into a PostgreSQL table
PostgreSQL acts as relay log and replication slave
A PL/pgSQL function decodes the rows and replays the changes
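A minimal sketch of the reading side, following the documented usage of the python-mysql-replication library; pg_chameleon stores the decoded rows into PostgreSQL instead of printing them:

from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (DeleteRowsEvent,
                                          UpdateRowsEvent,
                                          WriteRowsEvent)

mysql_settings = {'host': 'derpy', 'port': 3306,
                  'user': 'usr_replica', 'passwd': 'replica'}

# server_id must be unique among the replicas of this master
stream = BinLogStreamReader(connection_settings=mysql_settings,
                            server_id=100,
                            only_events=[DeleteRowsEvent,
                                         UpdateRowsEvent,
                                         WriteRowsEvent],
                            blocking=True)
try:
    for binlogevent in stream:  # loops forever waiting for new events
        for row in binlogevent.rows:
            # pg_chameleon would store the decoded row here
            print(binlogevent.schema, binlogevent.table, row)
finally:
    stream.close()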
20. MySQL replica + pg_chameleon
21. Features
Reads the schema and data from MySQL and restores them into a target
PostgreSQL schema
Sets up PostgreSQL to act as a MySQL slave
Basic DDL support (CREATE/DROP/ALTER TABLE, DROP PRIMARY KEY, TRUNCATE)
Handles the rubbish data coming from the replica stream and saves the
problematic rows in sch_chameleon.t_discarded_rows
Supports multiple MySQL sources for the replica
Provides basic replica monitoring
Can detach the replica from MySQL, leaving PostgreSQL ready to work as a
standalone server
23. MySQL configuration
The MySQL configuration file is usually stored in /etc/mysql/my.cnf.
To enable the binary logging, find the section [mysqld] and check that the
following parameters are set:
binlog_format= ROW
log-bin = mysql-bin
server-id = 1
binlog-row-image = FULL
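A quick, hypothetical way to double check that the settings are active, again sketched with PyMySQL (connection parameters are placeholders):

import pymysql

conn = pymysql.connect(host='derpy', port=3306,
                       user='root', password='secret')
with conn.cursor() as cur:
    cur.execute("SHOW GLOBAL VARIABLES WHERE Variable_name IN "
                "('binlog_format', 'log_bin', 'binlog_row_image', 'server_id')")
    for name, value in cur.fetchall():
        print('%s = %s' % (name, value))
conn.close()
# expected: binlog_format = ROW, log_bin = ON, binlog_row_image = FULL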
24. MySQL user for replica
Setup a replication user on MySQL
CREATE USER usr_replica;
SET PASSWORD FOR usr_replica=PASSWORD('replica');
GRANT ALL ON sakila.* TO 'usr_replica';
GRANT RELOAD ON *.* TO 'usr_replica';
GRANT REPLICATION CLIENT ON *.* TO 'usr_replica';
GRANT REPLICATION SLAVE ON *.* TO 'usr_replica';
FLUSH PRIVILEGES;
In our example we are using the sakila test database.
https://dev.mysql.com/doc/sakila/en/
25. PostgreSQL setup
Add a user on PostgreSQL capable of creating schemas and relations in the
destination database
CREATE USER usr_replica WITH PASSWORD 'replica';
CREATE DATABASE db_replica WITH OWNER usr_replica;
26. Install pg_chameleon
The simplest way to install pg_chameleon is within a virtual environment.
However, if you have root access on your system, the installation can be
system wide.
It’s important to upgrade pip before installing the package.
python3 -m venv venv
source venv/bin/activate
pip install pip --upgrade
pip install pg_chameleon
Execute chameleon.py to create the configuration directory
$HOME/.pg_chameleon/
chameleon.py
27. Replica setup
cd into $HOME/.pg_chameleon/ and copy config-yaml.example to default.yaml, then
edit the file adding the connection settings and the source and destination
schemas.
my_database: sakila
pg_database: db_replica
dest_schema: 'my_schema'

mysql_conn:
    host: derpy
    port: 3306
    user: usr_replica
    passwd: replica

pg_conn:
    host: derpy
    port: 5432
    user: usr_replica
    password: replica
28. Init replica
Activate the virtualenv and run
chameleon.py create_schema --config default
chameleon.py add_source --config default
chameleon.py init_replica --config default
Wait for init_replica to complete. If the database is large, consider running
init_replica in a screen or tmux session.
29. Start replica
Start the replica with
chameleon.py start_replica --config default
32. chameleon.py
Command line wrapper
Uses argparse to parse and execute the commands
Supports several commands
After the installation, executing chameleon.py creates the configuration
directory $HOME/.pg_chameleon/ with three subdirectories:
pid
logs
config
33. chameleon.py
Commands
drop_schema: drops the service schema sch_chameleon with the cascade option.
create_schema: creates the service schema sch_chameleon.
upgrade_schema: upgrades an existing sch_chameleon schema to a newer
version.
init_replica: creates the table structure from MySQL in PostgreSQL. The
MySQL tables are locked in read-only mode and the data is copied into the
PostgreSQL database. The master’s coordinates are stored in the
PostgreSQL replica catalogue.
start_replica: starts the replication from MySQL to PostgreSQL using the
master data stored in sch_chameleon.t_replica_batch. The master’s position is
updated when a new batch is processed.
list_config: lists the available configurations and their status ('ready',
'initialising', 'initialised', 'stopped', 'running', 'error').
34. chameleon.py
add_source: registers a new configuration file as a source.
drop_source: removes the configuration from the registered sources.
stop_replica: ends the replica process gracefully.
disable_replica: ends the replica process and disables the restart.
enable_replica: enables the replica process.
sync_replica: syncs the data between MySQL and PostgreSQL without dropping
the tables.
show_status: displays the replication status for each source, with the lag in
seconds and the last received event.
detach_replica: stops the replica stream, discards the replica setup and resets
the sequences in PostgreSQL to work as a standalone db.
35. global_lib.py
class global_config: loads the configuration parameters into the class
attributes.
class replica_engine: wraps the classes mysql_engine and pgsql_engine and
sets up the logging. The global_config instance is used to track the
configuration settings.
36. mysql_lib.py
class mysql_connection: connects to MySQL using the parameters provided by
replica_engine.
class mysql_engine: does all the magic for the replication setup and execution.
37. pg_lib.py
class pg_encoder: extends the JSON encoder and adds some special handling for
types like decimal and datetime.
class pgsql_connection: connects to the PostgreSQL database.
class pgsql_engine: does all the magic for rebuilding the data structure,
loading the data and migrating the schema.
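A minimal sketch of the pg_encoder idea, not the tool’s actual code: subclass json.JSONEncoder and convert the problematic types before serialisation.

import json
from datetime import date, datetime
from decimal import Decimal

class pg_encoder(json.JSONEncoder):
    def default(self, obj):
        # turn the types json can't handle into strings
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return str(obj)
        return json.JSONEncoder.default(self, obj)

row = {'amount': Decimal('9.99'), 'rental_date': datetime(2017, 5, 30)}
print(json.dumps(row, cls=pg_encoder))
# {"amount": "9.99", "rental_date": "2017-05-30T00:00:00"}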
39. sql_util.py
Consists of just one class, sql_token, which tokenises the MySQL queries using
regular expressions.
Yes, I have two problems now!
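For flavour, a toy regex-based tokenisation in the spirit of sql_token (illustrative only, not the actual patterns used by the tool):

import re

# a single, simplified pattern matching a CREATE TABLE statement
m_create_table = re.compile(
    r'\s*CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?`?(\w+)`?',
    re.IGNORECASE)

statement = 'CREATE TABLE IF NOT EXISTS `actor` (actor_id SMALLINT)'
token = m_create_table.match(statement)
if token:
    print('CREATE TABLE detected for table: %s' % token.group(1))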
40. Limitations
Tables require primary keys in order to be replicated
No daemonisation
Binary data is hexified to avoid issues with PostgreSQL
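A sketch of the hexify idea (illustrative, not the tool’s exact code): the binary value round-trips through its hex representation, sidestepping byte sequences that would upset PostgreSQL.

import binascii

blob = b'\x00\xff binary payload'        # raw bytes from the replica stream
hexified = binascii.hexlify(blob)        # safe to hand over to PostgreSQL
restored = binascii.unhexlify(hexified)  # later analysis can reverse it
assert restored == blob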
41. The future
pg_chameleon v2 development has already started. The first alpha will come out
soon.
The new version is a reorganisation of version 1 with several improvements.
Reorganised configuration files
Background copy with parallel processes
Separate daemon for read and replay
Improved monitoring
Python 3 only
43. init_replica tuning
The replica initialisation required several improvements.
The OOM killer is always happy to kill processes using large amounts of memory
Using a single general slice size doesn’t work well, because with large rows the
process crashes
Estimating the total rows for the user’s feedback is faster, but the output can
be odd
Using unbuffered cursors improves both the speed and the memory usage; see the
sketch below
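A sketch of the unbuffered approach using PyMySQL’s server-side cursor (connection parameters are placeholders): the rows are streamed from the server one slice at a time instead of being buffered in memory.

import pymysql
import pymysql.cursors

conn = pymysql.connect(host='derpy', port=3306,
                       user='usr_replica', password='replica',
                       database='sakila',
                       cursorclass=pymysql.cursors.SSCursor)
with conn.cursor() as cur:
    cur.execute('SELECT * FROM rental')
    while True:
        slice_of_rows = cur.fetchmany(10000)  # one slice at a time
        if not slice_of_rows:
            break
        # copy the slice into PostgreSQL here
conn.close()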
44. Strictness is an illusion. MySQL doubly so
MySQL’s lack of strictness is not a mystery.
The funny way MySQL manages defaults combined with NOT NULL can break the
replica.
Therefore any fields with NOT NULL added after the initialisation are always
created as NULLable in PostgreSQL.
45. I feel your lack of constraint disturbing
Rubbish data can be stored in MySQL without the DBMS raising any error.
When this happens the replicator traps the error when the change is replayed on
PostgreSQL and discards the problematic row.
The value is stored hexified in the table t_discarded_rows for later analysis.
47. Igor, the green little guy
The chameleon logo was created by Elena Toma, a talented Italian lady.
https://www.facebook.com/Tonkipapperoart/
The name Igor is inspired by Marty Feldman’s Igor, portrayed in the movie
Young Frankenstein.
48. Some numbers
Lines of code
global_lib.py 327
mysql_lib.py 401
pg_lib.py 670
sql_util.py 228
chameleon.py 58
Total lines of code 1684
49. pg_chameleon’s license
2 clause BSD License
Copyright (c) 2016,2017 Federico Campoli
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
50. Feedback please!
Please report any issue on github!
https://github.com/the4thdoctor/pg_chameleon
51. Boring legal stuff
MySQL Image source WikiCommons
Hard Disk image source WikiCommons
Tron image source Tron Wikia
52. Did you say hire?
WE ARE HIRING!
https://transferwise.com/jobs/
54. Contacts and license
Twitter: 4thdoctor_scarf
Blog: http://www.pgdba.co.uk
Brighton PostgreSQL Meetup:
http://www.meetup.com/Brighton-PostgreSQL-Meetup/
This document is distributed under the terms of the Creative Commons
55. pg_chameleon
MySQL to PostgreSQL replica
Federico Campoli
Transferwise
30 May 2017