Linas Virbalas
                    Continuent, Inc.



© Continuent 2010
/  Definition & Motivation
  /  Scoping the Challenge
  /  MySQL ->
           •  PostgreSQL
           •  Oracle
           •  MongoDB
  /  Demo 1
  /  PostgreSQL ->
           •  MySQL
  /  Demo 2
  /  Q&A



Heterogeneous Replication
                               
       Replication between different types of DBMS




  1.  Real-time integration of data between different DBMS types
  2.  Seamless migration from one DBMS type to another
  3.  Real-time data warehousing from different DBMS types
  4.  Leveraging the specific SQL capabilities of other DBMS types




/        Name: Linas Virbalas
  /        Country: Lithuania
  /        Implementing for Tungsten:
         •          MySQL -> PostgreSQL
         •          MySQL -> Greenplum
         •          MySQL -> Oracle
         •          PostgreSQL WAL
         •          PostgreSQL Streaming Replication
         •          PostgreSQL Logical Replication
                    via Slony logs


  /        Blog:
           http://flyingclusters.blogspot.com


1.  MySQL -> …
         •          Replicating from MySQL to PostgreSQL/Greenplum, Oracle,
                    MongoDB
  2.  PostgreSQL -> …
         •          Replicating from PostgreSQL to MySQL




With Tungsten Replicator




  /        Open Source GPL v2
  /        Java
  /        Interfaces to implement new:
         •          Extractors
         •          Filters
         •          Appliers
  /        Multiple replication services per process




Technology: Replication Pipelines




/  Statement Based Replication




  /  Row Based Replication




                    Master        Slave
                    Replicator    Replicator

                    Transaction   Transaction
                    History Log   History Log
                    Filters       Filters
                    MySQL         PostgreSQL
                    Extractor     Applier




/  Provisioning
  /  Data Type Differences
  /  Database vs. Schema
  /  Default (Implicitly Defined) Schema Selection
  /  SQL Dialect Differences
           •  Statement Replication vs. Row Replication
  /  Character Sets and Binary Data
  /  Old Versions of MySQL




Provisioning

 /  Harder way: Dump data explicitly




 /  Easier way: Replicate a mysqldump backup







  /  Note the type differences between MySQL and PostgreSQL
     (rows marked ! differ)

                        MySQL                PostgreSQL
                    !   TINYINT              SMALLINT
                        SMALLINT             SMALLINT
                        INTEGER              INTEGER
                        BIGINT               BIGINT
                    !   CHAR(1)              CHAR(5) = {'true', 'false'}
                        CHAR(x)              CHAR(x)
                        VARCHAR(x)           VARCHAR(x)
                        DATE                 DATE
                        TIMESTAMP            TIMESTAMP
                    !   TEXT (diff. sizes)   TEXT
                    !   BLOB                 BYTEA
                        …
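
The table above can be read as a lookup. The following is a minimal Python sketch, not Tungsten code: the names `MYSQL_TO_POSTGRES` and `map_type` are hypothetical, and only the rows listed on the slide are covered (with MySQL's sized TEXT variants folded into TEXT, as the "diff. sizes" row suggests). Anything else raises an error rather than guessing.

```python
import re

# Rows from the table above. TINYTEXT/MEDIUMTEXT/LONGTEXT are the
# "different sizes" of TEXT in MySQL; PostgreSQL has a single TEXT type.
MYSQL_TO_POSTGRES = {
    "TINYINT": "SMALLINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "TINYTEXT": "TEXT",
    "TEXT": "TEXT",
    "MEDIUMTEXT": "TEXT",
    "LONGTEXT": "TEXT",
    "BLOB": "BYTEA",
}


def map_type(mysql_type: str) -> str:
    """Translate one MySQL column type into the PostgreSQL type from the table."""
    match = re.fullmatch(r"\s*(\w+)\s*(\(.*\))?\s*", mysql_type)
    if match is None:
        raise ValueError(f"unrecognised type: {mysql_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base == "CHAR" and args == "(1)":
        return "CHAR(5)"  # per the table: room for 'true' / 'false'
    if base in ("CHAR", "VARCHAR"):
        return base + args  # length argument is carried over unchanged
    if base not in MYSQL_TO_POSTGRES:
        raise ValueError(f"no mapping listed for {base}")
    return MYSQL_TO_POSTGRES[base]
```

For example, `map_type("BLOB")` gives `"BYTEA"` and `map_type("VARCHAR(25)")` is returned unchanged.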
Database vs. Schema

  /  In MySQL these are the same:
                    CREATE DATABASE foo
                    CREATE SCHEMA foo
  /  In PostgreSQL these are very different:
                    CREATE DATABASE foo
                    CREATE SCHEMA foo
  /  Tungsten uses filters to map MySQL databases to
     PostgreSQL schemas
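
As an illustration of that mapping, here is a small Python sketch of a statement rewrite. It is an assumption about how such a filter could behave, not the actual Tungsten filter; `database_to_schema` is a hypothetical name, and real Tungsten filters are written in Java or JavaScript.

```python
import re

# Matches only a statement that *starts* with CREATE DATABASE, so the
# words appearing inside a string literal elsewhere are left alone.
CREATE_DB = re.compile(r"^\s*CREATE\s+DATABASE\b", re.IGNORECASE)


def database_to_schema(statement: str) -> str:
    """Rewrite MySQL CREATE DATABASE as CREATE SCHEMA; pass anything else through."""
    return CREATE_DB.sub("CREATE SCHEMA", statement, count=1)
```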




/  MySQL: Trivial to use `USE`
  /  MySQL: Going without `USE` generates different
     events
                        MySQL Implicit            MySQL Explicit
                        CREATE SCHEMA s;          CREATE SCHEMA s;
                        USE s;
                    !   CREATE TABLE t (i int);   CREATE TABLE s.t (i int);
                    !   INSERT INTO t VALUES (1); INSERT INTO s.t VALUES (1);


  /  PG: Extract the default schema from the event
  /  PG: Set it before applying
                    MySQL            PostgreSQL
                    USE s;       >   SET search_path TO s, "$user";
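
The two PG steps above can be sketched in Python as follows. This is a sketch under assumptions: the function names are hypothetical, the default schema is assumed to arrive as a plain string alongside the statement, and the schema name is double-quoted (unlike the slide) so that mixed-case MySQL names are not folded to lower case by PostgreSQL.

```python
def search_path_for(default_schema: str) -> str:
    """Build the PostgreSQL equivalent of MySQL's USE <schema>."""
    # Double any embedded quote so the identifier stays well-formed.
    quoted = '"' + default_schema.replace('"', '""') + '"'
    return f'SET search_path TO {quoted}, "$user";'


def statements_to_apply(default_schema, statement):
    """Return the statements to run on PostgreSQL for one replicated event."""
    if default_schema:
        # The event was executed under USE <schema>: set it before applying.
        return [search_path_for(default_schema), statement]
    # Fully qualified statement: nothing to set.
    return [statement]
```

With the implicit case from the table, `statements_to_apply("s", "CREATE TABLE t (i int);")` yields the `SET search_path` statement followed by the original `CREATE TABLE`.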
/  Differences between DDL and DML statement SQL
     dialects
  /  Row Replication resolves issues arising from
     differences in DML, but still leaves DDL to handle
  /  Tungsten Replicator Filters come to the rescue!
           •  Simple to develop Java or JavaScript extensions
           •  Event structure IN -> Filter -> Event structure OUT

          MySQL                          PostgreSQL
          CREATE TABLE complex (id       CREATE TABLE complex (id
          INTEGER AUTO_INCREMENT         SERIAL PRIMARY KEY, i INT);
          PRIMARY KEY, i INT);

          CREATE TABLE dt (i TINYINT);    CREATE TABLE dt (i SMALLINT);
          …
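
To make the "event IN -> Filter -> event OUT" idea concrete, here is a minimal sketch of a DDL rewrite covering only the two examples above. It is written in Python for illustration (Tungsten filters themselves are Java or JavaScript), `DDL_RULES` and `filter_ddl` are hypothetical names, and plain regular expressions like these would also rewrite matching text inside string literals, so treat it as a starting point rather than a complete translator.

```python
import re

# (pattern, replacement) pairs for the two rewrites shown on the slide.
DDL_RULES = [
    # INTEGER AUTO_INCREMENT (or INT AUTO_INCREMENT) -> SERIAL
    (re.compile(r"\bINT(?:EGER)?\s+AUTO_INCREMENT\b", re.IGNORECASE), "SERIAL"),
    # PostgreSQL has no TINYINT; SMALLINT is the closest type.
    (re.compile(r"\bTINYINT\b", re.IGNORECASE), "SMALLINT"),
]


def filter_ddl(statement: str) -> str:
    """Apply every rewrite rule in order and return the PostgreSQL statement."""
    for pattern, replacement in DDL_RULES:
        statement = pattern.sub(replacement, statement)
    return statement
```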


/  Statement replication: MySQL syntax is “permissive”
      /  Embedded binary / alternate charsets
      /  Different charsets for different clients
  /  Row replication: database/table/column charsets
     may differ
  /  Answer: Stick with one character set throughout; use
     row replication to move binary data
          MySQL                           PostgreSQL
          INSERT INTO embedded_blob       ARGH!!! (SQL statement fails)
          (key, data) VALUES (1, '?0^Es
          0^0'')

          create table xlate(id int, d1   ARGH!!! (no way to translate to
          varchar(25) character set       common charset)
          latin1, d2 varchar(25)
          character set utf8);
MySQL Versions

  /  Problem: Data stored on hard-to-replicate MySQL
     versions or configurations
           •  Row replication not enabled (5.1)
           •  No row replication support (5.0, 4.1)
           •  Tungsten cannot read binlog (4.1)
  /  Answer: MySQL blackhole replication
           •  (Blackhole = no store, just a binlog)
           •  Caveat: Check MySQL docs carefully






                    Master        Slave
                    Replicator    Replicator

                    Transaction   Transaction
                    History Log   History Log
                    Filters       Filters
                    MySQL         Oracle
                    Extractor     Applier




/  TEXT length limitation
           •  VARCHAR(4000) => CLOB
  /  Primary Keys and PrimaryKeyFilter
           •  Goal:

               UPDATE t SET
               c1 = x1, c2 = x2, c3 = x3
               WHERE
               p = p1

           •  NOT:

               UPDATE t SET
               c1 = x1, c2 = x2, c3 = x3
               WHERE
               p = p1 AND c1 = x1 AND c2 = x2 AND c3 = x3 AND …
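
The effect of restricting the WHERE clause to key columns can be sketched like this. It is an illustrative Python function, not the PrimaryKeyFilter implementation; `build_update` and its arguments are hypothetical, and the set of primary key columns is assumed to be known already (for instance from table metadata).

```python
def build_update(table, new_values, old_row, primary_key):
    """Build a parameterised UPDATE whose WHERE clause uses only key columns.

    new_values  -- dict of column -> new value (the SET part)
    old_row     -- dict of column -> value identifying the row before the change
    primary_key -- set of column names that make up the primary key
    """
    key_columns = [c for c in old_row if c in primary_key]
    if not key_columns:
        raise ValueError("row image has no primary key column")
    set_clause = ", ".join(f"{c} = ?" for c in new_values)
    where_clause = " AND ".join(f"{c} = ?" for c in key_columns)
    sql = f"UPDATE {table} SET {set_clause} WHERE {where_clause}"
    # Parameters follow placeholder order: SET values first, then key values.
    params = list(new_values.values()) + [old_row[c] for c in key_columns]
    return sql, params
```

Called with the columns from the slide and `primary_key={"p"}`, it produces `UPDATE t SET c1 = ?, c2 = ?, c3 = ? WHERE p = ?`, leaving the non-key columns out of the WHERE clause.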


  > use mydb
    switched to db mydb
  > db.test.insert(
    {"test": "test value", "anumber" : 5 }
    )
  > db.test.find()
    {
    "_id" : ObjectId("4dce9a4f3d6e186ffccdd4bb"),
    "test" : "test value", "anumber" : 5
    }
  > exit




  /  MySQL binary log doesn’t hold column names

           •  mysql> INSERT INTO foo (id, data) VALUES
              (1, 'hello from MySQL!');

           •  If nothing is done, this becomes:

              > db.foo.find();
              { "_id" : ObjectId("4dc55e45ad90a25b9b57909d"),
                "   " : "1",
                "   " : "hello from MySQL!" }

           •  Solution: fill in column names on the master side. Then:

              > db.foo.find();
              { "_id" : ObjectId("4dc55e45ad90a25b9b57909d"),
                "id" : "1",
                "data" : "hello from MySQL!" }
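
Once column names are available, turning a row change into a document is a simple pairing of names and values. A minimal Python sketch, assuming the names are supplied as a list in column order (the function name `to_document` is hypothetical and this is not the Tungsten applier):

```python
def to_document(column_names, row_values):
    """Pair column names with row values to form a MongoDB-style document."""
    if len(column_names) != len(row_values):
        raise ValueError("column count does not match row width")
    return dict(zip(column_names, row_values))
```

For the INSERT above, `to_document(["id", "data"], ["1", "hello from MySQL!"])` gives a document with the keys `id` and `data`; MongoDB adds `_id` itself on insert.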

MySQL -> MongoDB: The Pipeline




                                                    Logical   Physical
  MySQL Statement Based                                x
  MySQL Row Based                                      x
  MySQL Mixed                                          x
  PostgreSQL WAL Shipping                                        x
  PostgreSQL Streaming Replication                               x
  Filters (data transformation) possible               +         -
  Different data/structure on slave possible           +         -

  /  A transaction is not accessible to the replicator under
     physical replication
  /  Tungsten Replicator manages WAL/Streaming
     Replication

                                                    Logical   Physical
  MySQL Statement Based                                x
  MySQL Row Based                                      x
  MySQL Mixed                                          x
  PostgreSQL WAL Shipping                                        x
  PostgreSQL Streaming Replication                               x
  Tungsten Replicator w/ PostgreSQLSlonyExtractor      x
  Filters (data transformation) possible               +         -
  Different data/structure on slave possible           +         -

  /  With PostgreSQLSlonyExtractor, the transaction goes
     through the Replicator pipeline


 Master           Slave
 Replicator       Replicator

 Transaction      Transaction
 History Log      History Log
 Filters          Filters
 PostgreSQL       MySQLApplier
 SlonyExtractor
/  We’ve reviewed an open source heterogeneous
     replicator (professional services available upon request)
  /  Tungsten Replicator encapsulates the complexity and
     corner cases of the subject
  /  Replicating:
           •  out of MySQL – now;
           •  out of PostgreSQL – prototype;
           •  out of Oracle – designs ready, awaiting sponsorship.




Open Source                      Commercial
   http://tungsten-replicator.org   sales@continuent.com
   #tungsten @ irc.freenode.net

   My Blog:
   http://flyingclusters.blogspot.com



                       Continuent Web Site:
                    http://www.continuent.com



Breaking the Database Type Barrier: Replicating Across Different DBMS
