Recently we have found that MySQL is holding us back from our vision of a sustainable and scalable database. This talk will discuss the process Rant & Rave followed in order to migrate our core database. By discussing some of the challenges we overcame, from mapping datatypes to differences in syntax, we hope other MySQL users will be better equipped to make the move to PostgreSQL.
With 9.4 came logical decoding but what is it and how can it be used? Besides being a precursor to bi-directional replication there are plenty of use cases for this and many don't even require you to implement a plugin. We'll look at trigger-less auditing, partial replication and full statement replication.
This document summarizes strategies for scaling a Magento installation. It discusses code management techniques like using good IDE tools and avoiding modifying core files. It outlines hardware profiles including networks, databases, caches and utility servers. It describes the team structure with 16 committers across 5 departments and 31 vendors. Effective communication practices and documentation are emphasized. Release processes, deployments, community participation and collaboration practices are also summarized.
This document provides an overview of using Apache Pulsar as a streaming data source for Apache Flink SQL queries. It discusses how Pulsar can serve as a unified storage layer for both streaming and batch data. The document then demonstrates setting up a Pulsar source in Flink SQL, writing a continuous query over tweet data to calculate tweet counts in tumbling windows, and sinking the results to another Pulsar topic. It concludes by providing references to documentation on Flink SQL, the Flink-Pulsar connector, and related resources.
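The tumbling-window count at the heart of that demo can be sketched in plain Python (the event tuples and 10-second window below are illustrative; a real Flink SQL job expresses this declaratively with a GROUP BY over a TUMBLE window):

```python
from collections import Counter

def tumbling_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs; each event
    belongs to exactly one window, identified by its start timestamp.
    """
    counts = Counter()
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(0, "flink"), (3, "flink"), (7, "pulsar"), (12, "flink")]
print(tumbling_counts(events, 10))
# {(0, 'flink'): 2, (0, 'pulsar'): 1, (10, 'flink'): 1}
```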
Building a Modern Website for Scale (QCon NY 2013) - Sid Anand
LinkedIn uses several technologies to scale its services and infrastructure to support over 200 million members. It uses a dynamic discovery and client-side load balancing approach for its web services to improve fault tolerance. The presentation tier is composed of various front-end frameworks while business logic is encapsulated in services. LinkedIn's databases Espresso and Oracle are scaled using techniques like data replication, read replicas and change data capture via Databus. Databus provides a consistent, real-time stream of database changes to power services like search, recommendations and standardization. Messaging is handled using Apache Kafka which provides pub-sub streaming capabilities.
Why we love pgpool-II and why we hate it! - PGConf APAC
Pgpool is middleware that works between PostgreSQL clients and servers to provide connection pooling, replication, and load balancing. The presenter's company deployed pgpool in various architectures including master-slave replication and load balancing configurations. They experienced some issues with pgpool like connection errors when using application pooling, lack of guaranteed connection reuse, and bugs. Tips are provided like ensuring synchronized server times and restricting health check users. Pgpool may not be best when automatic node rejoining is needed or during network instability.
Get Your Insecure PostgreSQL Passwords to SCRAM - Jonathan Katz
Passwords: they just seem to work. You connect to your PostgreSQL database and you are prompted for your password. You type in the correct character combination, and presto! you're in, safe and sound.
But what if I told you that all was not as it seemed? What if I told you there was a better, safer way to use passwords with PostgreSQL? What if I told you it was imperative that you upgraded, too?
PostgreSQL 10 introduced SCRAM (Salted Challenge Response Authentication Mechanism), defined in RFC 5802, as a way to securely authenticate passwords. Using a series of cryptographic methods, the SCRAM algorithm lets a client and server validate a password without ever sending the password, whether plaintext or a hashed form of it, to each other.
In this talk, we will look at:
* A history of the evolution of password storage and authentication in PostgreSQL
* How SCRAM works with a step-by-step deep dive into the algorithm (and convince you why you need to upgrade!)
* SCRAM channel binding, which helps prevent MITM attacks during authentication
* How to safely set and modify your passwords, as well as how to upgrade to SCRAM-SHA-256 (which we will do live!)
all of which will be explained by some adorable elephants and hippos!
At the end of this talk, you will understand how SCRAM works, how to ensure your PostgreSQL driver supports it, how to upgrade your passwords to SCRAM-SHA-256, and why you want to tell other PostgreSQL password mechanisms to SCRAM!
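As a rough illustration of why the password never needs to cross the wire, here is a simplified Python sketch of the SCRAM-SHA-256 key derivation and proof check from RFC 5802 (the auth_message, salt, and iteration count are placeholders; real SCRAM also exchanges base64-encoded nonces and channel binding data):

```python
import hashlib
import hmac
import os

def scram_keys(password: bytes, salt: bytes, iterations: int):
    """Derive SCRAM key material: SaltedPassword -> ClientKey -> StoredKey."""
    salted = hashlib.pbkdf2_hmac("sha256", password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha256).digest()
    stored_key = hashlib.sha256(client_key).digest()
    return client_key, stored_key

def client_proof(client_key: bytes, stored_key: bytes, auth_message: bytes) -> bytes:
    """Client proves knowledge of the password: ClientProof = ClientKey XOR ClientSignature."""
    signature = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(client_key, signature))

def server_verify(stored_key: bytes, auth_message: bytes, proof: bytes) -> bool:
    """Server recovers ClientKey from the proof and checks its hash against StoredKey."""
    signature = hmac.new(stored_key, auth_message, hashlib.sha256).digest()
    recovered_key = bytes(a ^ b for a, b in zip(proof, signature))
    return hashlib.sha256(recovered_key).digest() == stored_key

# Demo: only the proof crosses the wire, never the password or a hash of it.
salt, iterations = os.urandom(16), 4096
auth_message = b"n=user,r=nonce..."  # concatenation of the exchanged handshake messages
ck, sk = scram_keys(b"secret", salt, iterations)
assert server_verify(sk, auth_message, client_proof(ck, sk, auth_message))
```

Note that the server only ever stores StoredKey, from which neither the password nor ClientKey can be recovered.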
7 September 2017 - At ION Conference Durban, South Africa, Andrew Alston on how Liquid Telecom deployed IPv6 and how other organizations can do the same.
OSDC 2014: Devdas Bhagat - Graphite: Graphs for the modern age - NETWAYS
Graphite is a timeseries data charting package, similar to MRTG and Cacti. This talk will cover Graphite starting from the basics to how booking.com scaled it to millions of datapoints per second.
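For context, Graphite's carbon daemon accepts datapoints over a simple plaintext protocol, one "path value timestamp" line per metric (the metric name, host, and port below are illustrative):

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

def send_metric(host, port, line):
    """Ship a formatted line to carbon's plaintext listener (conventionally port 2003)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))

print(graphite_line("app.web.requests", 42, 1400000000), end="")
# prints: app.web.requests 42 1400000000
```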
MinervaDB offers PostgreSQL consulting, support, and remote DBA services. Their PostgreSQL performance audit reviews hardware, OS configuration, vacuum and disk bloat management, parameter tuning, HA/DR solutions, application performance, security, and bulk data loading. Support plans include recommendations, capacity planning, security, upgrades, and 24/7 support starting at $25,000/year. Remote DBA services fully manage the PostgreSQL infrastructure for monitoring, installation, performance, backups, HA, security, upgrades, and on-call support starting at $1,600/month.
A walkthrough of various application performance tuning tools and a good workflow for where to start, from a presentation at WindyCityRails 2011 in Chicago, IL.
See the video, and more Web and Ruby/Rails Performance info at www.RailsPerformance.com
-John McCaffrey
Architecting a next-generation data platform - hadooparchbook
This document discusses a high-level architecture for analyzing taxi trip data in real-time and batch using Apache Hadoop and streaming technologies. The architecture includes ingesting data from multiple sources using Kafka, processing streaming data using stream processing engines, storing data in data stores like HDFS, and enabling real-time and batch querying and analytics. Key considerations discussed are choosing data transport and stream processing technologies, scaling and reliability, and processing both streaming and batch data.
Cross the Streams! Creating Streaming Data Pipelines with Apache Flink + Apac... - StreamNative
Despite what the Ghostbusters said, we’re going to go ahead and cross (or, join) the streams. This session covers getting started with streaming data pipelines, maximizing Pulsar’s messaging system alongside one of the most flexible streaming frameworks available, Apache Flink. Specifically, we’ll demonstrate the use of Flink SQL, which provides various abstractions and allows your pipeline to be language-agnostic. So, if you want to leverage the power of a high-speed, highly customizable stream processing engine without the usual overhead and learning curves of the technologies involved (and their interconnected relationships), then this talk is for you. Watch the step-by-step demo to build a unified batch and streaming pipeline from scratch with Pulsar, via the Flink SQL client. This means you don’t need to be familiar with Flink, (or even a specific programming language). The examples provided are built for highly complex systems, but the talk itself will be accessible to any experience level.
PostgreSQL - scaling in fashion, Valentine Gogichashvili (Zalando SE) - Ontico
This document discusses how PostgreSQL helped Zalando, one of Europe's largest online fashion retailers, scale their operations. Some key points:
- Zalando uses PostgreSQL to store data for over 13.7 million customers and 150,000+ products, and to handle 640+ million visits annually.
- They access data through stored procedures to maintain consistency and independence from schema changes. Stored procedures also allow easy data modeling and schema changes without downtimes.
- Zalando shards their database without limits through their Java stored procedure wrapper which directs calls to the appropriate shard transparently.
- They closely monitor PostgreSQL using custom tools like PGObserver and have a dedicated team to ensure high performance and availability.
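The transparent shard routing described above might be sketched as follows (a simplified Python stand-in for Zalando's Java stored-procedure wrapper; the hashing scheme and DSN strings are illustrative):

```python
import hashlib

class ShardRouter:
    """Route a sharding key to one of N database connections."""

    def __init__(self, connections):
        self.connections = connections  # e.g. one DSN or connection pool per shard

    def shard_for(self, key):
        # A stable hash so the same key always lands on the same shard;
        # Python's built-in hash() is randomized per process, so use md5.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.connections[int(digest, 16) % len(self.connections)]

router = ShardRouter(["shard0-dsn", "shard1-dsn", "shard2-dsn"])
print(router.shard_for("customer-42"))  # always the same shard for this key
```

Because callers only see the router, shards can be added behind it without application changes, which is the kind of transparency the wrapper provides.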
Analytical DBMS to Apache Spark Auto Migration Framework with Edward Zhang an... - Databricks
The document discusses an auto migration framework developed by the Data Service & Solution team at eBay to migrate 90% of their batch workload from ADBMS to Apache Spark. The framework includes components like a migration planner, metadata store, SQL converter, job optimizer, and pipeline generator. It aims to migrate over 5,000 tables and 40PB of relational data over 1 year. Key challenges include collecting comprehensive metadata and validating data quality across platforms after migration. The team has also contributed over 50 issues to the Spark community to improve features around SQL parsing, filter pushdown, and data type handling during migration.
Moving advanced analytics to your SQL Server databases - Enrico van de Laar
This document discusses moving advanced analytics capabilities into SQL Server databases. It describes how traditionally analytics involved extracting data from databases, performing modeling and scoring elsewhere, and sending results back. However, this can be slow for large datasets and require data movement. The document outlines how SQL Server 2016 and 2017 enable "in-database analytics" by bringing models to the data instead. It provides an overview of different methods for building and exploiting in-database models like sp_execute_external_script, sp_rxPredict, and PREDICT. It also demonstrates how to connect a SQL Server database to predictive models hosted on Azure Machine Learning for rapid, in-database scoring.
This document appears to be slides from a presentation on concurrency in Ruby applications. The slides discuss different concurrency models including blocking threads, callbacks, reactors, and fibers. They explore when concurrency is useful based on factors like context switching costs. Linear and mixed data dependencies are presented as examples to illustrate different concurrency interfaces and implementations using threads or asynchronous callbacks.
Continuous Deployment of Architectural Change - Matt Graham
Continuous deployment has proven to be a successful and even addicting part of Etsy's engineering culture. See where it's applicable, some of the tools that make it easy, and the kind of architectural change that it makes possible.
MariaDB Server Compatibility with MySQL - Colin Charles
At the MariaDB Server Developer's meeting in Amsterdam, Oct 8 2016. This was the deck to talk about what MariaDB Server 10.1/10.2 might be missing from MySQL versions up to 5.7. The focus is on compatibility of MariaDB Server with MySQL.
The Power of RxJS in NativeScript + Angular - Tracy Lee
Learn the basics of use and power of RxJS in NativeScript & Angular in this presentation given at NativeScript Developer Days in New York City September 2017
Roman Novikov, "Best Practices for MySQL Performance & Troubleshooting with th..." - Fwdays
Troubleshooting performance issues can be a bit tricky, especially when you’re given a broad statement that the database is slow.
Learn to direct your attention to the correct moving pieces and fix what needs your attention.
Learn how all this is done at Percona, what we monitor and track, and the tools we use.
Introduction to Data Engineer and Data Pipeline at Credit OK - Kriangkrai Chaonithi
The document discusses the role of data engineers and data pipelines. It begins with an introduction to big data and why data volumes are increasing. It then covers what data engineers do, including building data architectures, working with cloud infrastructure, and programming for data ingestion, transformation, and loading. The document also explains data pipelines, describing extract, transform, load (ETL) processes and batch versus streaming data. It provides an example of Credit OK's data pipeline architecture on Google Cloud Platform that extracts raw data from various sources, cleanses and loads it into BigQuery, then distributes processed data to various applications. It emphasizes the importance of data engineers in processing and managing large, complex data sets.
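A minimal ETL pipeline of the kind described can be sketched as follows (the record field names are hypothetical, and a plain list stands in for a destination such as BigQuery):

```python
import json

def extract(raw_lines):
    """Extract: parse raw JSON records from source systems."""
    return [json.loads(line) for line in raw_lines]

def transform(records):
    """Transform: cleanse and normalise records before loading."""
    return [
        {"customer_id": r["id"], "amount": round(float(r["amount"]), 2)}
        for r in records
        if r.get("amount") is not None  # drop records with missing amounts
    ]

def load(rows, sink):
    """Load: append the cleansed rows to the destination."""
    sink.extend(rows)
    return len(rows)

sink = []
raw = ['{"id": 1, "amount": "2.50"}', '{"id": 2, "amount": null}']
load(transform(extract(raw)), sink)
print(sink)
# [{'customer_id': 1, 'amount': 2.5}]
```

A streaming variant would run the same transform continuously over incoming events instead of over a finished batch.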
The document discusses REST and gRPC APIs. It describes REST as a design pattern that uses HTTP and represents resources with JSON or XML. While REST is widely used, it has disadvantages like bloated data payloads and lack of a formal contract. gRPC is introduced as an alternative that is high performance, uses protocol buffers for compact payloads, and has generated client/server code in many languages. The document also describes how grpc-gateway can be used to expose existing gRPC services through a RESTful JSON API via HTTP to support existing clients.
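The payload-size argument can be made concrete with a toy comparison (this uses a plain struct layout rather than actual protocol buffers, purely to illustrate why schema-driven binary encodings are more compact than JSON):

```python
import json
import struct

# The same record as a JSON document and as a fixed binary layout.
record = {"user_id": 1234, "score": 87, "active": True}

# JSON repeats every field name in every payload.
json_payload = json.dumps(record).encode("utf-8")

# With a shared schema, only the values need to be sent:
# two unsigned 32-bit ints and one byte, little-endian.
binary_payload = struct.pack(
    "<IIB", record["user_id"], record["score"], int(record["active"])
)

print(len(json_payload), len(binary_payload))  # the binary form is far smaller
```

Protocol buffers add varint encoding and optional fields on top of this idea, but the saving comes from the same place: the contract lives in the schema, not in the bytes on the wire.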
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021 - StreamNative
You may be familiar with the Presto plugin used to run fast interactive queries over Pulsar using ANSI SQL, with results that can be joined with other data sources. This plugin will soon get a rename to align with the rename of the PrestoSQL project to Trino. What is the purpose of this rename and what does it mean for those using the Presto plugin? We cover the history of the community shift from PrestoDB to PrestoSQL, as well as the future plans for the Pulsar community to donate this plugin to the Trino project. One of the connector maintainers will then demo the connector and show what is possible when using Trino and Pulsar!
From Warehouses to Lakes: The Value of Streams - Mike Fowler
You have a beautifully modelled warehouse and a lake with all your business data but you’re still looking at the past and making decisions from what happened yesterday. In this talk we’ll look at how you can really get value from your data and make decisions as events happen and even before they do.
From Warehouses to Lakes: The Value of Streams - Mike Fowler
Every business has a wealth of data but getting value from data is hard. We've tried Data Warehouses and Data Lakes, and while both give us insights we are after, they present their own challenges. Perhaps most challenging of all is making decisions based on yesterday's data. In this talk we'll look at how you can start using your data to make decisions as events happen in your business and how we can even make predictions too. Best of all, we can populate our Data Lakes and Data Warehouses at the same time keeping all the historic analytics in place.
Similar to Migrating Rant & Rave to PostgreSQL
Getting Started with Machine Learning on AWS - Mike Fowler
Machine Learning (ML) is an exciting field that Cloud Computing has helped to accelerate. AWS has played a big part in this with its continually expanding range of services, from the simply named Machine Learning through to SageMaker. But how do you get started? Thankfully you don't need to become an expert in linear algebra or statistics; all you need to begin is a good idea of the life-cycle of a ML project and a passing familiarity with these AWS services. In this talk we'll outline a typical ML project and review services such as SageMaker and Rekognition so that you can begin to make use of them in your own projects.
This talk looks at converting an existing GCP serverless application into one built using Firebase. Firebase helps to simplify deployment, particularly around simple web hosting. The talk also looks at how easy it is to use GCP services integrated with Firebase, such as authentication and Cloud Firestore.
Reducing Pager Fatigue Using a Serverless ML Bot - Mike Fowler
Being woken up at 3 am by the pager is never fun, but seeing an incident resolve before you've even left the bed is maddening. Sleepily, the next day, you tune the alert for a better night's sleep, yet more untuned alerts sing to you in your sleep. After a few rounds of alert-tuning whack-a-mole you wonder: could I predict if an incident will resolve itself?
This is the story of how a weary engineer used a Cloud ML model with Cloud Functions to reduce pager noise. Recounting some of the challenges faced, we’ll explore training a model with a limited data set & continual training in a serverless environment. We’ll also explore the implications of using a bot as a first responder to a pager.
Have you got data in AWS but don’t know how to get started with Machine Learning? My talk will help you make sense of AWS’ offerings and show you how to use them without having to become a mathematician first. See the full talk on YouTube: https://youtu.be/3phjk1CxhXM
Debezium is a Kafka Connect plugin that performs Change Data Capture from your database into Kafka. This talk demonstrates how this can be leveraged to move your data from one database platform such as MySQL to PostgreSQL. A working example is available on GitHub (github.com/gh-mlfowler/debezium-demo).
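A sketch of the consuming side of such a pipeline might look like this (a heavily simplified Debezium change-event envelope; real events also carry schema and source metadata, and a production sink would use parameterised queries rather than string formatting):

```python
def debezium_to_sql(event, table):
    """Translate a simplified Debezium change event into a PostgreSQL statement.

    Debezium events carry an `op` field ('c' create, 'r' snapshot read,
    'u' update, 'd' delete) plus `before`/`after` row images. This assumes
    an `id` primary-key column for brevity.
    """
    payload = event["payload"]
    op, before, after = payload["op"], payload.get("before"), payload.get("after")
    if op in ("c", "r"):
        cols = ", ".join(after)
        vals = ", ".join(repr(v) for v in after.values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if op == "u":
        sets = ", ".join(f"{k} = {v!r}" for k, v in after.items())
        return f"UPDATE {table} SET {sets} WHERE id = {after['id']!r}"
    if op == "d":
        return f"DELETE FROM {table} WHERE id = {before['id']!r}"
    raise ValueError(f"unknown op {op!r}")

event = {"payload": {"op": "c", "before": None, "after": {"id": 1, "name": "Mike"}}}
print(debezium_to_sql(event, "users"))
# INSERT INTO users (id, name) VALUES (1, 'Mike')
```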
Leveraging Automation for a Disposable Infrastructure - Mike Fowler
Moving from the Iron Age to the Cloud Age in computing is supposed to save us money, but many migrations seem to cost more in the long run and result in an infrastructure that is as complex to manage as the one we had before. This is often due to the so called “lift & shift” approach many take – it’s a short term win that doesn’t address why you wanted to move to the cloud in the first place.
The Cloud Age affords us the opportunity to not treat our infrastructure as something special, but as something disposable. By applying the practices of Continuous Integration and delivery to our infrastructure and configuration management, we can build truly scalable infrastructures to host our application’s wildest dreams.
In this talk, we will look at the tools and processes that can be adopted to truly make use of the possibilities of the Cloud.
Managing your own PostgreSQL servers is sometimes a burden your business does not want. In this talk we will provide an overview of some of the public cloud offerings available for hosted PostgreSQL and discuss a number of strategies for migrating your databases with a minimum of downtime.
Terraform is an open source tool that helps you control your infrastructure configuration through code. This talk will serve as a primer showing how to build a basic infrastructure in the Google Cloud and how we can re-use our code to construct multiple, identical environments.
These days you can't go far without encountering XML or JSON and in the world of the web these data types are ubiquitous. Since version 8.3 XML has been supported as a data type and JSON support was introduced in 9.2. We'll be looking at what advantages there are in storing your data with these data types and how we can query and manipulate our data once it's stored.
1. Migrating Rant & Rave to PostgreSQL
Mike Fowler, mike@mlfowler.com mike.fowler@rantandrave.com
PGDayUK 2014
2. Overview
● Who and what is Rant & Rave?
● The journey to PostgreSQL
– Why migrate?
– Migration requirements
– Moving the data
– Adjusting the queries
3. About Me
● Technical Lead/ScrumMaster at Rant & Rave
● Been using PostgreSQL for over 10 years
● Contributed some XML support
– XMLEXISTS/xpath_exists()
– xml_is_well_formed()
● Buildfarm member piapiac
– Amazon EC2 based build for JDBC driver
– Has led to a number of bugfix patches for JDBC
http://www.pgbuildfarm.org/cgi-bin/show_status.pl?member=piapiac
4. What do we do?
We are a technology company providing customer engagement solutions with a difference to more than 250 global corporations.
Customer engagement is “the ongoing interactions between customer and the company, offered by the company, chosen by the customer.”
– Paul Greenberg, ZDNet, credited with inventing the term ‘CRM’
5. What makes us different?
1. Everyone else is just listening. We help businesses act before and after every interaction.
2. Everything we do is in real-time for the frontline. We call them ‘Moments of Truth’® – those emotional opportunities to create Ravers from customers and your staff.
3. Rant & Rave is for all your customers, wherever they are, however they touch your business, whenever they touch your business.
6. The Journey
7. The Platform
8. Migrating Rant & Rave to PostgreSQL
9. Some of our customers...
10. Some of our customers' feedback...
11. The Rant & Rave Database
● 8189 tables with approximately 114 million rows
– 242 'system' tables
– Remaining tables are customer specific, though many similarities exist
● 2930 views
– Views are used to customise the data end users wish to see
● e.g. a team leader might be restricted to seeing only the feedback relating to them and their team
● No stored routines, functions, procedures or triggers
12. Why migrate?
● MySQL has oddities that routinely catch us out
– UNIONs are faster than ORs with JOINs
– Transactions need retrying when under load
– GROUP BYs allowing non-aggregated columns
– “” = 0
– Nightly database restores sometimes fail for no reason
● MySQL replication leaves a lot to be desired
– Statement replication and timestamps
– Manual does not inspire confidence
13. Migration Process Requirements
● Minimise downtime
● Ensure database equality
– Schema equality
– Data equality
● Minimise software changes
– Client query compatibility
14. Minimising Downtime
1) Have a complete clone ready
+ Fastest option
- Must manage data changes from when clone was built
- Clients must be built to support both database servers
2) Stop all writers, dump & restore
- Slowest option
+ Easiest option
- Downtime becomes a function of database size and server I/O speed
15. Minimising Dump & Restore
● The smaller the database, the faster the process
– Remove unused tables & data
– Create the schema in PostgreSQL
– Move unchanging tables ahead of time
● Archived data
● Inactive customer tables
● Stream the dump & restore
– Intermediary files are expensive
16. Introducing convert.pl
● Perl script that manipulates the mysqldump format into a format psql can execute
● Can be executed as part of a stream:
mysqldump | convert.pl | psql
● All conversion rules are in one place, allowing for easy refinement as the differences between MySQL and PostgreSQL are identified and resolved
● Aim to make it publicly available
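The streaming approach can be sketched in a few lines. This is an illustrative Python stand-in, not the actual convert.pl (which is Perl and unreleased); the two rules shown are examples drawn from the type-mapping slides that follow:

```python
import re
import sys

# Ordered (pattern, replacement) rules kept in one place, so refinements
# are easy to add as new MySQL/PostgreSQL differences are identified.
RULES = [
    (re.compile(r'`([^`]*)`'), r'"\1"'),                      # backtick-quoted identifiers
    (re.compile(r'\bint\(\d+\)', re.IGNORECASE), 'INTEGER'),  # drop display widths
]

def convert(line):
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line

if __name__ == '__main__':
    # Usable in a stream: mysqldump ... | ./convert.py | psql ...
    for line in sys.stdin:
        sys.stdout.write(convert(line))
```

Keeping the rules as a single ordered list mirrors the "all conversion rules in one place" design goal.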
17. The mysqldump Format
● SQL script
● Lines beginning -- are comments
● Lines beginning /*! and ending */ are MySQL version-specific commands (e.g. /*!40101 SET NAMES utf8 */;)
● Every section is started with a comment header:
--
-- Table structure for table `system_users`
--
18. The mysqldump Format
● Four different operations are found in the dump
– Table schema
-- Table structure for table `table_name`
– Temporary table schema for views
-- Temporary table structure for view `view_name`
– View creation
-- Final view structure for view `view_name`
– Data
-- Dumping data for table `table_name`
19. Dealing with differences
● MySQL uses ` to quote identifiers, PostgreSQL uses "
– `table_name` => "table_name"
● CREATE TABLE & CREATE VIEW syntax is surprisingly similar
● Most data types need only minor adjustment
– Numeric types: SMALLINT, INT, BIGINT, DOUBLE
– Character types: TINYTEXT, VARCHAR, LONGTEXT
– Time types: DATE, DATETIME, TIMESTAMP
● Some data types need careful adjustment
– TINYINT, BLOB
20. Numeric data types
MySQL allows for a variable display size when declaring a numeric column; this is not storage size:
`field_id` int(11)

MySQL Type     PostgreSQL Type
SMALLINT       SMALLINT
INT            INTEGER
INT UNSIGNED   BIGINT
BIGINT         BIGINT
DOUBLE         DOUBLE PRECISION
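The mapping above can be expressed as a lookup that first strips the display width. A hypothetical sketch, not the actual convert.pl rule:

```python
import re

# Type mapping from the table above; INT UNSIGNED is widened to BIGINT
# because PostgreSQL has no unsigned integer types.
NUMERIC_TYPES = {
    'SMALLINT': 'SMALLINT',
    'INT': 'INTEGER',
    'INT UNSIGNED': 'BIGINT',
    'BIGINT': 'BIGINT',
    'DOUBLE': 'DOUBLE PRECISION',
}

def map_numeric(mysql_type):
    # int(11) -> INT: the (11) is display width, not storage size
    base = re.sub(r'\(\d+\)', '', mysql_type).strip().upper()
    return NUMERIC_TYPES.get(base, base)
```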
21. Character data types
As there is no performance difference between PostgreSQL's character data types, we mapped all MySQL types to TEXT.

MySQL Type   PostgreSQL Type
TINYTEXT     TEXT
VARCHAR      TEXT
LONGTEXT     TEXT
22. Time data types
● mysqldump writes times as UTC without timezone information; specify --skip-tz-utc to keep the timezone
● MySQL CREATE TABLE allows one timestamp column to have an update trigger:
updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP

MySQL Type   PostgreSQL Type
DATE         DATE
DATETIME     TIMESTAMP WITH TIME ZONE
TIMESTAMP    TIMESTAMP WITH TIME ZONE
23. TINYINT data type
● MySQL does not have a BOOLEAN type but uses a special case of TINYINT(1)
● TINYINT(1) defaults can be adjusted for better readability

MySQL Type               PostgreSQL Type
TINYINT(1)               BOOLEAN
TINYINT(1) DEFAULT '0'   BOOLEAN DEFAULT FALSE
TINYINT(1) DEFAULT '1'   BOOLEAN DEFAULT TRUE
TINYINT(4)               SMALLINT
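Applied most-specific-rule-first, the table above can be sketched as follows (an illustrative stand-in, not the actual conversion code):

```python
import re

def map_tinyint(decl):
    """Rewrite TINYINT declarations per the table above; the DEFAULT
    variants must be handled before the bare TINYINT(1) rule."""
    decl = re.sub(r"TINYINT\(1\) DEFAULT '0'", 'BOOLEAN DEFAULT FALSE', decl, flags=re.IGNORECASE)
    decl = re.sub(r"TINYINT\(1\) DEFAULT '1'", 'BOOLEAN DEFAULT TRUE', decl, flags=re.IGNORECASE)
    decl = re.sub(r'TINYINT\(1\)', 'BOOLEAN', decl, flags=re.IGNORECASE)
    decl = re.sub(r'TINYINT\(\d+\)', 'SMALLINT', decl, flags=re.IGNORECASE)
    return decl
```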
24. BLOB data type
● MySQL's various BLOB types can all be mapped to BYTEA
● The default binary output is the source of many invalid multi-byte escape sequences; specify --hex-blob to output the blobs as hexadecimal

MySQL Type   PostgreSQL Type
TINYBLOB     BYTEA
BLOB         BYTEA
MEDIUMBLOB   BYTEA
LONGBLOB     BYTEA
25. AUTO_INCREMENT
MySQL uses the AUTO_INCREMENT keyword to allow a column to take the next value from a sequence. This keyword is independent of the numeric data type used. PostgreSQL embeds this behaviour in the SERIAL data type.

MySQL Type                      PostgreSQL Type
TINYINT … AUTO_INCREMENT        SERIAL
SMALLINT … AUTO_INCREMENT       SERIAL
INT … AUTO_INCREMENT            SERIAL
INT UNSIGNED … AUTO_INCREMENT   BIGSERIAL
BIGINT … AUTO_INCREMENT         BIGSERIAL
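The SERIAL mapping can be sketched like this; an illustrative rule that assumes one type keyword per declaration, not the actual convert.pl logic:

```python
import re

def map_auto_increment(decl):
    """Fold AUTO_INCREMENT into SERIAL/BIGSERIAL per the table above."""
    if not re.search(r'AUTO_INCREMENT', decl, re.IGNORECASE):
        return decl
    # BIGINT and INT UNSIGNED both need the 64-bit sequence type
    serial = 'BIGSERIAL' if re.search(r'BIGINT|UNSIGNED', decl, re.IGNORECASE) else 'SERIAL'
    # Replace the numeric type (and any display width / UNSIGNED marker) ...
    decl = re.sub(r'\b(TINYINT|SMALLINT|INT|BIGINT)\b(\(\d+\))?(\s+UNSIGNED)?',
                  serial, decl, count=1, flags=re.IGNORECASE)
    # ... then drop the AUTO_INCREMENT keyword itself
    return re.sub(r'\s*AUTO_INCREMENT', '', decl, flags=re.IGNORECASE)
```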
26. Column Annotations
MySQL allows you to annotate columns in CREATE TABLE:
CREATE TABLE t (col INT COMMENT 'My commented column' …);
PostgreSQL does not support this, however you can achieve the same after creating the table:
CREATE TABLE t (col INTEGER …);
COMMENT ON COLUMN t.col IS 'My commented column';
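One way to automate this split, operating on a single column-definition line; a sketch with an assumed input shape, not the actual conversion code:

```python
import re

def comment_to_statement(table, column_def):
    """Strip COMMENT 'text' from one column definition and return the
    cleaned definition plus the COMMENT ON COLUMN statement (or None)."""
    m = re.search(r"^\s*[`\"]?(\w+)[`\"]?(.*?)\s+COMMENT\s+'([^']*)'(.*)$", column_def)
    if not m:
        return column_def.strip(), None
    name, middle, comment, rest = m.groups()
    statement = "COMMENT ON COLUMN %s.%s IS '%s';" % (table, name, comment)
    return (name + middle + rest).strip(), statement
```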
27. Stuff to ignore
mysqldump outputs commands that are not relevant to PostgreSQL, as well as some extensions that are not needed:
● CREATE TABLE (…) ENGINE=InnoDB AUTO_INCREMENT=156583 DEFAULT CHARSET=latin1;
● LOCK TABLES;
● UNLOCK TABLES;
28. INSERTs
mysqldump generates a multi-row version of INSERT. PostgreSQL supports this syntax, but COPY is better suited to large inserts:
INSERT INTO `field_mappings` VALUES (1,892,'YMC','A9'),(2,892,'WIG','A81')
becomes
COPY field_mappings FROM STDIN WITH NULL AS 'NULL' CSV QUOTE AS '''';
1,892,'YMC','A9'
2,892,'WIG','A81'
\.
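The rewrite can be sketched as follows; an illustrative transformation that naively assumes the sequence `),(` never appears inside the data itself:

```python
import re

def insert_to_copy(insert_sql):
    """Turn a multi-row mysqldump INSERT into COPY ... FROM STDIN input."""
    m = re.match(r"INSERT INTO [`\"]?(\w+)[`\"]? VALUES \((.*)\);?\s*$", insert_sql)
    table, body = m.group(1), m.group(2)
    lines = ["COPY %s FROM STDIN WITH NULL AS 'NULL' CSV QUOTE AS '''';" % table]
    lines.extend(body.split('),('))   # one CSV line per row
    lines.append('\\.')               # COPY end-of-data marker
    return lines
```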
29. Escaped data
● MySQL escapes quotes and backslashes: \', \\
● Escaped carriage returns (\r) and newlines (\n) need to be replaced with their real values
● MySQL escapes hexadecimals with 0x

MySQL Value   PostgreSQL Value
\'            ''
\r            (literal carriage return)
\n            (literal newline)
0xFFFF        E'\\xFFFF'
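The first three rows translate mechanically; a sketch in which full handling of doubled backslashes and hex literals is deliberately omitted:

```python
def fix_escapes(value):
    """Rewrite common MySQL escape sequences for PostgreSQL CSV input."""
    value = value.replace("\\'", "''")   # \' -> SQL-standard doubled quote
    value = value.replace('\\r', '\r')   # escaped CR -> real carriage return
    value = value.replace('\\n', '\n')   # escaped LF -> real newline
    return value
```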
30. Bizarre data
● The zero timestamp '0000-00-00 00:00:00'
– Can be converted to NULL
● > '0000-00-00 00:00:00' becomes IS NOT NULL
● DEFAULT '0000-00-00 00:00:00' NOT NULL is removed
– Or use an alternate odd value (e.g. '1970-01-01 00:00:00')
● Character encoding is painful
– Specifying --default-character-set=utf8 helps
– Dependent on shell environment
– May need to map some sequences (Â => \x{FFFD})
31. VIEWs
Syntax is subject to the same caveats as SELECT queries.
Extracting the VIEW from mysqldump is a little more involved than CREATE TABLE:
/*!50001 CREATE ALGORITHM=UNDEFINED */
/*!50013 DEFINER=`root`@`%` SQL SECURITY DEFINER */
/*!50001 VIEW `view_name` AS select `col1`.`col2` FROM `table_name` WHERE (`col3` = '9636') */;
32. mysqldump
For those taking notes, the accumulated mysqldump command looks like this:
mysqldump --single-transaction --skip-tz-utc --hex-blob --default-character-set=utf8 -uuser -ppassword db_name
33. Dealing with queries
● Identify queries that use functions (e.g. isnull())
– Recreate functions in pl/pgsql
– Rewrite query to produce same results
● MySQL is usually case insensitive
● PostgreSQL >= 8.3 does not support implicit casting
– WHERE clauses are prime source
– JOINs need verifying
● MySQL GROUP BY behaviour is non-standard
34. Case Insensitivity: LIKE
LIKE has a case-insensitive alternative, ILIKE. ILIKE does not perform too well; however, combining lower()/upper() with LIKE is better.

DB      WHERE clause                        Time
MySQL   customername LIKE '%Mike%'          20ms
PSQL    customername ILIKE '%mike%'         33ms
PSQL    lower(customername) LIKE '%mike%'   25ms
35. Case Insensitivity: Equality
Equality can be achieved by using lower() or upper(). Obviously slower than plain equality tests; however, in our set-up PostgreSQL initially outperforms MySQL.

DB      WHERE clause                          1st Time   2nd Time
MySQL   customername = 'Mike Fowler'          31ms       0ms
PSQL    lower(customername) = 'mike fowler'   23ms       19ms
36. Functional Indexes
PostgreSQL allows the creation of indexes based on the computed results of functions. This improves query times at the expense of INSERT time.
CREATE INDEX idxname ON table (lower(customername));

DB                WHERE clause                          Time
MySQL             customername = 'Mike Fowler'          31ms
PSQL (no index)   lower(customername) = 'mike fowler'   23ms
PSQL (index)      lower(customername) = 'mike fowler'   0.15ms
37. GROUP BY
● MySQL allows non-aggregated columns not named in the GROUP BY clause
– When a non-aggregated column has differing values, MySQL will not choose between them in a deterministic way
– Appending non-aggregated columns to the GROUP BY clause could generate extra rows compared to MySQL
● MySQL group_concat(col) function
– Aggregates all rows into a comma-separated list
– PostgreSQL equivalent: string_agg(col, ',')
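The group_concat() rewrite is mechanical enough to automate; an illustrative regex that assumes no nested parentheses in the argument:

```python
import re

def rewrite_group_concat(query):
    """Rewrite MySQL group_concat(col) to PostgreSQL string_agg(col, ',')."""
    return re.sub(r'group_concat\(\s*([^)]+?)\s*\)', r"string_agg(\1, ',')",
                  query, flags=re.IGNORECASE)
```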
38. Implicit Casting
● Look for queries with numeric operands
– WHERE col > 56 becomes WHERE col::INTEGER > 56
– WHERE col != 95 becomes WHERE col::INTEGER != 95
– WHERE col IN (345,347) becomes WHERE col::INTEGER IN (345,347)
● You could cast everything as TEXT
WHERE col1 = col2 becomes WHERE col1::TEXT = col2::TEXT
39. Implicit Casting
● JOINs with a USING clause will need rewriting
SELECT * FROM table1 JOIN table2 USING (id) becomes
SELECT * FROM table1 t1 JOIN table2 t2 ON t1.id::TEXT = t2.id::TEXT
● You could permanently ALTER the column data type
ALTER TABLE table ALTER COLUMN col TYPE INTEGER USING col::INTEGER
– Saves modifying lots of WHERE clauses, but INSERTs and UPDATEs will need checking
40. Introducing MyPGJDBC
● Simple wrapper over the existing PG-JDBC driver
● Idea is to rewrite & log MySQL-style queries so they work in PostgreSQL
– Gives us a safety net for queries we miss
– Identifies the queries so we can repair them
● Aim to make it publicly available
41. Lessons from our experience
● Allow more time than you expect
● Moving the schema and data is the easiest part
● Identifying, verifying and reworking queries is what takes the time
42. Questions?
Mike Fowler
PGDay UK 2014
43. Thank you!
Mike Fowler
PGDay UK 2014