Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, and Vertica
1. Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, and Vertica
GAMIFIED REWARDS
4/11/12 @jpmalek
2. What I’ll cover:
Our reporting/analytics growth stages,
their pitfalls and what we’ve learned:
1. Custom MySQL ETL via shell scripts, visualizations in Tableau
2. ETL via a custom Tungsten applier into Vertica
3. New Tungsten Vertica applier, built by Continuent
4. Sharded transactional system, multiple Tungsten Vertica appliers
3. Stage 1 : Custom MySQL ETL via shell scripts, visualizations in Tableau
1. On a slave, dump an hour's worth of new rows via SELECT INTO OUTFILE
2. Ship the data file to the aggregations host, dump the old hourly snapshot, load the new one
3. Perform aggregation queries against the temporary snapshot and FEDERATED tables
4. Tableau refreshes its extracts after the aggregated rows are inserted
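The four steps above can be sketched end-to-end. This is an illustrative reconstruction, not our production scripts (which were shell): sqlite3 stands in for both the MySQL slave and the aggregations host, the file hand-off is elided, and the table and column names (points_ledger, snapshot, points_hourly) are hypothetical.

```python
import sqlite3

# Stand-in for the MySQL slave. In the real pipeline the extract was a
# SELECT INTO OUTFILE and the load was a LOAD DATA INFILE on another host.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE points_ledger (user_id INT, points INT, hour INT)")
src.executemany("INSERT INTO points_ledger VALUES (?, ?, ?)",
                [(1, 10, 7), (2, 5, 7), (1, 3, 8), (2, 4, 8), (1, 2, 8)])

def extract_hour(conn, hour):
    # Step 1: dump an hour's worth of new rows.
    return conn.execute(
        "SELECT user_id, points, hour FROM points_ledger WHERE hour = ?",
        (hour,)).fetchall()

# Stand-in for the aggregations host.
agg = sqlite3.connect(":memory:")
agg.execute("CREATE TABLE snapshot (user_id INT, points INT, hour INT)")
agg.execute("CREATE TABLE points_hourly (hour INT, total INT)")

def load_and_aggregate(conn, rows, hour):
    # Steps 2-3: dump the old hourly snapshot, load the new one,
    # then run aggregation queries against it.
    conn.execute("DELETE FROM snapshot")
    conn.executemany("INSERT INTO snapshot VALUES (?, ?, ?)", rows)
    total = conn.execute("SELECT SUM(points) FROM snapshot").fetchone()[0]
    conn.execute("INSERT INTO points_hourly VALUES (?, ?)", (hour, total))
    return total

total = load_and_aggregate(agg, extract_hour(src, 8), 8)
print(total)  # 3 + 4 + 2 = 9
```

Step 4 (Tableau refreshing its extracts) then reads points_hourly; nothing here touches the transactional tables during reporting, which was the point of the snapshot.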
4. Detour : RAID for the Win
Big drop in API endpoint latency (writes)
5. Stage 2 : ETL via a custom Tungsten applier into Vertica
6. Stage 2 : Customized Tungsten Replication Setup
Master Replicator: extract the binlog into the Tungsten log
  MySQL -> Extract From Master -> Filter -> Tungsten Log
Slave Replicator: extract from the log, filter out DDL and unwanted tables, apply per-row over JDBC
  Tungsten Log -> Extract From Log -> Filter (DDL & unwanted tables) -> Custom Vertica JDBC Applier -> Vertica
7. Stage 2 : Issues with the Custom Tungsten Filter
1. OLTP transactions on Vertica are very slow! (~10 transactions per second vs. around 1,000 per second for a MySQL slave.) The slave applier could not keep up with the MySQL master.
2. The person who created the applier was no longer with the company.
3. The Tungsten setup, including the custom applier, was difficult to maintain and hard to move to other hosts.
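The throughput gap in issue 1 comes down to per-transaction cost: a columnar store pays a large fixed price per commit, so a per-row applier stalls while a batched one keeps up. A minimal sketch of the batching idea, with Python's sqlite3 standing in for the JDBC target (the class and table names are hypothetical, not from the actual applier):

```python
import sqlite3

class BatchApplier:
    """Buffer row events and commit in batches, amortizing the
    per-transaction cost that made the per-row applier fall behind."""

    def __init__(self, conn, batch_size=1000):
        self.conn, self.batch_size = conn, batch_size
        self.buf, self.commits = [], 0

    def apply(self, row):
        self.buf.append(row)
        if len(self.buf) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.buf:
            return
        self.conn.executemany("INSERT INTO events VALUES (?, ?)", self.buf)
        self.conn.commit()  # one commit per batch, not one per row
        self.commits += 1
        self.buf = []

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INT, payload TEXT)")
applier = BatchApplier(conn, batch_size=1000)
for i in range(2500):
    applier.apply((i, "row-%d" % i))
applier.flush()         # drain the final partial batch
print(applier.commits)  # 3 commits for 2500 rows
```

At ~10 commits/second on Vertica, 2,500 per-row transactions would take minutes; three batched transactions take a fraction of a second. This is the direction Stage 3 formalizes with CSV files and COPY.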
9. Stage 3 : New Tungsten Vertica Applier
10. Stage 3: A Template-Driven Batch Apply Process
Tungsten Replicator Pipeline:
  MySQL -> Extract-Filter-Apply -> Extract-Filter-Apply -> Extract-Filter-Apply
The replicator writes row changes to CSV files, one line per change:
  233, d, 64, …, 1
  233, i, 64, …, 2
  239, i, 76, …, 3
A COPY (template) loads the CSV files into a staging table; a DELETE, then INSERT (template) merges the staged changes into the base tables:
  63, ‘bob’, 23, …
  64, ‘sue’, 76, …
  67, ‘jim’, 1, …
  76, ‘dan’, 25, …
  98, ‘joe’, 66, …
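The merge that the two templates perform can be sketched in a few statements. This is a hand-rolled illustration, not Tungsten's actual template code: sqlite3 stands in for Vertica (where step 1 would be a COPY from the CSV files), staging rows carry (seqno, op, pk, value) with op 'd' for delete and 'i' for insert, and the table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE base    (pk INT PRIMARY KEY, name TEXT);
CREATE TABLE staging (seqno INT, op TEXT, pk INT, name TEXT);
""")
conn.execute("INSERT INTO base VALUES (64, 'old-sue')")

# Step 1: load the replicator's CSV into the staging table (COPY template).
conn.executemany("INSERT INTO staging VALUES (?, ?, ?, ?)",
                 [(233, 'd', 64, None),    # delete old image of pk 64
                  (233, 'i', 64, 'sue'),   # insert new image of pk 64
                  (239, 'i', 76, 'dan')])  # brand-new row

# Step 2: DELETE template - drop every base row touched by the batch.
conn.execute("DELETE FROM base WHERE pk IN (SELECT pk FROM staging)")

# Step 3: INSERT template - apply the last insert image per key.
conn.execute("""
INSERT INTO base
SELECT pk, name FROM staging s
WHERE op = 'i'
  AND seqno = (SELECT MAX(seqno) FROM staging
               WHERE pk = s.pk AND op = 'i')
""")
conn.execute("DELETE FROM staging")  # staging is cleared between batches
print(conn.execute("SELECT pk, name FROM base ORDER BY pk").fetchall())
```

Because deletes and inserts arrive as set operations rather than per-row transactions, the target database does the work it is good at: bulk loads and bulk merges.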
11. Stage 3 : Batch Applier Replication Setup
Master Replicator: extract the binlog into the Tungsten log
  MySQL -> Extract From Master -> Filter -> Tungsten Log
Slave Replicator: use built-in filters (DDL ignored), write data to disk as CSV files, then batch-apply using SQL template commands
  Tungsten Log -> Extract From Log -> Filter -> CSV files on disk -> COPY / INSERT templates -> Vertica
12. Stage 3 : Solving Problems to Get the New Applier to Work
1. Testing – developed a lightweight testing mechanism for heterogeneous replication
2. Batch applier implementation – two tries to get it right, including SQL templates and full datatype support
3. Character sets – ensured consistent UTF-8 handling throughout the replication chain, including CSV files
4. Time zones – ensured the Java VM handled time values correctly
5. Performance – tweaked SQL templates to get a 50x boost over the old applier
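Problem 3 is easy to trip over because a wrong charset rarely throws an error; it silently mangles multi-byte text somewhere along the chain. A small Python illustration of the failure mode (this is not our actual tooling, just a round-trip check of the CSV hop):

```python
import csv
import io

# Rows with deliberately non-ASCII names, as the user data actually had.
rows = [(63, "bob"), (64, "sü"), (67, "джим")]

# Write the CSV through an explicit UTF-8 byte stream, the way the
# replicator writes its disk files.
raw = io.BytesIO()
with io.TextIOWrapper(raw, encoding="utf-8", newline="") as w:
    csv.writer(w).writerows(rows)
    w.flush()
    data = raw.getvalue()

# Decoding with the wrong charset (latin-1 here) "succeeds" silently
# but mangles the names -- the failure mode we had to hunt down.
mangled = data.decode("latin-1")

# Decoding with UTF-8 at every hop round-trips cleanly.
roundtrip = [(int(pk), name)
             for pk, name in csv.reader(io.StringIO(data.decode("utf-8")))]
print(roundtrip == rows)  # True only when every hop agrees on UTF-8
```

The same discipline applies at each stage: MySQL connection charset, the replicator's file encoding, and the COPY into the warehouse all have to declare UTF-8 explicitly.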
13. Detour : Sharding, or Learning How To Sleep In Any Position
15. Solving Problems to Scale Up The Replication Configuration
1. Implement remote batch apply so Tungsten can run off-board from Vertica
2. Convert replication to a direct pipeline with a single service between MySQL and Vertica
3. Create a script to deploy the replicator in a single command
4. Create staging tables on the Vertica server
16. Remaining Challenges to Complete Replication Setup
1. Configure replication for global and local DBShards data
2. Ensure performance is up to snuff (currently at 500-1,000 transactions per second)
3. Introduce intermediate staging servers to reduce the number of replication streams into Vertica
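Challenge 3 is essentially a fan-in problem: 16 shard streams mean 16 concurrent COPY loads into Vertica, so shards get funneled through a few intermediate staging replicators. A toy sketch of a stable shard-to-staging assignment (all names here are hypothetical, and the real topology was still in beta):

```python
# 16 shards funneled through 4 intermediate staging servers, so Vertica
# sees 4 replication streams instead of 16. Names are made up.
SHARDS = ["shard%02d" % i for i in range(16)]
STAGING = ["staging-a", "staging-b", "staging-c", "staging-d"]

def route(shard):
    # Stable assignment: shard index modulo staging-server count, so a
    # shard always lands on the same intermediate replicator.
    return STAGING[SHARDS.index(shard) % len(STAGING)]

fanin = {}
for s in SHARDS:
    fanin.setdefault(route(s), []).append(s)

print({k: len(v) for k, v in sorted(fanin.items())})  # 4 streams of 4 shards
```

Any deterministic mapping works; the property that matters is that each shard's changes always flow through the same staging server, preserving per-shard ordering.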
17. Thank You!
In summary:
1. Tungsten is a great tool when it comes to MySQL ETL automation, so check it out as an alternative to custom in-house scripts or other options.
2. Vertica is a high-performance, scalable BI platform that now pairs well with Tungsten. Full360 offers a cloud-based solution.
3. If you’re just getting started on the BI front, hire a BI developer to focus on this stuff, if you can.
4. I see no reason why this framework couldn’t scale to easily handle whatever our business needs in the future.
Editor's Notes
Thanks for coming! I'll try to make this quick so we can all grab the last part of the “backing up FB” session.
About me: Hi, I’m Jeff Malek. This is my second startup, and before that, 14 years ago, I did everything possible to avoid a real, normal job, wandering like a vagabond. So startup junkie, or real-world flunkie - take your pick. I’m CTO, co-founder at BigDoor.
About BigDoor and why it fits for this talk: BigDoor has been around for almost three years. We help sites create their own branded, gamified rewards programs. Think of it as your own engaging frequent flyer program. That means we give points to users. Lots of them, with all kinds of different flavors, attributes, types, and conversion rates. Between customers like Major League Baseball, Island Def Jam recording artists like Big K.R.I.T., another big national sports organization that I can’t mention but that you all know, and the rest of our customers, we accrue a lot of points for users. We store those and other event-like data in ledger tables, in MySQL. The tables just keep growing, and largely need to, to support business logic and associated API endpoints. For example, transaction histories need to be returned for users. We need to be able to report on this data, merging it with web log data, both for canned reports and on an ad-hoc basis.
AWS: at my last startup all of our systems were on co-located hardware that we managed and maintained; all of BigDoor's systems are in AWS. I’d like to hear from you what you think when you hear ‘high-volume’. TODO: For this talk, 10K inserts/updates per second should suffice, which on our system creates x volume per hour...
What I’m going to cover:
1. Custom MySQL ETL via shell scripts, visualizations in Tableau. Note that this was something I knew wouldn’t scale, longer term.
2. ETL via a custom Tungsten applier into Vertica
3. New Tungsten Vertica applier, built by Continuent
4. Sharded transactional system, multiple Tungsten Vertica appliers
But first, is there anything that folks came hoping to hear specifically, given the topic? If so, maybe I can try to hit those.
Goals for this stage: Gather data for, and present aggregated reports about, the points, levels and badges people were earning through our systems.
What we did: Go over four steps.
Gotchas: Queries against slaves can slow replication. ETL scripts may be fun and interesting for the author, but maybe not so much for maintainers.
Why move from this stage? I knew from the beginning that this wasn’t going to be the longer-term solution; it can’t scale. Didn’t want anyone in the company maintaining the scripted ETL process. Longer term, wanted a place where we could do ad-hoc queries without affecting transactional-layer performance and replication. This legacy system is still running, at end-of-life. Primary is an M2.4xlarge, with software RAID, 4 EBS volumes, 16KB chunk size.
TODO: M2.4xlarge is equivalent of:
TODO: add reporting screenshot
EBS volumes: 100-150 IOPS are standard, but we see spikes much higher than that. After RAID, we're seeing spikes over 1K tps during slow times. RAID makes taking snapshots a bit more complicated; there’s a pretty well-defined process for single volumes. When doing performance benchmarks, beware: high variability (low consistency) when it comes to EBS write performance. We rolled back our first attempt because of this. In general, RAID is the best thing you can do for performance on AWS, since writes in particular suck. At least for now... they’re working on various fronts to improve this.
Goals for this stage: Start using a more mature data warehouse platform. Needs:
- High-volume writes
- Performs OK for ad-hoc queries
- OK to store de-normalized fact tables and aggregations
- Mostly SQL-compliant (vs. MDX, for example)
- Support for scale-out, e.g. clustering
MySQL is a great DBMS for OLTP; not so much for answering BI questions. Remove the dependency on custom ETL with a tool that doesn’t affect replication or otherwise screw up the transactional layer. Replace it with a tool that allows for replication of all data if we like, and an easy means to filter out whatever we don’t want.
What we did: Signed up with Full360 to implement Vertica. Full360 takes care of provisioning the instance, has management scripts for things like backups, and handles the onerous licensing issues that come with Vertica. Vertica takes writes at high volume very well, and is fast to return answers. Not web-fast, but fast enough for background processing. Started testing Continuent’s Tungsten replication product. Our DBA wrote a custom filter to forward data to Vertica. It connects via JDBC to Vertica, writing changes on a per-row basis, using row-based MySQL replication.
Gotchas: Vertica costs go up with data volume. Vertica doesn’t accept single-row writes very quickly (but batch writes are very fast).
Why we had to move from this on to the next stage: We had a few large publishers start using our API, including a large sports organization I’m sure you’re familiar with - MLB. The custom in-house applier that we wrote couldn’t keep up with the volume of writes.
(For those of you who can’t read this:) If so-and-so hit a single, every single user would get a badge, at the same time. I’m sure you can imagine how that raised some eyebrows on our side of things, in contrast to normal web traffic - this was ‘spike’ defined. We all started paying heavy attention to the game schedules. We were able to handle the load well, but it’s an example of how having an API can bring surprises. They're continuing to use us this year.
Goals for this stage: Keep up with the rate of inserts and updates from the transactional system, applying changes directly to Vertica in near-real-time. As I mentioned, our custom applier wasn’t keeping up. So we called Continuent. They’d wanted to build an applier for Vertica for some time, anyway. To my knowledge, this hasn’t been done.
Things that help when you’re working with a third party like this:
- Highly normalized schema
- Keep the data types simple
- Low rate of change to schema. We had this because the API was designed to be very flexible right out of the gate.
Gotchas:
- Time zones - Robert, do you want to touch on that a bit? Source data stored in UTC. Transforms should be done at the presentation layer, ideally.
- Character sets - use UTF-8 across the board.
Why we had to move on from this stage: Another large publisher with a huge audience, also a large sports organization, loomed on the horizon. We needed to scale our systems 20x to handle their traffic. So we sharded our transactional layer.
Six weeks of no sleep. Didn’t port ledger data, just configuration data. Thank you, team!
Goals for this stage:
- Change the Vertica applier so that it can be run from a MySQL slave, doing remote batch apply to Vertica
- Run 16 appliers from a sharded MySQL cluster into Vertica
- Maintain a backlog of data for ad-hoc queries; otherwise cull and archive older data to keep the Vertica data set small
Still in beta test, but looking promising.