Real-Time Loading from
MySQL to Hadoop
Featuring Continuent Tungsten
Robert Hodges, CEO
©Continuent 2014
Introducing Continuent

• The leading provider of clustering and replication for open source DBMS
• Our Product: Continuent Tungsten
  • Clustering - Commercial-grade HA, performance scaling and data management for MySQL
  • Replication - Flexible, high-performance data movement

Quick Continuent Facts

• Largest Tungsten installation processes over 700 million transactions daily on 225 terabytes of data
• Tungsten Replicator was Application of the Year at the 2011 MySQL User Conference
• Wide variety of topologies including MySQL, Oracle, Vertica, and MongoDB are in production now
• MySQL to Hadoop deployments are now in progress with multiple customers

Selected Continuent Customers
[Slide of customer logos]

Five Minute Hadoop Introduction

What Is Hadoop, Exactly?

a. A distributed file system
b. A method of processing massive quantities of data in parallel
c. The Cutting family’s stuffed elephant
d. All of the above

Hadoop Distributed File System
[Diagram: clients (hadoop command, Java client, Hive, Pig) find a file through the NameNode (directory), then read its block(s) from the DataNodes (replicated data)]

Map/Reduce

Input split 1:
Acme,2013,4.75
Spitze,2013,25.00
Acme,2013,55.25
Excelsior,2013,1.00
Spitze,2013,5.00

MAP output 1:
Acme,60.00
Excelsior,1.00
Spitze,30.00

Input split 2:
Spitze,2014,60.00
Spitze,2014,9.50
Acme,2014,1.00
Acme,2014,4.00
Excelsior,2014,1.00
Excelsior,2014,9.00

MAP output 2:
Acme,5.00
Excelsior,10.00
Spitze,69.50

REDUCE output:
Acme,65.00
Excelsior,11.00
Spitze,99.50
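
The numbers on this slide can be reproduced with a small in-memory sketch. It is not Hadoop code: the function names map_split and reduce_partials are made up for illustration, and the map step here also combines values locally, which is what the per-split subtotals on the slide suggest.

```python
from collections import defaultdict

# The two input splits from the slide: (company, year, amount) records.
SPLIT_2013 = [
    ("Acme", 2013, 4.75), ("Spitze", 2013, 25.00), ("Acme", 2013, 55.25),
    ("Excelsior", 2013, 1.00), ("Spitze", 2013, 5.00),
]
SPLIT_2014 = [
    ("Spitze", 2014, 60.00), ("Spitze", 2014, 9.50), ("Acme", 2014, 1.00),
    ("Acme", 2014, 4.00), ("Excelsior", 2014, 1.00), ("Excelsior", 2014, 9.00),
]


def map_split(records):
    """Map step (hypothetical helper): sum amounts per company within one split."""
    partial = defaultdict(float)
    for company, _year, amount in records:
        partial[company] += amount
    return dict(partial)


def reduce_partials(partials):
    """Reduce step (hypothetical helper): merge the per-split subtotals."""
    totals = defaultdict(float)
    for partial in partials:
        for company, amount in partial.items():
            totals[company] += amount
    return dict(totals)


partials = [map_split(SPLIT_2013), map_split(SPLIT_2014)]
totals = reduce_partials(partials)
for company in sorted(totals):
    print(f"{company},{totals[company]:.2f}")
```

In a real cluster each split would be processed on a different node and the reduce input would be shuffled by key; the sketch only shows the arithmetic.
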
Typical MySQL to Hadoop Use Case
[Diagram: Transaction Processing (MySQL) feeding a Hadoop Cluster that is queried through Hive (Analytics)]
Open questions: Initial load? Changes? Materialized views? App changes? App load? Latency?

Options for Loading Data

• Manual loading (CSV files)
• Sqoop
• Tungsten Replicator

Comparing Methods in Detail

                        | Manual via CSV            | Sqoop                        | Tungsten Replicator
Process                 | Manual/Scripted           | Manual/Scripted              | Fully automated
Incremental Loading     | Possible with DDL changes | Requires DDL changes         | Fully supported
Latency                 | Full-load                 | Intermittent                 | Real-time
Extraction Requirements | Full table scan           | Full and partial table scans | Low-impact binlog scan

Replicating MySQL Data to Hadoop using Tungsten Replicator

What is Tungsten Replicator?

A real-time, high-performance, open source database replication engine

GPL V2 license - 100% open source
Download from https://code.google.com/p/tungsten-replicator/
Annual support subscription available from Continuent

“Golden Gate® without the Price Tag”

Tungsten Replicator Overview
[Diagram: the Master Replicator extracts transactions from the DBMS logs into its THL (Transactions + Metadata); the Slave Replicator receives them into its own THL (Transactions + Metadata) and applies them]

Tungsten Replicator 3.0 & Hadoop

• Extract from MySQL or Oracle
• Base Hadoop plus commercial distributions: Cloudera and Hortonworks
• Provision using Sqoop or parallel extraction
• Automatic replication of incremental changes
• Transformation to preferred HDFS formats
• Schema generation for Hive
• Tools for generating materialized views

Basic MySQL to Hadoop Replication
[Diagram: MySQL (binlog_format=row) -> MySQL Binlog -> Tungsten Master Replicator (hadoop) -> Tungsten Slave Replicator (hadoop) -> CSV Files -> Hadoop Cluster -> Access via Hive]

Extract from MySQL binlog

Master-Side Filtering
* pkey - Fill in pkey info
* colnames - Fill in names
* cdc - Add update type and schema/table info
* source - Add source DBMS
* replicate - Subset tables to be replicated

Load raw CSV to HDFS (e.g., via LOAD DATA to Hive)

Hadoop Data Loading - Gory Details
[Diagram: the Replicator (hadoop) receives transactions from the master and writes the data to CSV files; a JavaScript load script (e.g. hadoop.js) loads the files using the hadoop command into staging "tables"; Map/Reduce runs turn the staging tables into base tables and materialized views. Table definitions are generated for both the staging tables and the base tables.]

Demo #1

Replicating sysbench data

Viewing MySQL Data in Hadoop

Generating Staging Table Schema

$ ddlscan -template ddl-mysql-hive-0.10-staging.vm \
    -user tungsten -pass secret \
    -url jdbc:mysql:thin://logos1:3306/db01 -db db01
...
DROP TABLE IF EXISTS db01.stage_xxx_sbtest;

CREATE EXTERNAL TABLE db01.stage_xxx_sbtest
(
  tungsten_opcode STRING ,
  tungsten_seqno INT ,
  tungsten_row_id INT ,
  id INT ,
  k INT ,
  c STRING ,
  pad STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/tungsten/staging/db01/sbtest';

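
To make the staging layout concrete, here is a minimal sketch of reading one staging line into the columns declared above. The helper name parse_staging_line and the sample values are hypothetical, the field delimiter is assumed to be the \001 control character, and escape handling is deliberately left out.

```python
# Column order taken from the staging table definition on the slide.
STAGING_COLUMNS = [
    "tungsten_opcode", "tungsten_seqno", "tungsten_row_id",
    "id", "k", "c", "pad",
]
INT_COLUMNS = ("tungsten_seqno", "tungsten_row_id", "id", "k")


def parse_staging_line(line, delimiter="\x01"):
    """Split one staging line into a dict keyed by column name.

    Hypothetical helper: assumes \\x01-delimited fields and ignores escaping.
    """
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(STAGING_COLUMNS):
        raise ValueError(f"expected {len(STAGING_COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(STAGING_COLUMNS, fields))
    for name in INT_COLUMNS:
        row[name] = int(row[name])
    return row


# Made-up sample line; "I" is assumed here to mark an inserted row.
example_line = "\x01".join(["I", "42", "1", "7", "5012", "abc", "pad-text"]) + "\n"
example_row = parse_staging_line(example_line)
print(example_row)
```

The three tungsten_ columns carry the change metadata (operation, transaction sequence number, row position within the transaction); the remaining columns mirror the source table.
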
Generating Base Table Schema

$ ddlscan -template ddl-mysql-hive-0.10.vm -user tungsten \
    -pass secret -url jdbc:mysql:thin://logos1:3306/db01 -db db01
...
DROP TABLE IF EXISTS db01.sbtest;

CREATE TABLE db01.sbtest
(
  id INT ,
  k INT ,
  c STRING ,
  pad STRING )
;

Creating a Materialized View in Theory

Input: Log #1, Log #2, ... Log #N

MAP: Sort by key(s), transaction order
REDUCE: Emit last row per key if not a delete

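
The map and reduce rules above can be sketched in a few lines of plain Python. This is an illustration only, not the tungsten-reduce script: the function name materialize is made up, rows are plain dicts, and the opcode values "I" (insert) and "D" (delete), with an update recorded as a delete followed by an insert, are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def materialize(staging_rows, key="id"):
    """Return the current rows: the last change per key, unless it is a delete."""
    # MAP: sort by key, then by transaction order (seqno, then row id).
    ordered = sorted(
        staging_rows,
        key=lambda r: (r[key], r["tungsten_seqno"], r["tungsten_row_id"]),
    )
    view = []
    # REDUCE: keep only the last change for each key, dropping deleted keys.
    for _key, changes in groupby(ordered, key=itemgetter(key)):
        last = list(changes)[-1]
        if last["tungsten_opcode"] != "D":
            view.append({c: v for c, v in last.items() if not c.startswith("tungsten_")})
    return view


# Made-up change log: id 1 is inserted then updated, id 2 is inserted then
# deleted, id 3 is inserted.
log = [
    {"tungsten_opcode": "I", "tungsten_seqno": 1, "tungsten_row_id": 0, "id": 1, "k": 10},
    {"tungsten_opcode": "I", "tungsten_seqno": 1, "tungsten_row_id": 1, "id": 2, "k": 15},
    {"tungsten_opcode": "D", "tungsten_seqno": 2, "tungsten_row_id": 0, "id": 1, "k": 10},
    {"tungsten_opcode": "I", "tungsten_seqno": 2, "tungsten_row_id": 1, "id": 1, "k": 20},
    {"tungsten_opcode": "D", "tungsten_seqno": 3, "tungsten_row_id": 0, "id": 2, "k": 15},
    {"tungsten_opcode": "I", "tungsten_seqno": 4, "tungsten_row_id": 0, "id": 3, "k": 30},
]
view = materialize(log)
print(view)
```

Sorting on the row id as well as the sequence number matters when one transaction touches the same key more than once, as the update of id 1 does here.
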
Creating a Materialized View in Hive

$ hive
...
hive> ADD FILE /home/rhodges/github/continuent-tools-hadoop/bin/tungsten-reduce;
hive> FROM (
  SELECT sbx.*
  FROM db01.stage_xxx_sbtest sbx
  DISTRIBUTE BY id
  SORT BY id,tungsten_seqno,tungsten_row_id
) map1
INSERT OVERWRITE TABLE db01.sbtest
  SELECT TRANSFORM(
    tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad)
  USING 'perl tungsten-reduce -k id -c tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad'
  AS id INT,k INT,c STRING,pad STRING;
...

MAP: the inner SELECT with DISTRIBUTE BY / SORT BY
REDUCE: the SELECT TRANSFORM ... USING tungsten-reduce

Comparing MySQL and Hadoop Data

$ export TUNGSTEN_EXT_LIBS=/usr/lib/hive/lib
...
$ /opt/continuent/tungsten/bristlecone/bin/dc \
    -url1 jdbc:mysql:thin://logos1:3306/db01 \
    -user1 tungsten -password1 secret \
    -url2 jdbc:hive2://localhost:10000 \
    -user2 'tungsten' -password2 'secret' -schema db01 \
    -table sbtest -verbose -keys id \
    -driver org.apache.hive.jdbc.HiveDriver
22:33:08,093 INFO DC - Data comparison utility
...
22:33:24,526 INFO Tables compare OK

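
The idea behind a key-based comparison can be sketched as follows. This is not how the dc utility is implemented; it is a minimal illustration that assumes both tables fit in memory as lists of dicts, and the name compare_tables is hypothetical.

```python
def compare_tables(rows1, rows2, key="id"):
    """Compare two row sets by key and report where they differ."""
    by_key1 = {row[key]: row for row in rows1}
    by_key2 = {row[key]: row for row in rows2}
    return {
        "missing_in_second": sorted(by_key1.keys() - by_key2.keys()),
        "missing_in_first": sorted(by_key2.keys() - by_key1.keys()),
        "different": sorted(
            k for k in by_key1.keys() & by_key2.keys() if by_key1[k] != by_key2[k]
        ),
    }


# Made-up rows standing in for the MySQL table and the Hive materialized view.
mysql_rows = [{"id": 1, "k": 20}, {"id": 3, "k": 30}]
hive_rows = [{"id": 1, "k": 20}, {"id": 3, "k": 30}]
result = compare_tables(mysql_rows, hive_rows)
tables_match = not any(result.values())
print("Tables compare OK" if tables_match else result)
```

A comparison like this is only meaningful once the materialized view has been rebuilt from all replicated changes, since rows still sitting in staging would show up as differences.
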
Doing it all at once

$ git clone https://github.com/continuent/continuent-tools-hadoop.git

$ cd continuent-tools-hadoop

$ bin/load-reduce-check \
    -U jdbc:mysql:thin://logos1:3306/db01 \
    -s db01 --verbose

Demo #2

Constructing and Checking a Materialized View

Scaling It Up!

MySQL to Hadoop Fan-In Architecture
[Diagram: three masters, m1, m2 and m3 (master), each with its own Replicator using row-based replication (RBR), feed the slave Replicator services m1, m2 and m3 (slave), which load a single Hadoop Cluster (many nodes)]

Integration with Provisioning
[Diagram: initial provisioning run: MySQL (binlog_format=row) -> Sqoop/ETL -> Hadoop Cluster. Ongoing replication: MySQL Binlog -> Tungsten Master (hadoop) -> Tungsten Slave (hadoop) -> CSV Files -> Hadoop Cluster -> Access via Hive]

On-Demand Provisioning via Parallel Extract
[Diagram: MySQL (binlog_format=row) tables and MySQL Binlog -> Tungsten Master Replicator (hadoop) -> Tungsten Slave Replicator (hadoop) -> CSV Files -> Hadoop Cluster -> Access via Hive]

Extract from MySQL tables

Master-Side Filtering
* pkey - Fill in pkey info
* colnames - Fill in names
* cdc - Add update type and schema/table info
* source - Add source DBMS
* replicate - Subset tables to be replicated
(other filters as needed)

Load raw CSV to HDFS (e.g., via LOAD DATA to Hive)

Tungsten Replicator Roadmap

• Parallel CSV file loading
• Partition loaded data by commit time
• Data formats and tools to support additional Hadoop clients as well as HBase
• Replication out of Hadoop
• Integration with emerging real-time analytics based on HDFS (Impala, Spark/Shark, Stinger,...)

Getting Started with Continuent Tungsten

Where Is Everything?

• Tungsten Replicator 3.0 builds are now available on code.google.com
  http://code.google.com/p/tungsten-replicator/
• Replicator 3.0 documentation is available on the Continuent website
  http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html
• Tungsten Hadoop tools are available on GitHub
  https://github.com/continuent/continuent-tools-hadoop

Contact Continuent for support

Commercial Terms

• Replicator features are open source (GPL V2)
• Investment Elements
  • POC / Development (Walk Away Option)
  • Production Deployment
  • Annual Support Subscription
• Governing Principles
  • Annual Subscription Required
  • More Upfront Investment - Less Annual Subscription

We Do Clustering Too!
[Diagram: GonzoPortal.com (apache/php) reaching a Tungsten cluster in Amazon US West through two Connectors]

Tungsten clusters combine off-the-shelf open source MySQL servers into data services with:

• 24x7 data access
• Scaling of load on replicas
• Simple management commands

...without app changes or data migration

In Conclusion: Tungsten Offers...

• Fully automated, real-time replication from MySQL into Hadoop
• Support for automatic transformation to HDFS data formats and creation of full materialized views
• Positions users to take advantage of evolving real-time features in Hadoop

560 S. Winchester Blvd., Suite 500
San Jose, CA 95128
Tel +1 (866) 998-3642
Fax +1 (408) 668-1009
e-mail: sales@continuent.com

Our Blogs:
http://scale-out-blog.blogspot.com
http://mcslp.wordpress.com
http://www.continuent.com/news/blogs

Continuent Web Page:
http://www.continuent.com

Tungsten Replicator 2.0:
http://code.google.com/p/tungsten-replicator
More Related Content

What's hot

Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsContinuent
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayDataWorks Summit
 
Set Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle ReplicationSet Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle ReplicationContinuent
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataYan Wang
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, Howmcsrivas
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith SharmaNewton Alex
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
New features in Pig 0.11
New features in Pig 0.11New features in Pig 0.11
New features in Pig 0.11Hortonworks
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Guy Harrison
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraDataWorks Summit
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hiverxu
 
Geographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL ClustersGeographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL ClustersContinuent
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestHBaseCon
 
Business-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirBusiness-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirContinuent
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBaseHBaseCon
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Manish Chopra
 

What's hot (20)

Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, AnalyticsReal-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
Real-time Data Loading from Oracle and MySQL to Data Warehouses, Analytics
 
From docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native wayFrom docker to kubernetes: running Apache Hadoop in a cloud native way
From docker to kubernetes: running Apache Hadoop in a cloud native way
 
Set Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle ReplicationSet Up & Operate Open Source Oracle Replication
Set Up & Operate Open Source Oracle Replication
 
HBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and SparkHBaseCon 2015: HBase and Spark
HBaseCon 2015: HBase and Spark
 
Automation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure DataAutomation of Hadoop cluster operations in Arm Treasure Data
Automation of Hadoop cluster operations in Arm Treasure Data
 
Apache Drill - Why, What, How
Apache Drill - Why, What, HowApache Drill - Why, What, How
Apache Drill - Why, What, How
 
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
[Hadoop Meetup] Apache Hadoop 3 community update - Rohith Sharma
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
New features in Pig 0.11
New features in Pig 0.11New features in Pig 0.11
New features in Pig 0.11
 
Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop Hadoop and rdbms with sqoop
Hadoop and rdbms with sqoop
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native Era
 
Tune up Yarn and Hive
Tune up Yarn and HiveTune up Yarn and Hive
Tune up Yarn and Hive
 
Geographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL ClustersGeographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL Clusters
 
Large-scale Web Apps @ Pinterest
Large-scale Web Apps @ PinterestLarge-scale Web Apps @ Pinterest
Large-scale Web Apps @ Pinterest
 
Business-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirBusiness-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud Air
 
Time-Series Apache HBase
Time-Series Apache HBaseTime-Series Apache HBase
Time-Series Apache HBase
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3Difference between hadoop 2 vs hadoop 3
Difference between hadoop 2 vs hadoop 3
 

Similar to Real-Time Loading from MySQL to Hadoop with Continuent Tungsten

Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftContinuent
 
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopActian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopDataWorks Summit
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Continuent
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Continuent
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideDanairat Thanabodithammachari
 
Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseHenk van der Valk
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentContinuent
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drillJulien Le Dem
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseSankar H
 
Hadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High AvailabilityHadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High AvailabilityAlex Dorman
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Edureka!
 
CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Rajit Saha
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanJim Kaskade
 

Similar to Real-Time Loading from MySQL to Hadoop with Continuent Tungsten (20)

Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon Redshift
 
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopActian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...
 
Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...Replication in real-time from Oracle and MySQL into data warehouses and analy...
Replication in real-time from Oracle and MySQL into data warehouses and analy...
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guide
 
Get started with Microsoft SQL Polybase
Get started with Microsoft SQL PolybaseGet started with Microsoft SQL Polybase
Get started with Microsoft SQL Polybase
 
Keynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at ContinuentKeynote: Getting Serious about MySQL and Hadoop at Continuent
Keynote: Getting Serious about MySQL and Hadoop at Continuent
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Sql on everything with drill
Sql on everything with drillSql on everything with drill
Sql on everything with drill
 
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive WarehouseDisaster Recovery and Cloud Migration for your Apache Hive Warehouse
Disaster Recovery and Cloud Migration for your Apache Hive Warehouse
 
Hadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High AvailabilityHadoop World Oct 2009 Production Deep Dive With High Availability
Hadoop World Oct 2009 Production Deep Dive With High Availability
 
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
Apache Sqoop Tutorial | Sqoop: Import & Export Data From MySQL To HDFS | Hado...
 
CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014CISCO - Presentation at Hortonworks Booth - Strata 2014
CISCO - Presentation at Hortonworks Booth - Strata 2014
 
Exploring sql server 2016 bi
Exploring sql server 2016 biExploring sql server 2016 bi
Exploring sql server 2016 bi
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Couchbase Day
Couchbase DayCouchbase Day
Couchbase Day
 
מיכאל
מיכאלמיכאל
מיכאל
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
Vmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps IronfanVmware Serengeti - Based on Infochimps Ironfan
Vmware Serengeti - Based on Infochimps Ironfan
 

More from Continuent

Tungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondTungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondContinuent
 
Continuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterContinuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS AuroraWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS AuroraContinuent
 
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Continuent
 
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Continuent
 
Webinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent FailoverWebinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent FailoverContinuent
 
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Continuent
 
Training Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten DashboardTraining Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten DashboardContinuent
 
Training Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & KafkaTraining Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & KafkaContinuent
 
Training Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data WarehousesTraining Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data WarehousesContinuent
 
Training Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a ClusterTraining Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a ClusterContinuent
 
Training Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMITraining Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMIContinuent
 
Training Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMITraining Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMIContinuent
 
Training Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProTraining Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProContinuent
 
Training Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & TroubleshootingTraining Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & TroubleshootingContinuent
 
Training Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSLTraining Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSLContinuent
 


Real-Time Loading from MySQL to Hadoop with Continuent Tungsten

  • 1. Real-Time Loading from MySQL to Hadoop Featuring Continuent Tungsten Robert Hodges, CEO ©Continuent 2014
  • 3. Introducing Continuent • The leading provider of clustering and replication for open source DBMS • Our Product: Continuent Tungsten • Clustering - Commercial-grade HA, performance scaling and data management for MySQL • Replication - Flexible, high-performance data movement
  • 4. Quick Continuent Facts • Largest Tungsten installation processes over 700 million transactions daily on 225 terabytes of data • Tungsten Replicator was application of the year at the 2011 MySQL User Conference • Wide variety of topologies including MySQL, Oracle, Vertica, and MongoDB are in production now • MySQL to Hadoop deployments are now in progress with multiple customers
  • 7. What Is Hadoop, Exactly? a. A distributed file system b. A method of processing massive quantities of data in parallel c. The Cutting family’s stuffed elephant d. All of the above
  • 8. Hadoop Distributed File System [Diagram: clients (hadoop command, Java Client, Hive, Pig) find a file through the NameNode (directory) and read block(s) from the DataNodes (replicated data)]
  • 10. Typical MySQL to Hadoop Use Case [Diagram: Transaction Processing feeding a Hadoop Cluster used by Hive (Analytics). Open questions: Initial Load? Changes? Materialized views? App changes? App load? Latency?]
  • 11. Options for Loading Data [Diagram: Manual Loading (CSV Files), Sqoop, Tungsten Replicator]
  • 12. Comparing Methods in Detail
      Process: Manual via CSV - Manual/Scripted; Sqoop - Manual/Scripted; Tungsten Replicator - Fully automated
      Incremental Loading: Manual via CSV - Possible with DDL changes; Sqoop - Requires DDL changes; Tungsten Replicator - Fully supported
      Latency: Manual via CSV - Full-load; Sqoop - Intermittent; Tungsten Replicator - Real-time
      Extraction Requirements: Manual via CSV - Full table scan; Sqoop - Full and partial table scans; Tungsten Replicator - Low-impact binlog scan
  • 13. Replicating MySQL Data to Hadoop using Tungsten Replicator
  • 14. What is Tungsten Replicator? A real-time, high-performance, open source database replication engine. GPL V2 license - 100% open source. Download from https://code.google.com/p/tungsten-replicator/ Annual support subscription available from Continuent. “Golden Gate® without the Price Tag”
  • 15. Tungsten Replicator Overview [Diagram: Master Replicator - extract transactions from the DBMS logs into the THL (Transactions + Metadata). Slave Replicator - THL (Transactions + Metadata), apply]
  • 16. Tungsten Replicator 3.0 Hadoop • Extract from MySQL or Oracle • Base Hadoop plus commercial distributions: Cloudera and HortonWorks • Provision using Sqoop or parallel extraction • Automatic replication of incremental changes • Transformation to preferred HDFS formats • Schema generation for Hive • Tools for generating materialized views
  • 17. Basic MySQL to Hadoop Replication [Diagram: MySQL (binlog_format=row) -> MySQL Binlog -> Tungsten Master Replicator (extract from MySQL binlog) -> Tungsten Slave Replicator -> CSV Files -> load raw CSV to HDFS (e.g., via LOAD DATA to Hive) -> Hadoop Cluster, access via Hive; both replicators are labeled hadoop]
      Master-Side Filtering: * pkey - Fill in pkey info * colnames - Fill in names * cdc - Add update type and schema/table info * source - Add source DBMS * replicate - Subset tables to be replicated
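The master-side filters listed on slide 17 can be pictured as a chain of small functions that each annotate or drop a row change before it is written out. The following is only a minimal sketch of that idea, not Tungsten Replicator's actual filter API: the `RowChange` class, the `CATALOG` lookup, the tag names, and the single-letter opcodes are all hypothetical, and the `replicate` step is run first here (unlike the order on the slide) so that unwanted tables are dropped before any catalog lookup.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical table catalog; a real replicator would read this from the DBMS.
CATALOG = {
    ("db01", "sbtest"): {"columns": ["id", "k", "c", "pad"], "pkey": ["id"]},
}


@dataclass
class RowChange:
    """Illustrative stand-in for one replicated row change."""
    schema: str
    table: str
    op: str                       # assumed opcodes: "I" insert, "D" delete
    values: List[object]
    colnames: List[str] = field(default_factory=list)
    pkey: List[str] = field(default_factory=list)
    tags: Dict[str, str] = field(default_factory=dict)


def replicate(allowed):
    """replicate: keep only the subset of tables to be replicated."""
    def apply(ev):
        return ev if (ev.schema, ev.table) in allowed else None
    return apply


def colnames(ev):
    """colnames: fill in column names."""
    ev.colnames = list(CATALOG[(ev.schema, ev.table)]["columns"])
    return ev


def pkey(ev):
    """pkey: fill in primary key info."""
    ev.pkey = list(CATALOG[(ev.schema, ev.table)]["pkey"])
    return ev


def cdc(ev):
    """cdc: add update type and schema/table info."""
    ev.tags["update_type"] = ev.op
    ev.tags["table"] = f"{ev.schema}.{ev.table}"
    return ev


def source(dbms):
    """source: add the source DBMS."""
    def apply(ev):
        ev.tags["source_dbms"] = dbms
        return ev
    return apply


def run_filters(ev, filters):
    """Apply filters in order; a filter returning None drops the event."""
    for f in filters:
        ev = f(ev)
        if ev is None:
            return None
    return ev


CHAIN = [replicate({("db01", "sbtest")}), colnames, pkey, cdc, source("mysql")]

kept = run_filters(RowChange("db01", "sbtest", "I", [1, 10, "c-val", "pad-val"]), CHAIN)
dropped = run_filters(RowChange("db01", "other", "I", [1]), CHAIN)
```

Here `kept` carries the column names, key, and change-data-capture tags for `db01.sbtest`, while `dropped` is `None` because `db01.other` is outside the replicated subset.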
  • 18. Hadoop Data Loading - Gory Details [Diagram: the Replicator (hadoop) receives transactions from the master and writes data to CSV Files; a Javascript load script (e.g. hadoop.js) loads them into Staging “Tables” using the hadoop command; table definitions are generated for the Staging Tables and the Base Tables; Map/Reduce is run to produce Base Tables and Materialized Views]
  • 19. Demo #1: Replicating sysbench data
  • 20. Viewing MySQL Data in Hadoop
  • 21. Generating Staging Table Schema
        $ ddlscan -template ddl-mysql-hive-0.10-staging.vm \
            -user tungsten -pass secret \
            -url jdbc:mysql:thin://logos1:3306/db01 -db db01
        ...
        DROP TABLE IF EXISTS db01.stage_xxx_sbtest;
        CREATE EXTERNAL TABLE db01.stage_xxx_sbtest
        (
          tungsten_opcode STRING ,
          tungsten_seqno INT ,
          tungsten_row_id INT ,
          id INT ,
          k INT ,
          c STRING ,
          pad STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' ESCAPED BY '\\'
        LINES TERMINATED BY '\n'
        STORED AS TEXTFILE LOCATION '/user/tungsten/staging/db01/sbtest';
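The staging table on slide 21 describes text rows whose fields are separated by the `\001` character, with three `tungsten_*` metadata columns ahead of the table's own columns. As a rough illustration of that layout, the sketch below splits one such line in Python. It is an assumption-laden example rather than part of the Tungsten tooling: the `StagingRow` type, the sample values, and the `"I"` opcode are hypothetical, and escape handling is deliberately left out.

```python
from typing import List, NamedTuple

FIELD_SEP = "\x01"  # the \001 delimiter declared in the staging table DDL


class StagingRow(NamedTuple):
    opcode: str         # tungsten_opcode
    seqno: int          # tungsten_seqno
    row_id: int         # tungsten_row_id
    columns: List[str]  # remaining table columns, left as strings


def parse_staging_line(line: str) -> StagingRow:
    """Split one staging line into metadata and column values.

    Simplification: escaped delimiters inside values are not handled.
    """
    fields = line.rstrip("\n").split(FIELD_SEP)
    if len(fields) < 3:
        raise ValueError("expected at least three tungsten_* metadata fields")
    opcode, seqno, row_id, *columns = fields
    return StagingRow(opcode, int(seqno), int(row_id), columns)


# Hypothetical sample line for db01.sbtest (id, k, c, pad).
sample = FIELD_SEP.join(["I", "42", "1", "7", "500", "some-c", "some-pad"]) + "\n"
row = parse_staging_line(sample)
```

Keeping the metadata separate from the column values mirrors how the later reduce step uses `tungsten_seqno` and `tungsten_row_id` only for ordering.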
  • 22. Generating Base Table Schema
        $ ddlscan -template ddl-mysql-hive-0.10.vm -user tungsten \
            -pass secret -url jdbc:mysql:thin://logos1:3306/db01 -db db01
        ...
        DROP TABLE IF EXISTS db01.sbtest;
        CREATE TABLE db01.sbtest
        (
          id INT ,
          k INT ,
          c STRING ,
          pad STRING )
        ;
  • 23. Creating a Materialized View in Theory [Diagram: Log #1, Log #2, ... Log #N -> MAP: sort by key(s), transaction order -> REDUCE: emit last row per key if not a delete]
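The map and reduce steps named on slide 23 can be sketched in a few lines of plain Python. This is an in-memory illustration only, not the `tungsten-reduce` script: it assumes rows are dictionaries carrying the `tungsten_opcode`, `tungsten_seqno`, and `tungsten_row_id` columns from the staging table, that the opcode `"D"` marks a delete, and that an update arrives as a delete followed by an insert.

```python
from itertools import groupby
from operator import itemgetter


def materialize(staged_rows, key="id"):
    """Build the current view of a table from its staged change log."""
    # MAP: sort by key, then by transaction order.
    ordered = sorted(
        staged_rows,
        key=lambda r: (r[key], r["tungsten_seqno"], r["tungsten_row_id"]),
    )
    view = []
    # REDUCE: emit the last row per key if it is not a delete.
    for _, changes in groupby(ordered, key=itemgetter(key)):
        last = list(changes)[-1]
        if last["tungsten_opcode"] != "D":
            view.append({c: v for c, v in last.items() if not c.startswith("tungsten_")})
    return view


def change(opcode, seqno, row_id, id_, k):
    """Helper for building sample staging rows."""
    return {"tungsten_opcode": opcode, "tungsten_seqno": seqno,
            "tungsten_row_id": row_id, "id": id_, "k": k}


# Hypothetical log, deliberately out of order.
log = [
    change("I", 4, 1, 3, 30),   # insert id 3
    change("D", 3, 1, 2, 20),   # delete id 2
    change("I", 1, 1, 1, 10),   # insert id 1
    change("I", 2, 2, 1, 11),   # update id 1, second half (insert new image)
    change("I", 1, 2, 2, 20),   # insert id 2
    change("D", 2, 1, 1, 10),   # update id 1, first half (delete old image)
]

view = materialize(log)
```

With this sample log, key 1 ends on its re-inserted image, key 2 ends on a delete and disappears, and key 3 is carried through unchanged.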
  • 24. Creating a Materialized View in Hive
        $ hive
        ...
        hive> ADD FILE /home/rhodges/github/continuent-tools-hadoop/bin/tungsten-reduce;
        hive> FROM (
                SELECT sbx.*
                FROM db01.stage_xxx_sbtest sbx
                DISTRIBUTE BY id
                SORT BY id,tungsten_seqno,tungsten_row_id
              ) map1
              INSERT OVERWRITE TABLE db01.sbtest
              SELECT TRANSFORM(
                tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad)
              USING 'perl tungsten-reduce -k id -c tungsten_opcode,tungsten_seqno,tungsten_row_id,id,k,c,pad'
              AS id INT,k INT,c STRING,pad STRING;
        ...
      Slide callouts: MAP (the DISTRIBUTE BY / SORT BY subquery) and REDUCE (the TRANSFORM step)
  • 25. Comparing MySQL and Hadoop Data
        $ export TUNGSTEN_EXT_LIBS=/usr/lib/hive/lib
        ...
        $ /opt/continuent/tungsten/bristlecone/bin/dc \
            -url1 jdbc:mysql:thin://logos1:3306/db01 \
            -user1 tungsten -password1 secret \
            -url2 jdbc:hive2://localhost:10000 \
            -user2 'tungsten' -password2 'secret' -schema db01 \
            -table sbtest -verbose -keys id \
            -driver org.apache.hive.jdbc.HiveDriver
        22:33:08,093 INFO DC - Data comparison utility
        ...
        22:33:24,526 INFO Tables compare OK
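The `dc` run on slide 25 compares the MySQL table with its Hive counterpart using `id` as the key. The slides do not describe how `dc` works internally, so the following is only a generic sketch of a key-based comparison over two in-memory row sets; the function name, the report fields, and the sample rows are all hypothetical.

```python
def compare_by_key(source_rows, target_rows, key="id"):
    """Report keys missing on either side and keys whose rows differ."""
    src = {row[key]: row for row in source_rows}
    tgt = {row[key]: row for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "different": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical row sets: the source table, an up-to-date copy, and a stale copy.
mysql_rows = [{"id": 1, "k": 11}, {"id": 3, "k": 30}]
hive_ok = [{"id": 3, "k": 30}, {"id": 1, "k": 11}]
hive_stale = [{"id": 1, "k": 10}, {"id": 2, "k": 20}]

clean = compare_by_key(mysql_rows, hive_ok)
drift = compare_by_key(mysql_rows, hive_stale)
```

An empty report, as in `clean`, corresponds to the "Tables compare OK" outcome; `drift` shows the three kinds of discrepancy a stale materialized view could exhibit.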
  • 26. Doing it all at once
        $ git clone \
            https://github.com/continuent/continuent-tools-hadoop.git
        $ cd continuent-tools-hadoop
        $ bin/load-reduce-check \
            -U jdbc:mysql:thin://logos1:3306/db01 \
            -s db01 --verbose
  • 27. Demo #2: Constructing and Checking a Materialized View
  • 29. MySQL to Hadoop Fan-In Architecture [Diagram: masters m1, m2, and m3 (RBR), each with a master Replicator, fan in to slave Replicator services m1, m2, and m3 that load a Hadoop Cluster (many nodes)]
  • 30. Integration with Provisioning [Diagram: MySQL (binlog_format=row) -> Sqoop/ETL (initial provisioning run) -> Hadoop Cluster; MySQL Binlog -> Tungsten Master (hadoop) -> Tungsten Slave (hadoop) -> CSV Files -> Hadoop Cluster, access via Hive]
  • 31. On-Demand Provisioning via Parallel Extract [Diagram: MySQL (binlog_format=row) and MySQL Binlog -> Tungsten Master Replicator (extract from MySQL tables) -> Tungsten Slave Replicator -> CSV Files -> load raw CSV to HDFS (e.g., via LOAD DATA to Hive) -> Hadoop Cluster, access via Hive; both replicators are labeled hadoop]
      Master-Side Filtering: * pkey - Fill in pkey info * colnames - Fill in names * cdc - Add update type and schema/table info * source - Add source DBMS * replicate - Subset tables to be replicated (other filters as needed)
  • 32. Tungsten Replicator Roadmap • Parallel CSV file loading • Partition loaded data by commit time • Data formats and tools to support additional Hadoop clients as well as HBase • Replication out of Hadoop • Integration with emerging real-time analytics based on HDFS (Impala, Spark/Shark, Stinger,...)
  • 33. Getting Started with Continuent Tungsten
  • 34. Where Is Everything? • Tungsten Replicator 3.0 builds are now available on code.google.com http://code.google.com/p/tungsten-replicator/ • Replicator 3.0 documentation is available on Continuent website http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html • Tungsten Hadoop tools are available on GitHub https://github.com/continuent/continuent-tools-hadoop • Contact Continuent for support
  • 35. Commercial Terms • Replicator features are open source (GPL V2) • Investment Elements: POC / Development (Walk Away Option); Production Deployment; Annual Support Subscription • Governing Principles: Annual Subscription Required; More Upfront Investment - Less Annual Subscription
  • 36. We Do Clustering Too! Tungsten clusters combine off-the-shelf open source MySQL servers into data services with: • 24x7 data access • Scaling of load on replicas • Simple management commands ...without app changes or data migration [Diagram: GonzoPortal.com apache/php servers, each with a Connector, in Amazon US West]
  • 37. In Conclusion: Tungsten Offers... • Fully automated, real-time replication from MySQL into Hadoop • Support for automatic transformation to HDFS data formats and creation of full materialized views • Positions users to take advantage of evolving real-time features in Hadoop
  • 38. 560 S. Winchester Blvd., Suite 500, San Jose, CA 95128 Tel +1 (866) 998-3642 Fax +1 (408) 668-1009 e-mail: sales@continuent.com Our Blogs: http://scale-out-blog.blogspot.com http://mcslp.wordpress.com http://www.continuent.com/news/blogs Continuent Web Page: http://www.continuent.com Tungsten Replicator 2.0: http://code.google.com/p/tungsten-replicator