Tungsten University: Load A Vertica Data Warehouse With MySQL Data



Continuent Tungsten offers real-time replication from MySQL to a variety of DBMS types including Vertica. In this Tungsten University webcast we will show you the details of setting up MySQL-to-Vertica replication, including the following topics:

• Introduction to Continuent Tungsten features for data warehouse loading
• Installation for MySQL to Vertica replication
• Best practices for applications: charsets, schema design, time zones, etc.
• Techniques for filtering and transforming data
• Performance tuning and troubleshooting
• Adapting batch loading for new use cases

The webinar includes technical information and a live demo to help you get your own data warehouse loading application up and running quickly.


  1. Tungsten University: Load a Vertica Data Warehouse with MySQL Data
      Robert Hodges, CEO, Continuent (©Continuent 2013)
  2. Introducing Continuent
      • The leading provider of clustering and replication for open source DBMS
      • Our product: Continuent Tungsten
      • Clustering: commercial-grade HA, performance scaling and data management for MySQL
      • Replication: flexible, high-performance data movement
  3. OLTP and Data Warehouse Fundamentals
  4. The Contenders
      • MySQL: popular open source RDBMS for transaction processing
      • Vertica: popular closed source RDBMS for analytics
  5. Storage Layout in MySQL
      Sales Table (stored row by row):
        id   cust_id   prod_id   ...
        1    335301    532       ...
        2    2378      6235      ...
        3    ...       ...       ...
      Product Table:
        id    sku       type
        532   C00135    consumer
        533   S09957    specialty
        ...   ...       ...
      Prod_ID Index (on the Sales table):
        prod_id   id
        532       1
        6235      2
        ...       ...
      • Row format makes table scans very slow
      • Indexes slow down OLTP
      • Low/no data compression
      • Limited index types
      • Limited join types
  6. Storage Layout in Vertica
      Sales Table (each column stored separately):
        id:       1, 2, 3, ...
        cust_id:  335301, 2378, ...
        prod_id:  532, 6235, ...
        quantity: 1, 3, ...
      Product Table:
        id:   532, 533, ...
        sku:  C00135, S09957, ...
        type: consumer, specialty, ...
      • Fast scans on columns
      • Every column is an index
      • Good compression
      • Fast joins with parallel query
      • Updates to single rows are hideously slow
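The two storage-layout slides contrast row and column formats. As a toy illustration only (it says nothing about how Vertica actually stores or compresses data), the shell sketch below turns a row-format CSV into one file per column, so that summing `quantity` reads one small file instead of every full row. The helper `split_into_columns` and the file layout are hypothetical.

```shell
#!/bin/sh
# Toy illustration of row vs. column layout; this is NOT Vertica's storage format.
# split_into_columns CSV OUTDIR (hypothetical helper): reads a CSV whose first
# line is a header and writes OUTDIR/<column_name> with one value per line.
split_into_columns() {
  mkdir -p "$2"
  awk -F, -v outdir="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }
    { for (i = 1; i <= NF; i++) print $i > (outdir "/" name[i]) }
  ' "$1"
}

# Demo with the Sales rows shown on the two storage-layout slides.
workdir=$(mktemp -d)
printf 'id,cust_id,prod_id,quantity\n1,335301,532,1\n2,2378,6235,3\n' > "$workdir/sales.csv"
split_into_columns "$workdir/sales.csv" "$workdir/sales"

# A "column scan" now reads only the quantity file, not every full row.
awk '{ total += $1 } END { print total }' "$workdir/sales/quantity"   # prints 4
rm -rf "$workdir"
```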
  7. Traditional ETL Problems
      MySQL Sales Table -> Extract -> Transfer -> Load -> Sales Table
      • Date columns = intrusive
      • Batch-oriented = not timely
      • Scan for changes = performance hit
  8. Questions for Real-Time Loading
      • Do I need to transform data, and if so, how?
      • Do I need to clean up bad information?
      • Do I need to process UPDATE/DELETE too?
      • Do I need to load from multiple sources?
      • How timely do loads need to be?
      • What if something fails?
  9. Tungsten Replicator Basics
  10. Real-Time Data Replication
      MySQL Sales Table -> DBMS logs -> data replication -> Sales Table
      • Fast propagation = timely
      • No SQL changes = transparent
      • Automatic change capture = low impact
  11. Tungsten Master/Slave in Action
      • Master: the replicator reads the DBMS logs and stores transactions + metadata in its THL (Transaction History Log)
      • Slave: the replicator downloads transactions via the network into its own THL, then applies them using JDBC
  12. Pipelines with Parallel Apply
      A pipeline is a chain of stages, each running Extract -> Filter -> Apply:
      • Stage 1: remote master -> Transaction History Log
      • Stage 2: Transaction History Log -> parallel queue (assign shard ID)
      • Stage 3: parallel queue -> slave DBMS, with several Extract/Filter/Apply tasks running in parallel
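The parallel queue in the pipeline slide assigns each transaction a shard ID so that independent shards can be applied by separate tasks. The sketch below shows one way such an assignment could work; it assumes the shard ID is the schema name and that channels are picked with a hash-modulo rule, and the helper `channel_for_shard` is hypothetical rather than Tungsten's actual partitioner.

```shell
#!/bin/sh
# Hypothetical sketch of shard-to-channel assignment for parallel apply.
# Assumptions: the shard ID is the schema name and channels are chosen by a
# simple hash-modulo rule; Tungsten's real partitioner may work differently.
channel_for_shard() {
  shard=$1
  channels=$2
  # cksum prints "<crc> <byte-count>"; the CRC serves as a stable numeric hash.
  hash=$(printf '%s' "$shard" | cksum | awk '{ print $1 }')
  echo $((hash % channels))
}

# The same schema always maps to the same channel, so its transactions stay
# in order, while different schemas can be applied in parallel.
for db in db01 db02 db03; do
  echo "$db -> channel $(channel_for_shard "$db" 4)"
done
```

The design point is determinism: ordering only has to be preserved within a shard, so any stable mapping from shard to channel lets unrelated schemas proceed concurrently.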
  13. Real-Time Batch Loading
      • MySQL master (binlog_format=row) writes single transactions from OLTP operations to the MySQL binlog
      • Tungsten master replicator (service my2vr) reads the binlog with the MySQL extractor and applies special filters:
        * pkey: fill in primary key info
        * colnames: fill in column names
        * replicate: ignore tables
      • Tungsten slave replicator (service my2vr) writes CSV files in large transaction batches to leverage load parallelization
  14. Batch Loading -- The Gory Details
      Transactions from master -> replicator service my2vr -> CSV files, then either:
      • COPY to staging tables, then a merge script SELECTs into the base tables, or
      • COPY directly to the base tables
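Whichever path is used, the merge has to turn a batch of row changes into the correct final contents of each base table. The file-based sketch below shows that net effect under stated assumptions: each batch row begins with an opcode (`I` or `D`), a sequence number and a row ID, followed by the table columns with the primary key first; an UPDATE arrives as a delete followed by an insert; and, for each key, only the change with the highest row ID counts. The helper `merge_batch` and the unquoted CSV layout are hypothetical; in Vertica the same rule would be expressed with COPY, DELETE and INSERT ... SELECT statements against the staging table.

```shell
#!/bin/sh
# Sketch of the net effect of a staging-table merge, using flat files.
# Hypothetical layouts (unquoted CSV, no commas inside values):
#   base file : pkey,col2,...                      current base table rows
#   batch file: opcode,seqno,row_id,pkey,col2,...  changes; UPDATE = D then I
# For each pkey in the batch, only the change with the highest row_id counts:
# the key is removed from the base rows and re-added if that change is an 'I'.
# Usage: merge_batch BATCH_FILE BASE_FILE > new_base_file
merge_batch() {
  awk -F, '
    FILENAME == ARGV[1] {
      if (!($4 in last) || $3 + 0 > last[$4]) {
        last[$4] = $3 + 0; op[$4] = $1; row[$4] = $0
      }
      next
    }
    !($1 in last) { print }                    # base rows the batch did not touch
    END {
      for (k in row) {
        if (op[k] != "I") continue             # latest change was a delete
        line = row[k]
        sub(/^[^,]*,[^,]*,[^,]*,/, "", line)   # drop opcode,seqno,row_id
        print line
      }
    }
  ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '1,alice\n2,bob\n4,dave\n' > "$tmp/base.csv"
# Batch: insert key 3, delete key 2, update key 1 (delete + insert).
printf 'I,10,1,3,carol\nD,11,2,2\nD,12,3,1\nI,12,4,1,alicia\n' > "$tmp/batch.csv"
merge_batch "$tmp/batch.csv" "$tmp/base.csv" | sort
# 1,alicia
# 3,carol
# 4,dave
rm -rf "$tmp"
```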
  15. Setting Up MySQL to Vertica Replication
  16. DEMO: MySQL to Vertica replication with some bells and a whistle
      • sysbench generates load on MySQL databases db01, db02 and db03
      • Vertica receives db01 as-is and db02 renamed to renamed02; db03 is not replicated
  17. Get the Code
        wget --no-check-certificate https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.1.0-285.tar.gz
        tar -xf tungsten-replicator-2.1.0-285.tar.gz
        cd tungsten-replicator-2.1.0-285
  18. Installing MySQL Master
        tools/tungsten-installer --master-slave -a \
          --service-name=mysql2vertica \
          --master-host=mysql1 \
          --cluster-hosts=mysql1 \
          --datasource-user=tungsten \
          --datasource-password=secret \
          --home-directory=/opt/continuent \
          --buffer-size=100 \
          --java-file-encoding=UTF8 \
          --java-user-timezone=GMT \
          --mysql-use-bytes-for-string=false \
          --svc-extractor-filters=replicate,colnames,pkey \
          --property=replicator.filter.pkey.addPkeyToInserts=true \
          --property=replicator.filter.pkey.addColumnsToDeletes=true \
          "--property=replicator.filter.replicate.do=db01.*,db02.*" \
          --start-and-report
  19. Installing Vertica Slave
        tools/tungsten-installer --master-slave -a \
          --service-name=mysql2vertica \
          --home-directory=/opt/continuent \
          --cluster-hosts=vertica1 \
          --master-host=mysql1 \
          --datasource-type=vertica \
          --datasource-user=dbadmin \
          --datasource-password=secret \
          --datasource-port=5433 \
          --batch-enabled=true \
          --batch-load-template=vertica6 \
          --vertica-dbname=bigdata \
          --java-user-timezone=GMT \
          --java-file-encoding=UTF8 \
          --svc-applier-filters=dbtransform \
          --property=replicator.filter.dbtransform.from_regex1=db02 \
          --property=replicator.filter.dbtransform.to_regex1=renamed02 \
          --property=replicator.stage.q-to-dbms.blockCommitRowCount=25000 \
          --start-and-report
  20. Generate Schema Using ddlscan
      MySQL tables -> ddlscan -> Vertica DDL. Questions to settle:
      • Data types?
      • Column lengths?
      • Naming conventions?
      • Staging tables?
  21. Tungsten ddlscan Utility
        cd /opt/continuent/tungsten/tungsten-replicator/bin
        # Base table generation
        ./ddlscan -template ddl-mysql-vertica.vm -db db01 -user tungsten -pass secret >> ddl.sql
        # Staging table generation
        ./ddlscan -template ddl-mysql-vertica-staging.vm -db db01 -user tungsten -pass secret >> ddl.sql
        # Load into Vertica
        vsql -Udbadmin -wsecret < ddl.sql
  22. Checking Status
        # Checking status on master
        trepctl -host logos1 heartbeat
        trepctl -host logos1 status
        # Checking status on slave
        trepctl -host vertica1 status
        # Checking detailed performance of apply task
        trepctl -host vertica1 status -name tasks
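When watching a load, the fields usually checked first are the replicator state and how far the slave lags behind. The helper below, `summarize_status`, is a hypothetical convenience that extracts the `state` and `appliedLatency` fields from `trepctl status` output; it assumes the `name : value` line layout that command prints.

```shell
#!/bin/sh
# Hypothetical helper: summarize `trepctl status` output read from stdin.
# It assumes the "name : value" lines that trepctl status prints, such as
#   appliedLatency         : 0.512
#   state                  : ONLINE
summarize_status() {
  awk -F: '
    {
      key = $1
      gsub(/[ \t]+/, "", key)          # field name without padding
      val = $0
      sub(/^[^:]*:[ \t]*/, "", val)    # everything after the first colon
    }
    key == "state"          { state = val }
    key == "appliedLatency" { latency = val }
    END { printf "state=%s appliedLatency=%s\n", state, latency }
  '
}

# Usage (needs a running replicator):
#   trepctl -host vertica1 status | summarize_status
```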
  23. Application Tips and Tricks
  24. Application Design Practices
      • Primary keys on all tables (Tungsten requires single-column keys)
      • Clean schema design *really* helps
      • UTF-8 character set, or at least be consistent
      • Use GMT time zone, or be very consistent about dates
      • Use row replication on MySQL master (binlog_format=row)
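Several of these practices can be checked on the MySQL master before installing. The function below is a hypothetical pre-flight check: it takes the binlog format, time zone and server character set as arguments and warns when they differ from the row/GMT/UTF-8 recommendations. Treating `+00:00`, `UTC` and `GMT` as acceptable, and `SYSTEM` as needing a manual check, are assumptions of this sketch.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the master-side practices above.
# Values are passed as arguments so the check can run without a server.
# check_master_settings BINLOG_FORMAT TIME_ZONE CHARSET; returns 1 on problems.
check_master_settings() {
  binlog_format=$1
  time_zone=$2
  charset=$3
  rc=0
  if [ "$binlog_format" != "ROW" ]; then
    echo "WARN: binlog_format is $binlog_format, expected ROW"
    rc=1
  fi
  case "$time_zone" in
    +00:00|UTC|GMT) ;;
    SYSTEM) echo "NOTE: time_zone is SYSTEM; confirm the host clock runs on GMT" ;;
    *) echo "WARN: time_zone is $time_zone, expected GMT"; rc=1 ;;
  esac
  case "$charset" in
    utf8|utf8mb3|utf8mb4) ;;
    *) echo "WARN: character_set_server is $charset, expected UTF-8"; rc=1 ;;
  esac
  return $rc
}

check_master_settings ROW +00:00 utf8 && echo "master settings look OK"
```

To feed it from a live server, one option is `check_master_settings $(mysql -N -B -u tungsten -psecret -e "SELECT @@global.binlog_format, @@global.time_zone, @@global.character_set_server")`, since `-N -B` prints the three values on one tab-separated line.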
  25. Transforming Data -- Replicator Filters
      • Tables to ignore/include?
      • Schema/table/column renaming?
      • Map names to upper/lower case?
      • Drop data?
        tungsten-installer --master-slave -a --service-name=mysql2vertica ... \
          --svc-extractor-filters=pkey,colnames,replicate \
          "--property=replicator.filter.replicate.do=db01.*,db02.*" ...
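The `replicator.filter.replicate.do=db01.*,db02.*` property keeps only tables in `db01` and `db02`. The sketch below mimics that selection with shell glob matching so the effect of a pattern list can be checked quickly; `should_replicate` is a hypothetical helper, not the replicator's implementation, and the filter's exact wildcard rules may differ.

```shell
#!/bin/sh
# Rough illustration of how a replicate.do list such as "db01.*,db02.*"
# selects tables, using shell glob matching. Hypothetical helper: this is not
# the replicator's code, and its exact wildcard rules may differ.
should_replicate() {
  name=$1       # schema.table, e.g. db01.sales
  rest=$2       # comma-separated patterns, e.g. db01.*,db02.*
  while [ -n "$rest" ]; do
    pattern=${rest%%,*}                  # first pattern in the list
    case "$rest" in
      *,*) rest=${rest#*,} ;;            # drop it and keep the remainder
      *)   rest= ;;
    esac
    # $pattern is deliberately unquoted so '*' acts as a wildcard.
    case "$name" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

for t in db01.sales db02.orders db03.sales; do
  if should_replicate "$t" 'db01.*,db02.*'; then
    echo "$t: replicate"
  else
    echo "$t: ignore"
  fi
done
```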
  26. List of Commonly Used Filters
      • CDC -- Transform log to record of changes
      • colnames -- Add column names
      • dbtransform -- Change db name only
      • enumtostring -- Make MySQL enums a string
      • pkey -- Add primary key metadata
      • rename -- Rename db/table/column
      • replicate -- Replicate/don’t replicate tables
      • zerodate2null -- Make MySQL ‘0’ dates null
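To see what a filter such as zerodate2null is for, the toy example below applies the same idea to quoted CSV text: MySQL "zero" dates, which are not valid calendar dates, become the literal `null` that the deck's COPY commands load as NULL. The real filter operates on row change events inside the replicator, before any CSV is written; `zero_dates_to_null` is a hypothetical helper.

```shell
#!/bin/sh
# Toy illustration of the zerodate2null idea applied to CSV text. MySQL "zero"
# dates are not valid calendar dates, so they are turned into the literal null
# that the COPY commands in this deck load as NULL (NULL 'null').
# Assumes quoted CSV fields; the real filter works on row change events inside
# the replicator, before any CSV file is written.
zero_dates_to_null() {
  sed -E 's/"0000-00-00( 00:00:00)?"/null/g'
}

printf '"1","0000-00-00","2013-05-30"\n' | zero_dates_to_null
# "1",null,"2013-05-30"
```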
  27. Transforming Data -- Staging Server(s)
      OLTP servers -> staging server with triggers/SQL -> Vertica cluster
  28. Transforming Data -- Merge Script Hacks
        # Hacked load script for Vertica. Deletes always precede inserts, so
        # inserts can load directly.

        # Extract deleted data keys and put in temp CSV file for deletes.
        !egrep '^"D",' %%CSV_FILE%% | cut -d, -f4 > %%CSV_FILE%%.delete
        COPY %%STAGE_TABLE_FQN%% FROM '%%CSV_FILE%%.delete' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'

        # Delete rows using an IN clause. You could also set a column value to
        # mark deleted rows.
        DELETE FROM %%BASE_TABLE%% WHERE %%BASE_PKEY%% IN (SELECT %%STAGE_PKEY%% FROM %%STAGE_TABLE_FQN%%)

        # Load inserts directly into base table from a separate CSV file.
        !egrep '^"I",' %%CSV_FILE%% | cut -d, -f4- > %%CSV_FILE%%.insert
        COPY %%BASE_TABLE%% FROM '%%CSV_FILE%%.insert' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"'
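The `egrep | cut` steps in this hacked script can be tried on a sample batch file before they go into a load template. The rows below are made up, and they assume the column order implied by `cut -d, -f4` (opcode, sequence number, row ID, then the primary key and the remaining columns). `grep -E` is used as the modern spelling of `egrep`. Note that a primary key value containing a comma would break the `-f4` extraction, since `cut` does not understand quoting.

```shell
#!/bin/sh
# Try the delete/insert split from the hacked merge script on a made-up batch.
# Assumed row layout: "opcode","seqno","row_id","pkey",other columns...
batch=$(mktemp)
cat > "$batch" <<'EOF'
"D","41","1","7"
"I","41","2","7","widget","3"
"I","42","3","8","gadget","1"
EOF

# Keys to delete: 4th field of every "D" row (equivalent to the egrep | cut step).
grep -E '^"D",' "$batch" | cut -d, -f4 > "$batch.delete"
# Rows to insert: everything from the 4th field on for every "I" row.
grep -E '^"I",' "$batch" | cut -d, -f4- > "$batch.insert"

cat "$batch.delete"    # "7"
cat "$batch.insert"    # "7","widget","3" then "8","gadget","1"
# Clean up afterwards with: rm -f "$batch" "$batch.delete" "$batch.insert"
```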
  29. Provisioning -- Using CSV
        mysql> SELECT * FROM sales INTO OUTFILE 'sales.csv'
               FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"';
        ... (Fix up data if necessary) ...
        vsql> COPY sales FROM 'sales.csv' DIRECT NULL 'null' DELIMITER ',' ENCLOSED BY '"';
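For the "fix up data" step, one common issue is NULL handling: by default `SELECT ... INTO OUTFILE` writes SQL NULL as `\N`, while the Vertica COPY on the provisioning slide loads the literal `null` as NULL. The sed filter below is a hypothetical fix-up that rewrites only whole, unquoted `\N` fields; the `:a`/`ta` loop repeats the substitution so adjacent `\N` fields are all converted.

```shell
#!/bin/sh
# Hypothetical fix-up step: convert MySQL's \N NULL marker into the literal
# null expected by the Vertica COPY (NULL 'null'). Only whole, unquoted \N
# fields are replaced; the :a / ta loop re-runs the substitution so that
# adjacent fields such as ",\N,\N," are all converted.
mysql_nulls_to_literal() {
  sed -E -e ':a' -e 's/(^|,)\\N(,|$)/\1null\2/' -e 'ta'
}

printf '7,\\N,\\N,"widget"\n' | mysql_nulls_to_literal
# 7,null,null,"widget"

# Example: mysql_nulls_to_literal < sales.csv > sales_fixed.csv
```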
  30. Provisioning Using a Sandbox Server
      OLTP server -> temporary sandbox server -> Vertica cluster
        1. Restore logical backup
        2. Replicate restored transactions
        3. Replicate normally after restore loads
  31. Parallel Provisioning from Sandbox
      OLTP server -> temporary sandbox server -> Vertica cluster
        1. Restore logical backup
        2. Replicate restored data in parallel
        3. Replicate normally after restore loads
  32. Complex Topologies: Fan-In
      • Masters: logos1 and logos2
      • Slave services for logos1 and logos2 both load the same Vertica cluster
  33. Wrapping Up
  34. Tungsten University Sessions
      • Load a Vertica Data Warehouse with MySQL Data (May 30, 10am PDT and June 4, 4pm CEST)
      Send feedback to: tu@continuent.com
  35. Continuent Web Page: http://www.continuent.com
      Tungsten Replicator 2.0: http://code.google.com/p/tungsten-replicator
      Our Blogs:
      • http://scale-out-blog.blogspot.com
      • http://flyingclusters.blogspot.com
      • http://datacharmer.org/blog
      • http://www.continuent.com/news/blogs
      560 S. Winchester Blvd., Suite 500, San Jose, CA 95128
      Tel +1 (866) 998-3642, Fax +1 (408) 668-1009
      e-mail: sales@continuent.com