Replicate from Oracle to data warehouses and analytics

Analyzing transactional data residing in Oracle databases is becoming increasingly common, especially as data sizes and complexity increase and transactional stores are no longer able to keep pace with ever-increasing storage demands. Although there are many techniques available for loading Oracle data, getting up-to-date data into your data warehouse is a more difficult problem. VMware Continuent provides data replication from Oracle to data warehouses and analytics engines, so that you can derive insight from big data for better business decisions. Learn practical tips on how to get your data warehouse loading projects off the ground quickly and efficiently when replicating from Oracle into Hadoop, Amazon Redshift, and HP Vertica.

Published in: Software

  1. 1. VMware Continuent Replication: Replicate from Oracle to data warehouses and analytics. MC Brown, Senior Product Line Manager. October 22nd, 2015. © 2015 VMware Inc. All rights reserved.
  2. 2. Agenda: (1) Introduction to VMware Continuent (2) Understanding VMware Continuent Replication (3) Using Analytics and Data Warehouses (4) Wrap-up and Questions
  3. 3. Introducing VMware Continuent •  Database Clustering: business continuity for business-critical MySQL database applications; commercial-grade multi-site HA/DR. Products: MySQL Single Site HA; MySQL Multi-Site HA and DR •  Data Replication: flexible, high-performance replication for Oracle and MySQL; simple data loading into analytics and big data. Products: Oracle to Oracle; MySQL to Oracle; MySQL to MySQL (+ MariaDB, Percona Server); Oracle to Hadoop, Redshift, Vertica; MySQL to Hadoop, Redshift, Vertica
  4. 4. Replication solves important problems for RDBMS users •  Real-time local copies in case the DBMS fails •  Real-time remote copies in case the site fails •  Loading data quickly into analytic systems •  Feeding edge applications from the Oracle mother ship •  Migrating from Oracle to: – New Oracle versions – Less expensive editions – Non-Oracle DBMS
  5. 5. Agenda: (1) Introduction to VMware Continuent (2) Understanding VMware Continuent Replication (3) Using Analytics and Data Warehouses (4) Wrap-up and Questions
  6. 6. VMware Continuent implements flexible, high-performance replication for Oracle and MySQL [Diagram labels: primary (source) with MySQL DBMS logs, replicator, and THL (transactions + metadata); secondary (target) with replicator, THL (transactions + metadata), and MySQL. Callouts: download transactions via network or from file system; apply using JDBC; low latency transfer; low application impact.]
  7. 7. VMware Continuent captures transactions directly from Oracle REDO logs [Diagram labels: Oracle DBMS host; replicator host; source; primary; REDO logs; replicator; THL (transactions + metadata), to secondary. Callouts: capture data dictionary; capture raw transactions; staging area for REDO log data; convert to serialized row changes and DDL.]
  8. 8. Low-impact, high performance •  Source Oracle DBMS requirements: – Supplemental logging – Archive logs – Replicator metadata stored in DBMS – Replicator login with access to catalogs and flashback query – Local process to read REDO logs •  Target Oracle DBMS requirements: – Replicator metadata stored in DBMS
  9. 9. Transaction Based Replication. Transaction log (row changes + statements), source to target: (0) Create table db1.foo (1) Create table db2.foo (2) Insert into db1.foo values(1, … (3) Update db1.foo where id=1… (4) Insert into db2.foo values(5,…) (5) Insert into db1.foo values(3,…) (6) Delete from db2.foo where id=5
  10. 10. Parallel Apply [Diagram labels: source replicator; THL (transactions + metadata); parallel queue; replicator pipeline of stages, each with Extract, Filter, Apply; target.]
  11. 11. Parallel Extraction for Provisioning: multi-threaded data extraction using flashback queries [Diagram labels: source; replicator pipeline of stages, each with Extract, Filter, Apply; THL (transactions + metadata).]
  12. 12. Topologies: fan-in and fan-out [Diagrams: three replicators in each topology.]
  13. 13. Multiple Targets [Diagram labels: source replicator; further replicators for other RDBMS versions and OS platforms, other RDBMS types, and non-relational DBMS.]
  14. 14. We can even divide logs into transaction sequences on keys, source to target: •  Table=db1.foo, key=1: (2) Insert into db1.foo values(1, … (3) Update db1.foo where id=1… •  Table=db2.foo, key=5: (4) Insert into db2.foo values(5,…) (6) Delete from db2.foo where id=5 •  Table=db1.foo, key=3: (5) Insert into db1.foo values(3,…)
  15. 15. Ordering transactions around keys enables efficient data warehouse loading [Diagram labels: source DBMS; replicator; CSV files; load script; Hadoop cluster; parallel loading; Map/Reduce view generation.]
  16. 16. Agenda: (1) Introduction to VMware Continuent (2) Understanding Continuent Replication (3) Using Analytics and Data Warehouses (4) Wrap-up and Questions
  17. 17. Data Warehouse Integration and Usage is Changing •  Traditional data warehouse usage was based on a dump from the transactional store, loaded into the data warehouse •  Data warehousing and analytics were done on the historical data loaded •  Data warehouses often use merged data from multiple sources, which was hard to handle •  Data warehouses are now frequently sources as well as targets for data, for example: –  Export data to data warehouse –  Analyze data –  Feed summary data back to application to display stats to users
  18. 18. Modern Data Warehouse Sequences
  19. 19. How do we cope with that model? •  Traditional Extract-Transform-Load (ETL) methods take too long •  Data needs to be replicated into a data warehouse in real-time •  Continuous stream of information •  Replicate everything •  Use data warehouse to provide joins and analytics
  20. 20. Data Warehouse Choices •  Oracle •  Hadoop –  General purpose storage platform –  Map Reduce for data processing –  Front-end interfaces for interaction in SQL-like (Hive, HBase, Impala) and non-SQL (Pig, native, Spark) –  JDBC/ODBC Interfaces improving •  Vertica –  Massive cluster-based column store –  SQL and ODBC/JDBC Interface •  Amazon Redshift –  Highly flexible column store –  Easy to deploy
  21. 21. VMware Continuent for Replication/Data Warehouses (software formerly known as Tungsten Replicator) is a fast, open source database replication engine •  Designed for speed and flexibility •  Apache V2 license •  100% open source; find it on GitHub
  22. 22. The Data Warehouse Impedance Mismatch [Diagram labels: transactional store; data warehouse; dump/provision; batch; transactions? (marked X).]
  23. 23. Transactional and Data Warehouse Metadata •  Replicating data is not just about the data •  Table structures must be replicated too •  ddlscan handles the translation –  Migrates an existing MySQL or Oracle schema into the target schema –  Template based –  Handles underlying data type matches –  Needs to be executed before replication starts
  24. 24. Replicating into Vertica [Diagram labels: two replicators; CSV; JS; JDBC; cpimport; staging; base; merge.]
  25. 25. Replicating into Redshift [Diagram labels: two replicators; CSV; JS; JDBC; s3cmd; COPY; staging; base; merge.]
  26. 26. Replicating into Hadoop [Diagram labels: two replicators; CSV; JS; hadoop fs.]
  27. 27. Initial Materialization within Hadoop [Diagram labels: load-reduce-check; migrate staging/base DDL; CSV; staging table; Hive materialization; base table.]
  28. 28. Ongoing Materialization within Hadoop [Diagram labels: materialize; CSV; staging table; Hive materialization; base table.]
  29. 29. Comparing Loading Methods for Hadoop

      |                         | Manual via CSV            | Sqoop                        | Tungsten Replicator        |
      |-------------------------|---------------------------|------------------------------|----------------------------|
      | Process                 | Manual/Scripted           | Manual/Scripted              | Fully Automated            |
      | Incremental Loading     | Possible with DDL changes | Requires DDL changes         | Fully Supported            |
      | Latency                 | Full-load                 | Intermittent                 | Real-time                  |
      | Extraction Requirements | Full table scan           | Full and partial table scans | Low-impact CDC/binlog scan |

  30. 30. Sqoop and Materialization within Hadoop [Diagram labels: Sqoop; replicate; CSV; staging table; Hive materialization; base table.]
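  31. 31. How the Materialization Works

      Change rows:

      | Op | Seqno | ID | Msg           |
      |----|-------|----|---------------|
      | I  | 1     | 1  | Hello World!  |
      | I  | 2     | 2  | Meet MC       |
      | D  | 3     | 1  |               |
      | I  | 3     | 1  | Goodbye World |

      After materialization:

      | Op | Seqno | ID | Msg           |
      |----|-------|----|---------------|
      | I  | 2     | 2  | Meet MC       |
      | I  | 3     | 1  | Goodbye World |
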
  32. 32. Data Warehouse Possibilities: Point in Time Tables [Figure: a run of transaction sequence numbers 1 to 45, with markers labeled Monday, Wednesday, and Friday.]
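  33. 33. Data Warehouse Possibilities: Time Series Generation

      Change rows:

      | Op | Seqno | ID | Date   | Msg            |
      |----|-------|----|--------|----------------|
      | I  | 1     | 1  | 1/6/14 | Hello World!   |
      | I  | 2     | 2  | 2/6/14 | Meet MC        |
      | I  | 3     | 1  | 2/6/14 | Goodbye World  |
      | I  | 4     | 1  | 3/6/14 | Hello Tuesday  |
      | I  | 4     | 2  | 3/6/14 | Ruby Wednesday |
      | I  | 5     | 1  | 4/6/14 | Final Count    |

      Time series for ID 1:

      | ID | Date   | Msg           |
      |----|--------|---------------|
      | 1  | 1/6/14 | Hello World!  |
      | 1  | 2/6/14 | Goodbye World |
      | 1  | 3/6/14 | Hello Tuesday |
      | 1  | 4/6/14 | Final Count   |
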
  34. 34. Agenda: (1) Introduction to VMware Continuent (2) Understanding Continuent Replication (3) Using Analytics and Data Warehouses (4) Wrap-up and Questions
  35. 35. Wrap-up •  VMware Continuent Replication provides robust, flexible capabilities that have been battle-tested in demanding customer environments •  Replication features compare favorably to Oracle GoldenGate and Data Guard •  VMware Continuent handles HA/DR, data warehouse loading, and edge application use cases
  36. 36. For more information, contact us: •  Robert Noyes, Alliance Manager, AMER & LATAM, rnoyes@vmware.com, +1 (650) 575-0958 •  Philippe Bernard, Alliance Manager, EMEA & APJ, pbernard@vmware.com, +41 79 347 1385 •  MC Brown, Senior Product Line Manager, mcb@vmware.com •  Eero Teerikorpi, Sr. Director, Strategic Alliance, eteerikorpi@vmware.com, +1 (408) 431-3305 •  www.vmware.com/products/continuent
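
To make the key-based ordering on slide 14 concrete, here is a minimal Python sketch. It is not the replicator's implementation: the tuple layout (seqno, operation, table, key) and the name `split_by_key` are assumptions made for illustration, and the real THL format is not shown in the deck. The sketch groups the slide's row changes by table and key while keeping sequence-number order inside each group.

```python
from collections import defaultdict

# Row changes 2-6 from the slide, as (seqno, operation, table, key).
# This tuple layout is a hypothetical stand-in for the real log format.
LOG = [
    (2, "insert", "db1.foo", 1),
    (3, "update", "db1.foo", 1),
    (4, "insert", "db2.foo", 5),
    (5, "insert", "db1.foo", 3),
    (6, "delete", "db2.foo", 5),
]


def split_by_key(log):
    """Group row changes by (table, key), keeping seqno order per group."""
    groups = defaultdict(list)
    # Sorting the tuples orders them by seqno, their first element.
    for seqno, op, table, key in sorted(log):
        groups[(table, key)].append((seqno, op))
    return dict(groups)


sequences = split_by_key(LOG)
for (table, key), changes in sequences.items():
    print(table, key, changes)
```

Each (table, key) sequence is independent of the others, which is what would allow the groups to be loaded in parallel, as slide 15 suggests.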

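
The materialization step on slide 31 can be sketched in a few lines of Python. This is an in-memory illustration only; the deck performs this step with Hive inside Hadoop, and the function name `materialize` and the row layout (op, seqno, id, msg) are assumptions. The idea shown is that change rows are replayed in sequence order, a delete (D) removes the row for that ID, and an insert (I) sets it, so only the latest surviving row per ID remains.

```python
# Change rows from the slide: (op, seqno, id, msg).
# The delete row carries no message, so None is used as a placeholder.
STAGING = [
    ("I", 1, 1, "Hello World!"),
    ("I", 2, 2, "Meet MC"),
    ("D", 3, 1, None),
    ("I", 3, 1, "Goodbye World"),
]


def materialize(staging):
    """Replay change rows in seqno order and return the surviving rows."""
    base = {}
    # sorted() is stable, so rows sharing a seqno keep their staging order
    # (here: the delete for ID 1 is applied before the insert that follows).
    for op, seqno, row_id, msg in sorted(staging, key=lambda r: r[1]):
        if op == "D":
            base.pop(row_id, None)
        elif op == "I":
            base[row_id] = (op, seqno, row_id, msg)
    return sorted(base.values(), key=lambda r: r[1])


for row in materialize(STAGING):
    print(row)
```

Run against the slide's four change rows, this leaves the two rows shown in the slide's result table: "Meet MC" for ID 2 and "Goodbye World" for ID 1.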
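
The time series on slide 33 follows from the same change rows: instead of collapsing them to the latest value, every insert for one ID is kept in sequence order. The Python sketch below assumes a row layout of (op, seqno, id, date, msg) and a hypothetical helper name `time_series`; it illustrates the idea and is not how the product generates the series.

```python
# Change rows from the slide: (op, seqno, id, date, msg).
STAGING = [
    ("I", 1, 1, "1/6/14", "Hello World!"),
    ("I", 2, 2, "2/6/14", "Meet MC"),
    ("I", 3, 1, "2/6/14", "Goodbye World"),
    ("I", 4, 1, "3/6/14", "Hello Tuesday"),
    ("I", 4, 2, "3/6/14", "Ruby Wednesday"),
    ("I", 5, 1, "4/6/14", "Final Count"),
]


def time_series(staging, row_id):
    """Return every inserted version of one row, in seqno order."""
    ordered = sorted(staging, key=lambda r: r[1])
    return [
        (rid, date, msg)
        for op, seqno, rid, date, msg in ordered
        if op == "I" and rid == row_id
    ]


for entry in time_series(STAGING, 1):
    print(entry)
```

For ID 1 this yields the four dated rows in the slide's second table, from "Hello World!" to "Final Count".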