Oracle GoldenGate 12c is well known for high-performance data replication between relational databases. With the GoldenGate Adapters, it can now also apply source transactions to a Big Data target such as HDFS. In this session, we'll explore the options for using Oracle GoldenGate 12c to replicate data in real time from a relational source database into HDFS. First, the GoldenGate Adapters will be used to load movie data from the source into HDFS for use by Hive. Next, we'll take the demo a step further and publish the source transactions to a Flume agent, allowing Flume to handle the final load into the targets.
Presented at the Oracle Technology Network Virtual Technology Summit February/March 2015.
3. T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or
+61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)
E : info@rittmanmead.com
W : www.rittmanmead.com
Introduction
•Michael Rainey – Principal Consultant – Rittman Mead
‣Oracle Data Integration expert and Oracle ACE
‣GoldenGate and Oracle Data Integrator
•Rittman Mead
‣Provide consulting, training, and managed
services worldwide
‣Focus on business intelligence, data
integration, and advanced analytics
‣Rittman Mead India recently named Oracle
Analytics Partner of the Year
@mRainey
Agenda
•Why Oracle Data Integration for Big Data?
•Review the technologies
‣Oracle GoldenGate 12c and GoldenGate 12c Adapters
‣Hadoop, HDFS, and Hive
‣Sqoop and Flume
•Big Data Lite VM introduction
•Demonstrations
‣Initial load from MySQL to Hadoop
‣Real-time replication using GoldenGate 12c direct to
Hadoop
‣Real-time replication using GoldenGate 12c and Flume
Oracle Data Integration 12c
•High performance data integration
•Real-time data replication
•End-to-end integration with simplified
deployment
•Unified tooling for both structured data
sources and Hadoop / NoSQL
•Flexible deployment on-premise or in
the cloud for heterogeneous systems
•Data governance and full metadata
management
Real-time data integration, data management for Cloud and Big Data
Big Data
Cloud
Apps
Database
Oracle Data Integrator
Oracle GoldenGate
Oracle Enterprise Data Quality
Oracle Data Services Integrator
Oracle Enterprise Metadata Management
Oracle Data Integration with Big Data
•Big Data Adapters
‣Natively connect to Hadoop
‣Produce native code to execute
on big data source or target
•Utilize Oracle Data Integration
capabilities
‣“Design once, run anywhere”
‣High performance replication,
heterogeneous sources/targets,
data quality checks, etc.
‣Easy to extend and customize
Oracle GoldenGate 12c
Hadoop, HDFS and Hive
•Hadoop is a framework for storing large amounts of data and processing it quickly and
efficiently
‣Storage: Hadoop Distributed File System (HDFS)
‣Processing: MapReduce
•HDFS – stores data as large files across multiple systems, using redundancy for reliability
‣Master/slave architecture – namenodes and datanodes
‣Data replicated to multiple datanodes, with namenode tracking location
•Hive – data warehouse infrastructure for analysis of large datasets in HDFS
‣HiveQL – SQL-like language for querying data
‣Transparently converts queries to MapReduce
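The map and reduce stages behind a Hive query can be sketched in plain Python. This is only a conceptual illustration, not Hadoop or Hive code: the sample rows, column names, and function names are hypothetical, and the pipeline roughly mirrors what a query such as "average rating per genre" would be broken into.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for records stored in HDFS.
ROWS = [
    {"movie_id": 1, "genre": "Drama", "rating": 4},
    {"movie_id": 2, "genre": "Comedy", "rating": 3},
    {"movie_id": 3, "genre": "Drama", "rating": 5},
    {"movie_id": 4, "genre": "Comedy", "rating": None},
]


def map_phase(rows):
    """Filter and project: emit (genre, rating) pairs for rows that have a rating."""
    for row in rows:
        if row["rating"] is not None:
            yield row["genre"], row["rating"]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate: compute the average rating for each genre."""
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def average_rating_by_genre(rows):
    """Run the three stages in sequence on a single machine."""
    return reduce_phase(shuffle(map_phase(rows)))
```

On a real cluster, each stage runs in parallel on the nodes that hold the data; the single-process version above only shows how filtering, projecting, and aggregating fit the map and reduce roles.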
Loading Relational Data with Sqoop
•Sqoop - short for “SQL to Hadoop”
‣Import whole tables, or whole schemas, from relational databases into HDFS
‣Export data from HDFS back out to these databases – with the export and import being
performed through MapReduce jobs
‣Import using a SQL SELECT statement, rather than grabbing whole tables
‣Incremental loads, specifying a key column to determine which rows are new and which to exclude
‣Load directly into Hive tables, creating HDFS files in the background and the Hive
metadata automatically
•Sqoop User Guide: http://archive.cloudera.com/cdh4/cdh/4/sqoop/SqoopUserGuide.html
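As a rough sketch of the import options listed above, the following Python helpers assemble `sqoop import` command lines without running them. The flags are standard Sqoop 1.x import options; the JDBC URL, user name, and target directory in the usage note are hypothetical placeholders, and the helper names are invented for illustration.

```python
import shlex


def hive_import_args(jdbc_url, username, table, hive_table):
    """Build a `sqoop import` command that loads a whole table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                         # prompt for the password at run time
        "--table", table,
        "--hive-import",              # create Hive metadata and load the data
        "--hive-table", hive_table,
    ]


def incremental_import_args(jdbc_url, username, table, target_dir,
                            check_column, last_value):
    """Build a `sqoop import` command that only pulls rows past `last_value`."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "append",    # import only rows newer than last_value
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


def as_command_line(args):
    """Render an argument list as a shell-quoted string for review or logging."""
    return shlex.join(args)
```

For example, `as_command_line(hive_import_args("jdbc:mysql://localhost:3306/moviedemo", "demo", "MOVIE", "movie"))` yields a command that could be pasted into a shell on a host where Sqoop is installed; the connection details would need to match the actual environment.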
Moving Data with Flume
•Reliable, distributed approach to collecting, aggregating, & moving large amounts of log data
•Installed as Java agents that run on source and target
‣Source – listens for and consumes incoming events (transactions)
‣Channel – where events are queued and staged
‣Sink – processes that write transactions to disk
•Transactions can be distributed to multiple targets or fed from many sources to one target
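The source, channel, and sink roles can be modelled in a few lines of Python. This is a conceptual sketch only and does not use the Flume API: the class and function names are hypothetical, and a Python list stands in for the files a real sink would write.

```python
from collections import deque


class MemoryChannel:
    """Bounded in-memory queue where events are staged between source and sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        """Stage one event, refusing it if the channel is already full."""
        if len(self._events) >= self.capacity:
            raise OverflowError("channel is full")
        self._events.append(event)

    def take(self, batch_size):
        """Remove and return up to batch_size events in arrival order."""
        batch = []
        while self._events and len(batch) < batch_size:
            batch.append(self._events.popleft())
        return batch


def run_source(incoming, channel):
    """Source role: consume incoming events and queue them on the channel."""
    for event in incoming:
        channel.put(event)


def run_sink(channel, target, batch_size=2):
    """Sink role: drain the channel in batches and append them to the target."""
    while True:
        batch = channel.take(batch_size)
        if not batch:
            break
        target.extend(batch)
```

Because the channel decouples the two sides, several sources can feed one channel, or one source can feed several channels, which is how events fan in from many sources or fan out to multiple targets.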
Oracle GoldenGate 12c Application Adapters
•GoldenGate Hive Adapter
‣Integrating OGG Adapter with
Hive (Doc ID 1586188.1)
•GoldenGate Flume Adapter
‣Integrating OGG Adapter with
Flume (Doc ID 1926867.1)
Big Data Lite VM and Hands-On Labs
•Big Data Lite 4.0 VM
‣Fully integrated Oracle Big Data environment
‣Similar to Big Data Appliance setup
‣Technologies include ODI, GoldenGate, Big Data
Connectors, Hadoop, Flume, Sqoop, Hive, etc.
•Hands-on labs provide step-by-step instructions
‣Demo environment built into the Big Data Lite VM
•Big Data Lite VM:
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
•Hands-on Labs:
http://www.oracle.com/webfolder/technetwork/tutorials/obe/fmw/odi/odi_12c/DI_BDL_Guide/BigDataIntegration_Demo.html
Getting Started with the Big Data Lite VM
•Scenario: Oracle MoviePlex is an online
movie streaming company with web logs
and MySQL database sources
•Goal: Move data into Hadoop, perform
integration, then distribute to Oracle Exadata
data warehouse for further processing
Tame Big Data with Oracle Data Integration: Part 1
•Real-time data replication from MySQL
relational database to Hadoop
‣Initial load using Sqoop and Oracle Data
Integrator 12c
‣Change Data Capture using GoldenGate
12c and GoldenGate Adapters
(Hive and Flume)
Initial Load using Sqoop and ODI 12c
•Mapping between source MySQL table “MOVIE” and target Hive table “movie”
‣Data server connections set up for each technology
‣Tables reverse engineered as Datastores in ODI Models
•Mapping uses “IKM SQL to Hive-HBase-File (SQOOP)” Knowledge Module to load the Hive
table via Sqoop
‣Creates Sqoop option file and launches Sqoop import
‣Optionally loads into a Hive staging table
‣Loads rows into target table using HiveQL insert overwrite
Demonstration
Initial Load via Sqoop
Real-time Load using Oracle GoldenGate 12c to Hive
•Adapter developed using Oracle GoldenGate’s Java API and Hadoop HDFS Java API
‣Writes trail data to target Hive tables
‣Example “SampleHandler.java” found on My Oracle Support
•Properties file must be in same directory as parameter files
‣Contains information about necessary JAR files, target Hive tables, logging parameters, etc
•Another example:
‣http://www.rittmanmead.com/2014/09/using-oracle-goldengate-for-trickle-feeding-rdbms-transactions-into-hive-and-hdfs/
Demonstration
Load MySQL to HDFS using
GoldenGate 12c
Another Real-time Load Option – using GoldenGate and Flume
•Adapter developed using Oracle GoldenGate’s Java API and Apache Flume Client API
‣Transactions delivered to RpcClient API
‣Example “SampleHandlerFlume.java” found on My Oracle Support
•Flume configuration file (flume.conf)
‣Flume agent listens for RPC calls on the source, stages transactions in memory, and
delivers data to HDFS
‣Flume agent must be started
•Properties file must be in same directory as parameter files
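The agent described above (RPC source, in-memory staging, HDFS delivery) could be configured along the following lines. This is a sketch under assumptions, not the configuration shipped with the Oracle example: the property keys are standard Flume ones, but the agent name, component names, bind address, port, and HDFS path are hypothetical. An Avro source is assumed because Flume's default RpcClient speaks Avro RPC. The file is rendered from Python only to keep the examples in one language.

```python
def render_flume_conf(agent, bind, port, hdfs_path):
    """Return the text of a minimal flume.conf for one Avro-to-HDFS agent."""
    props = [
        # Name the components that make up this agent.
        (f"{agent}.sources", "rpc_in"),
        (f"{agent}.channels", "mem"),
        (f"{agent}.sinks", "hdfs_out"),
        # Source: listen for Avro RPC calls from the client.
        (f"{agent}.sources.rpc_in.type", "avro"),
        (f"{agent}.sources.rpc_in.bind", bind),
        (f"{agent}.sources.rpc_in.port", str(port)),
        (f"{agent}.sources.rpc_in.channels", "mem"),
        # Channel: stage events in memory.
        (f"{agent}.channels.mem.type", "memory"),
        # Sink: write events to HDFS as plain data files.
        (f"{agent}.sinks.hdfs_out.type", "hdfs"),
        (f"{agent}.sinks.hdfs_out.hdfs.path", hdfs_path),
        (f"{agent}.sinks.hdfs_out.hdfs.fileType", "DataStream"),
        (f"{agent}.sinks.hdfs_out.channel", "mem"),
    ]
    return "\n".join(f"{key} = {value}" for key, value in props) + "\n"
```

Writing the result of `render_flume_conf("ggflume", "0.0.0.0", 41414, "/user/demo/movie")` to `flume.conf` gives a file that an agent started under the same name could load. A memory channel is fast but loses staged events if the agent stops, so a file channel may be preferable where durability matters.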
Demonstration
Load MySQL to HDFS using
GoldenGate 12c and Flume
Summary
•Oracle Data Integration and Big Data are a great match
‣GoldenGate 12c Adapters
‣ODI 12c Big Data Connectors
‣“Design once, run anywhere” fits for Big Data sources and targets
•Big Data Lite VM and Hands-on Labs
‣Get started with Oracle Data Integration and Big Data
‣Learn by example
•GoldenGate 12c can load Hadoop effectively
‣Java API examples on MOS are great for getting started
•Further information:
‣http://www.rittmanmead.com/blog
‣https://blogs.oracle.com/dataintegration
Session length is 60 minutes. On this slide, we suggest making a comment that encourages attendees to ask questions throughout the presentation.
Why Oracle DI tools?
Why ODI/OGG with Big Data?
What is GoldenGate? How does it work with Hadoop… OGG connectors.
Hadoop, through MapReduce, breaks processing down into simple stages
Map : select the columns and values you’re interested in, pass through as key/value pairs
Reduce : aggregate the results
Most ETL jobs can be broken down into filtering, projecting and aggregating
Hadoop then automatically runs job on cluster
Share-nothing small chunks of work
Run the job on the node where the data is
Handle faults etc
Gather the results back in
Hive for querying data
Sqoop – how does it work?
Why Sqoop vs other tools?
What is GoldenGate? How does it work with Hadoop… OGG connectors.
OGG Adapters
Examples?
Big Data Lite VM architecture – HOL process.
Describe getting started…with the VM
What’s the goal? Initial load via Sqoop.
How? ODI Big Data connectors for Sqoop.
How does it work?
Describe the OGG setup
Demo OGG
Show dir
Show mgr
Show obey file,
Execute obey file
Show prm files
run inserts on source (add new rows)…
Show data on target
Describe Flume. Why Flume?
Using OGG w/Flume
- what’s different?
Demo OGG and Flume
Summary
This slide should be displayed 2-3 minutes before your 1-hour session ends. In your voiceover, please tell viewers that the console will close, so they should move on to the next session.