As a data integration professional, it’s almost a guarantee that you’ve heard of real-time stream processing of Big Data. The usual players in the open source world are Apache Kafka, used to move data in real time, and Spark Streaming, built for in-flight transformations. But what about relational data? Quite often we forget that products incubated in the Apache Foundation can serve a purpose for “standard” relational databases as well. But how? Well, let’s introduce Oracle GoldenGate and Oracle Data Integrator for Big Data. GoldenGate can extract relational data in real time and produce Kafka messages, ensuring relational data is a part of the enterprise data bus. These messages can then be ingested via ODI through a Spark Streaming process, integrating with additional data sources, such as other relational tables, flat files, etc., as needed. Finally, the output can be sent to multiple locations: on to a data warehouse for analytical reporting, back to Kafka for additional targets to consume, or any number of other targets. Attendees will walk away with a framework on which they can build their data streaming projects, combining relational data with big data and using a common, structured approach via the Oracle Data Integration product stack.
Presented at BIWA Summit 2017.
14. Why GoldenGate with Kafka?
• GoldenGate…
• …is non-invasive
• …has checkpoints for recovery
• …moves data quickly
• …is easy to set up
15. Why Oracle Data Integrator with Spark Streaming?
• Heterogeneous sources and targets
• Built to integrate all data
• Flexibility
• Reusable code templates (Knowledge Modules)
• Reusable Mappings
• ODI can adapt to your data warehouse, not the other way around
• Flow-based mappings
21. Kafka and Oracle Data Integrator
• Create a Model using the Kafka Logical Schema
• Create a Datastore
• Similar to a standard “File” datastore: define the file format and set up the columns
• Currently only CSV is supported
• Future formats may include JSON, Avro, etc.
• Add the Datastore to the mapping
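Because the Kafka datastore currently carries CSV-formatted messages, the generated job ultimately has to split each message back into the columns defined on the datastore. Here is a minimal sketch of that parsing step in plain Python; the column names and sample record are hypothetical, purely for illustration:

```python
import csv
import io

# Hypothetical column layout for a CSV-formatted Kafka message;
# in ODI these would be the columns defined on the Kafka datastore.
COLUMNS = ["op_type", "op_ts", "customer_id", "customer_name", "credit_limit"]

def parse_csv_message(message, columns=COLUMNS):
    """Split one CSV message into a dict keyed by the datastore columns."""
    values = next(csv.reader(io.StringIO(message)))
    return dict(zip(columns, values))

record = parse_csv_message('I,2017-01-31 09:15:02,101,"Acme, Corp",5000')
# Using the csv module (rather than message.split(",")) keeps quoted
# commas intact, so record["customer_name"] comes back as 'Acme, Corp'.
```

A real pySpark mapping would apply a function like this inside a map over the stream; the point here is only the per-message column split.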
22. Spark Streaming and Oracle Data Integrator
• Create a Spark Data Server and Physical / Logical Schemas
• Set the Hadoop Data Server
• Add properties, such as checkpointing, asynchronous execution mode, etc.
• Additional properties can be added: http://spark.apache.org/docs/latest/configuration.html
• The Spark Server is set up as the staging location
• Source Datastore from Kafka, Oracle DB, etc.
• Target Datastore is Cassandra, Oracle DB, etc.
• Code generated by the KM is pySpark
• pySpark code can be added to filters, joins, and other components for transformations
• Additional languages (Scala, Java) may be coming soon
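As a rough illustration, the data server properties might look something like the fragment below. The checkpoint directory entry is an assumed, illustrative property name; the last two are standard Spark settings documented on the configuration page referenced above:

```properties
# Hypothetical ODI Spark data server properties (names are illustrative)
spark.checkpointingBaseDir=hdfs:///user/odi/checkpoints

# Standard Spark settings, per http://spark.apache.org/docs/latest/configuration.html
spark.executor.memory=2g
spark.streaming.backpressure.enabled=true
```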