
Streaming with Oracle Data Integration


As a data integration professional, it’s almost a guarantee that you’ve heard of real-time stream processing of Big Data. The usual players in the open source world are Apache Kafka, used to move data in real-time, and Spark Streaming, built for in-flight transformations. But what about relational data? Quite often we forget that products incubated in the Apache Foundation can also serve a purpose for “standard” relational databases as well. But how? Well, let’s introduce Oracle GoldenGate and Oracle Data Integrator for Big Data. GoldenGate can extract relational data in real time and produce Kafka messages, ensuring relational data is a part of the enterprise data bus. These messages can then be ingested via ODI through a Spark Streaming process, integrating with additional data sources, such as other relational tables, flat files, etc., as needed. Finally, the output can be sent to multiple locations: on through to a data warehouse for analytical reporting, back to Kafka for additional targets to consume, or any number of targets. Attendees will walk away with a framework on which they can build their data streaming projects, combining relational data with big data and using a common, structured approach via the Oracle Data Integration product stack.
Presented at BIWA Summit 2017.


Streaming with Oracle Data Integration

  1. 1. Streaming Transformations Using Oracle Data Integration Michael Rainey | BIWA Summit 2017
  2. 2. • Michael Rainey - Technical Advisor • Spreading the good word about Gluent products with the world • Oracle Data Integration expertise • Oracle ACE Director • mRainey.co 2 Introduction we liberate enterprise data
  3. 3. What is “Streaming”
  4. 4. • The processing and analysis of structured or “unstructured” data in real-time • Why Streaming? • When speed (velocity) of data is key • Streaming data is processed in “time windows”, in memory, across a cluster of servers • Examples: • Calculating a retail buying opportunity • Real-time cost calculations • IoT data analysis 4 What is “Streaming”
  5. 5. “Publish-subscribe messaging rethought as a distributed commit log” 5 Streaming data - Apache Kafka Image source: kafka.apache.org/
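To make the publish-subscribe model concrete, here is a minimal sketch using the kafka-python client. This is not part of the presentation; the broker address, topic name, and message payload are illustrative assumptions. One process appends messages to a topic (an entry in the distributed commit log), and any number of consumers can subscribe and replay it independently.

```python
from kafka import KafkaProducer, KafkaConsumer

# Producer: append a message to the "orders" topic (the commit log).
# Broker address, topic, and payload are assumptions for illustration.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("orders", b'{"order_id": 1, "amount": 25.00}')
producer.flush()

# Consumer: subscribe to the same topic and replay it from the earliest offset.
consumer = KafkaConsumer("orders",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest")
for message in consumer:
    print(message.offset, message.value)
```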
  6. 6. Enterprise Data Bus 6
  7. 7. Enterprise Data Bus 6
  8. 8. • Scalable, fault-tolerant, high-throughput stream processing • Spark Streaming receives live input data streams from various sources • Continuous stream of data is known as a discretized stream or DStream • Data is divided into mini-batches and processed by the Spark engine • Operations such as join, filter, map, count, windowed computations, etc are used to transform data in-flight 7 Stream processing - Apache Spark
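The DStream model described above can be sketched in a few lines of pySpark. This is a generic illustration rather than code from the presentation; the socket source, window sizes, and checkpoint path are assumptions. The stream is cut into 10-second mini-batches, and an in-flight filter, map, and windowed count are applied.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# The batch interval turns the continuous stream into 10-second mini-batches (RDDs).
sc = SparkContext(appName="DStreamSketch")
ssc = StreamingContext(sc, 10)
ssc.checkpoint("/tmp/spark-checkpoint")  # required for windowed/stateful operations

# A DStream from a TCP socket source; Kafka, Flume, HDFS, etc. work the same way.
lines = ssc.socketTextStream("localhost", 9999)

# In-flight transformations: filter, map, then count over a 60s window sliding every 10s.
error_counts = (lines.filter(lambda line: "ERROR" in line)
                     .map(lambda line: ("errors", 1))
                     .reduceByKeyAndWindow(lambda a, b: a + b,   # add newly arrived batches
                                           lambda a, b: a - b,   # subtract expired batches
                                           windowDuration=60,
                                           slideDuration=10))
error_counts.pprint()

ssc.start()
ssc.awaitTermination()
```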
  9. 9. Why Oracle Data Integration?
  10. 10. • Enterprise has invested heavily in ODI and/or GoldenGate • Getting started with development languages (Python/pySpark, Java, etc) • Centralized metadata management • Integrate with other data sources using a single interface • Realized cost savings • According to Gartner, 200% increase in maintenance costs when custom coding (https://www.gartner.com/doc/3432617/does-customcoded-data-integration-stack) 9 Why Oracle Data Integration?
  11. 11. 10 Streaming with Oracle Data Integration
  12. 12. 10 Streaming with Oracle Data Integration Real-time data replication Streaming integration: OGG -> Kafka Streaming integration: Kafka -> Spark Streaming
  13. 13. 11 Relational database transactions to Kafka
  14. 14. • GoldenGate • …is non-invasive • …has checkpoints for recovery • …moves data quickly • …is easy to set up 12 Why GoldenGate with Kafka?
  15. 15. • Heterogeneous sources and targets • Built to integrate all data • Flexibility • Reusable code templates (Knowledge Modules) • Reusable Mappings • ODI can adapt to your data warehouse - and not the other way around • Flow based mappings 13 Why Oracle Data Integrator with Spark Streaming?
  16. 16. Getting started with streaming using Oracle Data Integration
  17. 17. • Standard GoldenGate Extract / Pump processes to capture RDBMS data • Replicat for Java parameter file & process group created and set up • Kafka Producer properties and Kafka Handler configuration setup 15 Oracle GoldenGate for Big Data - Kafka Handler Setup
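For reference, a Replicat parameter file for the Big Data adapter typically looks like the sketch below; the group name, schema names, and file paths are assumptions, not the presenter's actual configuration. The TARGETDB LIBFILE line hands the captured trail data to the Java adapter, which in turn loads the Kafka Handler properties file.

```
-- dirprm/rkafka.prm (illustrative sketch; group and schema names are assumptions)
REPLICAT rkafka
-- Pass trail records to the Java adapter and its Kafka Handler properties file
TARGETDB LIBFILE libggjava.so SET property=dirprm/kafka.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAP SRC_SCHEMA.*, TARGET SRC_SCHEMA.*;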
  18. 18. • Kafka handler properties • Set properties for how GoldenGate interacts with Kafka • Format, transaction vs operation mode, etc • Kafka producer configuration 16 GoldenGate for Kafka setup http://mrainey.co/ogg-kafka-oow
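As a rough example of the two files mentioned above, the sketch below shows a Kafka Handler properties file and the Kafka producer configuration it references. Property names follow the OGG for Big Data 12.x documentation (later releases renamed some of them); the topic name, format, classpath, and broker address are assumptions.

```
# dirprm/kafka.props -- Kafka Handler properties (illustrative)
gg.handlerlist=kafkahandler
gg.handler.kafkahandler.type=kafka
gg.handler.kafkahandler.KafkaProducerConfigFile=custom_kafka_producer.properties
gg.handler.kafkahandler.TopicName=oggtopic
gg.handler.kafkahandler.format=json
gg.handler.kafkahandler.mode=tx
gg.classpath=dirprm/:/path/to/kafka/libs/*

# dirprm/custom_kafka_producer.properties -- standard Kafka producer settings
bootstrap.servers=localhost:9092
acks=1
compression.type=gzip
batch.size=102400
linger.ms=10000
```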
  19. 19. 17 Kafka and Oracle Data Integrator setup
  20. 20. 17 Kafka and Oracle Data Integrator setup
  21. 21. • Create Model using Kafka Logical Schema • Create Datastore • Similar to standard “File” datastore, define file format and set up columns • Only support for CSV • Future formats may include JSON, Avro, etc • Add Datastore to mapping 18 Kafka and Oracle Data Integrator
  22. 22. • Create Spark Data Server, Physical / Logical Schema • Set Hadoop Data Server • Add properties, such as checkpointing, asynchronous execution mode, etc • Additional properties can be added: http://spark.apache.org/docs/latest/configuration.html • Spark Server is set up as the Staging location • Source Datastore from Kafka, Oracle DB, etc • Target Datastore is Cassandra, Oracle DB, etc • Code generated by KM is pySpark • pySpark code can be added to filters, joins, other components for transformations • Additional languages (Scala, Java) may be coming soon 19 Spark Streaming and Oracle Data Integrator
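The mapping itself is built graphically in ODI Studio, but the pySpark the Spark Streaming KMs generate is roughly of this shape: read the Kafka topic as a direct stream, apply the mapping's filter/join logic as pySpark expressions, then load each micro-batch into the target. This is only a hand-written sketch, not actual KM output; the topic, broker, column positions, and target path are assumptions.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="StreamingMappingSketch")
ssc = StreamingContext(sc, 30)               # 30-second micro-batches
ssc.checkpoint("/tmp/odi-checkpoint")        # checkpointing, as set on the Spark data server

# Direct stream from the Kafka topic GoldenGate is producing to (topic/broker are assumptions)
kafka_stream = KafkaUtils.createDirectStream(
    ssc, ["oggtopic"], {"metadata.broker.list": "broker1:9092"})

# Messages arrive as (key, value) pairs; split the CSV payload into columns
rows = kafka_stream.map(lambda kv: kv[1].split(","))

# Filter component, expressed as a pySpark lambda (column position is an assumption)
completed = rows.filter(lambda cols: cols[3] == "COMPLETE")

# Stand-in for the load into the target datastore (Oracle, Cassandra, etc.)
def write_batch(time, rdd):
    if not rdd.isEmpty():
        rdd.saveAsTextFile("/data/target/orders/" + time.strftime("%Y%m%d%H%M%S"))

completed.foreachRDD(write_batch)

ssc.start()
ssc.awaitTermination()
```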
  23. 23. 20 Spark Streaming and Oracle Data Integrator Enable the Streaming flag in the Physical design of a mapping. To generate Spark code, set the Execute On Hint option to use the Spark data server as the staging location for your mapping. Target IKM should not be set; the Spark-generated code will handle integration and load into the target.
  24. 24. 21 Tracking the process When executing, the process will run continuously in the ODI Operator. If the connection between the ODI Agent and Spark Agent is lost, it will reestablish itself after recovery.
  25. 25. • Streaming is the “velocity” in data. AKA “Fast Data” • Oracle Data Integrator and Oracle GoldenGate provide a framework for development and management of data streaming processes • Big Data add-ons continue to support new technologies • Build a streaming architecture using GoldenGate and ODI: • Metadata management • Integration of RDBMS data with “schema on read” data • Build upon the skills in-house 22 Recap
  26. 26. 23 we liberate enterprise data thank you!
