Intelligent Integration OOW2017 - Jeff Pollock
1. Copyright © 2017, Oracle and/or its affiliates. All rights reserved.
OpenWorld 2017
Intelligent Integration and Governance
Next-Gen Enterprise Data Management
Greg Pavlik
Senior Vice President & CTO of PaaS
Jeff Pollock
Vice President, Product Management
Confidential – Oracle Internal/Restricted/Highly Restricted
2.
Data Management is going through a
major transformation…
3.
The future is already here; it’s just not very evenly distributed. - Gibson
TRADITIONAL APPROACH
• Enterprise data architecture is a hub-and-spoke Kimball-style solution
• Big data technologies are mostly for analytic use cases that demand the 3 V’s
• IT will provide the single source of truth of the data
EMERGING SOLUTION
• Data Streams in Motion
• Big Data is for Analytic and Operational IT Use Cases for Any Type of Data
• Trusted Data means Raw Data
4.
[Diagram, classical: App/DB and ERP sources feed ETL into an Operational Data Store and an EDW (Staging, Prod); further ETL loads Marts for Enterprise BI, Departmental BI and Discovery.]
[Diagram, next-gen: App/DB, ERP, WebApps and Mobile sources publish to Raw Data Topics and Schema Event Topics (1,000’s); a Data Pipeline (ETL) produces Prepared Data Topics (100’s); a second Data Pipeline (ETL) produces Master Data Topics (10’s). A RESTful API serves Producers and Subscribers (events are pushed). Consumers are EDW, NoSQL, Hadoop / Spark, Marts, Discovery, Enterprise BI, Departmental BI and Apps / Mobile, on an axis from Less Governed to More Governed.]
Classical Data Management: Hub and Spoke
• Invasive on Sources
• High Latency / SLA
• Mainly Relational Views
• Heavy IT process overhead
• Vendor-centric software
Next-Gen: Streaming Databus/Kappa
• Low impact on Sources
• Low Latency (< 1 second)
• Variety of Data Formats
• More Agile DevOps processes
• Open source centric software
After a 20-year reign… hub-and-spoke is now legacy
LEGACY:
• ODS & ETL Hubs
• EDW/Mart Hubs
• MDM/RDM Hubs
• Static Data Lake Hubs
NEXT-GEN:
• Pub/Sub for Staging
• ETL in Pipelines
• Analytics/CEP in Stream
• Data is in Motion
[Diagram labels: GoldenGate, MDM Hub, NoSQL / APIs; axis from Less Governed to More Governed]
Physical Layer for ETL Pipelines = MPP Streaming (e.g., Apache Spark Streaming)
Physical Layer for Events = MPP Messaging (e.g., Apache Kafka)
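The databus idea on this slide (raw topics, ETL in pipelines, pushed events) can be sketched in a few lines of Python. This is only an illustrative in-memory stand-in, not Kafka or any Oracle API: the `Databus` class, the topic names and the customer record shape are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class Databus:
    """Toy in-memory stand-in for an MPP messaging layer (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Events are pushed to every subscriber of the topic.
        for handler in list(self._subscribers[topic]):
            handler(event)


def build_pipeline(bus: Databus) -> List[dict]:
    """Wire raw -> prepared -> master topics; return the list a consumer fills."""
    delivered: List[dict] = []
    seen_ids = set()

    def prepare(event: dict) -> None:
        # First pipeline stage: normalize the raw record and republish it.
        cleaned = {
            "customer_id": int(event["id"]),
            "name": event["name"].strip().title(),
        }
        bus.publish("prepared.customers", cleaned)

    def master(event: dict) -> None:
        # Second pipeline stage: keep one record per customer id.
        if event["customer_id"] not in seen_ids:
            seen_ids.add(event["customer_id"])
            bus.publish("master.customers", event)

    bus.subscribe("raw.customers", prepare)
    bus.subscribe("prepared.customers", master)
    bus.subscribe("master.customers", delivered.append)
    return delivered
```

A producer only ever writes to the raw topic; each downstream stage is just another subscriber, which is why the number of topics can shrink as data becomes more governed.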
5.
In the beginning there was MapReduce… and it was slow
• Original purpose was workloads on peta-scale search indices
• With the rise of Enterprise use cases, focus shifted to Analytic Data
• 1st gen of big data in the enterprise reused the old approach (hub-and-spoke) using HDFS with MR2
Next-Gen: Kappa is the Platform
• Operational Applications can use big data for low-latency use cases
• Streaming for event correlation and IoT-style integrations (real-time)
• Analytics for traditional use cases
• Cloud/Object Storage is “the lake”
[Diagram: Structured Data and Multi-Structured Data enter the Raw Data Layer, pass through the Speed Layer and Batch Layer, and reach the Serving Layer for Applications, Analytics and Reporting.]
6.
Trusted data used to mean… single source of truth
[Diagram, 1990’s: Extract, Cleanse, Stage, Summarize, Create Views]
“Hello, is this the IT team? Yes, I need a single source of truth for my data.”
“Um, 2 years?”
• IT-heavy waterfall delivery cycle creates drag on the business
• Over-optimizing data distorts truth and diminishes fidelity of data
• Data is like food: trust it more when it is minimally processed
Next-Gen: Sushi Principle of Data
• Raw Data accessible to business
• Subscriptions to data objects
• API Access: REST API vs. SQL
• Agile DevOps process to deliver
• Schema/less can handle all data
7.
Core Design Pattern: Kappa-style Databus
Our Vision is to enable the modern ‘Kappa-style’ data architecture for Enterprise Strength solutions
• Raw Data Layer: common ingestion point for all enterprise data sources
• Speed Layer: data processing for streaming data and ETL data pipelines, in-memory
• Batch Layer: data processing for huge data volumes that may span long time periods, using MPP
• Serving Layer: technologies for easy access to any data, at any latency
[Diagram: Business Data, Data Streams, Social and Logs, Enterprise Data and Highly Available Databases feed the Raw Data Layer (Raw Events, Changed Data, Schema Events) over a Databus (topic modeling). The Speed Layer runs Stream Analytics and ETL Data Pipelines; the Batch Layer covers Data Staging or Archive, Data Discovery and ETL Offload. The Serving Layer exposes Pub / Sub, REST APIs, NoSQL and Bulk Data to Apps, Analytics and EDWs.]
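One way to picture the role of the Raw Data Layer is as an append-only log from which any serving view can be rebuilt by replay. The sketch below assumes a hypothetical `(account, amount)` event shape and invented names (`RawLog`, `build_view`); it illustrates the replay idea only and is not a description of any Oracle product.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Event = Tuple[str, int]  # (account, amount): hypothetical event shape
Reducer = Callable[[Dict[str, int], Event], None]


class RawLog:
    """Append-only raw data layer; events are never updated in place."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read(self, from_offset: int = 0) -> Iterator[Event]:
        return iter(self._events[from_offset:])


def build_view(log: RawLog, reducer: Reducer) -> Dict[str, int]:
    """Rebuild a serving view from scratch by replaying the whole log."""
    view: Dict[str, int] = {}
    for event in log.read(0):
        reducer(view, event)
    return view


def balance(view: Dict[str, int], event: Event) -> None:
    account, amount = event
    view[account] = view.get(account, 0) + amount


def deposit_count(view: Dict[str, int], event: Event) -> None:
    account, amount = event
    if amount > 0:
        view[account] = view.get(account, 0) + 1
```

Because the raw events are kept, a new view such as `deposit_count` can be added later by replaying the same log through different logic, without going back to the sources.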
8.
Oracle Approach: Blend of Commercial + Open Source
[Diagram: Kappa-style layers. Sources (Business Data, Data Streams, Social and Logs, Enterprise Data, Highly Available Databases) feed the Raw Data Layer, then the Speed Layer and Batch Layer (Analytics), then the Serving Layer (Pub / Sub, REST APIs, NoSQL, Bulk Data) for Apps, Analytics and EDWs.]
Modern Architecture will be a ‘Hybrid Open-Source’ pattern:
• Open Source at the core of speed and batch processing engines for general purpose data workloads
• Enterprise Vendors for connecting to legacy systems, strong governance, and for highly optimized workloads
• Cloud Platforms for Dev-Test (at least), rapid prototyping and eventually all production workloads
• SaaS & Applications are key data “producers” and will remain largely proprietary and/or highly customized
9.
Proof this is a Pattern: Many Instantiations
• Open source: Kafka; Storm | Spark | Apex | Flink; MapReduce | Pig | Hive | Spark; Cassandra | HBase; Hive
• Microsoft Azure: EventHubs; Stream Analytics; Data Lake; Table Storage; SQL Server; Data Factory
• AWS: Kinesis; Firehose; EMR; Dynamo; Redshift; DMS
• Google Cloud: Pub/Sub; Dataflow; Dataproc; BigTable; BigQuery
10.
Best-of-Breed: Oracle Platform for Kappa-style Architecture
Oracle Software can help customers Accelerate & Reduce Risk around adoption:
• Ingest Data with lower latency, greater reliability and from any database using Oracle GoldenGate
• ETL Pipelines for Data: automate pipeline creation with zero footprint using Oracle Data Integrator
• Analyze Data In-Motion: run temporal, spatial and predictive algorithms with Oracle Stream Analytics
• Foundation Services for hosting Kafka (Event Hub), Spark/Hadoop (Big Data Cloud) or Relational (Database)
• Govern the data flowing through the Kappa architecture with Oracle Metadata Management
[Diagram: the Kappa-style layers annotated with Oracle products: GoldenGate, Data Integrator, Stream Analytics, Event Hub, Big Data, Database, Cassandra, and Metadata Management (for Data Governance).]
11.
Oracle Big Data Platform Differentiators
TECHNICAL DIFFERENTIATION
• Industrial strength data ingest (compare to Sqoop / ETL tools)
• Streaming data pipelines: ETL pipelines, analytic pipelines (compare to old-style batch ETL processing and immature open source)
• World class infrastructure: Event hub cloud (Kafka), Big data cloud (Spark) (compare to 1st Gen Cloud, AWS etc., and proprietary services)
• Enterprise data governance (compare to Hadoop only)
Complete | Simplified | Open
12.
Industrial Strength Data Ingest: GoldenGate + Kappa
Fastest, most scalable and non-invasive way to ingest data into Apache. Benefits of low impact on Sources, microsecond access to transactions, and the ability to replicate schema (DDL) events for downstream automation of change impact.
GG is used with 4 of the top 5 largest Kafka clusters in the world…
From user update to serving layer in <1 second & no impact on Source
[Diagram: User Updates and DBMS Updates are captured by GoldenGate (Capture, Trail, Route, Deliver, Pump) and delivered by GoldenGate for Big Data to supported platforms (Kafka, HDFS) in the Raw Data Layer. The Speed Layer (Streaming Analytics) and Batch Layer feed the Serving Layer (REST Services, Visualization Tools, Reporting Tools, Data Marts) and the Apps Layer (Application).]
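The point about replicating schema (DDL) events alongside row changes can be illustrated with a small consumer that evolves its own table when a DDL event arrives. The event dictionaries below (`op`, `key`, `values`, `column`) are a made-up format for this sketch, not the GoldenGate record format.

```python
from typing import Dict, List, Optional


class DownstreamTable:
    """Replica that applies row changes and schema events from one stream."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)
        self.rows: Dict[int, Dict[str, Optional[object]]] = {}

    def apply(self, event: dict) -> None:
        op = event["op"]
        if op == "add_column":
            # Schema (DDL) event: evolve the replica and backfill with None.
            self.columns.append(event["column"])
            for row in self.rows.values():
                row[event["column"]] = None
        elif op in ("insert", "update"):
            # Row change: create the row if needed, then overlay new values.
            empty_row = {column: None for column in self.columns}
            row = self.rows.setdefault(event["key"], empty_row)
            row.update(event["values"])
        elif op == "delete":
            self.rows.pop(event["key"], None)
        else:
            raise ValueError(f"unknown operation: {op}")
```

Since the DDL event travels in order with the data, the update that first uses the new column arrives after the replica already knows about it, so no manual schema change is needed downstream.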
13.
De-Coupling of the Database: Downstream Processing
OLD Approach: “Just the Facts!”
• On-host installation of Change Data Capture (CDC) tools
• For typical DW use cases, only partial data sets were replicated
• Focus was on summary data in the analytic model, not the detail data
NEW Approach: “All the Data!”
• Mid-tier installation of replication tools is necessary (non-invasive)
• Focus is now on moving all the data into the data lake/data stream
• Raw data means detail data
• New use cases demand access to more expansive (raw) data sets
[Diagram: four capture topologies, each with Business Apps on a Primary Site and GoldenGate (Capture, Trail, Route, Deliver, Pump) on a separate host: (1) Mid-Tier Host with Primary, Secondary Log Mine, and Active DataGuard REDO Transport over the WAN; (2) Remote DR Host with Primary, Secondary, Remote Standby, and AlwaysOn over the WAN; (3) DB2/z with SP and an LPAR / Linux Host; (4) MySQL with a Linux Host.]
14.
ETL Pipelines with Data Integrator
Data Integrator for Big Data
✓ Batch data ingestion with Sqoop, native loaders & Oozie
✓ Generate data transformations in Hive, Pig, Spark & Spark Streaming
✓ Extract data into external DBs, Files or Cloud
Compare to Informatica / Talend
✓ NoETL Engine: native E-LT execution, 1000’s of references
✓ Zero Footprint: does not require any Oracle install on cluster
✓ Loosely Coupled: design time means you can reuse mapping logic in many big data languages
[Diagram: GoldenGate (Capture, Trail, Route, Deliver, Pump), Sqoop, API/File, and Sqoop + Native Loaders feed the Raw Data Layer. Oracle Data Integrator drives Streaming ETL in the Speed Layer and ETL in the Batch Layer. The Serving Layer offers REST Services, Visualization Tools, Reporting Tools and Data Marts.]
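The "loosely coupled" point, one logical mapping reused across several execution languages, can be sketched as a declarative mapping plus one small code generator per target. Everything here (`Mapping`, `to_hive_sql`, `to_pyspark`, the table names) is hypothetical and only hints at the idea; it is not how Oracle Data Integrator is implemented, and the generated text is produced but not executed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Mapping:
    """Language-neutral description of one source-to-target mapping."""
    source: str
    target: str
    columns: Dict[str, str]  # target column -> source expression
    where: str


def to_hive_sql(m: Mapping) -> str:
    """Render the mapping as a Hive-style INSERT ... SELECT statement."""
    select_list = ", ".join(
        f"{expr} AS {col}" for col, expr in m.columns.items()
    )
    return (
        f"INSERT INTO TABLE {m.target} "
        f"SELECT {select_list} FROM {m.source} WHERE {m.where}"
    )


def to_pyspark(m: Mapping) -> str:
    """Render the same mapping as PySpark DataFrame code (as text)."""
    # Assumes expressions contain no double quotes (no escaping is done).
    exprs = ", ".join(
        f'"{expr} AS {col}"' for col, expr in m.columns.items()
    )
    return (
        f'spark.table("{m.source}").where("{m.where}")'
        f'.selectExpr({exprs}).write.insertInto("{m.target}")'
    )
```

The mapping is defined once at design time; choosing Hive or Spark becomes a deployment decision, and the generated code runs on the cluster's own engine rather than on a separate ETL server.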
15.
Analytic Pipelines with Stream Analytics
[Diagram: GoldenGate (Capture, Trail, Route, Deliver, Pump) publishes a native GoldenGate stream type, alongside Events & Cloud Apps, into the Raw Data Layer. Oracle Stream Analytics applies CQL, ML and Geo analytics in the Speed Layer (Streaming Analytics), next to the Batch Layer. The Serving Layer (REST Services, Visualization Tools, Reporting Tools, Data Marts) feeds a Business Dashboard, Apps, Analytics and EDWs.]
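A temporal query over a stream usually boils down to a window that slides with event time. The sketch below is a plain-Python analogue of a range-based window average; it is not CQL and not the Oracle Stream Analytics API, and the class name and parameters are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingWindowAverage:
    """Average of the values seen in the last `window_seconds` of event time."""

    def __init__(self, window_seconds: float) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, float]] = deque()
        self._total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Add one event (timestamps must be non-decreasing); return the average."""
        self._events.append((timestamp, value))
        self._total += value
        # Evict events at or before the window's lower bound. The event just
        # added is always inside the window, so the deque never becomes empty.
        while self._events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self._events.popleft()
            self._total -= old_value
        return self._total / len(self._events)
```

Each incoming event updates the result immediately, which is the difference from a batch job that recomputes the average on a schedule.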
16.
World-class Infrastructure for Big Data
• Elastic workload fabric to lower TCO and right-size cluster based on workload/SLA requirements
• A seamless storage fabric across Cloud Storage, Block Storage, Local NVMe and Memory for your Big Data Clusters
• A broad choice of Storage engines including relational databases (Oracle, MySQL) and NoSQL Databases (Cassandra, HBase, Oracle NoSQL)
• REST or Native APIs deliver extremely low latency for real-time processing needs
• Start with a single partition and scale up to hundreds of them as your needs grow
• Complexity of managing Kafka is handled by Oracle
• Node-Failure and Cluster-Failure Tolerance ensures continuous availability
• Big Data Cloud Machine for PaaS-like experience in customer data center
• Same benefits and user experience as Oracle PaaS
• Data stays onsite with customer
• Big Data Appliance for customer-owned solutions
• Customer owns full solution
• Very low startup costs
• Highly optimized platform
Oracle Event Hub Cloud | Oracle Big Data Cloud | Oracle Bare Metal Cloud
17.
Kappa Data Flow Pattern using Oracle Tech Stack
[Diagram: Data Producers (DBMS Updates from the Entire Enterprise Database Estate) are captured by GoldenGate into Oracle Event Hub as Raw Data (LCR) topics, 1 Topic : 1 Table, and Schema Events (DDL). ETL stages produce Prepared Data Topics and then Master Data. Data Integrator and Stream Analytics (CQL & Spatial, Analytic Data) generate the pipelines on Oracle Big Data, with a Dev / Test Env. Data Consumers subscribe: Applications, Streaming Analytics, ODS (Data Store), Big Data Lake, Data Warehouses. API Management fronts the flow.]
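The "1 Topic : 1 Table" convention in this flow amounts to a routing rule from a change record to a topic name, with schema events kept on their own topic. The record fields and the topic naming scheme (`raw.<schema>.<table>`, `schema.events`) below are assumptions for illustration, not an Oracle Event Hub or GoldenGate convention.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def topic_for(record: dict) -> str:
    """Pick the destination topic for one change record."""
    if record["type"] == "ddl":
        # All schema events share one topic so consumers can track changes.
        return "schema.events"
    # Row changes: one topic per source table.
    return f"raw.{record['schema']}.{record['table']}".lower()


def route(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group records by topic, preserving arrival order within each topic."""
    topics: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        topics[topic_for(record)].append(record)
    return dict(topics)
```

With one topic per table, a consumer subscribes only to the tables it cares about, and ordering is preserved per table.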
18.
Data Views from a Pipeline: Subscribe to Canonical Topics
[Diagram: All Enterprise Data Sources (Poly-Structured, Relational) produce into canonical topics: RAW DATA, then ETL into PREPARED DATA, then ETL into MASTER DATA, plus SCHEMA EVENTS. Data Science, Data Analysts, Business Analysts and DBAs subscribe to the topics.]
19.
Automate & Transform
• Intuit migrated 100’s of DBs to Cloud and loads TBs of data into a streaming data fabric (Kafka)
• Tesla replicates databases and links its SaaS Applications to the Enterprise Data Warehouse
Stream & Replicate
• SFDC trickle-feeds data to its corporate Data Warehouse and ensures 24x7x365 availability
• Starbucks streams data from POS Systems into DW and Data Lake
• eBay streams 100B transactions per day into private cloud
Cleanse & Govern
• Allianz harvests from non-Oracle tools into an enterprise data catalog for complete data lineage
• Cummins operates a global Governance council with data quality and metadata tools
>15,000 Oracle Data Integration Customers Worldwide
21.
NEW: Oracle GoldenGate 12.3 is Available
New GoldenGate Micro-Services Architecture
REST interfaces for Configuration, Administration and Monitoring with included HTML5 Web applications
Deploy at cloud scale with fully secure HTTPS interfaces and Secure Web Sockets for streaming data
Oracle Database Sharding Support
Embedded & fully automated active-active replication within and across Database Shards
Automatic Conflict Detection and Resolution (CDR)
Conflict detection and resolution built directly into Oracle Database Kernel
Heterogeneous Enhancements
Generic JDBC Support. MySQL DDL replication. Remote capture for z/OS & SQL Server
Cloud – Data Integration
Integrated UI for Design, Development, Management and Monitoring for GoldenGate Cloud Service
Parallel Apply
Highly scalable client side apply engine with gains of over 5x throughput on Oracle
Enhanced Big Data, Kafka and NoSQL Support
Performance improvements, Adapters for Oracle NoSQL, MongoDB. Support for Hive Metadata
Expanded Oracle Database 12.2 Support
Long identifiers, Expanded SCN, Local Undo for PDBs, Top-level VARRAYs, REFs.
Procedural replication of Advanced Queues, Virtual Private Database, Online Redefinition, ….
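For readers new to conflict detection and resolution in active-active replication, the sketch below shows one common policy, latest-timestamp-wins with a deterministic tie-break. It is a generic illustration under assumed names (`Version`, `resolve`, `apply_change`) and says nothing about how the Oracle Database kernel implements CDR.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Version:
    """One site's version of a row (hypothetical shape)."""
    value: str
    updated_at: float
    site: str


def resolve(local: Version, incoming: Version) -> Version:
    """Latest timestamp wins; ties are broken by site name so sites converge."""
    return max(local, incoming, key=lambda v: (v.updated_at, v.site))


def apply_change(store: Dict[int, Version], key: int, incoming: Version) -> None:
    """Apply a replicated change, resolving a conflict if the row exists."""
    current = store.get(key)
    store[key] = incoming if current is None else resolve(current, incoming)
```

The property that matters is convergence: two sites that receive the same changes in a different order end up with the same row.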
22.
NEW: Oracle Data Integration Platform
Integrate Cloud and On Premise Data Lakes and Data Warehouses
…a Unified solution …that’s Easy to use …for Powerful data-driven solutions
Key Capabilities
1. Data High Availability
2. Data Migrations
3. Data Warehouse Automation
4. Databus & Stream Integration
5. Data Governance
23.
New: Unified Integration Capabilities
Converged Solution for All Integration Needs
Unified Technology Platform (PaaS)
Integration for… Applications, Infrastructure, Analytics
Capabilities: Application Integration, API Management, Process Integration, Stream Processing, Data Replication, Bulk Data ETL & E-LT, Metadata Management, Data Quality
24. | 10/9/2017
Oracle Cloud Platform for Integration
Complete | Simplified | Open
[Diagram: DATA GOVERNANCE, PROCESS AUTOMATION, STREAM ANALYTICS, API MANAGEMENT, APPLICATION INTEGRATION, DATA QUALITY, BULK DATA TRANSFORMATION, REAL TIME DATA STREAMING AND DATA REPLICATION]