The Future of Data Integration:
Data Mesh, and a Special Deep Dive into Stream Processing with
GoldenGate, Apache Kafka and Apache Spark
Oracle Development, Mar-2020
Copyright © 2020, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Today’s Agenda
1. Strategic IT Forces
2. GoldenGate Big Data in Brief
3. Database Change Streams to Apache Kafka
4. Stream Processing with Apache Spark
During 2020 / 2021 the world continues to go through a Paradigm Shift into a future where “Cyber-Physical Systems” are the new normal.
“Digital Transformation” requires a mindset shift:
1. Sharing data is more effective than accumulating it
2. Decentralizing, distributing, and copying is more powerful than stockpiling
3. Connectivity and flow of data is the starting point for innovation and socializing.
The Problems of Monolithic Data Architecture
People Process Technology
• Business units have few incentives to
work across org boundaries
• Hyper-specialization in tech teams
narrows the focus to technology
rather than outcomes or solutions
• Pressure on stakeholders to
produce value, but IT orgs are still
mostly built like they were 30 years ago
• 30yr old conception of data flows
• IT-led and technology constrained
• The monolithic data lake is big and
slow by design, not by accident
• “Ingest -> Process -> Serve” design
is same as old “ETL” data flows, it
institutionalizes the wrong goals
• Storage centric conception of data,
but data is Dynamic, not Static
• “the Lake” is conceived as a physical
place where we pile up data (in
Hadoop or on the Cloud)
• Cheaper storage than an EDW but
much worse Governance
• Does nothing to modernize data
architecture itself
Images: https://martinfowler.com/articles/data-monolith-to-mesh.html
Evolution towards Real-Time Data Mesh
Industry 3.0: Hub and Spoke
This data pattern, popularized by Ralph Kimball and Bill Inmon, has been the foundation for enterprise data management since 1993. It is transaction consistent, can scale up nicely for most use cases, and is based on SQL, the lingua franca for most tools.
Transitional: Kappa Hub
By 2010, the Lambda (big data) pattern was common. In 2014, Jay Kreps (of LinkedIn) questioned the Lambda Architecture and spawned Kappa. The Kappa principles consider batch processing as a special case of stream processing: a single historized event log serves both real-time and batch processing.
Mature: Distributed Kappa
By 2020, IT infrastructure has dramatically changed: networking, containers, cloud, compute, IoT etc. have all pushed data to the edge. A mature Kappa architecture is not a single-instance “hub” but rather a distributed mesh of data logs, stream data processing, change events, and time series data.
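The Kappa idea that batch is a special case of streaming can be sketched in a few lines. This is a minimal illustration only: `TotalsJob`, the event shape, and the in-memory list standing in for the historized event log are all assumptions, not part of any product described here.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (account id, amount): a hypothetical event shape


class TotalsJob:
    """One piece of processing logic, used for both replay and live events."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {}

    def process(self, event: Event) -> None:
        account, amount = event
        self.totals[account] = self.totals.get(account, 0) + amount


def replay(log: List[Event], job: TotalsJob) -> TotalsJob:
    """'Batch' processing is just feeding the whole log through the job."""
    for event in log:
        job.process(event)
    return job


# An append-only list stands in for the durable, historized event log.
event_log: List[Event] = [("a", 10), ("b", 5), ("a", -3)]

job = replay(event_log, TotalsJob())      # batch: rebuild state from history
batch_totals = dict(job.totals)

job.process(("b", 7))                     # real time: same code, next event
live_totals = dict(job.totals)
```

Because both paths share `process`, there is no second batch codebase to keep in sync, which is the point Kreps made against Lambda.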
Kappa: https://www.oreilly.com/radar/questioning-the-lambda-architecture/
https://en.wikipedia.org/wiki/Dimensional_modeling
mesh & microservice controls
Lambda: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
Real-Time Data Mesh for Industry 4.0
…from Industry 3.0 | …to Industry 4.0
Batch Centric, Schedulers | Event Centric, Streams
Mostly Relational Data (aka Views) | Polyglot Data (via Logs)
Size for Peak Workloads | Elastic, Scale on Demand
Kimball / Inmon Architecture | Distributed Kappa
Vendor Specific | Open Source Enabled
Simplex Processing is Standard | Massively Parallel is Standard
Hubs (EDW, Hadoop, Data Lake) | Mesh (Edge, Hybrid, Multi-Cloud)
Governance is “Bolt On” | Governance is Embedded
Data Mesh Conceptual View – Data Domains
Enterprise Data
Producers:
ERP Apps, DBs,
Middleware etc.
Data Domain
Consumers
People owners of “Data
Products”, collections of
data sets in various
stages of curation
IoT Data
Producers:
Devices & Things
Raw Data
Prepared Data
Canonical Data Data
Domain A
Data
Domain B
f(x)
f(x)
Data
Domain C
Data Mesh
(distributed Kappa, microservices, cloud agnostic)
Domain-Specific Views of Data
Raw Event Consumers
Automated Devices,
Edge Nodes (5G), Scheduled
Routines (eg; ETL etc)
Data Product-Specific
Storage Choices:
• RDBMS
• Data Lake
• Object Store
• Graph, etc.
Raw data, Time Series & Alerting events are pushed
Direct to Database (high fidelity transaction semantics fully preserved)
Consumer-Driven, Event-Centric Data Mesh
Enterprise Data
Producers
Detect
Event
Logical
Change
Records
(LCRs)
App
DB
committed!
CDC Replication
Data Domain
Consumers
Data
Objects
Table
Data
Raw Data
/ Alerts
SQL
Consumers
Raw
Data
Prepared
Data
Canonical
Data
Raw Data (LCR)
Schema Events
(DDL)
Prepared
Data Topics
“Master”
Data Topics
JSON, XML,
Avro, Parquet,
CSV
Prepared data events are pushed
Canonical data events
Speed &
Fidelity
Trusted
Views
Ease of
Consumption
LCR/TFs
Applications,
Data Services
Biz Consumers
Analytics &
Data Marts
Data Science
& Streaming
Applications
DBAs for HA,
DR and OLTP
Data Mesh puts the consumer
needs first – they require data
at different latency, fidelity,
trust levels and views
Data Model
Object Model
System
Of Record
(SoR)
User
Action
App APIs and
system log events
Direct to Database (high fidelity transaction semantics fully preserved)
Distributed by Design, Microservices Based
Data Domain
Producers
Detect
Event
Logical
Change
Records
(LCRs)
App
DB
committed!
Data Domain
Consumers
Data
Objects
Table
Data
Raw Data
/ Alerts
SQL
Consumers
Data Model
Object Model
System
Of Record
(SoR)
User
Action
CDC Replication
Microservices
Edge
Compute or
Cloud for Raw
Data Events
Prepare
Technical
Data Views
LCRs
Business
Data Views
Raw data, Time Series & Alerting events are pushed
Prepared data events are pushed
Canonical data
Events
(ephemeral or persisted)
Stream
Process
Events
(persisted)
Stream
Process
Events
(persisted)
Applications,
Data Services
Biz Consumers
Analytics &
Data Marts
Data Science
& Streaming
Applications
DBAs for HA,
DR and OLTP
We are creating this today: a Data Integration Platform for Industry 4.0, a Real-Time Data Mesh solution for everyone:
✓ Data Consumer Driven (low code, browser-based)
✓ Distributed by Design (multi-cloud, microservices)
✓ Event-Centric Pipelines (CDC, replication and streaming)
✓ Immutable Ledger Based (fully aware of the event history of SoRs)
✓ Polyglot Capable (works with all data types)
✓ ACID Capable (preserves DB transaction semantics)
✓ Governed (multi-level security, metadata driven)
✓ Enterprise Class (best of Open Source and Commercial S/W together)
Single Pane of Glass for Real-Time Data Mesh
connect
DB2/z
Data
Objects
Table
Data
Raw Data
/ Alerts
SQL
Consumers
Applications,
Data Services
Biz Consumers
Analytics &
Data Marts
Data Science
& Streaming
Applications
DBAs for HA,
DR and OLTP
Real-Time Stream
Data Processing
Raw
Data
DBAs &
Data Engineers
Data Owners &
Data Products
Data Consumer Driven
Event Centric Pipelines
Deploys in a Mesh
Across Containers, Public Clouds and 5G Edge Devices
How it Works Today: GoldenGate for Big Data
Data Domain
Consumers
Data
Objects
Table
Data
Raw Data
/ Alerts
SQL
Consumers
Applications,
Data Services
Biz Consumers
Analytics &
Data Marts
Data Science
& Streaming
Applications
DBAs for HA,
DR and OLTP
BYOS (Bring Your Own Spark)
* distributed, may run on any combination of containers and clouds
Data Engineer | Data Analyst | DBA/GG Ops
Capture | Pipeline | Analyze | Deliver | Ingest
GoldenGate Microservices Applications Stream Analytics Application
BYOM
(Bring Your
Own
Messaging)
All Data Events
& Transactions
Today’s Agenda
1. Strategic IT Forces
2. GoldenGate Big Data in Brief
3. Database Change Streams to Apache Kafka
4. Stream Processing with Apache Spark
GoldenGate Overall: for the Enterprise
DB2/z
Replication of
Real-time Data
Transactions & Events
GoldenGate Stream Analytics
ETL
&ML
DBMS
Cloud
Big Data
NoSQL
Streams
Object
Storage
Relational
Non-
Relational
Apps
https://www.oracle.com/middleware/technologies/goldengate.html
GoldenGate for Big Data
Compare to:
• Open Source tools like
Sqoop or Kafka Connect
• ETL Tools commercial or
open source
• Changed Data Capture
Tools in niche areas
GoldenGate is:
• Simpler to use via
microservices, cloud etc
• Better Performance on
most DBs, esp Oracle
• More Reliable (in high
availability and disaster
recovery situations)
Real-time Stream of Data
Transactions & Data Store Events
Kafka | Object Store
ElasticSearch | HDFS | etc.
Data Lake
Lowest overhead
High fidelity events
Fastest data visibility
No more batch windows
DML, DDL and Procedures
Consistent recovery point
DB2/z
GG for Big Data – What is Included?
Sources
• Java Messaging Service (JMS): typically covers any JMS-compliant source technology
• Cassandra
• Roadmap sources to include Apache Kafka (Connect frameworks) and MongoDB
• GoldenGate trails from all OGG sources, such as Oracle DB (MA & Classic), Microsoft SQL Server, MySQL, IBM DB2, HPE NonStop etc. Note: the source side for relational databases is separately licensed
Targets
• Hadoop (HDFS / Hive, HBase)
• Kafka: Pub/Sub, Kafka Connect, REST
• Elasticsearch
• NoSQL (MongoDB, Cassandra, Oracle NoSQL)
• Oracle Cloud (Object Storage)
• AWS (Redshift, Kinesis, S3)
• Google Cloud (BigQuery)
• Microsoft Azure Data Lake (v1 & v2), Blob Storage
• Flat files (Netezza, Greenplum)
• JMS
• JDBC
Streaming
• Stream Processing for any GG data feed is included (other data sources require full use license)
• Low-code development
• ETL (Filter, Aggregate, Merge, Transform, Load Data)
• Correlate/Enrich
• Alerts, Thresholds, Anomalies
• Business Rules, Data Policies
• Time Series Analysis
• Spatial Analytics, Geo-fence
• Classification, Clustering
• Statistical Inference, Machine Learning, Regression Models
Capabilities Included with GG for Big Data
Foundation Patterns:
• Replication in/out for Non-Relational: Data Lake Ingest, Streaming Ingest, Cloud Ingest, Messaging Replication, NoSQL Replication, SaaS Replication
• Database Replication: Unidirectional, Bi-Directional, Peer-to-Peer, Broadcast, Consolidation, Distribution
• Stream Processing: Data Pipelines, Data Transformation, GoldenGate Integrations, Time Series Analysis, Geo-Fencing, Predictive Analytics
OGG for Big Data – Supported Formats
• Native formats of targets
• HDFS - Sequence File
• Delimited Text (both Row &
Operation modes)
• JSON (both Row & Operation
modes )
• Avro (both Row & Operation
modes)
• XML
• Parquet
• ORC
Deep Storage
Lakes such as
Object Store,
HDFS or Elastic
Tap into Existing HA Deployments
Existing GG Deployments Add GG for Big Data Deployments
Use existing Extract,
no performance
penalty on Source DB
Deep Storage
Lakes such as
Object Store,
HDFS or Elastic
Existing
Applications
Data
Services
Analytics
GoldenGate is Transaction Safe
Deep Storage
Lakes such as
Object Store,
HDFS or Elastic
GoldenGate semantics are fully ACID / transaction-safe with strong HA
Disaster Recovery
GG may be optionally used as a recovery point
for big data, and GG can supply metadata to
downstream Big Data environments about
transaction/commit boundaries.
ACID Compliant
GoldenGate Pipelines
GoldenGate Coordinated Replicat
Thread 1
Thread 2
Thread ..n
GG Delivery
Single
PRM
GG Big Data Targets
• Unified Parameter File, which is read
by each process thread and
determines the operational
configuration of each thread.
• Each apply thread is independent of
the other apply threads. Each thread
opens the OGG Trail for shared read
operations and has a unique entry in
the OGG Checkpoint Table.
• Although each thread functions
independently, an unrecoverable
error condition on any thread will
cause all threads to terminate in the
ABEND state.
• Full barrier coordination is not
performed on foreign keys. Parent
and child tables must be processed
by the same apply thread.
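The constraint that parent and child tables share an apply thread can be illustrated with a small grouping routine. This is a sketch under assumptions: `assign_threads` and the table names are hypothetical, and in GoldenGate itself the thread assignment is expressed in the Replicat parameter file rather than computed like this.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


def assign_threads(
    tables: List[str],
    foreign_keys: Iterable[Tuple[str, str]],  # (child table, parent table)
    num_threads: int,
) -> Dict[str, int]:
    """Map each table to a thread so FK-related tables always share one."""
    parent = {table: table for table in tables}

    def find(table: str) -> str:
        # Union-find lookup with path halving.
        while parent[table] != table:
            parent[table] = parent[parent[table]]
            table = parent[table]
        return table

    # Merge every child with its parent into one group.
    for child, par in foreign_keys:
        parent[find(child)] = find(par)

    # Hash the group's representative so the whole group lands on one thread.
    return {
        table: zlib.crc32(find(table).encode("utf-8")) % num_threads + 1
        for table in tables
    }


threads = assign_threads(
    ["customers", "orders", "order_items", "audit_log"],
    [("orders", "customers"), ("order_items", "orders")],
    num_threads=4,
)
```

Unrelated tables such as `audit_log` are free to land on any thread, which is where the parallelism comes from.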
Deep Storage
Lakes such as
Object Store,
HDFS or Elastic
GoldenGate Big Data with High Availability
GG Capture &
Distribution
GG Big Data Targets
Deep Storage
Lakes such as
Object Store,
HDFS or Elastic
Use preferred HA
mode, depending
on GG Extract
architecture
GG Big Data
Replicat 001
Clustering Technology
such as Oracle
Clusterware, Veritas,
RedHat etc.
Shared-Disk / Durable Storage (DBFS, ACFS, etc.)
GG Big Data
Replicat 002
Trail
Files
.properties
schema files
checkpoints (.cpj)
/dirsta
Today’s Agenda
1. Strategic IT Forces
2. GoldenGate Big Data in Brief
3. Database Change Streams to Apache Kafka
4. Stream Processing with Apache Spark
For Apache Kafka
Lowest overhead
High fidelity events
Fastest data visibility
No more batch windows
DML, DDL and Procedures
Consistent recovery point
DB2/z
Oracle Streaming
Service (OSS)
Some Stats:
• GoldenGate is moving ~4 Petabytes
of data into Kafka every day
• ~300 customers (G2000) use
GoldenGate with Kafka
• Real-world performance in the tens
of thousands of transactions per
second into Kafka
General Problem to Solve
A
B
C
A
B
C
BUSINESS
APPLICATION
Applications,
Data Services
Biz Consumers
Analytics &
Data Marts
Data Science
& Streaming
Applications
System of Record
“Data Producer”
Data Sync &
Stream Processing
System to Serve
“Data Consumers”
SQL
Events DB Log
Events Messaging
User
Events
Some Patterns to Consider
Transaction Consistency
• DB to DB (no Kafka or Big Data)
• Mongo or Apache Hive for some basic ACID properties
• Kafka, using Exactly Once, Transactions and GoldenGate SCN & CSN metadata
Table / Topic Mappings
• One Kafka Topic per DB Table [default setting in GoldenGate]
• Handling Schema Change (AKA: Data Drift)
• One Kafka Topic for all Tables
• Group source data records into different Kafka Partitions
The Change Stream
• Full supplemental GG replication
• Partial supplemental, using DB (Standby) to re-create full records inside Kafka
• Partial supplemental, use Kafka + Cache for Full Records
Deployment Topology
• Mid-tier deployment of GoldenGate Big Data
• Combined deployment for Big Data and Database Targets (from single host)
• Layered Topic Types for Raw Data, Full Data, and Canonical Data
Strongest Transaction Consistency -> Use a DBMS
A
B
C
A
B
C
XY
fact
AC
ABBC
ETL
OLTP ODS OLAP
Transaction Consistency -> With Hive or Mongo
A
B
C
A
B
C
OLTP
A
B
C
https://www.mongodb.com/blog/post/mongodb-multi-document-acid-transactions-general-availability
https://community.cloudera.com/t5/Community-Articles/Hive-ACID-Merge-by-Example/ta-p/245402
Apache Hive
(with ACID Merge)
MongoDB
(with ACID Tx)
Transactions -> Use GG to Decorate Kafka Msgs
SCN – System Change Number, is the Oracle DB clock – every time a transaction commits, the clock
increments. The SCN marks a consistent point in time in the database.
CSN – Commit Sequence Number, is the GoldenGate clock – GG uses CSN during apply to identify
the point in time at which the transaction is committed for maintaining transaction consistency and
data integrity. A CSN is available for all Source DB transactions captured via GoldenGate:
https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/commit-sequence-number.html
Kafka
Single Partition
A
{ "customer_id": "1",
"first_name": "Debra",
"last_name": "Burks",
"phone": "", "email":
"debra.burks@yahoo.com",
"SCN": "130", "CSN": "130"
}
B
B
{ "customer_id": "1", "street": "9273
Thome Ave.", "city":
"Orchard Park", "state":
"NY", "zip_code": "14127",
"SCN": "130", "CSN": "130"
}
Data
Consumer is
responsible to
maintain
transaction
boundaries
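One way a consumer might rebuild transaction boundaries from the decorated messages is to group consecutive records that share a CSN. This is a minimal sketch, assuming messages arrive in commit order from a single partition (as on this slide) and that each carries a "CSN" field like the samples above; `transactions` is a hypothetical helper name.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple

Message = Dict[str, str]


def transactions(messages: Iterable[Message]) -> Iterator[Tuple[str, List[Message]]]:
    """Yield (csn, messages) for each run of consecutive messages with one CSN."""
    for csn, group in groupby(messages, key=lambda message: message["CSN"]):
        yield csn, list(group)


stream = [
    {"table": "A", "customer_id": "1", "CSN": "130"},
    {"table": "B", "customer_id": "1", "CSN": "130"},
    {"table": "A", "customer_id": "2", "CSN": "131"},
]

grouped = list(transactions(stream))
```

On a live stream the last group is only known to be complete once a message with a higher CSN arrives, so a real consumer would hold it back until then.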
OLTP
Updates and
Deletes both show
up in Kafka as new
messages,
Consumers must
interpret the flags
correctly
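Interpreting the operation flags usually means folding the message stream into current state. The sketch below assumes a hypothetical message shape with an `op_type` of "I", "U" or "D", a `key`, and an `after` image. The actual field names depend on the formatter configured, so treat these as placeholders.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_change(state: Dict[str, Row], message: Dict[str, Any]) -> None:
    """Apply one change message to an in-memory view keyed by primary key."""
    op = message["op_type"]
    key = message["key"]
    if op in ("I", "U"):
        # Inserts and updates both arrive as new messages; merge the after
        # image so compact updates (changed columns only) keep older columns.
        state[key] = {**state.get(key, {}), **message["after"]}
    elif op == "D":
        # A delete is also just a new message; the consumer removes the row.
        state.pop(key, None)
    else:
        raise ValueError(f"unknown op_type: {op!r}")


view: Dict[str, Row] = {}
for msg in [
    {"op_type": "I", "key": "1", "after": {"first_name": "Debra", "phone": ""}},
    {"op_type": "U", "key": "1", "after": {"phone": "555-0100"}},
    {"op_type": "I", "key": "2", "after": {"first_name": "Kasha"}},
    {"op_type": "D", "key": "2"},
]:
    apply_change(view, msg)
```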
Typical Data Mapping Pattern (Default)
Partition 3
Partition 2
https://docs.oracle.com/en/middleware/goldengate/big-data/19.1/gadbd/using-kafka-handler.html#GUID-FAD2E590-361E-46CC-B7F4-3BB97E19680E
A
B
C
Partition 1
To have topics created automatically, set the Kafka auto.create.topics.enable property to true. This is the default setting.
A
B
C
OLTP
Handle Schema Change (AKA: Data Drift)
Partition 2
A
Partition 1
A
A
OLTP
A.DdlEvents
DDL
A¹
Alter Table
{add column}
DDL Event
DDL Event
A¹
application/vnd.schemaregistry.v1+json
Any Event
Consumer
Partition 3
Partition 2
Map All Tables to Single Topic
A
B
C
Partition 1
N
OLTP
Partition 2
Natural Keys to Kafka Partitions
A
B
C
Partition 1
X
Z
Eg; partition by Vendor or Client Codes
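Partitioning by a natural key comes down to hashing that key to a partition number, so all records for one vendor or client stay in order on one partition. In this minimal sketch, `partition_for` and the `vendor_code` field are hypothetical, and CRC32 is only a stand-in for whatever hash the Kafka client actually applies.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(natural_key: str, num_partitions: int) -> int:
    """Deterministically map a natural key to a partition index."""
    return zlib.crc32(natural_key.encode("utf-8")) % num_partitions


records = [
    {"vendor_code": "X", "table": "A", "amount": 10},
    {"vendor_code": "Z", "table": "B", "amount": 20},
    {"vendor_code": "X", "table": "C", "amount": 30},
]

# Group records the way a keyed producer would spread them over 3 partitions.
by_partition: Dict[int, List[dict]] = defaultdict(list)
for record in records:
    by_partition[partition_for(record["vendor_code"], 3)].append(record)
```

Records from different source tables end up together whenever they share the key, which is the intent of this mapping pattern.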
Kafka:
Full Records
Full Supplemental Logging (Preferred)
A
Use LOGALLSUPCOLS to get the full records, all supplemental columns
A
{ "customer_id": "1", "first_name": "Debra",
"last_name": "Burks", "phone": "",
"email": "debra.burks@yahoo.com",
"street": "9273 Thome Ave.", "city":
"Orchard Park", "state": "NY", "zip_code":
"14127" }
EXTRACT crm
USERIDALIAS ogg
LOGALLSUPCOLS
UPDATERECORDFORMAT COMPACT
EXTTRAIL /gghome/ogg/dirdat/hr
SOURCECATALOG orcl
TABLE crm.customer;
Periodically run
Topic compaction
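The effect of topic compaction on a topic of full records can be sketched as "keep only the latest value per key". This is an illustration of the idea, not Kafka's implementation: `compact` is a hypothetical function, and unlike a real broker it drops delete markers (a `None` value) immediately instead of retaining them for a while.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Tuple[str, Optional[Dict[str, Any]]]  # (key, value); None marks a delete


def compact(log: List[Record]) -> List[Record]:
    """Return the log with only the most recent surviving record per key."""
    latest: Dict[str, Tuple[int, Optional[Dict[str, Any]]]] = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later offsets overwrite earlier ones
    survivors = sorted(
        (offset, key, value)
        for key, (offset, value) in latest.items()
        if value is not None  # a key whose last record is a delete disappears
    )
    return [(key, value) for _, key, value in survivors]


full_topic: List[Record] = [
    ("1", {"city": "Orchard Park"}),
    ("2", {"city": "Buffalo"}),
    ("1", {"city": "Rochester"}),
    ("2", None),
]

compacted = compact(full_topic)
```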
Partial Supplemental Logging; Join Back to DB
Primary
Read
Standby
Kafka (Raw) Kafka (Full)
Raw-A Full-A
A
changed
columns
only
all
columns
included
A Join Key to full records from Stream Processor
Change data only Full records in sync with SoR
Application
Domain (SoR)
Periodically run
Topic compaction
DB / Block
Replication
Partial Supplemental Logging; Self Join to Kafka
Primary
Kafka (Raw) Kafka (Full)
Raw-A Full-A
A
changed
columns
only
Join to previous full records
using Stream Processor
Change data only Full records in sync with SoR
Application
Domain (SoR)
previous full record
new full record
Cache: Key:Offset
Periodically run Topic compaction
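The self-join pattern can be sketched as merging each changed-columns record onto the previous full record for the same key. This is a simplified illustration: `to_full_record` is a hypothetical name, and the cache here holds the previous full record directly, whereas the slide's cache maps each key to an offset in the Full topic.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def to_full_record(cache: Dict[str, Row], key: str, changed_columns: Row) -> Row:
    """Overlay changed columns on the cached full record and refresh the cache."""
    full = {**cache.get(key, {}), **changed_columns}
    cache[key] = full
    return full


cache: Dict[str, Row] = {}

# First event for the key carries every column (the initial insert).
first = to_full_record(
    cache, "1", {"customer_id": "1", "first_name": "Debra", "city": "Orchard Park"}
)

# A later raw event carries only the column that changed.
second = to_full_record(cache, "1", {"city": "Rochester"})
```

Each returned record is what would be written to the Full topic, keeping it in sync with the system of record.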
SAN/RAID Storage
GG Mid-Tier xxx
GG Mid-Tier 002
Example Mid-Tier Deployment Topology
GG4BD Replicat
K8S/Docker Container (optional)
Administration
Service
Metrics
Service
Service
Manager
Reverse Proxy
& Certs
GG Mid-Tier 001
Kafka Host
GG Trail
Files
GG Trail
Files
GG Trail
Files
Kafka
Segments
Kafka
Segments
Database Client Libraries
OGG
Extract
Secure SQL*Net Connections
HTTPS / TLS 1.2
Data Services
(for Apps)
DBAs
(aligned to
business unit)
GG Ops
(aligned to
shared services)
Topic : Table
Data Lake
Data
Warehouse
OGG Replicat <push – data is staged
when events arrive>
<data transformation –
Eg; Stored Procs>
DB2/z
Example Topology with Stream Processing
HTTPS / TLS 1.2
Data Services
(for Apps)
GG Mid-Tier xxx
SAN/RAID Storage
GG Mid-Tier 002
GG4BD
Replicat
K8S/Docker Container (optional)
Administration
Service
Metrics
Service
Service
Manager
Reverse Proxy
& Certs
GG Mid-Tier 001
GG Trail
Files
GG Trail
Files
GG Trail
Files
Kafka
Segments
Kafka
Segments
Database Client Libraries
OGG
Extract
Secure SQL*Net Connections
DBAs
(aligned to
business unit)
GG Ops
(aligned to
shared services)
Topic : Table
Kafka Raw Topics
Topic : Table
Topic : Table
Cache
Store
Kafka
Segments
Kafka Prepared Topics
Spark ETL Nodes
OSA Mid-Tier
Topic : Table
Topic : Table
OSA Spark
Application
OSA Web
Application
Data Pipes for Real-time ETL
Direct Load to Databases
Data Lake
Database
Data Engineer
(DW / Data Lake
organization)
Pattern: Canonical Objects as Real-Time Events
CDC
Enterprise Data
Producers
Detect
Event
Logical
Change
Records
(LCRs)
App
DB
committed!
Data Model
Object Model
System
Of Record
(SoR)
User
Action
Raw
Data
Prepared
Data
Canonical
Data
Data Consumers
Applications Data Services
ODS (Data
Store)
Data Marts &
Warehouses
IoT Apps Data Science
Raw data & Alerting events are pushed
Prepared data events are pushed
Raw Data (LCR)
Schema Events
(DDL)
Prepared
Data Topics
“Master”
Data Topics
Canonical data events
JSON, XML,
Avro, Parquet,
CSV
Data Objects
Table Data
Raw Data / Alerts
ETL is bounded by
Time Window,
lookups can happen
from memory, cache
or via SQL
Direct to Database (relational semantics fully preserved) SQL Consumers
Tradeoff between “Data Fidelity” vs. “Data Latency”
Today’s Agenda
1. Strategic IT Forces
2. GoldenGate Big Data in Brief
3. Database Change Streams to Apache Kafka
4. Stream Processing with Apache Spark
Data In Motion | Stream Processing
Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Oracle OpenWorld 2018
Ingest Database Events: Any GoldenGate event is included free; Kafka native events require a full-use license
Select Processing Patterns: A rich set of pre-built patterns can dramatically improve developer efficiency and time-to-value
Build Event Pipelines: The tool can easily leverage geo-fencing, machine learning, and other lookup data within the data stream
Serve Data Downstream: Data can be delivered out to Kafka, databases, or easily staged for downstream ETL jobs
connect
Data Owners &
Data Products
Significant Intellectual Property
More than 70 patents on stream processing
Mature tech stack for Event Processing
Over 10 years of IP investment
12.2
12.3
18c
19c
11g
Interactive Browser-based Designer
Accessible to Non-Technical Users
• Empower data analysts to enhance data with
no coding skills required
• Intuitive, always-on data view shows results of
transformations as they are defined
• Filter and correlate streams, apply rules,
aggregate, calculate fields etc.
Function extensibility via Java
• Allow data engineers to provide custom
stages and functions to be used by all team
members
Integrated Visualizations
• Explore your business data live through
various tables, charts and geospatial maps
Big Upgrade from Apache Spark
Feature | Oracle Stream Analytics | Apache Spark (only)
Programming | Graphical UX, with ability to cut-paste Scala directly into Pipelines | Java/Scala low level programming only
Checkpointing | Automatic, part of OSA Pipeline implementation | Developer must be aware of the semantics and logic
Record-by-Record | Automatic Timestamps from OSA CQL Engine | Spark Streaming treats all records within a batch the same
Out of Order Events | Automatic, via CQL Timestamps and also via GG SCN | Not possible to reliably handle
Progression of Time | Automatic, CQL engine progresses time (eg; A not followed by B) | When no new Events, there is no native progression of time
Windowing Functions | Will handle windowing based on number of Events, Dynamic attributes, other intervals | Micro batch only
Fault Tolerance | Automatic, part of OSA application native behavior | Developer must code
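To make the out-of-order row concrete, here is a generic reorder buffer keyed on a sequence number such as the SCN. It is a sketch of the general technique only, not how the OSA CQL engine works: `ReorderBuffer` and `max_delay` are hypothetical names, and events arriving later than `max_delay` behind the highest SCN seen would still come out late.

```python
import heapq
from typing import Any, List, Optional, Tuple


class ReorderBuffer:
    """Hold events briefly and release them in ascending SCN order."""

    def __init__(self, max_delay: int) -> None:
        self.max_delay = max_delay            # how far behind an event may arrive
        self._heap: List[Tuple[int, int, Any]] = []
        self._max_seen: Optional[int] = None
        self._seq = 0                         # tie-breaker for equal SCNs

    def push(self, scn: int, event: Any) -> List[Any]:
        """Add one event; return any events that are now safe to emit."""
        self._seq += 1
        heapq.heappush(self._heap, (scn, self._seq, event))
        self._max_seen = scn if self._max_seen is None else max(self._max_seen, scn)
        ready = []
        while self._heap and self._heap[0][0] <= self._max_seen - self.max_delay:
            ready.append(heapq.heappop(self._heap)[2])
        return ready

    def flush(self) -> List[Any]:
        """Emit everything still buffered, in SCN order."""
        return [heapq.heappop(self._heap)[2] for _ in range(len(self._heap))]


buffer = ReorderBuffer(max_delay=2)
emitted: List[str] = []
for scn in [1, 3, 2, 5, 4]:                   # arrival order is scrambled
    emitted.extend(buffer.push(scn, f"e{scn}"))
emitted.extend(buffer.flush())
```

The trade-off is latency: a larger `max_delay` tolerates later arrivals but holds events longer before emitting them.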
Rich Set of Streaming Patterns
Simplify Access to Complex Algorithms
• Easy-to-use modules with user assistance in the
designer
• Pre-defined visualizations to provide immediate
feedback
• Accessible to data analysts
Comprehensive Library of Patterns
• Covers diverse areas such as anomaly detection,
stream correlation, trend analysis, spatial functions
• Duplicate, out-of-order, and missing event
detection
• Functions for financial, statistical, and log-analytics
operations
Location and Geo-Spatial Capabilities
Interactive Spatial Design and Visualization
• Show live location data on maps as events are
processed
• Track individual objects and highlight them based
on different conditions, e.g. Red for violation
Rich Geospatial Pattern Set
• Correlate multiple objects through their spatial
interaction
• Detect speed, and proximity
• Obtain address and city information from location
and vice versa through Geocoding
Scalable Definition of Areas and Geo-Fences
• Define polygons through drawing borders on a map
• Manage large amounts of shapes through spatial
types in Oracle database.
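A geo-fence check reduces to a point-in-polygon test. The sketch below uses the classic ray-casting method on planar coordinates. It is an illustration under assumptions (`point_in_polygon` is a hypothetical helper and the coordinates are plain x/y values), not the spatial engine used by the product.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: List[Point]) -> bool:
    """Ray casting: count how many polygon edges a ray to the right crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # each crossing flips inside/outside
    return inside


# A square fence drawn as four border points, as one might draw on a map.
fence: List[Point] = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]

vehicle_inside = point_in_polygon(2.0, 2.0, fence)
vehicle_outside = point_in_polygon(5.0, 2.0, fence)
```

For real latitude/longitude data over large areas, a spatial type in the database or a geodesic library would be the safer choice.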
Time Series Analytics
Anatomy of a Time Series Pattern
Built in Patterns for Anomalies
• Pre-defined visualizations to provide
immediate feedback
• Accessible to data analysts
GoldenGate Supplies High Fidelity Events
• Every database commit, logical change
record, schema and procedure event is
visible in the event stream
• Combine with application logs for full picture
Examples
• Banking, credit card transactions, trades…
• Sales and Marketing Data (eCommerce)
• IoT, Telemetry, Devices, Smart Home
• Monitoring data, data centers, networks etc
• Science/medicine, EEG, ECG, DNA
• Social networks, likes, classification, trends
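As a rough idea of what an anomaly pattern over a time series does, the sketch below flags values that sit far from the mean of a trailing window (a rolling z-score). It is a generic technique chosen for illustration, not the built-in pattern itself. The `anomalies` function, window size, and threshold are assumptions.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, List, Sequence


def anomalies(values: Sequence[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indexes whose value deviates strongly from the trailing window."""
    history: Deque[float] = deque(maxlen=window)
    flagged: List[int] = []
    for index, value in enumerate(values):
        if len(history) == window:            # wait until the window is full
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                flagged.append(index)
        history.append(value)                 # oldest value drops out automatically
    return flagged


# Hypothetical transaction amounts with one obvious spike.
amounts = [10, 11, 10, 9, 10, 50, 10, 11]
spikes = anomalies(amounts)
```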
Predictive Analysis and Machine Learning
Real-time Scoring and Decision Making
• Use Machine Learning models to make business
decisions in real-time
• Predict future outcomes such as equipment failures,
customer behavior, fraud and security breaches
• Re-import refined models for improved predictions
Put Data Science in Production
• Import Predictive Models created by data scientists and
engineers in their own environment.
• Import of PMML models for a variety of algorithms such as
vector machines, association rules, Naive Bayes classifier,
clustering models, text models, decision trees, and different
regression models.
• Hide model complexity for use by data analysts
• Custom stages for access to external scoring systems
Oracle R Enterprise
Notebooks (Jupyter, Zeppelin, etc)
Data Scientist
Data Analyst / Data Engineer
Built-in Dashboards
Visualizations Built-In
•Not intended to be a replacement for
purpose-built Data Visualization tools,
•OSA includes some visualizations to
support building graphs on data that is
streaming in-memory (before writing to
Data Store), including:
• Bar Charts
• Line Charts
• Geo-Spatial (Google Maps)
• Area Charts
• Pie Charts
• Scatter Charts
• Bubble Charts
• Thematic Maps
Global Shift in
Technology
(Industry 4.0)
Data
Integration
Shift to Data
Mesh
GoldenGate
for Big Data
What have we learned?
Event-Driven,
Distributed Data
Integration *Industrial Strength
& Enterprise Class
This is not a Metamorphosis, it is a Paradigm Shift
The Success Paradox
Data success factors that did well in
Industry 3.0 will not be the factors
that create success in Industry 4.0
Next Gen Data Architecture
ETL Vendors
1990 – 2010’s Gen1 :
• Replication
• Messaging
• Streaming
• Pipelines
Next-Gen has
new DNA not
tied to ETL tools
It is impossible to evolve older Batch
Processing tools into a modern Event-
Centric Stream Processing solution; the
underlying paradigms are fundamentally
different
Questions?
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Copyright © 2020 Oracle and/or its affiliates.

More Related Content

What's hot

What's hot (20)

Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Snowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data WarehousingSnowflake Best Practices for Elastic Data Warehousing
Snowflake Best Practices for Elastic Data Warehousing
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Let’s get to know Snowflake
Let’s get to know SnowflakeLet’s get to know Snowflake
Let’s get to know Snowflake
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Snowflake for Data Engineering
Snowflake for Data EngineeringSnowflake for Data Engineering
Snowflake for Data Engineering
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 

Webinar future dataintegration-datamesh-and-goldengatekafka

  • 1. The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark. Oracle Development, Mar-2020
  • 2. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Copyright © 2020, Oracle and/or its affiliates. All rights reserved.
  • 3. Today’s Agenda: 1. Strategic IT Forces 2. GoldenGate Big Data in Brief 3. Database Change Streams to Apache Kafka 4. Stream Processing with Apache Spark
  • 4. During 2020 / 2021 the world continues to go through a paradigm shift into a future where “Cyber-Physical Systems” are the new normal. “Digital Transformation” requires a mindset shift: 1. Sharing data is more effective than accumulating it. 2. Decentralizing, distributing, and copying are more powerful than stockpiling. 3. Connectivity and flow of data are the starting point for innovation and socializing.
  • 5. The Problems of Monolithic Data Architecture (People, Process, Technology): • Business units have few incentives to work across org boundaries • Hyper-specialization in tech teams narrows the focus to technology rather than outcomes or solutions • Pressure on stakeholders to produce value, but IT orgs are still mostly built like they were 30 years ago • 30-year-old conception of data flows • IT-led and technology constrained • The monolithic data lake is big and slow by design, not by accident • The “Ingest -> Process -> Serve” design is the same as old “ETL” data flows; it institutionalizes the wrong goals • Storage-centric conception of data, but data is dynamic, not static • “The Lake” is conceived as a physical place where we pile up data (in Hadoop or on the Cloud) • Cheaper storage than an EDW but much worse governance • Does nothing to modernize data architecture itself. Images: https://martinfowler.com/articles/data-monolith-to-mesh.html
  • 6. Evolution towards Real-Time Data Mesh. Industry 3.0: Hub and Spoke. This data pattern, popularized by Ralph Kimball and Bill Inmon, has been the foundation for enterprise data management since 1993. It is transaction consistent, scales up nicely for most use cases, and is based on SQL, the lingua franca for most tools. Transitional: Kappa Hub. By 2010, the Lambda (big data) pattern was common. In 2014, Jay Kreps (of LinkedIn) questioned the Lambda Architecture and spawned Kappa. The Kappa principles treat batch processing as a special case of stream processing: a historized event log serves both real-time and batch processing. Mature: Distributed Kappa. By 2020, IT infrastructure has changed dramatically: networking, containers, cloud, compute, IoT, etc. have all pushed data to the edge. A mature Kappa architecture is not a single-instance “hub” but rather a distributed mesh of data logs, stream data processing, change events, and time series data (diagram label: mesh & microservice controls). References: Kappa: https://www.oreilly.com/radar/questioning-the-lambda-architecture/ Lambda: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html Dimensional modeling: https://en.wikipedia.org/wiki/Dimensional_modeling
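The Kappa principle on slide 6, that batch is a special case of stream processing over a historized event log, can be sketched in plain Python. This is an illustration only, not code from the presentation: the `EventLog` class, the `running_totals` function, and the sample events are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

Event = Tuple[str, int]  # (account, amount): a hypothetical event shape


@dataclass
class EventLog:
    """Append-only, historized log that consumers can read from any offset."""
    events: List[Event] = field(default_factory=list)

    def append(self, event: Event) -> int:
        self.events.append(event)
        return len(self.events) - 1  # offset of the event just written

    def read(self, from_offset: int = 0) -> Iterator[Event]:
        yield from self.events[from_offset:]


def running_totals(stream: Iterator[Event], state: Dict[str, int]) -> Dict[str, int]:
    """A single processing function used for both streaming and batch."""
    for account, amount in stream:
        state[account] = state.get(account, 0) + amount
    return state


log = EventLog()
for event in [("A", 10), ("B", 5), ("A", -3)]:
    log.append(event)

# Streaming consumer: process what exists, remember the offset, continue later.
live_state = running_totals(log.read(0), {})
checkpoint = len(log.events)
log.append(("B", 7))
live_state = running_totals(log.read(checkpoint), live_state)

# Batch job: the same function simply replays the full history from offset 0.
batch_state = running_totals(log.read(0), {})
```

Because both views are derived from the same log with the same function, they converge on the same state; there is no separate batch code path to keep in sync, which is the contrast with Lambda that the slide draws.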
  • 7. Real-Time Data Mesh for Industry 4.0 (from Industry 3.0 -> to Industry 4.0): Batch Centric, Schedulers -> Event Centric, Streams; Mostly Relational Data (aka Views) -> Polyglot Data (via Logs); Size for Peak Workloads -> Elastic, Scale on Demand; Kimball / Inmon Architecture -> Distributed Kappa; Vendor Specific -> Open Source Enabled; Simplex Processing is Standard -> Massively Parallel is Standard; Hubs (EDW, Hadoop, Data Lake) -> Mesh (Edge, Hybrid, Multi-Cloud); Governance is “Bolt On” -> Governance is Embedded
  • 8. Data Mesh Conceptual View – Data Domains. Diagram labels: Enterprise Data Producers: ERP Apps, DBs, Middleware etc.; IoT Data Producers: Devices & Things; Data Domain A, Data Domain B, Data Domain C, each with f(x) stages over Raw Data, Prepared Data, Canonical Data; Data Mesh (distributed Kappa, microservices, cloud agnostic); Data Domain Consumers: People, owners of “Data Products” (collections of data sets in various stages of curation); Domain-Specific Views of Data; Raw Event Consumers: Automated Devices, Edge Nodes (5G), Scheduled Routines (e.g., ETL); Data Product-Specific Storage Choices: • RDBMS • Data Lake • Object Store • Graph, etc.
  • 9. Consumer-Driven, Event-Centric Data Mesh. Data Mesh puts the consumer's needs first: consumers require data at different latency, fidelity, trust levels and views. Diagram labels: Enterprise Data Producers (User Action; App; Object Model; Data Model; System of Record (SoR); DB committed!; App APIs and system log events); Detect Event; Logical Change Records (LCRs); CDC Replication; LCR/TFs; Direct to Database (high fidelity transaction semantics fully preserved); Raw data, Time Series & Alerting events are pushed; Prepared data events are pushed; Canonical data events; Raw Data (LCR); Schema Events (DDL); Prepared Data Topics; “Master” Data Topics; JSON, XML, Avro, Parquet, CSV; Raw Data (Speed & Fidelity); Prepared Data (Trusted Views); Canonical Data (Ease of Consumption); Data Domain Consumers: Data Objects; Table Data; Raw Data / Alerts; SQL Consumers; Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP
  • 10. Distributed by Design, Microservices Based. Diagram labels: Data Domain Producers (User Action; App; Object Model; Data Model; System of Record (SoR); DB committed!); Detect Event; Logical Change Records (LCRs); CDC Replication; LCRs; Direct to Database (high fidelity transaction semantics fully preserved); Microservices: Edge Compute or Cloud for Raw Data Events; Prepare Technical Data Views; Business Data Views; Raw data, Time Series & Alerting events are pushed; Prepared data events are pushed; Canonical data; Events (ephemeral or persisted); Stream Process; Events (persisted); Stream Process; Events (persisted); Data Domain Consumers: Data Objects; Table Data; Raw Data / Alerts; SQL Consumers; Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP
  • 11. We are creating this today: a Data Integration Platform for Industry 4.0, a Real-Time Data Mesh solution for everyone: ✓ Data Consumer Driven (low code, browser-based) ✓ Distributed by Design (multi-cloud, microservices) ✓ Event-Centric Pipelines (CDC, replication and streaming) ✓ Immutable Ledger Based (aware of the full event history of SoRs) ✓ Polyglot Capable (works with all data types) ✓ ACID Capable (preserves DB transaction semantics) ✓ Governed (multi-level security, metadata driven) ✓ Enterprise Class (best of Open Source and Commercial S/W together)
  • 12. Single Pane of Glass for Real-Time Data Mesh. Data Consumer Driven; Event Centric Pipelines; Deploys in a Mesh Across Containers, Public Clouds and 5G Edge Devices. Diagram labels: connect; DB2/z; Real-Time Stream Data Processing; Raw Data; DBAs & Data Engineers; Data Owners & Data Products; Data Objects; Table Data; Raw Data / Alerts; SQL Consumers; Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP
  • 13. How it Works Today: GoldenGate for Big Data. Diagram labels: Capture; Pipeline; Analyze; Deliver; Ingest; Data Engineer; Data Analyst; DBA/GG Ops; GoldenGate Microservices Applications; Stream Analytics Application; BYOM (Bring Your Own Messaging); BYOS (Bring Your Own Spark); All Data Events & Transactions; Data Domain Consumers: Data Objects; Table Data; Raw Data / Alerts; SQL Consumers; Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications; DBAs for HA, DR and OLTP. * distributed, may run on any combination of containers and clouds
  • 14. Today’s Agenda: 1. Strategic IT Forces 2. GoldenGate Big Data in Brief 3. Database Change Streams to Apache Kafka 4. Stream Processing with Apache Spark
  • 15. GoldenGate Overall: for the Enterprise. Copyright © 2019 Oracle and/or its affiliates. Replication of Real-time Data Transactions & Events. Diagram labels: GoldenGate; Stream Analytics; ETL & ML; DBMS; Cloud; Big Data; NoSQL; Streams; Object Storage; Relational; Non-Relational; Apps; DB2/z. https://www.oracle.com/middleware/technologies/goldengate.html
  • 16. GoldenGate for Big Data. Compare to: • Open Source tools like Sqoop or Kafka Connect • ETL tools, commercial or open source • Changed Data Capture tools in niche areas. GoldenGate is: • Simpler to use via microservices, cloud, etc. • Better performance on most DBs, especially Oracle • More reliable (in high availability and disaster recovery situations). Real-time Stream of Data Transactions & Data Store Events to Kafka | Object Store | ElasticSearch | HDFS | etc. (Data Lake). Benefits: Lowest overhead; High fidelity events; Fastest data visibility; No more batch windows; DML, DDL and Procedures; Consistent recovery point. Diagram label: DB2/z
  • 17. GG for Big Data – What is Included? Sources: • Java Messaging Service (JMS), which typically covers any JMS-compliant source technology • Cassandra • Roadmap sources to include Apache Kafka (Connect frameworks) and MongoDB • GoldenGate trails from all OGG sources (Oracle DB (MA & Classic), Microsoft SQL Server, MySQL, IBM DB2, HPE NonStop, etc.). Note: the source side for relational databases is separately licensed. Targets: • Hadoop (HDFS / Hive, HBase) • Kafka: Pub/Sub, Kafka Connect, REST • Elasticsearch • NoSQL (MongoDB, Cassandra, Oracle NoSQL) • Oracle Cloud (Object Storage) • AWS (Redshift, Kinesis, S3) • Google Cloud (BigQuery) • Microsoft Azure Data Lake (v1 & v2), Blob Storage • Flat files (Netezza, Greenplum) • JMS • JDBC. Streaming: • Stream processing for any GG data feed is included (other data sources require a full use license) • Low-code development • ETL (Filter, Aggregate, Merge, Transform, Load Data) • Correlate/Enrich • Alerts, Thresholds, Anomalies • Business Rules, Data Policies • Time Series Analysis • Spatial Analytics, Geo-fence • Classification, Clustering • Statistical Inference, Machine Learning, Regression Models
  • 18. Capabilities Included with GG for Big Data. Replication in/out for Non-Relational: Data Lake Ingest; Streaming Ingest; Cloud Ingest; Messaging Replication; NoSQL Replication; SaaS Replication. Foundation Patterns (Database Replication): Unidirectional; Bi-Directional; Peer-to-Peer; Broadcast; Consolidation; Distribution. Stream Processing: Data Pipelines; Data Transformation; GoldenGate Integrations; Time Series Analysis; Geo-Fencing; Predictive Analytics
  • 19. OGG for Big Data – Supported Formats: • Native formats of targets • HDFS Sequence File • Delimited Text (both Row & Operation modes) • JSON (both Row & Operation modes) • Avro (both Row & Operation modes) • XML • Parquet • ORC. Diagram label: Deep Storage Lakes such as Object Store, HDFS or Elastic
  • 20. Tap into Existing HA Deployments. Use the existing Extract: no performance penalty on the Source DB. Diagram labels: Existing GG Deployments; Add GG for Big Data Deployments; Deep Storage Lakes such as Object Store, HDFS or Elastic; Existing Applications; Data Services; Analytics
  • 21. GoldenGate is Transaction Safe. ACID Compliant GoldenGate Pipelines: GoldenGate semantics are fully ACID / transaction-safe with strong HA. Disaster Recovery: GG may optionally be used as a recovery point for big data, and GG can supply metadata about transaction/commit boundaries to downstream Big Data environments. Diagram label: Deep Storage Lakes such as Object Store, HDFS or Elastic
  • 22. GoldenGate Coordinated Replicat. • A unified parameter file is read by each process thread and determines the operational configuration of each thread. • Each apply thread is independent of the other apply threads. Each thread opens the OGG Trail for shared read operations and has a unique entry in the OGG Checkpoint Table. • Although each thread functions independently, an unrecoverable error condition on any thread will cause all threads to terminate in the ABEND state. • Full barrier coordination is not performed on foreign keys. Parent and child tables must be processed by the same apply thread. Diagram labels: GG Delivery; Single PRM; Thread 1; Thread 2; Thread ..n; GG Big Data Targets; Deep Storage Lakes such as Object Store, HDFS or Elastic
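The last bullet on slide 22 implies a planning step: before spreading tables across apply threads, group every parent and child table together. The sketch below shows one way to compute such a grouping with a small union-find. It is not GoldenGate configuration and does not use any GoldenGate API; the `assign_threads` function, the table names, and the foreign-key list are hypothetical, and every table named in a foreign key is assumed to appear in `tables`.

```python
from typing import Dict, Iterable, List, Tuple


def assign_threads(tables: Iterable[str],
                   foreign_keys: Iterable[Tuple[str, str]],
                   n_threads: int) -> Dict[str, int]:
    """Return table -> thread number (1..n_threads) such that tables
    linked by a foreign key always land on the same thread."""
    parent = {t: t for t in tables}

    def find(t: str) -> str:
        # Walk up to the group's representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge the groups of each (child_table, parent_table) pair.
    for child, par in foreign_keys:
        parent[find(child)] = find(par)

    groups: Dict[str, List[str]] = {}
    for t in parent:
        groups.setdefault(find(t), []).append(t)

    # Deterministic round-robin of whole groups over the threads.
    assignment: Dict[str, int] = {}
    for i, members in enumerate(sorted(groups.values(), key=min)):
        for t in members:
            assignment[t] = i % n_threads + 1
    return assignment


plan = assign_threads(
    tables=["orders", "order_items", "customers", "audit_log"],
    foreign_keys=[("order_items", "orders"), ("orders", "customers")],
    n_threads=2,
)
```

Here `orders`, `order_items`, and `customers` form one group because they are transitively linked, while `audit_log` is free to run on the other thread. A real plan would also weigh change volume per table, which this sketch ignores.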
  • 23. GoldenGate Big Data with High Availability. Use the preferred HA mode, depending on the GG Extract architecture. Diagram labels: GG Capture & Distribution; GG Big Data Replicat 001; GG Big Data Replicat 002; Clustering Technology such as Oracle Clusterware, Veritas, RedHat etc.; Shared-Disk / Durable Storage (DBFS, ACFS, etc.) holding Trail Files, .properties, schema files, checkpoints (.cpj), /dirsta; GG Big Data Targets; Deep Storage Lakes such as Object Store, HDFS or Elastic
  • 24. Today’s Agenda: 1. Strategic IT Forces 2. GoldenGate Big Data in Brief 3. Database Change Streams to Apache Kafka 4. Stream Processing with Apache Spark
  • 25. For Apache Kafka. Benefits: Lowest overhead; High fidelity events; Fastest data visibility; No more batch windows; DML, DDL and Procedures; Consistent recovery point. Diagram labels: DB2/z; Oracle Streaming Service (OSS). Some Stats: • GoldenGate is moving ~4 petabytes of data into Kafka every day • ~300 customers (G2000) use GoldenGate with Kafka • Real-world performance is in the tens of thousands of transactions per second into Kafka
  • 26. General Problem to Solve. Diagram labels: BUSINESS APPLICATION; System of Record “Data Producer” (tables A, B, C); SQL Events; DB Log Events; Messaging; User Events; Data Sync & Stream Processing; System to Serve “Data Consumers” (tables A, B, C); Applications, Data Services; Biz Consumers; Analytics & Data Marts; Data Science & Streaming Applications
  • 27. Some Patterns to Consider. Transaction Consistency: • DB to DB (no Kafka or Big Data) • Mongo or Apache Hive for some basic ACID properties • Kafka, using Exactly Once, Transactions and GoldenGate SCN & CSN metadata. Table / Topic Mappings: • One Kafka Topic per DB Table [default setting in GoldenGate] • Handling Schema Change (AKA: Data Drift) • One Kafka Topic for all Tables • Group source data records into different Kafka Partitions. The Change Stream: • Full supplemental GG replication • Partial supplemental, using DB (Standby) to re-create full records inside Kafka • Partial supplemental, use Kafka + Cache for Full Records. Deployment Topology: • Mid-tier deployment of GoldenGate Big Data • Combined deployment for Big Data and Database Targets (from single host) • Layered Topic Types for Raw Data, Full Data, and Canonical Data
  • 28. Strongest Transaction Consistency -> Use a DBMS. Diagram labels: OLTP (A, B, C); ODS (A, B, C); ETL; OLAP (XY fact; AC, AB, BC)
  • 29. Transaction Consistency -> With Hive or Mongo. Diagram labels: OLTP (A, B, C); Apache Hive (with ACID Merge): A, B, C; MongoDB (with ACID Tx): A, B, C. References: https://www.mongodb.com/blog/post/mongodb-multi-document-acid-transactions-general-availability https://community.cloudera.com/t5/Community-Articles/Hive-ACID-Merge-by-Example/ta-p/245402
  • 30. Transactions -> Use GG to Decorate Kafka Msgs. SCN (System Change Number) is the Oracle DB clock: every time a transaction commits, the clock increments. The SCN marks a consistent point in time in the database. CSN (Commit Sequence Number) is the GoldenGate clock: GG uses the CSN during apply to identify the point in time at which a transaction committed, in order to maintain transaction consistency and data integrity. A CSN is available for all source DB transactions captured via GoldenGate: https://docs.oracle.com/en/middleware/goldengate/core/19.1/admin/commit-sequence-number.html Example messages in a single Kafka partition: table A: { "customer_id": "1", "first_name": "Debra", "last_name": "Burks", "phone": "", "email": "debra.burks@yahoo.com", "SCN": "130", "CSN": "130" } table B: { "customer_id": "1", "street": "9273 Thome Ave.", "city": "Orchard Park", "state": "NY", "zip_code": "14127", "SCN": "130", "CSN": "130" } The data consumer is responsible for maintaining transaction boundaries. Updates and deletes both show up in Kafka as new messages; consumers must interpret the flags correctly.
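Slide 30 leaves transaction boundaries to the data consumer. One way a consumer might rebuild them is to group consecutive messages that carry the same CSN, sketched below with the standard library only (no Kafka client). The sketch assumes messages come from a single partition in commit order and that records of one transaction are contiguous; the `transactions` function, the `table` field, and the third sample record are hypothetical additions for illustration.

```python
import json
from itertools import groupby
from typing import Dict, Iterable, Iterator, List


def transactions(messages: Iterable[str]) -> Iterator[List[Dict]]:
    """Yield lists of decoded records that share a CSN, in arrival order."""
    records = (json.loads(m) for m in messages)
    # groupby only merges *adjacent* records, which matches the assumption
    # that a transaction's records are contiguous within the partition.
    for _csn, group in groupby(records, key=lambda r: r["CSN"]):
        yield list(group)


# Sample messages modeled on the slide: two rows committed together at
# CSN 130, followed by an unrelated change at CSN 131 (made-up data).
partition = [
    '{"table": "A", "customer_id": "1", "first_name": "Debra", "CSN": "130"}',
    '{"table": "B", "customer_id": "1", "city": "Orchard Park", "CSN": "130"}',
    '{"table": "A", "customer_id": "2", "first_name": "Kasha", "CSN": "131"}',
]

txs = list(transactions(partition))
for tx in txs:
    print(tx[0]["CSN"], [r["table"] for r in tx])
```

On a live stream, a consumer cannot tell that the group for a CSN is complete until a record with a different CSN arrives, so the most recent group has to be buffered rather than applied immediately. The finite list here hides that detail.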
  • 31. Typical Data Mapping Pattern (Default). Each OLTP table (A, B, C) maps to its own topic (Partition 1, Partition 2, Partition 3 in the diagram). Set the Kafka auto.create.topics.enable property to true; this is the default setting. Reference: https://docs.oracle.com/en/middleware/goldengate/big-data/19.1/gadbd/using-kafka-handler.html#GUID-FAD2E590-361E-46CC-B7F4-3BB97E19680E
  • 32. Handle Schema Change (AKA: Data Drift). Diagram labels: OLTP table A; DDL: Alter Table {add column}; A¹ (table A after the change); Partition 1; Partition 2; A.DdlEvents; DDL Event; DDL Event; application/vnd.schemaregistry.v1+json; Any Event Consumer
  • 33. Map All Tables to Single Topic. Diagram labels: OLTP tables A, B, C ... N; Partition 1; Partition 2; Partition 3
  • 34. Natural Keys to Kafka Partitions, e.g., partition by vendor or client codes. Diagram labels: tables A, B, C; keys X, Z; Partition 1; Partition 2
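The idea on slide 34 is that all change records for one business key (for example a vendor code) go to the same partition, so that a consumer sees them in order. A minimal sketch follows. It uses CRC32 as a stand-in stable hash; that is an assumption for illustration and is not the hash a Kafka client uses. The `partition_for` function and the sample records are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_for(natural_key: str, num_partitions: int) -> int:
    """Map a natural key to a partition index with a stable hash.

    CRC32 is deterministic across processes, unlike Python's built-in
    hash() for strings, which is salted per interpreter run.
    """
    return zlib.crc32(natural_key.encode("utf-8")) % num_partitions


# Change records from different tables, tagged with a vendor code.
records = [
    {"vendor_code": "X", "table": "A", "op": "I"},
    {"vendor_code": "Z", "table": "B", "op": "I"},
    {"vendor_code": "X", "table": "C", "op": "U"},
    {"vendor_code": "Z", "table": "A", "op": "D"},
]

by_partition: Dict[int, List[dict]] = defaultdict(list)
for record in records:
    by_partition[partition_for(record["vendor_code"], 2)].append(record)
```

Whatever hash is used, the property that matters is that the mapping depends only on the key and the partition count. Changing the partition count later remaps keys, so ordering per key holds only while that count is fixed.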
  • 35. Full Supplemental Logging (Preferred). Kafka: Full Records. Use LOGALLSUPCOLS to get the full records, with all supplemental columns. Example record for table A: { "customer_id": "1", "first_name": "Debra", "last_name": "Burks", "phone": "", "email": "debra.burks@yahoo.com", "street": "9273 Thome Ave.", "city": "Orchard Park", "state": "NY", "zip_code": "14127" } Extract parameters: EXTRACT crm USERIDALIAS ogg LOGALLSUPCOLS UPDATERECORDFORMAT COMPACT EXTTRAIL /gghome/ogg/dirdat/hr SOURCECATALOG orcl TABLE crm.customer; Periodically run Topic compaction.
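Slide 35 pairs full records with periodic topic compaction. When every message carries the complete row, only the newest message per key needs to be retained. The sketch below simulates that effect on an in-memory list. It is a simplification, not Kafka's implementation: real compaction runs in the background on log segments and keeps delete markers (tombstones) for a configurable time before dropping them. The `compact` function and the sample log are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (key, value) pairs; a value of None stands for a delete marker (tombstone).
Record = Tuple[str, Optional[dict]]


def compact(log: List[Record]) -> List[Record]:
    """Keep only the last record per key, dropping keys whose last record
    is a tombstone. Surviving records stay in their original order."""
    last_offset: Dict[str, int] = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset
    return [
        (key, value)
        for offset, (key, value) in enumerate(log)
        if last_offset[key] == offset and value is not None
    ]


log: List[Record] = [
    ("1", {"customer_id": "1", "first_name": "Debra", "phone": ""}),
    ("2", {"customer_id": "2", "first_name": "Kasha", "phone": ""}),
    ("1", {"customer_id": "1", "first_name": "Debra", "phone": "555-0100"}),
    ("2", None),  # customer 2 was deleted at the source
]

compacted = compact(log)
```

This only works as a full picture of the source table because each surviving record is a complete row. With partial (changed-columns-only) records, discarding older messages would lose column values, which is why the slide marks full supplemental logging as preferred.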
  • 36. Partial Supplemental Logging; Join Back to DB. Diagram labels: Application Domain (SoR) with Primary and Read Standby (DB / Block Replication); Kafka (Raw) topic Raw-A, changed columns only (change data only); Stream Processor joins on key to the full records; Kafka (Full) topic Full-A, all columns included (full records in sync with SoR). Periodically run topic compaction.
  • 37. Partial Supplemental Logging; Self Join to Kafka. Diagram labels: Application Domain (SoR) with Primary; Kafka (Raw) topic Raw-A, changed columns only (change data only); Stream Processor joins to the previous full records; Cache: Key:Offset; previous full record, new full record; Kafka (Full) topic Full-A (full records in sync with SoR). Periodically run topic compaction.
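    The self-join pattern can be sketched as a merge of each partial change record onto the last known full record for the same key. This is a simplified illustration only: the slide's cache maps a key to an offset in the full topic, whereas this sketch keeps the full record itself in a dict. The function name, the phone number, and the handling of inserts are assumptions, and deletes are not covered.

```python
def to_full_record(cache, key_field, change):
    """Merge a changed-columns-only record onto the previous full record."""
    key = change[key_field]
    previous = cache.get(key, {})   # empty when the key is seen for the first time
    full = {**previous, **change}   # changed columns overwrite the old values
    cache[key] = full               # remember the new full record for the next change
    return full


cache = {}

# Initial insert carries every column (values taken from the slides).
to_full_record(cache, "customer_id", {
    "customer_id": "1", "first_name": "Debra", "last_name": "Burks",
    "phone": "", "email": "debra.burks@yahoo.com",
})

# Later update carries only the key and the changed column (hypothetical value).
merged = to_full_record(cache, "customer_id", {
    "customer_id": "1", "phone": "(555) 555-0100",
})
print(merged)
```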
  • 38. Example Mid-Tier Deployment Topology. Diagram labels: OGG Extract; Database Client Libraries; Secure SQL*Net Connections; DB2/z; GG Mid-Tier 001, GG Mid-Tier 002 ... GG Mid-Tier xxx; GG Trail Files on SAN/RAID Storage; K8S/Docker Container (optional) with Service Manager, Administration Service, Metrics Service, Reverse Proxy & Certs, and GG4BD Replicat; Kafka Host with Kafka Segments (Topic : Table); OGG Replicat to Data Lake and Data Warehouse (push: data is staged when events arrive; data transformation, e.g. Stored Procs); Data Services (for Apps) over HTTPS / TLS 1.2; DBAs (aligned to business unit); GG Ops (aligned to shared services).
  • 39. Example Topology with Stream Processing. Diagram labels: OGG Extract; Database Client Libraries; Secure SQL*Net Connections; GG Mid-Tier 001, GG Mid-Tier 002 ... GG Mid-Tier xxx; GG Trail Files on SAN/RAID Storage; K8S/Docker Container (optional) with Service Manager, Administration Service, Metrics Service, Reverse Proxy & Certs, and GG4BD Replicat; Kafka Raw Topics (Topic : Table) and Kafka Prepared Topics (Topic : Table) on Kafka Segments; Cache Store; OSA Mid-Tier with OSA Web Application and OSA Spark Application; Spark ETL Nodes; Data Pipes for Real-time ETL; Direct Load to Databases; Data Lake; Database; Data Services (for Apps) over HTTPS / TLS 1.2; DBAs (aligned to business unit); GG Ops (aligned to shared services); Data Engineer (DW / Data Lake organization).
  • 40. Pattern: Canonical Objects as Real-Time Events. Diagram labels: Enterprise Data Producers: User Action, App (Object Model), DB (Data Model), System Of Record (SoR), "committed!"; CDC: Detect Event, Logical Change Records (LCRs). Stages: Raw Data, Prepared Data, Canonical Data. Topics: Raw Data (LCR), Schema Events (DDL), Prepared Data Topics, "Master" Data Topics. Flows: raw data and alerting events are pushed; prepared data events are pushed; canonical data events (JSON, XML, Avro, Parquet, CSV); Data Objects; Table Data; Raw Data / Alerts; Direct to Database (relational semantics fully preserved). Data Consumers: Applications, Data Services, ODS (Data Store), Data Marts & Warehouses, IoT Apps, Data Science, SQL Consumers. ETL is bounded by a Time Window; lookups can happen from memory, cache, or via SQL. Tradeoff between "Data Fidelity" and "Data Latency".
  • 41. Today's Agenda: 1. Strategic IT Forces 2. GoldenGate Big Data in Brief 3. Database Change Streams to Apache Kafka 4. Stream Processing with Apache Spark
  • 42. Data In Motion | Stream Processing. Copyright © 2018, Oracle and/or its affiliates. All rights reserved. | Oracle OpenWorld 2018. Connect Data Owners & Data Products:
      Ingest Database Events: Any GoldenGate event is included free; Kafka native events require a full-use license.
      Select Processing Patterns: A rich set of pre-built patterns can dramatically improve developer efficiency and time-to-value.
      Build Event Pipelines: The tool can easily leverage geo-fencing, machine learning, and other lookup data within the data stream.
      Serve Data Downstream: Data can be delivered out to Kafka, databases, or easily staged for downstream ETL jobs.
  • 43. Significant Intellectual Property. More than 70 patents on stream processing. Mature tech stack for Event Processing. Over 10 years of IP investment. Releases shown on the timeline: 11g, 12.2, 12.3, 18c, 19c.
  • 44. Interactive Browser-based Designer. Accessible to Non-Technical Users: • Empower data analysts to enhance data with no coding skills required • Intuitive, always-on data view shows results of transformations as they are defined • Filter and correlate streams, apply rules, aggregate, calculate fields, etc. Function Extensibility via Java: • Allow data engineers to provide custom stages and functions to be used by all team members. Integrated Visualizations: • Explore your business data live through various tables, charts, and geospatial maps
  • 45. Big Upgrade from Apache Spark.
      Feature | Oracle Stream Analytics | Apache Spark (only)
      Programming | Graphical UX, with ability to cut-paste Scala directly into Pipelines | Java/Scala low-level programming only
      Checkpointing | Automatic, part of OSA Pipeline implementation | Developer must be aware of the semantics and logic
      Record-by-Record | Automatic timestamps from OSA CQL Engine | Spark Streaming treats all records within a batch the same
      Out of Order Events | Automatic, via CQL timestamps and also via GG SCN | Not possible to reliably handle
      Progression of Time | Automatic, CQL engine progresses time (e.g., A not followed by B) | When there are no new events, there is no native progression of time
      Windowing Functions | Handles windowing based on number of events, dynamic attributes, other intervals | Micro-batch only
      Fault Tolerance | Automatic, part of OSA application native behavior | Developer must code
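    To make the "Out of Order Events" row concrete, the sketch below shows one generic way a consumer could restore SCN order: hold a small buffer of events and always release the one with the lowest SCN. This is plain Python for illustration and is not how OSA, CQL, or Spark implement ordering. The function name, the integer SCN values, and the buffer size are assumptions, and the approach only corrects events that arrive within the buffer's reach.

```python
import heapq


def reorder_by_scn(events, max_buffered):
    """Yield events in SCN order, tolerating disorder up to max_buffered events."""
    heap = []
    for seq, event in enumerate(events):
        # seq breaks ties so events with equal SCNs keep their arrival order.
        heapq.heappush(heap, (event["SCN"], seq, event))
        if len(heap) > max_buffered:
            yield heapq.heappop(heap)[2]
    # Flush whatever is still buffered once the input ends.
    while heap:
        yield heapq.heappop(heap)[2]


# Hypothetical arrival order: SCN 131 arrives after SCN 132.
arrived = [{"SCN": 130}, {"SCN": 132}, {"SCN": 131}, {"SCN": 133}]
ordered = [e["SCN"] for e in reorder_by_scn(arrived, max_buffered=2)]
print(ordered)
```

    A larger buffer tolerates more disorder at the cost of added latency, which is the same fidelity-versus-latency tradeoff noted earlier in the deck.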
  • 46. Rich Set of Streaming Patterns. Simplify Access to Complex Algorithms: • Easy-to-use modules with user assistance in the designer • Pre-defined visualizations to provide immediate feedback • Accessible to data analysts. Comprehensive Library of Patterns: • Covers diverse areas such as anomaly detection, stream correlation, trend analysis, and spatial functions • Duplicate, out-of-order, and missing event detection • Functions for financial, statistical, and log analytic operations
  • 47. Location and Geo-Spatial Capabilities. Interactive Spatial Design and Visualization: • Show live location data on maps as events are processed • Track individual objects and highlight them based on different conditions, e.g. red for violation. Rich Geospatial Pattern Set: • Correlate multiple objects through their spatial interaction • Detect speed and proximity • Obtain address and city information from location, and vice versa, through geocoding. Scalable Definition of Areas and Geo-Fences: • Define polygons by drawing borders on a map • Manage large numbers of shapes through spatial types in Oracle Database.
  • 48. Time Series Analytics. Figure: Anatomy of a Time Series Pattern. Built-in Patterns for Anomalies: • Pre-defined visualizations to provide immediate feedback • Accessible to data analysts. GoldenGate Supplies High Fidelity Events: • Every database commit, logical change record, schema and procedure event is visible in the event stream • Combine with application logs for the full picture. Examples: • Banking, credit card transactions, trades • Sales and Marketing Data (eCommerce) • IoT, Telemetry, Devices, Smart Home • Monitoring data, data centers, networks, etc. • Science/medicine, EEG, ECG, DNA • Social networks, likes, classification, trends
  • 49. Predictive Analysis and Machine Learning. Real-time Scoring and Decision Making: • Use Machine Learning models to make business decisions in real time • Predict future outcomes such as equipment failures, customer behavior, fraud, and security breaches • Re-import refined models for improved predictions. Put Data Science in Production: • Import predictive models created by data scientists and engineers in their own environment • Import PMML models for a variety of algorithms such as support vector machines, association rules, Naive Bayes classifiers, clustering models, text models, decision trees, and different regression models • Hide model complexity for use by data analysts • Custom stages for access to external scoring systems. Diagram labels: Data Scientist (Oracle R Enterprise; Notebooks such as Jupyter, Zeppelin, etc.); Data Analyst / Data Engineer.
  • 50. Built-in Dashboards. Visualizations Built-In: although not intended to replace purpose-built Data Visualization tools, OSA includes some visualizations to support building graphs on data that is streaming in-memory (before writing to the Data Store), including: • Bar Charts • Line Charts • Geo-Spatial (Google Maps) • Area Charts • Pie Charts • Scatter Charts • Bubble Charts • Thematic Maps
  • 51. What have we learned? Global Shift in Technology (Industry 4.0). Data Integration Shift to Data Mesh. GoldenGate for Big Data. Event-Driven, Distributed Data Integration. *Industrial Strength & Enterprise Class
  • 52. This is not a Metamorphosis, it is a Paradigm Shift. The Success Paradox: data success factors that did well in Industry 3.0 will not be the factors that create success in Industry 4.0. Diagram labels: ETL Vendors, 1990 to 2010s; Next Gen Data Architecture, Gen1: • Replication • Messaging • Streaming • Pipelines. Next-Gen has new DNA not tied to ETL tools. It is impossible to evolve older Batch Processing tools into a modern Event-Centric Stream Processing solution; the underlying paradigms are fundamentally different.
  • 53. Questions?
  • 55. Safe Harbor Statement. The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.