Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
CON6624
Oracle Data Integration Platform
A Cornerstone for Big Data
Christophe Dupupet (@XofDup)
Director | A-Team
Mark Rittman (@markrittman)
Independent Analyst
Julien Testut (@JulienTestut)
Senior Principal Product Manager
September, 2016
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
Five Core Capabilities
1. Business Continuity: DATA ALWAYS AVAILABLE
2. Data Movement: DATA ANYWHERE IT’S NEEDED
3. Data Transformation: DATA ACCESSIBLE IN ANY FORMAT
4. Data Governance: DATA THAT CAN BE TRUSTED
5. Streaming Data: DATA IN MOTION OR AT REST
Eight Core Products: Cloud or On-Premise
Most Innovative Technology
• #1 Realtime / Streaming Data Integration Tool
• #1 Pushdown / E-LT Data Integration Tool
• 1st to certify replication with Streaming Big Data
• 1st to certify E-LT tool with Apache Spark/Python
• 1st to power Data Preparation with ML + NLP + Graph Data
• 1st to offer Self-Service & Hybrid Cloud solution
Hybrid Open Source
• Open source at the core of speed and batch processing engines
• Enterprise vendor tools for connecting to existing IT systems
• Cloud platforms for the data fabric
[Diagram: data streams, social and logs, enterprise data, highly available databases, pub/sub, REST APIs, NoSQL and bulk data feed a Speed Layer (raw data stream processing) and a Batch Layer (batch processing, prepared data), which feed the Business Data Serving Layer for apps and analytics.]
Examples
Reference Architecture
[Diagram: the same architecture annotated with Oracle products: GoldenGate, Data Preparation, Oracle Data Integrator, Active Data Guard, Dataflow ML, Stream Analytics, connectors, and Data Quality, Metadata Management & Business Glossary.]
The reference architecture covers five key areas: (1) data ingestion, (2) data preparation and transformation, (3) streaming big data, (4) parallel connectivity, and (5) data governance. Oracle Data Integration has products for each of them.
Oracle GoldenGate
• Realtime Performance
• Extensible & Flexible
• Proven & Reliable
Oracle GoldenGate provides low-impact capture, routing, transformation, and delivery of database transactions across homogeneous and heterogeneous environments in real time with no distance limitations.
[Diagram: transaction streams and data events flowing between most databases, cloud DBs and big data platforms.]
Supports Databases, Big Data and NoSQL.
* The most popular enterprise integration tool in history
GoldenGate for Ingest
[Diagram: GoldenGate (GG) captures changes from applications and DBMS updates into a trail, then a pump routes and delivers them onto the databus feeding the Speed Layer (streaming analytics) and the Batch Layer. The Serving Layer exposes results to applications through REST services, visualization tools, reporting tools and data marts, and receives user updates.]
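The capture, trail, pump and deliver stages in the diagram can be modelled as a toy pipeline to show how committed changes flow from a source to a target without re-querying the source tables. This is a minimal sketch under stated assumptions: the `ChangeRecord` shape, the in-memory lists standing in for local and remote trail files, and all function names are hypothetical and are not GoldenGate's API or trail format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChangeRecord:
    """One committed row change (hypothetical shape, not GoldenGate's trail format)."""
    scn: int              # commit sequence number, used to keep changes in order
    op: str               # "INSERT", "UPDATE" or "DELETE"
    key: int              # primary key of the changed row
    row: Dict[str, str]   # full after-image of the row (empty for deletes)


def capture(redo_log: List[ChangeRecord], local_trail: List[ChangeRecord], last_scn: int) -> int:
    """Append committed changes newer than the checkpoint to the local trail; return the new checkpoint."""
    for rec in sorted(redo_log, key=lambda r: r.scn):
        if rec.scn > last_scn:
            local_trail.append(rec)
            last_scn = rec.scn
    return last_scn


def pump(local_trail: List[ChangeRecord], remote_trail: List[ChangeRecord], pos: int) -> int:
    """Route unread trail records from the source side to the target side; return the read position."""
    remote_trail.extend(local_trail[pos:])
    return len(local_trail)


def deliver(remote_trail: List[ChangeRecord], target: Dict[int, Dict[str, str]], pos: int) -> int:
    """Apply trail records, in commit order, to a target table modelled as a dict keyed by primary key."""
    for rec in remote_trail[pos:]:
        if rec.op == "DELETE":
            target.pop(rec.key, None)
        else:  # INSERT and UPDATE both upsert the full row image
            target[rec.key] = dict(rec.row)
    return len(remote_trail)


if __name__ == "__main__":
    redo = [
        ChangeRecord(101, "INSERT", 1, {"name": "Ada"}),
        ChangeRecord(102, "INSERT", 2, {"name": "Bob"}),
        ChangeRecord(103, "UPDATE", 1, {"name": "Ada L."}),
        ChangeRecord(104, "DELETE", 2, {}),
    ]
    local, remote, target = [], [], {}
    checkpoint = capture(redo, local, last_scn=0)
    pump(local, remote, pos=0)
    deliver(remote, target, pos=0)
    print(checkpoint, target)  # 104 {1: {'name': 'Ada L.'}}
```

Each stage keeps its own position (checkpoint, read position), so a restarted stage resumes where it left off instead of re-reading the source.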
Oracle Data Preparation
• Self-Service: zero software to install, easy-to-use browser-based interface
• Better Recommendations: better automation and less grunt work for humans
• Built-in Data Graph: graph database of real-world facts used for enrichment
[Diagram: prepared data feeds platforms, reporting, apps, files and ETL.]
Oracle Data Preparation is a self-service tool that makes it simple to transform, prepare, enrich and standardize business data. It can help IT accelerate solutions for the business by giving data analysts direct control over data formatting.
BDP for Data Preparation
• MONTHS of effort spent on each new dataset
• PROGRAMMERS writing scripts or complex ETL
• DATA WRANGLING wastes time and money
“Big Data’s dirty little secret is that 90% of time spent on a project is devoted to preparing data… After all the preparation work, there isn’t enough time left to do sophisticated analytics on it…” Thomas H. Davenport
[Diagram: structured and unstructured data (e.g. internet logs) must pass through weeks or months of enterprise ETL and data integration before it reaches discovery & visualization and enterprise reporting, while the business asks "I want my data!!". Closing that gap is the business value opportunity.]
Oracle Data Integrator
• Bulk Data Performance
• Non Invasive Footprint
• Future Proof IT Skills
Oracle Data Integrator provides high-performance bulk data movement, massively parallel data transformation using database or big data technologies, and block-level data loading that leverages native data utilities.
[Diagram: bulk data transformation and bulk data movement across most apps, databases & cloud, including cloud DBs and big data.]
• 1000’s of customers, more than other ETL tools
• Flexible ELT workloads run anywhere: DBs, Big Data, Cloud
• Up to 2x faster batch processes and 3x more efficient tooling
ODI for Transformations
[Diagram: Oracle Data Integrator generates and orchestrates work on ETL engines and big data frameworks (Spark Streaming, Spark SQL, Sqoop, Oozie, Pig, Hive, loaders, Kafka, NoSQL, OGG, SQL) for ERP and other sources across the Speed, Batch and Serving layers, which feed REST services, visualization tools, reporting tools and data marts.]
Business Value of ODI: Only Tool with Portable Mappings
• No ETL engine is required
• Separation of logical and physical design
• Physical execution on SQL, Hive, Pig, or Spark
• Runtime execution in Oozie or via the ODI Java Agent
• Rich set of pre-built operators
• User-defined functions
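The separation of logical and physical design can be illustrated with one engine-neutral mapping rendered by two interchangeable back ends, which is the idea behind ODI's pluggable knowledge modules (KMs). This sketch is hypothetical: `LogicalMapping`, `generate_sql`, `run_in_memory` and the `BACKENDS` registry are illustrative names, not ODI's SDK, and the in-memory executor merely stands in for a second engine such as Spark.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LogicalMapping:
    """Engine-neutral mapping: what to load, not how (hypothetical model, not ODI's SDK)."""
    source: str
    target: str
    columns: List[str]
    filter_column: str
    filter_value: str


def generate_sql(m: LogicalMapping) -> str:
    """Physical implementation 1: push the work down to a SQL engine as INSERT ... SELECT.
    Sketch only: identifiers and values are not quoted or escaped."""
    cols = ", ".join(m.columns)
    return (f"INSERT INTO {m.target} ({cols}) "
            f"SELECT {cols} FROM {m.source} WHERE {m.filter_column} = '{m.filter_value}'")


def run_in_memory(m: LogicalMapping, tables: Dict[str, List[dict]]) -> List[dict]:
    """Physical implementation 2: execute the same mapping directly on in-memory rows."""
    selected = [row for row in tables[m.source] if row[m.filter_column] == m.filter_value]
    projected = [{c: row[c] for c in m.columns} for row in selected]
    tables.setdefault(m.target, []).extend(projected)
    return projected


# Changing the execution engine means picking another entry here, not redesigning the mapping.
BACKENDS: Dict[str, Callable[..., object]] = {"sql": generate_sql, "memory": run_in_memory}

mapping = LogicalMapping(
    source="orders_raw", target="orders_uk",
    columns=["order_id", "amount"], filter_column="country", filter_value="UK",
)
print(BACKENDS["sql"](mapping))
# INSERT INTO orders_uk (order_id, amount) SELECT order_id, amount FROM orders_raw WHERE country = 'UK'
```

The mapping object never mentions an engine; only the chosen back end does, which is why a platform change (say, MapReduce to Spark) can leave the logical design untouched.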
Oracle Stream Analytics
• Business Friendly
• Extreme Performance
• Spatial Awareness
[Diagram: data and transaction streams from DBs and web/devices arrive as data events, are analyzed in motion, and are passed downstream (e.g. Hadoop).]
Oracle Stream Analytics is a powerful analytic toolkit designed to work directly on data in motion: simple data correlations, complex event processing, geo-fencing, and advanced dashboards run on millions of events per second.
• Innovative dual model for Apache Spark or Coherence grid
• Simple-to-use spatial and geo-fencing features, an industry first
• Includes Oracle GoldenGate for streaming transactions
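Geo-fencing reduces to asking, for each position event, whether the point lies inside a boundary polygon, and alerting when that answer changes. The sketch below uses the standard ray-casting point-in-polygon test and treats coordinates as planar, which is only an approximation over small areas; it is not Oracle Stream Analytics' implementation, and `inside` and `geofence_events` are hypothetical names.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) over a small, roughly flat area


def inside(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from the point."""
    x, y = point
    crossings = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's horizontal line (so y1 != y2)
            x_at_y = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_at_y:
                crossings += 1
    return crossings % 2 == 1  # an odd number of crossings means the point is inside


def geofence_events(events: Iterable[Tuple[str, Point]], fence: List[Point]) -> Iterator[Tuple[str, str]]:
    """Emit ENTER/EXIT alerts when a device's reported position crosses the fence boundary."""
    was_inside: Dict[str, bool] = {}
    for device, pos in events:
        now = inside(pos, fence)
        before = was_inside.get(device, False)
        if now and not before:
            yield device, "ENTER"
        elif before and not now:
            yield device, "EXIT"
        was_inside[device] = now


depot = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
track = [("truck1", (15.0, 5.0)), ("truck1", (5.0, 5.0)), ("truck1", (5.0, 6.0)), ("truck1", (20.0, 20.0))]
print(list(geofence_events(track, depot)))
```

Only the per-device "was inside" flag is kept as state, which is what lets this kind of check run on each event as it arrives rather than on stored data.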
Oracle Dataflow ML
• Stream or Batch Data
• Spark-based Pipelines
• ML-powered Profiling
Oracle Dataflow ML is a big data solution for stream and batch processing in a single environment: Lambda-based applications that can run streaming ETL for cloud-based analytic solutions.
• Batch and stream processing at the same time
• Machine learning guides users through data profiling
• Data movement across Oracle PaaS services
[Diagram: bulk data movement and streaming data from most apps, databases & cloud (cloud DBs, big data) into a big data pipeline.]
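"Lambda-based" refers to the Lambda architecture: a batch layer periodically recomputes complete views, a speed layer covers only the events since the last batch run, and the serving layer merges the two at query time. A minimal sketch of that pattern, assuming a simple event-counting workload; the function and class names are hypothetical and this is not Dataflow ML's API.

```python
from collections import Counter
from typing import Iterable


def batch_view(all_events: Iterable[str]) -> Counter:
    """Batch layer: recompute complete counts from the full, immutable event history."""
    return Counter(all_events)


class SpeedLayer:
    """Speed layer: incremental counts for events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def on_event(self, key: str) -> None:
        self.recent[key] += 1

    def reset(self) -> None:
        """Discard the delta once a fresh batch view has absorbed those events."""
        self.recent.clear()


def serve(batch: Counter, speed: SpeedLayer, key: str) -> int:
    """Serving layer: merge the batch view with the real-time delta at query time."""
    return batch[key] + speed.recent[key]


batch = batch_view(["click", "view", "click"])
speed = SpeedLayer()
speed.on_event("click")
print(serve(batch, speed, "click"))  # 3
```

The batch result stays authoritative; the speed layer only fills the gap until the next batch run, so its simpler, approximate logic never has to be correct forever.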
Streaming Data
[Diagram: streaming data from devices and from databases flows through Oracle GoldenGate, Oracle Stream Analytics and Oracle Dataflow ML onto the databus feeding the Speed Layer and Batch Layer; the Serving Layer delivers results to applications via REST services, visualization tools, reporting tools and data marts.]
Oracle Metadata Management
• Business Glossary
• End-to-End Lineage
• 100+ Supported Systems
Oracle Metadata Management provides an integrated toolkit that combines business glossary, workflow, metadata harvesting and rich data steward collaboration features.
Supports databases, big data, ETL tools, BI tools, etc.:
• BI report lineage
• Taxonomy lineage
• Data model lineage
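End-to-end lineage, from a BI report back through models and warehouse tables to raw sources, is essentially a walk over a graph of "derived from" edges. A minimal sketch, assuming harvested metadata has already been reduced to a dict of asset to upstream assets; every asset name here is hypothetical, and this is not OEMM's data model or API.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical harvested lineage: each asset maps to the assets it is derived from.
LINEAGE: Dict[str, List[str]] = {
    "report:Revenue by Region": ["bi_model:Sales"],
    "bi_model:Sales": ["dw.fact_sales", "dw.dim_region"],
    "dw.fact_sales": ["hive.orders_clean"],
    "hive.orders_clean": ["hdfs:/raw/orders"],
    "dw.dim_region": ["oltp.regions"],
}


def upstream(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk from an asset back to everything it depends on."""
    seen: Set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:  # guards against shared ancestors and cycles
                seen.add(parent)
                queue.append(parent)
    return seen


def original_sources(asset: str, lineage: Dict[str, List[str]]) -> Set[str]:
    """Leaf nodes: upstream assets with no recorded parents of their own."""
    return {a for a in upstream(asset, lineage) if not lineage.get(a)}


print(sorted(original_sources("report:Revenue by Region", LINEAGE)))
```

Reversing the edges gives impact analysis instead: which reports are affected if a given source table changes.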
OEMM for Data Governance
[Diagram: Oracle Enterprise Metadata Management (140+ supported tools) acts as a data catalog across the Speed, Batch and Serving layers, harvesting metadata from Kafka, generated streaming and ETL code, Sqoop, OLTP databases, HDFS files, HCatalog, Hive, NoSQL, ETL tools, data warehouses, BI models and ER models.]
Eight Core Products: Cloud or On-Premise
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
Leverage Wide Range of Modern Analytic Styles
4 Business Patterns of Big Data Customer Adoption
[Diagram: a DBMS (on prem or cloud) alongside Hadoop areas for the Sandbox, ETL Offload/Staging and Deep Data Storage.]
1. Analytic Data Sandbox:
– Stakeholder: Functional Line of Business (LoB)
– Core Value: Faster access to business data, Faster
time to value on Analytics
– Innovation: Schema-on-read empowers rapid data
staging and true Data Discovery
2. ETL Offload:
– Stakeholder: Information Technology (IT)
– Core Value: Cost avoidance on DW/Marts
– Innovation: YARN/Hadoop empowers lower cost
compute and lower cost storage
3. Deep Data Storage:
– Stakeholder: Risk / Compliance (LoB)
– Core Value: High fidelity aged data
– Innovation: SQL on Hadoop engines enable very low
cost, queryable data access
4. Streaming:
– Stakeholder: Marketing (LoB) / Telematics (LoB)
– Core Value: New Data Services or Higher Click Rates
– Innovation: MPP capable streaming platforms
combined with modern in-motion analytics
[Diagram labels: Data First Analytics, Model First Analytics, In-Motion Analytics (Streaming).]
Analytic Data Sandbox
Discovery, Exploratory and Visualization Style Analytics
• Oracle Endeca, Big Data Discovery
• Tableau, Qlik, Spotfire
• Datameer, etc.
Business Intelligence, Reporting and Dashboard Style Analytics
• Oracle BI EE, Visual Analyzer
• Cognos, SAS, MicroStrategy
• Business Objects, Actuate, etc.
Analytic Data Sandbox:
– Stakeholder: Functional Line of Business (LoB)
– Core Value: Faster access to business data, Faster
time to value on Analytics
– Innovation: Schema-on-read empowers rapid data
staging and true Data Discovery
– Industries: All industries
Supports “Data First” Style of Analytics
– No schema required
– Staging data is simple and fast
– Minimal data preparation required
(mainly for un/semi-structured data sets)
Typical Customer Data Types / Sets
– Usually bringing in Structured Data from OLTP
(Primary data is their existing Application data)
– Often bringing in Semi-Structured data
(Secondary data is clickstream, logs, machine data)
– Business value is usually in the combination of the
various data sets and the improved speed of
discovery
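Schema-on-read means raw data is landed without a predefined schema, and a schema is applied only when the data is read, so different analyses can project the same raw files differently. A minimal sketch, assuming clickstream arrives as JSON lines; the field names, `CLICK_SCHEMA` and `read_with_schema` are hypothetical, and on Hadoop this role is usually played by an engine such as Hive reading files in HDFS.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

# Raw data is staged as-is (JSON lines); nothing was validated at write time.
RAW_CLICKS = [
    '{"user": "u1", "ts": "2016-09-20T10:00:00", "page": "/home", "ms": "120"}',
    '{"user": "u2", "ts": "2016-09-20T10:00:05", "page": "/cart"}',
    'not valid json',
]

# The schema lives with the query, not the storage: column name -> type converter.
Schema = Dict[str, Callable[[Any], Any]]
CLICK_SCHEMA: Schema = {"user": str, "page": str, "ms": int}


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Optional[Any]]]:
    """Apply a schema at read time: project columns, cast types, tolerate missing fields."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip unparseable lines rather than failing the whole read
        yield {col: (cast(record[col]) if record.get(col) is not None else None)
               for col, cast in schema.items()}


for row in read_with_schema(RAW_CLICKS, CLICK_SCHEMA):
    print(row)
```

Staging stays cheap because nothing is rejected on the way in; the trade-off is that malformed or missing data only surfaces at query time.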
[Diagram: DBMS (on prem or cloud) with Sandbox, ETL Offload and Staging areas; Data First vs. Model First analytics; BI self-service.]
• Often the data flow may not require any ETL tooling
• Other data flows may still require ETL as a pipeline
ETL Offload
[Diagram: DBMS (on prem or cloud) with Sandbox, ETL Offload and Staging areas.]
2. ETL Offload:
– Stakeholder: Information Technology (IT)
– Core Value: Cost avoidance on DW/Marts
– Innovation: YARN/Hadoop empowers lower cost
compute and lower cost storage
– Industries: Teradata, Netezza & Ab Initio customers
Supports “Model First” Style of Analytics
– Schemas required
(for working areas, sources and targets)
– Staging data requires modeled staging tables
– Data preparation required (mapping data sets)
(un/semi-structured data sets require pre-parsing)
Typical Customer Data Types / Sets
– Usually bringing in Structured Data from OLTP Apps
(Primary data is their existing Application data)
– Occasionally adding new data types to EDW schema
(Secondary data is clickstream, logs, machine data)
– Business value is usually tied to the “cost avoidance”
around escalating DW and ETL tooling costs
[Diagram labels: Data First Analytics, Model First Analytics.]
• The primary data flow requires data integration tools
Deep Data Storage
[Diagram: DBMS (on prem or cloud) with Sandbox, ETL Offload, Staging and Deep Data Storage areas.]
3. Deep Data Storage:
– Stakeholder: Risk / Compliance (LoB)
– Core Value: High fidelity aged data
– Innovation: SQL on Hadoop engines enable very low
cost, queryable data access
– Industries: Insurance and Banking
Typically Deep Storage of Relational Data
– Schemas required
(item detail records, not necessarily aggregates)
– Archival can be “on the way in” as part of routine
loading, and also via “periodic” pruning from the
EDW and data marts
Popular with SQL on Hadoop and Federation
– Teradata QueryGrid from Teradata/Aster
– IBM BigSQL from Netezza/PureData
– Oracle Big Data SQL from Exadata
– Pivotal HAWQ from Greenplum
– Cisco Composite Software also selling on this use
case (in addition to BI Virtualization)
[Diagram labels: Data First Analytics, Model First Analytics, pattern mining, compliance, queryable archive.]
Streaming Big Data Analytics
[Diagram: DBMS (on prem or cloud) with Sandbox, ETL Offload, Staging and Deep Data Storage areas.]
4. Streaming:
– Stakeholder: Marketing (LoB) / Telematics (LoB)
– Core Value: New Data Services or Higher Click Rates
– Innovation: MPP capable streaming platforms
combined with modern in-motion analytics
– Industries: Automotive, Aerospace, Industrial
Manufacturing, some Energy/Oil & Gas
Decisions on Data Before it hits Disk
– Data volume may be too high to persist all data
• Only save the important data
– Data may be highly repetitive (sensor data)
– Correlations may need to happen with very low
latency requirements based on LoB demand
Key Use Case for “Data Monetization”
– Customers are standing up new Data Services (e.g.
realtime equipment failure alerts and subscription-based
monitoring)
– “Connected Car” services from most car makers
– Disaster preparedness centers – Energy/Aerospace
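Making decisions on data before it hits disk often starts with discarding repetitive sensor readings. A minimal sketch of a deadband (change-threshold) filter, assuming readings arrive as (sensor, timestamp, value) tuples; the threshold is a hypothetical tuning parameter, and a production stream engine would keep this per-sensor state in a partitioned, fault-tolerant store rather than a plain dict.

```python
from typing import Dict, Iterable, Iterator, Tuple

Reading = Tuple[str, float, float]  # (sensor_id, timestamp_seconds, value)


def deadband(readings: Iterable[Reading], threshold: float) -> Iterator[Reading]:
    """Keep a reading only if it differs from that sensor's last kept value by more than threshold."""
    last_kept: Dict[str, float] = {}
    for sensor, ts, value in readings:
        prev = last_kept.get(sensor)
        if prev is None or abs(value - prev) > threshold:
            last_kept[sensor] = value
            yield sensor, ts, value


stream = [("t1", 0.0, 20.0), ("t1", 1.0, 20.1), ("t1", 2.0, 20.05), ("t1", 3.0, 22.0), ("t2", 3.0, 5.0)]
for kept in deadband(stream, threshold=0.5):
    print(kept)
```

Only significant changes are persisted, so storage grows with the number of events that matter rather than with the raw sampling rate.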
[Diagram labels: In-Motion Analytics (Streaming), Data First Analytics, Model First Analytics, pattern mining.]
• Other data flows may still require ETL as a pipeline
Some Common Themes Across Use Cases
1. Nearly 100% Analytic Use Cases
– Data Discovery directly in Hadoop
– ETL Offloading for analytics in SQL DB
– Deep Data Storage for analytics in SQL DB
– Streaming Analytics for data before it hits disk (Lambda architecture)
2. Nearly all the Data is Structured Data:
– OLTP Sources: every customer starts with the trusted data sets
that already drive the majority of business value – App Data
– New Sources: Clickstream Logs, Machine Data and other App
Exhaust all have “structure” even if they may not have schema
3. Many more Sources are App/OLTP Sources:
– By Quantity of Sources: most customers have many (dozens or
hundreds of) App/OLTP sources they are bringing in
– By Volume: by quantity of data, the amount of Machine Data or
Log data may often exceed the OLTP data sets
4. Mainframes Matter:
– High Value Apps: most of the biggest customers are bringing
mainframe (DB2/z, IMS, VSAM) data to Hadoop
5. Multiple Projects / Programs using Hadoop:
– Larger Customers: most of the biggest customers have multiple
Hadoop projects running in parallel, some are IT led (DW/ETL
Offload) and others are LoB led (Discovery/Telematics)
6. Customers are Starting in Phases:
– By Value: IT led vs. LoB led initiatives have different
characteristics – even if the “Lake / Reservoir” factors in as a
long term goal, the initial phases are often quite small in scale
7. Size of Hadoop Clusters vary widely:
– Investment Sizes Differ (by a lot): some “start” with mega-
commitments (1000’s of Nodes) and others start very small
8. Commodity H/W Clusters Dominate:
– Commodity: for use cases designed to work across groups
– Appliances: for use cases attached to a single project
9. Data Lakes as a Way to Handle Vendor Diversity:
– Middleware for Data: bigger customers have DWs/DBs from
every vendor and 6+ different BI tools; Hadoop is becoming the
“canonical” data platform to sit in between
10. Open Source Data Platform is a Strategic Priority:
– Senior Stakeholder Feedback: as a design point priority for their
“next gen” it is becoming more important that Open Source has
a central role to play in the enterprise data platform
11. Industry Clusters:
– 1. Banking, 2. Insurance, 3. Manufacturing, 4. Media, 4. Retail
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
T : @markrittman
THOUGHTS ON ORACLE DATA INTEGRATION
FOR BIG DATA - A PRACTITIONER'S VIEW
Mark Rittman, Oracle ACE Director
ORACLE OPENWORLD 2016, SAN FRANCISCO
(C) Mark Rittman 2016 W: http://www.rittman.co.uk T : @markrittman
About the Presenter
• Oracle ACE Director, blogger + ODTUG member
• Regular columnist for Oracle Magazine
• Past ODTUG Executive Board Member
• Author of two books on Oracle BI
• Co-founder of Rittman Mead, now independent analyst
• 15+ years in Oracle BI, DW, ETL + now Big Data
• Based in Brighton, UK
Big Data Technology Core to Modern BI Platforms
• Every engagement and customer discussion has Big Data central to the project
  • Hadoop extending traditional DWs through scalability, flexibility, cost, RDBMS-compatibility
  • Hadoop as the ETL engine driven by ODI Big Data KMs
  • New datatypes and methods of analysis enabled by Hadoop schema-on-read
  • Project innovation driven by machine learning, streaming, ability to store + keep *all* data
• And what is driving the interest in these projects…?
[Diagram: Data Reservoir. Operational data (transactions, customer master data), event, social + unstructured data, and voice + chat transcripts enter a Data Factory (OGG for Big Data 12c, Oracle Stream Analytics on data streams, ODI12c, Oracle Data Preparation). Raw customer data (stored in the original format, usually files, such as SS7, ASN.1, JSON) is mapped and transformed into mapped customer data and an enriched customer profile. A safe & secure discovery and development environment (data sets and samples, models and programs, machine learning, modeling, scoring, segments) on the Oracle Big Data Platform, with Oracle Data Visualization and Oracle Big Data Discovery, feeds models and segments to marketing/sales applications. The platform runs on Oracle Big Data Appliance, connected via InfiniBand.]
Oracle Big Data Appliance Starter Rack + Expansion
• Cloudera CDH + Oracle software
• 18 high-spec Hadoop nodes with InfiniBand switches for internal Hadoop traffic, optimised for network throughput
• 1 Cisco management switch
• Single place for support for H/W + S/W
The Big Data Secret? It’s All About Data Integration
• Data from all the sources will need to be integrated to create the single customer view
  • Hadoop technologies (Flume, Kafka, Storm) can be used to ingest events and log data
  • Files can be loaded “as is” into the HDFS filesystem
  • Oracle/DB data can be bulk-loaded using Sqoop
  • GoldenGate for trickle-feeding transactional data
• But the nature of new data sources brings challenges
  • May be semi-structured or have an unknown schema
  • Joining schema-free datasets
• Need to consider quality and resolve incorrect, incomplete, and inconsistent customer data
[Diagram: a single customer view / enriched customer profile built with M/L from "who", "what", "how" and "why" sources, including chat. Callouts: data from structured + schema-on-read sources needs integrating; requires preparation + obfuscation; streaming sources with JSON payloads; apply schema to raw and semi-structured data; heterogeneous enterprise + web sources.]
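Resolving incomplete and inconsistent customer data usually means matching records that describe the same person and choosing a surviving value for each field. A minimal sketch, assuming a trimmed, lower-cased email is a good enough match key and that the most recent non-empty value should win; both rules and all field names are hypothetical, and real data quality tooling uses much fuzzier matching.

```python
from typing import Dict, Iterable


def normalize_email(email: str) -> str:
    """Match key: trimmed, lower-cased email (a deliberately simple rule)."""
    return email.strip().lower()


def merge_customers(records: Iterable[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Group records by match key; per field, the most recently updated non-empty value wins."""
    merged: Dict[str, Dict[str, str]] = {}
    # ISO-8601 'updated' strings sort chronologically, so processing oldest first lets newer values overwrite.
    for rec in sorted(records, key=lambda r: r["updated"]):
        key = normalize_email(rec["email"])
        golden = merged.setdefault(key, {})
        for field, value in rec.items():
            if field == "email":
                golden[field] = key
            elif value and value.strip():  # empty values never erase good data
                golden[field] = value.strip()
    return merged


records = [
    {"source": "crm", "email": "Jane.Doe@Example.com ", "name": "Jane Doe", "phone": "01273 000000", "updated": "2016-01-10"},
    {"source": "web", "email": "jane.doe@example.com", "name": "", "phone": "07700 900123", "updated": "2016-08-01"},
    {"source": "chat", "email": "bob@example.com", "name": "Bob", "phone": "", "updated": "2016-05-05"},
]
for key, profile in merge_customers(records).items():
    print(key, profile)
```

The survivorship rule is the design choice that matters most here; "most recent wins" is only one option, and trusting one source over another per field is equally common.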
Landing, Preparing and Securing Raw Data is *Hard*
• Finding raw data is easy; then the real work needs to be done, which can be > 90% of the project
• Four main tasks to land, prepare and integrate raw data to turn it into a customer profile:
  1. Ingest it in real time into the data reservoir
  2. Apply schema to raw and semi-structured data
  3. Remove sensitive data from any input files
  4. Transform and map into your Customer 360-degree profile
Oracle Big Data Preparation Cloud Service
• Data enrichment tool aimed at domain experts, not programmers
• Uses machine learning to automate data classification + profiling steps
• Automatically highlights sensitive data, and offers to redact or obfuscate it
• Dramatically reduces the time required to onboard new data sources
• Hosted in Oracle Cloud for zero-install
  • File upload and download from browser
  • Automate for production data loads
[Diagram: raw data (stored in the original format, usually files, such as SS7, ASN.1, JSON) and voice + chat transcripts are turned into mapped data (data sets produced by mapping and transforming raw data).]
Step 2: Apply Schema to Raw and Semi-Structured Data
[Diagram: ease of ingestion by source: batch load from files, DB (easy); stream from APIs, HTTP (moderate); load raw text from blog entries and reviews, which needs NLP to extract entities and other information embedded in unstructured text. Problems called out: no reliable patterns, invalid and missing data (e.g. invalid emails), sensitive data.]
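Once a schema has been applied, invalid and missing values such as the "invalid emails" called out above can be flagged per record and routed to a reject set instead of silently loaded. A minimal sketch, assuming hypothetical required fields and a deliberately loose email pattern; the slide's tool does this through profiling and ML, which this rule-based version does not attempt to reproduce.

```python
import re
from typing import Dict, List, Tuple

# Deliberately loose pattern: one "@", no whitespace, and a dot in the domain part.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED = ("customer_id", "email")  # hypothetical mandatory fields

Record = Dict[str, str]


def validate(record: Record) -> List[str]:
    """Return the data quality issues found in one record (an empty list means clean)."""
    issues = [f"missing {field}" for field in REQUIRED if not record.get(field, "").strip()]
    email = record.get("email", "").strip()
    if email and not EMAIL_RE.match(email):
        issues.append("invalid email")
    return issues


def partition(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into clean rows and rejects annotated with their issues."""
    clean: List[Record] = []
    rejects: List[Tuple[Record, List[str]]] = []
    for rec in records:
        issues = validate(rec)
        if issues:
            rejects.append((rec, issues))
        else:
            clean.append(rec)
    return clean, rejects


clean, rejects = partition([{"customer_id": "1", "email": "x@y.org"}, {"customer_id": "2", "email": "x@@y"}])
print(len(clean), rejects)
```

Keeping rejects with their reasons, rather than dropping them, gives data stewards something concrete to fix at the source.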
Step 3: Remove Sensitive Data from Any Input Files
• Automatically profile and analyse datasets
• Use machine learning to spot and obfuscate sensitive data automatically
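The slide describes ML-driven detection; as a simpler stand-in, the sketch below swaps in regular expressions to find email addresses and card-number-like digit runs, then pseudonymizes emails with a salted SHA-256 hash (so redacted datasets can still be joined on the token) and masks all but the last four card digits. The patterns, the `salt` value and the token format are assumptions for illustration; pattern matching misses sensitive data that has no reliable shape, which is why profiling and ML are used in practice.

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by single spaces or hyphens (card-number-like).
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def pseudonymize_email(value: str, salt: str) -> str:
    """Replace an email with a stable token; the same email and salt always give the same token."""
    digest = hashlib.sha256((salt + value.lower()).encode("utf-8")).hexdigest()
    return "EMAIL_" + digest[:12]


def mask_card(match: "re.Match[str]") -> str:
    """Keep only the last four digits of a card-like number."""
    digits = re.sub(r"\D", "", match.group(0))
    return "XXXX-XXXX-XXXX-" + digits[-4:]


def redact(text: str, salt: str = "change-me") -> str:
    """Pseudonymize emails and mask card-like numbers in free text (the salt is a placeholder secret)."""
    text = EMAIL_RE.sub(lambda m: pseudonymize_email(m.group(0), salt), text)
    return CARD_RE.sub(mask_card, text)


print(redact("Contact jane.doe@example.com or card 4111 1111 1111 1111.", salt="s1"))
```

Pseudonymizing rather than deleting keeps the redacted data useful for joins and counts, while the salt prevents simple dictionary lookups of common addresses.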
Step 4: Transform, Join + Map into Polyglot Data Stores
• Oracle Data Integration offers a wider set of products for managing Customer 360 data
  • Oracle GoldenGate
  • Oracle Enterprise Data Quality
  • Oracle Data Integrator
  • Oracle Enterprise Metadata Management
• All Hadoop enabled
• Works across Big Data, Relational and Cloud
Future-Proof Big Data Integration Platform
• Projects built yesterday using MapReduce today need to be rewritten in Spark
  • Then Spark needs to be upgraded to Spark Streaming + Kafka for real time…
  • Upgrades, and replatforming onto the latest tech, can bring “fragile” initiatives to a halt
• ODI’s pluggable KM approach to big data integration makes tech upgrades simple
• Focus time + investment on new big data initiatives
  • Not rewriting fragile hand-coded scripts
[Diagram: Big Data Management Platform. The ODI desktop client drives integration on a Big Data Platform where everything runs natively under Hadoop: YARN (cluster resource management), HDFS (cluster filesystem holding raw data), Hive + Pig (log processing, UDFs etc.), Spark (in-memory data processing), Kafka + Spark Streaming, and possibly Apache Beam. Results feed a data warehouse (curated data: historical view and business-aligned access) and the Discovery & Development Labs, a safe & secure discovery and development environment holding data sets and samples, models and programs, and the enriched customer profile used for modeling and scoring.]
And the Next Challenge: Data Quality + Provenance
• Big data projects have had it “easy” so far in terms of data quality + data provenance
  • Innovation labs + schema-on-read prioritise discovery + insight, not accuracy and audit trails
  • But a data reservoir without any cleansing, management + data quality = data cesspool
  • … and nobody knows where all the contamination came from, or who made it worse
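Knowing "where the contamination came from" requires provenance: a record of which source and which processing steps produced each piece of data. A minimal record-level sketch, assuming every record can carry a `_lineage` list; the `traced` wrapper, the two cleansing steps (one with a deliberate bug) and all field names are hypothetical, and metadata management tools typically track lineage at the dataset level rather than per row.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def ingest(source: str, rec: Record) -> Record:
    """Stamp a newly landed record with its origin."""
    return {**rec, "_lineage": [f"source:{source}"]}


def traced(step_name: str, fn: Callable[[Record], Record]) -> Callable[[Record], Record]:
    """Wrap a transformation so every output record remembers the steps that produced it."""
    def wrapper(rec: Record) -> Record:
        out = fn(dict(rec))  # work on a copy so the input record is left untouched
        out["_lineage"] = list(rec.get("_lineage", [])) + [step_name]
        return out
    return wrapper


# Two hypothetical cleansing steps; map_country has a bug that blanks out unmapped countries.
trim_name = traced("trim_name", lambda r: {**r, "name": str(r["name"]).strip()})
map_country = traced("map_country", lambda r: {**r, "country": {"UK": "GB"}.get(r["country"])})


def who_broke(records: List[Record], field: str) -> List[List[str]]:
    """Return the lineage of every record in which the field ended up empty."""
    return [r["_lineage"] for r in records if not r.get(field)]


raw = [ingest("crm", {"name": " Ann ", "country": "UK"}), ingest("web", {"name": "Raj", "country": "IN"})]
cleaned = [map_country(trim_name(r)) for r in raw]
print(who_broke(cleaned, "country"))
```

Every damaged record points at the same last step, which narrows the search for the faulty transformation to one place instead of the whole pipeline.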
Data Governance: Why I Recommend Oracle DI Tools
• From my perspective, this is what makes Oracle Data Integration my Hadoop DI platform of choice
• Most vendors can load and transform data in Hadoop (not as well, but basic capability)
• Only Oracle have the tools to tackle tomorrow’s Big Data challenge: Data Quality + Data Governance
  • Oracle Enterprise Data Quality
  • Oracle Enterprise Metadata Mgmt
• Seamlessly integrated with ODI
• Brings enterprise “smarts” to less mature Big Data projects
Data Integration Solutions Program - tinyurl.com/DISOOW16
Presentations on:
• Oracle Enterprise Metadata Management
• Oracle Enterprise Data Quality
• Oracle GoldenGate
• Oracle Data Integrator
• Oracle Big Data Preparation Cloud Service
Hands-on labs:
• Oracle Enterprise Data Quality [HOL7466]
• Oracle GoldenGate Deep Dive [HOL7528]
• ODI and OGG for Big Data [HOL7434]
• Oracle Big Data Preparation Cloud Service [HOL7432]
Demo stations:
• Middleware Demoground - Moscone South
• Big Data Showcase - Moscone South
• Database Demoground - Moscone South
Data Integration Solutions Program - tinyurl.com/DISOOW16
Monday, Sept 19
• Oracle Data Integration Solutions – Platform Overview and Roadmap [CON6619]
• Oracle Data Integration: the Foundation for Cloud Integration [CON6620]
• A Practical Path to Enterprise Data Governance with Cummins [CON6621]
• Oracle Data Integrator Product Update and Strategy [CON6622]
• Deep Dive into Oracle GoldenGate 12.3 New Features for the Oracle 12.2 Database [CON6555]
Tuesday, Sept 20
• Oracle Big Data Integration in the Cloud [CON7472]
• Oracle Data Integration Platform: a Cornerstone for Big Data [CON6624]
• Oracle Data Integrator and Oracle GoldenGate for Big Data [HOL7434]
• Oracle Enterprise Data Quality – Product Overview and Roadmap [CON6627]
• Self Service Data Preparation for Domain Experts – No Programming Required [CON6630]
• Oracle Big Data Preparation Cloud Service: Self-Service Data Prep for Business Users [HOL7432]
• Oracle GoldenGate 12.3 Product Update and Strategy [CON6631]
• New GoldenGate 12.3 Services Architecture [CON6551]
• Meet the Experts: Oracle GoldenGate Cloud Service [MTE7119]
Wednesday, Sept 21
• Data Quality for the Cloud: Enabling Cloud Applications with Trusted Data [CON6629]
• Transforming Streaming Analytical Business Intelligence to Business Advantage [CON7352]
• Oracle Enterprise Data Quality for All Types of Data [HOL7466]
• Oracle GoldenGate for Big Data [CON6632]
• Accelerate Cloud On-Boarding using Oracle GoldenGate Cloud Service [CON6633]
• Oracle GoldenGate Deep Dive and Oracle GoldenGate Cloud Service for Cloud Onboarding [HOL7528]
Thursday, Sept 22
• Best Practices for Migrating to Oracle Data Integrator [CON6623]
• Best Practices for Oracle Data Integrator: Hear from the Experts [CON6625]
• Dataflow, Machine Learning and Streaming Big Data Preparation [CON6626]
• Data Governance with Oracle Enterprise Data Quality and Metadata Management [CON6628]
• Faster Design, Development and Deployment with Oracle GoldenGate Studio [CON6634]
• Getting started with Oracle GoldenGate [CON7318]
• Best Practice for High Availability and Performance Tuning for Oracle GoldenGate [CON6558]
Oracle Cloud Platform Innovation Awards
Meet the Most Impressive Cloud Platform Innovators
• Meet peers who implemented cutting-edge solutions with Oracle Cloud Platform
• Learn how you can transform your business
No registration or OpenWorld pass required to attend
Tuesday, Sep 20, 4:00 p.m. - 6:00 p.m.
YBCA Theater | 701 Mission St

Oracle PaaS Customer Appreciation Reception
• FREE Appreciation Reception for all Oracle PaaS customers directly following the Innovation Awards Ceremony
No OpenWorld pass is required to attend this reception
Tuesday, Sep 20, 6:00 p.m. - 8:30 p.m.
YBCA Theater | 701 Mission St
Connect with Oracle Data Integration
@OracleDI
Blogs.oracle.com/DataIntegration/
Oracle Data Integration
Oracle Data Integration
Agenda
1. Oracle Data Integration for Big Data
2. Big Data Patterns
3. A Practitioner’s View on Oracle Data Integration for Big Data
4. Q & A
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Safe Harbor Statement
The preceding is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Hortonworks
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
DataWorks Summit/Hadoop Summit
 
Priyank Patel, Teradata, Hadoop & SQL
Priyank Patel, Teradata, Hadoop & SQLPriyank Patel, Teradata, Hadoop & SQL
Priyank Patel, Teradata, Hadoop & SQL
The Hive
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
DataWorks Summit/Hadoop Summit
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
DataWorks Summit/Hadoop Summit
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
DataWorks Summit
 

What's hot (20)

IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
IlOUG Tech Days 2016 - Unlock the Value in your Data Reservoir using Oracle B...
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_finalPresentacin webinar move_up_to_power8_with_scale_out_servers_final
Presentacin webinar move_up_to_power8_with_scale_out_servers_final
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) ...
 
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
Meetup Oracle Database MAD_BCN: 1.2 Oracle Database 18c (autonomous database)
 
Insights into Real World Data Management Challenges
Insights into Real World Data Management ChallengesInsights into Real World Data Management Challenges
Insights into Real World Data Management Challenges
 
Beyond TCO
Beyond TCOBeyond TCO
Beyond TCO
 
A7 storytelling with_oracle_analytics_cloud
A7 storytelling with_oracle_analytics_cloudA7 storytelling with_oracle_analytics_cloud
A7 storytelling with_oracle_analytics_cloud
 
Oracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and ArchitectureOracle Cloud : Big Data Use Cases and Architecture
Oracle Cloud : Big Data Use Cases and Architecture
 
Machine Learning for z/OS
Machine Learning for z/OSMachine Learning for z/OS
Machine Learning for z/OS
 
Scaling Data Science on Big Data
Scaling Data Science on Big DataScaling Data Science on Big Data
Scaling Data Science on Big Data
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Real-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache KafkaReal-time Data Pipelines with SAP and Apache Kafka
Real-time Data Pipelines with SAP and Apache Kafka
 
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
Webinar - Accelerating Hadoop Success with Rapid Data Integration for the Mod...
 
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & TrifactaExtend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
Extend Governance in Hadoop with Atlas Ecosystem: Waterline, Attivo & Trifacta
 
Priyank Patel, Teradata, Hadoop & SQL
Priyank Patel, Teradata, Hadoop & SQLPriyank Patel, Teradata, Hadoop & SQL
Priyank Patel, Teradata, Hadoop & SQL
 
The DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to ProductionThe DAP - Where YARN, HBase, Kafka and Spark go to Production
The DAP - Where YARN, HBase, Kafka and Spark go to Production
 
Filling the Data Lake
Filling the Data LakeFilling the Data Lake
Filling the Data Lake
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 

Viewers also liked

Tame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data Integration
Michael Rainey
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
Alasdair Gray
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
David Lauzon
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
solarisyougood
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
Mark Rittman
 
Extending Hortonworks with Oracle's Big Data Platform
Extending Hortonworks with Oracle's Big Data PlatformExtending Hortonworks with Oracle's Big Data Platform
Extending Hortonworks with Oracle's Big Data Platform
DataWorks Summit/Hadoop Summit
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
Mark Rittman
 

Viewers also liked (7)

Tame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data IntegrationTame Big Data with Oracle Data Integration
Tame Big Data with Oracle Data Integration
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
 
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use caseBDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
BDM9 - Comparison of Oracle RDBMS and Cloudera Impala for a hospital use case
 
Oracle big data appliance and solutions
Oracle big data appliance and solutionsOracle big data appliance and solutions
Oracle big data appliance and solutions
 
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
IlOUG Tech Days 2016 - Big Data for Oracle Developers - Towards Spark, Real-T...
 
Extending Hortonworks with Oracle's Big Data Platform
Extending Hortonworks with Oracle's Big Data PlatformExtending Hortonworks with Oracle's Big Data Platform
Extending Hortonworks with Oracle's Big Data Platform
 
The Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data PlatformsThe Future of Analytics, Data Integration and BI on Big Data Platforms
The Future of Analytics, Data Integration and BI on Big Data Platforms
 

Similar to Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)

Oracle Data Integration - Overview
Oracle Data Integration - OverviewOracle Data Integration - Overview
Oracle Data Integration - Overview
Jeffrey T. Pollock
 
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Jeffrey T. Pollock
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
Dataconomy Media
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
Jeffrey T. Pollock
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
Jeffrey T. Pollock
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12c
Maria Colgan
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
Jeffrey T. Pollock
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
MarketingArrowECS_CZ
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
Jeffrey T. Pollock
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
Hortonworks
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
Denodo
 
OData External Data Integration Strategies for SaaS
OData External Data Integration Strategies for SaaSOData External Data Integration Strategies for SaaS
OData External Data Integration Strategies for SaaS
Sumit Sarkar
 
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"
jstrobl
 
Data Integration for Both Self-Service Analytics and IT Users
Data Integration for Both Self-Service Analytics and IT Users Data Integration for Both Self-Service Analytics and IT Users
Data Integration for Both Self-Service Analytics and IT Users
Senturus
 
Best Bigquery ETL Tool
Best Bigquery ETL ToolBest Bigquery ETL Tool
Best Bigquery ETL Tool
Lyftron Data
 
Analytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsAnalytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle Applications
Ray Février
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
VMware Tanzu
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Denodo
 
8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshare8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshare
Julianna DeLua
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
Fran Navarro
 

Similar to Data Integration for Big Data (OOW 2016, Co-Presented With Oracle) (20)

Oracle Data Integration - Overview
Oracle Data Integration - OverviewOracle Data Integration - Overview
Oracle Data Integration - Overview
 
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
Unlocking Big Data Silos in the Enterprise or the Cloud (Con7877)
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Oracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast ChartsOracle Big Data Governance Webcast Charts
Oracle Big Data Governance Webcast Charts
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12c
 
Stream based Data Integration
Stream based Data IntegrationStream based Data Integration
Stream based Data Integration
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal Modernization
 
OData External Data Integration Strategies for SaaS
OData External Data Integration Strategies for SaaSOData External Data Integration Strategies for SaaS
OData External Data Integration Strategies for SaaS
 
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"AGIT 2015  - Hans Viehmann: "Big Data and Smart Cities"
AGIT 2015 - Hans Viehmann: "Big Data and Smart Cities"
 
Data Integration for Both Self-Service Analytics and IT Users
Data Integration for Both Self-Service Analytics and IT Users Data Integration for Both Self-Service Analytics and IT Users
Data Integration for Both Self-Service Analytics and IT Users
 
Best Bigquery ETL Tool
Best Bigquery ETL ToolBest Bigquery ETL Tool
Best Bigquery ETL Tool
 
Analytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle ApplicationsAnalytics and Lakehouse Integration Options for Oracle Applications
Analytics and Lakehouse Integration Options for Oracle Applications
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
 
Accelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and VisualizationAccelerate Self-Service Analytics with Data Virtualization and Visualization
Accelerate Self-Service Analytics with Data Virtualization and Visualization
 
8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshare8.17.11 big data and hadoop with informatica slideshare
8.17.11 big data and hadoop with informatica slideshare
 
Big data oracle_introduccion
Big data oracle_introduccionBig data oracle_introduccion
Big data oracle_introduccion
 

More from Rittman Analytics

From Zero to One with Rittman Analytics
From Zero to One with Rittman AnalyticsFrom Zero to One with Rittman Analytics
From Zero to One with Rittman Analytics
Rittman Analytics
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
Rittman Analytics
 
User Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity ModelUser Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity Model
Rittman Analytics
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
Rittman Analytics
 
Planning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data WarehousingPlanning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data Warehousing
Rittman Analytics
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
Rittman Analytics
 
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data WarehouseData Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Rittman Analytics
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
Rittman Analytics
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Rittman Analytics
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
Rittman Analytics
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Rittman Analytics
 
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Rittman Analytics
 
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 HoursAnalytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Rittman Analytics
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
Rittman Analytics
 
Petabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerPetabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and Looker
Rittman Analytics
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Rittman Analytics
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Rittman Analytics
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
Rittman Analytics
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
Rittman Analytics
 

More from Rittman Analytics (19)

From Zero to One with Rittman Analytics
From Zero to One with Rittman AnalyticsFrom Zero to One with Rittman Analytics
From Zero to One with Rittman Analytics
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
User Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity ModelUser Engagement Analysis using the new Looker System Activity Model
User Engagement Analysis using the new Looker System Activity Model
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Planning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data WarehousingPlanning a Strategy for Autonomous Analytics and Data Warehousing
Planning a Strategy for Autonomous Analytics and Data Warehousing
 
Where Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big DataWhere Digital Analytics is taking BI and Big Data
Where Digital Analytics is taking BI and Big Data
 
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data WarehouseData Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
Data Warehouse Like a Tech Startup with Oracle Autonomous Data Warehouse
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake EditionFrom BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
From BI Developer to Data Engineer with Oracle Analytics Cloud Data Lake Edition
 
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations DataUsing Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
Using Google Cloud Dataprep to Wrangle Strava, Fitbit and Google Locations Data
 
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
Using Data & Analytics To Find Out How Much Daily Mail Readers Hate Me (and W...
 
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 HoursAnalytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
Analytics, BigQuery, Looker and How I Became an Internet Meme for 48 Hours
 
Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17Analytics is Taking over the World (Again) - UKOUG Tech'17
Analytics is Taking over the World (Again) - UKOUG Tech'17
 
Petabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerPetabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and Looker
 
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt...
 
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
Data Integration and Data Warehousing for Cloud, Big Data and IoT: 
What’s Ne...
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)

  • 1. Copyright © 2016, Oracle and/or its affiliates. All rights reserved. | CON6624 Oracle Data Integration Platform A Cornerstone for Big Data Christophe Dupupet (@XofDup) Director | A-Team Mark Rittman (@markrittman) Independent Analyst Julien Testut (@JulienTestut) Senior Principal Product Manager September, 2016 Confidential – Oracle Internal/Restricted/Highly Restricted
  • 2. Copyright © 2015, Oracle and/or its affiliates. All rights reserved. | Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle. Oracle Confidential
  • 3. Agenda: 1. Oracle Data Integration for Big Data; 2. Big Data Patterns; 3. A Practitioner’s View on Oracle Data Integration for Big Data; 4. Q & A
  • 4. Five Core Capabilities: 1. Business Continuity: DATA ALWAYS AVAILABLE; 2. Data Movement: DATA ANYWHERE IT’S NEEDED; 3. Data Transformation: DATA ACCESSIBLE IN ANY FORMAT; 4. Data Governance: DATA THAT CAN BE TRUSTED; 5. Streaming Data: DATA IN MOTION OR AT REST
  • 5. Eight Core Products: Cloud or On-Premise
  • 6. Most Innovative Technology: #1 Realtime / Streaming Data Integration Tool; #1 Pushdown / E-LT Data Integration Tool; 1st to certify replication with Streaming Big Data; 1st to certify E-LT tool with Apache Spark/Python; 1st to power Data Preparation w/ ML + NLP + Graph Data; 1st to offer Self-Service & Hybrid Cloud solution
  • 7. Hybrid Open-Source: ...Open Source at the core of speed & batch processing engines, ...Enterprise Vendor tools for connecting to existing IT systems, and ...Cloud Platforms for data fabric. [Diagram: sources (Data Streams, Social and Logs, Enterprise Data, Highly Available Databases, Pub / Sub, REST APIs, NoSQL, Bulk Data); Speed Layer and Batch Layer (Raw Data, Stream Processing, Batch Processing, Prepared Data); Serving Layer (Business Data, Apps, Analytics)]
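The speed/batch/serving split above is the Lambda architecture. Below is a minimal Python sketch. It assumes a simple page-view count as the metric and uses hypothetical function names; nothing here is an Oracle API.

- The batch layer recomputes a complete view up to a cutoff.
- The speed layer counts only events newer than that cutoff.
- The serving layer merges the two at query time.

```python
# Illustrative Lambda-architecture sketch; all names are hypothetical.
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int    # event time, e.g. epoch seconds
    page: str  # the key being counted


def batch_view(master: list[Event], cutoff: int) -> Counter:
    """Batch layer: recompute counts from scratch over all events up to `cutoff`."""
    return Counter(e.page for e in master if e.ts <= cutoff)


def speed_view(recent: list[Event], cutoff: int) -> Counter:
    """Speed layer: incremental counts only for events newer than the last batch run."""
    return Counter(e.page for e in recent if e.ts > cutoff)


def serve(page: str, batch: Counter, speed: Counter) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[page] + speed[page]


master = [Event(1, "home"), Event(2, "cart"), Event(3, "home")]
recent = [Event(3, "home"), Event(4, "home"), Event(5, "cart")]
cutoff = 3  # the last batch run covered events with ts <= 3
b, s = batch_view(master, cutoff), speed_view(recent, cutoff)
print(serve("home", b, s), serve("cart", b, s))  # 3 2
```

The event at `ts=3` appears in both inputs but is counted once, because the speed view only covers events after the last batch cutoff. When the next batch run moves the cutoff forward, the speed layer's state for that period can be discarded.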
  • 8. Reference Architecture Examples: [Diagram: the sources, Speed, Batch and Serving Layers from the previous slide, annotated with GoldenGate, Data Preparation, Oracle Data Integrator, Active Data Guard, Dataflow ML, Stream Analytics, Connectors, and Data Quality, Metadata Management & Business Glossary] The comprehensive architecture covers five key areas (#1 Data Ingestion, #2 Data Preparation & Transformation, #3 Streaming Big Data, #4 Parallel Connectivity, and #5 Data Governance), and Oracle Data Integration has all of them covered.
  • 9. Oracle GoldenGate: Realtime Performance; Extensible & Flexible; Proven & Reliable. Oracle GoldenGate provides low-impact capture, routing, transformation, and delivery of database transactions across homogeneous and heterogeneous environments in real time with no distance limitations. [Diagram: Most Databases, Cloud DBs, Big Data; Data Events; Transaction Streams] Supports Databases, Big Data and NoSQL. *The most popular enterprise integration tool in history
  • 10. GoldenGate for Ingest [Diagram: GoldenGate stages Capture, Trail, Pump, Route and Deliver; Databus; Speed Layer, Batch Layer and Serving Layer; Streaming Analytics, Applications, REST Services, Visualization Tools, Reporting Tools, Data Marts; User Updates, DBMS Updates; Platforms]
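The Capture, Trail, Pump and Deliver stages can be modelled as a tiny log-based change-data-capture pipeline. This is a simplified sketch under stated assumptions. The `Change` record, the in-memory lists standing in for trail files, and the checkpoint scheme are all hypothetical. They do not reflect GoldenGate's actual trail format or configuration.

```python
# Simplified CDC sketch; the record layout and stage functions are hypothetical.
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Change:
    seq: int             # position in the source transaction log
    op: str              # "INSERT", "UPDATE" or "DELETE"
    key: int             # primary key of the changed row
    row: Optional[dict]  # new row image; None for deletes


def capture(txn_log: list[Change], after_seq: int) -> list[Change]:
    """Capture: read changes from the log, in commit order, past a checkpoint."""
    return [c for c in txn_log if c.seq > after_seq]


def pump(local_trail: list[Change], remote_trail: list[Change]) -> None:
    """Pump/route: ship trail records to the target side, preserving their order."""
    remote_trail.extend(local_trail)


@dataclass
class Target:
    rows: dict = field(default_factory=dict)
    checkpoint: int = 0  # sequence number of the last change applied

    def deliver(self, trail: list[Change]) -> None:
        """Deliver: apply changes in order, skipping any that were already applied."""
        for c in trail:
            if c.seq <= self.checkpoint:
                continue  # replayed record, e.g. after a restart
            if c.op == "DELETE":
                self.rows.pop(c.key, None)
            else:  # INSERT and UPDATE both write the new row image
                self.rows[c.key] = c.row
            self.checkpoint = c.seq


txn_log = [
    Change(1, "INSERT", 10, {"name": "Ann"}),
    Change(2, "UPDATE", 10, {"name": "Anne"}),
    Change(3, "INSERT", 11, {"name": "Bob"}),
    Change(4, "DELETE", 11, None),
]
remote: list[Change] = []
pump(capture(txn_log, after_seq=0), remote)
tgt = Target()
tgt.deliver(remote)
tgt.deliver(remote)  # replaying the same trail changes nothing
print(tgt.rows, tgt.checkpoint)  # {10: {'name': 'Anne'}} 4
```

In this sketch, the checkpoint on the delivery side is what makes replaying a trail after a restart safe.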
  • 11. Oracle Data Preparation: Self-Service (zero software to install, easy-to-use browser-based interface); Better Recommendations (better automation and less grunt work for humans); Built-in Data Graph (graph database of real-world facts used for enrichment). [Diagram: Files, ETL, Reporting, Apps] Oracle Data Preparation is a self-service tool that makes it simple to transform, prepare, enrich and standardize business data; it can help IT accelerate solutions for the business by giving control of data formatting directly to data analysts.
  • 12. DATA WRANGLING wastes time and money: MONTHS of effort spent on each new dataset; PROGRAMMERS writing scripts or complex ETL. “Big Data’s dirty little secret is that 90% of time spent on a project is devoted to preparing data… After all the preparation work, there isn’t enough time left to do sophisticated analytics on it…” (Thomas H. Davenport) [Diagram: Internet and Logs (UNSTRUCTURED) and STRUCTURED data; Discovery & Visualization, Enterprise Reporting, Enterprise ETL & Data Integration; BUSINESS VALUE / OPPORTUNITY; Weeks or Months; “I want my data!!”] BDP for Data Preparation
  • 13. Oracle Data Integrator: Bulk Data Performance; Non-Invasive Footprint; Future-Proof IT Skills. Oracle Data Integrator provides high-performance bulk data movement, massively parallel data transformation using database or big data technologies, and block-level data loading that leverages native data utilities. [Diagram: Bulk Data Transformation and Bulk Data Movement across Most Apps, Databases & Cloud, Cloud DBs and Big Data] 1000’s of customers, more than other ETL tools; flexible ELT workloads run anywhere (DBs, Big Data, Cloud); up to 2x faster batch processes and 3x more efficient tooling.
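The E-LT (pushdown) idea can be shown with Python's built-in `sqlite3` module standing in for the target database. The table names are hypothetical, and this is not code that ODI generates. It only contrasts two approaches: pulling rows into a separate engine (ETL), and sending one set-based statement to the database that already holds the data (E-LT).

```python
# Illustrative ETL vs E-LT comparison; sqlite3 stands in for the target database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, customer TEXT, amount REAL);
    INSERT INTO src_orders VALUES (1, 'acme', 100.0), (2, 'acme', 50.0), (3, 'zenith', 75.0);
    CREATE TABLE tgt_etl (customer TEXT, total REAL);
    CREATE TABLE tgt_elt (customer TEXT, total REAL);
""")

# ETL style: pull every row out of the database, aggregate in the tool's own engine, write back.
totals: dict[str, float] = {}
for customer, amount in conn.execute("SELECT customer, amount FROM src_orders"):
    key = customer.upper()
    totals[key] = totals.get(key, 0.0) + amount
conn.executemany("INSERT INTO tgt_etl VALUES (?, ?)", totals.items())

# E-LT style: generate one set-based statement and let the database execute it,
# so the data never leaves the engine that stores it.
conn.execute("""
    INSERT INTO tgt_elt (customer, total)
    SELECT UPPER(customer), SUM(amount) FROM src_orders GROUP BY UPPER(customer)
""")

etl = sorted(conn.execute("SELECT * FROM tgt_etl"))
elt = sorted(conn.execute("SELECT * FROM tgt_elt"))
print(etl == elt, elt)  # True [('ACME', 150.0), ('ZENITH', 75.0)]
```

Both paths produce the same rows. The difference is where the work happens and how much data crosses the network.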
  • 14. ODI for Transformations [Diagram: Oracle Data Integrator across ETL Engines and Big Data Frameworks (Spark Streaming, Spark SQL, Sqoop, Oozie, Pig, Hive, Loaders, Kafka, NoSQL, OGG, SQL); Speed Layer, Batch Layer and Serving Layer; ERP, User Updates, DBMS Updates; Applications, REST Services, Visualization Tools, Reporting Tools, Data Marts]
  • 15. Business Value of ODI: Only Tool with Portable Mappings. No ETL engine is required; separation of logical and physical design; physical execution on SQL, Hive, Pig, or Spark; runtime execution in Oozie or via the ODI Java Agent; rich set of pre-built operators; user-defined functions.
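Separating logical from physical design means one mapping definition can be rendered for different engines. Below is a minimal sketch. It assumes a hypothetical `LogicalMapping` record and two hand-written renderers: SQL text run on `sqlite3`, and plain Python. Real ODI mappings and Knowledge Modules are far richer than this.

```python
# Illustrative logical/physical separation; LogicalMapping and the renderers are hypothetical.
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalMapping:
    """Logical design: what to produce, with no reference to an execution engine."""
    source: str
    target: str
    filter_col: str
    min_value: float
    columns: tuple[str, ...]


def to_sql(m: LogicalMapping) -> str:
    """Physical design #1: render set-based SQL (identifiers come from trusted metadata)."""
    cols = ", ".join(m.columns)
    return (f"INSERT INTO {m.target} ({cols}) "
            f"SELECT {cols} FROM {m.source} WHERE {m.filter_col} >= {m.min_value}")


def run_in_memory(m: LogicalMapping, rows: list[dict]) -> list[tuple]:
    """Physical design #2: execute the same logic row by row in Python."""
    return [tuple(r[c] for c in m.columns) for r in rows if r[m.filter_col] >= m.min_value]


m = LogicalMapping("orders", "big_orders", "amount", 60.0, ("id", "amount"))
rows = [{"id": 1, "amount": 100.0}, {"id": 2, "amount": 50.0}, {"id": 3, "amount": 75.0}]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE big_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (:id, :amount)", rows)
conn.execute(to_sql(m))

sql_result = sorted(conn.execute("SELECT id, amount FROM big_orders"))
print(sql_result == sorted(run_in_memory(m, rows)))  # True
```

Switching engines only means choosing a different renderer; the logical mapping itself is unchanged.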
  • 16. Oracle Stream Analytics: Business Friendly; Extreme Performance; Spatial Awareness. [Diagram: DB, Web / Devices, Data Events, Data & Transaction Streams, Downstream (e.g., Hadoop)] Oracle Stream Analytics is a powerful analytic toolkit designed to work directly on data in motion: simple data correlations, complex event processing, geo-fencing, and advanced dashboards run on millions of events per second. Innovative dual model for Apache Spark or Coherence grid; simple-to-use spatial and geo-fencing features, an industry first; includes Oracle GoldenGate for streaming transactions.
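Geo-fencing reduces to a point-in-polygon test on each position event. The slide does not describe how Oracle Stream Analytics implements it. The sketch below uses the classic even-odd ray-casting test on planar coordinates, which ignores Earth curvature (an assumption). It adds a hypothetical `fence_events` helper that emits ENTER/EXIT alerts when a device crosses the boundary.

```python
# Illustrative geo-fencing sketch; planar coordinates and helper names are assumptions.
from typing import Iterable, Iterator

Point = tuple[float, float]


def inside(pt: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting: count polygon edges crossed by a ray going right from pt."""
    x, y = pt
    crossings = 0
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                crossings += 1
    return crossings % 2 == 1


def fence_events(events: Iterable[tuple[str, Point]], fence: list[Point]) -> Iterator[tuple[str, str]]:
    """Emit ENTER/EXIT alerts when a device's position crosses the fence boundary."""
    was_inside: dict[str, bool] = {}
    for device, pt in events:
        now = inside(pt, fence)
        if device in was_inside and now != was_inside[device]:
            yield (device, "ENTER" if now else "EXIT")
        was_inside[device] = now


depot = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
stream = [("truck-1", (-5.0, 5.0)), ("truck-1", (5.0, 5.0)), ("truck-1", (15.0, 5.0))]
print(list(fence_events(stream, depot)))  # [('truck-1', 'ENTER'), ('truck-1', 'EXIT')]
```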
  • 17. Oracle Dataflow ML: Stream or Batch Data; Spark-based Pipelines; ML-powered Profiling. Oracle Dataflow ML is a big data solution for stream and batch processing in a single environment: Lambda-based applications that can run streaming ETL for cloud-based analytic solutions. Batch and stream processing at the same time; machine learning guides users through data profiling; data movement across Oracle PaaS services. [Diagram: Most Apps, Databases & Cloud; Bulk Data Movement; Streaming Data; Cloud DBs; Big Data; Big Data Pipeline]
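Running batch and stream processing "at the same time" is easiest when the transformation is written once and reused by both modes. The sketch below illustrates that general idea in plain Python rather than Spark. The function names, the record shape and the count-based tumbling window are all assumptions, not Dataflow ML's API.

```python
# Illustrative "write once, run batch or stream" sketch; all names are hypothetical.
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def transform(record: dict) -> tuple[str, int]:
    """Shared business logic: normalise the key and extract the measure."""
    return record["sensor"].strip().lower(), int(record["reading"])


def run_batch(records: Iterable[dict]) -> Counter:
    """Batch mode: process a bounded dataset in one pass."""
    totals: Counter = Counter()
    for key, value in map(transform, records):
        totals[key] += value
    return totals


def run_stream(records: Iterable[dict], window: int) -> Iterator[Counter]:
    """Stream mode: same transform, emitted per tumbling window of `window` records."""
    it = iter(records)
    while chunk := list(islice(it, window)):
        yield run_batch(chunk)


data = [{"sensor": " A ", "reading": "3"}, {"sensor": "b", "reading": "4"},
        {"sensor": "a", "reading": "5"}, {"sensor": "B", "reading": "1"},
        {"sensor": "a", "reading": "2"}]
windows = list(run_stream(data, window=2))
print(len(windows), sum(windows, Counter()) == run_batch(data))  # 3 True
```

Summing the per-window results reproduces the batch result. That is the consistency property a Lambda-style design relies on.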
  • 18. Streaming Data [Diagram: data from Devices and from Databases; Oracle GoldenGate, Oracle Stream Analytics and Oracle Dataflow ML; Databus; Speed Layer, Batch Layer and Serving Layer; Applications, REST Services, Visualization Tools, Reporting Tools, Data Marts]
  • 19. Oracle Metadata Management: Business Glossary; End-to-End Lineage; 100+ Supported Systems. Oracle Metadata Management provides an integrated toolkit that combines business glossary, workflow, metadata harvesting and rich data steward collaboration features. Supports Databases, Big Data, ETL Tools, BI Tools etc. [Diagram: BI Report Lineage, Taxonomy Lineage, Data Model Lineage]
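End-to-end lineage questions ("where does this report's number come from?") reduce to walking a graph from a target back to its sources. Below is a minimal sketch. It assumes lineage has already been harvested into a hypothetical dictionary that maps each asset to its direct inputs. The asset names are invented for illustration.

```python
# Illustrative lineage traversal; the lineage dictionary and asset names are hypothetical.
from collections import deque


def upstream(lineage: dict[str, list[str]], asset: str) -> list[str]:
    """Return every asset that `asset` is derived from, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(lineage.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # shared inputs (diamonds) and cycles are visited only once
        seen.add(node)
        order.append(node)
        queue.extend(lineage.get(node, []))
    return order


lineage = {
    "bi_report.revenue": ["dw.fact_sales"],
    "dw.fact_sales": ["hive.orders_clean", "dw.dim_customer"],
    "hive.orders_clean": ["hdfs.orders_raw"],
    "dw.dim_customer": ["oltp.customers"],
}
print(upstream(lineage, "bi_report.revenue"))
# ['dw.fact_sales', 'hive.orders_clean', 'dw.dim_customer', 'hdfs.orders_raw', 'oltp.customers']
```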
  • 20. OEMM for Data Governance: Data Catalog [Diagram: Oracle Enterprise Metadata Management spanning the Speed, Batch and Serving Layers; Kafka, Generated Streaming, Generated ETL Code, Sqoop, OLTP Databases, HDFS Files, HCatalog, Hive, NoSQL, ETL Tools, Data Warehouses, BI Models, ER Models; Applications, REST Services, Visualization Tools, Reporting Tools, Data Marts, User Updates, DBMS Updates] Oracle Enterprise Metadata Management: 140+ Supported Tools
  • 21. Eight Core Products: Cloud or On-Premise
  • 22. Agenda: 1. Oracle Data Integration for Big Data; 2. Big Data Patterns; 3. A Practitioner’s View on Oracle Data Integration for Big Data; 4. Q & A
  • 23. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential, under Non-Disclosure | Leverage Wide Range of Modern Analytic Styles: 4 Business Patterns of Big Data Customer Adoption [Diagram: DBMS (on prem or cloud), Sandbox, ETL Offload, Staging, Deep Data Storage; Data First Analytics, Model First Analytics, In-Motion Analytics, Streaming] 1. Analytic Data Sandbox: – Stakeholder: Functional Line of Business (LoB) – Core Value: Faster access to business data, faster time to value on analytics – Innovation: Schema-on-read empowers rapid data staging and true Data Discovery 2. ETL Offload: – Stakeholder: Information Technology (IT) – Core Value: Cost avoidance on DW/Marts – Innovation: YARN/Hadoop empowers lower cost compute and lower cost storage 3. Deep Data Storage: – Stakeholder: Risk / Compliance (LoB) – Core Value: High fidelity aged data – Innovation: SQL-on-Hadoop engines enable very low cost, queryable data access 4. Streaming: – Stakeholder: Marketing (LoB) / Telematics (LoB) – Core Value: New Data Services or Higher Click Rates – Innovation: MPP-capable streaming platforms combined with modern in-motion analytics
  • 24. Analytic Data Sandbox: – Stakeholder: Functional Line of Business (LoB) – Core Value: Faster access to business data, faster time to value on analytics – Innovation: Schema-on-read empowers rapid data staging and true Data Discovery – Industries: All industries. Supports “Data First” Style of Analytics: – No schema required – Staging data is simple and fast – Minimal data preparation required (mainly for un/semi-structured data sets). Typical Customer Data Types / Sets: – Usually bringing in Structured Data from OLTP (primary data is their existing application data) – Often bringing in Semi-Structured data (secondary data is clickstream, logs, machine data) – Business value is usually in the combination of the various data sets and the improved speed of discovery. Discovery, Exploratory and Visualization Style Analytics: • Oracle Endeca, Big Data Discovery • Tableau, Qlik, Spotfire • Datameer etc. Business Intelligence, Reporting and Dashboard Style Analytics: • Oracle BIEE, Visual Analyzer • Cognos, SAS, MicroStrategy • Business Objects, Actuate etc. [Diagram: DBMS (on prem or cloud), Sandbox, ETL Offload, Staging; Data First Analytics, Model First Analytics; BI, Self Service] Often the data flow may not require any ETL tooling; other data flows may still require ETL as a pipeline.
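Schema-on-read means raw data is staged as-is and structure is only derived when it is read. Below is a minimal sketch with hypothetical clickstream records and helper names. SQL-on-Hadoop engines typically do this differently: they apply a table definition over the files at query time rather than running code like this.

```python
# Illustrative schema-on-read sketch; records and helper names are hypothetical.
import json
from typing import Iterable, Iterator


def infer_schema(lines: Iterable[str]) -> dict[str, str]:
    """Derive column names and types only when the raw data is read."""
    schema: dict[str, str] = {}
    for line in lines:
        for key, value in json.loads(line).items():
            seen = type(value).__name__
            if schema.get(key, seen) != seen:
                schema[key] = "mixed"  # the field has different types across records
            else:
                schema.setdefault(key, seen)
    return schema


def read(lines: Iterable[str], columns: list[str]) -> Iterator[tuple]:
    """Project the requested columns at read time; absent fields come back as None."""
    for line in lines:
        rec = json.loads(line)
        yield tuple(rec.get(c) for c in columns)


raw = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": "n/a", "referrer": "ad"}',
]
print(infer_schema(raw))  # {'user': 'str', 'page': 'str', 'ms': 'mixed', 'referrer': 'str'}
print(list(read(raw, ["user", "referrer"])))  # [('u1', None), ('u2', 'ad')]
```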
  • 25. 2. ETL Offload: – Stakeholder: Information Technology (IT) – Core Value: Cost avoidance on DW/Marts – Innovation: YARN/Hadoop empowers lower cost compute and lower cost storage – Industries: Teradata, Netezza & Ab Initio customers. Supports “Model First” Style of Analytics: – Schemas required (for working areas, sources and targets) – Staging data requires modeled staging tables – Data preparation required (mapping data sets; un/semi-structured data sets require pre-parsing). Typical Customer Data Types / Sets: – Usually bringing in Structured Data from OLTP Apps (primary data is their existing application data) – Occasionally adding new data types to EDW schema (secondary data is clickstream, logs, machine data) – Business value is usually tied to the “cost avoidance” around escalating DW and ETL tooling costs. Discovery, Exploratory and Visualization Style Analytics: • Oracle Endeca, Big Data Discovery • Tableau, Qlik, Spotfire • Datameer etc. Business Intelligence, Reporting and Dashboard Style Analytics: • Oracle BIEE, Visual Analyzer • Cognos, SAS, MicroStrategy • Business Objects, Actuate etc. [Diagram: DBMS (on prem or cloud), Sandbox, ETL Offload, Staging; Data First Analytics, Model First Analytics] Primary Data Flow Requires Data Integration Tools
  • 26. 3. Deep Data Storage: – Stakeholder: Risk / Compliance (LoB) – Core Value: High fidelity aged data – Innovation: SQL-on-Hadoop engines enable very low cost, queryable data access – Industries: Insurance and Banking. Typically Deep Storage of Relational Data: – Schemas required (item detail records, not necessarily aggregates) – Archival can be “on the way in” as part of routine loading, and also via “periodic” pruning from the EDW and data marts. Popular with SQL on Hadoop and Federation: – Teradata QueryGrid from Teradata/Aster – IBM Big SQL from Netezza/PureData – Oracle Big Data SQL from Exadata – Pivotal HAWQ from Greenplum – Cisco Composite Software also selling on this use case (in addition to BI Virtualization). Discovery, Exploratory and Visualization Style Analytics: • Oracle Endeca, Big Data Discovery • Tableau, Qlik, Spotfire • Datameer etc. Business Intelligence, Reporting and Dashboard Style Analytics: • Oracle BIEE, Visual Analyzer • Cognos, SAS, MicroStrategy • Business Objects, Actuate etc. [Diagram: DBMS (on prem or cloud), Sandbox, ETL Offload, Staging, Deep Data Storage; Data First Analytics, Model First Analytics; Pattern mining, Compliance, Queryable Archive]
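The "periodic pruning" variant of deep storage moves aged detail rows out of the warehouse while keeping them queryable. The sketch below uses `sqlite3` with two hypothetical tables: one stands in for the EDW and one for the Hadoop-side archive. A view stands in for the SQL-on-Hadoop or federation layer. It does not describe how any of the listed products work internally.

```python
# Illustrative archive-and-prune sketch; tables and the view are hypothetical stand-ins.
import sqlite3


def prune_to_archive(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move detail rows older than `cutoff` (ISO date string) from the EDW to the archive."""
    with conn:  # one transaction: the copy and the delete commit or roll back together
        conn.execute("INSERT INTO archive_sales SELECT * FROM edw_sales WHERE sale_date < ?", (cutoff,))
        cur = conn.execute("DELETE FROM edw_sales WHERE sale_date < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edw_sales (id INTEGER, sale_date TEXT, amount REAL);
    CREATE TABLE archive_sales (id INTEGER, sale_date TEXT, amount REAL);
    INSERT INTO edw_sales VALUES (1, '2009-03-01', 10.0), (2, '2015-06-30', 20.0), (3, '2016-01-15', 30.0);
    -- A "queryable archive": one view over hot (EDW) and aged (archive) detail rows.
    CREATE VIEW all_sales AS SELECT * FROM edw_sales UNION ALL SELECT * FROM archive_sales;
""")
moved = prune_to_archive(conn, "2015-01-01")
print(moved, conn.execute("SELECT COUNT(*), SUM(amount) FROM all_sales").fetchone())  # 1 (3, 60.0)
```

Queries against the view see the same totals before and after pruning; only the storage location of the aged rows changes.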
  • 27. Streaming Big Data Analytics. 4. Streaming: – Stakeholder: Marketing (LoB) / Telematics (LoB) – Core Value: New Data Services or Higher Click Rates – Innovation: MPP-capable streaming platforms combined with modern in-motion analytics – Industries: Automotive, Aerospace, Industrial Manufacturing, some Energy/Oil & Gas. Decisions on Data Before It Hits Disk: – Data volume may be too high to persist all data (only save the important data) – Data may be highly repetitive (sensor data) – Correlations may need to happen with very low latency, based on LoB demand. Key Use Case for “Data Monetization”: – Customers are standing up new Data Services (e.g., real-time equipment failure alerts and subscription-based monitoring) – “Connected Car” services from most car makers – Disaster preparedness centers – Energy/Aerospace. [Diagram: DBMS (on prem or cloud), Sandbox, ETL Offload, Staging, Deep Data Storage; Data First Analytics, Model First Analytics, In-Motion Analytics, Streaming; Pattern mining] Other data flows may still require ETL as a pipeline.
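One simple way to "only save the important data" from a repetitive sensor feed is a deadband (change-threshold) filter. It is applied in motion, before anything is written to disk. This is a generic technique chosen for illustration, not a method named on the slide. The threshold and readings are made up.

```python
# Illustrative deadband filter; the threshold and readings are hypothetical.
from typing import Iterable, Iterator


def deadband(readings: Iterable[tuple[int, float]], threshold: float) -> Iterator[tuple[int, float]]:
    """Keep a reading only if it differs from the last kept value by more than `threshold`."""
    last_kept = None
    for ts, value in readings:
        if last_kept is None or abs(value - last_kept) > threshold:
            last_kept = value
            yield ts, value


sensor = [(0, 20.0), (1, 20.1), (2, 20.05), (3, 23.5), (4, 23.6), (5, 20.0)]
kept = list(deadband(sensor, threshold=0.5))
print(kept)  # [(0, 20.0), (3, 23.5), (5, 20.0)]
```

Comparing against the last *kept* value, not the previous reading, stops slow drift from slipping through as a series of small steps.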
  • 28. Some Common Themes Across Use Cases: 1. Nearly 100% Analytic Use Cases: – Data Discovery directly in Hadoop – ETL Offloading for analytics in SQL DB – Deep Data Storage for analytics in SQL DB – Streaming Analytics for data before it hits disk – Lambda Arch 2. Nearly all the Data is Structured Data: – OLTP Sources: every customer starts with the trusted data sets that already drive the majority of business value – App Data – New Sources: Clickstream Logs, Machine Data and other App Exhaust all have “structure” even if they may not have schema 3. Many more Sources are App/OLTP Sources: – By Quantity of Sources: most customers have many (dozens or hundreds) of App/OLTP sources they are bringing in – By Volume: by quantity of data, the amount of Machine Data or Log data may often exceed the OLTP data sets 4. Mainframes Matter: – High Value Apps: most of the biggest customers are bringing mainframe (DB2/z, IMS, VSAM) data to Hadoop 5. Multiple Projects / Programs using Hadoop: – Larger Customers: most of the biggest customers have multiple Hadoop projects running in parallel, some IT led (DW/ETL Offload) and others LoB led (Discovery/Telematics) 6. Customers are Starting in Phases: – By Value: IT led vs. LoB led initiatives have different characteristics; even if the “Lake / Reservoir” factors in as a long-term goal, the initial phases are often quite small in scale 7. Size of Hadoop Clusters Varies Widely: – Investment Sizes Differ (by a lot): some “start” with mega-commitments (1000’s of Nodes) and others start very small 8. Commodity H/W Clusters Dominate: – Commodity: for use cases designed to work across groups – Appliances: for use cases attached to a single project 9. Data Lakes as a Way to Handle Vendor Diversity: – Middleware for Data: bigger customers have DWs/DBs from every vendor and 6+ different BI tools; Hadoop is becoming the “canonical” data platform to sit in between 10. Open Source Data Platform is a Strategic Priority: – Senior Stakeholder Feedback: as a design point priority for their “next gen”, it is becoming more important that Open Source has a central role to play in the enterprise data platform 11. Industry Clusters: – 1. Banking, 2. Insurance, 3. Manufacturing, 4. Media, 5. Retail
  • 29. Agenda: 1. Oracle Data Integration for Big Data; 2. Big Data Patterns; 3. A Practitioner’s View on Oracle Data Integration for Big Data; 4. Q & A
  • 30. THOUGHTS ON ORACLE DATA INTEGRATION FOR BIG DATA: A PRACTITIONER'S VIEW. Mark Rittman, Oracle ACE Director (T: @markrittman). ORACLE OPENWORLD 2016, SAN FRANCISCO
  • 31. (C) Mark Rittman 2016 W: http://www.rittman.co.uk T: @markrittman | About the Presenter: • Oracle ACE Director, blogger + ODTUG member • Regular columnist for Oracle Magazine • Past ODTUG Executive Board Member • Author of two books on Oracle BI • Co-founder of Rittman Mead, now independent analyst • 15+ years in Oracle BI, DW, ETL + now Big Data • Based in Brighton, UK
  • 32. Big Data Technology Core to Modern BI Platforms: • Every engagement and customer discussion has Big Data central to the project • Hadoop extending traditional DWs through scalability, flexibility, cost, RDBMS-compatibility • Hadoop as the ETL engine driven by ODI Big Data KMs • New datatypes and methods of analysis enabled by Hadoop schema-on-read • Project innovation driven by machine learning, streaming, ability to store + keep *all* data • And what is driving the interest in these projects…? [Diagram: Data Reservoir on the Oracle Big Data Platform. Data Factory (OGG for Big Data 12c, Oracle Stream Analytics, ODI12c; data streams); Oracle Data Preparation. Raw Customer Data: data stored in the original format (usually files) such as SS7, ASN.1, JSON etc. Mapped Customer Data: data sets produced by mapping and transforming raw data. Operational Data (Transactions, Customer Master Data); Event, Social + Unstructured Data; Voice + Chat Transcripts. Safe & secure Discovery and Development environment (data sets and samples, models and programs); Oracle Data Visualization; Oracle Big Data Discovery. Marketing / Sales Applications; Models; Machine Learning; Segments; Enriched Customer Profile; Modeling; Scoring. Oracle Big Data Appliance Starter Rack + Expansion: Cloudera CDH + Oracle software; 18 high-spec Hadoop nodes with InfiniBand switches for internal Hadoop traffic, optimised for network throughput; 1 Cisco Management Switch; single place for support for H/W + S/W]
  • 33.
  • 34.
  • 35. The Big Data Secret? It’s All About Data Integration: • Data from all the sources will need to be integrated to create the single customer view • Hadoop technologies (Flume, Kafka, Storm) can be used to ingest events and log data • Files can be loaded “as is” into the HDFS filesystem • Oracle/DB data can be bulk-loaded using Sqoop • GoldenGate for trickle-feeding transactional data • But the nature of new data sources brings challenges • May be semi-structured or of unknown schema • Joining schema-free datasets • Need to consider quality and resolve incorrect, incomplete, and inconsistent customer data [Diagram: Single Customer View / Enriched Customer Profile built from “Who”, “What”, “How”, “Why” (Chat, M/L); Heterogeneous Enterprise + Web sources; Streaming sources with JSON payloads; Data from structured + schema-on-read sources needs integrating; Apply Schema to Raw and Semi-Structured Data; Requires preparation + obfuscation]
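Resolving inconsistent customer records into a single customer view starts with a match key and a survivorship rule. The sketch below is deliberately naive and entirely hypothetical. It matches on a normalised email address and keeps the first non-empty value per field. Real entity resolution uses fuzzier matching and richer rules.

```python
# Illustrative single-customer-view merge; the match rule and field names are hypothetical.
from collections import defaultdict


def match_key(record: dict) -> str:
    """Hypothetical match rule: a trimmed, lower-cased email identifies the customer."""
    return record["email"].strip().lower()


def single_customer_view(*sources: list[dict]) -> dict[str, dict]:
    """Merge records from several sources; the first non-empty value per field survives."""
    profiles: dict[str, dict] = defaultdict(dict)
    for source in sources:
        for rec in source:
            profile = profiles[match_key(rec)]
            for col, value in rec.items():
                if col == "email":
                    continue  # already used as the profile key
                if value not in (None, "") and col not in profile:
                    profile[col] = value
    return dict(profiles)


crm = [{"email": "Ann@Example.com ", "name": "Ann Lee", "phone": ""}]
web = [{"email": "ann@example.com", "name": "", "phone": "555-0100", "last_page": "/cart"}]
print(single_customer_view(crm, web))
# {'ann@example.com': {'name': 'Ann Lee', 'phone': '555-0100', 'last_page': '/cart'}}
```

The order in which sources are passed acts as a crude source priority: earlier sources win when both supply a value.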
  • 36. Landing, Preparing and Securing Raw Data is *Hard*: • Finding raw data is easy; then the real work needs to be done, which can be > 90% of the project • Four main tasks to land, prepare and integrate raw data to turn it into a customer profile: 1. Ingest it in real time into the data reservoir 2. Apply schema to raw and semi-structured data 3. Remove sensitive data from any input files 4. Transform and map into your Customer 360-degree profile
  • 37. Oracle Big Data Preparation Cloud Service • Data enrichment tool aimed at domain experts, not programmers • Uses machine learning to automate data classification + profiling steps • Automatically highlights sensitive data, and offers to redact or obfuscate it • Dramatically reduces the time required to onboard new data sources • Hosted in Oracle Cloud for zero-install • File upload and download from browser • Automate for production data loads [Diagram labels: Voice + Chat Transcripts; Raw Data (data stored in the original format, usually files, such as SS7, ASN.1, JSON etc.); Mapped Data (data sets produced by mapping and transforming raw data)]
  • 38. Step 2: Apply Schema to Raw and Semi-Structured Data [Diagram labels: batch load from files, DB: easy; stream from APIs, HTTP: moderate; load raw text from blog entries, reviews; NLP; entities; embedded information in unstructured text; no reliable patterns; invalid and missing data; sensitive data; invalid emails]
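Step 2 is schema-on-read: raw records are kept as they arrived, and structure is imposed only when they are read. The Python sketch below illustrates the idea under stated assumptions. It parses newline-delimited JSON, projects each record onto a small typed schema, and records (rather than rejects) the kinds of problems listed on the slide: missing data, bad types and invalid emails. The schema, the field names (`customer_id`, `email`, `channel`) and the email pattern are hypothetical; this is not how Oracle Big Data Preparation Cloud Service works internally.

```python
import json
import re
from dataclasses import dataclass, field

# Hypothetical target schema for a raw customer event; the field names and
# types are illustrative assumptions, not taken from the presentation.
SCHEMA = {"customer_id": int, "email": str, "channel": str}

# Deliberately simple email check; production validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class ParsedRecord:
    values: dict
    problems: list = field(default_factory=list)


def apply_schema(line: str) -> ParsedRecord:
    """Parse one raw JSON line and project it onto SCHEMA at read time."""
    try:
        raw = json.loads(line)
    except json.JSONDecodeError:
        return ParsedRecord(values={}, problems=["unparseable JSON"])
    if not isinstance(raw, dict):
        return ParsedRecord(values={}, problems=["not a JSON object"])

    record = ParsedRecord(values={})
    for name, typ in SCHEMA.items():
        value = raw.get(name)
        if value is None:
            # Missing data is recorded rather than rejected, so it can be profiled.
            record.problems.append(f"missing {name}")
            record.values[name] = None
            continue
        try:
            record.values[name] = typ(value)
        except (TypeError, ValueError):
            record.problems.append(f"bad type for {name}")
            record.values[name] = None

    email = record.values.get("email")
    if email is not None and not EMAIL_RE.match(email):
        record.problems.append("invalid email")
    return record


if __name__ == "__main__":
    raw_lines = [
        '{"customer_id": "42", "email": "ann@example.com", "channel": "chat"}',
        '{"customer_id": 7, "email": "not-an-email"}',
        "this is not json",
    ]
    for line in raw_lines:
        rec = apply_schema(line)
        print(rec.values, rec.problems)
```

Keeping the problems alongside the values, instead of dropping bad rows, is what lets a later profiling step report how much of a source is invalid or incomplete.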
  • 39. Step 3: Remove Sensitive Data from Any Input Files • Automatically profile and analyse datasets • Use machine learning to spot and obfuscate sensitive data automatically
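The slide describes machine-learning detection; the sketch below deliberately swaps in a simpler, rule-based technique (regular expressions) only to show what "spot and obfuscate" means for a single record. Each detected value is replaced by a short hash-based pseudonym, so the same email always maps to the same token and masked data sets can still be joined. The patterns, the token format and the names `redact` and `pseudonym` are hypothetical and are not part of any Oracle product.

```python
import hashlib
import re

# Rule-based stand-in for the ML-driven detection described on the slide.
# These patterns are illustrative assumptions and will miss many formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}


def pseudonym(kind: str, value: str) -> str:
    """Map a sensitive value to a stable token, so masked data can still be joined."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:8]
    return f"<{kind}:{digest}>"


def redact(text: str) -> tuple[str, dict]:
    """Replace every detected value with its pseudonym; also count hits per kind."""
    counts = {}
    for kind, pattern in PATTERNS.items():
        text, n = pattern.subn(lambda m, k=kind: pseudonym(k, m.group(0)), text)
        counts[kind] = n
    return text, counts


if __name__ == "__main__":
    masked, counts = redact("Call +44 20 7946 0958 or mail ann@example.com")
    print(masked)
    print(counts)
```

An unsalted SHA-256 of a low-entropy value such as an email address can be recovered by hashing candidate values, so a keyed hash (for example Python's `hmac` module with a secret key) is the safer choice when pseudonyms leave a trusted environment.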
  • 40. Step 4: Transform, Join + Map into Polyglot Data Stores • Oracle Data Integration offers a wider set of products for managing Customer 360 data • Oracle GoldenGate • Oracle Enterprise Data Quality • Oracle Data Integrator • Oracle Enterprise Metadata Management • All Hadoop-enabled • Works across Big Data, relational and cloud
  • 41. Future-Proof Big Data Integration Platform • Projects built yesterday using MapReduce need to be rewritten in Spark today • Then Spark needs to be upgraded to Spark Streaming + Kafka for real time… • Upgrades, and replatforming onto the latest tech, can bring “fragile” initiatives to a halt • ODI’s pluggable KM approach to big data integration makes tech upgrades simple • Focus time + investment on new big data initiatives • Not on rewriting fragile hand-coded scripts [Diagram labels: ODI Desktop Client; Big Data Management Platform; Discovery & Development Labs (safe & secure discovery and development environment; data sets and samples; models and programs); Data Warehouse (curated data: historical view and business-aligned access); Enriched Customer Profile; Modeling; Scoring; Big Data Platform, all running natively under Hadoop: YARN (cluster resource management), HDFS (cluster filesystem holding raw data), Hive + Pig (log processing, UDFs etc.), Spark (in-memory data processing), Kafka + Spark Streaming, Apache Beam?]
  • 42. And the Next Challenge: Data Quality + Provenance • Big data projects have had it “easy” so far in terms of data quality + data provenance • Innovation labs + schema-on-read prioritise discovery + insight, not accuracy and audit trails • But a data reservoir without any cleansing, management + data quality = data cesspool • … and nobody knows where all the contamination came from, or who made it worse
  • 43. Data Governance: Why I Recommend Oracle DI Tools • From my perspective, this is what makes Oracle Data Integration my Hadoop DI platform of choice • Most vendors can load and transform data in Hadoop (not as well, but basic capability) • Only Oracle have the tools to tackle tomorrow’s Big Data challenge: Data Quality + Data Governance • Oracle Enterprise Data Quality • Oracle Enterprise Metadata Mgmt • Seamlessly integrated with ODI • Brings enterprise “smarts” to less mature Big Data projects
  • 44. Data Integration Solutions Program: tinyurl.com/DISOOW16 • Presentations on: Oracle Enterprise Metadata Management, Oracle Enterprise Data Quality, Oracle GoldenGate, Oracle Data Integrator, Oracle Big Data Preparation Cloud Service • Hands-on labs: Oracle Enterprise Data Quality (HOL7466), Oracle GoldenGate Deep Dive (HOL7528), ODI and OGG for Big Data (HOL7434), Oracle Big Data Preparation Cloud Service (HOL7432) • Demo stations: Middleware Demoground, Moscone South; Big Data Showcase, Moscone South; Database Demoground, Moscone South
  • 45. Data Integration Solutions Program: tinyurl.com/DISOOW16 Monday, Sept 19: • Oracle Data Integration Solutions – Platform Overview and Roadmap [CON6619] • Oracle Data Integration: the Foundation for Cloud Integration [CON6620] • A Practical Path to Enterprise Data Governance with Cummins [CON6621] • Oracle Data Integrator Product Update and Strategy [CON6622] • Deep Dive into Oracle GoldenGate 12.3 New Features for the Oracle 12.2 Database [CON6555] Tuesday, Sept 20: • Oracle Big Data Integration in the Cloud [CON7472] • Oracle Data Integration Platform: a Cornerstone for Big Data [CON6624] • Oracle Data Integrator and Oracle GoldenGate for Big Data [HOL7434] • Oracle Enterprise Data Quality – Product Overview and Roadmap [CON6627] • Self Service Data Preparation for Domain Experts – No Programming Required [CON6630] • Oracle Big Data Preparation Cloud Service: Self-Service Data Prep for Business Users [HOL7432] • Oracle GoldenGate 12.3 Product Update and Strategy [CON6631] • New GoldenGate 12.3 Services Architecture [CON6551] • Meet the Experts: Oracle GoldenGate Cloud Service [MTE7119] Wednesday, Sept 21: • Data Quality for the Cloud: Enabling Cloud Applications with Trusted Data [CON6629] • Transforming Streaming Analytical Business Intelligence to Business Advantage [CON7352] • Oracle Enterprise Data Quality for All Types of Data [HOL7466] • Oracle GoldenGate for Big Data [CON6632] • Accelerate Cloud On-Boarding using Oracle GoldenGate Cloud Service [CON6633] • Oracle GoldenGate Deep Dive and Oracle GoldenGate Cloud Service for Cloud Onboarding [HOL7528] Thursday, Sept 22: • Best Practices for Migrating to Oracle Data Integrator [CON6623] • Best Practices for Oracle Data Integrator: Hear from the Experts [CON6625] • Dataflow, Machine Learning and Streaming Big Data Preparation [CON6626] • Data Governance with Oracle Enterprise Data Quality and Metadata Management [CON6628] • Faster Design, Development and Deployment with Oracle GoldenGate Studio [CON6634] • Getting Started with Oracle GoldenGate [CON7318] • Best Practice for High Availability and Performance Tuning for Oracle GoldenGate [CON6558]
  • 46. Oracle Cloud Platform Innovation Awards: Meet the Most Impressive Cloud Platform Innovators • Meet peers who implemented cutting-edge solutions with Oracle Cloud Platform • Learn how you can transform your business • Tuesday, Sep 20, 4:00 p.m. - 6:00 p.m., YBCA Theater | 701 Mission St • No registration or OpenWorld pass required to attend. Oracle PaaS Customer Appreciation Reception • FREE appreciation reception for all Oracle PaaS customers directly following the Innovation Awards Ceremony • Tuesday, Sep 20, 6:00 p.m. - 8:30 p.m., YBCA Theater | 701 Mission St • No OpenWorld pass is required to attend this reception
  • 47. Connect with Oracle Data Integration @OracleDI Blogs.oracle.com/DataIntegration/ Oracle Data Integration Oracle Data Integration
  • 48. Agenda: 1. Oracle Data Integration for Big Data 2. Big Data Patterns 3. A Practitioner’s View on Oracle Data Integration for Big Data 4. Q & A
  • 51. Safe Harbor Statement The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.