CON6619 - OpenWorld Presentation. Oracle data integration, big data, data governance, and cloud integration. Replication, ETL, Data Quality, Streaming Big Data, and Data Preparation
Oracle Data Integration overview, vision and roadmap. Covers GoldenGate, Data Integrator (ODI), Data Quality (EDQ), Metadata Management (MM) and Big Data Preparation (BDP)
Expand a Data Warehouse with Hadoop and Big Data (jdijcks)
After investing years in the data warehouse, are you now supposed to start over? Nope. This session discusses how to leverage Hadoop and big data technologies to augment the data warehouse with new data, new capabilities and new business models.
Oracle Big Data Appliance and Big Data SQL for advanced analytics (jdijcks)
Overview presentation showing Oracle Big Data Appliance and Oracle Big Data SQL in combination, and why this really matters. Big Data SQL brings you the unique ability to analyze data across the entire spectrum of systems: NoSQL, Hadoop and Oracle Database.
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle) (Rittman Analytics)
Set of product roadmap + capabilities slides from Oracle Data Integration Product Management, and thoughts on data integration on big data implementations by Mark Rittman (Independent Analyst)
Presentation discussing a major shift in enterprise data management. Describes the movement away from the older hub-and-spoke data architecture toward a newer, more modern Kappa data architecture.
Hortonworks Oracle Big Data Integration (Hortonworks)
Slides from joint Hortonworks and Oracle webinar on November 11, 2014. Covers the Modern Data Architecture with Apache Hadoop and Oracle Data Integration products.
Oracle's Big Data solutions consist of a number of new products and solutions to support customers looking to gain maximum business value from data sets such as weblogs, social media feeds, smart meters, sensors and other devices that generate massive volumes of data (commonly defined as "Big Data") that isn't readily accessible in enterprise data warehouses and business intelligence applications today.
This is a brief technology introduction to Oracle Stream Analytics, and how to use the platform to develop streaming data pipelines that support a wide variety of industry use cases
Modern data management using Kappa and streaming architectures, including discussion by eBay's Connie Yang about the Rheos platform and the use of Oracle GoldenGate, Kafka, Flink, etc.
Deep-dive into Microservices Patterns with Replication and Stream Analytics
Target Audience: Microservices and Data Architects
This is an informational presentation about microservices event patterns, GoldenGate event replication, and event stream processing with Oracle Stream Analytics. This session will discuss some of the challenges of working with data in a microservices architecture (MA), and how the emerging concept of a “Data Mesh” can go hand-in-hand to improve microservices-based data management patterns. You may have already heard about common microservices patterns like CQRS, Saga, Event Sourcing and Transaction Outbox; we’ll share how GoldenGate can simplify these patterns while also bringing stronger data consistency to your microservice integrations. We will also discuss how complex event processing (CEP) and stream processing can be used with event-driven MA for operational and analytical use cases.
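The Transaction Outbox pattern mentioned above can be sketched in a few lines. This is a minimal illustration with a hypothetical schema (not GoldenGate's actual configuration), showing why a log-based replicator can publish events without risky dual writes:

```python
import sqlite3, json

# Transaction Outbox sketch (hypothetical schema): the business write and the
# event record commit in ONE transaction, so a log-based replicator such as
# GoldenGate can publish events in commit order without dual-writes.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT);
""")

def place_order(conn, order_id):
    with conn:  # one atomic transaction covers both inserts
        conn.execute("INSERT INTO orders VALUES (?, 'PLACED')", (order_id,))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("orders.events",
                      json.dumps({"order_id": order_id, "type": "OrderPlaced"})))

place_order(db, 42)
# A capture process would replicate outbox rows to the message broker.
events = [json.loads(p) for (p,) in
          db.execute("SELECT payload FROM outbox ORDER BY id")]
print(events)  # [{'order_id': 42, 'type': 'OrderPlaced'}]
```

The key design point is that the consumer-facing event can never disagree with the committed business data, because both live or die with the same transaction.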
Business pressures for modernization and digital transformation drive demand for rapid, flexible DevOps, which microservices address, but also for data-driven Analytics, Machine Learning and Data Lakes which is where data management tech really shines. Join us for this presentation where we take a deep look at the intersection of microservice design patterns and modern data integration tech.
Tame Big Data with Oracle Data Integration (Michael Rainey)
In this session, Oracle Product Management covers how Oracle Data Integrator and Oracle GoldenGate are vital to big data initiatives across the enterprise, providing the movement, translation, and transformation of information and data not only heterogeneously but also in big data environments. Through a metadata-focused approach for cataloging, defining, and reusing big data technologies such as Hive, Hadoop Distributed File System (HDFS), HBase, Sqoop, Pig, Oracle Loader for Hadoop, Oracle SQL Connector for Hadoop Distributed File System, and additional big data projects, Oracle Data Integrator bridges the gap in the ability to unify data across these systems and helps deliver timely and trusted data to analytic and decision support platforms.
Co-presented with Alex Kotopoulis at Oracle OpenWorld 2014.
Webinar: The Future of Data Integration - Data Mesh and GoldenGate with Kafka (Jeffrey T. Pollock)
The Future of Data Integration: Data Mesh, and a Special Deep Dive into Stream Processing with GoldenGate, Apache Kafka and Apache Spark. This video is a replay of a Live Webinar hosted on 03/19/2020.
Join us for a timely 45-minute webinar to see our take on the future of Data Integration. As the global industry shift towards the "Fourth Industrial Revolution" continues, outmoded styles of centralized batch processing and ETL tooling continue to be replaced by realtime, streaming, microservices and distributed data architecture patterns.
This webinar will start with a brief look at the macro-trends happening around distributed data management and how that affects Data Integration. Next, we’ll discuss the event-driven integrations provided by GoldenGate Big Data, and continue with a deep-dive into some essential patterns we see when replicating Database change events into Apache Kafka. In this deep-dive we will explain how to effectively deal with issues like Transaction Consistency, Table/Topic Mappings, managing the DB Change Stream, and various Deployment Topologies to consider. Finally, we’ll wrap up with a brief look into how Stream Processing will help to empower modern Data Integration by supplying realtime data transformations, time-series analytics, and embedded Machine Learning from within data pipelines.
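As a rough illustration of the table-to-topic mapping question raised above, here is a hedged sketch. The record shape and topic template are invented for illustration and are not GoldenGate handler syntax:

```python
# Hedged sketch: routing database change events to Kafka-style topics.
# The event fields and "{schema}.{table}" template are illustrative only.
def route(change_event, topic_template="{schema}.{table}"):
    """Map a change record to (topic, key, value). Keying by primary key
    keeps all changes for one row in a single partition, preserving order."""
    topic = topic_template.format(schema=change_event["schema"],
                                  table=change_event["table"])
    key = str(change_event["pk"])
    return topic, key, change_event["op"]

evt = {"schema": "SALES", "table": "ORDERS", "pk": 1001, "op": "UPDATE"}
print(route(evt))  # ('SALES.ORDERS', '1001', 'UPDATE')
```

Keying on the row's primary key is the usual way to reconcile Kafka's per-partition ordering guarantee with the database's per-row change ordering.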
GoldenGate: https://www.oracle.com/middleware/tec...
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Oracle Solaris: Build and Run Applications Better on 11.3 (OTN Systems Hub)
Build and Run Applications Better on Oracle Solaris 11.3
Tech Day, NYC
Liane Praza, Senior Principal Software Engineer
Ikroop Dhillon, Principal Product Manager
June, 2016
Oracle Cloud: Big Data Use Cases and Architecture (Riccardo Romani)
Oracle Italy Systems Presales Team presents: Big Data in any flavor, on-prem, public cloud and cloud at customer.
Presentation given at a Digital Transformation event, February 2017.
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites (Steven Feuerstein)
Slides presented at moug.org's August 2018 conference. Covers the RADstack (REST - APEX - Database) + our community sites (AskTOM, LiveSQL and Dev Gym) + a whole bunch of cool new PL/SQL features. Search LiveSQL.oracle.com for scripts to match up with the features presented.
The Next Generation of Big Data Analytics (Hortonworks)
Apache Hadoop has evolved rapidly to become a leading platform for managing and processing big data. If your organization is examining how you can use Hadoop to store, transform, and refine large volumes of multi-structured data, please join us for this session where we will discuss: the emergence of "big data" and opportunities for deriving business value; the evolution of Apache Hadoop and future directions; essential components required in a Hadoop-powered platform; and solution architectures that integrate Hadoop with existing data discovery and data warehouse platforms.
Options for Data Prep - A Survey of the Current Market (Dremio Corporation)
Data comes in many shapes and sizes, and every company struggles to find ways to transform, validate, and enrich data for multiple purposes. The problem has been around as long as data, and the market has an overwhelming number of options. In this presentation we look at the problem and key options from vendors in the market today. Dremio is a new approach that eliminates the need for standalone data prep tools.
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... (DATAVERSITY)
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
Introducing the Big Data Ecosystem with Caserta Concepts & Talend (Caserta)
In this one-hour webinar, Caserta Concepts and Talend described an approach to achieve an architectural framework and roadmap to extend a traditional enterprise data warehouse environment, into a Big Data ecosystem.
They illustrated the architectural components involved for collecting, analyzing and delivering Big Data, with a focus on the importance of Hadoop, Data Integration, Machine Learning, NoSQL, Business Intelligence and Analytics.
Attendees learned:
Which Big Data technologies can’t be ignored
Considerations when extending the data ecosystem
What happens to your existing investment
What are the points of integration
Does Big Data = better data?
To access the recorded webinar or to learn more, visit http://www.casertaconcepts.com/.
A talk given at VT Code Camp 2019 covering a variety of big data infrastructures. High level summary of distributed relational databases, NoSQL databases, ETL processes, high throughput computing, high performance computing, and hybrid systems.
5 Things that Make Hadoop a Game Changer
Webinar by Elliott Cordo, Caserta Concepts
There is much hype and mystery surrounding Hadoop's role in analytic architecture. In this webinar, Elliott presented, in detail, the services and concepts that make Hadoop a truly unique solution - a game changer for the enterprise. He talked about the real benefits of a distributed file system, the multi-workload processing capabilities enabled by YARN, and the 3 other important things you need to know about Hadoop.
To access the recorded webinar, visit the event site: https://www.brighttalk.com/webcast/9061/131029
For more information on the services and solutions that Caserta Concepts offers, please visit http://casertaconcepts.com/
This is Part 4 of the GoldenGate series on Data Mesh - a series of webinars helping customers understand how to move off of old-fashioned monolithic data integration architecture and get ready for more agile, cost-effective, event-driven solutions. The Data Mesh is a kind of Data Fabric that emphasizes business-led data products running on event-driven streaming architectures, serverless, and microservices-based platforms. These emerging solutions are essential for enterprises that run data-driven services on multi-cloud, multi-vendor ecosystems.
Join this session to get a fresh look at Data Mesh; we'll start with core architecture principles (vendor agnostic) and transition into detailed examples of how Oracle's GoldenGate platform is providing capabilities today. We will discuss essential technical characteristics of a Data Mesh solution, and the benefits that business owners can expect by moving IT in this direction. For more background on Data Mesh, Part 1, 2, and 3 are on the GoldenGate YouTube channel: https://www.youtube.com/playlist?list=PLbqmhpwYrlZJ-583p3KQGDAd6038i1ywe
Webinar Speaker: Jeff Pollock, VP Product (https://www.linkedin.com/in/jtpollock/)
Mr. Pollock is an expert technology leader for data platforms, big data, data integration and governance. Jeff has been CTO at California startups and a senior exec at Fortune 100 tech vendors. He is currently Oracle VP of Products and Cloud Services for Data Replication, Streaming Data and Database Migrations. While at IBM, he was head of all Information Integration, Replication and Governance products, and previously Jeff was an independent architect for US Defense Department, VP of Technology at Cerebra and CTO of Modulant – he has been engineering artificial intelligence based data platforms since 2001. As a business consultant, Mr. Pollock was a Head Architect at Ernst & Young’s Center for Technology Enablement. Jeff is also the author of “Semantic Web for Dummies” and "Adaptive Information,” a frequent keynote at industry conferences, author for books and industry journals, formerly a contributing member of W3C and OASIS, and an engineering instructor with UC Berkeley’s Extension for object-oriented systems, software development process and enterprise architecture.
this is part 3 of the series on Data Mesh ... looking at the intersection of microservices architecture concepts, data integration / replication technologies and log-based stream integration techniques. This webinar was mostly a demonstration, but several slides used to setup the demo are included here as a PDF for viewers.
Oracle OpenWorld London - session for Stream Analysis, time series analytics, streaming ETL, streaming pipelines, big data, kafka, apache spark, complex event processing
Brief training targeted to middle school aged students who are participating in First Lego League robotics and planning to use a version control tool such as EV3Hub
GoldenGate and Stream Processing with Special Guest Rakuten (Jeffrey T. Pollock)
Oracle OpenWorld roadmap presentation for GoldenGate, stream processing, analytics and big data use cases with special guest presenters from Rakuten Travel.
A modern approach to streaming data integration, event processing with a big data (kappa style) data architecture. Key patterns are discussed with pros/cons of newer approaches and open source technologies. Focus on Oracle and GoldenGate technology. OpenWorld 2018 presentation.
Strata 2015 presentation from Oracle for Big Data - we are announcing several new big data products including GoldenGate for Big Data, Big Data Discovery, Oracle Big Data SQL and Oracle NoSQL
2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
4. Agenda
• Data Warehouse Problem Space (Data Intg. Focus)
• Ancient Pre-History of Data Warehouse
• “The Good Old Days” of Data Warehouse
• Revival Period for Data Warehouse
• Data Integration for Modern Data Warehousing
• Old Generation: Hub & Spoke with Invasive Capture
• New Generation: Agent-based with Non-invasive Capture
• Drive Business Value with Data Integration
• Why Replace? Isn’t my Old _____ Good Enough?
• The Oracle Solution for Data Integration
• Oracle GoldenGate
• Oracle Data Integrator
• Oracle Data Quality
6. Data Warehouse Ancient History
• 1985 – 1995 “Controlled Chaos”
• Fragmented Strategy for Marts vs. Warehouse
• No practical notion of “Enterprise Data Warehouse”
• Data Integration:
• Hand-coded Scripts (External to DB)
• Not Optimized
• Procedural Transformations (PL/SQL etc)
• Few Data Integration Tools
• No Formal Methodology, Metrics or Governance
7. Data Warehouse Good Old Days
• 1995 – 2005 “Formal Methods and Discipline”
• Strategy Choices for Marts vs. Warehouse
• Top-down (Inmon) vs. Bottom-up (Kimball)
• Formal notion of “Enterprise Data Warehouse”
• Data Integration:
• Tool-based Data Integration Solutions
• Optimized, Parallel Server-based Transforms
• Formal Methodology, Metrics or Governance
• Reduced Reliance on Hand-coded Scripts and Procedural Transformations (PL/SQL etc)
8. Data Warehouse Revival Period
• 2005 – 2015 “Specialized Warehouse Solutions”
• Technology-driven Choices for High-end DW’s
• Commodity H/W vs. Optimized Appliances
• Relational/Star vs. Columnar (vs. Cubes/OLAP)
• Database + BI vs. Distributed Analytic Apps (Hadoop etc)
• EDW as a "source of truth" vision morphs and expands to MDM as a distinct problem domain
• Data Integration is still stuck in the "Good Old Days":
Good Old Days → Modern Alternative
Hub-based Runtime → Agent-based Runtime
Centralized ETL Server → Optimized E-LT (DW Appliance)
Mainly Batch → Mainly Real Time / Trickle Feed
10. Modern Data Integration Approach
Heterogeneous, Real-time, Non-Invasive, High Performance E-LT
Traditional ETL + CDC:
• Invasive Capture on OLTP systems using complex Adapters
• Transformations in ETL engine on expensive middle tier servers
• Bulk load to the data warehouse with large nightly/daily batch
Modern E-LT + Real-time:
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly
[Diagram: trickle extract via capture agents from heterogeneous sources into staging, with transforms, lookups and bulk load on the target]
11. Good Old Days of ETL Batch Integration
• Good Tools, but:
• Expensive Environments, Performance Bottlenecks, Too Many Data Hops, Proprietary Skills w/Vendor Lock-in, and Heavy Optimization in Complex Situations
• Won't scale w/new Generation of DW's
[Diagram: Extract, Transform (lookups/calcs in the ETL engine) and Load stages across Development, QA and System environments; ETL engines require big H/W, heavy parallel tuning, and their own ETL metadata and lookup data between Sources, Stage Data and Prod]
12. Modern Agent-based E-LT Processing
• Same Good Tools you Expect, plus:
• Reduce Data Center Costs, De-commission Servers
• Open Frameworks, Non-Proprietary SQL Skills
• Deploys Seamlessly Alone or within SOA Servers
• Scales Linearly with Modern DW Appliances
[Diagram: an agent moves data from Sources to Stage Data, then set-based transforms run in the target database between Stage and Prod; SQL transforms inside the DB are typically faster, and SQL load is always faster]
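The set-based, in-database transform idea behind E-LT can be illustrated with a toy example. The schema is hypothetical, and SQLite stands in for the target database purely for illustration:

```python
import sqlite3

# Toy E-LT illustration: instead of pulling rows through a separate ETL
# engine, one set-based SQL statement runs inside the target database,
# letting its optimizer do the aggregation work.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE stage_sales (region TEXT, amount REAL);
    CREATE TABLE dw_sales (region TEXT PRIMARY KEY, total REAL);
    INSERT INTO stage_sales VALUES ('EMEA', 10), ('EMEA', 15), ('APAC', 7);
""")

# E-LT step: a single set-based transform, pushed down to the database.
db.execute("""
    INSERT INTO dw_sales (region, total)
    SELECT region, SUM(amount) FROM stage_sales GROUP BY region
""")

result = sorted(db.execute("SELECT * FROM dw_sales"))
print(result)  # [('APAC', 7.0), ('EMEA', 25.0)]
```

The same work done row-at-a-time in an external engine would move every staged row across the network twice; the set-based form moves none.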
13. Good Old Days of Real Time Replication
• Good Tools, but:
• Arcane capture process, sometimes invasive
• Okay for Data Integration Changed Data Capture, but:
• not used for Active-Active / ZDT Migrations
• not used for High Availability or Disaster Recovery
[Diagram: CDC hub(s) with a transaction management server sit between Sources and the target apply, alongside the ETL engine path from Sources through Stage Data to Prod]
14. Agent-based Real Time Replication
• Same Good Tools you Expect, but:
• Not dependent on hardware for replication
• Capable of Heterogeneous, Active-Active Deployments
• Suitable for Zero Downtime Migrations
• Point-in-time Recovery
[Diagram: a capture agent at the Sources and a replicat agent at Prod move data directly, without a central hub]
15. Data Capture Architecture Options
• Next Generation Capabilities
• Non-invasive, heterogeneous, disk-based log access
• Suitable for CDC + High Availability & Active-Active
• Bi-directional and high performance
• Check-pointing and Simple Trail/Queue Management
[Diagram: trigger-based capture writes inserts, updates and deletes to log tables, while next-generation capture reads on-disk logs for Oracle, IBM DB2, MSFT SQL Server, Sybase, Teradata and Enscribe]
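The check-pointing idea above can be sketched abstractly. This toy reader is invented for illustration (it is not GoldenGate's trail format); it shows how a saved position lets capture resume without loss or replay:

```python
# Toy sketch of checkpointed log capture: the reader records the last
# position it durably delivered, so a restart resumes exactly there,
# neither losing events nor re-delivering them.
class TrailReader:
    def __init__(self, log, checkpoint=0):
        self.log, self.pos = log, checkpoint

    def deliver(self, n):
        """Return up to n events and the new checkpoint position."""
        batch = self.log[self.pos:self.pos + n]
        self.pos += len(batch)  # advance checkpoint after delivery
        return batch, self.pos

log = ["ins:1", "upd:1", "ins:2", "del:1"]
reader = TrailReader(log)
batch, ckpt = reader.deliver(2)        # deliver first two events, then "crash"

# Restart from the saved checkpoint: remaining events only.
remaining, _ = TrailReader(log, checkpoint=ckpt).deliver(10)
print(batch, remaining)  # ['ins:1', 'upd:1'] ['ins:2', 'del:1']
```

Real implementations must persist the checkpoint atomically with delivery (or deduplicate downstream), but the recovery contract is the same.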
16. Good Old Days of Data Integration
• Monolithic & Expensive Environments
• Fragile, Hard to Manage
• Difficult to Tune or Optimize
[Diagram: separate Development, QA and System environments; ETL engines requiring big H/W and heavy parallel tuning, plus CDC hub(s) and a transaction management server, between Sources, Stage Data and Prod]
17. Modern Data Integration Architecture
• Lightweight, Inexpensive Environments – Agents
• Resilient, Easy to Manage – Non-Invasive
• Easy to Optimize and Tune – uses DBMS power
[Diagram: capture and replicat agents handle bulk data movement from Sources to Stage Data, and set-based SQL transforms run inside the target database up to Prod; transforms inside the DB are typically faster, and SQL load is always faster]
19. Business Drivers for Data Integration
Add Value to the Core Business Lines
1. Do More with Less: Design metadata-driven integration; Leverage skills & dictate patterns
2. Compete Globally 24X7: Ensure continuous uptime; Access data in real time
3. Use Data for Competitive Advantage: Ensure the quality of your data; Actively govern most valuable asset
4. Automate and Adapt Business Processes: Expose data services for reuse; Orchestrate processes using SOA
20. Project Drivers for Data Integration
Essential Ingredient for Information Agility
Strategic Value of Data Integration
• Consistency for major enterprise initiatives like BI, DW, & MDM
• Common technical foundation platform across data silos
• Central point for data governance, availability and controls
Key Data Integration Use Cases
• BI, DW, and OLTP Data Integration & Replication
• SOA, Enterprise Integration & Modernization
• Migrations and Master Data Management
22. Why Replace _______?
• We often hear, "my company has already standardized on __________, why should I replace it?"
Answer:
Save Money on Data Center Costs
Accelerate Project Delivery / TTM
Supply Real Time Intelligence to the Business
Reduce Batch Windows on Data Warehouse
Unify Data Integration with SOA Plans
23. Save Money on Hardware/Data Center
E-LT runs on Small Commodity Servers as an Agent Process
Typical: Separate ETL Server
• Proprietary ETL Engine, Poor Performance
• High Costs for Separate Standalone Server
Next Generation Architecture: E-LT, No New Servers
• Lower Cost: Leverage Compute Resources & Partition Workload efficiently
• Efficient: Exploits Database Optimizer
• Fast: Exploits Native Bulk Load & Other Database Interfaces
• Scalable: Scales as you add Processors to Source or Target
Benefits
• Optimal Performance & Scalability
• Better Hardware Leverage
• Easier to Manage & Lower Cost
[Diagram: conventional ETL architecture (Extract, Transform on a standalone server, Load) contrasted with E-LT (Extract and Load, with transforms inside the source and target databases)]
24. Speed Project Delivery/Time to Market
E-LT uses Declarative SQL-style Design + Simple Runtime
• Development Productivity: 40% Efficiency Gains
• Environment Setup (ex: BI Apps): 33-50% Less Complex (7 setup steps, 1 server and 3 connections, vs. 10 setup steps, 3 servers and 7 connections)
25. Supply Real Time Business Intelligence
Non-invasive Capture + E-LT Processing
[Diagram: an application feeds Real Time BI (using a data copy) and Analytic BI (facts & dims) through E-LT mini-batch transforms, with a consistency window between them]
26. Reduce Consistency Windows w/E-LT
Fewer Steps, Faster Xform, and Faster Loads vs. typical ETL
[Diagram: the ETL batch window (Extract, Transform, Load with lookups/calcs on big, heavily tuned ETL engines) compared to a shorter E-LT batch window; the main driver for the batch window is data integrity & consistency, since the DW typically goes offline once lookup & calc functions begin, while E-LT keeps the DW online longer (uptime gains) because set-based SQL transforms inside the DB are typically faster and SQL load is always faster]
27. What About "Pushdown Processing"?
• Pushdown Processing is what the ETL vendors do to compensate for bad performance: push the transformation processing to the Database
• Both Pushdown & E-LT have in common:
• uses the power of your Data Warehouse for maximum performance
• can combine engine-based operations with DB-based transformations to accomplish any level of data transformation complexity
• can scale to any multi-TB level using parallel processing
• Only E-LT can claim:
• performance optimized for your Database – whichever DB you use
• operate without any new IT Hardware costs
• 100% Java-based
• easily embedded within your existing or planned SOA infrastructure
• is not a glorified scheduler that relies on PL/SQL or other custom-coded DB scripts to achieve maximal performance
• can entirely eliminate needless network-hops for remote data joins
• can operate with no additional energy drain in your Datacenter
28. Unify E-LT Agent with SOA Runtime
Best of Breed Data Integration as a Shared SOA Service
Unified Management + Monitoring
• Common Runtime: 100% Java
• Common Monitoring

Example Use Cases
• Bulk Data Transformation (any2any)
• XML/EDI Large File Handling
• SOA-driven Business Intelligence
• Load DW from SOA
• Unified Data Steward Workflow (ETL Error Hospital w/BPEL PM)
• ERP Migration, Replication / Loading
• Query Offloading & Zero Downtime

[Diagram: any data source → high-performance ETL & replication → data warehouse & OLAP, exposed as a shared SOA service.]

E-LT frameworks are optimal architectures for:
• Embedded Applications • Business Intelligence
• Application Integration • Performance Management
• Middleware Servers • Database & OLAP
30. Oracle Data Integration Solution
Best-in-class Heterogeneous Platform for Data Integration
Oracle Applications | Custom Applications | MDM Applications | Business Intelligence | Activity Monitoring | SOA Platforms

Comprehensive Data Integration Solution
SOA Abstraction Layer: Process Manager, Service Bus, Data Services, Data Federation

• Oracle Data Integrator: ELT/ETL, Data Transformation, Bulk Data Movement, Data Lineage
• Oracle GoldenGate: Real-time Data, Log-based CDC, Bi-directional Replication, Data Verification
• Oracle Data Quality: Data Profiling, Data Parsing, Data Cleansing, Match and Merge

Storage: Data Warehouse / Data Mart, OLTP System, OLAP Cube, Flat Files, Web 2.0, Web and Event Services, SOA
31. Key Data Integration Products
• Oracle Data Integrator: Heterogeneous E-LT & ETL, High-speed Transformations, OLAP Data Loading, Data Warehouse Loading
• Oracle GoldenGate: Real Time Data Replication, Changed Data Capture, DBMS High Availability, Disaster Tolerance
• Process & service integration: Comprehensive Integration, ELT/ETL for Bulk Data, Process Orchestration, Human Workflow, Service Bus, Data Grid
• Data services: Data Service Modeling, Query Federation, Data Redaction, Service Data Objects
• Data profiling: Business Data / Metadata, Statistical Analysis, Time Series Reporting, Integrated Data Quality
• Data quality: Cleansing & Parsing, De-duplication, High Performance, Integrated w/ODI
32. Oracle Data Integrator Enterprise Edition
Optimized E-LT for improved Performance, Productivity and Lower TCO
[Diagram: legacy sources, application sources, and OLTP DB sources flow through E-LT transformation into any data warehouse or planning system.]

• E-LT Transformation vs. E-T-L
• Declarative Set-based Design
• Change Data Capture
• Hot-pluggable Architecture
• Pluggable Knowledge Modules
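Declarative set-based design means the developer states what maps to what, and the tool generates the set-based SQL. The sketch below illustrates the idea only; the mapping structure and function names are hypothetical, not ODI's actual Knowledge Module API.

```python
# Hypothetical declarative mapping: target table, source join clause,
# and target-column -> source-expression pairs.
mapping = {
    "target": "fact_orders",
    "source": "stage_orders s JOIN dim_customer d ON s.cust_code = d.cust_code",
    "columns": {
        "order_id": "s.order_id",
        "cust_key": "d.cust_key",
        "amount":   "s.amount",
    },
}

def generate_insert_select(m):
    """Turn a declarative mapping into one set-based INSERT ... SELECT."""
    cols = ", ".join(m["columns"])
    exprs = ", ".join(m["columns"].values())
    return (f"INSERT INTO {m['target']} ({cols})\n"
            f"SELECT {exprs}\n"
            f"FROM {m['source']}")

print(generate_insert_select(mapping))
```

Swapping the generator function (as Knowledge Modules do per platform) would let the same mapping produce dialect-specific SQL for different target databases.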
33. Oracle GoldenGate Overview
Enterprise-wide Solution for Real Time Data Needs
• Standardize on a single technology for multiple needs
• Deploy for continuous availability and real-time data access for reporting / BI
• Highly flexible, heterogeneous
• Fast deployments
• Lower TCO & improved ROI

Use cases: data protection with a disaster-recovery standby (open & active); zero-downtime migration and upgrades; log-based, real-time change data capture; operational reporting against a reporting database; query offloading; data distribution.

[Diagram: OGG captures changes from heterogeneous source systems and feeds a reporting database, an ODS, and (via ETL) an EDW for real-time BI.]
34. How Oracle GoldenGate Works
Modular De-Coupled Architecture
• Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs.
• Trail: stages and queues data for routing.
• Pump: distributes data for routing to target(s).
• Route: data is compressed and encrypted for routing to target(s).
• Delivery: applies data with transaction integrity, transforming the data as required.

[Diagram: Capture → Trail → Pump → LAN/WAN/Internet (TCP/IP) → Trail → Delivery, from source database(s) to target database(s), bi-directional.]
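The decoupling of these stages can be modeled as independent workers connected by queues. This is only a toy sketch mirroring the slide's stage names, not GoldenGate's actual implementation.

```python
from queue import Queue

# Toy change log: (operation, table, row) tuples in commit order.
source_log = [("INSERT", "orders", {"id": 1}),
              ("UPDATE", "orders", {"id": 1, "amt": 9}),
              ("INSERT", "audit",  {"id": 7})]

local_trail, remote_trail, target = Queue(), Queue(), []

def capture(log, trail, tables):
    # Capture: read committed changes from the log, filtering by table.
    for op in log:
        if op[1] in tables:
            trail.put(op)

def pump(src, dst):
    # Pump: move staged trail records toward the target system.
    while not src.empty():
        dst.put(src.get())

def delivery(trail, applied):
    # Delivery: apply changes on the target, preserving commit order.
    while not trail.empty():
        applied.append(trail.get())

capture(source_log, local_trail, tables={"orders"})
pump(local_trail, remote_trail)
delivery(remote_trail, target)
print(target)  # only the two "orders" changes, in order
```

Because each stage reads from and writes to a durable trail rather than calling the next stage directly, any stage can fail and restart without losing the others' progress.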
35. Govern Data Better with Data Quality
• Data Profiling
  – Statistical Analysis
  – Rule-based Validation
  – Monitoring & Timeslice
  – Fine-grained Auditing
• Data Movement
  – E-LT & ETL
  – Data Transformation
  – Change Data Capture
  – Data Access
  – Data Services
• Data Cleansing
  – Data Validation during ETL
  – Data Standardization
  – Address Matching & Dedup
  – Error Hospital / Workflow

[Diagram: profiling, data movement, and cleansing combine data quality with data integration.]
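Standardization and dedup work together: records are normalized to a canonical form first, and matching then happens on that form. A minimal sketch, with made-up abbreviation rules and sample addresses:

```python
# Illustrative standardization rules; real products use far richer
# parsing, reference data, and fuzzy matching.
ABBREV = {"st": "street", "rd": "road", "ave": "avenue"}

def standardize(addr):
    """Lowercase, strip punctuation, expand common abbreviations."""
    words = addr.lower().replace(".", "").split()
    return " ".join(ABBREV.get(w, w) for w in words)

records = ["12 Main St.", "12 main street", "7 Oak Rd", "7 oak road"]

# Dedup on the standardized key, keeping the first surviving record.
deduped = {}
for r in records:
    deduped.setdefault(standardize(r), r)

print(list(deduped.values()))  # ['12 Main St.', '7 Oak Rd']
```

Records that fail to standardize cleanly would be routed to the error hospital / workflow step for manual review rather than silently merged.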
37. Modern Data Integration Approach
Heterogeneous, Real-time, Non-Invasive, High Performance E-LT
Traditional ETL + CDC
• Invasive capture on OLTP systems using complex adapters
• Transformations in an ETL engine on expensive middle-tier servers
• Bulk load to the data warehouse with large nightly/daily batches

Modern E-LT + Real-time
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly

[Diagram: ETL — extract agent → transform/lookup engine → bulk load to staging; E-LT — trickle agent → set-based transform/lookup and load on the heterogeneous target.]
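The mini-batch pattern above can be sketched as follows: captured changes accumulate in a change log, and a small batch is periodically applied to the target. The data and batch size are illustrative; in a real database each batch would be one set-based MERGE rather than a Python loop.

```python
# Captured changes: (key, code, amount) in arrival order.
change_log = [(1, "A", 10.0), (2, "B", 25.0), (3, "C", 5.0),
              (4, "D", 1.0), (5, "E", 2.0)]
target = {}

def apply_mini_batch(log, start, size):
    """Apply the next `size` changes to the target; return the new position."""
    batch = log[start:start + size]
    for key, code, amount in batch:
        target[key] = (code, amount)  # upsert semantics
    return start + len(batch)

# Trickle processing: small batches throughout the day instead of one
# large nightly load.
pos = 0
while pos < len(change_log):
    pos = apply_mini_batch(change_log, pos, size=2)

print(len(target))  # 5
```

Tracking `pos` (a restart point in the change stream) is what lets processing resume after a failure without re-reading the source systems.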
40. The preceding is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.