The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.








Modern Data Integration for Data Warehousing
Oracle Fusion Middleware
Agenda
• Data Warehouse Problem Space (Data Integration Focus)
   • Ancient Pre-History of Data Warehouse
   • “The Good Old Days” of Data Warehouse
   • Revival Period for Data Warehouse
• Data Integration for Modern Data Warehousing
   • Old Generation: Hub & Spoke with Invasive Capture
   • New Generation: Agent-based with Non-invasive Capture
• Drive Business Value with Data Integration
• Why Replace? Isn’t my Old _____ Good Enough?
• The Oracle Solution for Data Integration
   • Oracle GoldenGate
   • Oracle Data Integrator
   • Oracle Data Quality

Data Warehousing
PROBLEM SPACE




Data Warehouse Ancient History
• 1985 – 1995 “Controlled Chaos”
• Fragmented Strategy for Marts vs. Warehouse
• No practical notion of “Enterprise Data Warehouse”
• Data Integration:
   • Hand-coded Scripts (External to DB)
   • Not Optimized
   • Procedural Transformations (PL/SQL etc)
   • Few Data Integration Tools
   • No Formal Methodology, Metrics or Governance




Data Warehouse Good Old Days
• 1995 – 2005 “Formal Methods and Discipline”
• Strategy Choices for Marts vs. Warehouse
   • Top-down (Inmon) vs. Bottom-up (Kimball)
• Formal notion of “Enterprise Data Warehouse”
• Data Integration:
   • Tool-based Data Integration Solutions
   • Optimized, Parallel Server-based Transforms
   • Formal Methodology, Metrics and Governance
   • Reduced Reliance on Hand-coded Scripts and
     Procedural Transformations (PL/SQL etc)



Data Warehouse Revival Period
• 2005 – 2015 “Specialized Warehouse Solutions”
• Technology-driven Choices for High-end DW’s
   • Commodity H/W vs. Optimized Appliances
   • Relational/Star vs. Columnar (vs. Cubes/OLAP)
   • Database + BI vs. Distributed Analytic Apps (Hadoop etc)
• EDW as a “source of truth” vision  morphs and
  expands to MDM as a distinct problem domain
• Data Integration is still stuck in the “Good Old Days”
   Good Old Days                  Modern Alternative
   Hub-based Runtime              Agent-based Runtime
   Centralized ETL Server         Optimized E-LT (DW Appliance)
   Mainly Batch                   Mainly Real Time / Trickle Feed



Data Warehousing with
MODERN DATA INTEGRATION




Modern Data Integration Approach
     Heterogeneous, Real-time, Non-Invasive, High Performance E-LT


Traditional ETL + CDC
• Invasive Capture on OLTP systems using complex Adapters
• Transformations in ETL engine on expensive middle tier servers
• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly

[Diagram: Traditional flow of Extract, Xform with Lookup Data, Staging, Xform with Lookup Data, Load; Modern flow of Agents moving Trickle and Bulk feeds across Heterogeneous systems]

Good Old Days of ETL Batch Integration
• Good Tools, but:
   • Expensive Environments, Performance Bottlenecks, Too Many Data Hops, Proprietary Skills w/Vendor Lock-in, and Heavy Optimization in Complex Situations
   • Won’t scale w/new Generation of DW’s

[Diagram: Sources, Stage (with Lookup Data), and Prod, connected through the ETL Engine(s) with ETL Metadata and Lookup Data; step labels Extract, Transform, Load, Lookups/Calcs, Transform, Load; separate Development, QA, System (etc) Environments. Callout: ETL engines require BIG H/W and heavy parallel tuning]

Modern Agent-based E-LT Processing
• Same Good Tools you Expect, plus:
   • Reduce Data Center Costs, De-commission Servers
   • Open Frameworks, Non-Proprietary SQL Skills
   • Deploys Seamlessly Alone or within SOA Servers
   • Scales Linearly with Modern DW Appliances

[Diagram: Sources, Data Movement, Stage (with Lookup Data), Data Transformation, Prod, driven by the E-LT Agent with Metadata; step labels Extract, Transform, Load, Lookups/Calcs, Transform, Load; Development, QA, System (etc) Environments. Callouts: Set-based SQL transforms typically faster; SQL Load inside DB is always faster]

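To make the "set-based SQL transforms" idea on this slide concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for the warehouse database. The table names (stg_orders, lkp_country, fact_sales), the sample rows, and the run_elt helper are hypothetical illustrations, not taken from the deck or from any Oracle product. The sketch only shows the shape of E-LT: the load step moves untransformed rows into staging, and one SQL statement inside the database performs the lookup and aggregation.

```python
import sqlite3


def run_elt(conn, raw_orders):
    """Load raw rows into staging, then transform them inside the database."""
    cur = conn.cursor()
    # Hypothetical staging, lookup, and target tables for the illustration.
    cur.executescript("""
        CREATE TABLE stg_orders (order_id INTEGER, country_code TEXT, amount_cents INTEGER);
        CREATE TABLE lkp_country (country_code TEXT PRIMARY KEY, region TEXT);
        CREATE TABLE fact_sales (region TEXT PRIMARY KEY, total_cents INTEGER);
        INSERT INTO lkp_country VALUES ('US', 'AMER'), ('CA', 'AMER'), ('DE', 'EMEA');
    """)
    # E + L: bulk-load the untransformed rows into the staging table.
    cur.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", raw_orders)
    # T: a single set-based statement does the lookup join and the aggregation
    # in the database, with no row-by-row processing in a middle tier.
    cur.execute("""
        INSERT INTO fact_sales (region, total_cents)
        SELECT l.region, SUM(s.amount_cents)
        FROM stg_orders AS s
        JOIN lkp_country AS l ON l.country_code = s.country_code
        GROUP BY l.region
    """)
    conn.commit()
    return cur.execute(
        "SELECT region, total_cents FROM fact_sales ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    sample = [(1, "US", 1000), (2, "CA", 250), (3, "DE", 400)]
    print(run_elt(connection, sample))  # [('AMER', 1250), ('EMEA', 400)]
```

In a conventional ETL layout, the join and the sum would instead run in a separate engine before the load; here the Python process only ships rows and issues SQL.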
Good Old Days of Real Time Replication
• Good Tools, but:
   • Arcane capture process, sometimes invasive
   • Okay for Data Integration Changed Data Capture, but:
      • not used for Active-Active / ZDT Migrations
      • not used for High Availability or Disaster Recovery

[Diagram: Sources, CDC Hub(s) with Mgmt Server performing Transaction Apply, Stage (with Lookup Data), ETL Engine(s), Prod]

Agent-based Real Time Replication
• Same Good Tools you Expect, plus:
   • Not dependent on hardware for replication
   • Capable of Heterogeneous, Active-Active Deployments
   • Suitable for Zero Downtime Migrations
   • Point-in-time Recovery




[Diagram: Sources, Capture Agent, Replicat Agent, Stage (with Lookup Data), Data Movement, Prod]

Data Capture Architecture Options
• Next Generation Capabilities
   • Non-invasive, heterogeneous, disk-based log access
   • Suitable for CDC + High Availability & Active-Active
       • Bi-directional and high performance
       • Check-pointing and Simple Trail/Queue Management



[Diagram: Updates, Inserts, and Deletes on Oracle, IBM DB2, MSFT SQL Server, Sybase, Teradata, and Enscribe sources, with three capture options: Triggers, Log Tables, On-Disk Logs]

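As an illustration of the trigger and log-table options in the diagram, the sketch below uses Python's standard-library sqlite3 module to show why trigger-based capture is usually described as invasive: every insert, update, or delete on the source table causes an extra write to a log table inside the source database itself. The accounts and accounts_log tables and the demo_trigger_capture function are made-up names for this example. Capture from on-disk logs is not shown, because it depends on vendor-specific log formats; the relevant difference is that it would need no triggers or extra tables on the source.

```python
import sqlite3


def demo_trigger_capture():
    """Record every change to a source table in a log table via triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
        -- Hypothetical change log that a downstream loader would poll.
        CREATE TABLE accounts_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, id INTEGER, balance INTEGER
        );
        CREATE TRIGGER trg_ins AFTER INSERT ON accounts BEGIN
            INSERT INTO accounts_log (op, id, balance)
            VALUES ('I', NEW.id, NEW.balance);
        END;
        CREATE TRIGGER trg_upd AFTER UPDATE ON accounts BEGIN
            INSERT INTO accounts_log (op, id, balance)
            VALUES ('U', NEW.id, NEW.balance);
        END;
        CREATE TRIGGER trg_del AFTER DELETE ON accounts BEGIN
            INSERT INTO accounts_log (op, id, balance)
            VALUES ('D', OLD.id, OLD.balance);
        END;
    """)
    # Ordinary application activity; each statement also fires a trigger,
    # so the source database does two writes per change.
    conn.execute("INSERT INTO accounts VALUES (1, 100)")
    conn.execute("UPDATE accounts SET balance = 150 WHERE id = 1")
    conn.execute("DELETE FROM accounts WHERE id = 1")
    conn.commit()
    return conn.execute(
        "SELECT op, id, balance FROM accounts_log ORDER BY seq"
    ).fetchall()


if __name__ == "__main__":
    print(demo_trigger_capture())  # [('I', 1, 100), ('U', 1, 150), ('D', 1, 150)]
```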
Good Old Days of Data Integration
• Monolithic & Expensive Environments
• Fragile, Hard to Manage
• Difficult to Tune or Optimize

[Diagram: Sources, Stage (with Lookup Data), and Prod, connected through the ETL Engine(s) with ETL Metadata and Lookup Data, plus CDC Hub(s) with Mgmt Server performing Transaction Apply; step labels Extract, Transform, Load, Lookups/Calcs, Transform, Load; separate Development, QA, System (etc) Environments. Callout: ETL engines require BIG H/W and heavy parallel tuning]

Modern Data Integration Architecture
• Lightweight, Inexpensive Environments – Agents
• Resilient, Easy to Manage – Non-Invasive
• Easy to Optimize and Tune – uses DBMS power



[Diagram: Sources, Bulk Data Movement, Stage (with Lookup Data), Data Transformation, Prod, driven by the E-LT Agent with Metadata, plus a Capture Agent and a Replicat Agent; step labels Extract, Transform, Load, Lookups/Calcs, Transform, Load; Development, QA, System (etc) Environments. Callouts: Set-based SQL transforms typically faster; SQL Load inside DB is always faster]

Data Integration Drives
BUSINESS VALUE




Business Drivers for Data Integration
 Add Value to the Core Business Lines


1. Do More with Less
   • Design metadata-driven integration
   • Leverage skills & dictate patterns
2. Compete Globally 24X7
   • Ensure continuous uptime
   • Access data in real time
3. Use Data for Competitive Advantage
   • Ensure the quality of your data
   • Actively govern most valuable asset
4. Automate and Adapt Business Processes
   • Expose data services for reuse
   • Orchestrate processes using SOA

Project Drivers for Data Integration
Essential Ingredient for Information Agility

Strategic Value of Data Integration
• Consistency for major enterprise initiatives like BI, DW, & MDM
• Common technical foundation platform across data silos
• Central point for data governance, availability and controls

Key Data Integration Use Cases
• BI, DW, and OLTP Data Integration & Replication
• SOA, Enterprise Integration & Modernization
• Migrations and Master Data Management

Modern Data Integration Alternatives:
WHY REPLACE _______?




Why Replace _______?
• We often hear, “My company has already standardized
  on __________, why should I replace it?”


Answer:
  Save Money on Data Center Costs
  Accelerate Project Delivery / TTM
  Supply Real Time Intelligence to the Business
  Reduce Batch Windows on Data Warehouse
  Unify Data Integration with SOA Plans




Save Money on Hardware/Data Center
  E-LT runs on Small Commodity Servers as an Agent Process
Typical: Separate ETL Server
  • Proprietary ETL Engine, Poor Performance
  • High Costs for Separate Standalone Server
E-LT: No New Servers
  • Lower Cost: Leverage Compute Resources & Partition Workload efficiently
  • Efficient: Exploits Database Optimizer
  • Fast: Exploits Native Bulk Load & Other Database Interfaces
  • Scalable: Scales as you add Processors to Source or Target
Benefits
  • Optimal Performance & Scalability
  • Better Hardware Leverage
  • Easier to Manage & Lower Cost

[Diagram: Next Generation Architecture (E-LT), with labels Transform, Extract, Load, Transform; Conventional ETL Architecture, with labels Extract, Transform, Load]

Speed Project Delivery/Time to Market
E-LT uses Declarative SQL-style Design + Simple Runtime
• Development Productivity
   • 40% Efficiency Gains
• Environment Setup (ex: BI Apps)
   • 33-50% Less Complex

[Figure: two environment setups compared side by side; the labels identifying each setup did not survive extraction]
   Number of Setup Steps   7     10
   Number of Servers       1     3
   Number of connections   3     7

Supply Real Time Business Intelligence
 Non-invasive Capture + E-LT Processing

[Diagram: Application, Real Time BI (using Data Copy), E-LT (Mini-Batch + Transforms), Analytic BI (Facts & Dims); a Consistency Window separates Real Time BI from Analytic BI]

Reduce Consistency Windows w/E-LT
         Fewer Steps, Faster Xform, and Faster Loads vs. typical ETL
[Diagram, top: conventional ETL. Sources, Stage (with Lookup Data), and Prod, connected through the ETL Engine(s) with ETL Metadata and Lookup Data; step labels Extract, Transform, Load, Lookups/Calcs, Transform, Load. Callout: ETL engines require BIG H/W and heavy parallel tuning]
[Diagram, bottom: E-LT. Sources, Data Movement, Stage (with Lookup Data), Data Movement, Prod, driven by the E-LT Agent with Metadata. Callouts: Set-based SQL transforms typically faster; SQL Load inside DB is always faster]
[Timeline: ETL Batch Window compared with E-LT Batch Window, with the difference marked as Uptime Gains while the DW is Online]

Main driver for batch window is data integrity & consistency; once lookup & calc functions begin, DW typically goes offline.

What About “Pushdown Processing”?
• Pushdown Processing is what the ETL vendors do to compensate for bad performance: push the transformation processing to the Database
• Both Pushdown & E-LT:
   • use the power of your Data Warehouse for maximum performance
   • can combine engine-based operations with DB-based transformations to accomplish any level of data transformation complexity
   • can scale to any multi-TB level using parallel processing
• Only E-LT can claim to:
   • be performance optimized for your Database, whichever DB you use
   • operate without any new IT Hardware costs
   • be 100% Java-based
   • be easily embedded within your existing or planned SOA infrastructure
   • not be a glorified scheduler that relies on PL/SQL or other custom-coded DB scripts to achieve maximal performance
   • entirely eliminate needless network-hops for remote data joins
   • operate with no additional energy drain in your Datacenter

Unify E-LT Agent with SOA Runtime
    Best of Breed Data Integration as a Shared SOA Service

Unified Management + Monitoring
   • Common Runtime – 100% Java
   • Common Monitoring

Example Use Cases
   • Bulk Data Transformation (any2any)
      • XML/EDI Large File Handling
   • SOA-driven Business Intelligence
      • Load DW from SOA
   • Unified Data Steward Workflow (ETL Error Hospital w/BPEL PM)
   • ERP Migration, Replication / Loading
      • Query Offloading & Zero Downtime

E-LT Frameworks are optimal architectures for:
• Embedded Applications
• Application Integration
• Middleware Servers
• Business Intelligence
• Performance Management
• Database & OLAP

[Diagram: Any Data Source, High Performance ETL & Replication, Data Warehouse & OLAP]

Data Integration the:
ORACLE SOLUTION




Oracle Data Integration Solution
Best-in-class Heterogeneous Platform for Data Integration


Top layer: Oracle Applications, Custom Applications, MDM Applications, Business Intelligence, Activity Monitoring, SOA Platforms

Comprehensive Data Integration Solution
• SOA Abstraction Layer: Process Manager, Service Bus, Data Services, Data Federation
• Oracle Data Integrator: ELT/ETL, Data Transformation, Bulk Data Movement, Data Lineage
• Oracle GoldenGate: Real-time Data, Log-based CDC, Bi-directional Replication, Data Verification
• Oracle Data Quality: Data Profiling, Data Parsing, Data Cleansing, Match and Merge

Bottom layer: Storage, Data Warehouse/Data Mart, OLTP System, OLAP Cube, Flat Files, Web 2.0, Web and Event Services, SOA

Key Data Integration Products
           • Heterogeneous E-LT & ETL     • OLAP Data Loading
           • High-speed Transformations   • Data Warehouse Loading


           • Real Time Data Replication   • DBMS High Availability
           • Changed Data Capture         • Disaster Tolerance

           • Comprehensive Integration    • Process Orchestration
           • ELT/ETL for Bulk Data        • Human Workflow
           • Service Bus                  • Data Grid

           • Data Service Modeling        • Data Redaction
           • Query Federation             • Service Data Objects


           • Business Data / Metadata     • Time Series Reporting
           • Statistical Analysis         • Integrated Data Quality


           • Cleansing & Parsing          • High Performance
           • De-duplication               • Integrated w/ODI



Oracle Data Integrator Enterprise Edition
      Optimized E-LT for improved Performance, Productivity and Lower TCO




• E-LT Transformation vs. E-T-L
• Declarative Set-based design
• Change Data Capture
• Hot-pluggable Architecture
• Pluggable Knowledge Modules

[Diagram: Legacy Sources, Application Sources, and OLTP DB Sources on one side; Any Data Warehouse and Any Planning System on the other]

Oracle GoldenGate Overview
 Enterprise-wide Solution for Real Time Data Needs


• Standardize on Single Technology for Multiple Needs
• Deploy for Continuous Availability and Real-time Data Access for Reporting / BI
• Highly Flexible
• Fast Deployments
• Lower TCO & Improved ROI

[Diagram: Heterogeneous Source Systems feed Log Based, Real-Time Change Data Capture (OGG), which serves: Disaster Recovery, Data Protection (Standby, Open & Active); Zero Downtime Migration and Upgrades; Operational Reporting (Reporting Database); Real-time BI (ODS and EDW, with ETL); Query Offloading; Data Distribution]

How Oracle GoldenGate Works
       Modular De-Coupled Architecture
• Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs.
• Trail: stages and queues data for routing.
• Pump: distributes data for routing to target(s).
• Route: data is compressed, encrypted for routing to target(s).
• Delivery: applies data with transaction integrity, transforming the data as required.

[Diagram: Source Database(s), Capture, Trail, Pump, LAN/WAN/Internet over TCP/IP, Trail, Delivery, Target Database(s); Bi-directional]

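The Capture, Trail, Pump, and Delivery stages described above can be pictured with a toy pipeline. The Python sketch below is only a conceptual model under several assumptions: it does not use or resemble any GoldenGate API, the log record format and the capture, pump, and deliver functions are invented for the example, and the encryption mentioned for the Route stage is left out because the standard library has no symmetric cipher. It does show three ideas from the slide: only committed transactions leave the source, a queue decouples capture from shipping, and each transaction is applied to the target as a unit.

```python
import json
import zlib
from collections import deque


def capture(log_records, table_filter=None):
    """Yield committed transactions (lists of ops) read from a toy log."""
    open_txns = {}
    for rec in log_records:
        txid = rec["txid"]
        if rec["type"] == "op":
            # Optional filtering at capture time, by table name.
            if table_filter is None or rec["table"] in table_filter:
                open_txns.setdefault(txid, []).append(rec)
        elif rec["type"] == "commit":
            ops = open_txns.pop(txid, [])
            if ops:
                yield ops
        elif rec["type"] == "rollback":
            open_txns.pop(txid, None)  # uncommitted work is never shipped


def pump(trail):
    """Drain the trail, compressing each transaction for the network hop."""
    while trail:
        yield zlib.compress(json.dumps(trail.popleft()).encode("utf-8"))


def deliver(packets, target):
    """Apply each transaction as a whole and return the new target state."""
    state = dict(target)
    for packet in packets:
        ops = json.loads(zlib.decompress(packet).decode("utf-8"))
        staged = dict(state)  # build the transaction's result on a copy
        for op in ops:
            if op["action"] == "delete":
                staged.pop(op["key"], None)
            else:  # insert or update
                staged[op["key"]] = op["value"]
        state = staged  # the whole transaction becomes visible at once
    return state


if __name__ == "__main__":
    log = [
        {"type": "op", "txid": 1, "table": "orders", "action": "insert", "key": "o1", "value": 10},
        {"type": "op", "txid": 2, "table": "orders", "action": "insert", "key": "o2", "value": 20},
        {"type": "commit", "txid": 1},
        {"type": "rollback", "txid": 2},
        {"type": "op", "txid": 3, "table": "audit", "action": "insert", "key": "a1", "value": 1},
        {"type": "op", "txid": 3, "table": "orders", "action": "update", "key": "o1", "value": 15},
        {"type": "commit", "txid": 3},
    ]
    trail = deque(capture(log, table_filter={"orders"}))  # the staging queue
    print(deliver(pump(trail), {}))  # {'o1': 15}
```

The rolled-back transaction 2 and the filtered-out audit row never reach the target, which is the behavior the Capture bullet describes.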
Govern Data Better with Data Quality

• Data Profiling
   – Statistical Analysis
   – Rule-based Validation
   – Monitoring & Timeslice
   – Fine-grained Auditing
• Data Movement
   – E-LT & ETL
   – Data Transformation
   – Change Data Capture
   – Data Access
   – Data Services
• Data Cleansing
   – Data Validation during ETL
   – Data Standardization
   – Address Matching & Dedup
   – Error Hospital / Workflow

[Diagram: Data Integration at the center, surrounded by Data Quality and Profiling, Data Movement, and Data Cleansing]

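A small sketch may help show how rule-based validation, standardization, de-duplication, and an error hospital fit together during a load. The Python below is a generic illustration, not Oracle Data Quality's interface: the RULES dictionary, the record fields (email, age), and the validate and cleanse_and_route functions are all hypothetical. Records that fail any rule are set aside with the names of the failed rules, so a data steward could review them later, while clean records continue to the target.

```python
def validate(record, rules):
    """Return the names of all rules that the record fails."""
    return [name for name, check in rules.items() if not check(record)]


def cleanse_and_route(records, rules):
    """Standardize, validate, and de-duplicate records.

    Returns (clean, hospital): clean records in arrival order, and failed
    records paired with the rules they broke.
    """
    clean, hospital, seen = [], [], set()
    for rec in records:
        # Standardization: trim and lower-case the email before any checks.
        rec = {**rec, "email": (rec.get("email") or "").strip().lower()}
        failures = validate(rec, rules)
        if failures:
            hospital.append({"record": rec, "failed_rules": failures})
        elif rec["email"] not in seen:
            # De-duplication on the standardized email; later copies are dropped.
            seen.add(rec["email"])
            clean.append(rec)
    return clean, hospital


# Hypothetical validation rules, keyed by a descriptive name.
RULES = {
    "email_has_at": lambda r: "@" in r["email"],
    "age_in_range": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 130,
}


if __name__ == "__main__":
    incoming = [
        {"email": " A@X.com ", "age": 30},
        {"email": "a@x.com", "age": 31},   # duplicate after standardization
        {"email": "bad", "age": 200},      # fails both rules
    ]
    good, rejected = cleanse_and_route(incoming, RULES)
    print(good)      # [{'email': 'a@x.com', 'age': 30}]
    print(rejected)  # one entry listing both failed rules
```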
CONCLUSION




Modern Data Integration Approach
     Heterogeneous, Real-time, Non-Invasive, High Performance E-LT


Traditional ETL + CDC
• Invasive Capture on OLTP systems using complex Adapters
• Transformations in ETL engine on expensive middle tier servers
• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly

[Diagram: Traditional flow of Extract, Xform with Lookup Data, Staging, Xform with Lookup Data, Load; Modern flow of Agents moving Trickle and Bulk feeds across Heterogeneous systems]

Questions





More Related Content

What's hot

Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
Jeffrey T. Pollock
 

2010.03.16 Pollock.Edw2010.Modern D Ifor Warehousing

  • 2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 3. Modern Data Integration for Data Warehousing: Oracle Fusion Middleware
  • 4. Agenda • Data Warehouse Problem Space (Data Integration Focus): Ancient Pre-History of Data Warehouse; “The Good Old Days” of Data Warehouse; Revival Period for Data Warehouse • Data Integration for Modern Data Warehousing: Old Generation: Hub & Spoke with Invasive Capture; New Generation: Agent-based with Non-invasive Capture • Drive Business Value with Data Integration • Why Replace? Isn’t my Old _____ Good Enough? • The Oracle Solution for Data Integration: Oracle GoldenGate; Oracle Data Integrator; Oracle Data Quality
  • 6. Data Warehouse Ancient History • 1985 – 1995 “Controlled Chaos” • Fragmented Strategy for Marts vs. Warehouse • No practical notion of “Enterprise Data Warehouse” • Data Integration: Hand-coded Scripts (External to DB); Not Optimized; Procedural Transformations (PL/SQL etc.); Few Data Integration Tools; No Formal Methodology, Metrics or Governance
  • 7. Data Warehouse Good Old Days • 1995 – 2005 “Formal Methods and Discipline” • Strategy Choices for Marts vs. Warehouse: Top-down (Inmon) vs. Bottom-up (Kimball) • Formal notion of “Enterprise Data Warehouse” • Data Integration: Tool-based Data Integration Solutions; Optimized, Parallel Server-based Transforms; Formal Methodology, Metrics and Governance; Reduced Reliance on Hand-coded Scripts and Procedural Transformations (PL/SQL etc.)
  • 8. Data Warehouse Revival Period • 2005 – 2015 “Specialized Warehouse Solutions” • Technology-driven Choices for High-end DWs: Commodity H/W vs. Optimized Appliances; Relational/Star vs. Columnar (vs. Cubes/OLAP); Database + BI vs. Distributed Analytic Apps (Hadoop etc.) • EDW as a “source of truth” vision morphs and expands to MDM as a distinct problem domain • Data Integration is still stuck in the “Good Old Days”. Good Old Days vs. Modern Alternative: Hub-based Runtime vs. Agent-based Runtime; Centralized ETL Server vs. Optimized E-LT (DW Appliance); Mainly Batch vs. Mainly Real Time / Trickle Feed
  • 9. Data Warehousing with MODERN DATA INTEGRATION
  • 10. Modern Data Integration Approach: Heterogeneous, Real-time, Non-Invasive, High Performance E-LT. Traditional ETL + CDC: • Invasive Capture on OLTP systems using complex Adapters • Transformations in ETL engine on expensive middle tier servers • Bulk load to the data warehouse with large nightly/daily batch. Modern E-LT + Real-time: • Continuous feeds from operational systems • Non-invasive data capture • Thin middle tier with transformations on the database platform (target) • Mini-batches throughout the day or bulk processing nightly. [Diagram labels: Extract, Trickle, Agent, Xform, Lookup Data, Staging, Bulk Load, Heterogeneous]
  • 11. Good Old Days of ETL Batch Integration • Good Tools, but: Expensive Environments, Performance Bottlenecks, Too Many Data Hops, Proprietary Skills w/Vendor Lock-in, and Heavy Optimization in Complex Situations • Won’t scale w/new Generation of DWs. [Diagram labels: Extract, Transform, Load, Lookups/Calcs, Transform, Load; Sources, Stage, Lookup Data, Prod; ETL Engine(s), ETL Metadata; Development, QA, System (etc.) Environments. Callout: ETL engines require BIG H/W and heavy parallel tuning]
  • 12. Modern Agent-based E-LT Processing • Same Good Tools you Expect, plus: • Reduce Data Center Costs, De-commission Servers • Open Frameworks, Non-Proprietary SQL Skills • Deploys Seamlessly Alone or within SOA Servers • Scales Linearly with Modern DW Appliances. [Diagram labels: Extract, Transform, Load, Lookups/Calcs, Transform, Load; Sources, Stage, Lookup Data, Prod; Data Movement, Data Transformation; E-LT Agent, Metadata; Development, QA, System (etc.) Environments. Callouts: Set-based SQL transforms typically faster; SQL Load inside DB is always faster]
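The "set-based SQL transforms inside the DB" idea from slide 12 can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the target warehouse, and every table and column name (stg_orders, dim_customer, fact_sales) is hypothetical rather than taken from the deck or from Oracle Data Integrator.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for the target warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER);
    CREATE TABLE dim_customer (customer TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (region TEXT, total_cents INTEGER);
""")

# Extract + Load: source rows land in a staging table untransformed.
conn.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    [(1, "acme", 1000), (2, "acme", 2500), (3, "globex", 4000)],
)
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [("acme", "EMEA"), ("globex", "APAC")],
)

# Transform: one set-based statement performs the lookup and the aggregation
# inside the database, so no rows travel through a middle-tier engine.
conn.execute("""
    INSERT INTO fact_sales (region, total_cents)
    SELECT d.region, SUM(s.amount_cents)
    FROM stg_orders AS s
    JOIN dim_customer AS d ON d.customer = s.customer
    GROUP BY d.region
""")
conn.commit()

result = conn.execute(
    "SELECT region, total_cents FROM fact_sales ORDER BY region"
).fetchall()
print(result)
```

The design point is that the orchestrating process only issues SQL; the join and aggregation run where the data already lives, which is the contrast with an engine-based ETL server that the slide draws.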
  • 13. Good Old Days of Real Time Replication • Good Tools, but: • Arcane capture process, sometimes invasive • Okay for Data Integration Changed Data Capture, but: not used for Active-Active / Zero Downtime (ZDT) Migrations; not used for High Availability or Disaster Recovery. [Diagram labels: Sources, Stage, Lookup Data, Prod; ETL Engine(s); CDC Hub(s); Transaction Mgmt Server; Apply]
  • 14. Agent-based Real Time Replication • Same Good Tools you Expect, but: • Not dependent on hardware for replication • Capable of Heterogeneous, Active-Active Deployments • Suitable for Zero Downtime Migrations • Point-in-time Recovery. [Diagram labels: Sources, Stage, Lookup Data, Prod; Data Movement; Capture Agent, Replicat Agent]
  • 15. Data Capture Architecture Options • Next Generation Capabilities: • Non-invasive, heterogeneous, disk-based log access • Suitable for CDC + High Availability & Active-Active • Bi-directional and high performance • Check-pointing and Simple Trail/Queue Management. [Diagram labels: Inserts, Updates, Deletes captured via Triggers, Log Tables, or On-Disk Logs; sources: Oracle, IBM DB2, MSFT SQL Server, Sybase, Teradata, Enscribe]
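The combination of disk-based log access and check-pointing on slide 15 can be illustrated with a small sketch. It assumes a made-up JSON-lines change log and a plain-text checkpoint file; real database transaction logs are binary and vendor-specific, and the function name read_changes is hypothetical.

```python
import json
import os
import tempfile

def read_changes(log_path, checkpoint_path):
    """Return change records appended since the last checkpoint, then advance it."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = int(f.read().strip() or 0)
    changes = []
    with open(log_path, "rb") as log:
        log.seek(offset)                 # resume where the previous run stopped
        for line in log:
            changes.append(json.loads(line))
        new_offset = log.tell()
    with open(checkpoint_path, "w") as f:
        f.write(str(new_offset))         # persist the position for restart
    return changes

# Demo with a temporary, hypothetical change log.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "changes.log")
ckpt_path = os.path.join(workdir, "capture.ckpt")

def append(record):
    with open(log_path, "ab") as log:
        log.write((json.dumps(record) + "\n").encode("utf-8"))

append({"op": "INSERT", "table": "orders", "id": 1})
append({"op": "UPDATE", "table": "orders", "id": 1})
first = read_changes(log_path, ckpt_path)
append({"op": "DELETE", "table": "orders", "id": 1})
second = read_changes(log_path, ckpt_path)
third = read_changes(log_path, ckpt_path)
print(len(first), len(second), len(third))
```

The reader never queries or modifies the source tables, which is the sense in which log-based capture is non-invasive compared with triggers or log tables; the checkpoint is what lets it restart without re-reading or skipping changes.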
  • 16. Good Old Days of Data Integration • Monolithic & Expensive Environments • Fragile, Hard to Manage • Difficult to Tune or Optimize. [Diagram labels: Extract, Transform, Load, Lookups/Calcs, Transform, Load; Sources, Stage, Lookup Data, Prod; ETL Engine(s), ETL Metadata; CDC Hub(s); Transaction Mgmt Server; Apply; Development, QA, System (etc.) Environments. Callout: ETL engines require BIG H/W and heavy parallel tuning]
  • 17. Modern Data Integration Architecture • Lightweight, Inexpensive Environments – Agents • Resilient, Easy to Manage – Non-Invasive • Easy to Optimize and Tune – uses DBMS power. [Diagram labels: Extract, Transform, Load, Lookups/Calcs, Transform, Load; Sources, Stage, Lookup Data, Prod; Bulk Data Movement, Data Transformation; E-LT Agent, Metadata; Capture Agent, Replicat Agent; Development, QA, System (etc.) Environments. Callouts: Set-based SQL transforms typically faster; SQL Load inside DB is always faster]
  • 19. Business Drivers for Data Integration: Add Value to the Core Business Lines. 1. Do More with Less: Design metadata-driven integration; Leverage skills & dictate patterns. 2. Compete Globally 24X7: Ensure continuous uptime; Access data in real time. 3. Use Data for Competitive Advantage: Ensure the quality of your data; Actively govern most valuable asset. 4. Automate and Adapt Business Processes: Expose data services for reuse; Orchestrate processes using SOA
  • 20. Project Drivers for Data Integration: Essential Ingredient for Information Agility. Strategic Value of Data Integration: • Consistency for major enterprise initiatives like BI, DW, & MDM • Common technical foundation platform across data silos • Central point for data governance, availability and controls. Key Data Integration Use Cases: • BI, DW, and OLTP Data Integration & Replication • SOA, Enterprise Integration & Modernization • Migrations and Master Data Management
  • 21. Modern Data Integration Alternatives: WHY REPLACE _______?
  • 22. Why Replace _______? • We often hear, “My company has already standardized on __________, why should I replace it?” Answer: • Save Money on Data Center Costs • Accelerate Project Delivery / TTM • Supply Real Time Intelligence to the Business • Reduce Batch Windows on Data Warehouse • Unify Data Integration with SOA Plans
  • 23. Save Money on Hardware/Data Center: E-LT runs on Small Commodity Servers as an Agent Process. Typical: Separate ETL Server • Proprietary ETL Engine, Poor Performance • High Costs for Separate Standalone Server. Next Generation Architecture, E-LT: No New Servers • Lower Cost: Leverage Compute Resources & Partition Workload efficiently • Efficient: Exploits Database Optimizer • Fast: Exploits Native Bulk Load & Other Database Interfaces • Scalable: Scales as you add Processors to Source or Target. Benefits • Optimal Performance & Scalability • Better Hardware Leverage • Easier to Manage & Lower Cost. [Diagram labels: Conventional ETL Architecture (Extract, Transform, Load); E-LT (Extract, Load, Transform)]
  • 24. Speed Project Delivery/Time to Market: E-LT uses Declarative SQL-style Design + Simple Runtime • Development Productivity: 40% Efficiency Gains • Environment Setup (ex: BI Apps): 33-50% Less Complex. Setup comparison of the two configurations shown: Number of Setup Steps 7 vs. 10; Number of Servers 1 vs. 3; Number of connections 3 vs. 7
  • 25. Supply Real Time Business Intelligence: Non-invasive Capture + E-LT Processing. [Diagram labels: Application; Real Time BI (using Data Copy); Analytic BI (Facts & Dims); Consistency Window; E-LT (Mini-Batch + Transforms)]
  • 26. Reduce Consistency Windows w/E-LT: Fewer Steps, Faster Xform, and Faster Loads vs. typical ETL. Main driver for batch window is data integrity & consistency; once lookup & calc functions begin, DW typically goes offline. [Diagram labels: ETL flow (Extract, Transform, Load, Lookups/Calcs, Transform, Load; Sources, Stage, Lookup Data, Prod; ETL Engine(s), ETL Metadata) vs. E-LT flow (Sources, Data Movement, Stage, Lookup Data, Data Movement, Prod; E-LT Agent, Metadata); timeline: ETL Batch Window, E-LT Batch Window, DW is Online, Uptime Gains. Callouts: ETL engines require BIG H/W and heavy parallel tuning; Set-based SQL transforms typically faster; SQL Load inside DB is always faster]
  • 27. *What About “Pushdown Processing”? • Pushdown Processing is what the ETL vendors do to compensate for bad performance – push the transformation processing to the Database • Both Pushdown & E-LT: • use the power of your Data Warehouse for maximum performance • can combine engine-based operations with DB-based transformations to accomplish any level of data transformation complexity • can scale to any multi-TB level using parallel processing • Only E-LT: • is performance optimized for your Database – whichever DB you use • operates without any new IT Hardware costs • is 100% Java-based • is easily embedded within your existing or planned SOA infrastructure • is not a glorified scheduler that relies on PL/SQL or other custom-coded DB scripts to achieve maximal performance • can entirely eliminate needless network hops for remote data joins • can operate with no additional energy drain in your Datacenter
  • 28. Unify E-LT Agent with SOA Runtime: Best of Breed Data Integration as a Shared SOA Service. Unified Management + Monitoring • Common Runtime – 100% Java • Common Monitoring. Example Use Cases • Bulk Data Transformation (any2any) • XML/EDI Large File Handling • SOA-driven Business Intelligence • Load DW from SOA • Unified Data Steward Workflow (ETL Error Hospital w/BPEL PM) • ERP Migration, Replication / Loading • Query Offloading & Zero Downtime. E-LT Frameworks are optimal architectures for: • Embedded Applications • Business Intelligence • Application Integration • Performance Management • Middleware Servers • Database & OLAP. [Diagram labels: High Performance ETL & Replication; Any Data Source; Data Warehouse & OLAP]
  • 30. Oracle Data Integration Solution: Best-in-class Heterogeneous Platform for Data Integration. [Diagram: consumers: Oracle Applications, Custom Applications, MDM Applications, Business Intelligence, Activity Monitoring, SOA Platforms. Comprehensive Data Integration Solution: SOA Abstraction Layer (Process Manager, Service Bus, Data Services, Data Federation); Oracle Data Integrator (ELT/ETL, Data Transformation, Bulk Data Movement, Data Lineage); Oracle GoldenGate (Real-time Data, Log-based CDC, Bi-directional Replication, Data Verification); Oracle Data Quality (Data Profiling, Data Parsing, Data Cleansing, Match and Merge). Storage: Data Warehouse/Data Mart, OLTP System, OLAP Cube, Flat Files, Web 2.0, Web and Event Services, SOA]
  • 31. Key Data Integration Products • Heterogeneous E-LT & ETL • OLAP Data Loading • High-speed Transformations • Data Warehouse Loading • Real Time Data Replication • DBMS High Availability • Changed Data Capture • Disaster Tolerance • Comprehensive Integration • Process Orchestration • ELT/ETL for Bulk Data • Human Workflow • Service Bus • Data Grid • Data Service Modeling • Data Redaction • Query Federation • Service Data Objects • Business Data / Metadata • Time Series Reporting • Statistical Analysis • Integrated Data Quality • Cleansing & Parsing • High Performance • De-duplication • Integrated w/ODI
  • 32. Oracle Data Integrator Enterprise Edition: Optimized E-LT for improved Performance, Productivity and Lower TCO. [Diagram labels: Legacy Sources, Application Sources, OLTP DB Sources; Any Data Warehouse, Any Planning System; E-LT Transformation vs. E-T-L; Declarative Set-based design; Change Data Capture; Hot-pluggable Architecture; Pluggable Knowledge Modules]
  • 33. Oracle GoldenGate Overview: Enterprise-wide Solution for Real Time Data Needs • Standardize on Single Technology for Multiple Needs • Deploy for Continuous Availability and Real-time Data Access for Reporting / BI • Highly Flexible • Fast Deployments • Lower TCO & Improved ROI. [Diagram labels: Heterogeneous Source Systems; OGG (Log Based, Real-Time Change Data Capture); Disaster Recovery, Data Protection; Standby (Open & Active); Zero Downtime Migration and Upgrades; Operational Reporting (Reporting Database); ETL to ODS and EDW; Real-time BI; Query Offloading; Data Distribution]
  • 34. How Oracle GoldenGate Works: Modular De-Coupled Architecture. Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs. Trail: stages and queues data for routing. Pump: distributes data for routing to target(s). Route: data is compressed and encrypted for routing to target(s). Delivery: applies data with transaction integrity, transforming the data as required. [Diagram: Source Database(s) -> Capture -> Trail -> Pump -> LAN/WAN/Internet over TCP/IP -> Trail -> Delivery -> Target Database(s); Bi-directional]
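The Capture, Trail, Pump, and Delivery stages described on slide 34 can be sketched conceptually as below. This is not GoldenGate's API or trail format: the record layout, the function names, and the in-memory deques standing in for trail files are all assumptions made for illustration, and the Route step (compression and encryption) is left out.

```python
from collections import deque

# Hypothetical change records; "committed" marks whether the transaction committed.
source_log = [
    {"txn": 1, "op": "INSERT", "key": "a", "value": 10, "committed": True},
    {"txn": 2, "op": "INSERT", "key": "b", "value": 20, "committed": False},
    {"txn": 3, "op": "UPDATE", "key": "a", "value": 15, "committed": True},
    {"txn": 3, "op": "INSERT", "key": "c", "value": 30, "committed": True},
]

def capture(log, wanted_ops=("INSERT", "UPDATE", "DELETE")):
    """Capture: keep only committed changes, optionally filtered by operation."""
    return [r for r in log if r["committed"] and r["op"] in wanted_ops]

def pump(local_trail, remote_trail):
    """Pump: drain the local trail into the remote trail, preserving order."""
    while local_trail:
        remote_trail.append(local_trail.popleft())

def deliver(remote_trail, target):
    """Delivery: apply changes one whole transaction at a time, in order."""
    while remote_trail:
        txn_id = remote_trail[0]["txn"]
        batch = []
        while remote_trail and remote_trail[0]["txn"] == txn_id:
            batch.append(remote_trail.popleft())
        staged = dict(target)            # work on a copy, then swap in the result
        for change in batch:
            if change["op"] == "DELETE":
                staged.pop(change["key"], None)
            else:
                staged[change["key"]] = change["value"]
        target.clear()
        target.update(staged)

local_trail = deque(capture(source_log))   # Trail: queue staged for routing
remote_trail = deque()
pump(local_trail, remote_trail)
target = {}
deliver(remote_trail, target)
print(target)
```

Because each stage only talks to the queue next to it, capture can keep running while delivery is paused or the network is down, which is one reading of the "modular de-coupled" label on the slide.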
  • 35. Govern Data Better with Data Quality • Data Movement: E-LT & ETL; Data Transformation; Change Data Capture; Data Access; Data Services • Data Profiling: Statistical Analysis; Rule-based Validation; Monitoring & Timeslice; Fine-grained Auditing • Data Cleansing • Data Validation during ETL • Data Standardization • Address Matching & Dedup • Error Hospital / Workflow. [Diagram labels: Data Quality and Data Integration: Data Movement, Profiling, Data Cleansing]
  • 37. Modern Data Integration Approach: Heterogeneous, Real-time, Non-Invasive, High Performance E-LT. Traditional ETL + CDC: • Invasive Capture on OLTP systems using complex Adapters • Transformations in ETL engine on expensive middle tier servers • Bulk load to the data warehouse with large nightly/daily batch. Modern E-LT + Real-time: • Continuous feeds from operational systems • Non-invasive data capture • Thin middle tier with transformations on the database platform (target) • Mini-batches throughout the day or bulk processing nightly. [Diagram labels: Extract, Trickle, Agent, Xform, Lookup Data, Staging, Bulk Load, Heterogeneous]
  • 38. Questions
  • 40. The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.