2010.03.16 Pollock.EDW2010.Modern DI for Warehousing

Presentation describes a modern alternative to conventional hub-based ETL and Replication for Data Warehousing


2010.03.16 Pollock.EDW2010.Modern DI for Warehousing

  1. <Insert Picture Here>
  2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  3. <Insert Picture Here> Modern Data Integration for Data Warehousing (Oracle Fusion Middleware)
  4. Agenda
     • Data Warehouse Problem Space (Data Integration Focus)
       • Ancient Pre-History of Data Warehouse
       • “The Good Old Days” of Data Warehouse
       • Revival Period for Data Warehouse
     • Data Integration for Modern Data Warehousing
       • Old Generation: Hub & Spoke with Invasive Capture
       • New Generation: Agent-based with Non-invasive Capture
     • Drive Business Value with Data Integration
       • Why Replace? Isn’t my Old _____ Good Enough?
     • The Oracle Solution for Data Integration
       • Oracle GoldenGate
       • Oracle Data Integrator
       • Oracle Data Quality
  5. Data Warehousing PROBLEM SPACE
  6. Data Warehouse Ancient History
     • 1985 – 1995 “Controlled Chaos”
     • Fragmented Strategy for Marts vs. Warehouse
       • No practical notion of “Enterprise Data Warehouse”
     • Data Integration:
       • Hand-coded Scripts (External to DB)
       • Not Optimized
       • Procedural Transformations (PL/SQL etc.)
       • Few Data Integration Tools
     • No Formal Methodology, Metrics or Governance
  7. Data Warehouse Good Old Days
     • 1995 – 2005 “Formal Methods and Discipline”
     • Strategy Choices for Marts vs. Warehouse
       • Top-down (Inmon) vs. Bottom-up (Kimball)
       • Formal notion of “Enterprise Data Warehouse”
     • Data Integration:
       • Tool-based Data Integration Solutions
       • Optimized, Parallel Server-based Transforms
       • Formal Methodology, Metrics and Governance
       • Reduced Reliance on Hand-coded Scripts and Procedural Transformations (PL/SQL etc.)
  8. Data Warehouse Revival Period
     • 2005 – 2015 “Specialized Warehouse Solutions”
     • Technology-driven Choices for High-end DWs
       • Commodity H/W vs. Optimized Appliances
       • Relational/Star vs. Columnar (vs. Cubes/OLAP)
       • Database + BI vs. Distributed Analytic Apps (Hadoop etc.)
     • EDW as a “source of truth” vision morphs and expands to MDM as a distinct problem domain
     • Data Integration is still stuck in the “Good Old Days” (vs. the Modern Alternative):
       • Hub-based Runtime vs. Agent-based Runtime
       • Centralized ETL Server vs. Optimized E-LT (DW Appliance)
       • Mainly Batch vs. Mainly Real Time / Trickle Feed
  9. Data Warehousing with MODERN DATA INTEGRATION
  10. Modern Data Integration Approach: Heterogeneous, Real-time, Non-Invasive, High Performance E-LT
      • Traditional ETL + CDC:
        • Invasive capture on OLTP systems using complex adapters
        • Transformations in ETL engine on expensive middle tier servers
        • Bulk load to the data warehouse with large nightly/daily batch
      • Modern E-LT + Real-time:
        • Continuous feeds from operational systems
        • Non-invasive data capture
        • Thin middle tier with transformations on the database platform (target)
        • Mini-batches throughout the day or bulk processing nightly
      [Diagram: trickle extract via capture agents, set-based transforms and lookups, bulk load into heterogeneous staging/data targets]
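A minimal sketch of the E-LT pattern described above: instead of pushing rows through a middle-tier transformation engine, the integration layer generates one set-based statement that the target warehouse executes itself, loading only the rows captured since the last mini-batch. All table and column names here (staging table, fact table, region lookup, watermark column) are hypothetical, invented for illustration.

```python
# Sketch: generate a single set-based INSERT ... SELECT for an E-LT
# mini-batch. The transform (aggregation) and lookup (join) run inside
# the warehouse, not in an external ETL engine. Schema is hypothetical.

def build_elt_statement(staging_table: str, target_table: str,
                        watermark: str) -> str:
    """Return one statement that loads rows captured after `watermark`,
    doing lookups and calculations in the database itself."""
    return (
        f"INSERT INTO {target_table} (customer_id, order_total, region_name)\n"
        f"SELECT s.customer_id,\n"
        f"       SUM(s.amount),\n"
        f"       r.region_name\n"
        f"FROM {staging_table} s\n"
        f"JOIN region_dim r ON r.region_id = s.region_id\n"   # lookup as a join
        f"WHERE s.captured_at > TIMESTAMP '{watermark}'\n"    # mini-batch window
        f"GROUP BY s.customer_id, r.region_name"
    )

sql = build_elt_statement("stg_orders", "fact_orders", "2010-03-16 09:00:00")
```

A scheduler or agent would submit this statement to the target every few minutes (mini-batch) or once nightly (bulk), which is the "thin middle tier" idea on the slide.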
  11. Good Old Days of ETL Batch Integration
      • Good Tools, but:
        • Expensive Environments, Performance Bottlenecks, Too Many Data Hops, Proprietary Skills w/Vendor Lock-in, and Heavy Optimization in Complex Situations
      • Won’t scale w/new Generation of DWs
      [Diagram: Extract, Transform (lookups/calcs), Load through ETL engine(s) with separate lookup, staging, and metadata servers across Development, QA, and System environments; ETL engines require big H/W and heavy parallel tuning]
  12. Modern Agent-based E-LT Processing
      • Same Good Tools you Expect, plus:
        • Reduce Data Center Costs, De-commission Servers
        • Open Frameworks, Non-Proprietary SQL Skills
        • Deploys Seamlessly Alone or within SOA Servers
        • Scales Linearly with Modern DW Appliances
      [Diagram: a lightweight agent drives Extract, set-based SQL transforms inside the DB (typically faster), and SQL load (always faster); lookup, staging, and metadata stay in the database environments]
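The agent-based model above can be sketched in a few lines: the "agent" holds no data and does no row-by-row work, it only submits set-based SQL to the target database, which performs the transform. This is a toy illustration using sqlite3 as a stand-in warehouse; the staging and fact tables are invented for the example.

```python
# Sketch of a lightweight E-LT agent: it orchestrates, the database
# transforms. sqlite3 stands in for the target warehouse; the schema
# (stg_sales, fact_sales) is hypothetical.
import sqlite3

def run_elt_step(conn: sqlite3.Connection) -> int:
    """Move staged rows into the target with one set-based statement,
    then clear the staging area. Returns the number of rows loaded."""
    cur = conn.execute(
        "INSERT INTO fact_sales (day, total) "
        "SELECT day, SUM(amount) FROM stg_sales GROUP BY day"
    )
    loaded = cur.rowcount
    conn.execute("DELETE FROM stg_sales")
    conn.commit()
    return loaded

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_sales (day TEXT, amount REAL)")
conn.execute("CREATE TABLE fact_sales (day TEXT, total REAL)")
conn.executemany("INSERT INTO stg_sales VALUES (?, ?)",
                 [("2010-03-15", 10.0), ("2010-03-15", 5.0), ("2010-03-16", 7.5)])
conn.commit()
rows = run_elt_step(conn)
```

Because the aggregation is a single SQL statement, it scales with the database's own optimizer and parallelism rather than with a separate ETL server, which is the cost argument the slide makes.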
  13. Good Old Days of Real Time Replication
      • Good Tools, but:
        • Arcane capture process, sometimes invasive
      • Okay for Data Integration Changed Data Capture, but:
        • Not used for Active-Active / ZDT Migrations
        • Not used for High Availability or Disaster Recovery
      [Diagram: CDC hub(s) with transaction management and apply server sitting between sources and ETL engine(s)]
  14. Agent-based Real Time Replication
      • Same Good Tools you Expect, but:
        • Not dependent on hardware for replication
        • Capable of Heterogeneous, Active-Active Deployments
        • Suitable for Zero Downtime Migrations
        • Point-in-time Recovery
      [Diagram: capture agent on the sources, replicat agent on the target; direct data movement without a hub]
  15. Data Capture Architecture Options
      • Next Generation Capabilities:
        • Non-invasive, heterogeneous, disk-based log access
        • Suitable for CDC + High Availability & Active-Active
        • Bi-directional and high performance
        • Check-pointing and Simple Trail/Queue Management
      [Diagram: inserts/updates/deletes captured either invasively via triggers and log tables, or non-invasively from the on-disk logs of Oracle, IBM DB2, MSFT SQL Server, Sybase, Teradata, and Enscribe]
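The check-pointed, log-based capture idea above can be reduced to a small sketch: the capture process reads the database's change log forward from its last saved position (the checkpoint), so it never touches the OLTP tables and can resume cleanly after a failure. The log-entry format here is invented purely for illustration.

```python
# Sketch of checkpointed, non-invasive change capture: read the change
# log from the last checkpoint, keep only committed changes, and return
# the new checkpoint so capture can resume after a restart.
# The log-record shape ({"op", "row", "committed"}) is hypothetical.

def capture_changes(log, checkpoint):
    """Return (committed changes after `checkpoint`, new checkpoint)."""
    new = [entry for entry in log[checkpoint:] if entry["committed"]]
    return new, len(log)

log = [
    {"op": "INSERT", "row": 1, "committed": True},
    {"op": "UPDATE", "row": 1, "committed": True},
    {"op": "DELETE", "row": 2, "committed": False},  # uncommitted: skipped
]
batch1, ckpt = capture_changes(log, 0)   # first pass reads everything

log.append({"op": "INSERT", "row": 3, "committed": True})
batch2, ckpt = capture_changes(log, ckpt)  # resumes from the checkpoint
```

Trigger-based capture, by contrast, would have required adding triggers and log tables to the OLTP schema itself, which is the "invasive" alternative the slide contrasts against.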
  16. Good Old Days of Data Integration
      • Monolithic & Expensive Environments
      • Fragile, Hard to Manage
      • Difficult to Tune or Optimize
      [Diagram: ETL engine(s) with lookup, staging, and metadata servers plus CDC hub(s) with transaction management and apply server, across Development, QA, and System environments; ETL engines require big H/W and heavy parallel tuning]
  17. Modern Data Integration Architecture
      • Lightweight, Inexpensive Environments – Agents
      • Resilient, Easy to Manage – Non-Invasive
      • Easy to Optimize and Tune – uses DBMS power
      [Diagram: capture and replicat agents feed an E-LT agent; set-based SQL transforms inside the DB (typically faster) and SQL load (always faster); bulk data movement without dedicated ETL hardware]
  18. Data Integration Drives BUSINESS VALUE
  19. Business Drivers for Data Integration: Add Value to the Core Business Lines
      1. Do More with Less: design metadata-driven integration; leverage skills & dictate patterns
      2. Compete Globally 24X7: ensure continuous uptime; access data in real time
      3. Use Data for Competitive Advantage: ensure the quality of your data; actively govern your most valuable asset
      4. Automate and Adapt Business Processes: expose data services for reuse; orchestrate processes using SOA
  20. Project Drivers for Data Integration: Essential Ingredient for Information Agility
      • Strategic Value of Data Integration:
        • Consistency for major enterprise initiatives like BI, DW, & MDM
        • Common technical foundation platform across data silos
        • Central point for data governance, availability and controls
      • Key Data Integration Use Cases:
        • BI, DW, and OLTP Data Integration & Replication
        • SOA, Enterprise Integration & Modernization
        • Migrations and Master Data Management
  21. Modern Data Integration Alternatives: WHY REPLACE _______?
  22. Why Replace _______?
      • We often hear, “My company has already standardized on __________; why should I replace it?”
      • Answer:
        • Save Money on Data Center Costs
        • Accelerate Project Delivery / TTM
        • Supply Real Time Intelligence to the Business
        • Reduce Batch Windows on Data Warehouse
        • Unify Data Integration with SOA Plans
  23. Save Money on Hardware/Data Center: E-LT runs on Small Commodity Servers as an Agent Process
      • Typical: Separate ETL Server (Conventional ETL Architecture: Extract, Transform, Load)
        • Proprietary ETL Engine, Poor Performance
        • High Costs for Separate Standalone Server
      • Next Generation Architecture: E-LT, No New Servers
        • Lower Cost: Leverage Compute Resources & Partition Workload efficiently
        • Efficient: Exploits Database Optimizer
        • Fast: Exploits Native Bulk Load & Other Database Interfaces
        • Scalable: Scales as you add Processors to Source or Target
      • Benefits: Optimal Performance & Scalability, Better Hardware Leverage, Easier to Manage & Lower Cost
  24. Speed Project Delivery/Time to Market: E-LT uses Declarative SQL-style Design + Simple Runtime
      • Development Productivity: 40% Efficiency Gains
      • Environment Setup (ex: BI Apps): 33-50% Less Complex
        • E-LT: 7 Setup Steps, 1 Server, 3 Connections
        • Conventional ETL: 10 Setup Steps, 3 Servers, 7 Connections
  25. Supply Real Time Business Intelligence: Non-invasive Capture + E-LT Processing
      [Diagram: an application feeds Real Time BI (using a data copy) and Analytic BI (facts & dims) through E-LT mini-batch + transforms, within a consistency window]
  26. Reduce Consistency Windows w/E-LT: Fewer Steps, Faster Xform, and Faster Loads vs. typical ETL
      • The main driver for the batch window is data integrity & consistency; once lookup & calc functions begin, the DW typically goes offline
      • ETL engines require BIG H/W and heavy parallel tuning
      • E-LT: set-based SQL transforms inside the DB are typically faster; SQL load is always faster
      [Diagram: ETL batch window (Extract, Transform, Load) vs. a shorter E-LT batch window, with uptime gains while the DW stays online]
  27. *What About “Pushdown Processing”?
      • Pushdown Processing is what the ETL vendors do to compensate for bad performance – push the transformation processing to the Database
      • Both Pushdown & E-LT have in common:
        • Use the power of your Data Warehouse for maximum performance
        • Can combine engine-based operations with DB-based transformations to accomplish any level of data transformation complexity
        • Can scale to any multi-TB level using parallel processing
      • Only E-LT can claim:
        • Performance optimized for your Database – whichever DB you use
        • Operates without any new IT Hardware costs
        • 100% Java-based
        • Easily embedded within your existing or planned SOA infrastructure
        • Is not a glorified scheduler that relies on PL/SQL or other custom-coded DB scripts to achieve maximal performance
        • Can entirely eliminate needless network hops for remote data joins
        • Can operate with no additional energy drain in your Datacenter
  28. Unify E-LT Agent with SOA Runtime: Best of Breed Data Integration as a Shared SOA Service
      • Unified Management + Monitoring:
        • Common Runtime – 100% Java
        • Common Monitoring
      • Example Use Cases:
        • Bulk Data Transformation (any2any)
        • XML/EDI Large File Handling
        • SOA-driven Business Intelligence
        • Load DW from SOA
        • High Performance ETL & Replication
        • Unified Data Steward Workflow (ETL Error Hospital w/BPEL PM)
        • ERP Migration, Replication / Loading & OLAP
        • Query Offloading & Zero Downtime
      • E-LT Frameworks are optimal architectures for: Embedded Applications, Business Intelligence, Application Integration, Performance Management, Middleware Servers, Database & OLAP
      [Diagram: any data source feeding the data warehouse through the shared E-LT service]
  29. Data Integration: the ORACLE SOLUTION
  30. Oracle Data Integration Solution: Best-in-class Heterogeneous Platform for Data Integration
      • Consumers: Oracle Applications, Custom Applications, MDM Applications, Business Intelligence, Business Activity Monitoring, SOA Platforms
      • SOA Abstraction Layer: Process Manager, Service Bus, Data Services, Data Federation
      • Comprehensive Data Integration Solution:
        • Oracle Data Integrator: ELT/ETL, Data Transformation, Bulk Data Movement, Data Lineage
        • Oracle GoldenGate: Real-time Data, Log-based CDC, Bi-directional Replication, Data Verification
        • Oracle Data Quality: Data Profiling, Data Parsing, Data Cleansing, Match and Merge
      • Sources: Storage, Data Warehouse/Data Mart, OLTP System, OLAP Cube, Flat Files, Web 2.0, Web and Event Services, SOA
  31. Key Data Integration Products
      • Heterogeneous E-LT & ETL • OLAP Data Loading • High-speed Transformations • Data Warehouse Loading
      • Real Time Data Replication • DBMS High Availability • Changed Data Capture • Disaster Tolerance
      • Comprehensive Integration • Process Orchestration • ELT/ETL for Bulk Data • Human Workflow • Service Bus • Data Grid
      • Data Service Modeling • Data Redaction • Query Federation • Service Data Objects
      • Business Data / Metadata • Time Series Reporting • Statistical Analysis
      • Integrated Data Quality • Cleansing & Parsing • High Performance • De-duplication • Integrated w/ODI
  32. Oracle Data Integrator Enterprise Edition: Optimized E-LT for improved Performance, Productivity and Lower TCO
      • E-LT Transformation vs. E-T-L
      • Declarative Set-based design
      • Change Data Capture
      • Hot-pluggable Architecture; Pluggable Knowledge Modules
      [Diagram: legacy, application, planning-system, and OLTP DB sources loading any data warehouse]
  33. Oracle GoldenGate Overview: Enterprise-wide Solution for Real Time Data Needs
      • Standardize on a Single Technology for Multiple Needs
      • Deploy for Continuous Availability and Real-time Data Access for Reporting / BI
      • Log-Based, Real-Time Change Data Capture
      • Highly Flexible: Heterogeneous Source Systems, Fast Deployments, Lower TCO & Improved ROI
      [Diagram: OGG serving Disaster Recovery / Data Protection / Standby (Open & Active), Zero Downtime Migration and Upgrades, Operational Reporting, Reporting Database, ODS, EDW (via ETL), Real-time BI, Query Offloading, and Data Distribution]
  34. How Oracle GoldenGate Works: Modular De-Coupled Architecture
      • Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs
      • Trail: stages and queues data for routing
      • Pump: distributes data for routing to target(s)
      • Route: data is compressed, encrypted for routing to target(s)
      • Delivery: applies data with transaction integrity, transforming the data as required
      [Diagram: bi-directional capture, trail, and pump on the source side, routed over LAN/WAN/Internet via TCP/IP to a trail and delivery process on the target database(s)]
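The capture, trail, pump, delivery flow described above can be sketched as a plain queue pipeline. This is only a structural illustration: real GoldenGate persists trails to disk, preserves transaction boundaries, and handles routing over the network, none of which is modeled here, and the record shapes are invented.

```python
# Sketch of the capture -> trail -> pump -> delivery hand-off as plain
# Python queues. Record shapes ({"key", "value", "committed"}) are
# hypothetical; this shows only the staging/routing structure.
from collections import deque

def capture(txn_log):
    """Capture only committed transactions from the source log."""
    return [t for t in txn_log if t["committed"]]

def pump(trail, batch_size=2):
    """Drain up to batch_size records from the trail for routing."""
    batch = []
    while trail and len(batch) < batch_size:
        batch.append(trail.popleft())
    return batch

def deliver(target, batch):
    """Apply each change to the target (simulated apply)."""
    for t in batch:
        target[t["key"]] = t["value"]

# The trail stages and queues what capture produced.
trail = deque(capture([
    {"key": "a", "value": 1, "committed": True},
    {"key": "b", "value": 2, "committed": False},  # never committed: dropped
    {"key": "c", "value": 3, "committed": True},
]))
target = {}
while trail:
    deliver(target, pump(trail))
```

Decoupling the stages through the trail is what lets capture keep running even when the network or the target is temporarily down, which is the "modular de-coupled" point the slide makes.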
  35. Govern Data Better with Data Quality
      • Data Movement: E-LT & ETL, Data Transformation, Change Data Capture, Data Access, Data Services
      • Data Profiling: Statistical Analysis, Rule-based Validation, Monitoring & Timeslice, Fine-grained Auditing
      • Data Cleansing: Data Validation during ETL, Data Standardization, Address Matching & Dedup, Error Hospital / Workflow
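A toy version of the standardization and dedup steps listed above: real data-quality tools use fuzzy matching, reference data, and survivorship rules, while this sketch only normalizes case and whitespace and collapses exact post-standardization duplicates. The sample records are invented.

```python
# Sketch of two cleansing steps: standardize each record, then drop
# records that become exact duplicates after standardization.
# Sample data is hypothetical.

def standardize(record):
    """Normalize case and collapse whitespace in every field."""
    return {k: " ".join(str(v).strip().upper().split())
            for k, v in record.items()}

def dedupe(records):
    """Keep the first occurrence of each standardized record."""
    seen, out = set(), []
    for rec in map(standardize, records):
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

clean = dedupe([
    {"name": "Acme  Corp", "city": "oakland"},
    {"name": "acme corp",  "city": "Oakland "},  # duplicate after cleansing
    {"name": "Widgets Inc", "city": "Dallas"},
])
```

In the slide's terms, records that fail validation rather than merge cleanly would be routed to the "error hospital" workflow for a data steward to resolve.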
  36. CONCLUSION
  37. Modern Data Integration Approach: Heterogeneous, Real-time, Non-Invasive, High Performance E-LT
      • Traditional ETL + CDC:
        • Invasive capture on OLTP systems using complex adapters
        • Transformations in ETL engine on expensive middle tier servers
        • Bulk load to the data warehouse with large nightly/daily batch
      • Modern E-LT + Real-time:
        • Continuous feeds from operational systems
        • Non-invasive data capture
        • Thin middle tier with transformations on the database platform (target)
        • Mini-batches throughout the day or bulk processing nightly
      [Diagram: trickle extract via capture agents, set-based transforms and lookups, bulk load into heterogeneous staging/data targets]
  38. Questions
  39. The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
