Oracle9i for data warehousing
Upcoming SlideShare
Loading in...5
×
 

Oracle9i for data warehousing

on

  • 780 views

 

Statistics

Views

Total Views
780
Views on SlideShare
780
Embed Views
0

Actions

Likes
0
Downloads
16
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • 8
  • 3

Oracle9i for data warehousing Oracle9i for data warehousing Presentation Transcript

  •  
  • Oracle9 i for Data Warehousing John Abrahams Technology Sales Consultant Oracle Nederland
  • The Old Way: Fragmented Information Supply Chain Data Integration Engine OLAP Engine Mining Engine
    • Protracted implementation and maintenance cycle
      • Synchronization and currency issues
      • Information Management chaos
    Data Warehouse Engine
  • The New Way: Oracle9i
    • Single business intelligence platform
      • Reduce administration and implementation costs
      • Faster deployment
      • Improved scalability and reliability
    Data Warehousing ETL OLAP Data Mining Oracle9i
  • Oracle9i Complete, Therefore Simple
  • Oracle9i Database Single business-intelligence data server Relational OLAP Data Mining ETL M e t a d a t a
  • Oracle9i Application Server Runs All Your Business Intelligence Applications Portal Query & Reporting BI Components Web Site Analysis M e t a d a t a
  • Oracle Database for Data Warehousing Continuous Innovation Oracle 7.3
    • Hash Join
    • Bitmap Indexes
    • Parallel-Aware Optimizer
    • Partition Views
    • Instance Affinity: Function Shipping
    • Parallel Union All
    • Asynchronous Read-Ahead
    • Histograms
    • Anti-Join
    • Partitioned Tables and Indexes
    • Partition Pruning
    • Parallel Index Scans
    • Parallel Insert, Update, Delete
    • Parallel Bitmap Star Query
    • Parallel ANALYZE
    • Parallel Constraint Enabling
    • Server Managed Backup/Recovery
    • Point-in-Time Recovery
    Oracle 8.0 Oracle8 i
    • Summary Management
    • New Partitioning Schemes
    • Resource Manager
    • Progress Monitor
    • Adaptive Parallel Query
    • Server-based Analytic Functions
    • Transportable Tablespaces
    • Direct Loader API
    • Functional Indexes
    • Partition-wise Joins
    • Security Enhancements
    • and more ...
  • New Oracle9i RDBMS features Extending Oracle’s leadership
    • Automatic Memory Tuning
    • ETL Infrastructure
      • Change data capture
      • External tables
      • Table functions
      • Upserts
      • Multi-table INSERTs
      • Resumable statements
      • Transportable tablespace enhancements
    • List Partitioning
    • Internal enhancements for:
      • parallel query
      • aggregation
      • cost-based optimization
    • Bitmap Join Indexes
    • Analytic SQL fns
      • Grouping sets
      • FIRST/LAST aggregates
      • Inverse distribution
      • Hypothetical rank
    • Proactive query governing
    • Enhancements to MVs
      • Broader refresh and rewrite capabilities
      • More sophisticated summary advisor
    • Full Outer Joins
    • WITH-clause
  • Real Application Clusters (RAC)
    • Exploiting clustered systems
    • Shared Cache Architecture One database
    • Avoiding application downtime by single node failure
    • Allows applications to become
      • Highly scaleable
      • Highly available
  • Cache Fusion means Scalability
    • Protocol that allows instances to combine their data caches into a shared global cache
      • Global Cache Service (GCS) coordinates sharing
    • Key features are
      • Direct sharing of volatile buffer caches
      • Efficient inter-node messaging framework
      • Fast recovery from node failures using cache and CPU resources from all surviving nodes
    Data A-Z Shared Cache
  • Manage Large Volumes of Data
  • Managing Large Volumes of Data
    • Partitioning and parallelism are crucial for VLDB
    • Parallelism for all operations
      • DBA operations: loading, index-creation, table-creation, data-modification, backup and recovery
      • End-user operations: Queries
      • Unbounded scalability: Real Application Clusters
    • Partitioning provides ‘incremental’ operations for:
      • Data loading
      • Indexing
      • Referential Integrity
      • Backup and recovery
  • How Parallel Execution Works
    • With serial execution only one process is used
    • With parallel execution
      • One parllel execution coordinator process
      • Many parallel execution servers
      • Table is dynamically partitioned into granules
  • VLDB Manageability and Performance Constraints
    • Table availability:
      • Large tables are more vulnerable to disk failure
      • It is too costly to have a large table inaccessible for hours due to recovery
    • Large table manageability
      • They take to long to be loaded
      • Indexes take too long to be built
      • Partial deletes take hours
    • Performance considerations
      • Large table and index scans are costly
      • Scanning a subset improves performance
  • Benefits of Partioning
    • Availability
      • Partions can be independently managed
      • Backup and restore operations can be done on individual partitions
      • Partitions that are unavailable do not affect queries on DML operations on other paritions that use the same table or index
    • Manageability
      • A partition can be moved from one tablespace to another
      • A partition can be dropped, truncated, added
      • A partition can be divided at user-defined value
  • Benefits of Partioning (2)
    • Performance
      • The optimizer eliminates partitions that not have to be scanned
      • Partitions can be scanned in parallel
      • Partitions can be load-balanced across physical devices
      • Join operations can be optimized to “join by the partition”
  • Partitioning Methods
    • Range Partitioning
    • Hash partitioning
    • List Partioning
    • Composite Partioning
    Range partitioning Hash partitioning Composite partitioning List Partioning
  • Partitioned Indexes
    • Indexes can be partitioned like tables
    • Partitioned or nonpartioned indexes can be used with partitioned or nonpartitioned tables
    • Partioned indexes can be
      • Global or local
      • Prefixed or nonprefixed
  • Guidelines for Partioning Indexes
  • Rolling Window Operations ... 01-Mar 01-Feb 96-May 96-Apr Partitioned Tables with Local Indexes: 96-Jun
  • Rolling Window Operations ... 01-Mar 01-Feb 96-May 96-Apr Partitioned Tables with Local Indexes: 1. Load and index new month 96-Jun 01-Apr
  • Rolling Window Operations ... 01-Mar 01-Feb 96-May 96-Apr Partitioned Tables with Local Indexes: 1. Load and index new month 2. Add new month to table 96-Jun 01-Apr
  • Rolling Window Operations ... 01-Mar 01-Feb 96-May 96-Apr Partitioned Tables with Local Indexes: 1. Load and index new month 2. Add new month to table 3. Remove old month from table 96-Jun 01-Apr
  • Rolling Window Operations ... 01-Mar 01-Feb 96-May
    • Partitioned Tables with Local Indexes:
    • New data has been loaded with virtually no disruption
    • Powerful methodology for managing time-based updates to the Warehouse
    96-Jun 01-Apr
  • List Partitioning 1200 GMT Same benefits as rolling window: data is partitioned according to business requirements Maintenance Online Queries Europe Region Americas Region Asia Region
  • List Partitioning Same benefits as rolling window: data is partitioned according to business requirements Maintenance Online Queries Europe Region Americas Region Asia Region 1200 GMT 2000 GMT
  • List Partitioning Same benefits as rolling window: data is partitioned according to business requirements Europe Region Americas Region Asia Region Maintenance Online Queries 1200 GMT 2000 GMT 0400 GMT
  • Table Compression: What is it?
    • Tables can be compressed
      • Compression can also be specified at the partition level
      • Indexes and index-organized tables are not compressed
    • Typical compression ratios are 3:1 - 5:1
      • Compression is dependent upon the actual data
      • Compression algorithm based on removing data redundancy
    • All DDL/DML commands are supported on compressed tables
  • Table Compression: What isn’t it?
    • This is not a generic ‘zip’-style compression
      • Not all tables will have good compression
        • Compression algorithm guarantees that compression will never increase size of table
      • Most large DW tables seem to compress well
    • Compression happens between column/row values, not within column/row values
      • Long character strings are not compressed unless the exact same string appear multiple times
    • LOB/BLOB columns are not compressed
  • Table Compression: How it works
    • <rowid> ‘650-506-7000’ ‘650-123-4567’
    • <rowid> ‘650-506-7000’ ‘650-506-7001’
    • <rowid> ‘650-506-7000’ ‘650-456-7890’
    • <rowid> ‘650-506-7000’ ‘650-098-7654’
    • <rowid> ‘650-506-7000’ ‘650-123-4567’
    • <rowid> ‘650-506-7001’ ‘650-123-4567’
    • <rowid> ‘650-506-7001’ ‘650-123-4567’
    <symbol table: <A>= ‘650-506-7000’, <B>=‘650-506-7001’, <C>=‘650-123-4567’> <rowid> <A> ‘650-123-4567’ <rowid> <A> <B> <rowid> <A> ‘650-456-7890’ <rowid> <A> ‘650-098-7654’ <rowid> <A> <C> <rowid> <B> <C> <rowid> <B> <C> …
    • Duplicate values are stored in symbol table for each block
    Uncompressed Compressed
  • Table Compression: Usage
    • Creating compressed tables:
      • CREATE TABLE T1(id integer) COMPRESS;
    • Converting tables to compressed tables:
      • ALTER TABLE T3 MOVE COMPRESS;
    • Creating compressed tablespaces:
      • CREATE TABLESPACE tabspace_2
      • DATAFILE 'diska:tabspace_file2.dat' SIZE 20M
      • DEFAULT COMPRESS STORAGE ( … );
  • Table Compression: Performance Impact
    • Queries on compressed tables may observe minor performance degradation
      • Performance impact depends upon the query
        • Many queries will be faster
      • Compression reduces IO but increases CPU utilization
      • For a set of heterogeneous queries, performance should degrade by no more than 5%
    • Load and direct-path INSERT performance will be slower
      • Data must be compressed as it is added to the table
  • Table Compression: When to Use it
    • Data warehouses containing large volumes of historical data
      • Compress all of the older data in a data warehouse
      • Integrate compression into the ‘rolling window’ paradigm
        • For example, most recent 3 months of data could be stored uncompressed and the previous 21 months could be stored compressed
    • Materialized views and other derived data sets
    • Generally, compression should be applied to data that is infrequently updated
  • Manage large numbers of concurrent users
  • Manage large numbers of users
    • Key requirements:
      • Guarantee optimal resource utilization all the time
        • Provide the appropriate amount of resources to every job or query based on priority and system load
      • Pro-actively prevent ‘runaway’ queries
      • Pro-actively prevent system overloading
    • Managing large numbers of users should be simple and automated
  • Appropriate Resources to Each Query
    • CPU
      • Business-critical processes receive more CPU
        • Database Resource Manager allows DBA to assign CPU resources to groups of users
    • Memory
      • Oracle9i dynamically allocates runtime memory based on current available memory and each query’s requirements
    • Parallelism
      • Degree of parallelism is dynamically chosen based on available resources and each query’s requirements
  • Automatic Runtime Memory Tuning
    • One parameter:
      • PGA_AGGREGATE_SIZE = <size>
    • Dynamic allocation of ‘runtime’ memory based upon each query’s requirements
      • In data-warehouse environments, >50% of a server’s physical memory is typically used for query ‘runtime’ memory
    • Benefits:
      • Reduced overall memory usage
      • Improved throughput
      • Simplified tuning
  • Pro-active management of DW Workloads
    • Predictive Query Governing and Dynamic Re-prioritization:
      • Queries which are estimated to take longer than an DBA-specified limit will abort or be ‘de-prioritized’
    • Automatic Queuing:
      • A limit can be set on the number of active session for each group of users; queries submitted which exceed this limit will be queued
    • Via Database Resource Manager
  • Example Scenario
    • Power Users
      • Up to 70% of the CPU resources
      • Any degree of parallelism
      • Any query which is expected to take over one hour will be migrated to background
    • Report Users
      • Up to 20% of the CPU resources
      • No parallelism
      • Limit of 40 concurrent queries
      • Any query which is expected to take over 20 minutes will be aborted
    • Background Jobs
      • Up to 10% of the CPU resources
      • Any degree of parallelism
      • Limit of 5 concurrent queries
  • Fast Query Performance
    • The best approach for every query
      • Integrated
      • Comprehensive
    Materialized Views Access & Join Methods Parallel Operations Partitioning Query Optimizer
  • Bitmap Indexes
    • The most common index type in Oracle DW environments
      • Bitmap indexes introduced in Oracle 7.3
      • Bitmap join indexes introduced in Oracle9i
      • Oracle has over a dozen patents for bitmap index technology
    • Oracle provides patented compression technique for bitmap indexes
      • Bitmap indexes are 3-20x smaller than b-tree indexes
      • Less storage yields better query performance and more indexed columns
  • Bitmap indexes introduction Structure of a bitmap index Separate ‘bitmap’ created for each value of the color column A high-level b-tree structure is created so that each bitmap can be located CREATE BITMAP INDEX PROD_COLOR ON PROD(COLOR) <Blue, <rowid>, 1000100100010010100> <Green, <rowid>, 0001010000100100000> <Red, <rowid>, 0100000011000001001> <Yellow, <rowid>, 0010001000001000010>
    • Columns with Low-to-Medium Cardinality
    • ‘ Set-based’ manipulation of data
    • Especially good for large, complex queries
      • Orders of magnitude performance improvement
    • Fully integrated within Oracle9 i
      • Created and managed similar to other Oracle9 i indexes
      • Used to accelerate single-table access, joins, and aggregation
      • Transparently selected by the query optimizer
    Bitmap indexes characteristics
  • Bitmap join indexes CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id; Sales Customer
  • Bitmap join indexes CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id; Sales Customer Index key is Customer.State Sales( Customer.state )
  • Bitmap join indexes CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id; Sales Customer Index key is Customer.State Sales( Customer.state ) Sales ( Customer.state ) Indexed table is Sales SELECT SUM(SALES.DOLLAR_AMOUNT FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id AND CUSTOMER.STATE = ‘California’;
  • Materialized Views
    • Currently, indexes provide fast path access to specific data
    • Materialized views work on the same principle
    • A Materialized view is an instantiation of a SQL statement - a view with data storage
    • Materialized views can be partitioned, indexed separately
    • Used for query rewrite to increase performance
    • Rewrites are transparent to applications
    • Rewrites do not require any special privileges
    CREATE MATERIALIZED VIEW sf_sales AS SELECT * FROM sales WHERE city_name = ’SAN FRANCISCO’ SF_Sales SELECT prod_code FROM sales WHERE city_name = ’SAN FRANCISCO’ SELECT prod_code FROM sf_sales Sales CREATE
  • SQL support for analytic calculations
    • Why enhance the RDBMS for analytic calculations?
    • Benefits
      • Performance
      • Scalability
      • Simpler SQL development
  • Analytic Functions: Examples
    • Rank
      • Top 10 sales-reps in each region
    • Moving Window
      • Today’s stock price minus 200-day moving average
    • Period-over-period comparisons
      • Percentage growth of Jan-99 sales over Jan-98
    • Ratio-to-report
      • January’s sales as a percentage of the entire year’s
  • Platform for Business Intelligence: Data Warehousing Data Warehousing ETL OLAP Data Mining Oracle9 i
    • Foundation of the Business Intelligence Platform
    • More data
    • More users
    • Faster
    • Simple management
    • Oracle9i introduces dozens of new DW features
  • Platform for Business Intelligence: ETL Data Warehousing ETL OLAP Data Mining Oracle9i
    • Transformation Engine
      • Integrated in Oracle9i
      • Scalable (parallel)
      • Extensible (Java, PL/SQL)
      • Efficient (no data staging)
    • Warehouse Builder
      • Extensible framework for designing and deploying DW’s
  • Platform for Business Intelligence: OLAP Data Warehousing ETL OLAP Data Mining Oracle9 i
    • OLAP Services
    • Analysis-ready Oracle database
      • Support for complex, multidimensional queries
      • Highly scalable
    • Development platform for Internet-ready analytical applications
      • Java OLAP API
      • Business Intelligence Beans and JDeveloper
  • Oracle9i OLAP Services Forecasts  Models  Allocations Consolidations  Scenarios  Custom Functions Oracle Relational Database OLAP Services Metadata Java OLAP API Business Intelligence Beans Query Processor Data Metadata Provider Analytic Workspace SQL Generator Data Multidimensional Engine Metadata Metadata Provider Data Warehouse - Query and Reporting
  • Platform for Business Intelligence: Data Mining Data Warehousing ETL OLAP Data Mining Oracle9i
    • Data Mining Suite
      • Scalable, integrated data-mining capabilities
    • Oracle Personalization
      • Real-time personalization engine for building 1:1 relationships over the web
  •