"Oracle10g for Data Warehousing"
Speaker Notes

  • Since the release of Oracle 7.1, in which Parallel Query was introduced, Oracle has continued to increase its investment in the hub of the Oracle warehouse: the relational database engine. In the last three years the rate of innovation has accelerated, with Oracle releasing major data warehousing functionality every year, culminating in our most recent release, Oracle8i. Oracle8i extends Oracle's technological leadership in data warehousing. By providing the broadest range of data-warehouse capabilities, Oracle8i is unique in its ability to support any decision-support system, from easy-to-build data marts to the largest terabyte data warehouses. This slide shows only a partial list of the technology improvements since Oracle 7.3.
  • Amazon.com: Today, the Amazon data warehouse contains 7 TB of data and continues to grow at a 100% annual rate. It contains data on customers, inventory, orders, products, supply chain, web site activity (clickstream), pricing, financial activity and other subjects. The database is updated 6 times a day. The data warehouse is used by about 500 users covering every department within the Amazon organization. On average, users submit about 2,000 queries a day. Growth rates are formidable: while data volume is doubling annually, the number of queries is growing at 2.5 times per year. Quote from Amazon: "the business value realized in one year was more than 10 times the cost of the data warehouse." See the Winter Report: http://www.oracle.com/features/facts/0612_db_winterdw.html
    France Telecom: Needed a data warehouse to support fraud detection, customer service, and network traffic analysis for fixed and wireless. 32 TB Oracle database, 180 billion CDRs, 8,000 end users (600 concurrent). CDRs are added hourly (peak 65 million an hour), and CDR queries complete in under 4 seconds. How they did it: composite partitioning, transportable tablespaces. What they are doing next: Oracle9i Release 2, continuous update, doubling the volume of online data with data compression. See the Winter Report: http://www.oracle.com/features/facts/0612_db_winterdw.html
  • When configuring hardware for a data warehouse, defining the IO requirements is very challenging because predicting the IO demand is difficult. In general, the IO requirements depend on the query mix. As a rule of thumb, assuming today's CPUs (3 GHz Xeon or 2.2 GHz Opteron), an IO-bound query can drive about 200 MB/s per CPU. Having said this, how high the IO requirements are depends on the ratio of IO-bound to CPU-bound queries. It is also very important to investigate whether the system will run more queries issuing random or sequential IOs. Random IOs dominate in index-driven queries such as index lookups, index-driven joins (e.g. nested loops or bitmap indexes) and index scans. Sequential IOs dominate in the table scans used in hash joins. Depending on the workload mix, the entire system has to be set up to guarantee the maximum throughput for random and sequential IOs.
  • To make sure that a system delivers the IO demand that is required, all system components on the IO path need to be orchestrated to work together; the weakest link determines the IO throughput. On the left side you see a high-level picture of a system with 4 nodes, 2 HBAs per node, and two fibre channel switches, each attached to 4 disk arrays. The components on the IO path are the HBAs, cables, switches and disk arrays. Performance depends on the number and speed of the HBAs, the switch speed, the controller quantity and speed, plus the number and speed of the disks. If any of these components is underconfigured, the system throughput is determined by that component. Assuming we have 2 Gbit HBAs, the nodes can read about 8 x 200 MB/s = 1.6 GB/s. On the other hand, assuming each disk array has one controller, all 8 arrays can also do 8 x 200 MB/s = 1.6 GB/s. Hence each of the fibre channel switches also needs to deliver at least 2 Gbit/s per port, for a total of 800 MB/s per switch; the two switches will then deliver the needed 1.6 GB/s. When sizing a system, also take the system limits into consideration. For instance, the number of bus slots per node is limited and may need to be shared between HBAs and network cards; in some cases dual-port cards exist if the number of slots is exhausted. The number of HBAs per node determines the maximum number of fibre channel switches, and the total number of ports on a switch limits the number of HBAs and disk controllers.
  • In discussions we often see that people confuse bits with bytes. This confusion mostly originates from the fact that hardware vendors tend to describe component performance in bits/s, while database vendors and customers describe their performance requirements in bytes/s. Here is a list of common hardware components with their theoretical performance in bits/second and typical performance in bytes/second. HBAs come in 1 or 2 Gbit/s with a typical throughput of 100 or 200 MB/s. A 16-port switch comes with 16 2-Gbit ports; however, the total throughput is 8 x 2 Gbit, which results in about 1600 MB/s. Fibre channel cables have a 2 Gbit/s throughput, which translates into 200 MB/s. Disk controllers come with 2 Gbit/s throughput, which translates into about 200 MB/s. GigE has a typical performance of about 80 MB/s, while Infiniband delivers about 160 MB/s.
  • What conclusions can one draw from a dd test? This chart shows an example of how dd and Oracle behave. On this particular system, each dd reads about 80 MB/s. The Oracle throughput is slightly lower. With dd, the total throughput flattens out slightly below 500 MB/s. Total throughput with Oracle is slightly lower, at about 450 MB/s, or about 90% of what can be achieved with dd.
  • One typical type of query often found in data warehouses is a star query, and Oracle has developed specific technology to address this common type of business query. Oracle supports star queries with its 'star transformation' technology, an innovative application of bitmap indexes and advanced query optimization. This proven technology has been broadly implemented by customers using Oracle8 and Oracle8i. This technology gets even better in Oracle9i, since Oracle's star query algorithms can additionally take advantage of bitmap join indexes.
  • Partitioning not only provides benefits for managing large volumes of data; it also provides tremendous benefits for query performance. The most basic way in which partitioning improves query performance is through partition pruning. In this example query, we only need to retrieve data for March, April and May. Oracle will therefore automatically 'prune' the unnecessary partitions, so only the partitions corresponding to March, April, and May are accessed. In this example, partition pruning results in a 2x gain in performance, since Oracle is scanning 3 partitions instead of 6. In many cases, the actual gains from partition pruning can be much more dramatic (consider a business query that examines data from one month in a partitioned table containing 36 months of historical data). Partition pruning works in conjunction with all other performance features: a query can take advantage of partition pruning while also taking advantage of other features like parallelism and indexing.
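    To make the pruning behavior concrete, here is a minimal SQL sketch; the table, column and partition names are illustrative and not taken from the slides.

    -- Fact table range-partitioned by month, six monthly partitions.
    CREATE TABLE sales_1999 (
      sale_date DATE,
      region    VARCHAR2(20),
      amount    NUMBER
    )
    PARTITION BY RANGE (sale_date) (
      PARTITION p_jan VALUES LESS THAN (DATE '1999-02-01'),
      PARTITION p_feb VALUES LESS THAN (DATE '1999-03-01'),
      PARTITION p_mar VALUES LESS THAN (DATE '1999-04-01'),
      PARTITION p_apr VALUES LESS THAN (DATE '1999-05-01'),
      PARTITION p_may VALUES LESS THAN (DATE '1999-06-01'),
      PARTITION p_jun VALUES LESS THAN (DATE '1999-07-01')
    );

    -- Only p_mar, p_apr and p_may are scanned; the other partitions are pruned.
    SELECT region, SUM(amount)
    FROM   sales_1999
    WHERE  sale_date >= DATE '1999-03-01'
    AND    sale_date <  DATE '1999-06-01'
    GROUP BY region;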
  • Here is a typical data-warehouse query: an analyst wants to find the total quarterly sales for each of several regions. This query will benefit considerably from many of Oracle8's current performance features, but performance could be enhanced even further. For example, a database administrator could create 'summary tables' that store precomputed results; here, a summary table holds the precomputed total monthly sales for each region. The advantage of using precomputed results is that queries against them are orders of magnitude faster than the same queries against the detail data. The disadvantage is that the user or the query tool may need to know that these summary tables exist in order to realize the performance benefits. The solution is that Oracle8i knows about these summary tables: although a query may nominally access the detail data, Oracle8i will rewrite the query to use an appropriate summary table. In this way, Oracle8i can vastly improve the performance of many data warehouse queries. In fact, this was one of the features used to achieve the excellent performance of the new TPC-D benchmark.
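    As a sketch of how such a summary table looks when implemented as a materialized view with query rewrite enabled (names are illustrative and reuse the sales_1999 table from the previous sketch):

    -- Monthly summary; with QUERY_REWRITE_ENABLED the optimizer can answer
    -- matching aggregate queries from this view instead of the detail data.
    CREATE MATERIALIZED VIEW sales_month_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
      ENABLE QUERY REWRITE
    AS
    SELECT TRUNC(sale_date, 'MM') AS sales_month,
           region,
           SUM(amount)            AS amount
    FROM   sales_1999
    GROUP BY TRUNC(sale_date, 'MM'), region;

    -- This query against the detail table is transparently rewritten to use sales_month_mv.
    SELECT TRUNC(sale_date, 'MM') AS sales_month, region, SUM(amount)
    FROM   sales_1999
    WHERE  region IN ('West', 'South')
    GROUP BY TRUNC(sale_date, 'MM'), region;

    Rolling the monthly results up to quarters, as in the slide, additionally requires the optimizer to know the month-to-quarter hierarchy, e.g. via a CREATE DIMENSION declaration or a join to a time dimension table.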

"Oracle10g for Data Warehousing" Presentation Transcript

  • 1. Oracle 10g for Data Warehousing Hermann Baer, Oracle Product Management Data Warehousing Server Technologies NoCOUG Winter Conference, Feb 8th, 2005
  • 2. Agenda
    • Oracle10g for data warehousing - a short trip back in history
      • Continuous innovation over decades
    • Adoption trends and drivers
      • What do we see in the market
    • Design and build a Data Warehouse
      • Ensure a well-balanced system
      • Optimize Oracle
    • Oracle Database 10gR2 – sneak preview
  • 3. The way to Oracle10g …
    • Data Warehousing development started decades ago with Oracle 7.0
      • Primary focus on performance and scalability
      • Extended with Manageability and the BI platform vision in the Oracle8i time frame
    • Data Warehousing Imperatives
      • Efficient Extract, Transform, Load (ETL)
      • Managing Large Data Volumes
      • Fast Query Response
      • Supporting Large User Population
      • Managing Simply
  • 4. Oracle10g for Data Warehousing: Continuous Innovation since Oracle 7.3
    Oracle 8.0
    • Partitioned Tables and Indexes
    • Partition Pruning
    • Parallel Index Scans
    • Parallel Insert, Update, Delete
    • Parallel Bitmap Star Query
    • Parallel ANALYZE
    • Parallel Constraint Enabling
    • Server Managed Backup/Recovery
    • Point-in-Time Recovery
    Oracle8i
    • Hash and Composite Partitioning
    • Resource Manager
    • Progress Monitor
    • Adaptive Parallel Query
    • Server-based Analytic Functions
    • Materialized Views
    • Transportable Tablespaces
    • Direct Loader API
    • Functional Indexes
    • Partition-wise Joins
    • Security Enhancements
    Oracle9i
    • List and Range-List Partitioning
    • Table Compression
    • Bitmap Join Index
    • Self-Tuning Runtime Memory
    • New Analytic Functions
    • Grouping Sets
    • External Tables
    • MERGE
    • Multi-Table Insert
    • Proactive Query Governing
    • System Managed Undo
    Oracle10g
    • Self-tuning SQL Optimization
    • SQL Access Advisor
    • Automatic Storage Manager
    • Self-tuning Memory
    • Change Data Capture
    • SQL Models
    • SQL Frequent Itemsets
    • SQL Partition Outer Joins
    • Statistical functions
    • and much more ...
  • 5. Agenda
    • Oracle10g for data warehousing - a short trip back in history
      • Continuous innovation over decades
    • Adoption trends and drivers
      • What do we see in the market
    • Design and build a Data Warehouse
      • Ensure a well-balanced system
      • Optimize Oracle
    • Oracle Database 10gR2 – sneak preview
  • 6.
    • Oracle VLDWs are growing
      • Fewer systems, more data
    • DW systems are consolidated
      • Global view of the business
    • Importance of Data Warehousing increases dramatically
      • Growing operational/tactical importance
    • Cost Effectiveness becomes more important
      • Better decisions, lower cost
    Main Trends and Drivers
  • 7.
    Oracle VLDWs are growing (Winter VLDB Surveys: Largest Database Size, Decision Support, in TB)
    1998 Survey
    • Sears Teradata 4.63
    • HCIA Informix 4.50
    • Wal-Mart Teradata 4.42
    • Tele Danmark DB2 2.84
    • CitiCorp DB2 2.47
    • MCI Informix 1.88
    • NDC Health Oracle 1.85
    • Sprint Teradata 1.30
    • Ford Oracle 1.20
    • Acxiom Oracle 1.13
    2001 Survey
    • SBC Teradata 10.50
    • First Union Informix 4.50
    • Dialog Proprietary 4.25
    • Telecom Italia DB2 3.71
    • FedEx Teradata 3.70
    • Office Depot Teradata 3.08
    • AT&T Teradata 2.83
    • SK C&C Oracle 2.54
    • NetZero Oracle 2.47
    • Telecom Italia Informix 2.32
    2003 Survey
    • France Telecom Oracle 29.23
    • AT&T Daytona 26.27
    • SBC Teradata 24.81
    • Anonymous DB2 16.19
    • Amazon.com Oracle 13.00
    • Kmart Teradata 12.59
    • Claria Oracle 12.10
    • HIRA Sybase IQ 11.94
    • FedEx Teradata 9.98
    • Vodafone Teradata 9.91
  • 8.
    • Powerful RDBMS functionality becomes more important and visible, e.g.
      • Partitioning
      • Table compression
      • Automatic Storage Management (ASM)
      • Parallel processing
    Oracle VLDWs are growing
  • 9. Increasing Importance of DW
    • Latency between operational and analytical data must be minimized
      • Intelligence when you need it
    • Need for new and enhanced analytical capabilities
      • More value from your data
    • “Classical” strengths of an RDBMS become more important
      • E.g. Security, B/R, Availability, Concurrency
  • 10.
    • Save money whenever possible
      • Commodity servers
      • Commodity disks
      • Software manageability
    • Example Amazon
      • 16 low cost Intel boxes replaced one SuperDome
      • Low cost storage arrays replaced high end storage arrays
      • 2 DBAs
    Cost Effectiveness
  • 11. Cost Effectiveness: Pay and Scale Incrementally [chart: workload (100% to 300%) over months 3 to 24]
  • 12. Cost Effectiveness: Pay and Scale Incrementally ... with RAC [chart: workload (100% to 300%) over months 3 to 24, scaling incrementally with RAC]
  • 13.
    • Commodity components make specific database functionality more important
      • RAC for Scalability and Availability
      • Resource Manager
      • Automatic Storage Management (ASM)
      • RMAN / Oracle Backup (Oracle10gR2)
    Cost Effectiveness
  • 14. Oracle Database 10g DW Major Feature Summary
    • ULDB support
      • Database size extended to Exabytes (BIGFILES)
      • Unlimited size LOBs
      • Hash Partitioned Global Indexes
      • ASM removes file system limits
    • More Value From Your Data
      • Many New OLAP Features
      • New Data Mining algorithms
      • Stand-alone Data Mining Tool
      • Advanced Statistics
      • SQL Model Clause
      • Frequent Item Sets
      • Partition Outer Join
    • Intelligence When You Need It
      • Cross Platform Transportable Tablespaces
      • Data Pump
      • Async Change Data Capture
      • Enhancements to MERGE
    • Reduced Total Cost of Ownership
    • Manageability
      • Workload Repository
      • Automatic SQL Tuning
      • Self-Tuning Global Memory
      • ASM
  • 15. Agenda
    • Oracle10g for data warehousing - a short trip back in history
      • Continuous innovation over decades
    • Adoption trends and drivers
      • What do we see in the market
    • Design and build a Data Warehouse
      • Ensure a well-balanced system
      • Optimize Oracle
    • Oracle Database 10gR2 – sneak preview
  • 16. Build the foundation for Success
    • Even after decades of innovation, a computer ‘still’ consists of three main components
      • CPU provides the computing power
      • Memory stores the transient data for computational operations
      • Disks (I/O) store the persistent information
    • Getting the best performance means finding the right balance of all these components and using them optimally
      • Size your system appropriately
      • Design your database appropriately
      • Use the database appropriately
    • Data Warehousing is ‘just a special kind of application’
  • 17. Configuring for your Workload
    • CPU requirements depend on user workload:
      • Concurrency of users, ratio of CPU-related tasks
    • Memory requirement mostly user-process driven
    • IO requirements depend on query-mix:
      • CPU vs. IO
        • Relative CPU power for IO related tasks
      • Logically Random IOs (predominant in star schema)
        • required for index driven queries, e.g. Index lookups, Index driven joins, Index scans
      • Logically Sequential IOs (predominant in 3rd NF schema)
        • required for table scans, e.g. Hash Joins
    • Find the balance between CPU and IO
  • 18.
    • Oracle can read 300+ MB/sec per GHz of CPU power
      • Direct Read, multi-block IO,
        • e.g., parallel full table scan ('lab environment')
    • An ‘average’ DW system should plan for 75-100 MB/sec per GHz of CPU
      • Typical mixture of IO and CPU intensive operations
      • Ball park number, adjust accordingly
    • TPC-H plans for approx. 200 MB/sec per 3 GHz Xeon
    Configuring for Throughput Sizing Guidelines
  • 19. Configuring for Throughput [diagram: 4 servers, each with 2 HBAs, connected through 2 FC switches to 8 disk arrays]
    • “The weakest link” defines the throughput
    • Components to consider:
      • CPU: Quantity and speed
      • HBA (Host Bus Adapter): Quantity and speed
      • Switch speed
      • Controller: Quantity and speed
      • Disk: Quantity and speed
  • 20. Configuring for Throughput: Bit is not Byte
    Component: throughput theory (Bit/s) / maximal performance (Byte/s)
    • HBA: 1/2 Gbit/s / 100/200 Mbytes/s
    • 16 Port Switch: 8 x 2 Gbit/s / 1600 Mbytes/s
    • Fibre Channel: 2 Gbit/s / 200 Mbytes/s
    • Disk Controller: 2 Gbit/s / 200 Mbytes/s
    • GigE NIC: 1 Gbit/s / 80 Mbytes/s
    • Infiniband: 10 Gbit/s / 890 Mbytes/s
    • CPU: about 200 MB/s
  • 21. Configuring for Throughput [diagram: the same 4-server, 2-switch, 8-disk-array configuration]
    • Each switch needs to support 800 MB/s to guarantee a total system throughput of 1600 MB/s
    • Each machine has 2 HBAs = 400 MB/s; all 8 HBAs can sustain 8 * 200 MB/s = 1600 MB/s
    • Each machine has 2 CPUs; all four servers drive about 2 * 200 MB/s * 4 = 1600 MB/s
    • Each disk array has one 2 Gbit controller; all 8 disk arrays can sustain 8 * 200 MB/s = 1600 MB/s
  • 22. Configuring the Storage
    • Design for throughput, not capacity
    • Keep it simple
      • Try using RAID 0+1
    • Use S.A.M.E. methodology
      • Stripe And Mirror Everything
      • At the HW level, if available
      • Using ASM capabilities
    • Leverage ASM whenever possible
      • Striping and Mirroring capabilities
      • Automatic rebalancing
      • Enables low cost storage
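    A minimal sketch of the ASM side of S.A.M.E., run in an ASM instance; the disk group name and disk paths are hypothetical.

    -- Striping is automatic across all disks of the group;
    -- NORMAL REDUNDANCY adds two-way mirroring across the failure groups.
    CREATE DISKGROUP dwh_data NORMAL REDUNDANCY
      FAILGROUP fg1 DISK '/dev/raw/raw1', '/dev/raw/raw2'
      FAILGROUP fg2 DISK '/dev/raw/raw3', '/dev/raw/raw4';

    -- Database files placed in the disk group are striped and mirrored automatically,
    -- and ASM rebalances the data when disks are added or dropped.
    CREATE TABLESPACE sales_ts DATAFILE '+DWH_DATA' SIZE 100G;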
  • 23.
    • You can easily compute the theoretical I/O performance of your system
      • Typically measured by the minimum of [ I/O channel capacity, I/O controller capacity, disk I/O capacity]
    • Verify the I/O performance limits using OS-level commands
      • Do this prior to using the database
    • Cover basic IO operations and the average future load pattern
      • Random single block IO vs. sequential multi block IO
      • Concurrency
    Calibrate your System
  • 24. Calibrate your System [chart: throughput, dd vs. Oracle DIRECT READ]
    • Oracle drives about 90% of what dd can drive with a table scan
    • If you do not get the expected throughput, fix the hardware
  • 25. Agenda
    • Oracle10g for data warehousing - a short trip back in history
      • Continuous innovation over decades
    • Adoption trends and drivers
      • What do we see in the market
    • Design and build a Data Warehouse
      • Ensure a well-balanced system
      • Optimize Oracle
    • Oracle Database 10gR2 – sneak preview
  • 26. Schema – which way to go?
    • Don’t get lost in theory and academia
      • Philosophical discussions won’t help (“Star fights 3NF”)
      • Neither of the two extremes will work (RedBrick?, Teradata?)
    • Design according to your business needs
      • Reality shows that most of the customers are doing a mix and match
        • 3NF more in an ODS layer
        • ‘Denormalized’ 3NF in DW/Stage for general purposes
        • Dimensional model for subject areas, e.g. sales, marketing (remember shared dimensions!)
    A successful database has to support everything (* OLAP will not be covered in this presentation)
  • 27.
    • The chosen schema approach determines the Oracle functionality used
    • The chosen schema approach determines IO pattern
      • Logically Random IOs (predominant in star schema)
        • required for index driven queries, e.g. Index lookups, Index driven joins, Index scans
      • Logically Sequential IOs (predominant in 3rd NF schema)
        • required for table scans, e.g. Hash Join
    • Oracle has both functionality to
      • Push the IO to the limit
      • Optimize the IO requirements
    Schema – which way to go?
  • 28. Schema – which way to go? Star Schema
    • Leading performance for dimensional schemas
    • Innovative usage of bitmap indexes and bitmap join indexes
      • Index access instead of large table access
      • Bitmap indexes 3 to 20 times smaller than btree indexes
    • Support for complex star schemas
      • Multiple fact tables
      • Snowflake schemas
      • Large number of dimensions
    • Fully integrated
      • Parallel execution
      • Partition Pruning
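    A sketch of the bitmap-index setup behind the star transformation described on this slide; the fact and dimension names are illustrative, loosely following the partitioned SALES fact table of the SH sample schema.

    -- Local bitmap indexes on the foreign-key columns of the fact table.
    CREATE BITMAP INDEX sales_time_bix ON sales (time_id) LOCAL NOLOGGING;
    CREATE BITMAP INDEX sales_cust_bix ON sales (cust_id) LOCAL NOLOGGING;
    CREATE BITMAP INDEX sales_prod_bix ON sales (prod_id) LOCAL NOLOGGING;

    -- With STAR_TRANSFORMATION_ENABLED = TRUE, a star query like this can be answered
    -- by combining the bitmap indexes instead of scanning the whole fact table.
    SELECT t.calendar_quarter_desc, c.cust_state_province, SUM(s.amount_sold)
    FROM   sales s, times t, customers c
    WHERE  s.time_id = t.time_id
    AND    s.cust_id = c.cust_id
    AND    t.calendar_quarter_desc IN ('2004-Q3', '2004-Q4')
    AND    c.cust_state_province   = 'CA'
    GROUP BY t.calendar_quarter_desc, c.cust_state_province;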
  • 29. I/O – Minimize Requests
    • Only the relevant partitions will be accessed
    • Optimizer knows or finds the relevant partitions
      • Static pruning with known values in advance
      • Dynamic pruning uses internal recursive SQL to find the relevant partitions
    • Minimizes I/O operations
      • Also provides order of magnitude performance gains
    Partition Pruning [diagram: Sales table partitioned by month, 99-Jan through 99-Jun]
  • 30. I/O – Minimize Requests: Query Rewrite
    • Query: What were the sales in the West and South regions for the past three quarters?
    • [diagram: the query is rewritten from the detail data to a “Monthly Sales by Region” materialized view]
    • A simple rollup Month -> Quarter provides an unprecedented performance gain with minimal I/O
  • 31. Schema – which way to go? 3NF example
    [diagram: CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, both partitioned by month (Jan, Feb, Mar, Apr, ...) with hash subpartitions (Jan, Hash 1 ... Jan, Hash 4); an example of an optimized parallel partition-wise join of a composite partitioned table]
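    A sketch of the composite partitioning behind this picture; table and column names are illustrative. Equi-partitioning both tables on the join key is what enables the full partition-wise join.

    -- Range partition by month, hash sub-partition on the join key.
    CREATE TABLE customer_orders (
      order_id    NUMBER,
      customer_id NUMBER,
      order_date  DATE
    )
    PARTITION BY RANGE (order_date)
    SUBPARTITION BY HASH (order_id) SUBPARTITIONS 4 (
      PARTITION p_jan VALUES LESS THAN (DATE '2005-02-01'),
      PARTITION p_feb VALUES LESS THAN (DATE '2005-03-01')
    );

    -- The detail table is hash partitioned on the same key with the same number of
    -- partitions, so each hash partition here joins exactly the matching hash
    -- subpartitions of customer_orders, one pair per parallel server.
    CREATE TABLE customer_order_products (
      order_id   NUMBER,
      product_id NUMBER,
      quantity   NUMBER
    )
    PARTITION BY HASH (order_id) PARTITIONS 4;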
  • 32.
    • Use parallelism to enable single process scalability
    • Unrestricted parallelism
      • No data layout requirement or restriction (as in shared nothing systems)
      • All operations can be parallelized
    Schema – which way to go? Schema Agnostic - Parallel Execution
    [diagram: data on disk read by scanner query servers (scan); a coordinator dispatches work to sorter/aggregator servers (sort A-K, L-S, T-Z)]
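    A minimal sketch of how parallelism is requested; table and column names are illustrative and the degree values are examples only.

    -- Set a default degree of parallelism on a (hypothetical) fact table ...
    ALTER TABLE sales PARALLEL 8;

    -- ... or request it per statement via a hint; the scan, join, sort and
    -- aggregation steps of the statement are then executed by parallel servers.
    SELECT /*+ PARALLEL(s, 8) */ s.prod_id, SUM(s.amount_sold)
    FROM   sales s
    GROUP BY s.prod_id;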
  • 33.
    • I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
      • Plan for your system’s expected I/O throughput based on average concurrent users and parallelism
    Schema – which way to go? Schema Agnostic - Parallel Execution [diagram: 2 concurrent queries at DOP 2, total 200 MB/sec]
  • 34.
    • I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
      • Plan for your system’s expected I/O throughput based on average concurrent users and parallelism
    Schema – which way to go? Schema Agnostic - Parallel Execution [diagram: 4 concurrent queries at DOP 4, total 400 MB/sec]
  • 35.
    • I/O bandwidth requirement increases with single process parallelism and multi-user concurrency
      • Plan for your system’s expected I/O throughput based on average concurrent users and parallelism
    Schema – which way to go? Schema Agnostic - Parallel Execution [diagram: concurrent queries at DOP 8 driving totals of 800 MB/sec and 1600 MB/sec]
  • 36.
    • Star schema
      • Range-partition fact tables by time
      • Bitmap indexes on dimension-key columns of fact table
      • ‘Star transformation’ for end-user queries
      • Materialized views for pre-aggregated cubes
    • 3NF or normalized schema
      • Composite range-hash partitioning on large tables
      • ‘Partition-wise’ joins and parallel execution are key performance enablers for joining large tables
    • Hybrid environments
      • Use both dogmas concurrently in the same system without affecting each other
    Schema – which way to go? Oracle’s functionality: Choose what fits your needs best! Oracle provides optimizations for any kind of setup
  • 37. Init.ora – less is more
    • Do not de-tune Oracle
      • Very often, our performance engineers achieve improvements just by removing parameters
      • De-tuning can result in poor optimizer plans, wasted memory, and serialization points
    • Trust Oracle
      • Don’t try to second-guess the software
      • With the exception of buffer and subject-area related parameters, the system defaults are usually optimal
    Lessons learned from History
  • 38. Init.ora – less is more
    • Ensure that data warehouse relevant parameters are set
      • Not all parameters are enabled by default in database releases prior to Oracle10g
    • Size and set buffer and memory related parameters
      • Two parameters are enough
    • Do not touch other parameters unless necessary
    Basic Rules
  • 39. Init.ora – less is more
    • COMPATIBLE
      • Database release version to enable new functionality
    • OPTIMIZER_FEATURES_ENABLE
      • Database release version to enable new functionality
    • DB_FILE_MULTIBLOCK_READ_COUNT
      • Maximize multiblock I/O (use a multiple of the OS I/O size)
    • DISK_ASYNCH_IO
      • Set to TRUE (Only relevant for older Linux versions)
    • PARALLEL_MAX_SERVERS
      • Adjust to system capabilities (defaults to 5 prior to Oracle10g)
    • QUERY_REWRITE_ENABLED
      • Set to TRUE, enabled by default with Oracle10g
    • QUERY_REWRITE_INTEGRITY
      • ENFORCED by default, can be potentially lowered
    • STAR_TRANSFORMATION_ENABLED
      • Set to TRUE
    Data Warehouse relevant parameters
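    Put together, a corresponding init.ora fragment might look like the sketch below; the values are illustrative only and must be adjusted to the actual system.

    # illustrative init.ora fragment, not a recommendation
    compatible                    = 10.1.0
    optimizer_features_enable     = 10.1.0
    db_file_multiblock_read_count = 64        # a multiple of the OS I/O size
    disk_asynch_io                = true
    parallel_max_servers          = 128       # adjust to system capabilities
    query_rewrite_enabled         = true
    query_rewrite_integrity       = enforced
    star_transformation_enabled   = true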
  • 40.
    • Data Warehousing is ‘just a special kind of application’
    • Ensure a well-tuned I/O subsystem
      • Size for I/O throughput, not for disk capacity
      • Use appropriate hardware / storage
    • Find a schema balance
      • Design according to your needs using the appropriate model, not the other way around
    • Init.ora settings: less is more
    Build the foundation for Success Summary
  • 41. Agenda
    • Oracle10g for data warehousing - a short trip back in history
      • Continuous innovation over decades
    • Adoption trends and drivers
      • What do we see in the market
    • Design and build a Data Warehouse
      • Ensure a well-balanced system
      • Optimize Oracle
    • Oracle Database 10gR2 – sneak preview
  • 42. ETL Enhancements
    • DML error logging
      • Column values that are too large
      • Constraint violations (NOT NULL, unique, referential, check constraints)
      • Errors raised during trigger execution
      • Type conversion errors
      • Partition mapping errors
    • Distributed Change Data Capture
      • Enables Oracle 9.2 as a source for asynchronous CDC
  • 43. DML Error Logging (example)
    INSERT INTO sales
    SELECT product_id, customer_id, TRUNC(sales_date), 3,
           promotion_id, quantity, amount
    FROM   sales_activity_direct
    LOG ERRORS INTO sales_activity_errors('load_20050801')
    REJECT LIMIT UNLIMITED;
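    The error logging table referenced above has to exist before the INSERT runs; it is typically created with the DBMS_ERRLOG package. A minimal sketch, using the table names from the slide:

    BEGIN
      -- Creates SALES_ACTIVITY_ERRORS with the mandatory error-logging columns
      -- plus columns mirroring the DML target table SALES.
      DBMS_ERRLOG.CREATE_ERROR_LOG('SALES', 'SALES_ACTIVITY_ERRORS');
    END;
    /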
  • 44. Performance Enhancements
    • Sort
      • ORDER BY statements
      • (B-tree) index creation
      • Up to 5 times performance improvement
    • Aggregation
      • GROUP BY statements
      • Materialized views using aggregations
      • Implicit use of aggregations, e.g. statistics gathering
      • Two to three times performance improvement
    • Query rewrite using multiple materialized views
  • 45. Partitioning Enhancements
    • Scalability
      • Maximum number of partitions 64K -> 1M
      • Resource optimization for DROP TABLE of a partitioned table
      • Support for partitioning on index-organized tables
      • Support for hash-partitioned global indexes
    • Performance
      • Support for ‘multi-dimensional’ partition pruning
  • 46. Other Enhancements
    • Manageability
      • SQL Access Advisor improvements
      • Materialized view refresh improvements
    • Analytics
      • SQL model clause enhancements
  • 47. Summary
    • Oracle10g for data warehousing - a short trip back in history
      • The most powerful and successful DW platform
    • Adoption trends and drivers
      • Be visionary, though conservative
      • Guarantee success and protect investments
    • Design and build a Data Warehouse
      • Ensure a well-balanced system
      • Optimize Oracle
    • Oracle Database 10gR2 Beta – Interested?
  • 48. Q & A – Questions and Answers