The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Performance Summit – Intro and Data
Warehouse Performance
Andrew Holdsworth, Tom Kyte, Graham Wood
Server Technologies, Oracle Corp
Agenda
• 09:00 - 10:30 session
• 10:30 - 10:45 break
• 10:45 - 12:30 session
• 12:30 - 13:30 lunch
• 13:30 - 15:00 session
• 15:00 - 15:15 break
• 15:15 - ~17:00 session / Q&A / wrapup
• http://asktom.oracle.com ->
– Files Tab ->
• realworld.zip
Andrew Holdsworth
Senior Director
Real World Performance
Server Technologies
Tom Kyte
• Been with Oracle since 1993
• User of Oracle since 1987
• The “Tom” behind AskTom in
Oracle Magazine
www.oracle.com/oramag
• Expert Oracle Database
Architecture
• Effective Oracle by Design
• Expert One-on-One Oracle
• Beginning Oracle
Graham Wood
Architect
Server Technologies
• Make the experts prove everything
• Statements that should raise your
eyebrows:
– It is my opinion...
– I claim...
– I think...
– I feel…
– I KNOW…
• Everything can (and should) be proven
• Things change, expect that
• It only takes a single counter case
• “It depends” or “Why” are the only
answers you need
“Question Authority.”
The Data
Warehouse
© 2009 Oracle Corporation – Proprietary and Confidential
Why 3 Screens
• Interpreting the windows
– Demonstration menus
– Monitoring the Database Machine
• Begin loading 1 Terabyte of data
– Create Tablespaces
– Create Schema Objects
– One-Off Load
– Gather Statistics
Data Loading
Is Data Loading Really the Problem?
• Where have Oracle Customers struggled with the
Performance of their Data Warehouse?
– Data Loading?
– Data Validation and Verification?
– ETL and Transformation?
• So what went Wrong?
Data Warehouse Death Spiral
• HW CPU Sizing 10X
– Sized like an OLTP System
• I/O Sizing 10X
– Sized by Space requirements
– Cannot use Parallel Query
• Using the incorrect Query Optimization
Techniques 10X
– Over-Indexed Database
– Data Loads and ETL running too slow
• System Overloaded to Make the CPU look Busy
– 100s of Concurrent Queries taking Hours to Execute
Some Basic Maths
• Index Driven Query retrieving 1,000,000 rows
– Assume the Index is cached and the data is not.
• 1,000,000 random I/Os @ 5ms per I/O
• This requires 5000 Seconds to Execute
• This is why queries may take over an hour
– How much data could you scan in 5000 Seconds with a fully
sized I/O system able to scan 28 GByte/Sec?
• Clearly For Oracle Data Warehouses the game is
changing
– New Design Techniques
– Time to Re-Train the DBAs
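The arithmetic on this slide can be checked in a few lines, using only the figures quoted above (5 ms per random I/O, a 28 GByte/s scan rate):

```python
# Back-of-envelope maths from the slide: index-driven access vs. full scan.

ROWS = 1_000_000          # rows retrieved via single-block random I/O
IO_TIME_S = 0.005         # 5 ms per random I/O (index cached, data not)
SCAN_RATE_GB_S = 28       # fully sized I/O system scan rate (GByte/s)

# Index-driven: one random I/O per row.
index_seconds = ROWS * IO_TIME_S
print(f"Index-driven elapsed time: {index_seconds:.0f} s "
      f"(~{index_seconds / 3600:.1f} hours)")

# How much data could a full scan have read in that same time?
scanned_gb = index_seconds * SCAN_RATE_GB_S
print(f"Data scannable in that time: {scanned_gb:,.0f} GByte "
      f"(~{scanned_gb / 1024:.0f} TByte)")
```

The 5000-second (nearly 1.5 hour) index-driven plan could instead have scanned well over 100 TByte, which is the whole point of the slide.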
So, joe (or josephine) sql coder needs to run the following query:
select t1.object_name, t2.object_name
from t t1, t t2
where t1.object_id = t2.object_id
and t1.owner = 'WMSYS'
Rows Row Source Operation
------- ---------------------------------------------------
528384 HASH JOIN
8256 TABLE ACCESS FULL T
1833856 TABLE ACCESS FULL T
suppose they ran it or explain planned it -- and saw that plan.
"Stupid stupid CBO" they say -- "I have indexes, why won't it use
them. We all know that indexes mean fast=true! Ok, let me use the
faithful RBO and see what happens"
Mythology – why isn’t it using my index
select /*+ RULE */ t1.object_name, t2.object_name
from t t1, t t2
where t1.object_id = t2.object_id
and t1.owner = 'WMSYS'
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T'
2 1 NESTED LOOPS
3 2 TABLE ACCESS (FULL) OF 'T'
4 2 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE)
See, now that’s what I’m talking about – indexes are good…
Or are they?
Mythology – why isn’t it using my index
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 35227 5.63 9.32 23380 59350 0 528384
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 35229 5.63 9.33 23380 59350 0 528384
Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 80
Rows Row Source Operation
------- ---------------------------------------------------
528384 HASH JOIN
8256 TABLE ACCESS FULL T
1833856 TABLE ACCESS FULL T
Mythology – why isn’t it using my index
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 35227 912.07 3440.70 1154555 121367981 0 528384
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 35229 912.07 3440.70 1154555 121367981 0 528384
Misses in library cache during parse: 0
Optimizer goal: RULE
Parsing user id: 80
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=HINT: RULE
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'T'
2 1 NESTED LOOPS
3 2 TABLE ACCESS (FULL) OF 'T'
4 2 INDEX (RANGE SCAN) OF 'T_IDX' (NON-UNIQUE)
Mythology – why isn’t it using my index
1 SELECT phy.value,
2 cur.value,
3 con.value,
4 1-((phy.value)/((cur.value)+(con.value))) "Cache hit ratio"
5 FROM v$sysstat cur, v$sysstat con, v$sysstat phy
6 WHERE cur.name='db block gets'
7 AND con.name='consistent gets'
8* AND phy.name='physical reads'
VALUE VALUE VALUE Cache hit ratio
-------- ---------- ---------- ---------------
1277377 58486 121661490 .989505609
98.9% cache hit, not bad eh?
Mythology – why isn’t it using my index
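The ratio above can be reproduced directly from the three v$sysstat values on the slide, which is exactly why it is such a poor health indicator: the RBO plan burned over 120M logical reads and 912 CPU seconds, yet the ratio looks excellent:

```python
# Buffer cache hit ratio computed from the slide's v$sysstat values.
physical_reads = 1_277_377
db_block_gets = 58_486
consistent_gets = 121_661_490

hit_ratio = 1 - physical_reads / (db_block_gets + consistent_gets)
print(f"Cache hit ratio: {hit_ratio:.9f}")  # ~0.9895 -- "not bad eh?"
```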
Oracle Retail Data Warehouse Schema
Retail Demonstration
Table Sizes
TABLE          SIZE OF SOURCE DATA   NUMBER OF RECORDS
Transactions   52 GByte              461M
Payments       54 GByte              461M
Line Items     936 GByte             6945M
Total          1042 GByte            7867M
Retail Demonstration
Table Sizes (Default Compression)
TABLE          SIZE OF TABLE   COMPRESSION RATIO
Transactions   30 GByte        1.77 : 1
Payments       30 GByte        1.84 : 1
Line Items     268 GByte       3.55 : 1
Total          327 GByte       3.23 : 1
Retail Demonstration
Table Sizes (With HCC)
TABLE          SIZE OF TABLE   COMPRESSION RATIO
Transactions   5 GByte         7.00 : 1
Payments       5 GByte         7.60 : 1
Line Items     54 GByte        12.85 : 1
Total          64 GByte        11.98 : 1
NOTE: Compression ratios are relative to the uncompressed source data
Data Loading
Bulk Loading Challenges
• Problem: Moving data to the database host machine
– For high load rates the data staging machine and network
become the serialization point/bottleneck
– Increased network and staging area I/O bandwidth is an
expensive option
• Solution: Compress the source data files
– Compression reduces the number of bytes copied from disk
and over the network
Data Loading
Tip #1 Consider the Data Transfer Rate
• What would it take to load 1 TByte in one Hour?
– 17 GByte/minute or 291 MByte/second
• This is higher than the specification of most networks
and any portable drive
• So compression of source data becomes crucial
– 1057 GByte -> 136 GByte (7.7x compression)
– 2.3 GByte/minute or 40 MByte/s
• This eliminates the first challenge to migrating data
• Extraction of the data from legacy systems often
takes much longer than this!
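The rates quoted above follow from simple division (figures are rounded as on the slide; the compressed rate assumes the 136 GByte payload after 7.7x compression):

```python
# Data transfer rates needed to move ~1 TByte in one hour.
MB_PER_GB = 1024

uncompressed_gb = 1024    # ~1 TByte
compressed_gb = 136       # after ~7.7x compression (1057 GByte -> 136 GByte)

def rates(gb):
    """Return (GByte/minute, MByte/second) to move `gb` in one hour."""
    return gb / 60, gb * MB_PER_GB / 3600

print("Uncompressed: %.1f GByte/min, %.0f MByte/s" % rates(uncompressed_gb))
# ~17 GByte/min, ~291 MByte/s: beyond most networks and any portable drive
print("Compressed:   %.1f GByte/min, %.0f MByte/s" % rates(compressed_gb))
# ~2.3 GByte/min, ~39 MByte/s: easily achievable
```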
Data Staging
Data Sources
SOURCE                THROUGHPUT
USB Drive             20 MByte/s
Local Disk            30-40 MByte/s
Scalable NFS Server   Potentially at network speeds
DBFS                  Fastest (assuming data has been copied!)
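To see why the staging source matters, here is the time to copy the demo's 1042 GByte of uncompressed source data at each quoted throughput. This is a sketch, not a benchmark; the Local Disk midpoint and the 100 MByte/s network figure are assumptions, not slide numbers:

```python
# Time to copy 1042 GByte of source data at various staging throughputs.
SOURCE_GB = 1042

sources_mb_s = {
    "USB Drive": 20,            # from the slide
    "Local Disk": 35,           # assumed midpoint of the 30-40 MByte/s range
    "Network-speed NFS": 100,   # assumed ~GbE wire speed
}

for name, mb_s in sources_mb_s.items():
    hours = SOURCE_GB * 1024 / mb_s / 3600
    print(f"{name:18s} {hours:5.1f} hours")
```

At USB speed the copy alone takes the better part of a day, which is why compressing the source files first is the real fix.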
Data Loading
Bulk Loading Challenges
• Problem: Data loading is CPU/Memory Constrained
– Data loads scale well over multiple CPUs, cores and hosts
(assuming no other form of contention)
– Memory usage for metadata associated with highly
partitioned objects can become significant at high DOP
• Solution: Use the correct tools and plan accordingly
– Use external tables with a parallel SQL statement (e.g. CTAS
or IAS) to minimize on-disk and in-memory metadata. Do
NOT use multiple copies of SQL*Loader
– Data types for columns have a huge impact on the CPU
required to load the data. Raw is the cheapest and
Timestamp is the most expensive.
© 2009 Oracle Corporation – Proprietary and Confidential
Data Loading
Anatomy of an External Table
create table FAST_LOAD
(
  column definition list ...
)
organization external
( type oracle_loader
  default directory SPEEDY_FILESYSTEM
  preprocessor exec_file_dir:'zcat.sh'
  characterset 'ZHS16GBK'
  badfile ERROR_DUMP:'FAST_LOAD.bad'
  logfile ERROR_DUMP:'FAST_LOAD.log'
  (
    file column mapping list ...
  )
  location
  ('file_1.gz', 'file_2.gz', 'file_3.gz', 'file_4.gz')
)
reject limit 1000
parallel 4
/
Callouts:
• External table definition
• Default directory references the mount point
• Preprocessor uncompresses the data using a secure wrapper
• The characterset must match the characterset of the files
• Location lists the compressed files
• Parallel should match or be less than the number of files
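Putting the pieces together, the load itself is then typically a single parallel direct-path SQL statement against the external table. A sketch, in the spirit of the slide; the target table SALES_FACT and the DOP are placeholders, not from the demo:

```sql
-- Hypothetical parallel direct-path load from the external table above.
-- One SQL statement drives all the parallel slaves; no fleet of
-- SQL*Loader processes is involved.
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(s, 4) */ into sales_fact s
select /*+ PARALLEL(f, 4) */ *
from   fast_load f;

commit;
```

A CTAS (`create table ... parallel as select ...`) works the same way for the initial one-off load.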
Loading Data
Tip #2 Learn About the Impact of Compression
• Compression incurs costs when loading
– Increased CPU time
– Increased elapsed time
• Compression provides benefits
– For scans
– For backup and recovery
• Write-once and Read-many means that compression
is a net benefit, not a cost
Loading Data
Tip #3 Learn About the Impact of Partitioning
• Partitioning incurs costs when loading
– Increased CPU time
– Increased elapsed time
• Partitioning provides benefits
– For queries
– For manageability
• Write-once and Read-many means that partitioning is
a net benefit, not a cost
Gathering Statistics
Strategy For New Databases
• Create tables
• Optionally Run (or explain) queries on empty tables
– Prime / Seed the optimizer
• Enable incremental statistics
– For large partitioned tables
• Load data
• Gather statistics
– Use the defaults
• Create indexes (if required!)
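The "enable incremental statistics" step above maps onto DBMS_STATS table preferences along these lines (a sketch; SALES is a placeholder for a large partitioned table):

```sql
-- Enable incremental (synopsis-based) global statistics for a
-- partitioned table, then gather with the defaults.
begin
  dbms_stats.set_table_prefs(
    ownname => user,
    tabname => 'SALES',
    pname   => 'INCREMENTAL',
    pvalue  => 'TRUE');

  -- With the defaults, only changed partitions are re-scanned and the
  -- global statistics are derived from the per-partition synopses.
  dbms_stats.gather_table_stats(user, 'SALES');
end;
/
```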
Gathering Statistics
Incremental Statistics
• One of the biggest problems with large tables is
keeping the schema statistics up to date and accurate
• This is particularly challenging in a Data Warehouse
where tables continue to grow and so the statistics
gathering time and resources grow proportionately
• To address this problem, 11.1 introduced the concept
of incremental statistics for partitioned objects
• This means that statistics are gathered for recently
modified partitions
Gathering Statistics
The Concept of Synopses
• It is not possible to simply add partition statistics
together to create an up to date set of global statistics
• This is because the Number of Distinct Values (NDV)
for a partition may include values common to multiple
partitions.
• To resolve this problem, compressed representations
of the distinct values of each column are created in a
structure in the SYSAUX tablespace known as a
synopsis
Gathering Statistics
Synopsis Example
Object            Column Values   NDV
Partition #1      1,1,3,4,5       4
Partition #2      1,2,3,4,5       5
NDV by addition   WRONG           9
NDV by synopsis   CORRECT         5
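The table's point can be reproduced with plain sets. A synopsis is, conceptually, a compressed set of the distinct values in each partition, so the global NDV is the size of the union of the sets, not the sum of the per-partition counts:

```python
# NDV by addition vs. NDV by union (what synopses make possible).
partition_1 = [1, 1, 3, 4, 5]
partition_2 = [1, 2, 3, 4, 5]

ndv_1 = len(set(partition_1))                            # 4
ndv_2 = len(set(partition_2))                            # 5

ndv_by_addition = ndv_1 + ndv_2                          # 9 -- WRONG
ndv_by_union = len(set(partition_1) | set(partition_2))  # 5 -- CORRECT

print(ndv_by_addition, ndv_by_union)
```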
Using Services To Manage Resources
• Services can be used to isolate different workloads
[Diagram: Load and Query workloads isolated via separate services]
Resource Management
• The Debate
– Why would you do it ?
– How should you do it ?
– Where are the sweet spots ?
Validation Example
Set based processing vs. row by row
[Chart: elapsed time (H:MI:SS) for set-based vs. row-by-row processing]
Validation and Transformation
Proof Points
• For the two validation processes we can now make
these claims
– Store Validation - Over 200 times faster
– Product Validation - Over 2500 times faster
• Same Hardware!
– This is a case of using the wrong methodology
Ad Hoc Query
Question
“What were the most popular items in the baskets of
shoppers who visited stores in California in the first
week of May and didn't buy bananas?”
Ad Hoc Query
SQL
with qbuy as
( select rt.TRX_NBR
from DWR_ORG_BSNS_UNIT obu, DWB_RTL_TRX rt, DWB_RTL_SLS_RETRN_LINE_ITEM rsrli, DWR_SKU_ITEM sku
where obu.ORG_BSNS_UNIT_KEY = rt.BSNS_UNIT_KEY
and rt.TRX_NBR = rsrli.TRX_NBR
and rt.DAY_KEY = rsrli.DAY_KEY
and rsrli.SKU_ITEM_KEY = sku.SKU_ITEM_KEY
and rt.DAY_KEY between 20090501 and 20090507
and obu.STATE in ('CA')
and sku.SKU_ITEM_DESC = 'Bananas'),
qall as
( select rt.TRX_NBR
from DWR_ORG_BSNS_UNIT obu, DWB_RTL_TRX rt
where obu.ORG_BSNS_UNIT_KEY = rt.BSNS_UNIT_KEY
and rt.DAY_KEY between 20090501 and 20090507
and obu.STATE in ('CA'))
select sku.SKU_ITEM_DESC,q.SCANS
from
( select SKU_ITEM_KEY,count(*) as SCANS,rank() over (order by count(*) desc) as POP
from qall,qbuy, DWB_RTL_SLS_RETRN_LINE_ITEM rsrli
where qall.TRX_NBR = qbuy.TRX_NBR(+)
and qbuy.TRX_NBR IS NULL
and rsrli.TRX_NBR = qall.TRX_NBR
and rsrli.DAY_KEY between 20090501 and 20090507
group by SKU_ITEM_KEY) q, DWR_SKU_ITEM sku
where q.SKU_ITEM_KEY = sku.SKU_ITEM_KEY
order by q.POP asc;
Callouts:
• qbuy: 4-table join to select all transactions buying Bananas in California in the first week of May
• qall: 2-table join to select all transactions in California in the first week of May
• Join the result sets in an outer join to find the exclusions, then rank, group and sort the results
Concurrent Query Testing
Out of the Box Settings (secs)
[Bar chart: per-user query elapsed times, 0-120 second scale, Users #1-#12]
Concurrent Query Testing
DBA Restricting DOP (secs)
[Bar chart: per-user query elapsed times, 0-120 second scale, Users #1-#12]
Concurrent Query Testing
Query Queuing (secs)
[Bar chart: per-user query elapsed times, 0-120 second scale, Users #1-#12]
Concurrent Query Testing
User      Out of the Box   Fixed DoP   With Queuing
User 1    30               24          10
User 2    33               27          11
User 3    34               27          11
User 4    39               29          11
User 5    40               29          15
User 6    41               29          19
User 7    43               29          21
User 8    47               28          16
User 9    49               30          25
User 10   106              28          27
User 11   108              26          25
User 12   112              27          26
Average   57               28          18
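The averages in the bottom row follow from the per-user timings (rounded to whole seconds, as on the slide), and the worst-case user tells the real story about queuing:

```python
# Per-user elapsed times (seconds) from the concurrent query test.
out_of_box = [30, 33, 34, 39, 40, 41, 43, 47, 49, 106, 108, 112]
fixed_dop  = [24, 27, 27, 29, 29, 29, 29, 28, 30, 28, 26, 27]
queuing    = [10, 11, 11, 11, 15, 19, 21, 16, 25, 27, 25, 26]

for name, times in [("Out of the Box", out_of_box),
                    ("Fixed DoP", fixed_dop),
                    ("With Queuing", queuing)]:
    avg = sum(times) / len(times)
    print(f"{name:15s} worst={max(times):3d}s  avg={avg:.0f}s")
```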
1 Terabyte Loaded and Ready To Go In 20 Minutes
Operation Time
Create Tablespaces and run DDL 0:39
Initial 1TB Load 9:55
Gather Statistics 3:36
Daily Incremental Load 1:44
Referential Integrity Check 0:51
Transform Data 1:09
Exchange and Incremental Statistics 0:22
Query from Hell 0:32
Total 18:48
The preceding is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
Real World Performance - Data Warehouses