Oracle Statistics by Example
Mauro Pagano

Background
• Optimizer generates execution plans
• Many execution plans for each SQL
• Optimal execution plan has the lowest cost (*)
• Cost is computed based on
– Statistical formulas (Oracle IP)
– Many statistics around the SQL (seeded by us)
1/29/17 2

Some terminology
• Cost
– Unit of measure used to compare the estimated performance of plans
– Equivalent to the expected number of single-block reads
• Cardinality
– Number of rows handled (produced / consumed) by a step
• Selectivity
– Fraction of rows that pass the predicates, range is [0,1]
– Output cardinality = input cardinality * selectivity
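The relationship between cardinality and selectivity is plain arithmetic. A minimal Python sketch follows; the function name and the sample numbers are illustrative only and do not come from Oracle.

```python
def output_cardinality(input_rows: float, selectivity: float) -> float:
    """Rows produced by a step = rows consumed * fraction kept by the predicates."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be within [0, 1]")
    return input_rows * selectivity


# Hypothetical numbers: 1,000 input rows, predicates keep 5% of them.
print(output_cardinality(1_000, 0.05))
```

When several steps are chained, the output cardinality of one step becomes the input cardinality of the next.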

Why so much emphasis?
• Statistics are a “picture” of the entities
• Quality of the picture affects quality of the plan
– Poor stats generally lead to poor plans (*)
– Better stats generally lead to better plans (*)
• Our best bet is to provide good quality stats
– Not always as trivial as it sounds

Many types of statistics
• Oracle Optimizer uses statistics about
– Objects: tables, indexes, columns, etc.
– System: CPU speed and several I/O metrics
– Dictionary: Oracle internal physical objects
– Fixed objects: memory structures (X$)
• The first two affect application SQL
– Focus of this presentation is object statistics

What should I do about statistics?
• Collect them :)
– Object stats when there are “enough” changes
– System stats once, if at all (*)
• Oracle-seeded package DBMS_STATS
• Used to collect all types of statistics
– Plus drop, export/import, set preferences, etc.
• Many parameters affect how/what to collect
– Can have a large impact on quality
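What counts as “enough” changes can be made concrete. Oracle tracks DML per table and, with the default STALE_PERCENT preference of 10, treats statistics as stale once roughly 10% of the rows have changed. The Python sketch below only mirrors that rule of thumb; the function name, the handling of empty tables and the exact boundary comparison are assumptions, not Oracle's implementation.

```python
def stats_look_stale(num_rows, inserts, updates, deletes, stale_percent=10.0):
    """Return True when tracked DML exceeds stale_percent of the rows seen at gather time.

    Assumption: mirrors the STALE_PERCENT rule of thumb, not Oracle's actual code.
    """
    changed = inserts + updates + deletes
    if num_rows == 0:
        # Assumption: any change to a table analyzed as empty counts as stale.
        return changed > 0
    return 100.0 * changed / num_rows > stale_percent


# T1 from the later slides has 920560 rows at gather time.
print(stats_look_stale(920560, inserts=50_000, updates=0, deletes=0))
print(stats_look_stale(920560, inserts=60_000, updates=30_000, deletes=5_000))
```

The first call stays below the threshold, the second one crosses it.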

When should I gather stats?
• No specific threshold in terms of time
• Balance between frequency and quality
– Gathering high-quality stats is expensive, thus slow to run
– Gathering frequently requires fast runs
• Optimal plans tend not to change over time
– Favor quality over frequency

How?
DBMS_STATS.GATHER_TABLE_STATS (
   ownname          VARCHAR2,
   tabname          VARCHAR2,
   partname         VARCHAR2 DEFAULT NULL,
   estimate_percent NUMBER   DEFAULT to_estimate_percent_type(get_param('ESTIMATE_PERCENT')),
   block_sample     BOOLEAN  DEFAULT FALSE,
   method_opt       VARCHAR2 DEFAULT get_param('METHOD_OPT'),
   degree           NUMBER   DEFAULT to_degree_type(get_param('DEGREE')),
   granularity      VARCHAR2 DEFAULT get_param('GRANULARITY'),
   cascade          BOOLEAN  DEFAULT to_cascade_type(get_param('CASCADE')),
   stattab          VARCHAR2 DEFAULT NULL,
   statid           VARCHAR2 DEFAULT NULL,
   statown          VARCHAR2 DEFAULT NULL,
   no_invalidate    BOOLEAN  DEFAULT to_no_invalidate_type(get_param('NO_INVALIDATE')),
   stattype         VARCHAR2 DEFAULT 'DATA',
   force            BOOLEAN  DEFAULT FALSE,
   context          DBMS_STATS.CCONTEXT DEFAULT NULL, -- non operative
   options          VARCHAR2 DEFAULT 'GATHER');

That looks really complex!
• Easiest thing is to let Oracle use the defaults
– Just pass owner and object name
– This is also the recommended way starting with 11g
– Many features depend on default values
• 12c histograms, Incremental, Concurrent
• As simple as
– exec dbms_stats.gather_table_stats(user,'T1')

What did we just do?
• Gathered:
– table statistics on table T1
– column statistics for every column
– index statistics on every index defined on T1
– (sub)partition statistics
– histograms on subset of columns (*)
• Next we’ll cover the stats that matter to the CBO

Table statistics
• Optimizer only uses two statistics
– Number of blocks below HWM
• [ALL|DBA|USER]_TABLES.BLOCKS
• Used to cost Full Table Scan operations
– Number of rows in the table
• [ALL|DBA|USER]_TABLES.NUM_ROWS
• Used to estimate how many rows we are dealing with

Table statistics – FTS cost
select table_name,num_rows,blocks from user_tables where table_name='T1';
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 920560 16378
explain plan for select * from t1;
select * from table(dbms_xplan.display);
Plan hash value: 3617692013
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 920K| 100M| 4463 (1)| 00:00:01 |
| 1 | TABLE ACCESS STORAGE FULL| T1 | 920K| 100M| 4463 (1)| 00:00:01 |
----------------------------------------------------------------------------------

Table statistics – FTS cost
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 920560 30000
More blocks below the HWM means a higher full table scan cost; NUM_ROWS is unchanged.
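How BLOCKS feeds the full table scan cost can be sketched with the commonly described noworkload I/O cost model. The slides do not state the system statistics or the block size, so the defaults below (MBRC 8, IOSEEKTIM 10 ms, IOTFRSPEED 4096 bytes/ms, 8 KB blocks) are assumptions, and the CPU component is ignored. The function name is hypothetical.

```python
def noworkload_fts_io_cost(blocks, block_size=8192, mbrc=8,
                           ioseektim=10.0, iotfrspeed=4096.0):
    """Approximate I/O cost of a full table scan, in single-block-read units.

    All default parameter values are assumptions (noworkload system stats, 8 KB blocks).
    """
    sreadtim = ioseektim + block_size / iotfrspeed         # single-block read time (ms)
    mreadtim = ioseektim + mbrc * block_size / iotfrspeed  # multiblock read time (ms)
    multiblock_reads = blocks / mbrc
    return multiblock_reads * (mreadtim / sreadtim)


for blocks in (16378, 30000):
    print(blocks, round(noworkload_fts_io_cost(blocks)))
```

Under these assumptions the I/O part for 16378 blocks comes to about 4436, in the neighbourhood of the 4463 shown in the first plan; the remainder would be the CPU component. The cost scales linearly with BLOCKS, so 30000 blocks gives roughly 8125.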

Table statistics – Cardinality
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 920560 16378
With NUM_ROWS at 920560 the plan estimates 920K rows for the scan of T1.

Table statistics – Cardinality
TABLE_NAME NUM_ROWS BLOCKS
------------------------------ ---------- ----------
T1 1 16378
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1| 115| 4442 (1)| 00:00:01 |
| 1 | TABLE ACCESS STORAGE FULL| T1 | 1| 115| 4442 (1)| 00:00:01 |
----------------------------------------------------------------------------------

Column statistics – no histogram
select column_name, num_distinct, num_nulls, histogram from user_tab_cols
where table_name = 'T1' and column_name like '%OBJECT_ID';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
OBJECT_ID 93192 0 NONE
DATA_OBJECT_ID 8426 835930 NONE
explain plan for select * from t1 where object_id = 1234;
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 1150 | 4453 (1)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| T1 | 10 | 1150 | 4453 (1)| 00:00:01 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("OBJECT_ID"=1234)
filter("OBJECT_ID"=1234)
Let’s do the math!
Total rows: 920560
NDV: 93192
920560 * 1/93192 ~= 10

Column statistics – no histogram
select column_name, num_distinct, num_nulls, histogram from user_tab_cols
where table_name = 'T1' and column_name like '%OBJECT_ID';
COLUMN_NAME NUM_DISTINCT NUM_NULLS HISTOGRAM
------------------------------ ------------ ---------- ---------------
OBJECT_ID 93192 0 NONE
DATA_OBJECT_ID 8426 835930 NONE
explain plan for select * from t1 where data_object_id = 1234;
Predicate Information (identified by operation id):
---------------------------------------------------
1 - storage("DATA_OBJECT_ID"=1234)
filter("DATA_OBJECT_ID"=1234)
Let’s do the math!
Total rows: 920560
Total NULLs: 835930
NDV: 8426
(920560 - 835930) / 8426 ~= 10
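Both “do the math” slides follow the same rule: without a histogram, an equality predicate is estimated as the non-NULL rows divided by the number of distinct values. A small Python sketch of that arithmetic, using the numbers from the slides (the function name is illustrative):

```python
def equality_cardinality(num_rows, num_distinct, num_nulls=0):
    """Estimated rows for `column = constant` when the column has no histogram."""
    return (num_rows - num_nulls) / num_distinct


object_id_est = equality_cardinality(920560, 93192)
data_object_id_est = equality_cardinality(920560, 8426, num_nulls=835930)
print(round(object_id_est), round(data_object_id_est))  # 10 10
```

NULLs never satisfy an equality predicate, which is why they are subtracted before dividing by NDV.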

Column statistics – Min/Max
cook_raw(low_value,'NUMBER') low_v,cook_raw(high_value, 'NUMBER') high_v
COLUMN_NAME NUM_DISTINCT LOW_VALU HIGH_VAL
------------------------------ ------------ -------- --------
OBJECT_ID 93192 2 99953
DATA_OBJECT_ID 8426 0 99953
The further the value is from the [low, high] range, the lower the estimate
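The out-of-range behaviour is often described as a linear decay: the in-range estimate is scaled down in proportion to how far the value lies beyond LOW_VALUE or HIGH_VALUE, reaching zero once the distance equals the width of the range. The exact formula is internal to Oracle, so the Python sketch below is an assumption based on that common description; the function name is hypothetical.

```python
def out_of_range_cardinality(value, low, high, in_range_card):
    """Scale an equality estimate down linearly as `value` moves outside [low, high].

    Assumption: simple linear decay; Oracle's real computation may differ.
    """
    if low <= value <= high:
        return in_range_card
    width = high - low
    distance = (low - value) if value < low else (value - high)
    factor = max(0.0, 1.0 - distance / width)
    return in_range_card * factor


# OBJECT_ID from the slide: low 2, high 99953, about 10 rows per value in range.
for v in (1234, 99953, 125000, 150000, 300000):
    print(v, out_of_range_cardinality(v, 2, 99953, 10))
```

In this model a value about half a range width above HIGH_VALUE gets roughly half the in-range estimate.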

Column Statistics
• Optimizer also uses
– Density
• Not stored in dictionary (the old one was, the new one is not)
• Used for unpopular value selectivity
– Histogram
• [ALL|DBA|USER]_TAB_COLS.LOW_VALUE
• [ALL|DBA|USER]_TAB_COLS.HIGH_VALUE
• [ALL|DBA|USER]_TAB_HISTOGRAMS
• Used for popular value selectivity

What is a histogram?
• Describes skew in the data distribution
– Helps the CBO get more accurate estimates
• Many types available
– Frequency – 1 bucket per distinct value
– Top-frequency – 1 bucket per most frequent distinct value
– Hybrid – 1 bucket per popular value, others split
• Creation influenced by the method_opt parameter

Column statistics – Histogram
explain plan for select count(*) from t1 where object_type = 'INDEX';
-------------------------------------------------------------------------
| Id |Operation |Name|Rows |Bytes |Cost (%CPU)|Time |
-------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 1| 9 | 4455 (1)|00:00:01|
| 1| SORT AGGREGATE | | 1| 9 | | |
|* 2| TABLE ACCESS STORAGE FULL|T1 |44990| 395K| 4455 (1)|00:00:01|
-------------------------------------------------------------------------
2 - storage("OBJECT_TYPE"='INDEX') filter("OBJECT_TYPE"='INDEX')
explain plan for select count(*) from t1 where object_type = 'TABLE';
-------------------------------------------------------------------------
| Id |Operation |Name|Rows |Bytes |Cost (%CPU)|Time |
-------------------------------------------------------------------------
| 0|SELECT STATEMENT | | 1| 9 | 4455 (1)|00:00:01|
| 1| SORT AGGREGATE | | 1| 9 | | |
|* 2| TABLE ACCESS STORAGE FULL|T1 |24980| 219K| 4455 (1)|00:00:01|
-------------------------------------------------------------------------
2 - storage("OBJECT_TYPE"='TABLE') filter("OBJECT_TYPE"='TABLE')
Different values get different estimates thanks to the histogram
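The effect of a frequency histogram can be sketched as a lookup: the estimate for a value is the table's row count scaled by that value's share of the histogram. The actual histogram for OBJECT_TYPE is not shown in the slides, so the toy histogram below is hypothetical. It reuses the two estimates from the plans as bucket counts and lumps every other object type into a single made-up "OTHER" bucket.

```python
def frequency_histogram_cardinality(value, buckets, num_rows):
    """Estimate rows for `column = value` from per-value bucket counts."""
    sampled = sum(buckets.values())
    return num_rows * buckets[value] / sampled


def uniform_cardinality(buckets, num_rows):
    """Estimate without a histogram: every distinct value gets the same share."""
    return num_rows / len(buckets)


NUM_ROWS = 920560
# Hypothetical histogram: only INDEX and TABLE counts are taken from the plans above.
buckets = {"INDEX": 44990, "TABLE": 24980, "OTHER": NUM_ROWS - 44990 - 24980}

for object_type in ("INDEX", "TABLE"):
    print(object_type,
          frequency_histogram_cardinality(object_type, buckets, NUM_ROWS),
          round(uniform_cardinality(buckets, NUM_ROWS)))
```

With the histogram each value gets its own estimate; the uniform model returns the same number for every value, which is wrong for skewed columns.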

What is an index?
• Structure that stores key(s)-location pairs
– Key(s) are stored in sorted order
• Used to identify rows of interest without an FTS
– By navigating the index and extracting the location(s)
• Depending on the filters, faster than an FTS (or not)
– No fixed threshold, the cheaper option wins

Index Statistics
• Optimizer uses
– Blevel
• [ALL|DBA|USER]_INDEXES.BLEVEL
• Used to estimate how expensive it is to locate the first leaf block
– Number of leaf blocks (LB)
• [ALL|DBA|USER]_INDEXES.LEAF_BLOCKS
• Used to estimate how many index leaf blocks to read
– Clustering Factor (CLUF)
• [ALL|DBA|USER]_INDEXES.CLUSTERING_FACTOR
• Used to estimate how many table blocks to read
– Distinct Keys (DK)
• [ALL|DBA|USER]_INDEXES.DISTINCT_KEYS
• Used to help with data correlation

What does it look like?
[Figure: index tree with Root, Branches and Leaves levels, plus blocks labelled B]
Leaves are chained back and forth for asc/desc scans
Number of jumps is CLUF

Index Statistics
select index_name, blevel, leaf_blocks, distinct_keys, clustering_factor
from user_indexes where index_name = 'T1_IDX';
INDEX_NAME BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
----------- ---------- ----------- ------------- -----------------
T1_IDX 2 2039 92056 920530
-----------------------------------------------------------------------------
| Id | Operation |Name |Rows | Bytes|Cost (%CPU)|
-----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10| 1150| 13 (0)|
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED|T1 | 10| 1150| 13 (0)|
|* 2 | INDEX RANGE SCAN |T1_IDX| 10| | 3 (0)|
-----------------------------------------------------------------------------
2 - access("OBJECT_ID"=1234)
Distinct keys is 100% accurate, NUM_DISTINCT is approximated
If CLUF ~= number of rows in the table, the index is inefficient
Cost jumps by 10 for 10 rows (from 3 to 13) as a consequence of the bad CLUF
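The costs of 3 and 13 in the plan can be reproduced with the commonly cited index range scan formula: blevel + ceil(leaf_blocks * selectivity) for the index part, plus ceil(clustering_factor * selectivity) for the table part. This is a simplified I/O-only sketch, not Oracle's full model; the function name is hypothetical, and using 1/NUM_DISTINCT of OBJECT_ID as the selectivity is an assumption (1/DISTINCT_KEYS gives the same result here).

```python
import math


def index_range_scan_cost(blevel, leaf_blocks, clustering_factor, selectivity):
    """Return (index access cost, total cost including table access by rowid)."""
    index_cost = blevel + math.ceil(leaf_blocks * selectivity)
    table_cost = math.ceil(clustering_factor * selectivity)
    return index_cost, index_cost + table_cost


selectivity = 1 / 93192  # 1 / NUM_DISTINCT of OBJECT_ID
index_cost, total_cost = index_range_scan_cost(2, 2039, 920530, selectivity)
print(index_cost, total_cost)  # 3 13
```

With a clustering factor close to NUM_ROWS, almost every row fetched costs one extra table block visit, which is where the jump from 3 to 13 comes from.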

Extended Statistics
• Provide additional info to CBO about
– Data correlation (functional dependencies)
– Expressions applied to column(s)
• Need to be implemented manually
– Automatic in 12c, not bulletproof yet
• Their absence usually translates into estimation mistakes

Extended statistics – Expression
explain plan for select count(*) from t1 where lower(object_type) = 'index';
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 9 | | |
-----------------------------------------------------------------------------------
2 - storage(LOWER("OBJECT_TYPE")='index') filter(LOWER("OBJECT_TYPE")='index')
Incorrect estimation, we know the right one is ~45k
dbms_stats.gather_table_stats(user,'T1',method_opt=>'FOR COLUMNS (lower(object_type)) SIZE 254');
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 1 | SORT AGGREGATE | | 1 | 9 | | |
|* 2 | TABLE ACCESS STORAGE FULL| T1 | 44990 | 395K| 4251 (1)| 00:00:01 |
-----------------------------------------------------------------------------------
2 - storage(LOWER("OBJECT_TYPE")='index') filter(LOWER("OBJECT_TYPE")='index')
Correct estimation :)
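Why the first estimate is off can be sketched as follows. When a predicate wraps a column in an expression and no statistics exist for that expression, the optimizer cannot use the column's histogram and falls back to a fixed guess. A frequently cited fallback for equality on an expression is 1%; that figure is an assumption here, since the slide's first estimate did not survive extraction. All names are hypothetical.

```python
DEFAULT_EXPRESSION_EQ_SELECTIVITY = 0.01  # assumption: commonly cited 1% guess


def expression_cardinality(num_rows, expression_selectivity=None):
    """Estimate rows for `f(column) = constant`.

    Without statistics on the expression, fall back to the fixed guess;
    with extended statistics, use the selectivity they provide.
    """
    if expression_selectivity is None:
        expression_selectivity = DEFAULT_EXPRESSION_EQ_SELECTIVITY
    return num_rows * expression_selectivity


NUM_ROWS = 920560
guess = expression_cardinality(NUM_ROWS)
with_stats = expression_cardinality(NUM_ROWS, 44990 / NUM_ROWS)
print(round(guess), round(with_stats))
```

Under that assumption the guess is several times lower than the ~45k rows reported once extended statistics on lower(object_type) are in place.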

estimate_percent
• Amount of data to sample for gathering stats
• Has an impact on time to gather and quality
• Recommended (default): AUTO_SAMPLE_SIZE
– Not recommended in 10g, recommended from 11g onwards
– Required for many features
– Uses the HyperLogLog algorithm internally (*)

method_opt
• Which columns to gather stats on
• Which columns to gather histograms on (and number of buckets)
• Recommended (default): FOR ALL COLUMNS SIZE AUTO
– Not recommended in 10g, recommended from 11g onwards
– Oracle decides histogram/no histogram based on column usage
– If the app knows better, follow the app recommendations

Can’t Oracle do it for me?
• Oracle provides a nightly job to gather stats
– Does a decent job starting with 11g (so-so in 10g)
– Prioritizes tables depending on number of changes
– Only allowed to run for a fixed number of hours
• Might not touch all needed objects
– Collects object and dictionary stats only
• Apps might have specific requirements, follow them

References
• Oracle Database PL/SQL Packages and Types Reference 12.1
• Oracle Database SQL Tuning Guide 12.1
• http://blogs.oracle.com/optimizer
• Master Note: Optimizer Statistics (Doc ID 1369591.1)

Contact Information
• http://mauro-pagano.com
– Tools
• SQLd360, TUNAs360, Pathfinder
• Email
– mauro.pagano@gmail.com