© 2016 Chinar A. Aliyev, Hotsos Symposium, March 6-10
Join Cardinality Estimation Methods
Chinar Aliyev
chinaraliyev@gmail.com
As is known, the query optimizer tries to select the best plan for a query: it generates all feasible plans, estimates the cost of each, and selects the cheapest one as the optimal plan. Estimating the cost of a plan is a complex process, but cost is directly proportional to the number of I/Os, and there is a functional dependence between the number of rows retrieved from the database and the number of I/Os. So the cost of a plan depends on the estimated number of rows retrieved in each step of the plan, the cardinality of the operation. Therefore the optimizer should accurately estimate the cardinality of each step in the execution plan. In this paper we analyze how the Oracle optimizer calculates join selectivity and cardinality in different situations: How does the CBO calculate join selectivity when histograms (including the new histogram types introduced in 12c) are available? What factors does the estimation error depend on? In general, two main join cardinality estimation methods exist: histogram based and sampling based.
Thanks to Jonathan Lewis for writing the book "Cost Based Oracle Fundamentals". This book actually helped me to understand the optimizer's internals and to open the "Black Box". In 2007 Alberto Dell'Era did excellent work investigating join size estimation with histograms. However, some questions remain, such as the introduction of a "special cardinality" concept. In this paper we are going to review this matter as well.
For simplicity we are going to use a single-column join and columns containing no NULL values. Assume we have two tables t1 and t2 with join columns j1 and j2; the remaining columns are filter1 and filter2. Our queries are:
(Q0)
SELECT COUNT (*)
FROM t1, t2
WHERE t1.j1 = t2.j2
AND t1.filter1 ='value1'
AND t2.filter2 ='value2'
(Q1)
SELECT COUNT (*)
FROM t1, t2
WHERE t1.j1 = t2.j2;
(Q2)
SELECT COUNT (*)
FROM t1, t2;
Histogram Based Estimation
Understanding Join Selectivity and Cardinality
As you know, query Q2 is a Cartesian product. It means we will get the join cardinality Card_cartesian for the join product as:

Card_cartesian = num_rows(t1) * num_rows(t2)

Here num_rows(ti) is the number of rows of the corresponding table. When we add the join condition to the query (Q1), we actually get some fraction of the Cartesian product. To quantify this fraction, the notion of join selectivity is introduced.
Therefore we can write this as follows:

Card_Q1 <= Card_cartesian
Card_Q1 = Jsel * Card_cartesian = Jsel * num_rows(t1) * num_rows(t2)   (1)
Jsel = Card_Q1 / (num_rows(t1) * num_rows(t2))
Definition: join selectivity is the ratio of the "pure" (natural) cardinality to the cardinality of the Cartesian product. I call Card_Q1 the "pure" cardinality because it does not involve any filter conditions.
Here ๐ฝ๐‘ ๐‘’๐‘™ is Join Selectivity. This is our main formula. You should know that when optimizer tries to
estimate JC- Join Cardinality it first calculates ๐ฝ๐‘ ๐‘’๐‘™ . Therefore we can use same ๐ฝ๐‘ ๐‘’๐‘™ and can write
appropriate formula for query Q0 as
๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„0 = ๐ฝ๐‘ ๐‘’๐‘™ โˆ—Card(๐‘ก1)*Card(๐‘ก2) (2)
Here Card(ti) is the final cardinality after applying the filter predicate to the corresponding table. In other words, Jsel is the same in both formulas (1) and (2), because Jsel does not depend on the filter columns unless the filter conditions include join columns. According to formula (1):

Jsel = Card_Q1 / (num_rows(t1) * num_rows(t2))   (3)

or

Card_Q0 = Card_Q1 * Card(t1) * Card(t2) / (num_rows(t1) * num_rows(t2))   (4)
Based on this, we have to find out the estimation mechanism for the expected cardinality Card_Q1. Now suppose that the join columns ji of the tables ti have no histogram of any type. In this case the optimizer assumes uniform distribution, and in such situations, as you already know, Jsel and Jcard are calculated as:

Jsel = 1 / max(num_dist(j1), num_dist(j2))   (5)
Jcard = Jsel * num_rows(t1) * num_rows(t2)
The question now is: where does formula (5) come from? How do we understand it?
According to (3), in order to calculate Jsel we first have to estimate the "pure" expected cardinality Card_Q1, and it depends only on the join columns. For table t1, based on uniform distribution, the number of rows per distinct value of column j1 will be num_rows(t1)/num_dist(j1), and for table t2 it will be num_rows(t2)/num_dist(j2). Also, there will be min(num_dist(j1), num_dist(j2)) common distinct values. Therefore the expected "pure" cardinality is:

Card_Q1 = min(num_dist(j1), num_dist(j2)) * (num_rows(t1)/num_dist(j1)) * (num_rows(t2)/num_dist(j2))   (6)
Then according to formula (3) the join selectivity will be:

Jsel = Card_Q1 / (num_rows(t1) * num_rows(t2))
     = min(num_dist(j1), num_dist(j2)) / (num_dist(j1) * num_dist(j2))
     = 1 / max(num_dist(j1), num_dist(j2))
As can be seen, we have arrived at formula (5). Without a histogram the optimizer is not aware of the data distribution: the database dictionary contains no "(distinct value, frequency)" pairs describing the column distribution. Because of this, in the uniform distribution case the optimizer effectively calculates an "average frequency" as num_rows(t1)/num_dist(j1). Based on this "average frequency" the optimizer calculates the "pure" expected cardinality and then the join selectivity. If a table column has a histogram, then (depending on its type) the optimizer calculates the join selectivity based on the histogram: in this case the "(distinct value, frequency)" pairs are formed not from the "average frequency" but from the information the histogram provides.
Case 1. Both Join columns have frequency histograms
In this case both join columns have frequency histograms, and our query (freq_freq.sql) is:
SELECT COUNT (*)
FROM t1, t2
WHERE t1.j1 = t2.j2 AND t1.f1 = 13;
The corresponding execution plan is:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 2272 | 2260 |
|* 3 | TABLE ACCESS FULL| T1 | 1 | 40 | 40 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 1000 | 1000 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
3 - filter("T1"."F1"=13)
The estimate is good enough in this situation, but it is not exact. Why? And how did the optimizer calculate the join cardinality as 2272?
If we enable SQL trace for the query, we will see that Oracle queries only the histgrm$ dictionary table. The information about the tables and columns is therefore as follows.
SELECT table_name, num_rows FROM user_tables WHERE table_name IN ('T1','T2');

table_name   num_rows
T1           1000
T2           1000
(Freq_values1)
SELECT endpoint_value COLUMN_VALUE,
endpoint_number - NVL (prev_endpoint, 0) frequency,
endpoint_number ep
FROM (SELECT endpoint_number,
NVL (LAG (endpoint_number, 1) OVER (ORDER BY endpoint_number),
0
)
prev_endpoint,
endpoint_value
FROM user_tab_histograms
WHERE table_name = 'T1' AND column_name = 'J1')
ORDER BY endpoint_number
tab t1, col j1              tab t2, col j2
value  frequency  ep        value  frequency  ep
0      40         40        0      100        100
1      40         80        2      40         140
2      80         160       3      120        260
3      100        260       4      20         280
4      160        420       5      40         320
5      60         480       6      100        420
6      260        740       8      40         460
7      80         820       9      20         480
8      120        940       10     20         500
9      60         1000      11     60         560
                            12     20         580
                            13     20         600
                            14     80         680
                            15     80         760
                            16     20         780
                            17     80         860
                            18     80         940
                            19     60         1000
Frequency histograms express the column distribution exactly, so the "(column value, frequency)" pairs give us every opportunity to estimate the cardinality of any kind of operation. Now we have to try to estimate the pure cardinality Card_Q1; then we can find Jsel according to formula (3). First we have to find the common data for the join columns. These data lie between max(min_value(j1), min_value(j2)) and min(max_value(j1), max_value(j2)), which means we are not interested in j2 values greater than 9. We also have to take only the matching values, so we get the following table:
tab t1, col j1         tab t2, col j2
value  frequency       value  frequency
0      40              0      100
2      80              2      40
3      100             3      120
4      160             4      20
5      60              5      40
6      260             6      100
8      120             8      40
9      60              9      20
Because of this, the expected pure cardinality Card_Q1 will be

100*40 + 80*40 + 100*120 + 160*20 + 60*40 + 260*100 + 120*40 + 60*20 = 56800

and the join selectivity

Jsel = Card_Q1 / (num_rows(t1) * num_rows(t2)) = 56800 / (1000 * 1000) = 0.0568

And eventually, according to formula (2), our cardinality will be

Card_Q0 = Jsel * Card(t1) * Card(t2) = 0.0568 * 40 * 1000 = 2272
Also, if we enable the 10053 event, we will see the following lines in the trace file regarding the join selectivity:

Join Card: 2272.000000 = outer (40.000000) * inner (1000.000000) * sel (0.056800)
Join Card - Rounded: 2272 Computed: 2272.000000
We see the same number as in the execution plan above. Another question was why we did not get the exact cardinality, 2260. Although join selectivity by definition does not depend on the filter columns and conditions, filtering actually influences this process: the optimizer does not consider the join column's value range, min/max values, spread, or number of distinct values after applying the filter (line 3 of the execution plan). This is not easy to resolve; at the least it would require additional estimation algorithms, and then the whole estimation process could become more expensive. If we remove the filter condition from the above query, we get an exact estimate:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 56800 | 56800 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 1000 | 1000 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 1000 | 1000 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
It means the optimizer calculates an "average" join selectivity. I do not think this is an issue in general. As a result we get the following formula for the join selectivity:

Jsel = SUM( freq(t1.j1) * freq(t2.j2) ) / (num_rows(t1) * num_rows(t2))   (7)

Here freq is the frequency of the corresponding column value, and the sum runs over the common values, i.e. from the maximum of the column minimums to the minimum of the column maximums.
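The freq*freq arithmetic of formula (7) can be reproduced directly from the dictionary. Here is a minimal sketch: it reuses the decoding logic of the Freq_values1 query above and hard-codes num_rows(t1) = num_rows(t2) = 1000 from our example:

WITH h1 AS (SELECT endpoint_value val,
                   endpoint_number
                     - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_number), 0) freq
              FROM user_tab_histograms
             WHERE table_name = 'T1' AND column_name = 'J1'),
     h2 AS (SELECT endpoint_value val,
                   endpoint_number
                     - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_number), 0) freq
              FROM user_tab_histograms
             WHERE table_name = 'T2' AND column_name = 'J2')
-- the join on val keeps only the common values; the sum is Card_Q1 = 56800
SELECT SUM (h1.freq * h2.freq) pure_card,
       SUM (h1.freq * h2.freq) / (1000 * 1000) jsel
  FROM h1, h2
 WHERE h1.val = h2.val;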
Case 2. Join columns with height-balanced (equi-height) and frequency histograms
Now assume that one of the join columns has a height-balanced (HB) histogram and the other has a frequency (FQ) histogram (Height_Balanced_Frequency.sql). We are going to investigate the cardinality estimation of the two queries here:
select count(*)
from t1, t2 --- (Case2 q1)
where t1.j1 = t2.j2;
select count(*)
from t1, t2 --- (Case2 q2)
where t1.j1 = t2.j2 and t1.f1=11;
Column j1 has a height-balanced (HB) histogram and column j2 has a frequency (FQ) histogram. The corresponding information from the user_tab_histograms dictionary view is shown in Table 3.
tab t1, col j1                     tab t2, col j2
column value  frequency  ep        column value  frequency  ep
1             0          0         1             2          2
9             1          1         7             2          4
16            1          2         48            3          7
24            1          3         64            4          11
32            1          4
40            1          5
48            2          7
56            1          8
64            2          10
72            2          12
80            3          15
The frequency column for t1.j1 in Table 3 does not express the real frequency of the value; it is actually the "frequency of the bucket". First we have to identify the common values, so we have to ignore HB histogram buckets with endpoint numbers greater than 10. We have exact "(value, frequency)" pairs for the t2.j2 column, therefore our base source must be the values of the t2.j2 column. But for t1.j1 we do not have exact frequencies. An HB histogram contains buckets that hold approximately the same number of rows. We can also find the number of distinct values per bucket. Then for every value of the frequency histogram we can identify the appropriate bucket of the HB histogram, assume uniform distribution within that bucket, and estimate the size of the disjoint subset {value of FQ, bucket of HB}.
Although this approach gave me an approximation of the join cardinality, it did not give the exact numbers the Oracle optimizer calculates and reports in the 10053 trace file. What information do we need to improve this approach?
Alberto Dell'Era first investigated joins based on histograms in 2007 (Join Over Histograms). His approach was based on grouping values into three major categories:
- "populars matching populars"
- "populars not matching populars"
- "not popular subtables"
and estimating each of them; the sum of the cardinalities of these groups gives the join cardinality. But my point of view on the matter is quite different:
- We have to identify "(distinct value, frequency)" pairs to approximate the "pure" cardinality Card_Q1.
- Our main data here is the t2.j2 column's data, because it gives us exact frequencies.
- We walk the t2.j2 column (histogram) values and identify the second part of each "(distinct value, frequency)" pair based on the height-balanced histogram.
- Card_Q1 = SUM( Freq_hb_t1(value = t2.j2) * Freq_t2(value = t2.j2) )
- Then we can calculate the join selectivity and cardinality.
Once we identify the "(value, frequency)" pairs based on the HB histogram, it is easy to calculate the "pure" cardinality, and so we can estimate the join cardinality easily and more accurately. But when forming the "(value, frequency)" pairs based on the HB histogram, we should not assume uniformity for a single value located within a bucket: for unpopular values the HB histogram actually gives us an "average" density, NewDensity (the density term was introduced to avoid estimation errors in the non-uniform distribution case and has been improved with the new density mechanism), and a special approach is used for popular values. So let's identify the "(value, frequency)" pairs based on the HB histogram.
tab_name   num_rows (user_tables)      col_name   num_distinct
T1         130                         T1.J1      30
T2         11                          T2.J2      4
Number of buckets: num_buckets = 15 (max(ep) from Table 3)
Number of popular buckets: num_pop_buckets = 9 (sum(frequency) from Table 3 where frequency > 1)
Popular value count: pop_value_cnt = 4 (count(frequency) from Table 3 where frequency > 1)
NewDensity = num_unpop_buckets / (unpop_ndv * num_buckets)
           = (num_buckets - num_pop_buckets) / ((NDV - pop_value_cnt) * num_buckets)
           = (15 - 9) / ((30 - 4) * 15) = 0.015384615 ≈ 0.015385   (8)
And for popular values the selectivity is:

ep_frequency / num_buckets = ep_frequency / 15
So the "(value, frequency)" pairs based on the HB histogram will be:

column value   popular   frequency     calculated as
1              N         2.00005       130 * 0.015385 (num_rows * density)
7              N         2.00005       130 * 0.015385 (num_rows * density)
48             Y         17.33333333   130 * 2 / 15 (num_rows * frequency / num_buckets)
64             Y         17.33333333   130 * 2 / 15 (num_rows * frequency / num_buckets)
We have got all the "(value, frequency)" pairs, so according to formula (7) we can calculate the join selectivity:

tab t1, col j1               tab t2, col j2
column value  frequency      column value  frequency   freq*freq
1             2.00005        1             2           4.0001
7             2.00005        7             2           4.0001
48            17.33333333    48            3           52
64            17.33333333    64            4           69.33333
                                           sum         129.3335

And finally:

Jsel = 129.3335 / (num_rows(t1) * num_rows(t2)) = 129.3335 / (130 * 11) = 0.090443
So our "pure" cardinality is Card_Q1 = 129. The execution plan of the query is as follows:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 129 | 104 |
| 3 | TABLE ACCESS FULL| T2 | 1 | 11 | 11 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 130 | 130 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
And the corresponding information from the 10053 trace file:
Join Card: 129.333333 = outer (130.000000) * inner (11.000000) * sel (0.090443)
Join Card - Rounded: 129 Computed: 129.333333
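The NewDensity arithmetic of formula (8) can also be cross-checked against the dictionary. The following is a minimal sketch; it assumes the HB histogram on T1.J1 from Table 3 and reads the NDV and bucket count from user_tab_col_statistics:

WITH b AS (SELECT endpoint_number
                    - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_number), 0) bkt_freq
             FROM user_tab_histograms
            WHERE table_name = 'T1' AND column_name = 'J1'),
     s AS (SELECT num_distinct ndv, num_buckets
             FROM user_tab_col_statistics
            WHERE table_name = 'T1' AND column_name = 'J1')
-- (num_buckets - num_pop_buckets) / ((NDV - pop_value_cnt) * num_buckets)
SELECT (s.num_buckets - SUM (CASE WHEN b.bkt_freq > 1 THEN b.bkt_freq ELSE 0 END))
         / ((s.ndv - COUNT (CASE WHEN b.bkt_freq > 1 THEN 1 END)) * s.num_buckets)
         new_density
  FROM b, s
 GROUP BY s.num_buckets, s.ndv;

For our data this returns (15 - 9) / ((30 - 4) * 15) = 0.015385.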
It means we were able to figure out the exact estimation mechanism in this case. The execution plan of the second query (Case2 q2) is as follows:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 5 | 7 |
|* 3 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 11 | 11 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
3 - filter("T1"."F1"=11)
According to our approach, the join cardinality should be computed as

Jsel * Card(t1) * Card(t2) = 0.090443 * Card(t1) * Card(t2)

And indeed, in the optimizer trace file we see the following:
Join Card: 5.173333 = outer (11.000000) * inner (5.200000) * sel (0.090443)
Join Card - Rounded: 5 Computed: 5.173333
It actually confirms our approach. The execution plan shows the cardinality of the single table t1 as 5, which is correct because it must be rounded, but during the join estimation process the optimizer considers the original values rather than the rounded ones.
Reviewing Alberto Dell'Era's "complete" formula (join_histogram_complete.sql)
We can list the column information from the dictionary as below:

tab t1, col value           tab t2, col value
column value  frequency     column value  frequency
20            1             10            1
40            1             30            2
50            1             50            1
60            1             60            4
70            2             70            2
                            80            2
                            90            1
                            99            1
So we have to find the common values. As you can see, min(t1.value) = 20, so we must ignore t2.value = 10; and max(t1.value) = 70, so we must ignore the column values t2.value > 70. In addition, we do not have the value 40 in t2.value, therefore we have to delete it as well. Because of this we get the following table:
tab t1, col value           tab t2, col value
column value  frequency     column value  frequency
20            1             30            2
50            1             50            1
60            1             60            4
70            2             70            2
num_rows(t1) = 12; num_buckets(t1.value) = 6; num_distinct(t1.value) = 8, so

NewDensity = num_unpop_buckets / (unpop_ndv * num_buckets) = (6 - 2) / ((8 - 1) * 6) = 0.095238095

and the corresponding column value frequencies based on the HB histogram will be:

t1.value   freq          calculated as
30         1.142857143   num_rows * NewDensity
50         1.142857143   num_rows * NewDensity
60         1.142857143   num_rows * NewDensity
70         4             num_rows * freq / num_buckets
And finally the cardinality will be:

t1.value                    t2.value
column value  frequency     column value  frequency   freq*freq
30            1.142857143   30            2           2.285714286
50            1.142857143   50            1           1.142857143
60            1.142857143   60            4           4.571428571
70            4             70            2           8
                                          sum         16
The optimizer also estimated exactly 16, as we can see in the execution plan of the query:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 16 | 13 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 12 | 12 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 14 | 14 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."VALUE"="T2"."VALUE")
In that work Alberto also introduced "Contribution 4: special cardinality", but it seems it is not necessary.
Reviewing Alberto Dell'Era's essential case (join_histogram_essentials.sql)
This is quite an interesting case: firstly because in Oracle 12c the optimizer calculates the join cardinality as 31, not as 30, and secondly because in this case the old and new densities are the same. Let's interpret the case. The corresponding information from user_tab_histograms:
tab t1, col value                  tab t2, col value
column value  frequency  ep        column value  frequency  ep
10            2          2         10            2          2
20            1          3         20            1          3
30            2          5         50            3          6
40            1          6         60            1          7
50            1          7         70            4          11
60            1          8
70            2          10
And num_rows(t1) = 20, num_rows(t2) = 11, num_dist(t1.value) = 11, num_dist(t2.value) = 5, density(t1.value) = (10 - 6)/((11 - 3) * 10) = 0.05. The mechanism above does not give us exactly the number the optimizer estimates, because in this case, to estimate the frequency of unpopular values, Oracle does not use the density: it uses the number of distinct values per bucket and the number of rows per distinct value instead. To prove this we can use join_histogram_essentials1.sql. Here table t1 is the same as in join_histogram_essentials.sql, and column t2.value has only one value, 20, with frequency one.
t1.value  freq  ep        t2.value  freq  ep
10        2     2         20        1     1
20        1     3
30        2     5
40        1     6
50        1     7
60        1     8
70        2     10
In this case Oracle computes the join cardinality as 2, rounded up from 1.818182. We can see it in the trace file:

Join Card: 1.818182 = outer (20.000000) * inner (1.000000) * sel (0.090909)
Join Card - Rounded: 2 Computed: 1.818182
It looks like the estimation depends entirely on the t1.value column distribution. So num_rows_bucket (the number of rows per bucket) is 2, and the number of rows per distinct value is 20/11 = 1.818182: every bucket holds 1.1 distinct values, and within a bucket every distinct value has 2/1.1 = 1.818182 rows. And this is our cardinality. But let us increase the frequency of the t2 value (join_histogram_essentials2.sql): now (t2.value, frequency) = (20, 5) and t1 is the same as in the previous case.
t1.value  freq  ep        t2.value  freq  ep
10        2     2         20        5     5
20        1     3
30        2     5
40        1     6
50        1     7
60        1     8
70        2     10
The corresponding lines from the 10053 trace file:
Join Card: 5.000000 = outer (20.000000) * inner (5.000000) * sel (0.050000)
Join Card - Rounded: 5 Computed: 5.000000
Tests show that in such cases the join cardinality is computed as the frequency of the t2 value. It means the frequency of the non-popular t1 value will be:

Frequency(non-popular t1.value) =
    num_rows_bucket / num_dist_bucket   if frequency of t2.value = 1
    1                                   if frequency of t2.value > 1

or

Cardinality = max(frequency of t2.value, number of rows per distinct value within the bucket)
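A minimal sketch of this rule applied to the two variants above (the numbers are hard-coded from the example: num_rows(t1) = 20, num_dist(t1.value) = 11, so the number of rows per distinct value is 20/11):

-- max(frequency of t2.value, rows per distinct value within the bucket)
SELECT GREATEST (1, 20 / 11) card_essentials1,  -- freq(t2.value) = 1: 1.818182
       GREATEST (5, 20 / 11) card_essentials2   -- freq(t2.value) = 5: 5
  FROM DUAL;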
The question is why? In such cases I think the optimizer tries to minimize estimation errors. So:
tab t1, col value
column value  frequency     calculated as
10            4             num_rows * frequency / num_buckets
20            1.818181818   num_rows_bucket / num_dist_bucket
50            1             frequency of t2.value
60            1.818181818   num_rows_bucket / num_dist_bucket
70            4             num_rows * frequency / num_buckets
Therefore:

tab t1, col value           tab t2, col value
column value  frequency     column value  frequency   freq*freq
10            4             10            2           8
20            1.818181818   20            1           1.818181818
50            1             50            3           3
60            1.818181818   60            1           1.818181818
70            4             70            4           16
                                          sum         30.63636364
We get 30.64 ≈ 31 as the expected cardinality. Let's see the trace file and execution plan:
Join Card: 31.000000 = outer (11.000000) * inner (20.000000) * sel (0.140909)
Join Card - Rounded: 31 Computed: 31.000000
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 31 | 29 |
| 3 | TABLE ACCESS FULL| T2 | 1 | 11 | 11 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 20 | 20 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."VALUE"="T2"."VALUE")
Case 3. Join columns with hybrid and frequency histograms
In this case we are going to analyze how the optimizer calculates join selectivity when hybrid and frequency histograms are available on the join columns (hybrid_freq.sql). Note that the query is the same as (Case2 q1). The corresponding information from the dictionary view:
SELECT endpoint_value COLUMN_VALUE,
endpoint_number - NVL (prev_endpoint, 0) frequency,
ENDPOINT_REPEAT_COUNT,
endpoint_number
FROM (SELECT endpoint_number,
ENDPOINT_REPEAT_COUNT,
NVL (LAG (endpoint_number, 1) OVER (ORDER BY
endpoint_number),0)
prev_endpoint,
endpoint_value
FROM user_tab_histograms
WHERE table_name = 'T3' AND column_name = 'J3')
ORDER BY endpoint_number
tab t1, col j1                                tab t2, col j2
column value  frequency  endpoint_rep_cnt     column value  frequency
0             6          6                    0             3
2             9          7                    1             6
4             8          5                    2             6
6             8          5                    3             8
7             7          7                    4             11
9             10         5                    5             3
10            6          6                    6             3
11            3          3                    7             9
12            7          7                    8             6
13            4          4                    9             5
14            5          5
15            5          5
16            5          5
17            7          7
19            10         5
As can be seen, the common column values are between 0 and 9, so we are not interested in the buckets that contain column values greater than or equal to 10. A hybrid histogram gives us more information for estimating single-table and join selectivity than a height-balanced histogram; in particular, the endpoint repeat count column is used by the optimizer to estimate the endpoint values exactly. But how does the optimizer use this information to estimate the join? The principle of forming "(value, frequency)" pairs based on a hybrid histogram is the same as for a height-balanced histogram: it depends on the popularity of the value. If the value is popular, its frequency will be equal to the corresponding endpoint repeat count; otherwise it is calculated based on the density. If we enable dbms_stats trace when gathering the hybrid histogram, we get the following:
DBMS_STATS:
SELECT SUBSTRB (DUMP (val, 16, 0, 64), 1, 240) ep,
freq,
cdn,
ndv,
(SUM (pop) OVER ()) popcnt,
(SUM (pop * freq) OVER ()) popfreq,
SUBSTRB (DUMP (MAX (val) OVER (), 16, 0, 64), 1, 240) maxval,
SUBSTRB (DUMP (MIN (val) OVER (), 16, 0, 64), 1, 240) minval
FROM (SELECT val,
freq,
(SUM (freq) OVER ()) cdn,
(COUNT ( * ) OVER ()) ndv,
(CASE
WHEN freq > ( (SUM (freq) OVER ()) / 15) THEN 1
ELSE 0
END)
pop
FROM (SELECT /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
xmlindex_sel_idx_tbl no_substrb_pad */
"ID"
val,
COUNT ("ID") freq
FROM "SYS"."T1" t
WHERE "ID" IS NOT NULL
GROUP BY "ID"))
ORDER BY val
DBMS_STATS: > cdn 100, popFreq 28, popCnt 4, bktSize 6.6, bktSzFrc .6
DBMS_STATS: Evaluating hybrid histogram: cht.count 15, mnb 15, ssize 100, min_ssize
2500, appr_ndv TRUE,
ndv 20, selNdv 0, selFreq 0, pct 100, avg_bktsize 7, csr.hreq TRUE, normalize TRUE
The average bucket size is 7. Oracle considers a value popular when the corresponding endpoint repeat count is greater than or equal to the average bucket size. Also, in our case the density is (cdn - popFreq)/((NDV - popCnt) * cdn) = (100 - 28)/((20 - 4) * 100) = 0.045. If we enable the 10053 trace event you can clearly see the column and table statistics. Therefore the "(value, frequency)" pairs will be:
t1.j1   popular   frequency   calculated as
0       N         4.5         density * num_rows
1       N         4.5         density * num_rows
2       Y         7           endpoint_repeat_count
3       N         4.5         density * num_rows
4       N         4.5         density * num_rows
5       N         4.5         density * num_rows
6       N         4.5         density * num_rows
7       Y         7           endpoint_repeat_count
8       N         4.5         density * num_rows
9       N         4.5         density * num_rows
And then the final cardinality:
t1.j1                  t2.j2
value   frequency      value   frequency   freq*freq
0       4.5            0       3           13.5
1       4.5            1       6           27
2       7              2       6           42
3       4.5            3       8           36
4       4.5            4       11          49.5
5       4.5            5       3           13.5
6       4.5            6       3           13.5
7       7              7       9           63
8       4.5            8       6           27
9       4.5            9       5           22.5
                               sum         307.5
                               join sel    0.05125
Let's now check the execution plan:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 308 | 293 |
| 3 | TABLE ACCESS FULL| T2 | 1 | 60 | 60 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 100 | 100 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
And from the 10053 trace file:
Join Card: 307.500000 = outer (60.000000) * inner (100.000000) * sel (0.051250)
Join Card - Rounded: 308 Computed: 307.500000
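The per-value frequencies derived above can be sketched as a single dictionary query. This is only a sketch, with num_rows = 100, density = 0.045 and the popularity threshold avg_bktsize = 7 hard-coded from this example, and it assumes the hybrid histogram sits on T1.J1 as the tables are labeled here; note that it lists only the histogram endpoint values, while common values that are not endpoints also fall into the unpopular branch:

SELECT endpoint_value val,
       CASE
          WHEN endpoint_repeat_count >= 7  -- popular value
             THEN endpoint_repeat_count
          ELSE 100 * 0.045                 -- unpopular: num_rows * density
       END est_freq
  FROM user_tab_histograms
 WHERE table_name = 'T1' AND column_name = 'J1'
 ORDER BY endpoint_value;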
Case 4. Both Join columns with Top frequency histograms
In this case both join columns have top-frequency histograms (TopFrequency_hist.sql). We are going to use the same query as above, (Case2 q1). The corresponding column information is:
Table Stats::
Table: T2 Alias: T2
#Rows: 201 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00
SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): J2(NUMBER)
AvgLen: 4 NDV: 21 Nulls: 0 Density: 0.004975 Min: 1.000000 Max: 200.000000
Histogram: Top-Freq #Bkts: 192 UncompBkts: 192 EndPtVals: 12 ActualVal: yes
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 65 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00
SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): J1(NUMBER)
AvgLen: 3 NDV: 14 Nulls: 0 Density: 0.015385 Min: 4.000000 Max: 100.000000
Histogram: Top-Freq #Bkts: 56 UncompBkts: 56 EndPtVals: 5 ActualVal: yes
t1.j1   freq      t2.j2   freq
4       10        1       14
5       16        2       18
6       17        3       18
8       12        4       17
100     1         5       15
                  6       19
                  7       19
                  8       22
                  9       17
                  10      18
                  11      13
                  200     2
By the definition of a top-frequency histogram there are two types of buckets: Oracle placed the high-frequency values into their own buckets, and the remaining values of the table are effectively "placed" into one more "bucket". So we actually have "high frequency" and "low frequency" values. For the "high frequency" values we have exact frequencies, while for the "low frequency" values we can assume uniform distribution. First we have to build the high-frequency pairs based on the common values: max(min(t1.j1), min(t2.j2)) = 4 and min(max(t1.j1), max(t2.j2)) = 100, so in principle we have to gather the common values between 4 and 100. After identifying the common values, we use the exact frequency for popular values and the new density for non-popular values. Therefore we can create the following table:
common value   j2.freq   j1.freq    freq*freq
4              17        10         170
5              15        16         240
6              19        17         323
7              19        1.000025   19.000475
8              22        12         264
9              17        1.000025   17.000425
10             18        1.000025   18.00045
11             13        1.000025   13.000325
100            1         1.000025   1.000025
                         sum        1065.0017
For table t1 we have num_rows - popular_rows = 65 - 56 = 9 unpopular rows and ndv - popular_value_count = 14 - 5 = 9 unpopular distinct values; for table t2 we have 201 - 192 = 9 unpopular rows and 21 - 12 = 9 unpopular distinct values. The frequency of each unpopular value of t2.j2 is num_rows(t2) * density(t2) = 201 * 0.004975 = 0.999975, and for t1.j1 it is num_rows(t1) * density(t1) = 65 * 0.015385 = 1.000025. So the cardinality of each individual unpopular pair is CardIndvPair = unpopular_freq(t1.j1) * unpopular_freq(t2.j2) = 0.999975 * 1.000025 = 1.
Test cases show that Oracle considers all low-frequency (unpopular) values during the join when top-frequency histograms are available, which means the cardinality for the "low frequency" values will be
Card(low frequency values) = max(unpopular_rows(t1.j1), unpopular_rows(t2.j2)) * CardIndvPair = 9
Therefore the final cardinality of our join is Card(high freq values) + Card(low freq values) = 1065 + 9 = 1074. Let's see the execution plan:
-----------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
-----------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 6 | 4 (0)|
| 1 | SORT AGGREGATE | | 1 | 6 | |
|* 2 | HASH JOIN | | 1074 | 6444 | 4 (0)|
| 3 | TABLE ACCESS FULL| T1 | 65 | 195 | 2 (0)|
| 4 | TABLE ACCESS FULL| T2 | 201 | 603 | 2 (0)|
-----------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
And the trace file
Join Card: 1074.000000 = outer (201.000000) * inner (65.000000) * sel (0.082204)
Join Card - Rounded: 1074 Computed: 1074.000000
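The whole Case 4 arithmetic can be condensed into one expression; this is a sketch with the numbers of the example hard-coded:

SELECT 1065                             -- SUM(freq*freq) over the common top values
       + GREATEST (65 - 56, 201 - 192)  -- max(unpopular_rows(t1), unpopular_rows(t2))
         * (65 * 0.015385)              -- unpopular frequency of t1.j1
         * (201 * 0.004975)             -- unpopular frequency of t2.j2
         est_join_card                  -- = 1074
  FROM DUAL;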
Case 5. Join columns with Top frequency and frequency histograms
Now consider tables whose join columns have top-frequency and frequency histograms (TopFrequency_Frequency.sql). The column distributions from the dictionary (Freq_values1) are as follows:
t1.j1   freq      t2.j2   freq
1       3         0       4
2       3         1       7
3       5         2       2
4       5         4       3
5       5
6       4
7       6
8       4
9       5
25      1
In this case there is a frequency histogram on column t2.j2, and the exact common values are {1, 2, 4}. But test cases show that the optimizer also considers all the values from the top-frequency histogram that lie between max(min(t1.j1), min(t2.j2)) and min(max(t1.j1), max(t2.j2)). This is quite an interesting case: we have a frequency histogram, it should be our main source, and this case should have been handled like Case 3.
Table Stats::
Table: T2 Alias: T2
#Rows: 16 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00
SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient:
0.000000
Column (#1): J2(NUMBER)
AvgLen: 3 NDV: 4 Nulls: 0 Density: 0.062500 Min: 0.000000 Max: 4.000000
Histogram: Freq #Bkts: 4 UncompBkts: 16 EndPtVals: 4 ActualVal: yes
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 42 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00
SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient:
0.000000
Column (#1): J1(NUMBER)
AvgLen: 3 NDV: 11 Nulls: 0 Density: 0.023810 Min: 1.000000 Max: 25.000000
Histogram: Top-Freq #Bkts: 41 UncompBkts: 41 EndPtVals: 10 ActualVal: yes
The considered values and their frequencies:
considered value   j1.freq   j2.freq   freq*freq
1                  3         7         21
2                  3         2         6
3                  5         1         5
4                  5         3         15
                             sum       47
Here, for the value 3, j2.freq is calculated as num_rows(t2) * density = 16 * 0.0625 = 1. And in the 10053 trace file:
Join Card: 47.000000 = outer (16.000000) * inner (42.000000) * sel (0.069940)
Join Card - Rounded: 47 Computed: 47.000000
Let's see another example, TopFrequency_Frequency2.sql:
t1.j1   freq      t3.j3   freq
1       3         0       4
2       3         1       7
3       5         2       2
4       5         4       3
5       5         10      2
6       4
7       6
8       4
9       5
25      1
Table and column statistics:
Table Stats::
Table: T3 Alias: T3
#Rows: 18 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC:
0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): J3(NUMBER)
AvgLen: 3 NDV: 5 Nulls: 0 Density: 0.055556 Min: 0.000000 Max: 10.000000
Histogram: Freq #Bkts: 5 UncompBkts: 18 EndPtVals: 5 ActualVal: yes
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 42 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC:
0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): J1(NUMBER)
AvgLen: 3 NDV: 11 Nulls: 0 Density: 0.023810 Min: 1.000000 Max: 25.000000
Histogram: Top-Freq #Bkts: 41 UncompBkts: 41 EndPtVals: 10 ActualVal: yes
The considered column values and their frequencies are:
considered value   j1.freq   calculated         j3.freq    calculated         freq*freq
1                  3         freq               7          freq               21
2                  3         freq               2          freq               6
3                  5         freq               1.000008   num_rows*density   5.00004
4                  5         freq               3          freq               15
5                  5         freq               1.000008   num_rows*density   5.00004
6                  4         freq               1.000008   num_rows*density   4.000032
7                  6         freq               1.000008   num_rows*density   6.000048
8                  4         freq               1.000008   num_rows*density   4.000032
9                  5         freq               1.000008   num_rows*density   5.00004
10                 1.00002   num_rows*density   2          freq               2.00004
                                                           sum                73.000272
And from the trace file:
Join Card: 73.000000 = outer (18.000000) * inner (42.000000) * sel (0.096561)
Join Card - Rounded: 73 Computed: 73.000000
But if we compare the estimated cardinality with the actual values, we see:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 73 | 42 |
| 3 | TABLE ACCESS FULL| T3 | 1 | 18 | 18 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 42 | 42 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T3"."J3")
As we see, there is a significant difference, 73 vs 42; the estimation error is quite big. That is why we said before that this is quite an interesting case: the optimizer should consider only the values from the frequency histogram; those values should be the main source of the estimation process, as in Case 3. If we consider and walk only the values of the frequency histogram as the common values, we get the following table:
common value   j1.freq   calculated         j3.freq   calculated   freq*freq
1              3         freq               7         freq         21
2              3         freq               2         freq         6
4              5         freq               3         freq         15
10             1.00002   num_rows*density   2         freq         2.00004
                                                      sum          44.00004
You can clearly see that such an estimate is very close to the actual number of rows.
Case 6. Join columns with Hybrid and Top frequency histograms
This case is quite hard to interpret when one of the join columns has a top-frequency histogram (Hybrid_topfreq.sql). In this example there is a hybrid histogram on t1.j1 and a top-frequency histogram on t2.j2. The column information from the dictionary:
t1.j1   t1.freq   ep_rep_cnt      t2.j2   j2.freq
1       3         3               1       5
3       10        6               2       3
4       6         6               3       4
6       5         2               4       5
9       8         5               5       4
10      1         1               6       3
11      2         2               7       1
13      5         3               26      1
                                  30      1
And from the dbms_stats trace file:
SELECT SUBSTRB (DUMP (val, 16, 0, 64), 1, 240) ep,
freq,
cdn,
ndv,
(SUM (pop) OVER ()) popcnt,
(SUM (pop * freq) OVER ()) popfreq,
SUBSTRB (DUMP (MAX (val) OVER (), 16, 0, 64), 1, 240) maxval,
SUBSTRB (DUMP (MIN (val) OVER (), 16, 0, 64), 1, 240) minval
FROM (SELECT val,
freq,
(SUM (freq) OVER ()) cdn,
(COUNT ( * ) OVER ()) ndv,
(CASE WHEN freq > ( (SUM (freq) OVER ()) / 8) THEN 1 ELSE 0 END)
pop
FROM (SELECT /*+ no_parallel(t) no_parallel_index(t) dbms_stats
cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring
xmlindex_sel_idx_tbl no_substrb_pad */
"J1"
val,
COUNT ("J1") freq
FROM "T"."T1" t
WHERE "J1" IS NOT NULL
GROUP BY "J1"))
ORDER BY val
DBMS_STATS: > cdn 40, popFreq 12, popCnt 2, bktSize 5, bktSzFrc 0
DBMS_STATS: Evaluating hybrid histogram: cht.count 8, mnb 8, ssize 40, min_ssize 2500,
appr_ndv TRUE, ndv 13, selNdv 0, selFreq 0, pct 100, avg_bktsize 5, csr.hreq TRUE,
normalize TRUE
The high-frequency common values are located between 1 and 7. Also we have two popular values for the t1.j1 column: {3, 4}.
Table Stats::
Table: T2 Alias: T2
#Rows: 30 SSZ: 0 LGR: 0 #Blks: 5 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC
0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0
IMCQuotient: 0.000000
Column (#1): J2(NUMBER)
AvgLen: 3 NDV: 12 Nulls: 0 Density: 0.033333 Min:
1.000000 Max: 30.000000
Histogram: Top-Freq #Bkts: 27 UncompBkts: 27
EndPtVals: 9 ActualVal: yes
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 40 SSZ: 0 LGR: 0 #Blks: 5 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC
0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0
IMCQuotient: 0.000000
Column (#1): J1(NUMBER)
AvgLen: 3 NDV: 13 Nulls: 0 Density: 0.063636 Min:
1.000000 Max: 13.000000
Histogram: Hybrid #Bkts: 8 UncompBkts: 40
EndPtVals: 8 ActualVal: yes
Therefore the common values and their frequencies are:
common value   t1.freq   t2.freq   freq*freq
1              2.54544   5         12.7272
2              2.54544   3         7.63632
3              6         4         24
4              6         5         30
5              2.54544   4         10.18176
6              2.54544   3         7.63632
7              2.54544   1         2.54544
                         sum       94.72704
Moreover we have num_rows - top_freq_rows = 30 - 27 = 3 low-frequency rows and NDV - top_freq_count = 12 - 9 = 3 unpopular distinct values. I have done several test cases, and I think the join cardinality in this case consists of two parts: the high-frequency values and the low-frequency (unpopular) values. In different cases, estimating the cardinality of the low-frequency values worked out differently for me; in the current case I think it is based on uniform distribution. For t1.j1 the "average frequency" is num_rows(t1)/NDV(j1) = 40/13 = 3.076923. Also we have 3 unpopular (low frequency) rows and 3 unpopular distinct values. For each "low frequency" value we have num_rows(t1) * density(j1) = 2.54544 ≈ 3 as the frequency, and we have 3 low-frequency (unpopular) rows, therefore the unpopular cardinality is 3 * 3 = 9, so the final cardinality will be

CARD(popular rows) + CARD(unpopular rows) = 94.72704 + 9 = 103.72704

The lines from the 10053 trace file:
Join Card: 103.727273 = outer (30.000000) * inner (40.000000) * sel (0.086439)
Join Card - Rounded: 104 Computed: 103.727273
And the execution plan:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 104 | 101 |
| 3 | TABLE ACCESS FULL| T2 | 1 | 30 | 30 |
| 4 | TABLE ACCESS FULL| T1 | 1 | 40 | 40 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
The above test case was quite simple because the popular values of the hybrid histogram are also located within the range of the high-frequency values of the top-frequency histogram: the popular values {3, 4} of the hybrid histogram actually lie within the 1-7 range of the top-frequency histogram.
Let's see another example:
CREATE TABLE t1(j1 NUMBER);
INSERT INTO t1 VALUES(6);
INSERT INTO t1 VALUES(2);
INSERT INTO t1 VALUES(7);
INSERT INTO t1 VALUES(8);
INSERT INTO t1 VALUES(7);
INSERT INTO t1 VALUES(1);
INSERT INTO t1 VALUES(3);
INSERT INTO t1 VALUES(6);
INSERT INTO t1 VALUES(4);
INSERT INTO t1 VALUES(7);
INSERT INTO t1 VALUES(2);
INSERT INTO t1 VALUES(3);
INSERT INTO t1 VALUES(7);
INSERT INTO t1 VALUES(9);
INSERT INTO t1 VALUES(5);
INSERT INTO t1 VALUES(6);
INSERT INTO t1 VALUES(17);
INSERT INTO t1 VALUES(18);
INSERT INTO t1 VALUES(19);
INSERT INTO t1 VALUES(20);
COMMIT;
/* execute dbms_stats.set_global_prefs('trace',to_char(512+128+2048+32768+4+8+16)); */
execute dbms_stats.gather_table_stats(null,'t1',method_opt=>'for all columns size 8');
/*exec dbms_stats.set_global_prefs('TRACE', null);*/
---Creating second table
CREATE TABLE t2(j2 number);
INSERT INTO t2 VALUES(1);
INSERT INTO t2 VALUES(1);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(3);
INSERT INTO t2 VALUES(3);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(1);
INSERT INTO t2 VALUES(3);
INSERT INTO t2 VALUES(4);
INSERT INTO t2 VALUES(2);
INSERT INTO t2 VALUES(3);
INSERT INTO t2 VALUES(2);
INSERT INTO t2 VALUES(17);
INSERT INTO t2 VALUES(18);
INSERT INTO t2 VALUES(19);
INSERT INTO t2 VALUES(20);
COMMIT;
execute dbms_stats.gather_table_stats(null,'t2',method_opt=>'for all columns size 4');
Let's enable the 10053 trace:
ALTER SESSION SET EVENTS '10053 trace name context forever';
EXPLAIN PLAN
FOR
SELECT COUNT ( * )
FROM t1, t2
WHERE t1.j1 = t2.j2;
SELECT *
FROM table (DBMS_XPLAN.display);
ALTER SESSION SET EVENTS '10053 trace name context off';
Corresponding information from the dictionary:
t1.j1   freq   ep_rep   ep_num      t2.j2   freq   ep_num
1       1      1        1           1       3      3
2       2      2        3           3       4      7
4       3      1        6           4       7      14
6       4      3        10          20      1      15
7       4      4        14
17      3      1        17
18      1      1        18
20      2      1        20
And the lines from the dbms_stats trace:
DBMS_STATS: > cdn 20, popFreq 7, popCnt 2, bktSize 2.4, bktSzFrc .4
DBMS_STATS: Evaluating hybrid histogram: cht.count 8, mnb 8, ssize 20,
min_ssize 2500, appr_ndv TRUE, ndv 13, selNdv 0, selFreq 0, pct 100, avg_bktsize
3, csr.hreq TRUE, normalize TRUE
DBMS_STATS: Histogram gathering flags: 527
DBMS_STATS: Accepting histogram
DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE
So our average bucket size is 3 and we have 2 popular values, {6, 7}. These values are not part of the high-frequency values in the top-frequency histogram. The table and column statistics from the optimizer trace file:
Table Stats::
Table: T2 Alias: T2
#Rows: 20 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00
SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient:
0.000000
Column (#1): J2(NUMBER)
AvgLen: 3 NDV: 8 Nulls: 0 Density: 0.062500 Min: 1.000000 Max: 20.000000
Histogram: Top-Freq #Bkts: 15 UncompBkts: 15 EndPtVals: 4 ActualVal: yes
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 20 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00
SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient:
0.000000
Column (#1): J1(NUMBER)
AvgLen: 3 NDV: 13 Nulls: 0 Density: 0.059091 Min: 1.000000 Max: 20.000000
Histogram: Hybrid #Bkts: 8 UncompBkts: 20 EndPtVals: 8 ActualVal: yes
---Join Cardinality
SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
Join Card: 31.477273 = outer (20.000000) * inner (20.000000) * sel (0.078693)
Join Card - Rounded: 31 Computed: 31.477273
First let's calculate the cardinality for the high-frequency values:
high freq value   j2.freq   j1.freq   freq*freq
1                 3         1.18182   3.54546
3                 4         1.18182   4.72728
4                 7         1.18182   8.27274
20                1         1.18182   1.18182
                            sum       17.7273
So our cardinality for the high-frequency values is 17.7273. We also have num_rows(t2) - popular_rows(t2) = 20 - 15 = 5 unpopular rows. But as you can see, Oracle computed the final cardinality as 31. In my opinion the popular rows of the hybrid histogram play a role here: test cases show that in such situations the optimizer also tries to take advantage of the popular values. In our case the values 6 and 7 are popular and the popular frequency is 7 (the sum of the popular frequencies). If we try to find the frequencies of these values based on the top-frequency histogram, we have to use the density. So the cardinality for the popular values will be:

popular_frequency * num_rows * density(j2) = 7 * 20 * 0.0625 = 8.75

Moreover, every "low frequency" value has frequency 1.18182 ≈ 1, and we have 5 "low frequency" values (the unpopular rows of the j2 column), therefore the cardinality of the "low frequency" part can be taken as 5. Eventually we can figure out the final cardinality:

CARD = CARD(high frequency values) + CARD(popular values) + CARD(low frequency rows) = 17.7273 + 8.75 + 5 = 31.4773
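The three parts can be added up in one line; this is a sketch with the example's numbers hard-coded:

SELECT 17.7273            -- high frequency common values: SUM(freq*freq)
       + 7 * 20 * 0.0625  -- popular hybrid values {6,7}: pop_freq * num_rows * density(j2)
       + 5 * 1            -- 5 low frequency rows at about 1 row each
         est_join_card    -- = 31.4773, rounded to 31
  FROM DUAL;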
And the execution plan:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 31 | 26 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 20 | 20 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 20 | 20 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."J1"="T2"."J2")
So this is the expected cardinality. In general, though, there can be estimation or approximation errors here related to rounding.
Sampling Based Estimation
As we know, a new dynamic sampling feature was introduced in Oracle Database 12c. Dynamic sampling level 11 is designed for operations like single table access, group by, and join, for which Oracle automatically defines the sample size and tries to estimate the cardinality of the operations. Let's look at the following example and try to understand the sampling mechanism in join size estimation.
CREATE TABLE t1
AS SELECT * FROM dba_users;
CREATE TABLE t2
AS SELECT * FROM dba_objects;
EXECUTE dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1');
EXECUTE dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1');
SELECT COUNT (*)
FROM t1, t2
WHERE t1.username = t2.owner;
Without a histogram and in default sampling mode the execution plan is:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 92019 | 54942 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 42 | 42 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 92019 | 92019 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."USERNAME"="T2"."OWNER")
Without a histogram and in automatic sampling mode the execution plan is:
---------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |
|* 2 | HASH JOIN | | 1 | 58728 | 54942 |
| 3 | TABLE ACCESS FULL| T1 | 1 | 42 | 42 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 92019 | 92019 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T1"."USERNAME"="T2"."OWNER")
Note
-----
- dynamic statistics used: dynamic sampling (level=AUTO)
As we see, without a histogram there is a significant difference between the actual and estimated rows, but when automatic (adaptive) sampling is enabled the estimate is good enough. The question is: how did the optimizer actually get the cardinality of 58728? How did it calculate it? For the explanation we can use the 10046 and 10053 trace events. In the SQL trace file we see the following lines:
SQL ID: 1bgh7fk6kqxg7
Plan Hash: 3696410285
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2#0") */ 1 AS C1 FROM
"T2" SAMPLE BLOCK(51.8135, 8) SEED(1) "T2#0", "T1" "T1#1" WHERE
("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.06 0.05 0 879 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.06 0.05 0 879 0 1
Misses in library cache during parse: 1
Optimizer mode: CHOOSE
Parsing user id: SYS (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=879 pr=0 pw=0 time=51540 us)
30429 HASH JOIN (cr=879 pr=0 pw=0 time=58582 us cost=220 size=1287306 card=47678)
42 TABLE ACCESS FULL T1 (cr=3 pr=0 pw=0 time=203 us cost=2 size=378 card=42)
51770 TABLE ACCESS SAMPLE T2 (cr=876 pr=0 pw=0 time=35978 us cost=218 size=858204
card=47678)
During parsing Oracle executed this SQL statement, and the result was used to estimate the size of the join. The SQL statement used sampling (in an undocumented format) and actually read about 50 percent of the T2 table blocks. Sampling was not applied to the T1 table because its size is quite small compared to the second table, and 100% sampling of the T1 table does not consume a "lot of" time during parsing. It means Oracle first identifies the appropriate sample size based on the table size and then executes the specific SQL statement. So we got 30429 rows from a 51.8135 percent sample, therefore the estimated cardinality is 30429 / 51.8135 * 100 = 58727.94 ≈ 58728. Now let's check the optimizer trace file:
SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
Join Card: 92019.000000 = outer (42.000000) * inner (92019.000000) * sel (0.023810)
>> Join Card adjusted from 92019.000000 to 58727.970000 due to adaptive dynamic sampling,
prelen=2
Adjusted Join Cards: adjRatio=0.638216 cardHjSmj=58727.970000 cardHjSmjNPF=58727.970000
cardNlj=58727.970000 cardNSQ=58727.970000 cardNSQ_na=92019.000000
Join Card - Rounded: 58728 Computed: 58727.970000
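Scaling the sampled row count back to the full table reproduces the adjusted figure; a sketch of the arithmetic:

SELECT 30429 / 51.8135 * 100 scaled_join_card  -- = 58727.94, rounded to 58728
  FROM DUAL;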
Let's see what happens if we increase the sizes of both tables using repeated

insert into t select * from t

table name   blocks   num rows   size (MB)
T1           3186     172032     25
T2           6158     368076     49
In this case Oracle completely ignores adaptive sampling and uses the uniform distribution to estimate the join size.
Table Stats::
Table: T2 Alias: T2
#Rows: 368076 SSZ: 0 LGR: 0 #Blks: 6158 AvgRowLen: 115.00 NEB: 0 ChainCnt:
0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): OWNER(VARCHAR2)
AvgLen: 6 NDV: 31 Nulls: 0 Density: 0.032258
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 172032 SSZ: 0 LGR: 0 #Blks: 3186 AvgRowLen: 127.00 NEB: 0 ChainCnt:
0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): USERNAME(VARCHAR2)
AvgLen: 9 NDV: 42 Nulls: 0 Density: 0.023810
Join Card: 1507639296.000000 = outer (172032.000000) * inner (368076.000000) * sel
(0.023810)
In addition, in the SQL trace file we see the following:
SQL ID: 0ck072zj5gf73
Plan Hash: 3774486692
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2#0") */ 1 AS C1 FROM
"T2" SAMPLE BLOCK(12.9912, 8) SEED(1) "T2#0", "T1" "T1#1" WHERE
("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 1.70 1.91 0 885 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 1.70 1.91 0 887 0 0
Misses in library cache during parse: 1
Optimizer mode: CHOOSE
Parsing user id: SYS (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
0 SORT AGGREGATE (cr=0 pr=0 pw=0 time=36 us)
4049738 HASH JOIN (cr=885 pr=0 pw=0 time=2696440 us cost=1835 size=5288231772 card=195860436)
44649 TABLE ACCESS SAMPLE T2 (cr=761 pr=0 pw=0 time=28434 us cost=218 size=860706 card=47817)
6468 TABLE ACCESS FULL T1 (cr=124 pr=0 pw=0 time=28902 us cost=866 size=1548288 card=172032)
It is obvious that Oracle stopped execution of this SQL during parsing; we can see it from the rows
column of the execution statistics and also from the row source statistics. Oracle did not complete the HASH
JOIN operation in this SQL, which the zero-row result and the row source statistics confirm.
The tables are actually not big, so why did the optimizer abandon this estimate and decide to continue
with the previous approach? In my opinion two factors are at play here: the sample size and the
time limit. Although the sample size is not small, in our case the sampling SQL took quite a long time
during parsing (about 1.9 seconds elapsed), therefore Oracle stopped it. Next I added one filter predicate to the query:
SELECT COUNT (*)
FROM t1, t2
WHERE t1.username = t2.owner AND t2.object_type = 'TABLE';
In this case we will get the following lines
SQL ID: 8pu5v8h0ghy1z
Plan Hash: 3252009800
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
*/ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2") */ 1 AS C1 FROM "T2"
SAMPLE BLOCK(12.9912, 8) SEED(1) "T2" WHERE ("T2"."OBJECT_TYPE"='TABLE'))
innerQuery
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.01 0.01 0 761 0 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 33
total 3 0.01 0.01 0 763 0 1
Misses in library cache during parse: 1
Optimizer mode: CHOOSE
Parsing user id: SYS (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
1 SORT AGGREGATE (cr=761 pr=0 pw=0 time=14864 us)
756 TABLE ACCESS SAMPLE T2 (cr=761 pr=0 pw=0 time=5969 us cost=219 size=21378 card=1018)
********************************************************************************
SQL ID: 9jv79m9u42jps
Plan Hash: 3525519047
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring
optimizer_features_enable(default) no_parallel result_cache(snapshot=3600)
OPT_ESTIMATE(@"innerQuery", TABLE, "T2#0", ROWS=5819.31) */ SUM(C1)
FROM
(SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T1#1") */ 1 AS C1 FROM
"T1" SAMPLE BLOCK(25.1099, 8) SEED(1) "T1#1", "T2" "T2#0" WHERE
("T2#0"."OBJECT_TYPE"='TABLE') AND ("T1#1"."USERNAME"="T2#0"."OWNER"))
innerQuery
call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.01 0.00 0 2 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 1 0.74 1.02 0 6283 0 0
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 3 0.76 1.02 0 6285 0 0
Misses in library cache during parse: 1
Optimizer mode: CHOOSE
Parsing user id: SYS (recursive depth: 1)
Rows Row Source Operation
------- ---------------------------------------------------
0 SORT AGGREGATE (cr=0 pr=0 pw=0 time=32 us)
1412128 HASH JOIN (cr=6283 pr=0 pw=0 time=1243755 us cost=1908 size=215466084 card=5985169)
9880 TABLE ACCESS FULL T2 (cr=6167 pr=0 pw=0 time=20665 us cost=1674 size=87285 card=5819)
6035 TABLE ACCESS SAMPLE T1 (cr=116 pr=0 pw=0 time=6069 us cost=218 size=907137 card=43197)
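Note where the ROWS=5819.31 in the OPT_ESTIMATE hint comes from: it appears to be the 756 rows that matched the filter in the first sampling statement, scaled up by that statement's 12.9912 percent block sample (a sketch; this mapping is my reading of the two traces above):
-- scale the filtered sample row count of T2 back to the full table
SELECT ROUND(756 / 12.9912 * 100, 2) AS t2_filtered_card FROM dual; -- 5819.32, in line with ROWS=5819.31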
This means Oracle first tried to estimate the size of the T2 table, because it has a filter predicate and the optimizer
expects ADS to be very efficient there. If we had also added a predicate like t2.owner='HR', then the
optimizer would have tried to estimate the T1 table cardinality as well. But the principle of estimating a subset
of the join first and then estimating the whole join was actually abandoned in this case: only the
T2 table cardinality was estimated. We can easily see this fact from the trace file:
BASE STATISTICAL INFORMATION
***********************
Table Stats::
ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 34
Table: T2 Alias: T2
#Rows: 368144 SSZ: 0 LGR: 0 #Blks: 6158 AvgRowLen: 115.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): OWNER(VARCHAR2)
AvgLen: 6 NDV: 31 Nulls: 0 Density: 0.032258
***********************
Table Stats::
Table: T1 Alias: T1
#Rows: 172032 SSZ: 0 LGR: 0 #Blks: 3186 AvgRowLen: 127.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
#IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000
Column (#1): USERNAME(VARCHAR2)
AvgLen: 9 NDV: 42 Nulls: 0 Density: 0.023810
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T1[T1]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
*** 2016-02-13 19:59:49.416
** Performing dynamic sampling initial checks. **
** Not using old style dynamic sampling since ADS is enabled.
Table: T1 Alias: T1
Card: Original: 172032.000000 Rounded: 172032 Computed: 172032.000000 Non Adjusted: 172032.000000
Scan IO Cost (Disk) = 865.000000
Scan CPU Cost (Disk) = 48493707.840000
Total Scan IO Cost = 865.000000 (scan (Disk))
= 865.000000
Total Scan CPU Cost = 48493707.840000 (scan (Disk))
= 48493707.840000
Access Path: TableScan
Cost: 866.262422 Resp: 866.262422 Degree: 0
Cost_io: 865.000000 Cost_cpu: 48493708
Resp_io: 865.000000 Resp_cpu: 48493708
Best:: AccessPath: TableScan
Cost: 866.262422 Degree: 1 Resp: 866.262422 Card: 172032.000000 Bytes: 0.000000
Access path analysis for T2
***************************************
SINGLE TABLE ACCESS PATH
Single Table Cardinality Estimation for T2[T2]
SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
*** 2016-02-13 19:59:49.417
** Performing dynamic sampling initial checks. **
** Not using old style dynamic sampling since ADS is enabled.
Column (#6): OBJECT_TYPE(VARCHAR2)
AvgLen: 9 NDV: 47 Nulls: 0 Density: 0.021277
Table: T2 Alias: T2
Card: Original: 368144.000000
>> Single Tab Card adjusted from 7832.851064 to 5819.310000 due to adaptive dynamic sampling
Rounded: 5819 Computed: 5819.310000 Non Adjusted: 7832.851064
Best NL cost: 5031394.059186
resc: 5031394.059186 resc_io: 5022741.000000 resc_cpu: 332391893348
resp: 5031394.059186 resp_io: 5022741.000000 resc_cpu: 332391893348
SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 35
Join Card: 23835893.760000 = outer (5819.310000) * inner (172032.000000) * sel (0.023810)
*** 2016-02-13 19:59:51.055
Join Card - Rounded: 23835894 Computed: 23835893.760000
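So the final figure is formula (2) again, with the ADS-adjusted single-table cardinality of T2 as the outer input and the join selectivity still taken as 1/42 (a quick check):
-- ADS-adjusted outer cardinality * full inner cardinality * uniform join selectivity
SELECT ROUND(5819.31 * 172032 / 42, 2) AS join_card FROM dual; -- 23835893.76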
In the last case I increased both table sizes once more:
table name   blocks    num rows    size mb
T1           101950    5505024     800
T2           196807    11780608    1600
This time Oracle completely ignored ADS and used the dictionary statistics to estimate the table sizes and the join cardinality.
Summary
This paper has explained the mechanism the Oracle optimizer uses to calculate join selectivity and
cardinality. We learned that the optimizer first calculates join selectivity based on the "pure" cardinality. To
estimate the "pure" cardinality, the optimizer identifies "(distinct value, frequency)" pairs for each join column
based on the column distribution, and that distribution is described by the histogram.
A frequency histogram gives us the complete data distribution of a column.
A top frequency histogram gives us enough information for the highly frequent values, while for the
less significant values we can fall back on a uniform distribution. Moreover, if hybrid
histograms are available for the join columns in the dictionary, the optimizer can use the endpoint repeat count to
derive frequencies. In addition, the optimizer has the chance to estimate join cardinality via sampling,
although this process is constrained by a time restriction and by the sizes of the tables; as a result, the optimizer can
completely ignore adaptive dynamic sampling.
References
• Lewis, Jonathan. Cost-Based Oracle Fundamentals. Apress, 2006.
• Dell'Era, Alberto. Join Over Histograms. 2007. http://www.adellera.it/investigations/join_over_histograms/JoinOverHistograms.pdf
• Aliyev, Chinar. Automatic Sampling in Oracle 12c. 2014.
https://www.toadworld.com/platforms/oracle/w/wiki/11036.automaticadaptive-dynamic-sampling-in-oracle-12c-part-2
https://www.toadworld.com/platforms/oracle/w/wiki/11052.automaticadaptive-dynamic-sampling-in-oracle-12c-part-3
More Related Content

What's hot

ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)
ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)
ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)ใ‚ชใƒฉใ‚ฏใƒซใ‚จใƒณใ‚ธใƒ‹ใ‚ข้€šไฟก
ย 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafkaconfluent
ย 
ใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใง
ใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใงใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใง
ใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใงShinnosuke Tokuda
ย 
Cisco Icon Library
Cisco Icon LibraryCisco Icon Library
Cisco Icon Libraryfmarches
ย 
Java EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆ
Java EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆJava EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆ
Java EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆShigeru Tatsuta
ย 
Spanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸ
Spanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸSpanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸ
Spanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸtechgamecollege
ย 
Juju/MAASใงไฝœใ‚‹ Kubernetes + GPU
Juju/MAASใงไฝœใ‚‹ Kubernetes + GPUJuju/MAASใงไฝœใ‚‹ Kubernetes + GPU
Juju/MAASใงไฝœใ‚‹ Kubernetes + GPUVirtualTech Japan Inc.
ย 
ใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸ
ใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸ
ใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸTakahiro YAMADA
ย 
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021StreamNative
ย 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
ย 
ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…
ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…
ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…ssuseraf19bf
ย 
ใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑ
ใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑ
ใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑKyohei Komatsu
ย 
JOHN DEERE 4630 TRACTOR Service Repair Manual
JOHN DEERE 4630 TRACTOR Service Repair ManualJOHN DEERE 4630 TRACTOR Service Repair Manual
JOHN DEERE 4630 TRACTOR Service Repair Manualjjjn9eoksl
ย 
[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผ
[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผ[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผ
[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผใ‚ชใƒฉใ‚ฏใƒซใ‚จใƒณใ‚ธใƒ‹ใ‚ข้€šไฟก
ย 
10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat Storage
10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat Storage10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat Storage
10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat StorageEtsuji Nakai
ย 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...HostedbyConfluent
ย 
MQTT - A practical protocol for the Internet of Things
MQTT - A practical protocol for the Internet of ThingsMQTT - A practical protocol for the Internet of Things
MQTT - A practical protocol for the Internet of ThingsBryan Boyd
ย 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...HostedbyConfluent
ย 
Kafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInKafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInAditya Auradkar
ย 

What's hot (20)

ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)
ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)
ใ€ใ“ใจใฏใ˜ใ‚ใ€‘ ไปŠใ•ใ‚‰่žใ‘ใชใ„!? ใ‚ผใƒญใƒˆใƒฉใ‚นใƒˆใฎใใปใ‚“ (Oracle Cloudใ‚ฆใ‚งใƒ“ใƒŠใƒผใ‚ทใƒชใƒผใ‚บ: 2021ๅนด2ๆœˆ9ๆ—ฅ)
ย 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
ย 
CCNA
CCNACCNA
CCNA
ย 
ใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใง
ใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใงใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใง
ใƒใƒƒใƒˆใƒฏใƒผใ‚ฏๅŸบ็คŽ - WEBใƒšใƒผใ‚ธใŒ่กจ็คบใ•ใ‚Œใ‚‹ใพใง
ย 
Cisco Icon Library
Cisco Icon LibraryCisco Icon Library
Cisco Icon Library
ย 
Java EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆ
Java EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆJava EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆ
Java EE ใ‹ใ‚‰ Quarkus ใซใ‚ˆใ‚‹้–‹็™บใธใฎ็งป่กŒใซใคใ„ใฆ
ย 
Spanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸ
Spanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸSpanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸ
Spanner็งป่กŒใซใคใ„ใฆๆœฌๆฐ—ๅ‡บใ—ใฆ่€ƒใˆใฆใฟใŸ
ย 
Juju/MAASใงไฝœใ‚‹ Kubernetes + GPU
Juju/MAASใงไฝœใ‚‹ Kubernetes + GPUJuju/MAASใงไฝœใ‚‹ Kubernetes + GPU
Juju/MAASใงไฝœใ‚‹ Kubernetes + GPU
ย 
ใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸ
ใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸ
ใ“ใ‚Œใ‹ใ‚‰ใฎJDK/JVM ไฝ•ใ‚’้ธใถ๏ผŸใฉใ†้ธใถ๏ผŸ
ย 
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
Apache Pulsar with MQTT for Edge Computing - Pulsar Summit Asia 2021
ย 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
ย 
ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…
ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…
ๅฎŸ่ทต๏ผDjango + GraphQL ๅฎŸ่ฃ…
ย 
ใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑ
ใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑ
ใ‚ฏใƒฉใƒฉใ‚ชใƒณใƒฉใ‚คใƒณใŒNetskopeใ‚’้ธใ‚“ใ ็†็”ฑ
ย 
JOHN DEERE 4630 TRACTOR Service Repair Manual
JOHN DEERE 4630 TRACTOR Service Repair ManualJOHN DEERE 4630 TRACTOR Service Repair Manual
JOHN DEERE 4630 TRACTOR Service Repair Manual
ย 
[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผ
[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผ[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผ
[Oracle Code Tokyo 2017] Live Challenge!! SQLใƒ‘ใƒ•ใ‚ฉใƒผใƒžใƒณใ‚นใฎ้ซ˜้€ŸๅŒ–ใฎ้™็•Œใ‚’็›ฎๆŒ‡ใ›๏ผ
ย 
10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat Storage
10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat Storage10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat Storage
10ๅนดๅŠนใๅˆ†ๆ•ฃใƒ•ใ‚กใ‚คใƒซใ‚ทใ‚นใƒ†ใƒ ๆŠ€่ก“ GlusterFS & Red Hat Storage
ย 
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
What is the State of my Kafka Streams Application? Unleashing Metrics. | Neil...
ย 
MQTT - A practical protocol for the Internet of Things
MQTT - A practical protocol for the Internet of ThingsMQTT - A practical protocol for the Internet of Things
MQTT - A practical protocol for the Internet of Things
ย 
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
Mind the App: How to Monitor Your Kafka Streams Applications | Bruno Cadonna,...
ย 
Kafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedInKafka Quotas Talk at LinkedIn
Kafka Quotas Talk at LinkedIn
ย 

Similar to Join Cardinality Estimation Methods_in_Oracle12c.pdf

Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...Waqas Tariq
ย 
An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...
An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...
An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...ijtsrd
ย 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_fariaPaulo Faria
ย 
Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...
Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...
Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...ijtsrd
ย 
Maneuvering target track prediction model
Maneuvering target track prediction modelManeuvering target track prediction model
Maneuvering target track prediction modelIJCI JOURNAL
ย 
SYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTER
SYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTERSYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTER
SYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTERTarun Kumar
ย 
Project2
Project2Project2
Project2Linjun Li
ย 
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...cscpconf
ย 
Adaptive pi based on direct synthesis nishant
Adaptive pi based on direct synthesis nishantAdaptive pi based on direct synthesis nishant
Adaptive pi based on direct synthesis nishantNishant Parikh
ย 
Investigation of auto-oscilational regimes of the system by dynamic nonlinear...
Investigation of auto-oscilational regimes of the system by dynamic nonlinear...Investigation of auto-oscilational regimes of the system by dynamic nonlinear...
Investigation of auto-oscilational regimes of the system by dynamic nonlinear...IJECEIAES
ย 
ETSATPWAATFU
ETSATPWAATFUETSATPWAATFU
ETSATPWAATFURobert Zima
ย 
Chp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdfChp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdfSolomonMolla4
ย 
Margin Parameter Variation for an Adaptive Observer to a Class of Systems
Margin Parameter Variation for an Adaptive Observer to a Class of SystemsMargin Parameter Variation for an Adaptive Observer to a Class of Systems
Margin Parameter Variation for an Adaptive Observer to a Class of SystemsCSCJournals
ย 
Multistage condition-monitoring-system-of-aircraft-gas-turbine-engine
Multistage condition-monitoring-system-of-aircraft-gas-turbine-engineMultistage condition-monitoring-system-of-aircraft-gas-turbine-engine
Multistage condition-monitoring-system-of-aircraft-gas-turbine-engineCemal Ardil
ย 
17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)IAESIJEECS
ย 
17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)IAESIJEECS
ย 

Similar to Join Cardinality Estimation Methods_in_Oracle12c.pdf (20)

Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
Optimum Algorithm for Computing the Standardized Moments Using MATLAB 7.10(R2...
ย 
An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...
An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...
An Exponential Observer Design for a Class of Chaotic Systems with Exponentia...
ย 
2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria2014-mo444-practical-assignment-04-paulo_faria
2014-mo444-practical-assignment-04-paulo_faria
ย 
Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...
Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...
Exponential State Observer Design for a Class of Uncertain Chaotic and Non-Ch...
ย 
Maneuvering target track prediction model
Maneuvering target track prediction modelManeuvering target track prediction model
Maneuvering target track prediction model
ย 
SYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTER
SYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTERSYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTER
SYSTEM IDENTIFICATION USING CEREBELLAR MODEL ARITHMETIC COMPUTER
ย 
Project2
Project2Project2
Project2
ย 
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
APPROACHES IN USING EXPECTATIONMAXIMIZATION ALGORITHM FOR MAXIMUM LIKELIHOOD ...
ย 
Adaptive pi based on direct synthesis nishant
Adaptive pi based on direct synthesis nishantAdaptive pi based on direct synthesis nishant
Adaptive pi based on direct synthesis nishant
ย 
Investigation of auto-oscilational regimes of the system by dynamic nonlinear...
Investigation of auto-oscilational regimes of the system by dynamic nonlinear...Investigation of auto-oscilational regimes of the system by dynamic nonlinear...
Investigation of auto-oscilational regimes of the system by dynamic nonlinear...
ย 
ETSATPWAATFU
ETSATPWAATFUETSATPWAATFU
ETSATPWAATFU
ย 
DCT
DCTDCT
DCT
ย 
DCT
DCTDCT
DCT
ย 
DCT
DCTDCT
DCT
ย 
Chp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdfChp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdf
ย 
Design of Quadratic Optimal Regulator for DC Motor
Design of Quadratic Optimal Regulator for DC Motor Design of Quadratic Optimal Regulator for DC Motor
Design of Quadratic Optimal Regulator for DC Motor
ย 
Margin Parameter Variation for an Adaptive Observer to a Class of Systems
Margin Parameter Variation for an Adaptive Observer to a Class of SystemsMargin Parameter Variation for an Adaptive Observer to a Class of Systems
Margin Parameter Variation for an Adaptive Observer to a Class of Systems
ย 
Multistage condition-monitoring-system-of-aircraft-gas-turbine-engine
Multistage condition-monitoring-system-of-aircraft-gas-turbine-engineMultistage condition-monitoring-system-of-aircraft-gas-turbine-engine
Multistage condition-monitoring-system-of-aircraft-gas-turbine-engine
ย 
17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)
ย 
17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)17 16512 32451-1-sm (edit ari)
17 16512 32451-1-sm (edit ari)
ย 

Recently uploaded

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
ย 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
ย 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
ย 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
ย 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
ย 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
ย 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEslot gacor bisa pakai pulsa
ย 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
ย 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
ย 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
ย 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
ย 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
ย 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
ย 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7Call Girls in Nagpur High Profile Call Girls
ย 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
ย 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
ย 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
ย 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
ย 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
ย 

Recently uploaded (20)

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
ย 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
ย 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
ย 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
ย 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
ย 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
ย 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
ย 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
ย 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
ย 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ย 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
ย 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
ย 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
ย 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
ย 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
ย 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
ย 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
ย 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
ย 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
ย 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
ย 

Join Cardinality Estimation Methods_in_Oracle12c.pdf

  • 1. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 1 Join Cardinality Estimation Methods Chinar Aliyev chinaraliyev@gmail.com As it is known the Query Optimizer tries to select best plan for the query and it does that based on generating all possible plans, estimates cost each of them, and selects cheapest costed plan as optimal one. Estimating cost of the plan is a complex process. But cost is directly proportionate to the number of I/O s. Here is functional dependence between number of the rows retrieved from database and number of I/O s. So the cost of a plan depends on estimated number of the rows retrieved in each step of the plan โ€“ cardinality of the operation. Therefore optimizer should accurately estimate cardinality of each step in the execution plan. In this paper we going to analyze how oracle optimizer calculates join selectivity and cardinality in different situations, like how does CBO calculate join selectivity when histograms are available (including new types of histograms, in 12c)?, what factors does error (estimation) depend on? And etc. In general two main join cardinality estimation methods exists: Histogram Based and Sampling Based. Thanks to Jonathan Lewis for writing โ€œCost Based Oracle Fundamentalsโ€ book. This book actually helped me to understand optimizer`s internals and to open the โ€œBlack Boxโ€. In 2007 Alberto Dell`Era did an excellent work, he investigated join size estimation with histograms. However there are some questions like introduction of a โ€œspecial cardinalityโ€ concept. In this paper we are going to review this matter also. For simplicity we are going to use single column join and columns containing no null values. Assume we have two tables t1, t2 corresponding join columns j1, j2 and the rest of columns are filter1 and filter2. Our queries are (Q0) SELECT COUNT (*) FROM t1, t2 WHERE t1.j1 = t2.j2 AND t1.filter1 ='value1' AND t2.filter2 ='value2' (Q1) SELECT COUNT (*) FROM t1, t2 WHERE t1.j1 = t2.j2; (Q2) SELECT COUNT (*) FROM t1, t2;
  • 2. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 2 Histogram Based Estimation Understanding Join Selectivity and Cardinality As you know the query Q2 is a Cartesian product. It means we will get Join Cardinality - ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘๐‘Ž๐‘Ÿ๐‘ก๐‘’๐‘ ๐‘–๐‘Ž๐‘› for the join product as: ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘๐‘Ž๐‘Ÿ๐‘ก๐‘’๐‘ ๐‘–๐‘Ž๐‘› =num_rows(๐‘ก1)*num_rows(๐‘ก2) Here num_rows(๐‘ก๐‘–) is number of rows of corresponding tables. When we add join condition into the query (so Q1) then it means we actually get some fraction of Cartesian product. To identify this fraction here Join Selectivity has been introduced. Therefore we can write this as follows ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 โ‰ค ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘๐‘Ž๐‘Ÿ๐‘ก๐‘’๐‘ ๐‘–๐‘Ž๐‘› ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 = ๐ฝ๐‘ ๐‘’๐‘™*๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘๐‘Ž๐‘Ÿ๐‘ก๐‘’๐‘ ๐‘–๐‘Ž๐‘› = ๐ฝ๐‘ ๐‘’๐‘™ โˆ—num_rows(๐‘ก1)*num_rows(๐‘ก2) (1) ๐ฝ๐‘ ๐‘’๐‘™ = ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1/(๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ (๐‘ก1) โˆ— ๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ (๐‘ก2)) Definition: Join selectivity is the ratio of the โ€œpureโ€-natural cardinality over the Cartesian product. I called ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 as โ€œpureโ€ cardinality because it does not contain any filter conditions. Here ๐ฝ๐‘ ๐‘’๐‘™ is Join Selectivity. This is our main formula. You should know that when optimizer tries to estimate JC- Join Cardinality it first calculates ๐ฝ๐‘ ๐‘’๐‘™ . Therefore we can use same ๐ฝ๐‘ ๐‘’๐‘™ and can write appropriate formula for query Q0 as ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„0 = ๐ฝ๐‘ ๐‘’๐‘™ โˆ—Card(๐‘ก1)*Card(๐‘ก2) (2) Here Card (๐‘ก๐‘–) is final cardinality after applying filter predicate to the corresponding table. In other words ๐ฝ๐‘ ๐‘’๐‘™ for both formulas (1) and (2) is same. Because ๐ฝ๐‘ ๐‘’๐‘™ does not depend on filter columns, unless filter conditions include join columns. According to formula (1) ๐ฝ๐‘ ๐‘’๐‘™ = ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1/(num_rows(๐‘ก1) โˆ— num_rows(๐‘ก2)) (3) or ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„0 = ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1โˆ—Card(๐‘ก1)โˆ—Card(๐‘ก2) num_rows(๐‘ก1)โˆ—num_rows(๐‘ก2) (4) Based on this we have to find out estimation mechanism of expected cardinality - ๐‘ช๐’‚๐’“๐’…๐‘ธ๐Ÿ. Now consider that for ๐‘—๐‘– join columns of ๐‘ก๐‘– tables here is not any type of histogram. So it means in this
  • 3. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 3 case optimizer assumes uniform distribution and for such situations as you already know ๐ฝ๐‘๐‘Ž๐‘Ÿ๐‘‘ and ๐ฝ๐‘ ๐‘’๐‘™ are calculated as ๐ฝ๐‘ ๐‘’๐‘™ = 1/max(๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1 ), ๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—2 )) (5) ๐ฝ๐‘๐‘Ž๐‘Ÿ๐‘‘ = ๐ฝ๐‘ ๐‘’๐‘™ โˆ— num_rows(๐‘ก1) โˆ— num_rows(๐‘ก2) The question now is: where does formula (5) come from? How do we understand it? According to (3) in order to calculate ๐ฝ๐‘ ๐‘’๐‘™ we first have to estimate โ€œpureโ€ expected cardinality - ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1. And it only depends on Join Columns. For ๐‘ก1 table, based on uniform distribution the number of rows per distinct value of the ๐‘—1 column will be num_rows(๐‘ก1)/๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1) and for ๐‘ก2 table it will be num_rows(๐‘ก2)/๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—2). Also here will be min(๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1), ๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—2)) common distinct values. Therefore expected โ€œpureโ€ cardinality is ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 = min (๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1), ๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—2)) โˆ— num_rows(๐‘ก1) num_dist(๐‘—1) โˆ— num_rows(๐‘ก2) num_dist(๐‘—2) (6) Then according to formula (3) Join Selectivity will be: ๐ฝ๐‘ ๐‘’๐‘™ = ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 num_rows(๐‘ก1)โˆ—num_rows(๐‘ก2) = min (๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1),๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—2)) ๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1)โˆ— ๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—2) = 1 max (๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1),๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—2)) As it can be seen we have got formula (5). Without histogram optimizer is not aware of the data distribution, so in dictionary of the database here are not โ€œ(distinct value, frequency)โ€ โ€“ this pairs indicate column distribution. Because of this, in case of uniform distribution, optimizer actually thinks and calculates โ€œaverage frequencyโ€ as ๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ (๐‘ก1)/๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก(๐‘—1). Based on โ€œaverage frequencyโ€ optimizer calculates โ€œpureโ€ expected cardinality and then join selectivity. So if a table column has histogram (depending type of this) optimizer will calculates join selectivity based on histogram. In this case โ€œ(distinct value, frequency)โ€ pairs are not formed based on โ€œaverage frequencyโ€, but are formed based on information which are given by the histogram. Case 1. Both Join columns have frequency histograms In this case both join columns have frequency histogram and our query(freq_freq.sql) is SELECT COUNT (*) FROM t1, t2 WHERE t1.j1 = t2.j2 AND t1.f1 = 13; Corresponding execution plan is
  • 4. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 4 --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 2272 | 2260 | |* 3 | TABLE ACCESS FULL| T1 | 1 | 40 | 40 | | 4 | TABLE ACCESS FULL| T2 | 1 | 1000 | 1000 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") 3 - filter("T1"."F1"=13) Estimation is good enough for this situation, but it has not been exactly estimated. And why? How did optimizer calculate cardinality of the join as 2272? If we enable SQL trace for the query then we will see oracle queries only histgrm$ dictionary table. Therefore information about columns and tables is as follows. Select table_name,num_rows from user_tables where table_name in (โ€˜T1โ€™,โ€™T2โ€™); tab_name num_rows T1 1000 T2 1000 (Freq_values1) SELECT endpoint_value COLUMN_VALUE, endpoint_number - NVL (prev_endpoint, 0) frequency, endpoint_number ep FROM (SELECT endpoint_number, NVL (LAG (endpoint_number, 1) OVER (ORDER BY endpoint_number), 0 ) prev_endpoint, endpoint_value FROM user_tab_histograms WHERE table_name = 'T1' AND column_name = 'J1') ORDER BY endpoint_number tab t1, col j1 tab t2, col j2 value frequency ep value frequency ep 0 40 40 0 100 100 1 40 80 2 40 140 2 80 160 3 120 260 3 100 260 4 20 280
  • 5. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 5 4 160 420 5 40 320 5 60 480 6 100 420 6 260 740 8 40 460 7 80 820 9 20 480 8 120 940 10 20 500 9 60 1000 11 60 560 12 20 580 13 20 600 14 80 680 15 80 760 16 20 780 17 80 860 18 80 940 19 60 1000 Frequency histograms exactly express column distribution. So โ€œ(column value, frequency)โ€ pair gives us all opportunity to estimate cardinality of any kind of operations. Now we have to try to estimate pure cardinality ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 then we can find out ๐ฝ๐‘ ๐‘’๐‘™ according to formula (3). Firstly we have to find common data for the join columns. These data is spread between max(min_value(j1),min_value(j2)) and min(max_value(j1),max_value(j2)). It means we are not interested in the data which column value greater than 10 for j2 column. Also we have to take equval values, so we get following table tab t1, col j1 tab t2, col j2 value frequency value frequency 0 40 0 100 2 80 2 40 3 100 3 120 4 160 4 20 5 60 5 40 6 260 6 100 8 120 8 40 9 60 9 20 Because of this expected pure cardinality ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 Will be 100*40+80*40+100*120+160*20+60*40+260*100+120*40+60*20=56800 and Join selectivity ๐ฝ๐‘ ๐‘’๐‘™ = ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 ๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ (๐‘ก1)โˆ—๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ (๐‘ก2) = 56800 1000โˆ—1000 = 0.0568 And eventually our cardinality will be according to the formula (2)
  • 6. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 6 ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„0 = ๐ฝ๐‘ ๐‘’๐‘™ โˆ—Card (๐‘ก1)*Card (๐‘ก2) = 0.0568 โˆ— 40 โˆ— 1000 = 2272 Also if we enable 10053 event then in trace file we see following lines regarding on join selectivity. Join Card: 2272.000000 = outer (40.000000) * inner (1000.000000) * sel (0.056800) Join Card - Rounded: 2272 Computed: 2272.000000 As we see same number as in above execution plan. Another question was why we did not get exact cardinality โ€“ 2260? Although join selectivity by definition does not depend on filter columns and conditions, but filtering actually influences this process. Optimizer does not consider join column value range, max/min value, spreads, distinct values after applying filter โ€“ in line 3 of execution plan. It is not easy to resolve. At least it will require additional estimation algorithms, then efficiency of whole estimation process could be harder. So if we remove filter condition from above query we will get exact estimation. --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 56800 | 56800 | | 3 | TABLE ACCESS FULL| T1 | 1 | 1000 | 1000 | | 4 | TABLE ACCESS FULL| T2 | 1 | 1000 | 1000 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") It means optimizer calculates โ€œaverageโ€ join selectivity. I think it is not an issue in general. As result we got the following formula for join selectivity. ๐ฝ๐‘ ๐‘’๐‘™ = โˆ‘ ๐‘“๐‘Ÿ๐‘’๐‘ž(๐‘ก1.๐‘—1)โˆ—๐‘“๐‘Ÿ๐‘’๐‘ž(๐‘ก2.๐‘—2)) min _๐‘š๐‘Ž๐‘ฅ ๐‘–=max _๐‘š๐‘–๐‘› ๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ (๐‘ก1)โˆ—๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ (๐‘ก2) (7) Here freq is corresponding frequency of the column value. Case 2. Join columns with height-balanced (equ-height) and frequency histograms Now assume one of the join column has height-balanced(HB) histogram and another has frequency(FQ) histogram (Height_Balanced_Frequency.sql) We are going to investiagte cardinality estimation of the two queries here select count(*) from t1, t2 --- (Case2 q1) where t1.j1 = t2.j2;
  • 7. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 7 select count(*) from t1, t2 --- (Case2 q2) where t1.j1 = t2.j2 and t1.f1=11; For the column J1 here is Height balanced histogram - HB and for the column j2 here is frequency - FQ histogram avilable. The appropriate information from user_tab_histogrgrams dictionary view shown in Table 3. tatb t1, col j1 tab t2 , col j2 column value frequency ep column value frequency ep 1 0 0 1 2 2 9 1 1 7 2 4 16 1 2 48 3 7 24 1 3 64 4 11 32 1 4 40 1 5 48 2 7 56 1 8 64 2 10 72 2 12 80 3 15 Ferquency column for t1.j1 of Table 3 does not express real frequency for the column. It is actually โ€œfrequency of the bucketโ€. First we have to identify common values. So we have to ignore HB histogram buckets with endpoint number greater than 10. We have exact โ€œvalue, frequencyโ€ pairs of the t2.j2 column therefore our base source must be values of the t2.j2 column. But for the t1.j1 we do not have exact frequencies. HB histogram cointains buckets which hold approximately same number of rows. Also we can find number of the distinct values per bucket. Then for every value of the frequency histogram we can identify appropriate bucket of the HB histogram. Within HB bucket we aslo can assume uniform distrbution then we can estimate size of this disjoint subset โ€“ {value of FQ and Bucket of HB} . Although this approach gave me some approximation and estimation of the join cardinality but it did not give me exact number(s) which oracle optimizer calculates and reports in 10053 trace file. We have to find what information we need to improve this approach? , Firstly Alberto Dell'Era investigated joins based on the histograms in 2007- (Join Over histograms). His approach was based on grouping values into three major categories: - โ€œpopulars matching popularsโ€ - โ€œpopulars not matching popularsโ€
  • 8. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 8 - โ€œnot popular subtablesโ€ Estimating each of them. The sum of cardinality of each group will give us join cardinality. But my point of view to the matter is quite different: - We have to identify โ€œ(distinct value, frequency)โ€ pairs to approximate โ€œpureโ€ cardinality ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 - Our main data here is t2.j2 column`s data, because it gives us exact frequencies - We have to walk t2.j2 columns (histograms) values and identify second part of โ€œ(distinct value, frequency)โ€ based on height balanced histogram. - ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1=โˆ‘ ๐น๐‘Ÿ๐‘’๐‘ž๐‘๐‘Ž๐‘ ๐‘’๐‘‘ ๐‘ก1(value=t2.j2)*๐น๐‘Ÿ๐‘’๐‘ž๐‘๐‘Ž๐‘ ๐‘’๐‘‘ ๐‘ก2(value=t2.j2) - Then we can calculate join selectivity and cardinality We have to identify (value, frequency) pairs based on HB histogram, then it is easy to calculate โ€œpureโ€ cardinality so it means we can easily and more accurately estimate join cardinality. But when forming (value, frequency) pairs based on HB histogram, we should not approach as uniform for the single value which is locate within the bucket, because HB gives us actually โ€œaverageโ€ density โ€“ NewDensity (actually the density term has been introduced to avoid estimation errors in non-uniform distribution case and has been improved with new density mechanism) for un- popular values and special approach for popular values. So letโ€™s identify โ€œ(value, frequency)โ€ pairs based on the HB histogram. tab_name num_rows (user_tables) col_name num_distinct T1 11 T1.J1 30 T2 130 T2.J2 4 Number of buckets - num_buckets=15( as max(ep) from Table 3 ) Number of popular buckets โ€“ num_pop_bucktes=9(as sum(frequency) from table 3 where frequency>1) Popular value counts โ€“ pop_value_cnt=4(as count(frequency) from table 3 where frequency>1) NewDensity= ๐‘›๐‘ข๐‘š_๐‘ข๐‘›๐‘๐‘œ๐‘_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  ๐‘ข๐‘›๐‘๐‘œ๐‘_๐‘›๐‘‘๐‘ฃโˆ—๐‘›๐‘ข๐‘š_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  = ๐‘›๐‘ข๐‘š_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘ โˆ’๐‘›๐‘ข๐‘š_๐‘๐‘œ๐‘_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  (๐‘๐ท๐‘‰โˆ’๐‘๐‘œ๐‘_๐‘ฃ๐‘Ž๐‘™๐‘ข๐‘’_๐‘๐‘›๐‘ก)๐‘›๐‘ข๐‘š_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  = 15โˆ’9 (30โˆ’4)โˆ—15 =0.015384615โ‰ˆ0.015385 (8) And for popular values selectivity is: ๐‘’๐‘_๐‘“๐‘Ÿ๐‘’๐‘ž๐‘ข๐‘’๐‘›๐‘๐‘ฆ ๐‘›๐‘ข๐‘š_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  = ๐‘’๐‘_๐‘“๐‘Ÿ๐‘’๐‘ž๐‘ข๐‘’๐‘›๐‘๐‘ฆ 15 So โ€œ(value, frequency)โ€ pairs based on the HB histogram will be:
  • 9. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 9 column value popular frequency calculated 1 N 2.00005 130*0.015385 - (num_rows*density) 7 N 2.00005 130*0.015385 - (num_rows*density) 48 Y 17.33333333 130*2/15 - (num_rows*frequency/num_buckets) 64 Y 17.33333333 130*2/15 - (num_rows*frequency/num_buckets) We have got all โ€œ(value, frequency)โ€ pairs so according formula (7) we can calculate Join Selectivity. tab t1 , col j1 tab t2 , col j2 column value frequency column value frequency freq*freq 1 2.00005 1 2 4.0001 7 2.00005 7 2 4.0001 48 17.33333333 48 3 52 64 17.33333333 64 4 69.33333 Sum 129.3335 And finally ๐ฝ๐‘ ๐‘’๐‘™ = 129.3335 numrows(t1)โˆ—numrows(t2) = 129.3335 11โˆ—130 =0.090443 So our โ€œpureโ€ cardinality is ๐ถ๐‘Ž๐‘Ÿ๐‘‘๐‘„1 = 129. Execution plan of the query is as follows --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 129 | 104 | | 3 | TABLE ACCESS FULL| T2 | 1 | 11 | 11 | | 4 | TABLE ACCESS FULL| T1 | 1 | 130 | 130 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") And the corresponding information from 10053 trace file : Join Card: 129.333333 = outer (130.000000) * inner (11.000000) * sel (0.090443) Join Card - Rounded: 129 Computed: 129.333333 It means we were able to figure out exact estimation mechanism in this case. Execution plan of the second query (Case2 q2) as follows
  • 10. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 10 --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 5 | 7 | |* 3 | TABLE ACCESS FULL| T1 | 1 | 5 | 5 | | 4 | TABLE ACCESS FULL| T2 | 1 | 11 | 11 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") 3 - filter("T1"."F1"=11) According our approach join cardinality should computed as ๐ฝ๐‘ ๐‘’๐‘™*card(t1)*card(t2)= 0.090443*card(t1)*card(t2) Also from optimizer trace file we will see the following: Join Card: 5.173333 = outer (11.000000) * inner (5.200000) * sel (0.090443) Join Card - Rounded: 5 Computed: 5.173333 It actually confirms our approach. However execution plan shows cardinality of the single table t1 as 5, it is correct because it must be rounded up but during join estimation process optimizer consider original values rather than rounding. Reviewing Alberto Dell'Era`s โ€“ complete formula (join_histogram_complete.sql) We can list column information from dictionary as below: tatb t1, col value tatb t2, col value column value frequency column value frequency 20 1 10 1 40 1 30 2 50 1 50 1 60 1 60 4 70 2 70 2 80 2 90 1 99 1 So we have to find common values, as you see min(t1.value)=20 due to we must ignore t2.value=10 also max(t1.val)=70 it means we have to ignore column values t2.value>70. In addition we do not have the value 40 in t2.value therefore we have to delete it also. Because of this we are getting following table
  • 11. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 11 tatb t1, col j1 tab t2 , col j2 column value frequency column value frequency 20 1 30 2 50 1 50 1 60 1 60 4 70 2 70 2 Num_rows(t1)=12;num_buckets(t1.value)=6;num_distinct(t1.value)=8,=> newdensity= ๐‘›๐‘ข๐‘š_๐‘ข๐‘›๐‘๐‘œ๐‘_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  ๐‘ข๐‘›๐‘๐‘œ๐‘_๐‘›๐‘‘๐‘ฃโˆ—๐‘›๐‘ข๐‘š_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  = 6โˆ’2 (8โˆ’1)โˆ—6 =0.095238095, so appropriate column values frequency based on HB histogram will be : t1.value freq calculated 30 1.142857143 num_rows*newdensity 50 1.142857143 num_rows*newdensity 60 1.142857143 num_rows*newdensity 70 4 num_rows*freq/num_buckets And finally cardinality will be. t1.value t2.value column value frequency column value frequency freq*freq 30 1.142857143 30 2 2.285714286 50 1.142857143 50 1 1.142857143 60 1.142857143 60 4 4.571428571 70 4 70 2 8 sum 16 Optimizer also estimated exactly 16 as we see it in execution plan of the query. --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 16 | 13 | | 3 | TABLE ACCESS FULL| T1 | 1 | 12 | 12 | | 4 | TABLE ACCESS FULL| T2 | 1 | 14 | 14 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."VALUE"="T2"."VALUE")
  • 12. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 12 There Alberto also has introduced โ€œContribution 4: special cardinalityโ€, but it seems it is not necessary. Reviewing Alberto Dell'Era`s essential case (join_histogram_essentials.sql) This is quite interesting case, firstly because in oracle 12c optimizer calculates join cardinality as 31 but not as 30, and second in this case old and new densities are same. Letโ€™s interpret the case. The by corresponding information from user_tab_histograms. tatb t1, col value tab t2 , col value column value frequency ep column value frequency ep 10 2 2 10 2 2 20 1 3 20 1 3 30 2 5 50 3 6 40 1 6 60 1 7 50 1 7 70 4 11 60 1 8 70 2 10 And num_rows(t1)=20,num_rows(t2)=11,num_dist(t1.value)=11,num_dist(t2.val)=5, Density (t1.value)=(10-6)/((11-3)*10)= 0.05. Above mechanism does not give us exact number as expected as optimizer estimation. Because in this case to estimate frequency for un-popular values oracle does not use density it uses number of distinct values per bucket and number of rows per distinct values instead of the density. To prove this one we can use join_histogram_essentials1.sql. In this case t1 table is same as in join_histogram_essentials.sql . The column T2.value has only one value 20 with frequency one. t1.value freq EP t2.value freq EP 10 2 2 20 1 1 20 1 3 30 2 5 40 1 6 50 1 7 60 1 8 70 2 10 In this case oracle computes join cardinality 2 as rounded up from 1.818182. We can it from trace file Join Card: 1.818182 = outer (20.000000) * inner (1.000000) * sel (0.090909)
  • 13. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 13 Join Card - Rounded: 2 Computed: 1.818182 It looks like the estimation totally depend on t1.value column distribution. So num_rows_bucket(number of rows per bucket) is 2 and num_rows_distinct(number of distinct value per bucket) is 20/11=1.818182. Every bucket has 1.1 distinct value and within bucket every distinct value has 2/1.1=1.818182 rows. And this is our cardinality. But if we increase frequency of the t2.value - join_histogram_essentials2.sql. The (t2.value, frequency)=(20,5) and t1 table is same as in previous case. t1.value freq EP t2.value freq EP 10 2 2 20 5 5 20 1 3 30 2 5 40 1 6 50 1 7 60 1 8 70 2 10 Corresponding lines from 10053 trace file: Join Card: 5.000000 = outer (20.000000) * inner (5.000000) * sel (0.050000) Join Card - Rounded: 5 Computed: 5.000000 Tests show that in such cases cardinality of the join computed as frequency of the t2.value. So it means frequency of the popular value will be: Frequency (non-popular t1.value) = { ๐‘›๐‘ข๐‘š_๐‘Ÿ๐‘œ๐‘ค๐‘ _๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก ๐‘›๐‘ข๐‘š_๐‘‘๐‘–๐‘ ๐‘ก_๐‘๐‘ข๐‘๐‘˜๐‘’๐‘ก๐‘  ๐‘“๐‘Ÿ๐‘’๐‘ž๐‘ข๐‘’๐‘›๐‘๐‘ฆ ๐‘œ๐‘“ ๐‘ก2. ๐‘ฃ๐‘Ž๐‘™๐‘ข๐‘’ = 1 1 ๐‘“๐‘Ÿ๐‘’๐‘ž๐‘ข๐‘’๐‘›๐‘๐‘ฆ ๐‘œ๐‘“ ๐‘ก2. ๐‘ฃ๐‘Ž๐‘™๐‘ข๐‘’ > 1 or Cardinality = max (frequency of t2.val, number of rows per distinct value within bucket) Question is why? In such cases I think optimizer tries to minimize estimation errors. So
  • 14. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 14 tab t1,col val column value frequency calculated 10 4 - (num_rows*frequency/num_buckets) 20 1.818181818 - (num_rows_bucket/num_dist_buckets) 50 1 -frequency of t2.value 60 1.818181818 - (num_rows_bucket/num_dist_buckets) 70 4 - (num_rows*frequency/num_buckets) Therefore tab t1,col value tab t2 , col value column value frequency column value frequency freq*freq 10 4 10 2 8 20 1.818181818 20 1 1.818181818 50 1 50 3 3 60 1.818181818 60 1 1.818181818 70 4 70 4 16 sum 30.63636364 We get 30.64โ‰ˆ31 as expected cardinality. Let`s see trace file and execution plan Join Card: 31.000000 = outer (11.000000) * inner (20.000000) * sel (0.140909) Join Card - Rounded: 31 Computed: 31.000000 --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 31 | 29 | | 3 | TABLE ACCESS FULL| T2 | 1 | 11 | 11 | | 4 | TABLE ACCESS FULL| T1 | 1 | 20 | 20 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."VALUE"="T2"."VALUE")
  • 15. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 15 Case 3. Join columns with hybrid and frequency histograms In this case we are going to analyze how optimizer calculates join selectivity when there are hybrid and frequency histograms available on the join columns (hybrid_freq.sql). Note that the query is same -(Case2 q1).The corresponding information from dictionary view. SELECT endpoint_value COLUMN_VALUE, endpoint_number - NVL (prev_endpoint, 0) frequency, ENDPOINT_REPEAT_COUNT, endpoint_number FROM (SELECT endpoint_number, ENDPOINT_REPEAT_COUNT, NVL (LAG (endpoint_number, 1) OVER (ORDER BY endpoint_number),0) prev_endpoint, endpoint_value FROM user_tab_histograms WHERE table_name = 'T3' AND column_name = 'J3') ORDER BY endpoint_number Tab t1, col j1 tab t2 , col j2 column value frequency endpoint_rep_cnt column value frequency 0 6 6 0 3 2 9 7 1 6 4 8 5 2 6 6 8 5 3 8 7 7 7 4 11 9 10 5 5 3 10 6 6 6 3 11 3 3 7 9 12 7 7 8 6 13 4 4 9 5 14 5 5 15 5 5 16 5 5 17 7 7 19 10 5 As it can be seen common column values are between 0 and 9. So we are not interested in buckets which contain column values greater than or equival 10. Hybrid histogram gives us more information to estimate single table and also join selectivity than height balanced histogram. Specially endpoint repeat count column are used by optimizer to exactly estimate endpoint values. But how does optimizer use this information to estimate join? Principle of the estimation โ€œ(value,frequency)โ€ pairs based on hybrid histogram are same as height based histogram. So it depends on popularity of the value, if value is popular then frequency will be equval to the corresponding endpoint repeat count,
  • 16. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 16 otherwise it will be calculated based on the density. If we enable dbms_stats trace when gathering hybrid histogram. We get the following DBMS_STATS: SELECT SUBSTRB (DUMP (val, 16, 0, 64), 1, 240) ep, freq, cdn, ndv, (SUM (pop) OVER ()) popcnt, (SUM (pop * freq) OVER ()) popfreq, SUBSTRB (DUMP (MAX (val) OVER (), 16, 0, 64), 1, 240) maxval, SUBSTRB (DUMP (MIN (val) OVER (), 16, 0, 64), 1, 240) minval FROM (SELECT val, freq, (SUM (freq) OVER ()) cdn, (COUNT ( * ) OVER ()) ndv, (CASE WHEN freq > ( (SUM (freq) OVER ()) / 15) THEN 1 ELSE 0 END) pop FROM (SELECT /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl no_substrb_pad */ "ID" val, COUNT ("ID") freq FROM "SYS"."T1" t WHERE "ID" IS NOT NULL GROUP BY "ID")) ORDER BY valDBMS_STATS: > cdn 100, popFreq 28, popCnt 4, bktSize 6.6, bktSzFrc .6 DBMS_STATS: Evaluating hybrid histogram: cht.count 15, mnb 15, ssize 100, min_ssize 2500, appr_ndv TRUE, ndv 20, selNdv 0, selFreq 0, pct 100, avg_bktsize 7, csr.hreq TRUE, normalize TRUE Average bucket size is 7. Oracle considers value as popular when correspoindg endpoint repeat count is greater than or equval average bucket size. Also in our case density is (crdn- popfreq)/((NDV- popCnt)*crdn)=(100-28)/((20-4)*100)= 0.045. If we enable 10053 trace event you can clearly see columns and tables statistics. Therefore โ€œ(value,frequency)โ€ will be as t1.j1 popular frequency calculated 0 N 4.5 density*num_rows 1 N 4.5 density*num_rows 2 Y 7 endpoint_repeat_count 3 N 4.5 density*num_rows 4 N 4.5 density*num_rows 5 N 4.5 density*num_rows 6 N 4.5 density*num_rows 7 Y 7 endpoint_repeat_count 8 N 4.5 density*num_rows 9 N 4.5 density*num_rows
  • 17. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 17 And then final cardinality. t1.j1 t2.j2 value frequency value frequency freq*freq 0 4.5 0 3 13.5 1 4.5 1 6 27 2 7 2 6 42 3 4.5 3 8 36 4 4.5 4 11 49.5 5 4.5 5 3 13.5 6 4.5 6 3 13.5 7 7 7 9 63 8 4.5 8 6 27 9 4.5 9 5 22.5 sum 307.5 Join sel 0.05125 Lets now check execution plan --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 308 | 293 | | 3 | TABLE ACCESS FULL| T2 | 1 | 60 | 60 | | 4 | TABLE ACCESS FULL| T1 | 1 | 100 | 100 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") And trace from 10053 trace file : Join Card: 307.500000 = outer (60.000000) * inner (100.000000) * sel (0.051250) Join Card - Rounded: 308 Computed: 307.500000 Case 4. Both Join columns with Top frequency histograms In this case join columns have top-frequency histogram (TopFrequency_hist.sql). We are going to use same query as above โ€“ (Case2 q1). Corresponding column information is. Table Stats:: Table: T2 Alias: T2 #Rows: 201 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1
  • 18. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 18 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J2(NUMBER) AvgLen: 4 NDV: 21 Nulls: 0 Density: 0.004975 Min: 1.000000 Max: 200.000000 Histogram: Top-Freq #Bkts: 192 UncompBkts: 192 EndPtVals: 12 ActualVal: yes *********************** Table Stats:: Table: T1 Alias: T1 #Rows: 65 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J1(NUMBER) AvgLen: 3 NDV: 14 Nulls: 0 Density: 0.015385 Min: 4.000000 Max: 100.000000 Histogram: Top-Freq #Bkts: 56 UncompBkts: 56 EndPtVals: 5 ActualVal: yes t1.j1 freq t2.j2 freq 4 10 1 14 5 16 2 18 6 17 3 18 8 12 4 17 100 1 5 15 6 19 7 19 8 22 9 17 10 18 11 13 200 2 By definition of the Top-Frequency histogram, we can say that here are two types of buckets. Oracle placed high frequency values into some buckets (appropriate) and rest of the values of the table oracle actually โ€œplacedโ€ into another โ€œbucketโ€. So we actually have โ€œhigh frequencyโ€ and โ€œlow frequencyโ€ values. Therefore for โ€œhigh frequencyโ€ values we also have exact frequencies, but for โ€œlow frequencyโ€ values we can approach by using โ€œUniform distributionโ€. Firstly we have to build high frequency pairs based on common values. The max(min(t1.j1),min(t2.j2))=4 and also max(max(t1.j1),max(t2.j2))=100. In principle we have to see and gather common values which are between 4 and 100. So after identifying common values, for popular values we are going to use exact frequency and for non-popular values new density. Therefore we could create following table:
  • 19. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 19 common value j2.freq j1.freq freq*freq 4 17 10 170 5 15 16 240 6 19 17 323 7 19 1.000025 19.000475 8 22 12 264 9 17 1.000025 17.000425 10 18 1.000025 18.00045 11 13 1.000025 13.000325 100 1 1.000025 1.000025 sum 1065.0017 For t1 table we have num_rows-popular_rows=65-56=9 unpopular rows and ndv- popular_value_count=14-5=9 also for t2 table we have 201-192=9 unpopular rows and 21-12=9 unpopular distinct values. Frequency for each unpopular rows of the t2.j2 is num_rows(t2)*density(t2)= 201*0.004975= 0.999975 also for t1.j1 it is num_rows(t1)*density(t1)= 65*0.015385 = 1.000025. Due to cardinality for each individual unpopular rows will be: CardIndvPair=unpopular_freq(t1.j1)* unpopular_freq(t2.j2)= 0.999975*1.000025=1. Test cases show that oracle considers all low frequency (unpopular rows) values during join when top frequency histograms are available it means cardinality for โ€œlow frequencyโ€ values will be Card(Low frequency values)=max(unpopular_rows(t1.j1),unpopular_rows(t2.j2))* CardIndvPair=9 Therefore final cardinality of the our join will be CARD(high freq values)+ CARD(low freq values)=1065+9=1074. Lets see execution plan. ----------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ----------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 6 | 4 (0)| | 1 | SORT AGGREGATE | | 1 | 6 | | |* 2 | HASH JOIN | | 1074 | 6444 | 4 (0)| | 3 | TABLE ACCESS FULL| T1 | 65 | 195 | 2 (0)| | 4 | TABLE ACCESS FULL| T2 | 201 | 603 | 2 (0)| ----------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") And the trace file Join Card: 1074.000000 = outer (201.000000) * inner (65.000000) * sel (0.082204) Join Card - Rounded: 1074 Computed: 1074.000000
  • 20. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 20 Case 5. Join columns with Top frequency and frequency histograms Now consider that we have tables which for join columns there are top frequency and frequency histogram (TopFrequency_Frequency.sql). The columns distribution from dictionary is as below (Freq_values1) t1.j1 freq t2.j2 freq 1 3 0 4 2 3 1 7 3 5 2 2 4 5 4 3 5 5 6 4 7 6 8 4 9 5 25 1 In this case there is a frequency histogram for the column t2.j2 and we have exact common {1, 2, 4} values. But test cases show that optimizer also considers all the values from top frequency histogram which are between max(min(t1.j1),min(t2.j2)) and min(max(t1.j1),max(t2.j2)). It is quite interesting case. Because of this we have frequency histogram and it should be our main source and this case should have been similar to the case 3. Table Stats:: Table: T2 Alias: T2 #Rows: 16 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J2(NUMBER) AvgLen: 3 NDV: 4 Nulls: 0 Density: 0.062500 Min: 0.000000 Max: 4.000000 Histogram: Freq #Bkts: 4 UncompBkts: 16 EndPtVals: 4 ActualVal: yes *********************** Table Stats:: Table: T1 Alias: T1 #Rows: 42 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J1(NUMBER) AvgLen: 3 NDV: 11 Nulls: 0 Density: 0.023810 Min: 1.000000 Max: 25.000000 Histogram: Top-Freq #Bkts: 41 UncompBkts: 41 EndPtVals: 10 ActualVal: yes Considered values and their frequencies:
  • 21. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 21 Considered values j1.freq j2.freq freq*freq 1 3 7 21 2 3 2 6 3 5 1 5 4 5 3 15 sum 47 Here for the value 3 j2.freq calculated as num_rows(t2)*density=16*0.0625=1. And in 10053 file Join Card: 47.000000 = outer (16.000000) * inner (42.000000) * sel (0.069940) Join Card - Rounded: 47 Computed: 47.000000 Let see another example-TopFrequency_Frequency2.sql t1.j1 freq t3.j3 freq 1 3 0 4 2 3 1 7 3 5 2 2 4 5 4 3 5 5 10 2 6 4 7 6 8 4 9 5 25 1 Tables and columns statistics: Table Stats:: Table: T3 Alias: T3 #Rows: 18 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J3(NUMBER) AvgLen: 3 NDV: 5 Nulls: 0 Density: 0.055556 Min: 0.000000 Max: 10.000000 Histogram: Freq #Bkts: 5 UncompBkts: 18 EndPtVals: 5 ActualVal: yes *********************** Table Stats:: Table: T1 Alias: T1 #Rows: 42 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J1(NUMBER) AvgLen: 3 NDV: 11 Nulls: 0 Density: 0.023810 Min: 1.000000 Max: 25.000000
  • 22. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 22 Histogram: Top-Freq #Bkts: 41 UncompBkts: 41 EndPtVals: 10 ActualVal: yes Considered column values and their frequencies are: Considered val j1.freq calculated j3.freq calculated freq*freq 1 3 freq 7 freq 21 2 3 freq 2 freq 6 3 5 freq 1.000008 num_rows*density 5.00004 4 5 freq 3 freq 15 5 5 freq 1.000008 num_rows*density 5.00004 6 4 freq 1.000008 num_rows*density 4.000032 7 6 freq 1.000008 num_rows*density 6.000048 8 4 freq 1.000008 num_rows*density 4.000032 9 5 freq 1.000008 num_rows*density 5.00004 10 1.00002 num_rows*density 2 freq 2.00004 sum 73.000272 And from trace file Join Card: 73.000000 = outer (18.000000) * inner (42.000000) * sel (0.096561) Join Card - Rounded: 73 Computed: 73.000000 But if we compare estimated cardinality with actual values then we will see: --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 73 | 42 | | 3 | TABLE ACCESS FULL| T3 | 1 | 18 | 18 | | 4 | TABLE ACCESS FULL| T1 | 1 | 42 | 42 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T3"."J3") As we see here is significant difference. 73 vs 42, error estimation is enough big. That is why we said before its quite interesting case, so optimizer should consider only values from frequency histogram, these values should be main source of the estimation process โ€“ as similar to the case3. So if consider and walk on the values of the frequency histogram as common values then we will get the following table: common val j1.freq calculated j3.freq calculated freq*freq 1 3 freq 7 freq 21 2 3 freq 2 freq 6
  • 23. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 23 4 5 freq 3 freq 15 10 1.00002 num_rows*density 2 freq 2.00004 sum 44.00004 You can clearly see that, such estimation is very close to the actual rows. Case 6. Join columns with Hybrid and Top frequency histograms It is quite hard to interpret when one of join column has top frequency histogram (Hybrid_topfreq.sql). For example here is hybrid histogram for t1.j1 and top frequency histogram for t2.j2. Column information from dictionary t1.j1 t1.freq ep_rep_cnt t2.j2 j2.freq 1 3 3 1 5 3 10 6 2 3 4 6 6 3 4 6 5 2 4 5 9 8 5 5 4 10 1 1 6 3 11 2 2 7 1 13 5 3 26 1 30 1 And from dbms_stats trace file SELECT SUBSTRB (DUMP (val, 16, 0, 64), 1, 240) ep, freq, cdn, ndv, (SUM (pop) OVER ()) popcnt, (SUM (pop * freq) OVER ()) popfreq, SUBSTRB (DUMP (MAX (val) OVER (), 16, 0, 64), 1, 240) maxval, SUBSTRB (DUMP (MIN (val) OVER (), 16, 0, 64), 1, 240) minval FROM (SELECT val, freq, (SUM (freq) OVER ()) cdn, (COUNT ( * ) OVER ()) ndv, (CASE WHEN freq > ( (SUM (freq) OVER ()) / 8) THEN 1 ELSE 0 END) pop
  • 24. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 24 FROM (SELECT /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact use_weak_name_resl dynamic_sampling(0) no_monitoring xmlindex_sel_idx_tbl no_substrb_pad */ "J1" val, COUNT ("J1") freq FROM "T"."T1" t WHERE "J1" IS NOT NULL GROUP BY "J1")) ORDER BY val DBMS_STATS: > cdn 40, popFreq 12, popCnt 2, bktSize 5, bktSzFrc 0 DBMS_STATS: Evaluating hybrid histogram: cht.count 8, mnb 8, ssize 40, min_ssize 2500, appr_ndv TRUE, ndv 13, selNdv 0, selFreq 0, pct 100, avg_bktsize 5, csr.hreq TRUE, normalize TRUE High frequency common values are located between 1 and 7. Also we have two popular values for t1.j1 column :{3,4}. Table Stats:: Table: T2 Alias: T2 #Rows: 30 SSZ: 0 LGR: 0 #Blks: 5 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J2(NUMBER) AvgLen: 3 NDV: 12 Nulls: 0 Density: 0.033333 Min: 1.000000 Max: 30.000000 Histogram: Top-Freq #Bkts: 27 UncompBkts: 27 EndPtVals: 9 ActualVal: yes *********************** Table Stats:: Table: T1 Alias: T1 #Rows: 40 SSZ: 0 LGR: 0 #Blks: 5 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J1(NUMBER) AvgLen: 3 NDV: 13 Nulls: 0 Density: 0.063636 Min: 1.000000 Max: 13.000000 Histogram: Hybrid #Bkts: 8 UncompBkts: 40 EndPtVals: 8 ActualVal: yes Therefore common values and their frequencies are: Common value t1.freq t2.freq freq*freq 1 2.54544 5 12.7272 2 2.54544 3 7.63632 3 6 4 24
  • 25. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 25 4 6 5 30 5 2.54544 4 10.18176 6 2.54544 3 7.63632 7 2.54544 1 2.54544 sum 94.72704 Moreover we have num_rows-top_freq_rows=30-27=3 infrequency rows and NDV- top_freq_count=12-9=3 unpopular NDV. I have done several test cases and I think cardinality of the join in this case consists two parts: High frequency values and low frequency values (unpopular). In different cases estimating cardinality for low frequency values was different for me. In current case I think based on the uniform distribution. It means for t1.j1 โ€œaverage frequencyโ€ is number for rows(t1)/NDV(j1)=40/13= 3.076923 . Also we have 3 unpopular (low frequency values) rows and 3 unpopular NDV. For each โ€œlow frequencyโ€ value we have num_rows(t1)*density(j1)=2.54544โ‰ˆ3 frequency and we have 3 low frequency-unpopular rows therefore unpopular cardinality is 3*3=9 so final cardinality will be CARD(popular rows)+CARD(un popular rows)= 94.72704+ 9= 103.72704. Lines from 10053 trace file Join Card: 103.727273 = outer (30.000000) * inner (40.000000) * sel (0.086439) Join Card - Rounded: 104 Computed: 103.727273 And execution plan --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 104 | 101 | | 3 | TABLE ACCESS FULL| T2 | 1 | 30 | 30 | | 4 | TABLE ACCESS FULL| T1 | 1 | 40 | 40 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") The above test case was a quite simple because popular values of the hybrid histogram also are located within range of high frequency values of the top frequency histogram. I mean popular values {1, 5, 6} of the hybrid histogram actually located 1-6 range of top frequency histogram. Let see another example CREATE TABLE t1(j1 NUMBER); INSERT INTO t1 VALUES(6); INSERT INTO t1 VALUES(2); INSERT INTO t1 VALUES(7);
  • 26. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 26 INSERT INTO t1 VALUES(8); INSERT INTO t1 VALUES(7); INSERT INTO t1 VALUES(1); INSERT INTO t1 VALUES(3); INSERT INTO t1 VALUES(6); INSERT INTO t1 VALUES(4); INSERT INTO t1 VALUES(7); INSERT INTO t1 VALUES(2); INSERT INTO t1 VALUES(3); INSERT INTO t1 VALUES(7); INSERT INTO t1 VALUES(9); INSERT INTO t1 VALUES(5); INSERT INTO t1 VALUES(6); INSERT INTO t1 VALUES(17); INSERT INTO t1 VALUES(18); INSERT INTO t1 VALUES(19); INSERT INTO t1 VALUES(20); COMMIT; /* execute dbms_stats.set_global_prefs('trace',to_char(512+128+2048+32768+4+8+16)); */ execute dbms_stats.gather_table_stats(null,'t1',method_opt=>'for all columns size 8'); /*exec dbms_stats.set_global_prefs('TRACE', null);*/ ---Creating second table CREATE TABLE t2(j2 number); INSERT INTO t2 VALUES(1); INSERT INTO t2 VALUES(1); INSERT INTO t2 VALUES(4); INSERT INTO t2 VALUES(3); INSERT INTO t2 VALUES(3); INSERT INTO t2 VALUES(4); INSERT INTO t2 VALUES(4); INSERT INTO t2 VALUES(4); INSERT INTO t2 VALUES(4); INSERT INTO t2 VALUES(4); INSERT INTO t2 VALUES(1); INSERT INTO t2 VALUES(3); INSERT INTO t2 VALUES(4); INSERT INTO t2 VALUES(2); INSERT INTO t2 VALUES(3); INSERT INTO t2 VALUES(2); INSERT INTO t2 VALUES(17); INSERT INTO t2 VALUES(18); INSERT INTO t2 VALUES(19); INSERT INTO t2 VALUES(20); COMMIT; execute dbms_stats.gather_table_stats(null,'t2',method_opt=>'for all columns size 4'); Lets enable 10053 trace file ALTER SESSION SET EVENTS '10053 trace name context forever'; EXPLAIN PLAN FOR SELECT COUNT ( * ) FROM t1, t2
  • 27. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 27 WHERE t1.j1 = t2.j2; SELECT * FROM table (DBMS_XPLAN.display); ALTER SESSION SET EVENTS '10053 trace name context off'; Corresponding information from the dictionary: t1.j1 freq ep_rep ep_num t2.j2 freq ep_num 1 1 1 1 1 3 3 2 2 2 3 3 4 7 4 3 1 6 4 7 14 6 4 3 10 20 1 15 7 4 4 14 17 3 1 17 18 1 1 18 20 2 1 20 And lines from dbms_stats trace DBMS_STATS: > cdn 20, popFreq 7, popCnt 2, bktSize 2.4, bktSzFrc .4 DBMS_STATS: Evaluating hybrid histogram: cht.count 8, mnb 8, ssize 20, min_ssize 2500, appr_ndv TRUE, ndv 13, selNdv 0, selFreq 0, pct 100, avg_bktsize 3, csr.hreq TRUE, normalize TRUE DBMS_STATS: Histogram gathering flags: 527 DBMS_STATS: Accepting histogram DBMS_STATS: Start fill_cstats - hybrid_enabled: TRUE So we our average bucket size is 3 and we have 2 popular values {6, 7}. These values are not a part of high frequency values in top frequency histogram. Table and column statistics from optimizer trace file: Table Stats:: Table: T2 Alias: T2 #Rows: 20 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J2(NUMBER) AvgLen: 3 NDV: 8 Nulls: 0 Density: 0.062500 Min: 1.000000 Max: 20.000000 Histogram: Top-Freq #Bkts: 15 UncompBkts: 15 EndPtVals: 4 ActualVal: yes *********************** Table Stats:: Table: T1 Alias: T1 #Rows: 20 SSZ: 0 LGR: 0 #Blks: 1 AvgRowLen: 3.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): J1(NUMBER)
  • 28. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 28 AvgLen: 3 NDV: 13 Nulls: 0 Density: 0.059091 Min: 1.000000 Max: 20.000000 Histogram: Hybrid #Bkts: 8 UncompBkts: 20 EndPtVals: 8 ActualVal: yes ---Join Cardinality SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN Join Card: 31.477273 = outer (20.000000) * inner (20.000000) * sel (0.078693) Join Card - Rounded: 31 Computed: 31.477273 Firstly lets calculate cardinality for the high frequency values. High freq values j2.freq j1.freq freq*freq 1 3 1.18182 3.54546 3 4 1.18182 4.72728 4 7 1.18182 8.27274 20 1 1.18182 1.18182 sum 17.7273 So our cardinality for high frequency values is 17.7273. And we also have num_rows(t1)- popular_rows(t1)=20-15=5 unpopular rows. But as you see oracle computed final cardinality as 31. In my opinion popular rows of the hybrid histogram here play role. Test cases show that optimizer in such situations also tries to take advantage of the popular values. In our case the value 6 and 7 are popular values and popular frequency is 7 (sum of popular frequency). If we try find out frequencies of these values based on the top frequency histogram then we have to use density. So cardinality for popular values will be: Popular frequency*num_rows(t1)*density(j2)=7*20*0.0625=8.75. Moreover for every โ€œlow frequencyโ€ values we have 1.18182โ‰ˆ1 frequency and we have 5 โ€œlow frequencyโ€ values (or unpopular rows of the j2 column) therefore cardinality for โ€œlow frequencyโ€ could be consider as 5. Eventually we can figure out final cardinality. CARD = CARD (High frequency values) + CARD (Low frequency values) + CARD (Unpopular rows) = 17.7273+8.75+5=31.4773. And execution plan --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 31 | 26 | | 3 | TABLE ACCESS FULL| T1 | 1 | 20 | 20 | | 4 | TABLE ACCESS FULL| T2 | 1 | 20 | 20 | ---------------------------------------------------------------
  • 29. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 29 Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."J1"="T2"."J2") So it is an expected cardinality. But in general here could be estimation or approximation errors which are related with rounding. Sampling based estimation As we know in Oracle Database 12c new dynamic sampling feature has been introduced. The dynamic sampling level=11 is designed for the operations like single table, group by and join operations for which oracle automatically defines sample size and tries to estimate cardinality of the operations. Lets see following example and try to understand sampling mechanism in the join size estimation. CREATE TABLE t1 AS SELECT * FROM dba_users; CREATE TABLE t2 AS SELECT * FROM dba_objects; EXECUTE dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1'); EXECUTE dbms_stats.gather_table_stats(user,'t2',method_opt=>'for all columns size 1'); SELECT COUNT (*) FROM t1, t2 WHERE t1.username = t2.owner; Without histogram and in default sampling mode execution plan is: --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 92019 | 54942 | | 3 | TABLE ACCESS FULL| T1 | 1 | 42 | 42 | | 4 | TABLE ACCESS FULL| T2 | 1 | 92019 | 92019 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."USERNAME"="T2"."OWNER") Without histogram and automatic sampling mode execution plan is: --------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 1 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 | |* 2 | HASH JOIN | | 1 | 58728 | 54942 |
  • 30. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 30 | 3 | TABLE ACCESS FULL| T1 | 1 | 42 | 42 | | 4 | TABLE ACCESS FULL| T2 | 1 | 92019 | 92019 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T1"."USERNAME"="T2"."OWNER") Note ----- - dynamic statistics used: dynamic sampling (level=AUTO) As we see without histogram there is significant difference between actual and estimated rows but in case when automatic (adaptive) sampling is enabled estimation is good enough. The Question is how did optimizer actually get cardinality as 58728? How did optimizer calculate it? To give the explanation we could use 10046 and 10053 trace events. So in SQL trace file we could see following lines. SQL ID: 1bgh7fk6kqxg7 Plan Hash: 3696410285 SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2#0") */ 1 AS C1 FROM "T2" SAMPLE BLOCK(51.8135, 8) SEED(1) "T2#0", "T1" "T1#1" WHERE ("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 0.06 0.05 0 879 0 1 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 3 0.06 0.05 0 879 0 1 Misses in library cache during parse: 1 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE (cr=879 pr=0 pw=0 time=51540 us) 30429 HASH JOIN (cr=879 pr=0 pw=0 time=58582 us cost=220 size=1287306 card=47678) 42 TABLE ACCESS FULL T1 (cr=3 pr=0 pw=0 time=203 us cost=2 size=378 card=42) 51770 TABLE ACCESS SAMPLE T2 (cr=876 pr=0 pw=0 time=35978 us cost=218 size=858204 card=47678) During parsing oracle has executed this SQL statement and result has been used to estimate size of the join. The SQL statement used sampling (undocumented format) actually read 50 percent of the T2 table blocks. Sampling was not applied to the T1 table because its size is quite small when compared to the second table and 100% sampling of the T1 table does not consume โ€œlot ofโ€ time during parsing. It means oracle first identifies appropriate sampling size based on the table size and
  • 31. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 31 then execute specific SQL statement. So we get 30429 rows based on the 51.8135 percent therefore our estimated cardinality is 30429/51.8135*100=58727.94โ‰ˆ58728. Now letโ€™s check optimizer trace file: SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN Join Card: 92019.000000 = outer (42.000000) * inner (92019.000000) * sel (0.023810) >> Join Card adjusted from 92019.000000 to 58727.970000 due to adaptive dynamic sampling, prelen=2 Adjusted Join Cards: adjRatio=0.638216 cardHjSmj=58727.970000 cardHjSmjNPF=58727.970000 cardNlj=58727.970000 cardNSQ=58727.970000 cardNSQ_na=92019.000000 Join Card - Rounded: 58728 Computed: 58727.970000 Let see what will happen if we increase sizes of both tables โ€“ using multiple insert into t select * from t table name blocks row nums size mb T1 3186 172032 25 T2 6158 368076 49 In this case oracle completely ignores adaptive sampling and uses uniform distribution to estimate join size. Table Stats:: Table: T2 Alias: T2 #Rows: 368076 SSZ: 0 LGR: 0 #Blks: 6158 AvgRowLen: 115.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): OWNER(VARCHAR2) AvgLen: 6 NDV: 31 Nulls: 0 Density: 0.032258 *********************** Table Stats:: Table: T1 Alias: T1 #Rows: 172032 SSZ: 0 LGR: 0 #Blks: 3186 AvgRowLen: 127.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): USERNAME(VARCHAR2) AvgLen: 9 NDV: 42 Nulls: 0 Density: 0.023810 Join Card: 1507639296.000000 = outer (172032.000000) * inner (368076.000000) * sel (0.023810) In addition if you see SQL trace file then SQL ID: 0ck072zj5gf73 Plan Hash: 3774486692 SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2#0") */ 1 AS C1 FROM "T2" SAMPLE BLOCK(12.9912, 8) SEED(1) "T2#0", "T1" "T1#1" WHERE ("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery
  • 32. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 32 call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 2 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 1.70 1.91 0 885 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 3 1.70 1.91 0 887 0 0 Misses in library cache during parse: 1 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 0 SORT AGGREGATE (cr=0 pr=0 pw=0 time=36 us) 4049738 HASH JOIN (cr=885 pr=0 pw=0 time=2696440 us cost=1835 size=5288231772 card=195860436) 44649 TABLE ACCESS SAMPLE T2 (cr=761 pr=0 pw=0 time=28434 us cost=218 size=860706 card=47817) 6468 TABLE ACCESS FULL T1 (cr=124 pr=0 pw=0 time=28902 us cost=866 size=1548288 card=172032) It is obvious that oracle stopped execution of this SQL during parsing, we can see it from rows column of the execution statistics and also from row source statistics. Oracle did not complete HASH JOIN operation in this SQL, we can be confirm that with result of the above SQL and row source statistics. Sizes of the tables are not big actually but why did optimizer ignore and decided to continue with previous approach? In my opinion here could be two factors, although sample size is not small but in our case sample SQL actually took quite long time during parsing (1.8 sec elapsed time) therefore oracle stopped it. I have added one filter predicate to the query: SELECT COUNT (*) FROM t1, t2 WHERE t1.username = t2.owner AND t2.object_type = 'TABLE'; In this case we will get the following lines SQL ID: 8pu5v8h0ghy1z Plan Hash: 3252009800 SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T2") */ 1 AS C1 FROM "T2" SAMPLE BLOCK(12.9912, 8) SEED(1) "T2" WHERE ("T2"."OBJECT_TYPE"='TABLE')) innerQuery call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.00 0.00 0 2 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 0.01 0.01 0 761 0 1 ------- ------ -------- ---------- ---------- ---------- ---------- ----------
  • 33. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 33 total 3 0.01 0.01 0 763 0 1 Misses in library cache during parse: 1 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 1 SORT AGGREGATE (cr=761 pr=0 pw=0 time=14864 us) 756 TABLE ACCESS SAMPLE T2 (cr=761 pr=0 pw=0 time=5969 us cost=219 size=21378 card=1018) ******************************************************************************** SQL ID: 9jv79m9u42jps Plan Hash: 3525519047 SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) no_parallel result_cache(snapshot=3600) OPT_ESTIMATE(@"innerQuery", TABLE, "T2#0", ROWS=5819.31) */ SUM(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "T1#1") */ 1 AS C1 FROM "T1" SAMPLE BLOCK(25.1099, 8) SEED(1) "T1#1", "T2" "T2#0" WHERE ("T2#0"."OBJECT_TYPE"='TABLE') AND ("T1#1"."USERNAME"="T2#0"."OWNER")) innerQuery call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.01 0.00 0 2 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 0.74 1.02 0 6283 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 3 0.76 1.02 0 6285 0 0 Misses in library cache during parse: 1 Optimizer mode: CHOOSE Parsing user id: SYS (recursive depth: 1) Rows Row Source Operation ------- --------------------------------------------------- 0 SORT AGGREGATE (cr=0 pr=0 pw=0 time=32 us) 1412128 HASH JOIN (cr=6283 pr=0 pw=0 time=1243755 us cost=1908 size=215466084 card=5985169) 9880 TABLE ACCESS FULL T2 (cr=6167 pr=0 pw=0 time=20665 us cost=1674 size=87285 card=5819) 6035 TABLE ACCESS SAMPLE T1 (cr=116 pr=0 pw=0 time=6069 us cost=218 size=907137 card=43197) It means oracle firstly tried to estimate size of T2 table, because it has filter predicate and optimizer thinks using ADS could be very efficient. If we should have added predicate like t2.owner=โ€™HRโ€™ then optimizer would tried to estimate also T1 table cardinality. But the mechanism of estimating subset of the join and then estimate whole join principle in this case actually ignored. However in this case only T2 table has been estimated. We can easily see this fact from the trace file: BASE STATISTICAL INFORMATION *********************** Table Stats::
  • 34. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 34 Table: T2 Alias: T2 #Rows: 368144 SSZ: 0 LGR: 0 #Blks: 6158 AvgRowLen: 115.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): OWNER(VARCHAR2) AvgLen: 6 NDV: 31 Nulls: 0 Density: 0.032258 *********************** Table Stats:: Table: T1 Alias: T1 #Rows: 172032 SSZ: 0 LGR: 0 #Blks: 3186 AvgRowLen: 127.00 NEB: 0 ChainCnt: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 1 #IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 #IMCBlocks: 0 IMCQuotient: 0.000000 Column (#1): USERNAME(VARCHAR2) AvgLen: 9 NDV: 42 Nulls: 0 Density: 0.023810 SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T1[T1] SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE *** 2016-02-13 19:59:49.416 ** Performing dynamic sampling initial checks. ** ** Not using old style dynamic sampling since ADS is enabled. Table: T1 Alias: T1 Card: Original: 172032.000000 Rounded: 172032 Computed: 172032.000000 Non Adjusted: 172032.000000 Scan IO Cost (Disk) = 865.000000 Scan CPU Cost (Disk) = 48493707.840000 Total Scan IO Cost = 865.000000 (scan (Disk)) = 865.000000 Total Scan CPU Cost = 48493707.840000 (scan (Disk)) = 48493707.840000 Access Path: TableScan Cost: 866.262422 Resp: 866.262422 Degree: 0 Cost_io: 865.000000 Cost_cpu: 48493708 Resp_io: 865.000000 Resp_cpu: 48493708 Best:: AccessPath: TableScan Cost: 866.262422 Degree: 1 Resp: 866.262422 Card: 172032.000000 Bytes: 0.000000 Access path analysis for T2 *************************************** SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T2[T2] SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE *** 2016-02-13 19:59:49.417 ** Performing dynamic sampling initial checks. ** ** Not using old style dynamic sampling since ADS is enabled. Column (#6): OBJECT_TYPE(VARCHAR2) AvgLen: 9 NDV: 47 Nulls: 0 Density: 0.021277 Table: T2 Alias: T2 Card: Original: 368144.000000 >> Single Tab Card adjusted from 7832.851064 to 5819.310000 due to adaptive dynamic sampling Rounded: 5819 Computed: 5819.310000 Non Adjusted: 7832.851064 Best NL cost: 5031394.059186 resc: 5031394.059186 resc_io: 5022741.000000 resc_cpu: 332391893348 resp: 5031394.059186 resp_io: 5022741.000000 resc_cpu: 332391893348 SPD: Return code in qosdDSDirSetup: NOCTX, estType = JOIN
  • 35. ยฉ 2016 Chinar A. Aliyev Hotsos Symposium March 6-10 35 Join Card: 23835893.760000 = outer (5819.310000) * inner (172032.000000) * sel (0.023810) *** 2016-02-13 19:59:51.055 Join Card - Rounded: 23835894 Computed: 23835893.760000 In last case I have increased of both table sizes as table name blocks row nums size mb T1 101950 5505024 800 T2 196807 11780608 1600 In this case oracle completely ignored ADS and used statistics from dictionary to estimate size of tables and join cardinality. Summary In this paper has explained the mechanism of the oracle optimizer to calculate join selectivity and cardinality. We learned that firstly optimizer calculates join selectivity based on โ€œpureโ€ cardinality. To estimate the โ€œpureโ€ cardinality optimizer identifies โ€œdistinct value, frequencyโ€ pairs for each column, based on the column distribution. The column distribution information is identified by the histogram. And as we know that, frequency histogram gives us completely whole data distribution of the column. Also top frequency histogram gives us enough information for high frequency values. However for less significantly values we can approach โ€œuniform distributionโ€. Moreover if here are hybrid histograms for the join columns in the dictionary then optimizer can use endpoint repeat count to formulate frequency. In addition optimizer has chance to estimate join cardinality via sampling. Although this process influenced by time restriction and size of the tables. As a result optimizer can completely ignore adaptive dynamic sampling. References โ€ข Lewis, Jonathan. Cost-Based Oracle: Fundamentals Based Oracle: Fundamentals. Apress. 2006 โ€ข Alberto Dell'Era. Join Over Histograms. 2007 โ€ข http://www.adellera.it/investigations/join_over_histograms/JoinOverHistograms.pdf โ€ข Chinar Aliyev. Automatic Sampling in Oracle 12c. 2014 โ€ข https://www.toadworld.com/platforms/oracle/w/wiki/11036.automaticadaptive-dynamic- sampling-in-oracle-12c-part-2 โ€ข https://www.toadworld.com/platforms/oracle/w/wiki/11052.automaticadaptive-dynamic- sampling-in-oracle-12c-part-3