The document discusses Oracle's cost-based optimizer. It provides the following key points:
1. The optimizer's cost estimate is based on the number of rows (cardinality) and on how clustered the data is; caching is not considered at all (at least as of 11g).
2. Cardinality and clustering determine whether a "big job" full table scan strategy or a "small job" index scan strategy should be preferred.
3. Accurate statistics such as histograms are important for the optimizer to estimate cardinality correctly and choose an efficient execution plan. However, histograms have limitations in representing all data patterns.
OPTIMIZER BASICS
• Three main questions you should ask when
looking for an efficient execution plan:
1. How much data? How many rows? Volume?
2. How scattered / clustered is the data?
3. Caching?
=> Know your data!
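These questions can often be answered from the statistics Oracle has already gathered. A minimal sketch, assuming a table named T1 (a placeholder name):

```sql
-- Volume: how many rows and blocks does the table have?
SELECT num_rows, blocks
FROM   user_tables
WHERE  table_name = 'T1';   -- placeholder table name

-- Clustering: how well does each index order match the table order?
SELECT index_name, clustering_factor
FROM   user_indexes
WHERE  table_name = 'T1';
```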
OPTIMIZER BASICS
• Why are these questions so important?
Two main strategies:
1. One “Big Job”
=> How much data, volume?
2. Few/many “Small Jobs”
=> How many times / rows?
=> Effort per iteration? Clustering / Caching
OPTIMIZER BASICS
• Optimizer’s cost estimate is based on:
How much data? How many rows?
How scattered / clustered? (only partially)
Caching? Not at all (as of 11g)
SUMMARY
• Cardinality and Clustering determine
whether the “Big Job” or “Small Job”
strategy should be preferred
• If the optimizer gets these estimates right,
the resulting execution plan will be
efficient within the boundaries of the given
access paths
• Know your data and business questions
• Help your optimizer. (Oracle doesn’t know
the data the way you know it.)
HOW SCATTERED / CLUSTERED?
• Index scan: each row visited may sit in a different table block
• Worst case:
1,000 rows => visit 1,000 table blocks:
1,000 * 5 ms = 5 s
• Good case:
1,000 rows => visit 10 table blocks: 10 * 5 ms = 50 ms
HOW SCATTERED / CLUSTERED?
• There is only a single measure of clustering
in Oracle:
The index clustering factor
• The index clustering factor is represented
by a single value
• The logic measuring the clustering factor by
default does not cater for data clustered
across a few blocks (ASSM!)
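As a rough sketch (table name is a placeholder), that single value can be compared against the table's block and row counts: a clustering factor near BLOCKS indicates well-clustered data, one near NUM_ROWS indicates badly scattered data:

```sql
SELECT i.index_name,
       i.clustering_factor,   -- near t.blocks: clustered; near t.num_rows: scattered
       t.blocks,
       t.num_rows
FROM   user_indexes i
       JOIN user_tables t ON t.table_name = i.table_name
WHERE  i.table_name = 'T1';   -- placeholder
```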
HOW SCATTERED / CLUSTERED?
Challenges
Getting the index clustering factor right
There are various reasons why the index
clustering factor measured by Oracle might not
be representative
- Multiple freelists / freelist groups
- ASSM (automatic segment space management)
- Partitioning
- SHRINK SPACE efforts
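In recent releases (11.2.0.4 onward, to my knowledge) the TABLE_CACHED_BLOCKS statistics preference can mitigate some of these distortions by letting DBMS_STATS treat rows spread over a few recently visited blocks as clustered. A hedged sketch with placeholder object names:

```sql
BEGIN
  -- Count rows as clustered if they fall within the last 16 visited blocks
  DBMS_STATS.SET_TABLE_PREFS(
    ownname => USER,
    tabname => 'T1',                  -- placeholder table
    pname   => 'TABLE_CACHED_BLOCKS',
    pvalue  => '16');
  -- Re-gather index statistics so the new preference takes effect
  DBMS_STATS.GATHER_INDEX_STATS(USER, 'T1_IDX');  -- placeholder index
END;
/
```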
HOW SCATTERED / CLUSTERED?
• Under ASSM, clustering degrades when many sessions insert concurrently
• ASSM deliberately avoids concurrent inserts into the same block
(its replacement for freelist management)
• In such cases the clustering factor looks bad, yet performance does not
necessarily suffer even though many blocks are read (?)
HOW SCATTERED / CLUSTERED?
• The CF (clustering factor), in case of an index range scan
with table access involved, represents the
largest fraction of the cost associated
with the operation. (See the 10053 trace file.)
Statistics
• Controlling column statistics via METHOD_OPT
FOR ALL INDEXED COLUMNS SIZE > 1:
nonsense, because it leaves non-indexed columns
without basic column statistics
Default from 10g on:
FOR ALL COLUMNS SIZE AUTO:
basic column statistics for all columns,
histograms if Oracle determines so
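A sketch of the recommended alternative (table and column names are placeholders):

```sql
-- Conservative default: basic statistics for every column, no histograms
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'T1',               -- placeholder
    method_opt => 'FOR ALL COLUMNS SIZE 1');
END;
/

-- Add a histogram only where it is really needed
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'T1',
    method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS STATUS SIZE 254');
END;
/
```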
Sampling methods
• Row sampling
select count(*) from t1 sample(1);
samples 1% of all rows
even 1% gives high accuracy
• Block sampling
select count(*) from t1 sample block(1);
samples 1% of all blocks
faster than row sampling, but less accurate
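The same trade-off exists when gathering statistics; a sketch with a placeholder table name:

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'T1',   -- placeholder
    estimate_percent => 1,
    block_sample     => TRUE);  -- block sampling: faster, less accurate
END;
/
```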
HISTOGRAMS
• Basic column statistics get generated
along with table statistics in a single pass
• Each histogram requires a separate pass
• Therefore Oracle resorts to aggressive
sampling if allowed =>AUTO_SAMPLE_SIZE
• This limits the quality of histograms and
their significance
(basic column statistics are computed from nearly all rows,
whereas histograms sample only a small fraction of them;
see user_tab_col_statistics)
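The gap can be seen directly in the dictionary; SAMPLE_SIZE shows how many rows each column's statistics were based on (table name is a placeholder):

```sql
SELECT column_name, num_distinct, density,
       num_buckets, histogram, sample_size
FROM   user_tab_col_statistics
WHERE  table_name = 'T1';   -- placeholder
```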
HISTOGRAMS
• Limited resolution of 255 value pairs maximum
• Less than 255 distinct column values => Frequency Histogram
• More than 255 distinct column values => Height Balanced Histogram
• Height Balanced is always a sampling of data, even when computing statistics!
HISTOGRAMS
• Aggressive sampling
• Oracle doesn’t trust its own histogram
information when calculating estimated
cardinality
• Very bad cardinality estimation
=> inefficient execution plan
Frequency Histograms
• Built when the column consists of only a few
distinct values
• Distinguish very popular and non-popular values
• Dynamic sampling is not representative either
• Statistics are sometimes inconsistent
Height-Balanced Histograms
• Rounding effects
• They cannot cover all values
• Histogram values are unstable
(re-gathering can produce different values)
• Oracle doesn’t know the data the way
you know it
SUMMARY
• Check the correctness of the CF for your
critical indexes
• Oracle does not know the questions you
ask about the data
• You may want to use FOR ALL COLUMNS
SIZE 1 as default and only generate
histograms where really necessary
• You may get better results with the old
histogram behavior, but not always
SUMMARY
• There are data patterns that don’t work well
with histograms
• => You may need to manually generate
histograms using
DBMS_STATS.SET_COLUMN_STATS for
critical columns
• Don’t forget about Dynamic
Sampling / FBI / Virtual Columns / Extended
Statistics
• Know your data and business questions!
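A minimal sketch of hand-crafting a frequency histogram with DBMS_STATS.SET_COLUMN_STATS; the table, column, values, and frequencies below are purely illustrative:

```sql
DECLARE
  srec DBMS_STATS.STATREC;
  vals DBMS_STATS.NUMARRAY := DBMS_STATS.NUMARRAY(1, 2, 3);  -- column values
BEGIN
  srec.epc    := 3;                                          -- three endpoints
  srec.bkvals := DBMS_STATS.NUMARRAY(999000, 900, 100);      -- rows per value
  DBMS_STATS.PREPARE_COLUMN_VALUES(srec, vals);
  DBMS_STATS.SET_COLUMN_STATS(
    ownname => USER,
    tabname => 'T1',        -- placeholder
    colname => 'STATUS',    -- placeholder
    distcnt => 3,
    density => 0.0000005,   -- illustrative
    nullcnt => 0,
    srec    => srec);
END;
/
```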
10053 Trace File
• SYSTEM statistics information
- CPU SPEED, SBRDTime, MBRDTime, MBRC
• Table/index statistical information
- Base Cardinality, Density, CLUF
• Cardinality estimation
• Cost estimation
- Access Type, Join Type, Join Order
• The estimates are bound to be wrong sometimes
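One way to obtain such a trace for a single statement (the query is a placeholder; the event only fires on a hard parse):

```sql
ALTER SESSION SET tracefile_identifier = 'cbo_trace';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

SELECT /* unique comment to force a hard parse */ *
FROM   t1                    -- placeholder
WHERE  id = 42;

ALTER SESSION SET EVENTS '10053 trace name context off';
```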
Index Scan Cost
• What fraction of the total rows can we read before
an index scan stops being more efficient than a full table scan?
• The CBO computes the cost of an index scan and
of a full table scan mechanically
• The details appear in the 10053 trace file
• Understanding the cost calculation lets you see why
the CBO chooses an index scan or a full table scan
COST?
• Jonathan Lewis
• The cost represents (and has always
represented) the optimizer’s best
estimate of the time it will take to
execute the statement.
• i.e., the estimated elapsed time of the query
Cost as Time
• Total Time = CPU Time + I/O Time + Wait Time
• Estimated Time
= Estimated CPU time + Estimated I/O time
= Estimated CPU time + Single Block I/O time
+ Multi Block I/O time
Cost as Time
• Oracle normalizes the cost (an artificial transformation)
• The sum of all estimated time is divided by the average Single Block I/O Time
• Why? It takes some Oracle optimizer history to see:
• Before system statistics were introduced, the I/O model was the basis of the cost
calculation; cost simply meant I/O count
• Since an I/O count cannot be converted into time, Oracle went the other way and
normalized time in terms of I/O count
Cost as Time
• Total Time = CPU Time + I/O Time + Wait Time
• Estimated Time
= Estimated CPU time + Estimated I/O time
= Estimated CPU time + Single Block I/O time +
Multi Block I/O time
• COST = Estimated Time / Single Block I/O Time
higher cost = longer expected elapsed time
= Single Block I/O count + weighted Multi Block I/O count
+ weighted CPU count
All estimated elapsed time is converted into a count
weighted against the Single Block I/O Time
An advance over the pure I/O-count cost model
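The single-block I/O time the cost is normalized against comes from the gathered system statistics, which can be inspected directly (requires access to SYS objects):

```sql
SELECT pname, pval1
FROM   sys.aux_stats$
WHERE  sname = 'SYSSTATS_MAIN';  -- SREADTIM, MREADTIM, MBRC, CPUSPEED, ...
```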
Limitations of the Time Model
• Inaccuracy of the mreadtim and sreadtim values
• On enterprise storage, the somewhat unrealistic
situation arises where sreadtim is measured
higher than mreadtim
• The values can be forced manually with the
dbms_stats.set_system_stats procedure
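A sketch of forcing more realistic values; the numbers are illustrative only:

```sql
BEGIN
  DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 5);   -- ms per single-block read
  DBMS_STATS.SET_SYSTEM_STATS('MREADTIM', 10);  -- ms per multi-block read
  DBMS_STATS.SET_SYSTEM_STATS('MBRC', 8);       -- multiblock read count
END;
/
```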