The document discusses Oracle's cost-based optimizer. It provides the following key points:
1. The optimizer's cost estimate is based on the number of rows (cardinality) and on how clustered the data is; caching is not considered at all (at least as of 11g).
2. Cardinality and clustering determine whether a "big job" full table scan strategy or a "small job" index scan strategy should be preferred.
3. Accurate statistics such as histograms are important for the optimizer to estimate cardinality correctly and choose an efficient execution plan. However, histograms have limitations in representing all data patterns.
OPTIMIZER BASICS
• Three main questions you should ask when
looking for an efficient execution plan:
1. How much data? How many rows? Volume?
2. How scattered / clustered is the data?
3. Caching?
=> Know your data!
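These questions can often be answered from the statistics Oracle has already gathered. A minimal sketch, assuming a table named T1 (a placeholder name):

```sql
-- Volume: how many rows and blocks does the table have?
SELECT num_rows, blocks
FROM   user_tables
WHERE  table_name = 'T1';   -- placeholder table name

-- Clustering: how well does each index order match the table order?
SELECT index_name, clustering_factor
FROM   user_indexes
WHERE  table_name = 'T1';
```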
OPTIMIZER BASICS
• Why are these questions so important?
Two main strategies:
1. One “Big Job”
=> How much data, volume?
2. Few/many “Small Jobs”
=> How many times / rows?
=> Effort per iteration? Clustering / Caching
OPTIMIZER BASICS
• Optimizer’s cost estimate is based on:
How much data? How many rows?
How scattered / clustered? (only partially)
Caching? Not at all (as of 11g)
SUMMARY
• Cardinality and Clustering determine
whether the “Big Job” or “Small Job”
strategy should be preferred
• If the optimizer gets these estimates right,
the resulting execution plan will be
efficient within the boundaries of the given
access paths
• Know your data and business questions
• Help your optimizer. (Oracle doesn’t know
the data the way you know it.)
HOW SCATTERED / CLUSTERED?
• Index scan: each row visited may sit in a different table block
• Worst case:
1,000 rows => visit 1,000 table blocks:
1,000 * 5 ms = 5 s
• Good case:
1,000 rows => visit 10 table blocks: 10 * 5 ms = 50 ms
HOW SCATTERED / CLUSTERED?
• There is only a single measure of clustering
in Oracle:
The index clustering factor
• The index clustering factor is represented
by a single value
• The logic measuring the clustering factor by
default does not cater for data clustered
across a few blocks (ASSM!)
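As a rough sketch (table name is a placeholder), that single value can be compared against the table's block and row counts: a clustering factor near BLOCKS indicates well-clustered data, one near NUM_ROWS indicates badly scattered data:

```sql
SELECT i.index_name,
       i.clustering_factor,   -- near t.blocks: clustered; near t.num_rows: scattered
       t.blocks,
       t.num_rows
FROM   user_indexes i
       JOIN user_tables t ON t.table_name = i.table_name
WHERE  i.table_name = 'T1';   -- placeholder
```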
HOW SCATTERED / CLUSTERED?
Challenges
Getting the index clustering factor right
There are various reasons why the index
clustering factor measured by Oracle might not
be representative
- Multiple freelists / freelist groups
- ASSM (automatic segment space management)
- Partitioning
- SHRINK SPACE efforts
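In recent releases (11.2.0.4 onward, to my knowledge) the TABLE_CACHED_BLOCKS statistics preference can mitigate some of these distortions by letting DBMS_STATS treat rows spread over a few recently visited blocks as clustered. A hedged sketch with placeholder object names:

```sql
BEGIN
  -- Count rows as clustered if they fall within the last 16 visited blocks
  DBMS_STATS.SET_TABLE_PREFS(
    ownname => USER,
    tabname => 'T1',                  -- placeholder table
    pname   => 'TABLE_CACHED_BLOCKS',
    pvalue  => '16');
  -- Re-gather index statistics so the new preference takes effect
  DBMS_STATS.GATHER_INDEX_STATS(USER, 'T1_IDX');  -- placeholder index
END;
/
```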
HOW SCATTERED / CLUSTERED?
• Under ASSM, clustering degrades when many sessions insert concurrently
• ASSM deliberately avoids concurrent inserts into the same block
(its replacement for freelist management)
• In such cases the clustering factor looks bad, yet performance does not
necessarily suffer even though many blocks are read (?)
HOW SCATTERED / CLUSTERED?
• The CF (clustering factor), in case of an index range scan
with table access involved, represents the
largest fraction of the cost associated
with the operation. (See the 10053 trace file.)
Statistics
• Controlling column statistics via METHOD_OPT
FOR ALL INDEXED COLUMNS SIZE > 1:
nonsense, because it leaves non-indexed columns
without basic column statistics
Default from 10g on:
FOR ALL COLUMNS SIZE AUTO:
basic column statistics for all columns,
histograms if Oracle determines so
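A sketch of the recommended alternative (table and column names are placeholders):

```sql
-- Conservative default: basic statistics for every column, no histograms
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'T1',               -- placeholder
    method_opt => 'FOR ALL COLUMNS SIZE 1');
END;
/

-- Add a histogram only where it is really needed
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'T1',
    method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS STATUS SIZE 254');
END;
/
```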
Sampling methods
• Row sampling
select count(*) from t1 sample(1);
samples 1% of all rows
even 1% gives high accuracy
• Block sampling
select count(*) from t1 sample block(1);
samples 1% of all blocks
faster than row sampling, but less accurate
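The same trade-off exists when gathering statistics; a sketch with a placeholder table name:

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'T1',   -- placeholder
    estimate_percent => 1,
    block_sample     => TRUE);  -- block sampling: faster, less accurate
END;
/
```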
HISTOGRAMS
• Basic column statistics get generated
along with table statistics in a single pass
• Each histogram requires a separate pass
• Therefore Oracle resorts to aggressive
sampling if allowed =>AUTO_SAMPLE_SIZE
• This limits the quality of histograms and
their significance
(basic column statistics are computed from nearly all rows,
whereas histograms sample only a small fraction of them;
see user_tab_col_statistics)
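The gap can be seen directly in the dictionary; SAMPLE_SIZE shows how many rows each column's statistics were based on (table name is a placeholder):

```sql
SELECT column_name, num_distinct, density,
       num_buckets, histogram, sample_size
FROM   user_tab_col_statistics
WHERE  table_name = 'T1';   -- placeholder
```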
HISTOGRAMS
• Limited resolution of 255 value pairs maximum
• Less than 255 distinct column values => Frequency Histogram
• More than 255 distinct column values => Height Balanced Histogram
• Height Balanced is always a sampling of data, even when computing statistics!
HISTOGRAMS
• Aggressive sampling
• Oracle doesn’t trust its own histogram
information when calculating estimated
cardinality
• Very bad cardinality estimation
=> inefficient execution plan
Frequency Histograms
• Built when the column consists of only a few
distinct values
• Distinguish very popular and non-popular values
• Dynamic sampling is not representative either
• Statistics are sometimes inconsistent
Height-Balanced Histograms
• Rounding effects
• They cannot cover all values
• Histogram values are unstable
(re-gathering can produce different values)
• Oracle doesn’t know the data the way
you know it
SUMMARY
• Check the correctness of the CF for your
critical indexes
• Oracle does not know the questions you
ask about the data
• You may want to use FOR ALL COLUMNS
SIZE 1 as default and only generate
histograms where really necessary
• You may get better results with the old
histogram behavior, but not always
SUMMARY
• There are data patterns that don’t work well
with histograms
• => You may need to manually generate
histograms using
DBMS_STATS.SET_COLUMN_STATS for
critical columns
• Don’t forget about Dynamic
Sampling / FBI / Virtual Columns / Extended
Statistics
• Know your data and business questions!
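A minimal sketch of hand-crafting a frequency histogram with DBMS_STATS.SET_COLUMN_STATS; the table, column, values, and frequencies below are purely illustrative:

```sql
DECLARE
  srec DBMS_STATS.STATREC;
  vals DBMS_STATS.NUMARRAY := DBMS_STATS.NUMARRAY(1, 2, 3);  -- column values
BEGIN
  srec.epc    := 3;                                          -- three endpoints
  srec.bkvals := DBMS_STATS.NUMARRAY(999000, 900, 100);      -- rows per value
  DBMS_STATS.PREPARE_COLUMN_VALUES(srec, vals);
  DBMS_STATS.SET_COLUMN_STATS(
    ownname => USER,
    tabname => 'T1',        -- placeholder
    colname => 'STATUS',    -- placeholder
    distcnt => 3,
    density => 0.0000005,   -- illustrative
    nullcnt => 0,
    srec    => srec);
END;
/
```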
10053 Trace File
• SYSTEM statistics information
- CPU SPEED, SBRDTime, MBRDTime, MBRC
• Table/index statistical information
- Base Cardinality, Density, CLUF
• Cardinality estimation
• Cost estimation
- Access Type, Join Type, Join Order
• The estimates are bound to be wrong sometimes
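One way to obtain such a trace for a single statement (the query is a placeholder; the event only fires on a hard parse):

```sql
ALTER SESSION SET tracefile_identifier = 'cbo_trace';
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

SELECT /* unique comment to force a hard parse */ *
FROM   t1                    -- placeholder
WHERE  id = 42;

ALTER SESSION SET EVENTS '10053 trace name context off';
```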
Index Scan Cost
• What fraction of the total rows can we read before
an index scan stops being more efficient than a full table scan?
• The CBO computes the cost of an index scan and
of a full table scan mechanically
• The details appear in the 10053 trace file
• Understanding the cost calculation lets you see why
the CBO chooses an index scan or a full table scan
COST?
• Jonathan Lewis
• The cost represents (and has always
represented) the optimizer’s best
estimate of the time it will take to
execute the statement.
• i.e., the estimated elapsed time of the query
Cost as Time
• Total Time = CPU Time + I/O Time + Wait Time
• Estimated Time
= Estimated CPU time + Estimated I/O time
= Estimated CPU time + Single Block I/O time
+ Multi Block I/O time
Cost as Time
• Oracle normalizes the cost (an artificial transformation)
• The sum of all estimated time is divided by the average Single Block I/O Time
• Why? It takes some Oracle optimizer history to see:
• Before system statistics were introduced, the I/O model was the basis of the cost
calculation; cost simply meant I/O count
• Since an I/O count cannot be converted into time, Oracle went the other way and
normalized time in terms of I/O count
Cost as Time
• Total Time = CPU Time + I/O Time + Wait Time
• Estimated Time
= Estimated CPU time + Estimated I/O time
= Estimated CPU time + Single Block I/O time +
Multi Block I/O time
• COST = Estimated Time / Single Block I/O Time
higher cost = longer expected elapsed time
= Single Block I/O count + weighted Multi Block I/O count
+ weighted CPU count
All estimated elapsed time is converted into a count
weighted against the Single Block I/O Time
An advance over the pure I/O-count cost model
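The single-block I/O time the cost is normalized against comes from the gathered system statistics, which can be inspected directly (requires access to SYS objects):

```sql
SELECT pname, pval1
FROM   sys.aux_stats$
WHERE  sname = 'SYSSTATS_MAIN';  -- SREADTIM, MREADTIM, MBRC, CPUSPEED, ...
```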
Limitations of the Time Model
• Inaccuracy of the mreadtim and sreadtim values
• On enterprise storage, the somewhat unrealistic
situation arises where sreadtim is measured
higher than mreadtim
• The values can be forced manually with the
dbms_stats.set_system_stats procedure
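A sketch of forcing more realistic values; the numbers are illustrative only:

```sql
BEGIN
  DBMS_STATS.SET_SYSTEM_STATS('SREADTIM', 5);   -- ms per single-block read
  DBMS_STATS.SET_SYSTEM_STATS('MREADTIM', 10);  -- ms per multi-block read
  DBMS_STATS.SET_SYSTEM_STATS('MBRC', 8);       -- multiblock read count
END;
/
```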