Understanding Firebird optimizer, by Dmitry Yemanov (in English)
Presentation Transcript

  • Understanding Firebird optimizer Dmitry Yemanov [email_address] Firebird Project
  • Optimizer Key Points
      • Allow the data to be retrieved in the most efficient way possible
      • Analyze the existing statistical information
      • Inject additional predicates
      • Order operations by priority
      • Try different join permutations
      • Strategies
      • Rule-based (heuristics)
      • Cost-based (statistics)
      • Mixed
  • Optimizer Algorithm
      • Preparation
      • Expand views
      • Separate predicates: «base», «parent», «missing»
      • Distribute equalities
      • Generate index mappings
      • Main stage
      • Calculate cost for different join orders
      • Choose the best index coverage for the given join order
      • Ensure early predicate evaluation
      • Decide about navigation or sorting
  • Rule-based Approach
      • Heuristic assumptions
      • Indexed retrieval is better than a full table scan
      • Loop join (indexed) is better than a merge join
      • Index b-tree has three levels of depth
      • Compound indices are better than a few simple ones
      • Drawbacks
      • Indices may not be well suited for some operations
      • Not ready for «ad hoc» queries
  • Cost-based Approach
      • Key ideas
      • Every operation has an associated cost value
      • Cost is calculated using the statistical information
      • Cost is aggregated from bottom up in the access path
      • Drawbacks
      • Complex implementation
      • Slower optimization process
      • Requires up-to-date statistics
  • Basic Terms
      • Selectivity
      • Represents a fraction of rows from a row set
      • Value range is 0.0 to 1.0
      • Cardinality
      • Represents number of rows in a row set
      • Base cardinality is the number of rows in a base table
      • Cost
      • Represents computational complexity of the retrieval
      • Is a function of the estimated cardinalities
      • Linearly depends on the number of logical reads (page fetches)
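The relationship between these terms can be sketched in a few lines of Python (the function name and figures are illustrative, not Firebird code):

```python
def output_cardinality(base_cardinality: int, selectivity: float) -> float:
    """Estimated rows surviving a predicate: cardinality scaled by selectivity."""
    assert 0.0 <= selectivity <= 1.0
    return base_cardinality * selectivity

# A 1000-row table filtered by a predicate that keeps half of the rows:
print(output_cardinality(1000, 0.5))  # 500.0
```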
  • Cost Measurement
      • Full table scan
      • cost = <base cardinality>
      • Unique index scan + table scan
      • cost = <b-tree level> + 1
      • Range index scan + table scan
      • cost = <b-tree level> + N + <selectivity> * <base cardinality>
      • N represents the number of leaf pages to be scanned and thus depends on the average key length
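The three cost formulas above translate directly into code (a hedged sketch; the function names and sample figures are assumptions, not Firebird internals):

```python
def full_scan_cost(base_cardinality: int) -> float:
    # cost = <base cardinality>
    return float(base_cardinality)

def unique_scan_cost(btree_level: int) -> int:
    # cost = <b-tree level> + 1 (one table fetch for the single matching key)
    return btree_level + 1

def range_scan_cost(btree_level: int, leaf_pages: int,
                    selectivity: float, base_cardinality: int) -> float:
    # cost = <b-tree level> + N + <selectivity> * <base cardinality>
    return btree_level + leaf_pages + selectivity * base_cardinality

# e.g. a 3-level index over a 5000-row table, selectivity 0.001, 2 leaf pages:
print(range_scan_cost(3, 2, 0.001, 5000))  # 10.0
```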
  • Cost Aggregation
      • SELECT * FROM T1 JOIN T2 ON T1.PK = T2.FK WHERE T1.VAL < 100 ORDER BY T1.RANK
      • PLAN SORT ( JOIN ( T1 NATURAL, T2 INDEX (FK) ) )
      • Table T1: base cardinality = 1000
      • Table T2: base cardinality = 5000
      • Index FK: selectivity = 0.001
      • Full Scan (T1): cost = 1000, cardinality = 1000
      • Filter (T1.VAL < 100): cost = 1000, cardinality = 500
      • Index Scan (T2 via FK): cost = 7, cardinality = 5
      • Loop Join: cost = 4000, cardinality = 2500
      • Sort: cost = 5000, cardinality = 2500
      • Final Row Set: cost = 5000, cardinality = 2500
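A sketch of how the cardinalities in this example propagate bottom-up (the 0.5 filter selectivity is inferred from the slide's 1000 → 500 estimate; the cost formula itself is not spelled out here, so only cardinalities are computed):

```python
base_t1, base_t2 = 1000, 5000   # base cardinalities from the example
fk_selectivity = 0.001          # selectivity of index FK

filter_card = base_t1 * 0.5             # WHERE T1.VAL < 100: 1000 -> 500
inner_card = base_t2 * fk_selectivity   # index scan on FK: 5 matching rows
join_card = filter_card * inner_card    # loop join: 500 * 5 = 2500

print(filter_card, inner_card, join_card)  # 500.0 5.0 2500.0
```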
  • Statistics
      • What is it?
      • Information that describes data amounts and distribution of values on different levels (table / index / column)
      • Where is it located?
      • Stored in the database
      • Calculated «on the fly»
      • How is it updated?
      • By user's request (SET STATISTICS)
      • On index creation / activation
      • On database restore
  • Core Statistics
      • Base cardinality (number of rows in a table)
      • For small tables: number of used record slots on data pages
      • For large tables: number of data pages / average record length
      • Estimated at runtime using a page scan
      • Index selectivity
      • 1 / number of distinct keys
      • Maintained per segment: (A), (A, B), (A, B, C)
      • Uniform value distribution is assumed
      • Stored on the index root page, visible through RDB$INDICES and RDB$INDEX_SEGMENTS
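The per-segment selectivity rule above can be sketched as follows (illustrative Python, assuming the uniform value distribution the slide mentions):

```python
def index_selectivity(keys) -> float:
    """Selectivity = 1 / number of distinct keys (uniform distribution assumed)."""
    distinct = len(set(keys))
    return 1.0 / distinct if distinct else 0.0

# Per-segment statistics for a hypothetical compound index on (A, B):
rows = [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
print(index_selectivity([a for a, _ in rows]))  # segment (A):     0.5
print(index_selectivity(rows))                  # segments (A, B): 0.25
```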
  • Advanced Statistics
      • Table level
      • Average page fill factor
      • Average record length
      • Index level
      • B-tree depth
      • Average key length
      • Clustering factor
      • Column level
      • Number of NULLs
      • Value distribution histograms
  • Clustering Factor
      • Bad clustering factor: adjacent index keys (1–5) point to scattered data pages (12, 25, 28, 57, 44)
      • Good clustering factor: adjacent index keys (1–5) point to neighbouring data pages (12, 13, 14)
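One way to make the contrast concrete is to count data-page switches while walking the index in key order (an illustrative metric, not Firebird's exact formula):

```python
def clustering_factor(pages_in_key_order) -> int:
    """Count data-page switches while reading rows in index key order.
    Lower values mean the physical row order follows the index order."""
    if not pages_in_key_order:
        return 0
    switches = 1
    for prev, cur in zip(pages_in_key_order, pages_in_key_order[1:]):
        if cur != prev:
            switches += 1
    return switches

print(clustering_factor([12, 12, 13, 13, 14]))  # good: 3 page visits
print(clustering_factor([12, 25, 28, 57, 44]))  # bad: 5 page visits
```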
  • Decisions Based on Statistics
      • Full table scan vs indexed retrieval
      • A large selectivity value suggests a full table scan
      • Order of streams in loop joins
      • Calculate costs of different stream permutations and choose the cheapest one
      • Loop join vs merge join
      • Calculate costs of different stream permutations
      • Index navigation vs external sorting
      • Depends on the clustering factor
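The first decision above can be sketched by comparing the two costs from the Cost Measurement slide (the helper name and figures are illustrative):

```python
def choose_access_path(base_cardinality, selectivity, btree_level, leaf_pages):
    """Pick the cheaper path using the earlier cost formulas."""
    full_cost = base_cardinality
    index_cost = btree_level + leaf_pages + selectivity * base_cardinality
    return "index scan" if index_cost < full_cost else "full scan"

print(choose_access_path(10000, 0.001, 3, 2))  # index scan
print(choose_access_path(100, 0.9, 3, 10))     # full scan (big selectivity)
```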
  • Decisions Based on Statistics (cont'd)
      • What indices to use
      • Compare index selectivities and index scan costs
      • Estimate how many indices would work best together
      • Consider segment operations for compound indices
      • Special handling of different comparisons
      • Calculate selectivities for AND / OR operations
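Assuming independent predicates, the usual combination rules are as follows (an illustrative sketch; Firebird's exact estimator may differ):

```python
def and_selectivity(s1: float, s2: float) -> float:
    return s1 * s2                  # P(A and B) = P(A) * P(B)

def or_selectivity(s1: float, s2: float) -> float:
    return s1 + s2 - s1 * s2        # P(A or B) = P(A) + P(B) - P(A and B)

print(and_selectivity(0.1, 0.2))  # ~0.02
print(or_selectivity(0.1, 0.2))   # ~0.28
```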
  • Thank you!