[INSIGHT OUT 2011] A15 how to design optimal sql (jonathan lewis)


How to … design efficient SQL
Jonathan Lewis
jonathanlewis.wordpress.com
www.jlcomp.demon.co.uk

Who am I?
• Independent consultant
• 27+ years in IT, 23+ using Oracle
• Strategy, design, review, briefings, educational, trouble-shooting
• jonathanlewis.wordpress.com, www.jlcomp.demon.co.uk
• Member of the Oak Table Network
• Oracle ACE Director
• Oracle author of the year 2006
• Select Editor’s choice 2007
• O1 visa for USA

Note: Many slides have a foot-note. This is a brief summary of the comments that I should have made whilst displaying the slide, and is there for later reference.

Jonathan Lewis, Efficient SQL © 2006 - 2011

Highlights
• Know the data
• Does a good execution path exist?
• Can the optimizer find that path?

Knowing the data
• How much data?
• Where is it?

Knowing the data - conflict
• Your knowledge of the data
• The optimizer's model of the data

Choice of strategies
• Lots of little jobs
  – How many
  – How little (how precise)
• One big job

Common outcomes

                                  You think the task is
                                  Small        Big
    Oracle thinks      Small      Good plan    Bad plan
    the task is        Big        Bad plan     Good plan

Optimizer problems
• Correlated columns
• Uneven data distribution
• Aggregate subqueries
• Non-equality joins
• Bind variables
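
Correlated columns, the first item in that list, are easy to demonstrate without a database. A minimal sketch, assuming the optimizer's basic behaviour of treating predicates on different columns of one table as independent and multiplying their selectivities (it ignores refinements such as the column-group statistics of 11g); the make/model data is invented.

```python
# Hypothetical rows of (make, model): the model determines the make,
# so the two columns are correlated.
rows = ([("Ford", "Focus")] * 40 + [("Ford", "Fiesta")] * 10 +
        [("Fiat", "Panda")] * 30 + [("Fiat", "Punto")] * 20)

def selectivity(rows, col, value):
    """Fraction of rows whose column `col` (0 = make, 1 = model) equals `value`."""
    return sum(1 for r in rows if r[col] == value) / len(rows)

def independent_estimate(rows, make, model):
    """Row estimate if the two predicates are assumed to be independent."""
    return len(rows) * selectivity(rows, 0, make) * selectivity(rows, 1, model)

def actual_count(rows, make, model):
    """Rows that really satisfy make = :make and model = :model."""
    return sum(1 for r in rows if r == (make, model))

for make, model in [("Ford", "Focus"), ("Ford", "Panda")]:
    print(make, model,
          "estimate:", round(independent_estimate(rows, make, model)),
          "actual:", actual_count(rows, make, model))
```

The same assumption gives an estimate of 20 rows where there are really 40, and 15 rows for a combination that does not exist at all: the error can go in either direction.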

Know the metadata

    select
        table_owner, table_name, index_name, column_name
    from
        dba_ind_columns
    order by
        table_owner, table_name, index_name, column_position
    ;

Output, annotated with possible actions:

    AP_GRP_FK_I      GRP_ID
    AP_GRP_ROLE_I    GRP_ID             (compress)
                     ROLE_ID
    AP_ORG_AP_I      ORG_ID             (compress 1)
                     AP_ID
    AP_ORG_FK_I      ORG_ID
    AP_PER_AP_I      PER_ID             (compress 1)
                     AP_ID
    AP_PER_FK_I      PER_ID
    AP_PK            AP_ID
    AP_ROLE_FK_I     ROLE_ID            (compress)
    AP_UD_I          TRUNC(UPD_DATE)    (drop, compress, coalesce)

Note: A simple query, and a little thought, can show us indexes which could be dropped or made more efficient.

Know the data (a)

    select
        colX,
        count(*) ct
    from t1
    group by colX
    order by colX
    ;

    COLX        CT
    CHK         12
    COM    3252534
    LDD        314
    PD        1821
    VAL        108
    XRF         32

Note: Look for odd data patterns, and think how you can take advantage of them.
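
Part of the "little thought" applied to that listing can be mechanised: find indexes whose column list is a leading prefix of another index on the same table. A minimal sketch, assuming the dba_ind_columns rows have already been fetched into (index_name, column_name, column_position) tuples; the helper names are hypothetical, and the rule ignores uniqueness, constraints and usage, so every match still needs human review.

```python
from collections import defaultdict

def index_columns(rows):
    """Group (index_name, column_name, column_position) rows into ordered column lists."""
    cols = defaultdict(list)
    for index_name, column_name, _position in sorted(rows, key=lambda r: (r[0], r[2])):
        cols[index_name].append(column_name)
    return dict(cols)

def prefix_candidates(cols):
    """(shorter, longer) pairs where the columns of `shorter` lead the columns of `longer`."""
    pairs = []
    for shorter, s_cols in cols.items():
        for longer, l_cols in cols.items():
            if shorter != longer and len(s_cols) < len(l_cols) and l_cols[:len(s_cols)] == s_cols:
                pairs.append((shorter, longer))
    return sorted(pairs)

# Hypothetical fetch of the listing above (AP_UD_I is left out: a function-based
# index reports its column under a system-generated name)
rows = [
    ("AP_GRP_FK_I", "GRP_ID", 1),
    ("AP_GRP_ROLE_I", "GRP_ID", 1), ("AP_GRP_ROLE_I", "ROLE_ID", 2),
    ("AP_ORG_AP_I", "ORG_ID", 1), ("AP_ORG_AP_I", "AP_ID", 2),
    ("AP_ORG_FK_I", "ORG_ID", 1),
    ("AP_PER_AP_I", "PER_ID", 1), ("AP_PER_AP_I", "AP_ID", 2),
    ("AP_PER_FK_I", "PER_ID", 1),
    ("AP_PK", "AP_ID", 1),
    ("AP_ROLE_FK_I", "ROLE_ID", 1),
]

for shorter, longer in prefix_candidates(index_columns(rows)):
    print(f"{shorter} is a leading prefix of {longer}")
```

Each pair is a prompt for a decision, not an automatic drop list.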

Know the data (b)

    select
        colX,
        count(*) ct
    from t1
    group by colX
    order by colX
    ;

     COLX   CT
        1    9
        2   12
        3   12
        4    8
        5    7
        6    9
      ...
     9997    1
     9998    1
     9999    1
    10000    1

Sampling options:

    sample (5)
    sample block (5)
    sample block (5, 2)              -- 9i
    sample block (5, 2) seed(N)      -- 10g

Note: A simple query may not help - but it is still a good starting point.

Know the data (c)

    select
        ct, count(*)
    from (
        select
            colX,
            count(*) ct
        from t1
        group by colX
    )
    group by ct
    order by ct
    ;

     CT   COUNT(*)
      1       9001
    ...
      6         37
      7         78
      8         94
      9        117
     10        112
     11        126
     12         99    <- there are 99 values that appear 12 times
     13         97
     14         86
     15         49
     16         32
    ...
     22          1

Note: With a little extra sophistication we get an interesting idea about the number of rows that might be returned for "column = constant".
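
The nested aggregate in (c) is plain SQL, so it can be tried outside Oracle. A minimal sketch using Python's built-in sqlite3 module and an invented t1; against a real system you would run the query on the table itself, possibly with one of the sampling options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (colX integer)")
# Hypothetical data: how many times each colX value occurs
occurrences = {101: 1, 102: 1, 103: 1, 104: 2, 105: 2, 106: 5}
conn.executemany("insert into t1 (colX) values (?)",
                 [(value,) for value, n in occurrences.items() for _ in range(n)])

# Inner query: rows per value. Outer query: how many values share each row count.
freq_of_freq = conn.execute("""
    select ct, count(*)
    from (
        select colX, count(*) ct
        from t1
        group by colX
    )
    group by ct
    order by ct
""").fetchall()
print(freq_of_freq)   # [(1, 3), (2, 2), (5, 1)]
```

Read the result as: three values would return one row for "colX = constant", two would return two rows, and one would return five.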

Know the data (d)

    select
        blocks, count(*)
    from (
        select /*+ index(t1(colX)) */
            colX,
            count(distinct substr(rowid,1,15)) blocks
        from t1
        group by colX
    )
    group by blocks
    order by blocks
    ;

     BLOCKS   COUNT(*)
          1       9001
        ...
          6         43
          7         83
          8        107
          9        126
         10        120
         11        125
         12        119
         13         90    <- there are 90 values that are scattered across 13 blocks
         14         69
         15         42
         16         28
        ...
         19          2

Note: We also need to know about the number of blocks accessed.

Know the data (e)

    select /*+ index(t,"T1_I1") */
        count(*) nrw,
        count(distinct sys_op_lbid(49721,'L',t.rowid)) nlb,
        count(distinct hextoraw(
            sys_op_descend("DATE_ORD") || sys_op_descend("SEQ_ORD")
        )) ndk,
        sys_op_countchg(substrb(t.rowid,1,15),1) clf
    from
        "TEST_USER"."T1" t
    where
        "DATE_ORD" is not null
    or  "SEQ_ORD" is not null
    ;

Note: This is a query used by the dbms_stats package to collect index stats.
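
The clf column comes from the undocumented sys_op_countchg() function applied to the first 15 characters of the rowid (object, file and block). As a rough model, and an assumption rather than Oracle's code, the clustering factor counts how often the table block changes as you walk the index entries in key order, remembering only the most recent block. The block names below are invented.

```python
def clustering_factor(entries):
    """entries: (key, table_block) pairs. Walk them in index order and count block changes."""
    clf = 0
    previous_block = None
    # Ties on key are broken by block name here, standing in for Oracle's rowid order
    for _key, block in sorted(entries):
        if block != previous_block:   # this entry needs a different table block
            clf += 1
            previous_block = block
    return clf

# Well clustered: consecutive keys live in the same block
packed = [(1, "B1"), (2, "B1"), (3, "B1"), (4, "B2"), (5, "B2"), (6, "B2")]
# Scattered: consecutive keys alternate between two blocks
scattered = [(1, "B1"), (2, "B2"), (3, "B1"), (4, "B2"), (5, "B1"), (6, "B2")]
print(clustering_factor(packed), clustering_factor(scattered))   # 2 6
```

Both sets hold six rows in two blocks; the model places the result between the number of table blocks (well clustered) and the number of rows (scattered).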

Know the data (f)

    select
        keys_per_leaf, count(*) blocks
    from (
        select
            sys_op_lbid(49721,'L',t.rowid) block_id,
            count(*) keys_per_leaf
        from t1 t
        where {index_columns are not all null}
        group by sys_op_lbid(49721,'L',t.rowid)
    )
    group by keys_per_leaf
    order by keys_per_leaf
    ;

Note: We can take advantage of some of the functions to analyse the index quality.

Know the data (g)

    KEYS_PER_LEAF   BLOCKS
               17      206
               18      373
               19      516
               20      678
               21      830
               22      979
               23    1,094
               24    1,178
               25    1,201
               26    1,274
               27    1,252
               28    1,120
               29    1,077
               30      980
               31      934
               32      893
               33      809
               34      751
               35      640
               36      738
               37      625
               38      570
               39      539
               40      489

Annotations on this listing: 50% usage for splits; expect 70% usage; assume 100% full.

Smashed (FIFO) index:

    KEYS_PER_LEAF   BLOCKS
                3      114
                4       39
                6       38
                7       39
               13       37
               14        1
               21        1
               27       15
               28        3
               39        1
               54        6
               55        3
              244        1
              281        1
              326        8
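
The two-level grouping in (f) turns one leaf-block id per index entry (what sys_op_lbid returns for each rowid) into the KEYS_PER_LEAF histogram. A minimal sketch of the same counting, plus a usage figure against an assumed leaf capacity; the leaf ids and the capacity of 40 keys per full leaf are invented, and in practice the capacity would be derived from key length and block size.

```python
from collections import Counter

def keys_per_leaf_histogram(leaf_ids):
    """leaf_ids: one leaf-block id per index entry. Returns {keys_per_leaf: number_of_leaves}."""
    keys_per_leaf = Counter(leaf_ids)                              # inner query: group by leaf
    return dict(sorted(Counter(keys_per_leaf.values()).items()))   # outer query: group by count

def average_usage(leaf_ids, capacity):
    """Average fraction of an assumed per-leaf capacity that is actually used."""
    keys_per_leaf = Counter(leaf_ids)
    return sum(keys_per_leaf.values()) / (len(keys_per_leaf) * capacity)

# Hypothetical index: three healthy leaves and two nearly empty ones
leaf_ids = ["L1"] * 28 + ["L2"] * 30 + ["L3"] * 26 + ["L4"] * 3 + ["L5"] * 3
print(keys_per_leaf_histogram(leaf_ids))         # {3: 2, 26: 1, 28: 1, 30: 1}
print(round(average_usage(leaf_ids, capacity=40), 2))   # 0.45
```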

Draw the query - requirement

Orders in the last week where:
• the customer is in London
• the supplier is from Leeds
• there is a supplier elsewhere

    select
        {columns}
    from
        customers      cus,
        orders         ord,
        order_lines    orl,
        products       prd1,
        suppliers      sup1
    where
        cus.location = 'LONDON'
    and ord.id_customer = cus.id
    and ord.date_placed between sysdate - 7 and sysdate
    and orl.id_order = ord.id
    and prd1.id = orl.id_product
    and sup1.id = prd1.id_supplier
    and sup1.location = 'LEEDS'
    and exists (
            select null
            from
                product_match  mch,
                products       prd2,
                suppliers      sup2
            where
                mch.id_product = prd1.id
            and prd2.id = mch.id_product_sub
            and sup2.id = prd2.id_supplier
            and sup2.location != 'LEEDS'
        )

Draw the query - outline

[Diagram: Order_lines joins to Orders (recent), which joins to Customers (London); Order_lines also joins to Products, which joins to Suppliers (Leeds); Products leads through an "Exists" link to Product_match, then to a second copy of Products and to Suppliers (not Leeds).]
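
Drawing the query amounts to building a graph: one node per table alias, one edge per join predicate, with the filter predicates attached to the nodes they restrict. A minimal sketch that records the joins from the SQL above by hand (no SQL parsing is attempted) and lists each alias's neighbours; the correlation predicate links the subquery tables to prd1.

```python
from collections import defaultdict

# Join predicates from the example query, as (alias, alias) pairs
joins = [
    ("ord", "cus"), ("orl", "ord"), ("prd1", "orl"), ("sup1", "prd1"),
    ("mch", "prd1"),                    # correlation into the exists subquery
    ("prd2", "mch"), ("sup2", "prd2"),
]
# Filter predicates, attached to the alias they restrict
filters = {
    "cus": "location = 'LONDON'",
    "ord": "date_placed between sysdate - 7 and sysdate",
    "sup1": "location = 'LEEDS'",
    "sup2": "location != 'LEEDS'",
}

graph = defaultdict(set)
for a, b in joins:
    graph[a].add(b)
    graph[b].add(a)

for alias in sorted(graph):
    note = f"  [{filters[alias]}]" if alias in filters else ""
    print(f"{alias:5} -> {', '.join(sorted(graph[alias]))}{note}")
```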

Draw the query - indexes

[Diagram: the same outline, with each end of every join marked PK or FK to show the primary-key and foreign-key indexes, plus indexes on Orders (Date) and on the Location column of Customers and of both copies of Suppliers.]

Draw the query - statistics

[Diagram: the Order_lines, Orders and Customers part of the outline, annotated with: Order_lines huge; Orders big; Customers small; ratios 1:10, 1:2,500 and 1:10 / 1:150; "good clustering" (twice, once against Date); "good caching"; "totally random for recent data".]

Sketch in paths - strategy
• Pick a starting point
  – How many rows will I start with?
  – How efficiently can I get them?
  – The first step may be inefficient (it only happens once)
• How do I get to the next table?
  – How many times do I make the step?
  – How precise is the access path?
  – How much data do I now have?

Sketch in paths - analysis

[Diagram: the query outline with its eight tables numbered 1 to 8.]

http://jonathanlewis.wordpress.com/2010/03/04/sql-server-2
http://www.embarcadero.com/master-sql-tuners-oracle-lewis-hailey
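
The strategy questions can be turned into back-of-envelope bookkeeping for a candidate join order: start with some rows, then at each step count one probe per row carried and multiply by the rows each probe finds. A minimal sketch; every number is hypothetical, and counting probes and rows is a deliberate simplification, not an optimizer cost model.

```python
from dataclasses import dataclass

@dataclass
class Step:
    table: str
    rows_per_probe: float   # rows each arrival at this table finds, after filtering

def walk_path(start_table, start_rows, steps):
    """Return (table, probes, rows_afterwards) for each step of a path."""
    rows = start_rows
    trace = [(start_table, 1, rows)]      # the first step happens once
    for step in steps:
        probes = rows                     # one probe per row we are carrying
        rows = rows * step.rows_per_probe
        trace.append((step.table, probes, rows))
    return trace

# Hypothetical path: recent orders -> customers (keep London only) -> order lines
trace = walk_path("orders", 1000, [Step("customers", 0.1), Step("order_lines", 10)])
for table, probes, rows in trace:
    print(f"{table:12} probes={probes:>6.0f} rows={rows:>6.0f}")
```

Trying a second order with the same bookkeeping makes the trade-off visible: a step that discards rows early reduces the number of probes at every later step.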

Case Study (a)

    select
        distinct trx.id_contract
    from
        transactions       trx,
        contracts          con,
        transaction_types  tty
    where
        trx.id_ttype = tty.id
    and trx.id_contract = con.id
    and con.id_office = :b1
    and tty.qxt <> 'NONE'
    and trx.created between :b2 and :b3
    and trx.error = 0
    ;

"distinct" is always a little suspect, suggesting an error in a join clause or an opportunity to rewrite with an existence subquery. (The latter is not viable in this case.)

Note: The SQL is a little odd - it's creating a drop-down list for an OLTP system to show contracts that have had some work done on them in a given date range.

Case Study (b)

    | Id | Operation                   | Name         | Rows | Bytes |  Cost |
    |  0 | SELECT STATEMENT            |              |      |       | 14976 |
    |  1 |  SORT UNIQUE                |              | 7791 |  304K | 14976 |
    |  2 |   FILTER                    |              |      |       |       |
    |  3 |    HASH JOIN                |              | 7798 |  304K | 14974 |
    |  4 |     VIEW                    |              | 9819 | 98190 |  1599 |
    |  5 |      HASH JOIN              |              |      |       |       |
    |  6 |       INDEX RANGE SCAN      | CON_OFF_FK   | 9819 | 98190 |    35 |
    |  7 |       INDEX FAST FULL SCAN  | CON_PK       | 9819 | 98190 |  1558 |
    |  8 |     HASH JOIN               |              | 7798 |  228K | 13374 |
    |  9 |      TABLE ACCESS FULL      | TRANS_TYPES  |  105 |   945 |     3 |
    | 10 |      TABLE ACCESS FULL      | TRANSACTIONS | 7856 |  161K | 13370 |

Note: The AWR history showed 11 different plans in the previous week. The most resource-intensive bit was always the scan on transactions.
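
Why "distinct" deserves suspicion is easy to see on a toy data set: the join returns one row per qualifying transaction, and distinct then throws the duplicates away. A minimal sketch using Python's built-in sqlite3 module, with invented rows and literal values bound to :b1, :b2 and :b3; it also shows that the existence-subquery form of Case Study (i) returns the same set of contracts (which says nothing about which plan is better).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table contracts (id integer primary key, id_office integer);
    create table transaction_types (id integer primary key, qxt text);
    create table transactions (id integer primary key, id_contract integer,
                               id_ttype integer, created text, error integer);
    insert into contracts values (1, 10), (2, 10), (3, 20);
    insert into transaction_types values (1, 'ABC'), (2, 'NONE');
    insert into transactions values
        (1, 1, 1, '2011-01-05', 0), (2, 1, 1, '2011-01-05', 0),
        (3, 1, 1, '2011-01-05', 0), (4, 2, 2, '2011-01-05', 0),
        (5, 3, 1, '2011-01-05', 0);
""")
binds = {"b1": 10, "b2": "2011-01-05", "b3": "2011-01-05"}

# The join without "distinct": one row per qualifying transaction
join_rows = conn.execute("""
    select trx.id_contract
    from transactions trx, contracts con, transaction_types tty
    where trx.id_ttype = tty.id and trx.id_contract = con.id
    and con.id_office = :b1 and tty.qxt <> 'NONE'
    and trx.created between :b2 and :b3 and trx.error = 0
""", binds).fetchall()

# The existence-subquery form: one row per qualifying contract
exists_rows = conn.execute("""
    select con.id from contracts con
    where con.id_office = :b1
    and exists (
        select null from transactions trx, transaction_types tty
        where trx.id_contract = con.id
        and trx.created between :b2 and :b3 and trx.error = 0
        and tty.id = trx.id_ttype and tty.qxt <> 'NONE')
""", binds).fetchall()

print(join_rows)     # [(1,), (1,), (1,)]
print(exists_rows)   # [(1,)]
```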

Case Study (c)

[Diagram of the query, with these annotations:]
• Transactions: 12,000 rows per day; almost all have error = 0; data for the same day is well packed. Predicates: created between … (indexed), error = 0.
• TX_types: 100 rows; qxt != 'NONE' gives no exclusions. Join: tty (id) = trx (id_ttype).
• Contracts: id_office = … (indexed); 240 offices; contracts per office: 100 to 18,000; histogram on office ID. Join: con (id) = trx (id_contract).

Note: This picture is simple, and we don't need all the details to see the problems and possible solutions. Bind variables and histograms don't go well together.

Case Study (d)
• The variation in contracts per office is difficult.
• We're allowed one plan (due to bind variables, but see 11g).
• Make the worst execution plan "good enough".
• The date range is usually one day.
• Start at table transactions for constant response times.
• Data clustering:
  – Date-based transactions are well clustered
  – Contract-based transactions are scattered over time
• The database is "young" and will be growing, a lot.

Note: The best plan depends on the office that wants the data: a small office would want to start at contracts, a large office at transactions … at present.
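
The argument for starting at transactions can be checked with rough arithmetic. A minimal sketch using the figures on the slides (about 12,000 transactions per day, 100 to 18,000 contracts per office, a one-day range) plus one loudly simplifying assumption: the work of each plan is measured only by its number of driving rows, one per probe into the other table.

```python
ROWS_PER_DAY = 12_000                  # from the slide
CONTRACTS_PER_OFFICE = (100, 18_000)   # smallest and largest office, from the slide

def start_at_transactions(days=1):
    """Driving rows when range-scanning transactions by date: independent of the office."""
    return ROWS_PER_DAY * days

def start_at_contracts(contracts):
    """Driving rows when starting from the office's contracts (assumed: one probe each)."""
    return contracts

for contracts in CONTRACTS_PER_OFFICE:
    print(f"office with {contracts:>6} contracts: "
          f"transactions-first {start_at_transactions():>6}, "
          f"contracts-first {start_at_contracts(contracts):>6}")
```

The contracts-first figure is better for the small office and worse for the large one, and it will grow as the database ages; the transactions-first figure stays roughly constant for a one-day range, which is what "make the worst execution plan good enough" asks for.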

Case Study (e)
• Get rid of the histogram on office_id.
• Use hints (if necessary) to force the only "constant volume" plan.

Target plan:

    hash join
        table access full transaction_types
        table access by rowid contracts
            nested loop
                table access by rowid transactions
                    index range scan transactions_idx
                index range scan contracts_idx

Note: We get rid of the histogram that is introducing instability. Future growth in contracts means we don't want to start at contracts. Hinting may be needed.

Case Study (f)

    select
        /*+
            leading (trx con tty)
            index(trx(created))
            use_nl(con)
            index(con(id))
            use_hash(tty)
            swap_join_inputs(tty)
            full(tty)
        */
        distinct trx.id_contract
    from
        transactions       trx,
        contracts          con,
        transaction_types  tty
    where
        ...

Note: If you have to hint, you need to be thorough: get the complete join order, every access method, and every join method. Note the 10g-style index hints, which name the indexed columns rather than the index.
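
The "be thorough" rule can be checked mechanically. A minimal sketch of a hypothetical helper (not an Oracle feature): it pulls hints out of the comment text with a simple regular expression, then reports any alias in the leading() list without an access-method hint, or, after the first table, without a join-method hint. It recognises only the small set of hint names listed.

```python
import re

ACCESS_HINTS = {"full", "index", "index_ffs", "index_asc", "index_desc"}
JOIN_HINTS = {"use_nl", "use_hash", "use_merge"}

def parse_hints(text):
    """Return (hint_name, tokens) pairs; 'index(trx(created))' -> ('index', ['trx', 'created'])."""
    hints = []
    for name, args in re.findall(r"(\w+)\s*\(([^()]*(?:\([^()]*\))?[^()]*)\)", text):
        tokens = args.replace("(", " ").replace(")", " ").replace(",", " ").split()
        hints.append((name.lower(), tokens))
    return hints

def missing_hints(hint_text):
    """List the gaps in a hint set, judged against its own leading() join order."""
    hints = parse_hints(hint_text)
    order = next((tokens for name, tokens in hints if name == "leading"), None)
    if not order:
        return ["no leading() hint"]
    accessed = {tokens[0] for name, tokens in hints if name in ACCESS_HINTS and tokens}
    joined = {alias for name, tokens in hints if name in JOIN_HINTS for alias in tokens}
    problems = []
    for position, alias in enumerate(order):
        if alias not in accessed:
            problems.append(f"no access method for {alias}")
        if position > 0 and alias not in joined:   # the first table is not joined to anything
            problems.append(f"no join method for {alias}")
    return problems

slide_hint = ("leading (trx con tty) index(trx(created)) use_nl(con) index(con(id)) "
              "use_hash(tty) swap_join_inputs(tty) full(tty)")
print(missing_hints(slide_hint))                              # []
print(missing_hints(slide_hint.replace("use_nl(con) ", "")))  # ['no join method for con']
```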

Case Study (g)
Reduce work with better indexing. Modified indexes (option a - fairly safe):
• contracts(id, id_office)

Target plan:

    hash join
        table access full transaction_types
        nested loop
            table access by rowid transactions
                index range scan transactions_idx
            index range scan contracts_idx

Note: Create a non-unique index with an extra column to support the PK, to get rid of the table access. Being high precision, this is probably a safe index change.

Case Study (h)
Reduce work with better indexing. Modified indexes (option b - needs careful testing):
• contracts(id, id_office)
• transactions(created, id_con, error, id_ttype)

Target plan:

    hash join
        table access full transaction_types
        nested loop
            index range scan transactions_idx
            index range scan contracts_idx

Note: Adding columns to the TX index allows an index-only access. But the new index is much bigger, with a worse clustering_factor, so it may cause problems.
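
Options (a) and (b) rest on the same question: does an index hold every column the query needs from that table, so that the table visit can be skipped? A minimal sketch of that check, with the column lists taken from the case-study query and the proposed indexes; one assumption: the slide's "id_con" is read as the id_contract column used in the query.

```python
def covers(index_columns, needed_columns):
    """True if the index holds every column the query references for this table."""
    return set(needed_columns) <= set(index_columns)

# Columns the case-study query references, per table
needed = {
    "contracts": ["id", "id_office"],
    "transactions": ["id_contract", "created", "error", "id_ttype"],
}
# Proposed indexes (assumption: the slide's id_con means id_contract)
option_a = {"contracts": ["id", "id_office"]}
option_b = {"contracts": ["id", "id_office"],
            "transactions": ["created", "id_contract", "error", "id_ttype"]}

for label, indexes in (("a", option_a), ("b", option_b)):
    for table, cols in needed.items():
        idx = indexes.get(table)
        status = "index only" if idx and covers(idx, cols) else "table access needed"
        print(f"option {label}: {table:12} {status}")
```

The printed result matches the two target plans: option (a) removes only the visit to contracts, option (b) removes both table visits.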

Case Study (i)

    select
        con.id
    from
        contracts con
    where
        id_office = :b1
    and exists (
            select null
            from
                transactions       trx,
                transaction_types  tty
            where
                trx.id_contract = con.id
            and trx.created between :b2 and :b3
            and trx.error = 0
            and tty.id = trx.id_ttype
            and tty.qxt <> 'NONE'
        )
    ;

Note: An alternative style of query - but the workload increases with time, and the "best" indexing for transactions would be different.

Summary
• Know the data
• Draw the picture
• Identify the problems
  – Bad indexing
  – Bad statistics
  – Optimizer deficiencies
• Structure the query with hints
