# Ugif 10 2012 ppt0000002

1. Update Statistics. Olivier Bourdin (olivier.bourdin@fr.ibm.com), Wednesday 3 October 2012, User Group Informix France
2. Overview
    - Brief review and history
    - What's changed?
        - 11.10, 11.50
        - 11.70 "Smart Statistics"
    - 11.70 FAQs
        - Do I need to do anything different?
        - Did UPDATE STATISTICS update any statistics?
        - UPDATE STATISTICS and reoptimization
3. Why are statistics important?
    - Choosing the right query path determines how fast you get your results.
    - Choosing the wrong path can be like going around the world to get to your neighbor's house:
        - it is expensive to go around the world;
        - it takes too long.
4. Query optimization process
    - Examine all tables (table A, table B, table C):
        - examine the selectivity of every filter (WHERE clauses);
        - determine whether indexes can be used for filters, ORDER BY, GROUP BY;
        - find the best way to scan each table: sequentially or by an index.
    - Identify join pairs (AB, AC, BA, BC, CA, CB):
        - find the best join method (nested loop, hash, or sort merge);
        - decide which indexes are best for the join;
        - calculate the cost of the join.
    - Repeat for each additional table (ABC, ACB, BAC, ...).
5. Estimating costs: the optimizer needs data!
    - Find the cheapest (lowest-cost) path:
        - Cost = I/O cost + Weight × (CPU cost)
        - I/O: disk access
        - CPU: rows processed
    - Estimate costs:
        - Filters: which indexes to use?
        - Joins: nested loop, hash, or sort merge?
        - Eliminate redundant pairs?
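
The cost formula and the join-order search from the last two slides can be illustrated with a toy exhaustive search. This is only a sketch under stated assumptions, not the Informix optimizer: the `CPU_WEIGHT` value, the per-table (pages, rows, selectivity) figures and the crude nested-loop `join_cost` model are all hypothetical, and a real optimizer also weighs index paths, hash and sort-merge joins, and prunes redundant pairs.

```python
from itertools import permutations

# Hypothetical CPU weight; the slide only calls it "Weight".
CPU_WEIGHT = 0.1

def path_cost(io_cost: float, cpu_cost: float, weight: float = CPU_WEIGHT) -> float:
    """Cost = I/O cost + Weight * (CPU cost), as on the slide."""
    return io_cost + weight * cpu_cost

def join_cost(order: tuple[str, ...], stats: dict[str, tuple[float, float, float]]) -> float:
    """Toy nested-loop estimate for one join order.

    stats maps table -> (pages, rows, selectivity of its filters/join).
    The first table is scanned once; every later table is scanned once per
    row produced by the tables already joined (no indexes in this model).
    """
    total = 0.0
    rows_so_far = 1.0
    for table in order:
        pages, rows, sel = stats[table]
        total += path_cost(io_cost=pages * rows_so_far, cpu_cost=rows * rows_so_far)
        rows_so_far *= rows * sel  # rows surviving this step
    return total

def best_join_order(stats: dict[str, tuple[float, float, float]]) -> tuple[str, ...]:
    """Examine every join order (ABC, ACB, BAC, ...) and keep the cheapest."""
    return min(permutations(stats), key=lambda order: join_cost(order, stats))

# Hypothetical tables: (pages, rows, selectivity)
stats = {"A": (100, 10_000, 0.001), "B": (2, 50, 0.5), "C": (10, 1_000, 0.01)}
print(best_join_order(stats))
```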
6. Filter selectivity
    - Selectivity (F) is the fraction of rows selected by a filter, a number between 0 and 1.
    - `indexed_col = literal`: F = 1 / (number of distinct keys in the index)
    - `indexed_col < literal`: F = (literal - 2nd min) / (2nd max - 2nd min)
    - `NOT expression`: F = 1 - F(expression)
    - `expr1 AND expr2`: F = F(expr1) × F(expr2)
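
The selectivity rules can be written as small functions. This is a sketch: the range formula (literal - 2nd min) / (2nd max - 2nd min) estimates the fraction of values below the literal, and the complementary `>` form (2nd max - literal) / (2nd max - 2nd min) is included by symmetry; clamping out-of-range literals to [0, 1] is an assumption, and all function names are illustrative.

```python
def sel_equal(nunique: int) -> float:
    """indexed_col = literal: F = 1 / (number of distinct keys in the index)."""
    return 1.0 / nunique

def _clamp(f: float) -> float:
    # Assumption: literals outside [2nd min, 2nd max] are clamped to [0, 1].
    return min(1.0, max(0.0, f))

def sel_below(literal: float, second_min: float, second_max: float) -> float:
    """indexed_col < literal: F = (literal - 2nd min) / (2nd max - 2nd min)."""
    return _clamp((literal - second_min) / (second_max - second_min))

def sel_above(literal: float, second_min: float, second_max: float) -> float:
    """indexed_col > literal: complementary form (2nd max - literal) / (2nd max - 2nd min)."""
    return _clamp((second_max - literal) / (second_max - second_min))

def sel_not(f: float) -> float:
    """NOT expression: F = 1 - F(expression)."""
    return 1.0 - f

def sel_and(f1: float, f2: float) -> float:
    """expr1 AND expr2: F = F(expr1) * F(expr2), i.e. filters treated as independent."""
    return f1 * f2

# Hypothetical example: index with 28 distinct keys AND a range filter halfway up the range.
f = sel_and(sel_equal(28), sel_below(50, 0, 100))
print(f"combined selectivity: {f:.4f}")
```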
7. How do we influence query optimization?
    - OPTCOMPIND
    - Optimizer directives, optimization goals
    - UPDATE STATISTICS:
        - collects information for the optimizer;
        - table nrows, npused; index statistics (LOW);
        - data distributions (MEDIUM and HIGH);
        - compiles stored procedures.
8. Where are the statistics stored?
    - systables (LOW): nrows, npused
    - sysindices (LOW): leaves, levels, nunique, clust
    - syscolumns (LOW): colmin, colmax
    - sysfragments (LOW): nrows, npused; for index partitions, levels, clust
    - sysdistrib (MEDIUM or HIGH): can be viewed with `dbschema -hd`
9. Viewing the query path
    - SET EXPLAIN ON: can be set for the session.
    - EXPLAIN directive: can be embedded in the query, for example
      `FOREACH SELECT {+EXPLAIN} order_num INTO p_num FROM orders WHERE customer_num = 104 ORDER BY order_num`
    - xtrace debugging: Support may ask you to turn this on.
10. Debugging with xtrace
    - Used to "see" the statistics information being used for query optimization.
    - Example:
        - `xtrace heavy -c XTF_OPTMZR -f XTF_DEBUG`
        - `xtrace size 10000`
        - `xtrace on`
    - Use `xtrace fview` or `xtrace view` to view traces; `xtrace fview` includes timestamps.
    - Use `xtrace info` to display the current xtrace settings.
    - Use `xtrace --` for xtrace usage information.
11. xtrace: example
    - Before UPDATE STATISTICS:
        - `f1 31310 16 get_distrib(): distrib not found for table c col zipcode`
        - `f1 7401 16 selec1: op = 46(OP_EQ), defsel = 0.1 sel = 0.0434783`
        - …
        - `f2 1207 16 oprowspages(tab = c, nrows = 28, npages = 2)`
        - `f2 13217 16 opmix_iscancost(numrows=1.21739,npages=2,pagesread=1.13988)`
        - `f2 13225 16 opmix_iscancost(scancost=1.1764,indexcost=1.08, …, iscancost=2.2564)`
    - After UPDATE STATISTICS:
        - `f1 31310 18 get_distrib(): distrib found for table c col zipcode`
        - `f1 7401 18 selec1: op = 46(OP_EQ), defsel = 0.1 sel = 0.0357143`
        - …
        - `f2 1207 18 oprowspages(tab = c, nrows = 28672, npages = 2048)`
        - …
        - `f2 2237 18 dpages = 24576 lpages = 84 nlevels = 2`
        - `f2 1871 18 dcost = 33.72 seek 0 keyonly = TRUE`
        - `f2 1896 18 iscancost(c, zip_ix) cost = 35.72`
        - `f2 13217 18 opmix_iscancost(numrows=1024,npages=2048,pagesread=805.977)`
        - `f2 13225 18 opmix_iscancost(scancost=836.697,indexcost=35.72, …, iscancost=872.417)`
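
The numbers in this trace show how statistics feed the cost estimate: the estimated row count `numrows` is consistent with `nrows × sel` (28 × 0.0434783 ≈ 1.21739 before, 28672 × 0.0357143 ≈ 1024 after), and `iscancost` with `scancost + indexcost`. The check below only redoes that arithmetic on values copied from the trace; the relationships are our reading of the numbers, not something the trace labels, and the tolerances allow for the rounding in the printed values.

```python
import math

# Values copied from the xtrace output on this slide (table c, zipcode filter).
TRACE = {
    "before": {"nrows": 28, "sel": 0.0434783, "numrows": 1.21739,
               "scancost": 1.1764, "indexcost": 1.08, "iscancost": 2.2564},
    "after": {"nrows": 28672, "sel": 0.0357143, "numrows": 1024,
              "scancost": 836.697, "indexcost": 35.72, "iscancost": 872.417},
}

def consistent(t: dict) -> bool:
    """True if numrows ~= nrows * sel and iscancost ~= scancost + indexcost."""
    rows_ok = math.isclose(t["nrows"] * t["sel"], t["numrows"], rel_tol=1e-4)
    cost_ok = math.isclose(t["scancost"] + t["indexcost"], t["iscancost"], rel_tol=1e-6)
    return rows_ok and cost_ok

for label, t in TRACE.items():
    print(f"{label}: nrows*sel = {t['nrows'] * t['sel']:.5f}, "
          f"scancost+indexcost = {t['scancost'] + t['indexcost']:.4f}, "
          f"consistent = {consistent(t)}")
```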
12. xtrace (after, cont'd)
    - …
    - `f2 1207 18 oprowspages(tab = c, nrows = 28672, npages = 2048)`
    - `f2 1320 18 opscantabcost(c) npages = 2048, nrows = 28672, cost = 2909.16`
    - `f2 1527 18 opcartcost(c) cost = 2909.16 initcost = 0`
    - `f2 1988 18 index_info(): index 100_1 fullness 0.75 recs_per_node 128 keylen 4`
    - …
    - `f2 2237 18 dpages = 2048 lpages = 187 nlevels = 3`
    - `f2 10863 18 idxtree_travcost s 3.48772e-05 nlevels 3 lpages .. dpages .. mempages 512`
    - `f2 14448 18 seek_factor 6 clust 2048 clust_scale 0 seek 0`
    - …
    - `f2 1727 18 opidxcost(c, 100_1) = 0.745763`
    - `f1 16094 18 index 100_1 considered, icost 0.745763, istart 0.0078125, fltragg 0`
    - `f1 16324 18 indexp(): best index path: idx 100_1 icost = 0.745763 idx_flags 2`
    - `f3 3462 18 idx cost = 0.745763 initcost = 0.0078125 totalcost = 17.1526`
    - `f3 3465 18 outer size = 23 join size = 1`
    - `f3 8468 18 build inner table, init cost is 13.5745, join cost is 4.24268`
    - `f3 8568 18 build outer table, init cost is 4.24268, join cost is 13.5745`
13. sqexplain.out (before)
    - `select c.city, c.state, o.ship_date from customer c, orders o where c.customer_num = o.customer_num and c.state = ? and c.zipcode = ?`
    - `Estimated Cost: 3`
    - `Estimated # of Rows Returned: 1`
    - `1) informix.c: INDEX PATH`
        - `Filters: informix.c.state = AZ`
        - `(1) Index Name: informix.zip_ix`
        - `Index Keys: zipcode (Serial, fragments: ALL)`
        - `Lower Index Filter: informix.c.zipcode = 85016`
    - `2) informix.o: INDEX PATH`
        - `(1) Index Name: informix.102_4`
        - `Index Keys: customer_num (Serial, fragments: ALL)`
        - `Lower Index Filter: informix.c.customer_num = informix.o.customer_num`
    - `NESTED LOOP JOIN`
14. sqexplain.out (after)
    - `select c.city, c.state, o.ship_date from customer c, orders o where c.customer_num = o.customer_num and c.state = ? and c.zipcode = ?`
    - `Estimated Cost: 19`
    - `Estimated # of Rows Returned: 1`
    - `1) informix.o: SEQUENTIAL SCAN`
    - `2) informix.c: INDEX PATH`
        - `Filters: (informix.c.zipcode = 85016 AND informix.c.state = AZ )`
        - `(1) Index Name: informix.100_1`
        - `Index Keys: customer_num (Serial, fragments: ALL)`
        - `Lower Index Filter: informix.c.customer_num = informix.o.customer_num`
    - `NESTED LOOP JOIN`
    - Customer has 28672 rows; orders has 23 rows.
15. Before 11.x
    - UPDATE STATISTICS LOW
    - UPDATE STATISTICS MEDIUM, HIGH (RESOLUTION, CONFIDENCE)
    - UPDATE STATISTICS ... DISTRIBUTIONS ONLY
    - UPDATE STATISTICS DROP DISTRIBUTIONS
    - UPDATE STATISTICS FOR TABLE, FOR PROCEDURE
    - Scripts and cron jobs
    - Lots of guidelines:
        - what to run UPDATE STATISTICS on;
        - which UPDATE STATISTICS to run;
        - how to run UPDATE STATISTICS.
16. Guidelines
    - UPDATE STATISTICS MEDIUM DISTRIBUTIONS ONLY for all columns that do not have an index
    - UPDATE STATISTICS HIGH for columns that are the first key in an index
    - UPDATE STATISTICS LOW for all columns in multicolumn indexes
    - Run with PDQ for better performance (FOR TABLE only)
    - Do not run with PDQ for UPDATE STATISTICS FOR PROCEDURE
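
The three column-level guidelines can be turned into statements mechanically. This is a minimal sketch: `stats_commands` is a hypothetical helper, the customer column list and index layout in the usage line are illustrative, and it ignores PDQ, RESOLUTION/CONFIDENCE and FOR PROCEDURE. MEDIUM is emitted before HIGH so that the later, more detailed distributions are not overwritten (the "HIGH first, then MEDIUM" mistake described on the next slide). For real use, the technote script or dostats mentioned later in this deck are the tools to look at.

```python
def stats_commands(table: str, columns: list[str], indexes: list[list[str]]) -> list[str]:
    """Emit UPDATE STATISTICS statements following the pre-11.x guidelines."""
    indexed = {col for idx in indexes for col in idx}
    leading: list[str] = []   # first key of each index, without duplicates
    multi: list[str] = []     # every column that appears in a multicolumn index
    for idx in indexes:
        if idx[0] not in leading:
            leading.append(idx[0])
        if len(idx) > 1:
            multi.extend(col for col in idx if col not in multi)
    unindexed = [col for col in columns if col not in indexed]

    cmds = []
    # MEDIUM first, so a later MEDIUM never overwrites HIGH distributions.
    if unindexed:
        cmds.append(f"UPDATE STATISTICS MEDIUM FOR TABLE {table} "
                    f"({', '.join(unindexed)}) DISTRIBUTIONS ONLY;")
    for col in leading:
        cmds.append(f"UPDATE STATISTICS HIGH FOR TABLE {table} ({col});")
    if multi:
        cmds.append(f"UPDATE STATISTICS LOW FOR TABLE {table} ({', '.join(multi)});")
    return cmds

# Hypothetical index layout: customer_num, zipcode, and (lname, fname).
for cmd in stats_commands("customer",
                          ["customer_num", "fname", "lname", "state", "zipcode"],
                          [["customer_num"], ["zipcode"], ["lname", "fname"]]):
    print(cmd)
```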
17. Issues (before 11.x)
    - Difficult to know when UPDATE STATISTICS was last run
    - Guidelines weren't always well understood
    - People weren't sure how to run UPDATE STATISTICS:
        - accidentally overwrote statistics by running HIGH first, then MEDIUM;
        - accidentally compiled stored procedures with PDQ;
        - ran UPDATE STATISTICS LOW twice (performance issue).
    - What might be considered "missing" here?
      `UPDATE STATISTICS LOW FOR TABLE tab1;`
      `UPDATE STATISTICS HIGH FOR TABLE tab1 (col1, col2);`
18. 11.10 features
    - CREATE INDEX creates initial statistics and distribution information for the leading column of the index
    - Enhanced catalog information:
        - When was UPDATE STATISTICS LOW run?
        - When were the distributions created?
        - How many rows were sampled for the distributions?
    - New "sampling size" option
    - UPDATE STATISTICS DROP DISTRIBUTIONS ONLY
    - Auto Update Statistics (AUS) scheduler tasks
19. Help with the guidelines
    - Use the scheduler task "Auto Update Statistics Evaluation"; it can be run on demand with exectask():
      `EXECUTE FUNCTION exectask('Auto Update Statistics Evaluation');`
    - Use the script in Informix Technote swg21137764, "UPDATE STATISTICS commands to allow the optimizer to work its best": http://www-01.ibm.com/support/docview.wss?uid=swg21137764
    - Use Art Kagel's dostats (from the IIUG)
20. AUS history
    - First introduced in 11.10:
        - scheduler task "Auto Update Statistics Evaluation";
        - scheduler task "Auto Update Statistics Refresh";
        - uses the guidelines to determine which UPDATE STATISTICS commands to run.
    - Enhanced to work with non-English locales in 11.50.xC6
21. AUS scheduler tasks
    - Run UPDATE STATISTICS ... FOR TABLE commands, for example:
        - `UPDATE STATISTICS LOW FOR TABLE stores7:customer`
        - `UPDATE STATISTICS HIGH FOR TABLE stores7:customer (customer_num, zipcode) RESOLUTION 0.500 DISTRIBUTIONS ONLY`
    - Run with PDQ set to AUS_PDQ in sysadmin:ph_threshold:
      `> select * from ph_threshold where name = "AUS_PDQ";`
        - id: 30
        - name: AUS_PDQ
        - task_name: Auto Update Statistics Refresh
        - value: 10
        - value_type: NUMERIC
        - description: Update statistics executes with this PDQ priority.
22. AUS parameters
    - AUS_AGE (aus_evaluator): the statistics are rebuilt after the specified number of days.
    - AUS_CHANGE (aus_evaluator): the statistics are rebuilt after the specified percentage of the data has changed.
    - AUS_AUTO_RULES (aus_evaluator): 1 or 0; if off, only tables that already have statistics are evaluated.
    - AUS_SMALL_TABLES (aus_evaluator): tables with fewer than this number of rows always have their statistics rebuilt.
    - AUS_PDQ (aus_refresh_stats): UPDATE STATISTICS runs with this PDQ setting.
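
A simplified reading of how these parameters combine for one table can be sketched as a predicate. This is not the aus_evaluator code: `TableState` and `aus_needs_refresh` are hypothetical, the `>=` comparisons for AUS_AGE and AUS_CHANGE and the choice to refresh tables with no statistics when AUS_AUTO_RULES is on are assumptions, and the parameter values in the tests are illustrative, not product defaults. The real evaluator works per column and per index and writes out the commands to run.

```python
from dataclasses import dataclass

@dataclass
class TableState:
    """Hypothetical per-table inputs for the evaluation."""
    nrows: int             # rows recorded when statistics were last built
    udi_since_stats: int   # inserts + updates + deletes since then
    days_since_stats: int  # age of the current statistics
    has_stats: bool        # whether the table has statistics at all

def aus_needs_refresh(t: TableState, *, aus_age: int, aus_change: float,
                      aus_auto_rules: int, aus_small_tables: int) -> bool:
    """Simplified reading of the AUS_* parameters on this slide."""
    if not t.has_stats:
        # AUS_AUTO_RULES off: only tables that already have statistics are evaluated.
        return bool(aus_auto_rules)
    if t.nrows < aus_small_tables:
        return True  # small tables always have their statistics rebuilt
    if t.days_since_stats >= aus_age:
        return True  # statistics older than AUS_AGE days (assumed >=)
    if t.nrows == 0:
        return t.udi_since_stats > 0
    change_pct = t.udi_since_stats / t.nrows * 100
    return change_pct >= aus_change  # AUS_CHANGE percent of the data changed
```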
23. 11.70 features
    - Smart Statistics:
        - default: AUTO_STAT_MODE 1;
        - default: STATCHANGE 10;
        - when an UPDATE STATISTICS command runs, index statistics and table distributions are not rebuilt if the STATCHANGE threshold has not been met.
    - Fragment-level statistics:
        - not on by default;
        - not discussed in this presentation.
24. 11.70: were the statistics updated?
    - UPDATE STATISTICS information in the database catalog tables:
        - ustlowts in systables: updated when systables nrows and npused are updated, which happens whenever an UPDATE STATISTICS command is run; the STATCHANGE threshold is not considered.
        - ustlowts in sysindices: updated when index statistics are rebuilt.
        - constr_time in sysdistrib: updated when distribution statistics are rebuilt.
25. Example
    - `$ dbaccessdemo7 stores7 -nots`
    - `> select idxname, levels, leaves, nrows, nupdates, ndeletes, ninserts, ustlowts from sysindices where tabid = 100 and idxname = "zip_ix";`
        - idxname: zip_ix (index on customer(zipcode))
        - levels: 1
        - leaves: 1.000000000000
        - nrows: 28.00000000000
        - nupdates: 0.00, ndeletes: 0.00, ninserts: 28.00000000000 (UDI counters for this index at the time of the UPDATE STATISTICS LOW run)
        - ustlowts: 2012-04-03 22:54:56.00000
    - `> select * from sysdistrib where tabid = 100;` returns "No rows found.": dbaccessdemo7 did not create table distributions for the customer table.
26. Example (cont'd)
    - `> load from customer.unl insert into customer;` loads 199863 rows.
    - `> select idxname, levels, leaves, nrows, nupdates, ndeletes, ninserts, ustlowts from sysindices where tabid = 100 and idxname = "zip_ix";` returns the same values as before: levels 1, leaves 1.000000000000, nrows 28.00000000000, nupdates 0.00, ndeletes 0.00, ninserts 28.00000000000, ustlowts 2012-04-03 22:54:56.00000.
    - The index statistics for zip_ix are unchanged after 199,863 rows were inserted into the customer table, because no UPDATE STATISTICS command has been run.
27. Example (cont'd): after inserting 199,863 rows into the customer table, `create index state_ix on customer(state);` (no UPDATE STATISTICS command has been run)

    | sysindices | zip_ix | state_ix |
    |---|---|---|
    | levels | 1 | 3 |
    | leaves | 1.000000000000 | 556.0000000000 |
    | nrows | 28.00000000000 | |
    | nupdates | 0.00 | 0.00 |
    | ndeletes | 0.00 | 0.00 |
    | ninserts | 28.00000000000 | 0.00 |
    | ustlowts | 2012-04-03 22:54:56.00000 | 2012-04-03 23:04:33.00000 |

28. Example (cont'd)
    - `> select tabid, colno, mode, smplsize, rowssmpld, constr_time, ustnrows, ustbuildduration, nupdates, ndeletes, ninserts from sysdistrib where tabid = 100;`
    - Distribution information for column state in the customer table:
        - tabid: 100
        - colno: 8 (column state)
        - mode: H
        - smplsize: 199891.0000000
        - rowssmpld: 199891.0000000
        - constr_time: 2012-04-03 23:04:33.00000
        - ustnrows: 199891.0000000
        - ustbuildduration: 0:00:00.00000
        - nupdates: 0.00
        - ndeletes: 0.00
        - ninserts: 199891.0000000
29. Example (cont'd)
    - `> select partnum, nupdates, ndeletes, ninserts from sysmaster:sysptnhdr where partnum in (select partn from sysfragments where fragtype = "I" and indexname in ("state_ix", "zip_ix"));`

      | index | partnum | nupdates | ndeletes | ninserts |
      |---|---|---|---|---|
      | zip_ix | 1049092 | 0 | 0 | 199891 |
      | state_ix | 1049100 | 0 | 0 | 0 |

    - `> select partnum, nupdates, ndeletes, ninserts from sysmaster:sysptnhdr where partnum = (select partnum from systables where tabid = 100);`

      | table | partnum | nupdates | ndeletes | ninserts |
      |---|---|---|---|---|
      | customer | 1049069 | 0 | 0 | 199891 |

    - This is the actual partition page information, showing the UDI counters for each partition since it was created. It is not the same as the UDI information in the catalogs, which is updated only when statistics are updated.
30. OAT (OpenAdmin Tool) view of statistics
31. OAT view (cont'd)
    - For the customer table:
        - index zip_ix has exceeded STATCHANGE;
        - index state_ix has not.
32. Example (cont'd): `> update statistics low for table customer;`, index zip_ix

    | zip_ix | Before | After |
    |---|---|---|
    | levels | 1 | 3 |
    | leaves | 1.000000000000 | 505.0000000000 |
    | nrows | 28.00000000000 | 199891.0000000 |
    | nupdates | 0.00 | 0.00 |
    | ndeletes | 0.00 | 0.00 |
    | ninserts | 28.00000000000 | 199891.0000000 |
    | ustlowts | 2012-04-03 22:54:56.00000 | 2012-04-04 00:36:53.00000 |

    - Index statistics updated.
    - Catalog UDI values updated.
    - sysindices ustlowts updated.
33. Example (cont'd): `> update statistics low for table customer;`, index state_ix

    | state_ix | Before | After |
    |---|---|---|
    | levels | 3 | 3 |
    | leaves | 556.0000000000 | 556.0000000000 |
    | nrows | | 199891.0000000 |
    | nupdates | 0.00 | 0.00 |
    | ndeletes | 0.00 | 0.00 |
    | ninserts | 0.00 | 0.00 |
    | ustlowts | 2012-04-03 23:04:33.00000 | 2012-04-03 23:04:33.00000 |

    - Index statistics unchanged.
    - Catalog UDI values unchanged.
    - sysindices ustlowts unchanged.
34. Example (cont'd)
    - `> select tabname, tabid, nrows, created, ustlowts from systables where tabid = 100;`
        - tabname: customer
        - tabid: 100
        - nrows: 199891.0000000
        - created: 04/03/2012
        - ustlowts: 2012-04-04 00:36:53.00000
    - The systables information is always updated when UPDATE STATISTICS FOR TABLE is run, regardless of STATCHANGE.
35. Example
    - `UPDATE STATISTICS LOW FOR TABLE tab1;`
      `UPDATE STATISTICS HIGH FOR TABLE tab1 (col1, col2);`
    - Before 11.70: put DISTRIBUTIONS ONLY in the UPDATE STATISTICS HIGH command to avoid collecting the index statistics a second time.
    - From 11.70: it doesn't matter, because index statistics are only rebuilt if STATCHANGE has been met for the index.
36. Sysmaster query for % change (column distributions; the query continues on the next slide)
    `SELECT colname as name, 'Column' as type,
    constr_time::datetime year to second as build_date,
    rowssmpld::bigint as sample, d.ustnrows::bigint as nrows,
    case when d.mode = 'M' then 'Medium' when d.mode = 'H' then 'High' end as mode,
    resolution, confidence, ustbuildduration as build_duration,
    (table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) as udi_counter,
    CASE WHEN d.ustnrows=0 and (table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) = 0 THEN 0.00
    WHEN d.ustnrows=0 and (table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes) != 0 THEN -1
    ELSE ROUND((table_counter.udi_counter - d.ninserts - d.nupdates - d.ndeletes)/d.ustnrows * 100,2) END as change
    FROM sysdistrib d, syscolumns c,
    ( select SUM(nupdates + ndeletes + ninserts) as udi_counter from sysmaster:sysptnhdr
    where partnum in (select partn from sysfragments where tabid = 100 and fragtype='T'
    union select partnum as partn from systables where tabid = 100) ) as table_counter
    WHERE d.tabid=100 and c.tabid=100 and d.colno = c.colno and d.seqno = 1
    UNION`
37. Sysmaster query for % change (indexes; continuing the query started on the previous slide)
    `SELECT idxname as name, MIN('Index') as type, MIN(ustlowts)::datetime year to second as build_date,
    MIN(0) as sample, SUM(f.nrows)::bigint as nrows, MIN('Low') as mode,
    MIN(0) as resolution, MIN(0) as confidence, SUM(i.ustbuildduration) as build_duration,
    SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0)) - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0)) as udi_counter,
    CASE WHEN SUM(f.nrows)=0 and (SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0)) - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0))) = 0 THEN 0.00
    WHEN SUM(f.nrows)=0 and (SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0)) - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0))) != 0 THEN -1
    ELSE ROUND((SUM(NVL(p.ninserts,0) + NVL(p.nupdates,0) + NVL(p.ndeletes,0)) - SUM(NVL(f.ninserts,0) + NVL(f.nupdates,0) + NVL(f.ndeletes,0)))/SUM(f.nrows) * 100,2) END as change
    FROM sysindices i, sysmaster:sysptnhdr p, sysfragments f
    WHERE i.idxname = f.indexname AND i.tabid = 100 AND i.tabid = f.tabid AND f.partn = p.partnum
    GROUP BY i.idxname
    ORDER BY change DESC`
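
The CASE expression in this query computes, per column distribution or index, the UDI activity recorded in the partition headers since the statistics were built, as a percentage of the row count stored with those statistics. The sketch mirrors that arithmetic and applies it to the zip_ix and state_ix figures from the earlier example. It assumes the sysfragments UDI values for zip_ix equal the sysindices ones shown (ninserts 28, nrows 28) and treats state_ix's blank stored row count as 0; the helper name is hypothetical, and the `> STATCHANGE` test is an interpretation that matches the OAT slide (zip_ix exceeded, state_ix did not).

```python
def udi_change_percent(partition_udi: float, catalog_udi: float, stats_nrows: float) -> float:
    """Same arithmetic as the CASE expression in the query."""
    # UDI activity in the partition header since statistics were last built
    delta = partition_udi - catalog_udi
    if stats_nrows == 0:
        return 0.00 if delta == 0 else -1.0
    return round(delta / stats_nrows * 100, 2)

STATCHANGE = 10  # 11.70 default

# (name, partition UDI from sysptnhdr, catalog UDI, stored nrows); catalog values assumed.
for name, partition_udi, catalog_udi, nrows in (
    ("zip_ix", 199891, 28, 28),
    ("state_ix", 0, 0, 0),
):
    change = udi_change_percent(partition_udi, catalog_udi, nrows)
    print(f"{name}: change {change}%, exceeds STATCHANGE: {change > STATCHANGE}")
```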
38. Table STATCHANGE value
    - The default STATCHANGE applies if none is set for the table.
    - Can be set at session level with SET ENVIRONMENT: `SET ENVIRONMENT STATCHANGE '5';`
    - Can be set when creating the table.
    - Can be set with ALTER TABLE: `ALTER TABLE customer STATCHANGE 5;`
    - Query showing the table's STATCHANGE, or the configured value when none is set for the table:
      `select tabname, NVL(statchange, (select cf_effective from sysmaster:sysconfig where cf_name = 'STATCHANGE')) as statchange from systables where tabname = "customer";`
39. FORCE option
    - Add FORCE to any UPDATE STATISTICS command to ignore STATCHANGE.
    - When you upgrade to 11.70:
        - existing partition pages get UDI counters (the values are 0);
        - catalog tables sysfragments (for indexes) and sysdistrib (for table column data distributions) get UDI counters (the values are 0).
    - What does this mean for UPDATE STATISTICS?
        - FORCE: execute even if there has been no change.
        - STATCHANGE 0: execute if there has been any (non-zero) amount of change.
40. FORCE option (cont'd)
    - Add FORCE to the end of an UPDATE STATISTICS command to get the legacy behavior (ignore STATCHANGE).
    - FORCE:
        - executes even if there has been no change;
        - sets sysdistrib nupdates, ndeletes, ninserts to 0 (the same behavior is not seen with sysfragments nupdates, ndeletes, ninserts).
    - STATCHANGE 0:
        - executes if there has been a non-zero amount of change;
        - `SET ENVIRONMENT STATCHANGE '0';`
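
The difference between the default, STATCHANGE 0, and FORCE can be sketched as one decision function. `rebuilds_stats` is hypothetical; comparing with a strict `>` is an assumption, chosen because it is the only comparison that makes STATCHANGE 0 mean "any non-zero change" as the slides state (what happens at exactly the threshold for other values is not stated here). The always-updated systables nrows/npused are not modeled. The usage loop shows the upgrade case: every UDI counter starts at 0, so only FORCE rebuilds anything.

```python
from typing import Optional

DEFAULT_STATCHANGE = 10  # 11.70 default, from the "11.70 features" slide

def rebuilds_stats(change_pct: float, statchange: Optional[float] = None,
                   force: bool = False) -> bool:
    """Would index statistics / distributions be rebuilt?

    force=True mirrors FORCE: rebuild even with no change.
    Otherwise rebuild only when change_pct exceeds STATCHANGE, so that
    STATCHANGE 0 means "any non-zero change".
    """
    if force:
        return True
    threshold = DEFAULT_STATCHANGE if statchange is None else statchange
    return change_pct > threshold

# Right after an upgrade to 11.70 all UDI counters are 0, so the change is 0%.
for label, kwargs in (("default", {}), ("STATCHANGE 0", {"statchange": 0}),
                      ("FORCE", {"force": True})):
    print(label, rebuilds_stats(0.0, **kwargs))
```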
41. Stored procedures
    - UPDATE STATISTICS FOR PROCEDURE is not affected by STATCHANGE.
    - SQL statements in SPL routines are optimized:
        - when the routine is created or on first execution;
        - when dependent tables or indexes are altered;
        - when statistics of dependent tables change.
    - In 11.70 this matters because every time UPDATE STATISTICS is run for a table, systables npused, nrows, and ustlowts are updated, even if index statistics or distribution statistics are not updated because STATCHANGE has not been met.
42. UPDATE STATISTICS LOW: summary
    - The UPDATE STATISTICS LOW performance improvement takes effect when:
        - USTLOW_SAMPLE is set to 1;
        - the index has 100,000 or more leaf pages;
        - the index is detached.
    - USTLOW_SAMPLE:
        - new ONCONFIG parameter, documented in 11.70.xC4;
        - controls the use of sampling (new feature) to collect index statistics during UPDATE STATISTICS;
        - 0 (off, the default) or 1 (on);
        - can be updated with `onmode -wm` / `onmode -wf`;
        - can be set at session level with `SET ENVIRONMENT USTLOW_SAMPLE`, using 0, 1, on, or off.
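
The three conditions can be restated as a predicate, for instance in a script that reports which indexes would use sampling. This is a sketch with hypothetical function names; the leaf-page count could come from sysindices.leaves, and accepting "1"/"on" mirrors the values listed for SET ENVIRONMENT rather than any documented parser.

```python
def ustlow_sample_enabled(setting) -> bool:
    """Accept the values listed on the slide: 0 / 1 / on / off."""
    return str(setting).strip().lower() in ("1", "on")

def uses_sampled_low(setting, leaf_pages: int, detached: bool) -> bool:
    """True when all three conditions on the slide hold for an index."""
    return ustlow_sample_enabled(setting) and leaf_pages >= 100_000 and detached

# Hypothetical index with 250,000 leaf pages, detached, USTLOW_SAMPLE on.
print(uses_sampled_low("on", 250_000, True))
```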
43. UPDATE STATISTICS LOW: why?
    - UPDATE STATISTICS LOW takes too long when gathering statistics for large indexes:
        - the entire index is read in sequence;
        - each leaf page of the index must be read individually (separate I/O);
        - some customers do not run the command because it does not fit in the maintenance window;
        - on a single large table (billions of rows and many indexes), the command can take over 3 days.
    - New feature solution: USTLOW_SAMPLE
        - uses sampling to reduce the time required to gather index statistics;
        - many samples are taken, and the index statistics are calculated from the statistics of the samples.
44. UPDATE STATISTICS LOW: details
    - UPDATE STATISTICS LOW gathers the following index statistics:
        - number of index levels;
        - number of index leaf pages;
        - number of unique values for the index lead key;
        - clustering factor;
        - 2nd lowest and 2nd highest values for the index lead key.
    - Index statistics are saved in the database catalog:
        - sysindices (levels, leaves, nunique, clust);
        - syscolumns (colmin, colmax);
        - sysfragments (levels, clust) for fragtype = "I".
    - When UPDATE STATISTICS MEDIUM or HIGH is run, index statistics are also collected, unless DISTRIBUTIONS ONLY is used.
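
Some of these statistics can be illustrated on a toy index given as (lead key, data page) pairs. This is a sketch, not how Informix computes them: `low_index_stats` is hypothetical, "2nd lowest/highest" is taken over distinct key values (an assumption), and the clustering figure counts how often the data page changes while walking the index in key order, a common definition assumed here. With that definition the value ranges from the number of data pages (perfectly clustered) to the number of rows (not clustered).

```python
def low_index_stats(entries: list[tuple[int, int]]) -> dict:
    """entries: non-empty list of (lead_key, data_page) pairs, one per row."""
    ordered = sorted(entries)                 # index order: by key, then page
    distinct = sorted({key for key, _ in entries})
    # Clustering estimate (assumed definition): count data-page changes
    # while walking the index in key order.
    clust = 0
    prev_page = None
    for _, page in ordered:
        if page != prev_page:
            clust += 1
            prev_page = page
    return {
        "nunique": len(distinct),
        "colmin": distinct[1] if len(distinct) > 1 else distinct[0],   # 2nd lowest
        "colmax": distinct[-2] if len(distinct) > 1 else distinct[-1], # 2nd highest
        "clust": clust,
    }

# Hypothetical 8-row table on 4 data pages, stored in key order vs. scattered.
clustered = [(k, (k + 1) // 2) for k in range(1, 9)]     # pages 1,1,2,2,3,3,4,4
scattered = [(k, (k - 1) % 4 + 1) for k in range(1, 9)]  # pages 1,2,3,4,1,2,3,4
print(low_index_stats(clustered), low_index_stats(scattered))
```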
45. UPDATE STATISTICS LOW: details (cont'd)
    - Instead of reading the entire index in sequence, the new feature:
        - uses sampling;
        - each sample goes from the index root page down to the leaf level, reading one or more leaf pages;
        - sampling is "dynamic": the number of samples is not predetermined;
        - the number of samples is determined by the quality of the samples:
            - fewer samples are needed if the data is evenly distributed;
            - more samples are needed if the data distribution is skewed;
            - the standard deviation among the samples is used as the measure of "quality".
    - As a result, the time for UPDATE STATISTICS cannot be predicted up front.
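
A stopping rule based on the spread of the samples can be illustrated with a generic adaptive-sampling loop: keep drawing until the standard error of the mean is small relative to the mean. This is not Informix's algorithm. The measured quantity ("entries per sampled leaf page"), `rel_tol`, `min_samples`, `max_samples` and the synthetic data are all hypothetical, and the running statistics use Welford's online method. It does reproduce the behavior on the slide: evenly distributed data stops early, skewed data needs many more samples, so the run time is not known in advance.

```python
import math
import random

def sample_until_stable(draw, rel_tol=0.05, min_samples=5, max_samples=10_000):
    """Draw samples until the standard error of their mean is within rel_tol
    of the mean (or max_samples is reached). Returns (mean, samples_taken)."""
    n, mean, m2 = 0, 0.0, 0.0  # running count, mean, sum of squared deviations
    while n < max_samples:
        x = draw()
        n += 1
        delta = x - mean
        mean += delta / n          # Welford's online update of the mean
        m2 += delta * (x - mean)   # ... and of the squared deviations
        if n >= min_samples:
            stderr = math.sqrt(m2 / (n - 1)) / math.sqrt(n)
            if mean != 0 and stderr <= rel_tol * abs(mean):
                break
    return mean, n

# Hypothetical entries per sampled leaf page for an evenly filled index...
even_mean, even_n = sample_until_stable(lambda: 128.0)
# ...and for a skewed one (log-normal values, fixed seed for repeatability).
rng = random.Random(7)
skew_mean, skew_n = sample_until_stable(lambda: rng.lognormvariate(0.0, 1.5))
print(f"even: {even_n} samples, skewed: {skew_n} samples")
```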
46. UPDATE STATISTICS LOW: example (based on internal traces)
47. UPDATE STATISTICS LOW: example (based on internal traces)
48. UPDATE STATISTICS LOW: notes
    - Review of the 11.70.xC1 "Smart Statistics" feature:
        - default: AUTO_STAT_MODE 1;
        - default: STATCHANGE 10;
        - when an UPDATE STATISTICS command runs, index statistics and table distributions are not rebuilt if the STATCHANGE threshold has not been met.
    - UPDATE STATISTICS information in the database catalog tables:
        - ustlowts in systables: updated when systables nrows and npused are updated, which happens whenever an UPDATE STATISTICS command is run; the STATCHANGE threshold is not considered.
        - ustlowts in sysindices: updated when index statistics are rebuilt.
        - constr_time in sysdistrib: updated when distribution statistics are rebuilt.
    - Remember the 11.10 feature: statistics are collected when an index is created.
49. Catalog columns for smarter statistics (a mix of existing columns and columns new in 11.70)
    - systables: statchange, statlevel, ustlowts
    - sysfragments: nupdates, ndeletes, ninserts
    - sysindices: nupdates, ndeletes, ninserts, ustbuildduration, ustlowts
    - sysdistrib: nupdates, ndeletes, ninserts, ustbuildduration, constr_time
    - sysfragdist: nupdates, ndeletes, ninserts, ustbuildduration, constr_time
50. Questions?
51. Thank you. Olivier Bourdin (olivier.bourdin@fr.ibm.com), Wednesday 3 October 2012