
A12 vercelletto indexing_techniques



This presentation covers the index-related techniques (as of April 2013) available with IBM Informix.
It includes indexing techniques available with IBM Informix 12.10.

See all IIUG presentations available in the member area.



  1. IBM Informix indexing techniques: which one to use when?
     Eric Vercelletto, Session A12, Begooden IT Consulting, 4/23/2013 3:35 PM
  2. Agenda / methodology
     • Introduction to response time measuring
     • Identify the relevant indexing techniques
     • Describe the implementation method
     • Confirm/recognize its use by accurate monitoring
     • Measure its efficiency as response time and effective use in the database (sqltrace, sqexplain)
     • Identify pros and cons
     4/24/2013, Session F12
  3. Introduction
     • Begooden IT Consulting is an IBM ISV company, mainly focused on Informix technology services.
     • Our 15+ years of experience within Informix Software France and Portugal helped us to acquire in-depth product knowledge as well as solid field experience.
     • Our services include Informix implementation auditing, performance tuning, issue management, administration mentoring…
     • We also happen to be the Querix reseller for France and French-speaking countries (except Québec and Louisiana).
     • The company is based in Pont l'Abbé, Finistère, France.
  4. Some basics not to forget about
     There are 2 ways to measure response times:
     • The « cold » measure: the response time is measured just after starting the engine, when data and index pages are not yet loaded into shared-memory IFMX buffers. Disk IO must be performed to read the data and index pages, which increases the RT.
     • The « hot » measure: the RT is measured when data and index pages are loaded into SHMEM. No or few disk IOs => the RT is much shorter.
     • This point can often explain surprising RT differences according to how the data is accessed.
     • Broad-range or DS queries most often access data and/or indexes in disk pages.
     • OLTP queries mostly access data and indexes in SHMEM pages.
  5. Derived thoughts and facts
     • Reading data pages and/or index pages on disk always takes longer than in SHMEM. Full table scans can take minutes or more, according to table size.
     • Reading data pages in SHMEM is very fast. A full scan of a table in SHMEM takes fractions of seconds or seconds, rarely more.
     • Reading index pages in SHMEM is also very fast. Added to this, due to the B-TREE structure, reading index pages generally covers more content than reading data pages.
     • This often makes it difficult to compare the efficiency of 2 different indexes on the same table when reading in SHMEM.
  6. Derived thoughts and facts (continued)
     • When running hot measures on indexes, the differences can be as low as milliseconds, BUT…
     • Repeating 3 wasted milliseconds millions of times can make a difference!
     • When response times get to such a low level, sqltrace is the tool you need to understand the query behaviour.
     • In certain situations, saving milliseconds on a query will make the difference. In other situations, saving seconds will not make the difference.
     • A bad response time can be caused by inappropriate indexing, but can also be caused by some « unusual » logic adding useless work to be performed by the applications and the server.
  7. Comparing cold measure with hot measure (1)
     • Full scan of a mid-sized table, tpcc:order_line, containing 24 million rows:
       select * from order_line
     • Monitored with onstat -g his:
       « Cold » read, performed just after oninit -v: many disk pages read, 47.4 secs.
       « Hot » read, performed after the first scan: zero disk pages read, all buffer reads, 19.4 secs.
  8. Comparing cold measure with hot measure (2)
     • Cold use of a poor-selectivity index:
       select * from order_line where ol_w_id = 10 (duplicate index on w_id, 50 distinct values)
     • Cold read: many disk reads, execution time 5.9 secs.
     • Hot read: few disk reads, execution time 1.1 secs.
  9. BATCHEDREAD_INDEX: description
     • This feature has been taken from XPS and introduced in 11.50.xC5.
     • The purpose is to maximize index key access by grouping the reading of many index keys into large buffers, then fetching the rows associated with those keys.
     • This technique brings strong savings in terms of CPU and IO, therefore reducing response time.
     • This technique is suitable and efficient for massive index reads (DS/OLAP), not for pinpoint-type (OLTP) index access.
  10. BATCHEDREAD_INDEX: the test
     • We will run the following query against a 30 million row clients table. The table has an index on 'lastname'. Row size is 328 bytes.
       output to /dev/null
       select lastname, count(*)
       from clients
       group by 1
     • This query returns 2,188,286 rows.
  11. BATCHEDREAD_INDEX: facts
     • All these response times are measured « cold », with three configurations:
       AUTO_READAHEAD 0, BATCHEDREAD_INDEX 0
       AUTO_READAHEAD 0, BATCHEDREAD_INDEX 1
       AUTO_READAHEAD 1, BATCHEDREAD_INDEX 1
     • See the difference (timings shown in the slide chart).
  12. BATCHEDREAD_INDEX: how?
     • BATCHEDREAD_INDEX can be set, as well as BATCHEDREAD_TABLE, either in the onconfig file,
     • or as an environment variable before launching the application:
       export IFX_BATCHEDREAD_INDEX=1
     • or as an SQL statement:
       SET ENVIRONMENT IFX_BATCHEDREAD_INDEX 1;
     • Monitor index scan activity with onstat -g scn.
  13. Attached or detached index?
     • The « Antique Informix Disk Layout » used to create the index pages in the same extents as the data pages for attached indexes. The expected result was reduced disk IO.
     • This layout happened to become a problem because the data pages were often located far from the index pages, causing the opposite effect of increasing disk IO. The official recommendation at that time was to create detached indexes for this reason.
     • Nowadays, index pages are created in a different partition than the data pages, giving attached indexes the same level of performance as detached indexes.
     • But… if you have the possibility to create the data dbspaces and the index dbspaces on independent disks and channels, you will increase your disk IO performance by reducing disk contention.
     • This gain will be observed mainly during intensive sessions doing massive data changes.
     • Watch the output of onstat -g iof and look for low IO throughput per second.
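As a sketch of the dbspace separation described above (the dbspace names data01 and index01 are assumptions, not from the slides):

```sql
-- Hypothetical dbspaces: data01 on one disk/channel, index01 on another.
-- Keeping table data and index pages apart reduces disk contention
-- during heavy write activity.
create table clients (
    client_id  integer,
    lastname   varchar(64),
    firstname  varchar(64)
) in data01;

-- The index is stored in its own dbspace, on independent spindles.
create index id_clients_01 on clients (lastname, firstname) in index01;
```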
  14. Few columns or many columns in the same index? Key points to consider
     • Remember « cold » reads and « hot » reads when testing the efficiency of an index. Results can be dramatically different between cold and hot.
     • The choice is often a hard-to-obtain trade-off, and definitely a long subject to discuss!
     • Many columns in an index can make it more selective, but it will also consume more CPU/disk resources when updating keys (see b-tree cleaner tuning).
     • Few columns in an index can make it less selective, but it will consume fewer CPU/disk resources when updating keys.
     • Integrity constraints are not negotiable, but some integrity constraint indexes can be negotiated…
  15. Few columns or many columns? Techniques to evaluate efficiency
     • time dbaccess dbname queryfile gives an indication of the efficiency of an index, but can be misleading due to the huge differences between cold and hot measures.
     • onmode -Y sessnum 1 will identify which index(es) are used, and will also report how many rows have been scanned against how many rows have been returned.
     • onstat -g his (sqltrace) will give fine detail about response time, buffer and disk access, lock waits etc.
     • A complete diagnostic is done with the 3 tools together.
  16. Few columns or many columns? Let's analyze a real case: one column
     • 1-column index: buffer reads: 5900; rows scanned: 4913; response time: 0.0368''
  17. Few columns or many columns? Same case, index with 2 columns
     • 2-column index: buffer reads: 122; rows scanned: 106; response time: 0.0047''
  18. Highly duplicated lead column indexes: how was life before?
     • The Antique Informix Rule stated to avoid multi-column indexes with low selectivity on the leading keys, due to poor efficiency.
       Ex: warehouse_id, district_id, order_id, order_line
     • Querying on order_line required specifying the lead columns in the query predicate, or creating another index with order_line as lead column.
     • Restructuring indexes following those rules was a complex, long and risky task, not to mention the fact that any downtime due to index rebuilding was poorly accepted by operations managers…
  19. Index key first & self join: it's magic!
     • The key-first scan was introduced in 7.3. It has been enhanced so that an index can be used even when the lead columns are not specified in the where clause.
     • The index self-join technique was introduced in IDS 11.10, although many DBAs didn't even notice it!
     • By scanning subsets of the poorly selective composite index, the engine manages to use the non-leading index keys as index filters, transforming the index into a highly selective one.
     • Hierarchical-like indexes with highly duplicated lead columns now need no redefinition to be efficient.
     • You no longer need to build new indexes with highly selective lead columns. This saves optimizer work and disk space.
     • Index self join is enabled by default. You can, if you persist in not using it, disable it either by setting INDEX_SELFJOIN 0 in onconfig or with an optimizer directive {+AVOID_INDEX_SJ}.
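A sketch of a query that can benefit from index self join, using the order_line index from the test slides (the filter values are illustrative):

```sql
-- Index: (ol_w_id, ol_d_id, ol_o_id, ol_number), with highly
-- duplicated lead columns. The predicate names only ol_o_id, a
-- non-lead column: with index self join the engine probes the index
-- once per (ol_w_id, ol_d_id) value pair and applies ol_o_id as an
-- index filter, instead of resorting to a full scan.
select ol_o_id, sum(ol_amount)
from order_line
where ol_o_id between 1000 and 1010
group by 1;
```

Comparing the sqexplain output before and after onmode -wm INDEX_SELFJOIN=0 shows whether the self-join path was chosen.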
  20. Index self-join: the test
     • We will use the order_line TPC-C table, which contains 23,735,211 rows.
     • The index follows the hierarchy, which was formerly considered a poor implementation:
       ol_w_id: warehouse id (50 distinct values)
       ol_d_id: district id (10 distinct values)
       ol_o_id: order number (9279 distinct values)
       ol_number: order line number (14 distinct values)
     • The challenging query is:
       SELECT ol_d_id, ol_o_id, avg(ol_quantity), avg(ol_amount)
       FROM order_line
       GROUP BY 1,2
       ORDER BY 2,3
  21. No self join
     • Use onmode -wm INDEX_SELFJOIN=0 to disable self join.
     • The index is taken, but only key-first: many rows scanned.
     • Response time: 11.258''
  22. Self join: find the differences!
     • Key-first + self-join access.
     • Rows scanned: ~100 times fewer.
     • RT: 3.313''
  23. The Antique Informix Rule says: “you will use only one index per table”
  24. The AIR says: “you will use only one index per table”
     • The Antique Informix Rule stated that only one index per table could be used.
     • The optimizer had to choose only one index among several indexes for the same table, even when several indexes were needed.
     • Many not-so-unrealistic query cases had to be drastically rewritten in order to provide acceptable response times.
     • The trick was generally to use a UNION or a nested query, but the query code readability and maintainability suffered from that.
  25. What A.I.R. obliged you to do
     • Generally, the best way to work around the RT issue was to use either UNION or nested queries.
     • The trick could be efficient in terms of response time, but the code got more complex to read and to maintain.
     • This workaround required heavy modification of the application code, and needed detailed and accurate tests to obtain the same results as with the initial query.
  26. The optimizer constantly getting smarter across releases
     • An optimizer enhancement introduced the use of several indexes on the same table, but only if the where clauses are linked with the 'OR' operator.
     • The query path is like a usual INDEX PATH, the difference being the use of several indexes.
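A minimal sketch of the OR-linked case (the column names, and the assumption of one single-column index on each, are illustrative):

```sql
-- Assumed: independent indexes on lastname, city and zipcode.
-- Because the predicates are linked with OR, the optimizer may read
-- all three indexes and merge the results (multi-index path) rather
-- than picking a single index.
select *
from clients
where lastname = 'MARTIN'
   or city     = 'QUIMPER'
   or zipcode  = '29120';
```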
  27. Measure with INDEX PATH
     • Simple INDEX PATH, use of 3 indexes!
     • Scanned rows: 376,000; disk reads: 34,136; RT: 2.489''
  28. Multi index: a different path
     • Multi-index / skip scan enabled: 3 indexes used.
     • Disk reads: 1984; the response time is shorter: a 33% gain in RT.
  29. Multiple indexes: what should be done?
     • Generally, the optimizer decides correctly which is the best path.
     • You can compare the results with the use of UNION, then decide between keeping hard-to-maintain code or not.
     • You can nonetheless use optimizer directives to force the access method, like
       {+AVOID_MULTI_INDEX (clients)} to force an INDEX PATH,
     • or
       {+MULTI_INDEX (clients)} to force the multi-index SKIP SCAN path.
     • The choice can get tricky when both AND and OR conditions are set on the involved indexes.
     • The difference is almost not visible in the case of a hot measure.
     • Statistics on indexes are very important; the access method can change according to them!
  30. Star join
     • Star join is an extension of the MULTI INDEX concept.
     • It combines this technique with DYNAMIC HASH JOINS.
     • The technique has been ported from XPS to IDS 11.70.
     • It is used exclusively for DS/OLAP queries where a FACT table is the center point of many dimension tables.
     • Requires PDQPRIORITY (Ultimate Edition or Enterprise Edition).
     • If you consider using Star Join, you are an excellent candidate to see a demo of Informix Warehouse Accelerator!
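A hedged sketch of the kind of DS/OLAP query a star join targets (the star schema, with a sales fact table and product/store/calendar dimensions, is hypothetical):

```sql
-- Star join candidates are fact-table-centric queries; PDQPRIORITY
-- must be enabled for the technique to apply.
set pdqpriority 50;

select p.product_name, st.store_name, sum(f.amount)
from sales f, product p, store st, calendar c
where f.product_id = p.product_id
  and f.store_id   = st.store_id
  and f.day_id     = c.day_id
  and c.year       = 2012
group by 1, 2;
```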
  31. The A.I.R. says: « you will avoid indexes with too many tree levels »
     • OK, but what could I do to solve that? My indexes are built with the data they have inside, and nothing or almost nothing can be done.
     • Databases and tables are getting bigger and bigger, and splitting/archiving part of the data is not always an acceptable solution.
  32. FOREST OF TREES INDEXES
     • The forest-of-trees index type was introduced in 11.70.xC1.
     • It replicates the model of a traditional B-TREE, having several root nodes instead of only one root node.
     • A forest of trees brings benefits when contention against the root node is observed.
  33. Reducing the number of b-tree levels on index « lastname, firstname »
     • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree
       => The initial number of b-tree levels is 6.
     • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 10 buckets
       => The number of b-tree levels decreased to 5.
     • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 100 buckets
       => The number of b-tree levels decreased to 4.
     • create index "informix".id_clients_02 on "informix".clients (lastname, firstname) using btree hash on (lastname) with 1000 buckets
       => The number of b-tree levels decreased to 3.
  34. Tpcc with regular b-tree indexes
     • Index iu_stock_01 has 4 levels.
     • Tpcc result is 14,093 tpmC.
     • High contention on iu_stock_01: 8,704,052 spins in 4 mn.
  35. Tpcc with FOT on iu_stock_01
     • create unique index iu_stock_01 on stock (s_w_id, s_i_id) using btree in data03 HASH on (s_w_id) with 50 buckets;
     • Index iu_stock_01 now has 3 levels.
     • The result grew to 16,413 tpmC.
     • Contention on iu_stock_01 decreased from 8,704,000 to 149,600 spins in 4 mn.
     • iu_oorder_01 is now a good candidate for FOT!
  36. Main facts on FOT indexes
     • FOT is very efficient at reducing contention on index access => better RT in an OLTP context.
     • FOT is very efficient at reducing the number of B-TREE levels => better overall RT.
     • Ideal for primary keys and foreign keys in a high-concurrency OLTP context.
     • Implementation is easy and fast.
     • Supports the main index functionality: ER, PK, FK, b-tree cleaning…
     • Does not support aggregate queries or range scans on HASH ON columns.
     • Also does not support index clustering, index fillfactor or functional (UDR-based) indexes.
  37. Optimizing big index creation: PSORT_NPROCS
     • The PSORT_NPROCS env variable is used to allocate more threads to the sort package, which is also used for parallel index creation.
     • Significant performance improvements on index creation can be obtained on multi-core/multi-processor servers.
     • It can be used even with non-PDQPRIORITY-enabled editions if the server has more than one core/CPU.
     • PSORT_NPROCS can unleash memory consumption: please check for available memory on the server.
     • The onconfig parameter DS_NONPDQ_QUERY_MEM has to be checked if using PSORT_NPROCS.
  38. Optimizing big index creation: DBSPACETEMP or PSORT_DBTEMP
     • The env variable DBSPACETEMP overrides the onconfig parameter of the same name.
     • Generally, raw-device based temp dbspaces offer more performance than file-system based files.
     • PSORT_DBTEMP writes temporary sort files in the specified file-system based directories instead of DBSPACETEMP.
     • It is useful to spread the temporary sort files over a wider list of directories mounted on different spindles.
  39. PSORT_NPROCS/PSORT_DBTEMP: facts
     • create index id_clients_02 on clients (lastname, firstname)
     • unset PSORT_NPROCS; unset PSORT_DBTEMP
       => 13m28.709s
     • export PSORT_NPROCS=3
       export PSORT_DBTEMP=/tmp:/ids_chunks/ids_space01:/ids_chunks/ids_space02:/ids_chunks/ids_space03
       => 6m19
     • A RAM disk, or even an SSD drive, can improve performance a lot:
       export PSORT_NPROCS=3
       export PSORT_DBTEMP=/mnt/myramdisk
       => 4m22.030s
     • To check the environment of the session: onstat -g env SessionNumber
  40. Index disable: what happens?
     • Disabling an existing index will prevent the server from using this index, but the server will « remember » the index schema.
     • This technique can be applied before executing a massive data insert or update, since it alleviates the index key update workload.
     • Heavy side effects can be expected: loss of key uniqueness, loss of performance…
     • If you run a query on a disabled index, the optimizer will probably choose a sequential scan unless a better path is found.
     • The index will be seen as 'disabled' in dbschema, but will not be seen in oncheck -pT nor oncheck -pe.
     • Disabling an index will make its former disk space available in the dbspace.
     • Disabling an index is immediate.
     • Syntax is: set indexes IndexName disabled
  41. Index enable: what happens?
     • Enabling an index will rebuild the index physically, with the same definition as before.
     • Enabling an index takes as much time as creating the same index.
     • But the enable statement is simpler to type than the create index statement,
     • and you do not have to remember the initial create index statement.
     • Syntax is: set indexes IndexName enabled
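The disable/enable pair sketched as a bulk-load workflow (the index, table and staging-table names are illustrative):

```sql
-- 1. Disable the index: immediate, frees its disk space, and
--    removes the key-maintenance overhead during the load.
set indexes id_clients_02 disabled;

-- 2. Run the massive insert/update batch here, e.g.:
-- insert into clients select * from clients_staging;

-- 3. Re-enable: rebuilds the index physically with its original
--    definition; takes as long as the initial create index.
set indexes id_clients_02 enabled;
```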
  42. Digging for more performance: disable foreign key indexes
     • Many times, foreign key indexes are a part of the same table's primary key:
       order_line primary key (ol_w_id, ol_d_id, ol_o_id, ol_number)
       order_line foreign key (ol_w_id, ol_d_id, ol_o_id)
     • Using 'index disabled' in the add constraint statement will avoid the creation of a redundant index, because its structure already exists in the primary key.
       ALTER TABLE order_line ADD CONSTRAINT
       (FOREIGN KEY (ol_w_id, ol_d_id, ol_o_id)
       REFERENCES oorder (o_w_id, o_d_id, o_id) CONSTRAINT ol2 INDEX DISABLED);
     • This implementation will save disk space by dropping an index.
     • CPU resource will be saved when updating/deleting/creating index keys,
     • and consequently disk IO will also be saved.
     • Check that disabling the constraint index has no hidden side effects; a mistake can have expensive consequences!
  43. I need to create a new index, but users are always connected to the table!
     • Sometimes a new index needs to be created, but the tables are being accessed by users or batches.
     • IDS 11.10 introduced the possibility to create an index without putting an exclusive lock on the table, called index online.
     • Users can SELECT, INSERT, UPDATE or DELETE rows in the table while the index is being created.
     • Syntax is:
       create index id_clients_01 on clients (lastname, firstname) ONLINE
     • Drop index online is also available under the same conditions.
  44. Create index online: precautions & restrictions
     • Create index online is a complex operation, involving a table snapshot, base index build, catch-up and more.
     • It will request additional resources, such as disk space, CPU and memory, in order to make the operation safe and as fast as possible.
     • Long transactions may happen: check the logical logs size before diving in.
     • The index pre-image pool memory size is managed with the onconfig parameter ONLIDX_MAXMEM, updatable with onmode -wm.
     • Not applicable to cluster indexes, UDT columns or UDR indexes.
     • Only one create index online per table at the same time.
  45. Index compression
     • IDS introduced table compression in 11.50.xC4. This technology is now used successfully in large database implementations.
     • Index compression is a new feature of IDS 12.10. It is based on the same technology as table compression.
     • The principle is to compress the key column values at b-tree leaf level, but not the rowids attached to these key values.
     • Index compression is very effective for indexes having large key values: names, item names etc.
     • The compression dictionary must contain at least 2000 unique key values.
     • Index compression is an excellent way to save disk space, and…
     • Since more key values fit in an index page, more key values can be read in one IO cycle => IO is more efficient.
     • Reduced IO should enhance index access performance in large queries.
  46. Index compression: disk space gained
     • execute function task ("index compress", "id_clients_01", "staging");
     • or
       execute function task ("index compress", "j", "testdb");
     • or
       create index id_clients_01 on clients (lastname, firstname) compressed
     • More than 50% compression rate.
  47. Cluster index
     • Creating or altering a cluster index will physically sort the table data by the first column of this index at creation time.
     • Accessing table data through a cluster index will read already sorted data pages.
     • Generally makes IO on data pages easier because they are contiguous => decreased RT.
     • The cluster level will decrease as new rows are inserted.
     • High cost of administration: re-clustering this index will rewrite the table data pages.
     • A cluster index can be good for stable tables accessed in an ordered sequential way.
  48. Statistics on indexes
     • Introduced in 11.70: when one creates an index, the distributions for this index are automatically created.
     • High-mode statistics are generated for the lead column.
     • Index level statistics are also generated in low mode.
     • This does not mean you should stop regularly updating statistics for those indexes, but it is no longer required to do it just after the index creation.
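A sketch of the periodic refresh that is still worthwhile after the automatic creation-time distributions (table and column names are illustrative):

```sql
-- Refresh high-mode distributions on the index lead column and
-- medium-mode distributions on the rest of the table, so the
-- optimizer keeps choosing the right access method as data changes.
update statistics high for table clients (lastname);
update statistics medium for table clients;
```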
  49. Questions?
     Indexing techniques: which one to use when
     Eric Vercelletto, Begooden IT Consulting