The Beauty of Informix Disk Structures
Presented by Frédéric Delest
Written by Andreas Legner
What to expect
• On-disk persistence of an Informix Server instance
• Touch on layout of spaces and chunks
• Pages and page types
• How your data is stored in partitions
• Ways to look at what’s on disk
• Hands-on
  – Finding a specific row of data in your server instance
• Your questions answered
  – Many things are only vaguely documented nowadays, so I wonder what you still know ;-)
• Hope there’s something new for everyone!
We’ll be talking about…
• Partitions
  – What all your tables and indices consist of
  – Even things like sequences or timeseries
• Pages
  – What a whole instance is based upon
  – Changed heavily over time, and still remained the same
• Dbspaces & Chunks
  – Two very persistent species as well, through all evolution since the earliest versions of Informix
• Physical & Logical Logs
  – How old is your oldest physical or logical log file?
• All this supporting an ever growing, heavily expanding set of functionality
  – Allowing for extremely seamless, reliable, inexpensive and fast migration from v7 through v11.7 (and back)
Ain’t this Beauty? Simplicity designed for sustainability.
Test Environment
• Informix Virtual Appliance
  – Same as used for other sessions
• The main demo instance:
  – INFORMIXDIR=/opt/IBM/informix
  – INFORMIXSERVER=demo_on
  – ONCONFIG=onconfig.demo_on
  – ROOTPATH /data/IBM/informix/demo/demo_on/online_root
Jump right into it
• What makes up an Informix server instance when it’s down?
  – $INFORMIXDIR & $ONCONFIG
  – onconfig root chunk info
  – Chunks
• A chunk
  – A device (“raw”)
  – A file (“cooked”)
  – Actually a contiguous portion of either
    • Starting at an offset
    • Reaching <size> kilobytes further
    • NOT initialized as a whole in any way
    • Unless newly created as a (cooked) file: filled up with zero bytes (can’t use ‘sparse files’, for obvious reasons)
    • Only the first and third pages (0 + 2) are initialized
The Root Chunk
• The Root Chunk is the only chunk initially
  – Making up the Root dbspace (usually “rootdbs”)
  – Holding everything required
  – In a specific order
• … and will remain the key entry point to e.g. all other chunks
• Begins with the so-called “root reserved pages”
  – Starting from here anything else can be found
• Followed by a single chunk free list page
  – Every chunk logically begins with a chunk free list page recording its free space
  – Only blob chunks (chunks of a blobspace) don’t have these; they are a totally different kind
• Followed by the dbspace’s master partition
  – “partition partition” or “TBLSpace TBLSpace”
• (Almost) anything beyond this can change
  – Database partition <– this would never move
  – The physical log
  – Initial logical logs
  – System and user databases …
Dbspaces/Sb(lob)spaces
• Up to m logical collections of 1 to n chunks each
  – We’ll see what m and n can be
• Home of
  – Partitions in case of dbspaces and, partially, sbspaces
  – Sblobs in case of sbspaces
  – Blobs in case of blobspaces
• Minimum entity of a backup or restore
• ‘Critical’
  – Rootdbs or
  – Dbspace containing the physical log or any logical log
  – Must be contained in any dbspace backup or level 0 restore
A Fresh Instance
For newbies (or others still wishing to know; do this whenever you want to test something):
• Let’s create a new baby instance:
  – INFORMIXSERVER=baby
  – ONCONFIG=onconfig.$INFORMIXSERVER
  – Copy $INFORMIXDIR/etc/onconfig.std to $INFORMIXDIR/etc/$ONCONFIG
  – Edit the new config file:
    • ROOTPATH /tmp/root_chunk.baby
    • Lower ROOTSIZE, PHYSFILE and LOGSIZE by a factor of 10
    • MSGPATH $INFORMIXDIR/online.baby.log
    • SERVERNUM 123
    • DBSERVERNAME baby
  – Add an entry to $INFORMIXDIR/etc/sqlhosts (unset INFORMIXSQLHOSTS):
    • baby onsoctcp localhost 9876
  – oninit -ivy to initialize the new instance on disk
  – onstat -d to see the chunks and dbspaces we have: only one
  – onstat -m -r 2 to see when system database creation is done
oncheck -p…
• oncheck’s -p option adds printing to checking
  – DBA’s first choice for looking at disk objects
  – -pr|R for printing reserved pages
  – -pP for locating pages physically, taking chunk# and page offset (base pages)
  – -pp for locating pages logically within a partition, taking partnum and logical page#
  – -pe for extent listing
  – -pt|T for printing partition pages
  – -pd|D|k|K|l|L for data and index pages
• Some options only work when the server is up
  – Esp. when needing more detailed info than just a chunk
• Others first attempt a connection
  – Might have to wait up to $INFORMIXCONTIME seconds (default: 60) when the server is down
• When the server is up, oncheck will always go through the server
  – Hence it shows you buffer cache content rather than reading from disk
First Peek at a Chunk
• Do an ‘oncheck -pe [rootdbs]’
  – Extent listing
    • We’ll clarify “extents” later
  – Can limit output to a specific space
    • Not any further … so can be big
  – Only available online (or quiescent)
  – And with all the space’s chunks online (!)
    • Won’t work if one chunk in the space is down
• Try and locate the objects mentioned so far
oncheck -pe

DBspace Usage Report: rootdbs    Owner: informix    Created: 01/26/2011

Chunk  Pathname              Pagesize(k)  Size(p)  Used(p)  Free(p)
1      /tmp/rootchunk.baby   2            100000   52256    47744

Description                  Offset(p)  Size(p)
---------------------------  ---------  -------
RESERVED PAGES               0          12
CHUNK FREELIST PAGE          12         1
rootdbs:informix.TBLSpace    13         250
PHYSICAL LOG                 263        15000
LOGICAL LOG: Log file 1      15263      500
LOGICAL LOG: Log file 2      15763      500
...

• ‘p’ is pages – base unit of a chunk
• First 3 items always the same
  – Root reserved pages
  – Chunk’s first chunk free list page
  – TBLSpace TBLSpace’s first extent
• All 3 can have “extension”
Pages and Page Sizes
• A chunk is made up of pages
• Base I/O unit is a page
  – Data and index buffering also occurs in pages
• 2 kB entities (4 kB on AIX and Windows) by default
  – Mandatory page size on “critical dbspaces”: root dbspace or dbspace holding the physical log or any logical logs
• Configurable page size for other, non-critical dbspaces
  – Per dbspace
  – At dbspace creation time
  – In multiples of the default page size, up to 16k
• Different game in blobspaces and sbspaces
  – Blobspaces always had freely selectable page sizes (multiples of the base page size)
  – Sbspaces use the default (base) page size … no matter what people (or Informix installers) keep telling you ;-)
How to look at a page?
• oncheck -pP <chunk_no> <page_offset> [#pgs] [-h]
  – Prints page header
  – Prints page slot table and slots if applicable
    • Unless -h (headers only) specified
  – <#pgs> to see multiple pages
    • (not working yet with non-default page size)
  – Requires <page_offset> specified in base (default) pages!
• SMI:
  – sysrawdsk: look at pages as raw space
  – syspaghdr: look at page headers only
  – Both indexed, but not very smart, e.g. can’t make good use of <=/</>/>=
  – Use base pages for offset!
  – Use carefully: not too safe, esp. with non-default page size!
• onstat: when pages are in memory
• dd / od / …
  – Latter two provide a more ‘natural’ image of a page
Page Structure
• (Almost) every used page has
  – a 24-byte page header
  – a trailing stamp (last 4 bytes)
• When header and stamp match, the page is considered consistent in itself
  – At least it has been written completely
  – A checksum mechanism is used nowadays; it used to be two stamps that needed to match
• Page content usually is organized in slots
• Slot table
  – Growing from page end
  – Entries describing slots
• Unused pages
  – No structure or consistency assumed
• What is ‘unused’?
  – Not allocated to any object, so FREE in the chunk
  – Or beyond its object’s “npused” (# pages used)
Some Pages Now
• Try this now:
  – oncheck -pP 1 0 12 > first12.pgs
• Find
  – Page headers
  – Slot tables and entries
  – Slots
• What is it that we’re looking at?
• Try to dump the same using ‘dd’ and/or ‘od’
  – dd if=$ROOTCHUNK bs=2k count=12 | od -A x -t x > first12.hex
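For those who prefer a script to ‘dd’, here is a minimal Python sketch of the same raw read. It relies only on what the slides state (2 kB default pages, a 24-byte header, a 4-byte trailing stamp); the function names are made up for illustration, and it deliberately does not try to interpret the header fields.

```python
import io

PAGE_SIZE = 2048    # default page size per the slides (4 kB on AIX and Windows)
HEADER_SIZE = 24    # page header length per the slides
STAMP_SIZE = 4      # trailing stamp length per the slides


def read_pages(chunk, first_page, count, chunk_offset_kb=0):
    """Read up to `count` raw pages from an open binary file object.

    `chunk_offset_kb` is the chunk's offset within its device or file.
    Pages that would be cut short by end-of-file are not returned.
    """
    chunk.seek(chunk_offset_kb * 1024 + first_page * PAGE_SIZE)
    pages = []
    for _ in range(count):
        data = chunk.read(PAGE_SIZE)
        if len(data) < PAGE_SIZE:
            break
        pages.append(data)
    return pages


def split_page(page):
    """Split one raw page into (header, body, stamp) byte strings."""
    if len(page) != PAGE_SIZE:
        raise ValueError("expected exactly one page of %d bytes" % PAGE_SIZE)
    return page[:HEADER_SIZE], page[HEADER_SIZE:-STAMP_SIZE], page[-STAMP_SIZE:]


# Demo on an in-memory stand-in for a chunk (three zero-filled pages).
fake_chunk = io.BytesIO(bytes(3 * PAGE_SIZE))
demo_pages = read_pages(fake_chunk, first_page=1, count=5)
demo_header, demo_body, demo_stamp = split_page(demo_pages[0])
```

Against a real chunk you would pass something like `open(path, "rb")` instead of the in-memory stand-in, ideally on a copy or with the server down.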
Page Types
• Many different page types
  – oncheck -pp|P naming them in the page header output portion
  – Encoded in lower bits of page flags
• ROOTRSV: root (and extended) reserved pages
  – Recording system configuration
• CHUNK: chunk free list pages, recording FREE extents
  – First one always at fixed position 2 in a chunk
  – Chained if one doesn’t suffice
• FREE: partition free bitmap, recording pages’ use state within a partition
  – At fixed intervals within a partition
  – First one always logical page 0
• PARTN/SECPARTN: partition pages and secondary partition pages
  – A partition’s details, incl. in-place alter history
• DATA/REMAIN: table data row and overflow (remainder) pages
• BTREE: btree index page (root/twig/leaf node)
• PBLOB: partition blob page
• BLOB/BMAP/BBIT: blobspace pages
Slots
• Page content normally organized in slots
  – Only few page types don’t need real slots (chunk FREE list, bitmap, plog marker, any sort of blobspace pages …)
• Slot
  – A contiguous range of bytes within a page
  – With a 2 × 2 bytes slot table entry describing it
    • Slot begin and slot size, optional slot flags
  – Space consumption of a slot: slot size + 4
  – Slot size can be zero: deleted slot
  – Slot table size, growing from page end: page’s #slots * 4
• Page can have up to 2k slots
  – E.g. large index pages can have this many
  – Certain pages have much lower limits, for various reasons
    • DATA, REMAINDER, PBLOB: max. 255 slots; reason: ROWIDs (we’ll see later)
    • Reserved pages only few (tens); reason: slot vs. page sizes
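The slot arithmetic on this slide lends itself to a quick estimate. The sketch below combines the figures given so far (24-byte header, 4-byte stamp, slot size + 4 per slot, at most 255 slots on a DATA page) to guess how many fixed-length rows fit on one data page. It assumes rows that fit on a single page and no other per-page overhead, so treat the result as a rough upper bound rather than what the server will actually do; the function names are illustrative only.

```python
PAGE_HEADER_BYTES = 24   # page header, per the slides
PAGE_STAMP_BYTES = 4     # trailing stamp, per the slides
SLOT_ENTRY_BYTES = 4     # one 2 x 2 byte slot table entry per slot
MAX_DATA_SLOTS = 255     # DATA/REMAINDER/PBLOB slot limit, per the slides


def slot_cost(slot_size):
    """Bytes a slot consumes on a page: its data plus its slot table entry."""
    return slot_size + SLOT_ENTRY_BYTES


def max_rows_per_data_page(row_size, page_size=2048):
    """Rough upper bound of fixed-length rows per DATA page.

    Assumption: header, stamp and slot table are the only overhead.
    """
    usable = page_size - PAGE_HEADER_BYTES - PAGE_STAMP_BYTES
    return min(usable // slot_cost(row_size), MAX_DATA_SLOTS)
```

For example, `max_rows_per_data_page(100)` gives 19 on a 2 kB page, while very short rows run into the 255 slot limit long before the page is full.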
Reserved Pages
• Try this:
  – oncheck -pr > first12.txt
• Compare to what we’ve dumped earlier
  – Formatting those 12 “reserved pages”
• We’re seeing:
  – Page Zero: version information primarily
  – Onconfig params and values (not all)
  – Physical/Logical log definitions, and last Checkpoint details
  – Dbspace definitions
  – Chunk definitions
  – Archive details and Data Replication status
  – Yet not all of them are displayed
    • Some are paired, for recoverability reasons
    • Only the more recent of a pair is taken
  – In a larger instance many more are displayed …
    • But not mentioned individually, as extra (extended) reserved pages
    • The initial 12 can only hold a very limited amount of detail
Reserved Pages Extension
• Logical logs, dbspaces and chunks can be many
• To accommodate their definitions, reserved pages can be extended
• Extensions for each sort always in contiguous blocks
  – Within “rootdbs” chunks
• Root reserved page pointing to its extension
  – pg_next: start page
  – pg_prev: extension size
[Figure: the 12 Root Reserved Pages (Zero, Config, Ckpt1, Ckpt2, Dbsp1, Dbsp2, PChunk1, PChunk2, MChunk1, MChunk2, Arch1, Arch2) pointing to Extended Reserved Pages (More logical logs…, More space specs…, More pchunk specs…, More mchunk specs…)]
Extents
• Contiguous sets of pages allocated to a certain purpose
  – E.g. to a partition, or forming a log file
• Within one chunk
• Arbitrary size: 1 page up to (almost) chunk size
• oncheck -pe: listing all extents of a dbspace (or whole instance)
• See also the sysextents SMI table
Sorts of Extents
• Possible extents:
  – Reserved pages: root and extensions
  – Chunk free list pages: single page extents
  – Physical log: 1 large extent
  – Logical logs: 1 extent each
  – Partition extents: data/index partitions consist of 0 to many extents
  – Unused areas of a chunk: FREE extents
• So what needs to be read to compile a complete extent list?
  – Reserved pages (for log files)
  – Chunk free lists
  – Partition pages
Partitions
• Partitions form the containers for database objects recorded, by their Partnum or Fragid, in database catalogs
  – Tables (and their fragments)
  – Indices (and their fragments)
  – Sequences, relying on a partition’s ability to generate serial values
  – Even external tables possess a (dummy) partition, for having a partnum
  – Sbspace metadata
• Thinking of a partition as a ‘file’ (containing the partition data)
  – Partition (header) page would be the ‘inode’
  – Partition extents would be ‘blocks’
  – Dbspace would be the ‘file system’
Partitions (cont.)
• A partition (“tablespace”) consists of
  – Its partition header page
    • Holding the details that describe the partition
    • Potentially extending to secondary partition pages
  – A collection of allocated extents
• Partitions reside in a (db-/sb-)space, one abstraction level above chunks
  – Their extents reside in the space’s chunks
• All partitions of a space are recorded, by their partition header pages, in the space’s Partition Partition
  – Aka “TBLSpace TBLSpace”
  – The space’s master partition, the very first one
  – Holding the space’s partition pages
What’s a Partnum?
• Visualizing a dbspace first:

Dbspace:
DbsNo  rp  off  flags  1.chk  #chks  flags  (b)pg_sz  name
4      0   354  60001  4      3      N--BA  1         datadbs

Primary chunks:
chkno  rp  off  dbsno  nxchk  offset  fpage  #bpages  #freepgs  ovhd   flags  pg_sz  path
4      0   39c  4      5      0       -      1000     0         30040  PO-B   2048   /data/IBM/informix/demo/demo_on/datadbs_1
5      0   4c8  4      6      0       -      2500     2         30040  PO-B   2048   /data/IBM/informix/demo/demo_on/datadbs_2
6      0   5f4  4      0      0       -      4000     270       10040  PO-B   2048   /data/IBM/informix/demo/demo_on/datadbs_3

[Figure: dbspace datadbs laid out across its three chunks (…/datadbs_1, …/datadbs_2, …/datadbs_3), each starting with reserved pages, with the partitions below and FREE space + free list]

Partition          Partnum
Tblspace tblspace  0x00400001
Table_1            0x0040005b
Table_2            0x004000c2
Table_3            0x00400062
Table_4            0x00400005
So … What’s a partnum?
• A partnum is a 4-byte integer number
  – Uniquely identifying a partition
  – Falling into 1.5 bytes “dbspace number”
  – And 2.5 bytes “logical page number”
  – Hex representation: 0xdddlllll
• What does this mean?
  – Each dbspace can hold partitions (TBLSpaces)
  – It always holds a master partition (TBLSpace TBLSpace)
  – All other partitions are recorded in this master partition
  – The master partition only contains partition header and secondary pages
  – Each partition header page describes one partition
  – The ‘lllll’ fraction of a partition’s partnum is the number (position) of its partition header page within the dbspace’s (‘ddd’) TBLSpace TBLSpace
• What special partnum then is 0xN00001?
  – TBLSpace TBLSpace’s own partnum for dbspace ‘N’
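The 0xdddlllll layout translates directly into a shift and a mask. The following sketch splits a partnum into its dbspace number (upper 12 bits) and logical page number (lower 20 bits) and builds one back; the helper names are invented for this example and are not part of any Informix tool.

```python
DBSPACE_SHIFT = 20           # 'lllll' = 5 hex digits = 20 bits
LOGICAL_PAGE_MASK = 0xFFFFF  # lower 2.5 bytes
DBSPACE_MAX = 0xFFF          # upper 1.5 bytes


def split_partnum(partnum):
    """Return (dbspace_number, logical_page) for a 4-byte partnum."""
    if not 0 <= partnum <= 0xFFFFFFFF:
        raise ValueError("partnum must fit into 4 bytes")
    return partnum >> DBSPACE_SHIFT, partnum & LOGICAL_PAGE_MASK


def make_partnum(dbspace_number, logical_page):
    """Combine a dbspace number and a logical page number into a partnum."""
    if not 0 <= dbspace_number <= DBSPACE_MAX:
        raise ValueError("dbspace number must fit into 12 bits")
    if not 0 <= logical_page <= LOGICAL_PAGE_MASK:
        raise ValueError("logical page must fit into 20 bits")
    return (dbspace_number << DBSPACE_SHIFT) | logical_page


def tblspace_tblspace_partnum(dbspace_number):
    """Partnum of a dbspace's own master partition (0xN00001)."""
    return make_partnum(dbspace_number, 1)
```

With the partnums from the previous slide, `split_partnum(0x0040005b)` yields dbspace 4 and logical page 0x5b, and `tblspace_tblspace_partnum(4)` yields 0x00400001.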
Looking at a Partition Page
• oncheck -pt|T db:owner.table[,dbs] | partnum
• Finds the desired partition header page(s)
• Tells you the following recorded in those pages
  – General partition info – slot 1
    • Partnum, date, flags, rowsize, …
  – Extents allocated to this partition – slot 5
  – Possibly a pointer to the partition’s current compression dictionary – slot 7
  – Partition name printed is NOT taken from the partition page; it is determined from the catalogs instead
• Specifying a partnum will target only this one partition page
  – Will attempt to resolve the partition name by querying systables
• Otherwise all partitions of the specified table are targeted
  – Single data partition, or multiple in case of a fragmented table
  – Index partitions: each index normally has its own partition (detached)
• -pT: will scan an entire (set of) partition(s) to gather page statistics
  – Index/Data/Bitmap page types and usage
  – Index usage reports
  – In-place alter versions
• Only working with the server running
Partition Page ‘raw’
• oncheck -pp 0x<N>00001 <L>
  – What’s the difference?
  – Not formatted as a partition page, but “complete” instead ;-)
• Try and compare the following:
  – oncheck -pt 0x100001
  – oncheck -pp 0x100001 1
  – To what extent are these the same?
  – To what extent are they different?
Find a specific Data Row now
• Given a specific row in a fragmented table
  – dbname:[owner.]tabname[,fragdbs|%partition]:rowid
  – Or a partnum:rowid combination, e.g. from a log record
  – What would it take to get to that row manually?
• First let’s learn what’s to be done under the hood
• Let’s assume the partnum is known already
  – Can be obtained from systables (systables.partnum) or sysfragments
  – Let’s say: partnum 0x400079, rowid 0x00000a01
So what’s a Rowid?
• A rowid describes the precise location of a row within a partition/fragment:
  – 0xppppppss: 4-byte integer
  – High 3 bytes: logical page number within partition
  – Low byte: slot number within page
• Not to be confused with the “WITH ROWID” shadow column (frag’d table)
  – A real number assigned to a row
[Figure: a partition made up of a bitmap page and five extents (1st extent starting at page 0, 2nd at page 4, 3rd at page 8, …); one page shown with its page header, slots 1 … n and slot table; Rowid 0xa01 pointing into it]
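Decoding the 0xppppppss layout is a one-liner each way. This small sketch (the function names are invented for illustration) splits a rowid into the logical page number and the slot number described above.

```python
SLOT_BITS = 8      # low byte holds the slot number
SLOT_MASK = 0xFF


def split_rowid(rowid):
    """Return (logical_page, slot) for a 4-byte rowid 0xppppppss."""
    if not 0 <= rowid <= 0xFFFFFFFF:
        raise ValueError("rowid must fit into 4 bytes")
    return rowid >> SLOT_BITS, rowid & SLOT_MASK


def make_rowid(logical_page, slot):
    """Combine a logical page number and a slot number into a rowid."""
    if not 0 <= logical_page <= 0xFFFFFF:
        raise ValueError("logical page must fit into 3 bytes")
    if not 0 <= slot <= SLOT_MASK:
        raise ValueError("slot must fit into 1 byte")
    return (logical_page << SLOT_BITS) | slot
```

The example rowid 0x00000a01 therefore means logical page 10 (0xa), slot 1. The single slot byte is also why DATA pages are limited to 255 slots.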
Paths to Our Row’s Page (1)
So we need extent info for our partition (identified by partnum)
  – Want to physically locate the page containing our row
  – Either walk all the way on foot, via the partition pages
  – Or pick from a formatted extent list
• Crawling: find the partition page for the partnum and use its extent list for translation
  – Dump the Tblspace Tblspace partition page: 4th page in the space’s first chunk (this is fixed)
  – Slot 5 has the extent list (we’re on Linux, sorry for the wrong endianness)
  – Take the partnum’s “logical page” portion
  – Convert to a physical address using the raw extent list found
  – Determine the location of the target partition page and dump it as well
  – Use that page’s raw extent list for translating your rowid into a physical page
Paths to Our Row’s Page (2)
• Walking: using a formatted extent list
  – Obtain an extent list (oncheck -pe)
  – Determine the table name (from the system catalog)
  – Find the extent matching your table (can be confusing if the table is fragmented)
  – OR: use the extent list in ‘oncheck -pt <partnum>’ output
  – Calculate the precise physical location (extent start plus logical page difference)
• Driving:
  – oncheck -pp <partnum> <logical_page>
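The “extent start plus logical page difference” calculation can be sketched as follows. The extent list here is hypothetical sample data standing in for what you would read off ‘oncheck -pt’ or ‘oncheck -pe’ output; the sketch assumes the extents are given in logical order, as (chunk number, physical start page, size in pages), with all numbers in the same page unit.

```python
def logical_to_physical(extents, logical_page):
    """Translate a partition's logical page number to (chunk, physical page).

    `extents` is a list of (chunk_no, start_page, size_in_pages) tuples
    in the partition's logical order.
    """
    if logical_page < 0:
        raise ValueError("logical page must not be negative")
    first_logical = 0  # logical page number of the current extent's first page
    for chunk_no, start_page, size in extents:
        if logical_page < first_logical + size:
            return chunk_no, start_page + (logical_page - first_logical)
        first_logical += size
    raise ValueError("logical page lies beyond the partition's extents")


# Hypothetical extent list: three extents, the last one in another chunk.
sample_extents = [(4, 100, 4), (4, 300, 4), (5, 50, 8)]
```

With this made-up list, logical page 10 (from rowid 0xa01) lands in the third extent, at page 52 of chunk 5, which is the pair you would then hand to ‘oncheck -pP’.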
The Row Finally
• oncheck will dump the page’s slots in raw hex format
  – Pick the one your rowid is pointing to
• What’s easy to determine
  – Does the row exist? No, if the slot is missing or zero length.
  – Does the slot length fit the partition’s row length?
    • Might be shorter in case of variable length data types.
• If you need to know what’s in this row
  – E.g. the page can’t be read any more (inconsistent)
  – No way around applying the table’s schema byte by byte
  – Way beyond this 1 hour talk ;-)
Indirect / Incomplete Rows
• Row not fitting your schema?
  – Too short somehow?
• Strange looking slot length, way too large?
• High bit set in a DATA page slot length means
  – First 4 bytes in the slot are no DATA
  – Instead they’re a forward pointer
  – In the form of another 4-byte rowid (0xppppppss)
• An indirect row or an initial piece of a row, obviously
  – Need to look up its next/remainder piece
  – Located on so-called REMAINDER pages
  – Row can consist of multiple such pieces (32k max row length)
• What fun looking at such rows in their entirety!
Watch out for IPA!
• Row still not fitting our schema??
• DATA page header having a strange value in its ‘page next’ field??
• Then we’re on an old version page!
  – What’s that again?
  – And can this be combined even with row indirection (multi-piece rows)? Sure it can!
• No row on such a page fits the table’s current schema
  – Instead they’re in the shape of a previous schema this table had
  – Before potentially a whole series of ALTER TABLE statements
  – These ALTERs have been performed in in-place fashion: no real changes yet
• Some real dirty work starting here, again at our partition page
  – There we learn about a series of secondary partition pages
  – Keeping a memory of all outstanding in-place ALTERs
  – Partition page’s pg_next field has the TBLSpace TBLSpace logical page# of the first such ALTER page
Compression
• Neither row indirection nor IPA can explain what my row looks like?
  – Moreover it does look like real garbage!
  – And that slot length is an oddity, way too big
• Is this partition compressed?
  – Consult ‘oncheck -pt’ output, it would tell
• Is this row compressed?
  – The slot length field would have its second highest bit set
• Again the next step would be our partition page
  – Slot 7 has the pointer to the current compression dictionary
  – Also oncheck -pt should show this information
• Then uncompress the row using the compression dictionary
  – Not here, not now …
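The two “oddly large slot length” cases (forward pointer and compression) can be told apart with a couple of bit tests. The sketch below assumes the slot length is the 16-bit half of the 2 × 2 bytes slot table entry, so the high bit is 0x8000 and the second highest bit is 0x4000; treating the remaining 14 bits as the plain length is a further assumption of this example, not something the slides state.

```python
FORWARD_POINTER_BIT = 0x8000  # high bit: slot starts with a 4-byte forward rowid
COMPRESSED_BIT = 0x4000       # second highest bit: row is compressed
LENGTH_MASK = 0x3FFF          # assumption: the remaining 14 bits are the length


def describe_slot_length(raw_length):
    """Interpret a raw 16-bit slot length value from a DATA page slot table."""
    if not 0 <= raw_length <= 0xFFFF:
        raise ValueError("slot length must fit into 2 bytes")
    return {
        "has_forward_pointer": bool(raw_length & FORWARD_POINTER_BIT),
        "compressed": bool(raw_length & COMPRESSED_BIT),
        "length": raw_length & LENGTH_MASK,
    }
```

A raw value of 0x8064, for instance, would read as a 100-byte slot whose first 4 bytes are a forward pointer, while 0x0064 is an ordinary 100-byte slot.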