Building a Hierarchical Data Model Using the Latest IBM Informix Features

Ajaykumar Gupte
gupte@us.ibm.com

Agenda
● Problem of querying hierarchical data
● Hierarchical data design
● CONNECT BY keywords and pseudo columns
● Execution model
● Query transformation

Problem of querying hierarchical data
• A common technique for storing hierarchical data in relational tables is a self-referencing table
– Employee-Manager
• Employee table (key: empid)
• Every employee has a manager (indicated by mgrid)
• A manager is also an employee (with a valid empid)
– Shipment
• Inbound shipment table (key: item_id)
• Each item can belong to a package (indicated by package_id)
• Every package is itself an item (with a valid item_id)
CREATE TABLE employee (
    empid   INTEGER NOT NULL PRIMARY KEY,
    name    VARCHAR(10),
    salary  DECIMAL(9, 2),
    mgrid   INTEGER);

CREATE TABLE inbound_shipment (
    shipment_id VARCHAR(50),
    item_id     VARCHAR(20),
    package_id  VARCHAR(20),
    ...

[Figure: shipment hierarchy: shipment ship_CX2555 holds pallets such as Pallet_BX505; pallets hold boxes (box_C1255, box_C3524, box_C4000); boxes hold items such as band_aid, A1_pharma, vicks and Tylenol products]

Characteristics/Limitations
■ Multi-step approach requiring complex application or SPL logic
■ Recursive self-joins
■ Filtering, ordering and grouping require additional logic
■ Joining the results with other tables becomes complex
■ Hard to reuse across applications
– requires understanding of the complex logic (data placement, etc.)
– requires more customization

Using CONNECT BY to discover data hierarchy
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'pallet_BX505'      -- seed of recursion
CONNECT BY PRIOR item_id = package_id    -- condition of recursion

Results of CONNECT BY Query
package_level item_id package_id
1 pallet_BX505 ship_CX2555
2 box_C1255 pallet_BX505
3 band_aid_H10 box_C1255
3 A1_pharma_F23 box_C1255
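To make the roles of START WITH, PRIOR and LEVEL concrete, here is a minimal Python sketch that emulates the query above over an in-memory list. The sample rows and the helper `connect_by` are hypothetical (modelled on the item names in the slides), and this models the query's semantics only, not how Informix executes it. It also has no cycle protection.

```python
# Hypothetical sample rows (item_id, package_id) modelled on the slide's data.
ROWS = [
    ("ship_CX2555", None),
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
    ("A1_pharma_F23", "box_C1255"),
    ("vicks_CK215", "box_C3524"),
]


def connect_by(rows, start_item):
    """Emulate START WITH item_id = start_item
    CONNECT BY PRIOR item_id = package_id, returning
    (level, item_id, package_id) tuples in depth-first order."""
    result = []

    def visit(row, level):
        item_id, package_id = row
        result.append((level, item_id, package_id))
        # PRIOR item_id = package_id: the children of this row are the rows
        # whose package_id equals this row's item_id.
        for child in rows:
            if child[1] == item_id:
                visit(child, level + 1)

    # START WITH: each row satisfying the seed condition is a root at level 1.
    for row in rows:
        if row[0] == start_item:
            visit(row, 1)
    return result


for level, item_id, package_id in connect_by(ROWS, "pallet_BX505"):
    print(level, item_id, package_id)
```

With this sample data the first four tuples match the rows on the slide. Because `visit` recurses blindly, a loop in the data would recurse without end, which is the situation the NOCYCLE keyword addresses.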

Hierarchical view of data
[Figure: employee tree of empids 1 to 17, rooted at empid 17; the query below starts at Goyal (empid 16)]
SELECT name, empid, mgrid
FROM emp
START WITH name = 'Goyal'
CONNECT BY PRIOR empid = mgrid

name      empid  mgrid
Goyal     16     17
Zander    11     16
McKeough  5      11
Barnes    6      11
Henry     12     16
O'Neil    7      12
Smith     8      12
Shoeman   9      12
Scott     14     16

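Flow of Execution
[Figure: stack-based traversal of the employee tree: the START WITH row (16) is pushed onto a stack; each POP emits a row and JOINs it with emp to find its direct reports (11, 12, 14), which are PUSHed in turn, down to the leaf employees (5, 6 and 7, 8, 9)]
SELECT name, empid, mgrid
FROM emp
START WITH name = 'Goyal'
CONNECT BY PRIOR empid = mgrid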
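The push/pop/join loop in the figure can be approximated with an explicit stack: pop a row, emit it, join it against the table to find its children, and push them. This is a hedged Python model using the nine emp rows from the previous slide; the helper name `connect_by_stack` is hypothetical, and pushing children in reverse (so they pop in table order) is a choice made for this sketch, not documented Informix behaviour.

```python
# The emp rows (name, empid, mgrid) shown on the previous slide.
EMP = [
    ("Goyal", 16, 17),
    ("Zander", 11, 16),
    ("McKeough", 5, 11),
    ("Barnes", 6, 11),
    ("Henry", 12, 16),
    ("O'Neil", 7, 12),
    ("Smith", 8, 12),
    ("Shoeman", 9, 12),
    ("Scott", 14, 16),
]


def connect_by_stack(rows, start_name):
    """Emulate START WITH name = start_name CONNECT BY PRIOR empid = mgrid
    with an explicit stack instead of recursion."""
    # Seed: push every row that satisfies START WITH (reversed, so the first
    # matching row is popped first).
    stack = [row for row in rows if row[0] == start_name]
    stack.reverse()
    output = []
    while stack:
        row = stack.pop()  # POP the next row and emit it
        output.append(row)
        # JOIN: rows whose mgrid equals the PRIOR (popped) row's empid.
        children = [r for r in rows if r[2] == row[1]]
        # PUSH the children in reverse so they are popped in table order.
        stack.extend(reversed(children))
    return output


for name, empid, mgrid in connect_by_stack(EMP, "Goyal"):
    print(name, empid, mgrid)
```

With this push order the sketch emits the rows in the same order as the result table on the previous slide.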

Where is hierarchical data?
• Bill of materials
• Reporting structure
• Package tracking
• Inventory management
• Social media
• Date/time
• Geography / region

PRIOR
■ The unary operator PRIOR is used in the CONNECT BY join filter to distinguish references to columns of the row produced by the previous (prior) recursive step from references to columns of the base table.
■ A query without PRIOR can run forever or return a single row.
SELECT level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
...
3 vicks_CK215 box_C3524
...
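One way to see why PRIOR matters is to treat the CONNECT BY condition as a predicate over two rows: the PRIOR (parent) row and the candidate child row. The hypothetical Python sketch below (sample rows and the helper `children_of` are made up for illustration) compares a condition that reads the parent via PRIOR with one that only reads the candidate row.

```python
# Hypothetical sample rows (item_id, package_id).
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
]


def children_of(rows, prior_row, condition):
    """Rows accepted as children of prior_row under a CONNECT BY condition.
    condition(prior, row) gets the PRIOR (parent) row and the candidate row."""
    return [row for row in rows if condition(prior_row, row)]


root = ROWS[0]  # the START WITH row

# CONNECT BY PRIOR item_id = package_id: the left side reads the parent row.
with_prior = children_of(ROWS, root, lambda prior, row: prior[0] == row[1])

# CONNECT BY item_id = package_id: both sides read the candidate row, so the
# parent is ignored. No row has item_id == package_id, so nothing qualifies.
without_prior = children_of(ROWS, root, lambda prior, row: row[0] == row[1])

print(with_prior)
print(without_prior)
```

Without PRIOR nothing qualifies here, so only the START WITH row would come back. In this model, if a PRIOR-less condition were true for some row, that row would be a child of every row, including itself, and the expansion would never end.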

LEVEL
■ Pseudo column that tracks the level of a node in the hierarchy, starting with level 1 for the root node.
■ Can be used as a filter to limit the depth of the hierarchy (in the CONNECT BY clause or, as here, in WHERE)
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
5 row(s) retrieved.
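A depth limit on LEVEL can be modelled by refusing to descend once the limit is reached. This is a hypothetical Python sketch (sample rows and `connect_by_depth` are made up) that prunes the traversal at `max_level`; with four boxes under the pallet it returns five rows for `level < 3`, like the slide.

```python
# Hypothetical sample rows (item_id, package_id).
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
    ("box_C4520", "pallet_BX505"),
    ("box_C4000", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
    ("vicks_CK215", "box_C3524"),
]


def connect_by_depth(rows, start_item, max_level):
    """Depth-first traversal that keeps only rows with level <= max_level."""
    out = []

    def visit(row, level):
        out.append((level, row[0], row[1]))
        if level >= max_level:
            return  # children would exceed the limit, so stop descending
        for child in rows:
            if child[1] == row[0]:  # PRIOR item_id = package_id
                visit(child, level + 1)

    for row in rows:
        if row[0] == start_item:
            visit(row, 1)
    return out


result = connect_by_depth(ROWS, "pallet_BX505", max_level=2)  # level < 3
for row in result:
    print(row)
```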

NOCYCLE
■ By default, hierarchical queries return an error when they detect a cycle in the data.
■ NOCYCLE allows the query to return all rows by ignoring the row that causes the cycle.
insert into inbound_shipment(item_id, package_id) values ("ship_CX2555", "pallet_BX505");
(ship_CX2555 is now packed inside pallet_BX505, which is itself packed inside ship_CX2555, so the data contains a loop.)
Running the CONNECT BY PRIOR query against this data fails:
26079: CONNECT BY query resulted in a loop/cycle.
Error in line 9
Near character position 37

NOCYCLE Example
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY NOCYCLE PRIOR item_id = package_id

package_level item_id package_id
...
2 ship_CX2555 pallet_BX505
...
6 row(s) retrieved.
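Cycle handling can be modelled by carrying the set of ancestors on the current path: a child whose item_id is already on the path would close a loop. The hypothetical Python sketch below (not Informix's implementation; `CycleError` and `connect_by` are made-up names) raises by default, standing in for error 26079, and skips the looping row when `nocycle=True`. It uses a smaller sample than the slide, including the inserted row.

```python
class CycleError(Exception):
    """Stands in for 26079: CONNECT BY query resulted in a loop/cycle."""


# Hypothetical sample rows (item_id, package_id); the last one is the row
# inserted on the slide, which closes the loop.
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("ship_CX2555", "pallet_BX505"),
]


def connect_by(rows, start_item, nocycle=False):
    out = []

    def visit(row, level, path):
        out.append((level, row[0], row[1]))
        for child in rows:
            if child[1] != row[0]:  # PRIOR item_id = package_id
                continue
            if child[0] in path:  # the child is already an ancestor
                if nocycle:
                    continue  # NOCYCLE: ignore the row that causes the cycle
                raise CycleError(f"loop at {child[0]}")
            visit(child, level + 1, path | {child[0]})

    for row in rows:
        if row[0] == start_item:
            visit(row, 1, {row[0]})
    return out


for row in connect_by(ROWS, "pallet_BX505", nocycle=True):
    print(row)
```

With `nocycle=True` the row `(2, "ship_CX2555", "pallet_BX505")` is returned, matching the slide, and the traversal stops instead of re-entering pallet_BX505.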

CONNECT_BY_ISCYCLE
■ Pseudo column that identifies the rows that would result in a cycle (1 if the row has a child that is also its ancestor, otherwise 0)
SELECT level as package_level, item_id, package_id, connect_by_iscycle
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY NOCYCLE PRIOR item_id = package_id

package_level  item_id       package_id    connect_by_iscycle
1              pallet_BX505  ship_CX2555   0
2              ship_CX2555   pallet_BX505  1
2              box_C1255     pallet_BX505  0
...
6 row(s) retrieved.
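The flag can be modelled by checking, before descending, whether any child of the current row is already on the path from the root. In this hypothetical Python sketch (`connect_by_iscycle` and the sample rows are made up), looping children are skipped as with NOCYCLE, and the row that would have led back into the loop gets the flag 1.

```python
# Hypothetical sample rows (item_id, package_id), including the looping row.
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("ship_CX2555", "pallet_BX505"),
    ("box_C1255", "pallet_BX505"),
]


def connect_by_iscycle(rows, start_item):
    out = []

    def visit(row, level, path):
        children = [c for c in rows if c[1] == row[0]]  # PRIOR item_id = package_id
        # CONNECT_BY_ISCYCLE = 1 when some child is already an ancestor.
        is_cycle = int(any(c[0] in path for c in children))
        out.append((level, row[0], row[1], is_cycle))
        for child in children:
            if child[0] not in path:  # NOCYCLE: skip children that loop back
                visit(child, level + 1, path | {child[0]})

    for row in rows:
        if row[0] == start_item:
            visit(row, 1, {row[0]})
    return out


for row in connect_by_iscycle(ROWS, "pallet_BX505"):
    print(row)
```

On this sample the output reproduces the three rows shown on the slide.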

CONNECT_BY_ISLEAF Example
■ Pseudo column that returns 1 if the row is a leaf (has no children in the hierarchy), otherwise 0
SELECT level as package_level, item_id, package_id, connect_by_isleaf
FROM inbound_shipment
WHERE connect_by_isleaf = 1
START WITH item_id = 'pallet_BX505'
CONNECT BY NOCYCLE PRIOR item_id = package_id

package_level  item_id        package_id  connect_by_isleaf
3              band_aid_H10   box_C1255   1
3              band_aid_H12   box_C1255   1
3              A1_pharma_F23  box_C1255   1
3              A1_pharma_F33  box_C1255   1
3              vicks_CK215    box_C3524   1
3              A1_pharma_T30  box_C3524   1
3              A1_pharma_415  box_C4520   1
3              tylenol_BA341  box_C4000   1
...
20 row(s) retrieved.
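A leaf is simply a row for which the CONNECT BY condition finds no children. The hypothetical Python sketch below (`connect_by_isleaf` and the sample rows are made up) computes the flag per row; in this sketch the `connect_by_isleaf = 1` filter is applied after the hierarchy is built.

```python
# Hypothetical sample rows (item_id, package_id).
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
    ("A1_pharma_F23", "box_C1255"),
    ("vicks_CK215", "box_C3524"),
]


def connect_by_isleaf(rows, start_item):
    out = []

    def visit(row, level):
        children = [c for c in rows if c[1] == row[0]]  # PRIOR item_id = package_id
        # CONNECT_BY_ISLEAF = 1 when the row has no children in the hierarchy.
        out.append((level, row[0], row[1], int(not children)))
        for child in children:
            visit(child, level + 1)

    for row in rows:
        if row[0] == start_item:
            visit(row, 1)
    return out


# Keep only leaf rows, like WHERE connect_by_isleaf = 1.
leaves = [r for r in connect_by_isleaf(ROWS, "pallet_BX505") if r[3] == 1]
for leaf in leaves:
    print(leaf)
```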

SYS_CONNECT_BY_PATH
■ Expression that builds a string representing the path from the root row to the current row; string-expression2 is the separator between path elements.
■ >>--SYS_CONNECT_BY_PATH--(--string-expression1--,--string-expression2--)--><
SELECT sys_connect_by_path(item_id, "") as path, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id

path        pallet_BX505
item_id     pallet_BX505
package_id  ship_CX2555

path        pallet_BX505box_C1255
item_id     box_C1255
package_id  pallet_BX505

... (similar records for item_id box_C3524, box_C4520 and box_C4000)

5 row(s) retrieved.
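The path can be modelled by passing the parent's path string down the traversal and appending the current value. In this hypothetical Python sketch (`sys_connect_by_path` here is a made-up helper, not the SQL function), each element is prefixed with the separator, an assumption borrowed from Oracle's behaviour. With the slide's empty separator the values are simply concatenated, as in the output above.

```python
# Hypothetical sample rows (item_id, package_id).
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
]


def sys_connect_by_path(rows, start_item, sep, max_level):
    """Return (path, item_id, package_id) for rows with level <= max_level.
    Assumption: each path element is prefixed with sep."""
    out = []

    def visit(row, level, parent_path):
        path = parent_path + sep + row[0]
        out.append((path, row[0], row[1]))
        if level < max_level:
            for child in rows:
                if child[1] == row[0]:  # PRIOR item_id = package_id
                    visit(child, level + 1, path)

    for row in rows:
        if row[0] == start_item:
            visit(row, 1, "")
    return out


for path, item_id, _ in sys_connect_by_path(ROWS, "pallet_BX505", "", 2):
    print(path, item_id)
```

Passing `"/"` instead of `""` gives paths such as `/pallet_BX505/box_C1255` in this model, which are easier to read.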

CONNECT_BY_ROOT
■ Unary operator that, for every row in the hierarchy, returns the value of the expression for the row's root ancestor
■ >>--CONNECT_BY_ROOT--expression----------------------------------><
SELECT connect_by_root item_id as root, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id

root          item_id       package_id
pallet_BX505  pallet_BX505  ship_CX2555
pallet_BX505  box_C1255     pallet_BX505
...
5 row(s) retrieved.
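CONNECT_BY_ROOT becomes more interesting when START WITH matches several rows, because each tree then has its own root. This hypothetical Python sketch (`connect_by_root` and the sample rows are made up) takes the START WITH condition as a predicate and passes the root's item_id down to every descendant.

```python
# Hypothetical sample rows (item_id, package_id).
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
    ("vicks_CK215", "box_C3524"),
]


def connect_by_root(rows, start, max_level):
    """Return (root, item_id, package_id); start(row) plays START WITH."""
    out = []

    def visit(row, level, root):
        out.append((root, row[0], row[1]))
        if level < max_level:
            for child in rows:
                if child[1] == row[0]:  # PRIOR item_id = package_id
                    visit(child, level + 1, root)

    for row in rows:
        if start(row):
            visit(row, 1, row[0])  # CONNECT_BY_ROOT item_id for this tree
    return out


# One root, as on the slide.
print(connect_by_root(ROWS, lambda r: r[0] == "pallet_BX505", 2))
# START WITH package_id = 'pallet_BX505': every box roots its own tree.
print(connect_by_root(ROWS, lambda r: r[1] == "pallet_BX505", 2))
```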

SIBLINGS
■ Keyword of the ORDER BY clause (ORDER SIBLINGS BY) that orders the siblings at every level of the hierarchy
■ Same semantics as ORDER BY, but applied among sibling rows, so each row still follows its parent
SELECT level, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
ORDER SIBLINGS BY item_id

level item_id package_id
...
5 row(s) retrieved.
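The difference from a plain ORDER BY can be shown by sorting only the children of each row before descending. In this hypothetical Python sketch (`connect_by_siblings` and the sample rows are made up), Python's code-point string order stands in for the database collation, which may differ. A global sort of the same rows, for contrast, puts a level-3 item first and breaks the hierarchy.

```python
# Hypothetical sample rows (item_id, package_id), deliberately unsorted.
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C4000", "pallet_BX505"),
    ("tylenol_BA341", "box_C4000"),
    ("box_C3524", "pallet_BX505"),
    ("box_C1255", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
    ("A1_pharma_F23", "box_C1255"),
]


def connect_by_siblings(rows, start_item):
    out = []

    def visit(row, level):
        out.append((level, row[0], row[1]))
        # ORDER SIBLINGS BY item_id: sort only this row's children.
        children = sorted((c for c in rows if c[1] == row[0]), key=lambda c: c[0])
        for child in children:
            visit(child, level + 1)

    for row in rows:
        if row[0] == start_item:
            visit(row, 1)
    return out


ordered = connect_by_siblings(ROWS, "pallet_BX505")
flat = sorted(ordered, key=lambda r: r[1])  # plain ORDER BY item_id, for contrast
print([r[1] for r in ordered])
print([r[1] for r in flat])
```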

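Query rewrite & Execution model
• Query rewrite: the CONNECT BY query
SELECT level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
is rewritten into a UNION ALL of a seed query (START WITH) and a recursive self-join (CONNECT BY) over the derived table dtab, roughly:
SELECT level, item_id, package_id FROM
( SELECT level, item_id, package_id
  FROM inbound_shipment
  WHERE item_id = 'pallet_BX505'
  UNION ALL
  SELECT level, ship.item_id, ship.package_id
  FROM inbound_shipment ship, dtab
  WHERE ship.package_id = dtab.item_id
)
AS dtab;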
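The same seed-plus-self-join shape can be tried outside Informix with a standard recursive common table expression. The sketch below uses Python's built-in sqlite3 module with hypothetical sample rows; it is SQLite's WITH RECURSIVE, not Informix's internal rewrite, and it omits the extra bookkeeping columns Informix adds (for example connect_by_isleaf and a stack column), so it does no cycle detection and SQLite may return the rows in a different order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbound_shipment (item_id TEXT, package_id TEXT)")
# Hypothetical sample rows modelled on the slide's data.
conn.executemany(
    "INSERT INTO inbound_shipment VALUES (?, ?)",
    [
        ("pallet_BX505", "ship_CX2555"),
        ("box_C1255", "pallet_BX505"),
        ("box_C3524", "pallet_BX505"),
        ("band_aid_H10", "box_C1255"),
        ("vicks_CK215", "box_C3524"),
    ],
)

query = """
WITH RECURSIVE dtab(lvl, item_id, package_id) AS (
    SELECT 1, item_id, package_id                -- seed: START WITH
    FROM inbound_shipment
    WHERE item_id = 'pallet_BX505'
    UNION ALL
    SELECT dtab.lvl + 1, ship.item_id, ship.package_id
    FROM inbound_shipment ship, dtab
    WHERE ship.package_id = dtab.item_id         -- PRIOR item_id = package_id
)
SELECT lvl, item_id, package_id FROM dtab
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```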

Execution model of recursive queries in IDS
[Figure: operator tree for a recursive query: scans of the shipment table for the seed and recursive branches, a JOIN and UNION ALL, SORTs for ORDER SIBLINGS BY, CONNECT BY filters, a TEMP TABLE for cycle detection or traversal, and a top-level scan on the derived table]

sqexplain output
QUERY:
SELECT level as package_level, item_id, package_id FROM inbound_shipment
START WITH item_id = 'pallet_BX505' CONNECT BY PRIOR item_id = package_id
Connect by Query Rewrite:
select x0.level ,x0.item_id ,x0.package_id from
(select x1.item_id ,x1.package_id ,x1.item_id ,1 ,1 ,0 from
"informix".inbound_shipment x1 where (x1.item_id = 'pallet_BX505' )
union all
select x2.item_id ,x2.package_id ,x2.item_id ,(level + 1 ) ::integer
,connect_by_isleaf ,dtab_30093_173_stkcol from "informix".inbound_shipment
x2 ,"informix".dtab_30093_173 x0 where (dtab_30093_173_p_item_id =
x2.package_id ) )
X0
(item_id,package_id,dtab_30093_173_p_item_id,level,connect_by_isleaf,dtab_30093_173_stkcol)

Estimated Cost: 1
Estimated # of Rows Returned: 5
1) informix.dtab_30093_173: COLLECTION SCAN
Subquery:
---------
Estimated Cost: 13
1) informix.inbound_shipment: SEQUENTIAL SCAN
Filters: informix.inbound_shipment.item_id = 'pallet_BX505'
Union Query:
------------
1) informix.dtab_30093_173: SEQUENTIAL SCAN
2) informix.inbound_shipment: SEQUENTIAL SCAN
DYNAMIC HASH JOIN (Build Outer)
Dynamic Hash Filters: informix.dtab_30093_173.dtab_30093_173_p_item_id =
informix.inbound_shipment.package_id
Query statistics:
Table map :
----------------------------
Internal name Table name
----------------------------
t1 dtab_30093_173
type table rows_prod time
-----------------------------------
clscan t1 25 00:00.00

CONNECT BY Restriction
■ Multiple tables are not allowed in the FROM clause of a CONNECT BY query:
SELECT ship.item_id, ord.name
FROM inbound_shipment ship, orders ord
WHERE ship.item_id = ord.item_id
START WITH item_id = 'pallet_BX505'
■ Rewrite the join as a derived table:
SELECT item_id, name
FROM (SELECT ship.item_id, ord.name
      FROM inbound_shipment ship, orders ord
      WHERE ship.item_id = ord.item_id)
START WITH item_id = 'pallet_BX505'

Tree node traversal
[Figure: tree of t1 values with 10 at the root, 20 and 30 below it, and 40 and 50 at the next level]
select * from t1;
c1  c2
10  0
20  10
30  10
40  20
50  30
20  50
6 row(s) retrieved.

select level, * from t1 start with c1 = 10 connect by prior c1 = c2;
level  c1  c2
1      10  0
2      30  10
3      50  30
4      20  50
5      40  20
2      20  10
3      40  20
7 row(s) retrieved.

Paths traversed:
10--30--50--20--40
10--20--40
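The seven rows above can be reproduced with a stack-based traversal over the six t1 rows. Node 20 appears twice because two rows have c1 = 20, and the traversal walks rows, not distinct values. This Python sketch (the helper `traverse` is hypothetical) pushes children in table order and pops them last-in first-out, which happens to give the same sibling order as the slide; without ORDER SIBLINGS BY, sibling order should not be relied on.

```python
# The six rows of t1 (c1, c2) from the slide.
T1 = [(10, 0), (20, 10), (30, 10), (40, 20), (50, 30), (20, 50)]


def traverse(rows, start_c1):
    """Emulate START WITH c1 = start_c1 CONNECT BY PRIOR c1 = c2."""
    stack = [(1, r) for r in rows if r[0] == start_c1]
    out = []
    while stack:
        level, row = stack.pop()  # last pushed child is visited first
        out.append((level, row[0], row[1]))
        for child in rows:
            if child[1] == row[0]:  # PRIOR c1 = c2
                stack.append((level + 1, child))
    return out


for level, c1, c2 in traverse(T1, 10):
    print(level, c1, c2)
```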

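Child to Parent Traversal
■ Placing PRIOR on package_id walks up the hierarchy, from an item to the packages that contain it
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'tylenol_BA500'
CONNECT BY PRIOR package_id = item_id

package_level item_id package_id
1 tylenol_BA500 box_C4000
...
3 row(s) retrieved.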
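Swapping the sides of the condition turns the walk upward: the next row is the one whose item_id equals the PRIOR row's package_id. In this hypothetical Python sketch (`connect_by_up` and the sample rows are made up), the walk stops at pallet_BX505 because no row has item_id ship_CX2555, giving three rows like the slide.

```python
# Hypothetical sample rows (item_id, package_id).
ROWS = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C4000", "pallet_BX505"),
    ("box_C1255", "pallet_BX505"),
    ("tylenol_BA500", "box_C4000"),
]


def connect_by_up(rows, start_item):
    """Emulate START WITH item_id = start_item
    CONNECT BY PRIOR package_id = item_id (child to parent)."""
    out = []

    def visit(row, level):
        out.append((level, row[0], row[1]))
        for parent in rows:
            # PRIOR package_id = item_id: the next row is the package that
            # contains the row just emitted.
            if parent[0] == row[1]:
                visit(parent, level + 1)

    for row in rows:
        if row[0] == start_item:
            visit(row, 1)
    return out


for row in connect_by_up(ROWS, "tylenol_BA500"):
    print(row)
```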

SEQUENCE NUMBER GENERATOR
SELECT level FROM sysmaster:sysdual CONNECT BY level <= 10
(sysmaster:sysdual is a single-row table)
Connect by Query Rewrite:
---------------------------
select x0.level from (select 1 ,1 ,0 from sysmaster:"informix".sysdual x1 union all select (level + 1 ) ::integer ,connect_by_isleaf
,dtab_27465_191_stkcol from sysmaster:"informix".sysdual x2 ,"informix".dtab_27465_191 x0 where ((level + 1 ) <= 10. ) )
x0(level,connect_by_isleaf,dtab_27465_191_stkcol)
1) informix.dtab_27465_191: COLLECTION SCAN
Subquery:
---------
Estimated Cost: 5
1) sysmaster:informix.sysdual: SEQUENTIAL SCAN
Union Query:
------------
1) informix.dtab_27465_191: SEQUENTIAL SCAN
Filters: informix.dtab_27465_191.level + 1 <= 10
2) sysmaster:informix.sysdual: SEQUENTIAL SCAN
NESTED LOOP JOIN
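The rewrite above can be mirrored with a standard recursive CTE: a one-row seed at level 1, then a join of the single-row table with the previous level while level + 1 <= 10. This sketch uses Python's built-in sqlite3; the table name `sysdual` here is a hypothetical stand-in created for the example, since SQLite has no sysmaster database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for sysmaster:sysdual: a table with exactly one row.
conn.execute("CREATE TABLE sysdual (dummy TEXT)")
conn.execute("INSERT INTO sysdual VALUES ('X')")

rows = conn.execute(
    """
    WITH RECURSIVE dtab(lvl) AS (
        SELECT 1 FROM sysdual                 -- seed row: level 1
        UNION ALL
        SELECT dtab.lvl + 1 FROM sysdual, dtab
        WHERE dtab.lvl + 1 <= 10              -- CONNECT BY level <= 10
    )
    SELECT lvl FROM dtab
    """
).fetchall()
values = [r[0] for r in rows]
print(values)
```

Because the single-row table contributes exactly one row per step, each recursion adds one number until the filter fails.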

Performance Considerations
• Queries are recursive and involve repeated self-joins
• Use the PRIOR keyword; otherwise the query can run forever
• The temporary dbspace is used for hierarchy traversal (the stack) and cycle detection
• Configure DBSPACETEMP accordingly

Conclusion
• Simple queries for complex reporting
• Useful for data with a single tree or multiple tree structures
• Easy to map the path between two nodes (rows)

Questions?
Ajaykumar Gupte
gupte@us.ibm.com
