Building a Hierarchical Data Model Using the Latest IBM Informix Features
Ajaykumar Gupte
gupte@us.ibm.com
Agenda
● Problem of querying hierarchical data
● Hierarchical data design
● “Connect By”: keywords & pseudo columns
● Execution model
● Query transformation
Problem of querying hierarchical data
• A common technique for storing hierarchical data in relational tables is self-reference
– Employee-Manager
• Employee table (key – empid)
• Every employee has a manager (indicated by mgrid)
• A manager is also an employee (with a valid empid)
– Shipment
• Inbound shipment table (key – item_id)
• Each item can belong to a package (key – package_id)
• Every package is itself an item (with a valid item_id)
CREATE TABLE employee (
    empid INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(10),
    salary DECIMAL(9, 2),
    mgrid INTEGER);
CREATE TABLE inbound_shipment (
    shipment_id VARCHAR(50),
    item_id VARCHAR(20),
    package_id VARCHAR(20),
    .......
[Figure: containment tree for shipment ship_CX2555. The shipment holds pallets (Pallet_BX505 and others), pallets hold boxes (box_C1255, box_C3524, box_C4000), and boxes hold product units (band_aid, A1_pharma, vicks, Tylenol).]
Characteristics/Limitations
■ Multi-step approach, requiring complex application/SPL logic
■ Recursive self-join
■ Filtering, ordering, or grouping requires additional logic
■ Joining the results with other tables becomes complex
■ Reuse by other applications requires
– understanding of the complex logic (data placement etc.)
– more customization
Using CONNECT BY to discover data hierarchy
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'pallet_BX505'      -- seed of recursion
CONNECT BY PRIOR item_id = package_id    -- condition of recursion
Results of CONNECT BY Query
package_level item_id package_id
1 pallet_BX505 ship_CX2555
2 box_C1255 pallet_BX505
3 band_aid_H10 box_C1255
3 band_aid_H12 box_C1255
3 A1_pharma_F23 box_C1255
3 A1_pharma_F33 box_C1255
Hierarchical view of data
[Figure: organization tree of empid values. Root 17; below it 15 and 16; then 10, 13, 11, 12, 14; leaf level 1 through 9. Node 16 is labeled Goyal.]
SELECT name, empid, mgrid
FROM emp
START WITH name = 'Goyal'
CONNECT BY PRIOR empid = mgrid
name empid mgrid
Goyal 16 17
Zander 11 16
McKeough 5 11
Barnes 6 11
Henry 12 16
O'Neil 7 12
Smith 8 12
Shoeman 9 12
Scott 14 16
Flow of Execution
[Figure: the same tree alongside a traversal stack, with PUSH, POP and JOIN steps. Stack contents shown: 16; then 11, 12, 14; then 5, 6, 7, 8, 9.]
SELECT name, empid, mgrid
FROM emp
START WITH name = 'Goyal'
CONNECT BY PRIOR empid = mgrid
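The push/pop/join flow on this slide can be sketched in Python. This is only an illustration of a stack-driven traversal over the rows shown on the previous slide, not the server's actual implementation; the function name `connect_by` and the tuple layout are assumptions made for the example.

```python
# Rows of the emp table as listed on the result slide: (name, empid, mgrid).
EMP = [
    ("Goyal", 16, 17),
    ("Zander", 11, 16),
    ("McKeough", 5, 11),
    ("Barnes", 6, 11),
    ("Henry", 12, 16),
    ("O'Neil", 7, 12),
    ("Smith", 8, 12),
    ("Shoeman", 9, 12),
    ("Scott", 14, 16),
]


def connect_by(rows, start_name):
    """Emulate START WITH name = ... CONNECT BY PRIOR empid = mgrid."""
    # START WITH: seed the stack with the matching rows at level 1.
    stack = [(1, row) for row in reversed(rows) if row[0] == start_name]
    result = []
    while stack:
        level, row = stack.pop()  # POP the next row and emit it
        result.append((level,) + row)
        # JOIN: rows whose mgrid equals the popped row's empid.
        children = [r for r in rows if r[2] == row[1]]
        # PUSH children in reverse so the first child is popped first.
        for child in reversed(children):
            stack.append((level + 1, child))
    return result


for level, name, empid, mgrid in connect_by(EMP, "Goyal"):
    print(level, name, empid, mgrid)
```

Because each row's children are pushed before its siblings are popped, the rows come out depth first, in the same order as the result table above.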
Where is hierarchical data?
• Bill of materials
• Reporting structure
• Package tracking
• Inventory management
• Social media
• Date/time
• Geography / region
PRIOR
■ The unary operator PRIOR is used in the join filter to distinguish column references to the previous recursive step from column references to the base table.
■ A query without PRIOR can run forever or return only a single row.
package_level item_id package_id
1 pallet_BX505 ship_CX2555
2 box_C1255 pallet_BX505
3 band_aid_H10 box_C1255
3 band_aid_H12 box_C1255
3 A1_pharma_F23 box_C1255
3 A1_pharma_F33 box_C1255
2 box_C3524 pallet_BX505
3 vicks_CK215 box_C3524
3 vicks_CK315 box_C3524
3 vicks_CK324 box_C3524
SELECT level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
LEVEL
■ Pseudo column that tracks the level of a node in the hierarchy, starting with level 1 for the root node.
■ Can be used as a filter in the CONNECT BY clause to limit the depth of the hierarchy
package_level item_id package_id
1 pallet_BX505 ship_CX2555
2 box_C1255 pallet_BX505
2 box_C3524 pallet_BX505
2 box_C4520 pallet_BX505
2 box_C4000 pallet_BX505
5 row(s) retrieved.
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
NOCYCLE
■ By default, a hierarchical query returns an error when it detects a cycle in the data
■ NOCYCLE lets the query return all rows by ignoring the row that causes the cycle
insert into inbound_shipment(item_id,package_id) values ("ship_CX2555", "pallet_BX505");
package_level item_id package_id
1 pallet_BX505 ship_CX2555
26079: CONNECT BY query resulted in a loop/cycle.
Error in line 9
Near character position 37
SELECT level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
NOCYCLE Example
package_level item_id package_id
1 pallet_BX505 ship_CX2555
2 ship_CX2555 pallet_BX505
2 box_C1255 pallet_BX505
2 box_C3524 pallet_BX505
2 box_C4520 pallet_BX505
2 box_C4000 pallet_BX505
6 row(s) retrieved.
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
where level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY NOCYCLE PRIOR item_id = package_id
CONNECT_BY_ISCYCLE
■ Identifies the nodes that would result in a cycle
package_level item_id package_id connect_by_iscycle
1 pallet_BX505 ship_CX2555 0
2 ship_CX2555 pallet_BX505 1
2 box_C1255 pallet_BX505 0
2 box_C3524 pallet_BX505 0
2 box_C4520 pallet_BX505 0
2 box_C4000 pallet_BX505 0
6 row(s) retrieved.
SELECT level as package_level, item_id, package_id, connect_by_iscycle
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY NOCYCLE PRIOR item_id = package_id
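The default error, NOCYCLE, and CONNECT_BY_ISCYCLE behaviors can be illustrated with a small Python sketch. It assumes a cycle means "a child row whose item_id already appears among its ancestors"; `CycleError`, `walk`, and the reduced data set are hypothetical names and values chosen for this example, not Informix internals.

```python
class CycleError(Exception):
    """Stands in for Informix error 26079 in this sketch."""


# (item_id, package_id) rows, including the inserted row that closes the loop.
SHIPMENT = [
    ("pallet_BX505", "ship_CX2555"),
    ("ship_CX2555", "pallet_BX505"),  # makes the shipment a child of its own pallet
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
]


def walk(rows, start_item, nocycle=False):
    """Return (level, item_id, package_id, iscycle) tuples, depth first."""
    out = []

    def visit(row, level, ancestors):
        item_id, package_id = row
        path = ancestors | {item_id}
        children, iscycle = [], 0
        for child in rows:
            if child[1] != item_id:  # CONNECT BY PRIOR item_id = package_id
                continue
            if child[0] in path:  # the child is already an ancestor
                if not nocycle:
                    raise CycleError("CONNECT BY query resulted in a loop/cycle")
                iscycle = 1  # flag this row and drop the offending child
            else:
                children.append(child)
        out.append((level, item_id, package_id, iscycle))
        for child in children:
            visit(child, level + 1, path)

    for row in rows:
        if row[0] == start_item:
            visit(row, 1, frozenset())
    return out


for row in walk(SHIPMENT, "pallet_BX505", nocycle=True):
    print(row)
```

With `nocycle=True` the row for ship_CX2555 carries the flag 1, as in the result table on this slide; with the default `nocycle=False` the same call raises `CycleError`.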
CONNECT_BY_ISLEAF Example
package_level item_id package_id connect_by_isleaf
3 band_aid_H10 box_C1255 1
3 band_aid_H12 box_C1255 1
3 A1_pharma_F23 box_C1255 1
3 A1_pharma_F33 box_C1255 1
3 vicks_CK215 box_C3524 1
3 vicks_CK315 box_C3524 1
3 vicks_CK324 box_C3524 1
3 A1_pharma_T30 box_C3524 1
3 A1_pharma_T20 box_C3524 1
3 A1_pharma_T10 box_C3524 1
3 A1_pharma_415 box_C4520 1
3 A1_pharma_413 box_C4520 1
3 A1_pharma_329 box_C4520 1
3 A1_pharma_343 box_C4520 1
3 tylenol_BA341 box_C4000 1
3 tylenol_BA455 box_C4000 1
3 tylenol_BA570 box_C4000 1
3 tylenol_BA521 box_C4000 1
3 tylenol_BA520 box_C4000 1
3 tylenol_BA500 box_C4000 1
20 row(s) retrieved.
SELECT level as package_level, item_id, package_id, connect_by_isleaf
FROM inbound_shipment
WHERE connect_by_isleaf = 1
START WITH item_id = 'pallet_BX505'
CONNECT BY NOCYCLE PRIOR item_id = package_id
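As a rough illustration of the leaf flag, the Python sketch below marks a row as a leaf when the traversal finds no child for it. The helper name `with_isleaf` and the shortened data set are assumptions made for the example.

```python
# (item_id, package_id) rows: a reduced version of the slide's data.
SHIPMENT = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
    ("band_aid_H12", "box_C1255"),
    ("box_C4000", "pallet_BX505"),
    ("tylenol_BA500", "box_C4000"),
]


def with_isleaf(rows, start_item):
    """Return (level, item_id, package_id, isleaf) tuples, depth first."""
    out = []

    def visit(row, level):
        # Children are the rows whose package_id is this row's item_id.
        children = [r for r in rows if r[1] == row[0]]
        out.append((level, row[0], row[1], 0 if children else 1))
        for child in children:
            visit(child, level + 1)

    for row in rows:
        if row[0] == start_item:
            visit(row, 1)
    return out


# Equivalent of the WHERE connect_by_isleaf = 1 filter.
leaves = [r for r in with_isleaf(SHIPMENT, "pallet_BX505") if r[3] == 1]
for row in leaves:
    print(row)
```

The filter is applied after the hierarchy is built, which is why only level 3 rows survive in the slide's output.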
SYS_CONNECT_BY_PATH
■ Expression that builds a string representing the path from the root row to the current row.
■ >>--SYS_CONNECT_BY_PATH--(--string-expression1--,--string-expression2--)--><
path pallet_BX505
item_id pallet_BX505
package_id ship_CX2555
path pallet_BX505box_C1255
item_id box_C1255
package_id pallet_BX505
path pallet_BX505box_C3524
item_id box_C3524
package_id pallet_BX505
path pallet_BX505box_C4520
item_id box_C4520
package_id pallet_BX505
path pallet_BX505box_C4000
item_id box_C4000
package_id pallet_BX505
5 row(s) retrieved.
SELECT
sys_connect_by_path(item_id,"") as path ,
item_id, package_id
FROM inbound_shipment
where level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
CONNECT_BY_ROOT
■ Unary operator that, for every row in the hierarchy, returns the value of the expression for the row’s root ancestor
■ >>--CONNECT_BY_ROOT--expression----------------------------------><
root item_id package_id
pallet_BX505 pallet_BX505 ship_CX2555
pallet_BX505 box_C1255 pallet_BX505
pallet_BX505 box_C3524 pallet_BX505
pallet_BX505 box_C4520 pallet_BX505
pallet_BX505 box_C4000 pallet_BX505
5 row(s) retrieved.
SELECT connect_by_root item_id as root, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
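SYS_CONNECT_BY_PATH and CONNECT_BY_ROOT both carry information from the root down to each row. The Python sketch below shows one way to picture that: the path string and the root value are passed along as the traversal descends. It assumes each value is appended after the separator, which with an empty separator reproduces the concatenated paths shown on the SYS_CONNECT_BY_PATH slide; `paths` is a hypothetical helper name.

```python
# (item_id, package_id) rows: a reduced version of the slide's data.
SHIPMENT = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
]


def paths(rows, start_item, sep=""):
    """Return (path, root, item_id) for each row reached from start_item."""
    out = []

    def visit(row, parent_path, root):
        path = parent_path + sep + row[0]  # extend the parent's path
        out.append((path, root, row[0]))
        for child in rows:
            if child[1] == row[0]:  # CONNECT BY PRIOR item_id = package_id
                visit(child, path, root)  # the root value is passed unchanged

    for row in rows:
        if row[0] == start_item:
            visit(row, "", row[0])
    return out


for path, root, item_id in paths(SHIPMENT, "pallet_BX505"):
    print(path, root, item_id)
```

Passing a visible separator such as `sep="/"` makes the individual path elements easier to read than the empty separator used in the slide's query.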
SIBLINGS
■ Attribute of the ORDER BY clause that orders the siblings at every level of the hierarchy
■ Same semantics as ORDER BY, but applied to sibling rows
level item_id package_id
1 pallet_BX505 ship_CX2555
2 box_C1255 pallet_BX505
2 box_C3524 pallet_BX505
2 box_C4000 pallet_BX505
2 box_C4520 pallet_BX505
5 row(s) retrieved.
SELECT level, item_id, package_id
FROM inbound_shipment
WHERE level < 3
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
ORDER SIBLINGS BY item_id
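The difference from a plain ORDER BY is that the sort is applied to each group of siblings, so parent-child nesting is preserved. A minimal Python sketch of that idea, with a hypothetical `ordered_walk` helper and deliberately unsorted input rows:

```python
# (item_id, package_id) rows, with the boxes stored out of order on purpose.
SHIPMENT = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C4520", "pallet_BX505"),
    ("box_C1255", "pallet_BX505"),
    ("box_C4000", "pallet_BX505"),
    ("box_C3524", "pallet_BX505"),
]


def ordered_walk(rows, start_item):
    """Yield (level, item_id, package_id), sorting siblings by item_id."""

    def visit(row, level):
        yield (level, row[0], row[1])
        # Sort only the children of this row, then descend into each one.
        children = sorted((r for r in rows if r[1] == row[0]), key=lambda r: r[0])
        for child in children:
            yield from visit(child, level + 1)

    for row in sorted(r for r in rows if r[0] == start_item):
        yield from visit(row, 1)


for row in ordered_walk(SHIPMENT, "pallet_BX505"):
    print(row)
```

Sorting inside `visit` rather than on the final result keeps every child directly under its parent.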
Query rewrite & Execution model
• Query rewrite
SELECT level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
Conceptually rewritten as:
SELECT level, item_id, package_id FROM
( SELECT level, item_id, package_id
  FROM inbound_shipment
  WHERE item_id = 'pallet_BX505'
  UNION ALL
  SELECT level, ship.item_id, ship.package_id
  FROM inbound_shipment ship, dtab
  WHERE ship.package_id = dtab.item_id
)
AS dtab;
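The seed-plus-UNION ALL shape of this rewrite is the same shape as a recursive common table expression. The sketch below runs that shape in SQLite through Python's standard `sqlite3` module purely as a stand-in: `WITH RECURSIVE` is SQLite syntax, not the Informix rewrite itself, and the four sample rows are a reduced version of the slide's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inbound_shipment (item_id TEXT, package_id TEXT)")
conn.executemany(
    "INSERT INTO inbound_shipment VALUES (?, ?)",
    [
        ("pallet_BX505", "ship_CX2555"),
        ("box_C1255", "pallet_BX505"),
        ("band_aid_H10", "box_C1255"),
        ("box_C3524", "pallet_BX505"),
    ],
)

# First branch: the START WITH seed. Second branch: join back to dtab.
RECURSIVE_SQL = """
WITH RECURSIVE dtab(level, item_id, package_id) AS (
    SELECT 1, item_id, package_id
    FROM inbound_shipment
    WHERE item_id = 'pallet_BX505'
    UNION ALL
    SELECT dtab.level + 1, ship.item_id, ship.package_id
    FROM inbound_shipment ship, dtab
    WHERE ship.package_id = dtab.item_id
)
SELECT level, item_id, package_id FROM dtab
"""

rows = conn.execute(RECURSIVE_SQL).fetchall()
for row in rows:
    print(row)
```

The set of rows matches what CONNECT BY returns for the same data, but the row order may differ, since SQLite does not promise the depth-first order that CONNECT BY produces.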
Execution model of recursive queries in IDS
[Figure: execution plan diagram. Operators shown: top-level SCAN on the derived table, UNION ALL, JOIN (connect by filters), two SCANs of the shipment table, SCAN of a TEMP TABLE (cycle or traversal), and SORT (order siblings by).]
sqexplain
QUERY:
SELECT level as package_level, item_id, package_id FROM inbound_shipment
START WITH item_id = 'pallet_BX505' CONNECT BY PRIOR item_id = package_id
Connect by Query Rewrite:
select x0.level ,x0.item_id ,x0.package_id from
(select x1.item_id ,x1.package_id ,x1.item_id ,1 ,1 ,0 from
"informix".inbound_shipment x1 where (x1.item_id = 'pallet_BX505' )
union all
select x2.item_id ,x2.package_id ,x2.item_id ,(level + 1 ) ::integer
,connect_by_isleaf ,dtab_30093_173_stkcol from "informix".inbound_shipment
x2 ,"informix".dtab_30093_173 x0 where (dtab_30093_173_p_item_id =
x2.package_id ) )
X0
(item_id,package_id,dtab_30093_173_p_item_id,level,connect_by_isleaf,dtab_30093_173_stkcol)
[Slide annotation: START WITH]
Estimated Cost: 1
Estimated # of Rows Returned: 5
1) informix.dtab_30093_173: COLLECTION SCAN
Subquery:
---------
Estimated Cost: 13
Estimated # of Rows Returned: 5
1) informix.inbound_shipment: SEQUENTIAL SCAN
Filters: informix.inbound_shipment.item_id = 'pallet_BX505'
Union Query:
------------
1) informix.dtab_30093_173: SEQUENTIAL SCAN
2) informix.inbound_shipment: SEQUENTIAL SCAN
DYNAMIC HASH JOIN (Build Outer)
Dynamic Hash Filters: informix.dtab_30093_173.dtab_30093_173_p_item_id =
informix.inbound_shipment.package_id
Query statistics:
Table map :
----------------------------
Internal name Table name
----------------------------
t1 dtab_30093_173
type table rows_prod time
-----------------------------------
clscan t1 25 00:00.00
CONNECT BY Restriction
• Multiple tables are not allowed
SELECT ship.item_id, ord.name
FROM inbound_shipment ship, orders ord
WHERE ship.item_id = ord.item_id
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
Rewrite to
SELECT item_id, name
FROM (SELECT ship.item_id, ship.package_id, ord.name
      FROM inbound_shipment ship, orders ord
      WHERE ship.item_id = ord.item_id)
START WITH item_id = 'pallet_BX505'
CONNECT BY PRIOR item_id = package_id
Tree node traversal
[Figure: graph with node 10 at the top, 20 and 30 below it, and 40 and 50 at the bottom.]
select * from t1;
c1 c2
10 0
20 10
30 10
40 20
50 30
20 50
6 row(s) retrieved.
select level, * from t1 start with c1 = 10 connect by prior c1 = c2;
level c1 c2
1 10 0
2 30 10
3 50 30
4 20 50
5 40 20
2 20 10
3 40 20
7 row(s) retrieved.
Paths traversed:
10--30--50--20--40
10--20--40
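Six table rows yield seven result rows here because node 20 has two parents (10 and 50), so it and its child 40 are reached along two different paths. A short Python sketch that enumerates every path makes this visible; `all_paths` is a hypothetical helper, and the order in which it finds paths may differ from the server's output order.

```python
# (c1, c2) rows of t1: c2 is the parent of c1.
T1 = [(10, 0), (20, 10), (30, 10), (40, 20), (50, 30), (20, 50)]


def all_paths(rows, start):
    """Return every path from the start node as a list of c1 values."""
    found = []

    def visit(c1, path):
        path = path + [c1]
        found.append(path)
        for child_c1, child_c2 in rows:
            if child_c2 == c1:  # CONNECT BY PRIOR c1 = c2
                visit(child_c1, path)

    for c1, _c2 in rows:
        if c1 == start:
            visit(c1, [])
    return found


for path in all_paths(T1, 10):
    # The level of a result row is the length of the path that reached it.
    print(len(path), "--".join(str(n) for n in path))
```

Each path corresponds to one result row, and the two longest paths are the ones listed on the slide.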
Child to Parent Traversal
package_level item_id package_id
1 tylenol_BA500 box_C4000
2 box_C4000 pallet_BX505
3 pallet_BX505 ship_CX2555
3 row(s) retrieved.
SELECT level as package_level, item_id, package_id
FROM inbound_shipment
START WITH item_id = 'tylenol_BA500'
CONNECT BY PRIOR package_id = item_id
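Swapping the two sides of the PRIOR condition turns the walk upward: each step looks up the package that holds the current item. A minimal Python sketch of that direction, assuming every item sits in at most one package and the data has no cycle (`PACKAGE_OF` and `ancestors` are names invented for the example):

```python
# item_id -> package_id, for the three rows shown in the result table.
PACKAGE_OF = {
    "tylenol_BA500": "box_C4000",
    "box_C4000": "pallet_BX505",
    "pallet_BX505": "ship_CX2555",
}


def ancestors(package_of, start_item):
    """Return (level, item_id, package_id) walking from child to parent."""
    out = []
    level, item = 1, start_item
    while item in package_of:  # stop when no row exists for the item
        out.append((level, item, package_of[item]))
        item = package_of[item]  # PRIOR package_id = item_id
        level += 1
    return out


for row in ancestors(PACKAGE_OF, "tylenol_BA500"):
    print(row)
```

The walk stops at pallet_BX505 because ship_CX2555 has no row of its own in the sample data, matching the three rows retrieved on the slide.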
SEQUENCE NUMBER GENERATOR
SELECT level FROM sysmaster:sysdual CONNECT BY level <= 10    -- sysdual: single row table
Connect by Query Rewrite:
---------------------------
select x0.level from (select 1 ,1 ,0 from sysmaster:"informix".sysdual x1 union all select (level + 1 ) ::integer ,connect_by_isleaf
,dtab_27465_191_stkcol from sysmaster:"informix".sysdual x2 ,"informix".dtab_27465_191 x0 where ((level + 1 ) <= 10. ) )
x0(level,connect_by_isleaf,dtab_27465_191_stkcol)
1) informix.dtab_27465_191: COLLECTION SCAN
Subquery:
---------
Estimated Cost: 5
Estimated # of Rows Returned: 2
1) sysmaster:informix.sysdual: SEQUENTIAL SCAN
Union Query:
------------
1) informix.dtab_27465_191: SEQUENTIAL SCAN
Filters: informix.dtab_27465_191.level + 1 <= 10
2) sysmaster:informix.sysdual: SEQUENTIAL SCAN
NESTED LOOP JOIN
Performance Considerations
• Queries are recursive and involve repeated self joins
• Use the PRIOR keyword; without it the query can run forever
• Temp dbspace is used for hierarchy traversal (stack) and cycle detection
• Configure DBSPACETEMP
Conclusion
• Simple queries for complex reporting
• Useful for single or multiple data tree structures
• Easy to map the path between two nodes/rows
Questions?
Ajaykumar Gupte
gupte@us.ibm.com

Editor's Notes

  • #4 Employee-Manager
    – All employees reporting to “Goyal”
    – Entire organization chart for “Goyal”
    – All managers under Goyal with salary < $X
    – All non-manager employees under Goyal with salary < $Y
    Shipment
    – List all items from pallet #10
    – Which product units are inside pallet #10?
    – Find the pallet number of a unit (upc 456….)
    – Display all products from a pallet by scanning a single unit with upc (678….)
    – Count the number of boxes from a pallet by scanning a single unit with upc (567….)
    – Count the number of product units & boxes from a pallet by scanning a single unit with upc (567….)
  • #5 List all items/boxes from pallet “pallet_BX505”
    1. Fetch the row from inbound_shipment where item_id = “pallet_BX505”
    2. Materialize the result of step 1 into a TEMP table
    3. Join the result of step 2 back into inbound_shipment such that item_id from step 2 == package_id (similar to a self join)
    4. Materialize the results of step 3 into a TEMP table
    5. Repeat steps 3 and 4 until step 3 results in no data, i.e. the join returns no rows
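The manual steps in note #5 amount to a loop that joins the last materialized result back to the table until the join comes back empty. A rough Python sketch of that loop follows; in-memory lists stand in for the TEMP tables, and `multi_step` and the sample rows are illustrative assumptions.

```python
# (item_id, package_id) rows: a reduced version of the slide's data.
SHIPMENT = [
    ("pallet_BX505", "ship_CX2555"),
    ("box_C1255", "pallet_BX505"),
    ("band_aid_H10", "box_C1255"),
    ("box_C3524", "pallet_BX505"),
]


def multi_step(rows, start_item):
    """Collect all rows under start_item, one join pass per level."""
    # Steps 1 and 2: fetch the seed row and materialize it.
    frontier = [r for r in rows if r[0] == start_item]
    result = list(frontier)
    # Steps 3 to 5: join the last result back to the table until it is empty.
    while frontier:
        parents = {r[0] for r in frontier}
        frontier = [r for r in rows if r[1] in parents]
        result.extend(frontier)
    return result


for row in multi_step(SHIPMENT, "pallet_BX505"):
    print(row)
```

This version has no cycle detection, so a loop in the data would keep it running forever, which is part of the complexity that applications otherwise have to handle themselves.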
  • #7 A hierarchical query operates on rows, which correspond to nodes within a logical structure of parent-child relationships. If parent rows have multiple children, sibling relationships exist among child rows of the same parent. These relationships might reflect, for example, the reporting structure among employees and managers within the divisions and management levels of an organization. Important: Hierarchical queries are most efficient for data sets in which parent-child dependencies in the table have the logical topology of a simple graph. If the self-referencing table includes more than one independent hierarchy for the same set of columns, or if any child row is also an ancestor of its parent, see also the section Dependency patterns that are not a simple graph.
  • #16 CONNECT_BY_ISCYCLE
    – Pseudo column that returns 1 or 0 to indicate whether the row, when joined back into the base table, would result in a cycle
    – Can be used only when the NOCYCLE attribute is used
    – Cannot be used in the START WITH and CONNECT BY clauses
  • #17 CONNECT_BY_ISLEAF
    – Pseudo column that returns 1 or 0 based on whether the node is a leaf node
    – A node is a leaf node if it has no children in the query result hierarchy (not in the actual data hierarchy)
    – Cannot appear in the START WITH and CONNECT BY clauses
  • #22 CONNECT BY queries are
    – Supported inside views / derived tables
    – Supported inside subqueries
    – Supported in SPLs (static and dynamic statements in SPL)
    CONNECT BY queries do not support joins in the FROM clause. The workaround is to rewrite the query so that the join is pushed down into a derived table in the FROM clause of the CONNECT BY query.
  • #29 Query optimization
    – Queries are optimized exactly like normal SQL queries
    – Access paths/join types are chosen based on available statistics
    – Subqueries with CONNECT BY are not flattened (merged into the parent query block)
    – Views with CONNECT BY, or views referenced in the FROM clause of CONNECT BY queries, are always materialized