THE TPR*-TREE: AN OPTIMIZED SPATIO-TEMPORAL ACCESS METHOD FOR PREDICTIVE QUERIES
Dimitris Papadias
Yufei Tao
Jimeng Sun
VLDB Conference 2003
Outline
 Introduction
 The TPR-tree and The TPR*-tree
 Experiments
 Conclusions
Introduction
 Spatio-temporal databases
 record moving objects’ geographical locations (sometimes also shapes)
at various timestamps.
 support queries that explore their historical and future (predictive)
behaviors.
 Applications: flight control systems, weather forecasting, and mobile
computing.
 The database stores the motion functions of moving objects.
 For each object o, its motion function gives its location o(t) at any future
time t.
 A predictive window query
 specifies a query region qR and a future time interval qT
 retrieves the set of all objects that will fall in qR during qT.
 Our goal: index moving objects so that a predictive window query can
be answered with as few disk I/Os as possible.
 Examples
 Find all airplanes that will be over Florida in the next 10 minutes.
 Report all vessels that will enter the United States in the next hour.
Motion Function
 We consider linear motion.
[Figure: objects a, b, c, and d with their velocity vectors, shown at time 0 and at time 1; x axis and y axis from 0 to 10]
 For each object, the database stores
 Its minimum bounding rectangle (MBR) at the reference time 0
 Its current velocity bounding rectangle (VBR)
 Examples: MBR(a)={2,4,3,4}, VBR(a)={1,1,1,1};
MBR(c)={8,9,3,4}, VBR(c)={-2,0,0,2};
 An update is necessary only when an object’s VBR changes.
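
The MBR/VBR representation above can be sketched in a few lines. This is a minimal illustration, assuming each MBR and VBR is listed as {x-lower, x-upper, y-lower, y-upper} and that every boundary moves linearly with its own velocity; the function name `mbr_at` is hypothetical.

```python
def mbr_at(mbr, vbr, t):
    """Rectangle occupied at time t, given the MBR at reference time 0.

    Each boundary moves linearly: bound(t) = bound(0) + velocity * t.
    The tuple layout (x_lo, x_hi, y_lo, y_hi) is an assumption.
    """
    return tuple(bound + velocity * t for bound, velocity in zip(mbr, vbr))


# Values for objects a and c as given on the slide.
a_mbr, a_vbr = (2, 4, 3, 4), (1, 1, 1, 1)
c_mbr, c_vbr = (8, 9, 3, 4), (-2, 0, 0, 2)

a_at_1 = mbr_at(a_mbr, a_vbr, 1)  # (3, 5, 4, 5)
c_at_1 = mbr_at(c_mbr, c_vbr, 1)  # (6, 9, 3, 6)
```

Because the position at any future time follows from the stored MBR and VBR, the database needs no update while the velocities stay the same.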
The Time Parameterized R-Tree (TPR-Tree)
 Extends the R-tree by introducing the velocity bounding
rectangle (VBR) in all entries.
 Queries are compared with the conservative MBRs of non-leaf entries.
In the figure, the VBRs of nodes N1 and N2 are N1v={-2,1,-2,1} and
N2v={-2,0,-1,2}.
[Figure: objects a, b, c, d and nodes N1, N2 with their velocity vectors at time 0, and the same objects and nodes together with the query region qR at time 1; x axis and y axis from 0 to 10]
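
A conservative bound and its comparison with a query can be sketched as follows. This is an illustration under stated assumptions, not the TPR-tree code: rectangles use the layout {x-lower, x-upper, y-lower, y-upper}, the query region is static, and the names `bound` and `intersects_during` are hypothetical. Grouping a and c into one node is for illustration only (the figure places them in different nodes), and the query rectangle is made up.

```python
def bound(rects):
    """Smallest lower bounds and largest upper bounds over 4-tuples."""
    pick = (min, max, min, max)
    return tuple(f(r[i] for r in rects) for i, f in enumerate(pick))


def intersects_during(mbr, vbr, query, t_start, t_end):
    """True if the moving rectangle meets the static query in [t_start, t_end]."""
    lo_t, hi_t = t_start, t_end
    constraints = []  # each (a, b, c) encodes a + b * t <= c
    for d in range(2):
        lo, hi = mbr[2 * d], mbr[2 * d + 1]
        vlo, vhi = vbr[2 * d], vbr[2 * d + 1]
        q_lo, q_hi = query[2 * d], query[2 * d + 1]
        constraints.append((lo, vlo, q_hi))      # lower edge <= query upper
        constraints.append((-hi, -vhi, -q_lo))   # upper edge >= query lower
    for a, b, c in constraints:
        if b == 0:
            if a > c:
                return False
        elif b > 0:
            hi_t = min(hi_t, (c - a) / b)
        else:
            lo_t = max(lo_t, (c - a) / b)
    return lo_t <= hi_t


# MBR/VBR values of a and c come from the slides; the node is hypothetical.
node_mbr = bound([(2, 4, 3, 4), (8, 9, 3, 4)])   # (2, 9, 3, 4)
node_vbr = bound([(1, 1, 1, 1), (-2, 0, 0, 2)])  # (-2, 1, 0, 2)

q = (10, 12, 3, 4)  # made-up query region
hit = intersects_during(node_mbr, node_vbr, q, 0, 1)      # True
miss = intersects_during(node_mbr, node_vbr, q, 0, 0.5)   # False
```

The bound is conservative because each edge moves with the extreme velocity of the enclosed entries, so the node rectangle can only grow relative to its contents as time passes.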
TPR*-Tree
 Goal:
 index moving objects so that a predictive window query can be
answered with as few disk I/Os as possible.
 A mathematical model that estimates the cost of
answering a predictive window query using TPR-like
structures.
 Number of node accesses.
 Application of the model to derive the optimal
performance.
 The TPR-tree is much worse than the optimal structure.
 Examine the algorithms of the TPR-tree, identify their
deficiencies, and propose new ones.
 The TPR*-tree.
TPR*-Tree Insertion
 Choose Path
 Node Insert
 Pick Worst (if overflow)
 Node split
TPR deficiency 1: Choosing sub-tree to insert
 To insert an entry, the TPR-tree picks the sub-tree incurring
the minimum penalty (smallest MBR/VBR enlargement).
[Figure: entries a to i at time 0 (the absolute values of all velocities are 1; i is static), and the same entries at time 2, when p is inserted; x axis and y axis from 0 to 10]
 May result in inserting an entry into a bad sub-tree; this
problem is increasingly serious as time evolves.
TPR* solution: Choose path
 Aims at finding the best insertion path globally, namely,
among all possible paths.
 Observation: We can find this path by accessing only a few
more nodes (than the TPR-tree algorithm).
[Figure: entries a to i and the new entry p; inserting p at time 2; x axis and y axis from 0 to 10]
Maintain a priority queue: [(g),0], [(h),0], [(i),20]
Each element holds the path expanded so far and the accumulated penalty so far.
TPR* solution: Choose path
[Figure: entries a to i and the new entry p; inserting p at time 2; x axis and y axis from 0 to 10]
Visit node g: [(h),0], [(a,g),3], [(i),20], [(b,g),32]
(a,g) and (b,g) are already complete paths, although nodes a and b are
not visited.
TPR* solution: Choose path
[Figure: entries a to i and the new entry p; inserting p at time 2; x axis and y axis from 0 to 10]
Visit node h: [(a,g),3], [(d,h),9], [(c,h),17], [(i),20], [(b,g),32]
The algorithm stops now.
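
The queue trace on the three Choose path slides can be replayed with a best-first search. This is a sketch under assumptions, not the authors' implementation: the penalties for g, h, i, a, b, c, d are the ones shown on the slides, while the tree layout (g holds a and b, h holds c and d, i holds e and f) and the placeholder penalties for e and f are assumed. Paths are written root-first here, whereas the slides write the leaf first.

```python
import heapq

# Per-entry penalties from the slides; e and f are placeholders (assumed).
PENALTY = {"g": 0, "h": 0, "i": 20, "a": 3, "b": 32,
           "c": 17, "d": 9, "e": 0, "f": 0}
# Assumed tree layout; entries without children are leaf-level.
CHILDREN = {"root": ["g", "h", "i"], "g": ["a", "b"],
            "h": ["c", "d"], "i": ["e", "f"]}


def choose_path(root="root"):
    """Return (best path, accumulated penalty, nodes visited on the way)."""
    heap = [(PENALTY[c], (c,)) for c in CHILDREN[root]]
    heapq.heapify(heap)
    visited = []
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node not in CHILDREN:
            # First complete path popped: with non-negative penalties,
            # no path still in the queue can be cheaper.
            return path, cost, visited
        visited.append(node)
        for child in CHILDREN[node]:
            heapq.heappush(heap, (cost + PENALTY[child], path + (child,)))
    raise ValueError("tree has no entries")


best_path, best_cost, visited_nodes = choose_path()
# best_path == ("g", "a"), best_cost == 3, visited_nodes == ["g", "h"]
```

Node i is never read, because its accumulated penalty of 20 already exceeds the cost of the complete path through g to a.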
TPR deficiency 2: Which entries to re-insert
 When a node overflows, some of its entries are re-inserted to defer
node split (the ones that diverge most from the node centroid).
 The entries chosen by the TPR-tree are very likely to be re-inserted
back to the same node, so that a node split is still necessary.
[Figure: entries a, b, c, d, and e; node overflow at time 0 (the absolute values of all velocities are 1), and the same entries at time 2; x axis and y axis from 0 to 10]
TPR* solution: Pick worst
 Aims at selecting entries that can most effectively
“shrink” the MBR or VBR of the node for re-insertion.
 The first step picks an appropriate dimension (either spatial or
velocity)
 The second step performs sorting on this dimension and
decides the entries to be removed.
[Figure: entries a, b, c, d, and e at time 0 (the absolute values of all velocities are 1); x axis and y axis from 0 to 10]
 Example: If the axis chosen in the first step is the x-axis, then the
sorted list is {b,d,a,c}. Either b or c is removed.
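
The second step of Pick worst can be sketched as below. Everything numeric here is hypothetical: the slides give only the sorted order {b,d,a,c}, not the coordinates, so the x-intervals are made up to reproduce that order. The rule used to decide between the two ends (remove the one that leaves the smaller remaining extent) is also an assumption for illustration, not the criterion stated by the authors.

```python
def pick_worst(entries):
    """entries maps a name to its (lo, hi) interval on the chosen axis.

    Returns the sorted order and the end entry whose removal leaves the
    smallest remaining extent (assumed criterion).
    """
    order = sorted(entries, key=lambda name: entries[name][0])
    first, last = order[0], order[-1]

    def extent_without(victim):
        rest = [entries[n] for n in order if n != victim]
        return max(hi for _, hi in rest) - min(lo for lo, _ in rest)

    victim = min((first, last), key=extent_without)
    return order, victim


# Hypothetical x-intervals chosen so that the order matches the slide.
x_intervals = {"a": (5, 6), "b": (1, 2), "c": (8, 9), "d": (3, 4)}
order, victim = pick_worst(x_intervals)
# order == ["b", "d", "a", "c"]; with these made-up numbers victim == "c"
```

Only the two ends of the sorted list are candidates, because removing an interior entry cannot shrink the node's extent on that axis.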
TPR* solution: Node Split
 Computes the overall perimeter for each
dimension
 Selects as the split axis the dimension with the smallest overall
perimeter
 Perimeter defined as the perimeter of the
sweeping region of the corresponding
transformed rectangle
 Perimeter computation is very efficient
 The number of vertices of a sweeping region is small
TPR deficiency 3: Tightening MBR in deletion
 Entry deletion requires first finding the entry, which
accesses many nodes of the tree. The TPR-tree uses
this fact to tighten the MBR of non-leaf entries.
 Assume nodes h and i are accessed before e is found; then the
TPR-tree will tighten the MBR of i only (enclosing g and f).
[Figure: entries a to j at time 0 (the absolute values of all velocities are 1), and the same entries while deleting e at time 1; x axis and y axis from 0 to 10]
TPR deficiency 3: Tightening MBR in deletion
[Figure: entries a to j at time 0 (the absolute values of all velocities are 1), and the remaining entries after deleting e at time 1; x axis and y axis from 0 to 10]
TPR deficiency 3: Tightening MBR in deletion
[Figure: entries a to j at time 0 (the absolute values of all velocities are 1); x axis and y axis from 0 to 10]
TPR* solution: Active tightening
 Tightening more entries for free.
 Assume nodes h and i are accessed before e is found;
then the TPR*-tree will tighten the MBR of both h and
i.
[Figure: entries a to j at time 0 (the absolute values of all velocities are 1), and the same entries while deleting e at time 1; x axis and y axis from 0 to 10]
TPR* solution: Active tightening
[Figure: entries a to j at time 0 (the absolute values of all velocities are 1), and the remaining entries after deleting e at time 1; x axis and y axis from 0 to 10]
TPR* solution: Active tightening
 Another example: Assume the shaded nodes are accessed to
find e.
 The active tightening can tighten the MBR of n5, n6, n3, and n4.
 But not n1 and n2.
[Figure: tree with a root, entries n1 to n6, and nodes N1 to N6; e is stored at the leaf level; label: to be written back to disk]
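
The effect of tightening can be sketched with a small computation. This is an illustration under assumptions, not the TPR*-tree code: rectangles use the layout {x-lower, x-upper, y-lower, y-upper}, the two child entries are made-up values, and the names `rect_at` and `tight_bound` are hypothetical.

```python
def rect_at(mbr, vbr, t):
    """Rectangle at time t for an MBR stored at reference time 0."""
    return tuple(b + v * t for b, v in zip(mbr, vbr))


def tight_bound(children, t):
    """Tightest MBR at time t, and VBR, enclosing (mbr, vbr) children."""
    rects = [rect_at(m, v, t) for m, v in children]
    vbrs = [v for _, v in children]
    pick = (min, max, min, max)
    mbr = tuple(f(r[i] for r in rects) for i, f in enumerate(pick))
    vbr = tuple(f(v[i] for v in vbrs) for i, f in enumerate(pick))
    return mbr, vbr


# Two made-up children moving toward each other along the x axis.
children = [((0, 1, 0, 1), (1, 1, 0, 0)),
            ((4, 5, 0, 1), (-1, -1, 0, 0))]

node_mbr0, node_vbr = tight_bound(children, 0)        # bound built at time 0
conservative_at_2 = rect_at(node_mbr0, node_vbr, 2)   # (-2, 7, 0, 1)
tightened_at_2, _ = tight_bound(children, 2)          # (2, 3, 0, 1)
```

The conservative rectangle keeps growing even though the children converge, which is why recomputing the bound from children already in memory during a deletion improves later queries without extra reads.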
Challenge of Migration
 3 Operating Systems:
 Microsoft Windows
 Sun Solaris
 Redhat Fedora Core 1
 2 Compilers: CL, GCC (2.9.5, 3.3.2)
 Difference of Code Conversion
 How closely do the compilers follow the standard?
 Compatibility of Library
Experiments: Settings (query and tree)
 Dataset
 50,000 sampled objects’ MBRs are taken from a real spatial dataset NJ
[Tiger]
 each object is associated with a VBR such that on each dimension
 The velocity extent is zero (i.e., the object does not change
spatial extents during its movement)
 the velocity value is chosen at random in the range [0,8]
 the velocity can be positive or negative with equal probability.
 We compare the TPR*-tree with the TPR-tree.
 Disk page size=1k bytes (node capacity=27 for both trees).
 For each object update, perform a deletion followed by an insertion on each
tree.
 Each predictive query is a moving rectangle, and has these
parameters:
 qRlen: The length of the query’s MBR
 qVlen: The length of the query’s VBR
 qTlen: The number of timestamps covered.
Conclusions
 The TPR-tree combines the idea of conservative
MBR directly with the tree construction
algorithms of R*-trees.
 The TPR*-tree improves it by designing
algorithms that take into account the special
features of moving objects.
 Cost model for performance analysis
 The optimal performance of a “hypothetically best
structure”
 Reduce disk I/Os for predictive queries

Editor's Notes

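  • #4 Emerging moving-object database applications must manage large numbers of continuously moving objects. Answering queries on moving objects efficiently requires effective indexing techniques. In a moving-object database, object positions change frequently, and every position change triggers a dynamic update of the index structure. Indexing moving objects therefore has to consider not only query efficiency but also, as a primary concern, the cost of updating the index. Frequent updates are the key problem in moving-object indexing. Many methods have already been proposed to handle frequent updates in the R-tree and its variants.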
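  • #5 Because the TPR-tree and the TPR*-tree can reuse the query algorithms and the dynamic update algorithms (insertion, deletion) of the traditional R-tree spatial index, they have become the most practical methods for indexing the current and future positions of moving objects. The index structure closely resembles that of the R-tree. The difference is that each leaf entry in the TPR-tree stores not only the position of a moving object but also its velocity vector. Correspondingly, each intermediate entry contains the MBR of its child node together with the velocity bounding rectangle (VBR). The MBR of a TPR-tree node is a function of time (the VBR describes the velocity of the MBR in each dimension). Specifically, in each dimension the upper/lower boundary of the MBR moves with the maximum/minimum velocity among all the objects or child MBRs it contains, so the node MBR always encloses its child nodes or moving objects no matter how their positions change.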
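  • #8 Choose path: find the node into which the entry will be inserted. Insert node: first insert the new entry, then check whether the node overflows. If it does, run pick worst to select the worst entries, remove them, and add them to the re-insert list for re-insertion; otherwise, perform a node split.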
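  • #9 Insert the static point p. Neither g nor h deteriorates, because neither has to expand its MBR. Of the two, h is the better choice because it has the smaller MBR. The original TPR-tree, however, follows a greedy method that is almost random here: any candidate whose penalty is 0 may be chosen, regardless of its MBR. As time passes, the MBRs therefore grow larger and larger, which is undesirable.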
  • #10 The queue is initialized as QP = [(g),0], [(h),0], [(i),20]; each value is a cost degradation. To insert p, g and h are considered first: their cost degradation is 0, because their MBRs/VBRs do not have to be expanded. During insertion, not every non-leaf node has to be visited: the cost degradation is computed from their extents stored in the root. At each step, choose path visits the entry with the smallest cost degradation. The choose path method is proposed in order to minimize cost degradation.
  • #11 While node g is visited, the paths (a,g) and (b,g) are added to the queue; this also means that a and b themselves have not been visited.
  • #12 [(a,g),3] is the best choice. Node i does not need to be visited: its cost was already the largest at initialization, so there is no need to visit it.
  • #13 In the R-tree, c will be removed, resulting in a smaller MBR for e. Go through the figure once more.
  • #14 Pick worst: review once more.
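  • #15 This step is very similar to the TPR-tree: compute all the perimeters. There are two reasons for using the sweeping region: (1) the polygon usually has a smaller perimeter; (2) the region is usually square.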
  • #19 Active tightening is proposed as an improvement: a single deletion can tighten several intersecting MBRs, namely those that overlap the object being deleted.
  • #20 In the example, although non-leaf node h does not change the result, this method still tightens its MBR at no extra cost. The MBR of j, however, cannot be adjusted because it was not visited during the search for c (hence its tightest MBR at time 1 cannot be computed). After the deletion, the root node must in any case be written back to disk to reflect the changed extents.
  • #21 The MBR of n5 is adjusted to bound n1 and n2 tightly, although the MBRs of its child nodes may themselves not be the tightest.