Bitmap Indexes for Relational XML Twig Query Processing
The slides I presented at CIKM'09

    Presentation Transcript

    • Kyong-Ha Lee and Bongki Moon
      The University of Arizona
      Bitmap Indexes For Relational XML Twig Query Processing
      CIKM'09, Hong Kong
    • XML Data and Queries
      <a>
        <a>
          <b>t1</b>
          <c>
            <d>t2</d>
            <e>t3</e>
          </c>
        </a>
        <a>
          <b>
            <e>t4</e>
          </b>
          <d>
            <c>t5</c>
          </d>
        </a>
        . . . . .
      </a>
      [Figure: the XML tree of this document. Each element is labeled with a node number and a (start, end, level) region label, e.g., a1: node 0, (1,32,1); a2: node 1, (2,11,2); a3: node 6, (12,21,2); a4: node 11, (22,31,2).]
      Queries: //A/B/C, //A[//B]//C, //A[./B/C]//E
      [Figure: the twig pattern of each query.]
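The (start, end, level) labels in the figure can be reproduced with a single counter that advances at every start tag and every end tag. Text content does not consume a number. Under that assumption the sketch below recovers the figure's labels for a2 and a3 and their children. It also shows the usual containment tests on region labels. The names label_regions, is_ancestor and is_parent are hypothetical. The elided fourth <a> is left out, so a1 ends at 22 here rather than 32.

```python
import xml.etree.ElementTree as ET

# The sample document without its elided fourth <a> subtree.
DOC = """<a>
  <a><b>t1</b><c><d>t2</d><e>t3</e></c></a>
  <a><b><e>t4</e></b><d><c>t5</c></d></a>
</a>"""

def label_regions(root):
    """Return (tag, start, end, level) for every element, in preorder.

    A single counter is incremented at each start tag and each end tag;
    text content does not consume a number.
    """
    labels = []
    counter = 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        idx = len(labels)            # preorder index doubles as the node number
        labels.append([elem.tag, counter, None, level])
        for child in elem:           # iterates child elements only, not text
            visit(child, level + 1)
        counter += 1
        labels[idx][2] = counter     # end label assigned when the element closes

    visit(root, 1)
    return [tuple(x) for x in labels]

def is_ancestor(a, d):
    # a is an ancestor of d iff a's region strictly encloses d's region
    return a[1] < d[1] and d[2] < a[2]

def is_parent(a, d):
    # parent-child additionally requires d to be exactly one level deeper
    return is_ancestor(a, d) and d[3] == a[3] + 1

labels = label_regions(ET.fromstring(DOC))
for node_id, label in enumerate(labels):
    print(node_id, label)
```

The preorder index equals the node numbers in the figure (a2 = 1, a3 = 6, and so on). The later bit-vector examples rely on that correspondence.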
    • XML Stored in RDB
      [Figure: the sample document stored in a NODE table and a PATH table.]
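The slide's table contents are not recoverable. As an illustration only, the sketch below stores the labeled sample elements in an in-memory SQLite database.

- The column names tagName, level and pathId echo later slides.
- nodeId, startPos and endPos are assumptions.
- Path ids are assumed to be assigned in order of first appearance. This gives /A, /A/A and /A/A/B the ids 0, 1 and 2, matching the P0, P1 and P2 used later in the talk.

```python
import sqlite3

# (tagName, start, end, level, path) per element in document order; node id = list index.
# Labels come from the figure; the elided fourth <a> subtree is omitted.
ELEMENTS = [
    ('a', 1, 32, 1, '/a'), ('a', 2, 11, 2, '/a/a'), ('b', 3, 4, 3, '/a/a/b'),
    ('c', 5, 10, 3, '/a/a/c'), ('d', 6, 7, 4, '/a/a/c/d'), ('e', 8, 9, 4, '/a/a/c/e'),
    ('a', 12, 21, 2, '/a/a'), ('b', 13, 16, 3, '/a/a/b'), ('e', 14, 15, 4, '/a/a/b/e'),
    ('d', 17, 20, 3, '/a/a/d'), ('c', 18, 19, 4, '/a/a/d/c'),
]

def load(conn):
    """Create hypothetical NODE and PATH tables and fill them from ELEMENTS."""
    conn.execute("CREATE TABLE PATH (pathId INTEGER PRIMARY KEY, path TEXT UNIQUE)")
    conn.execute("CREATE TABLE NODE (nodeId INTEGER PRIMARY KEY, tagName TEXT, "
                 "startPos INTEGER, endPos INTEGER, level INTEGER, pathId INTEGER)")
    path_ids = {}
    for node_id, (tag, start, end, level, path) in enumerate(ELEMENTS):
        if path not in path_ids:
            # Assumption: path ids are assigned in order of first appearance.
            path_ids[path] = len(path_ids)
            conn.execute("INSERT INTO PATH VALUES (?, ?)", (path_ids[path], path))
        conn.execute("INSERT INTO NODE VALUES (?, ?, ?, ?, ?, ?)",
                     (node_id, tag, start, end, level, path_ids[path]))
    return path_ids

conn = sqlite3.connect(":memory:")
path_ids = load(conn)
# NODE tuples whose root-to-node path is /a/a/b
rows = conn.execute(
    "SELECT n.nodeId FROM NODE n JOIN PATH p ON n.pathId = p.pathId "
    "WHERE p.path = ? ORDER BY n.nodeId", ("/a/a/b",)).fetchall()
print(path_ids["/a/a/b"], rows)
```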
    • Twig Join
      To answer a twig query:
      A twig pattern is decomposed into several path patterns.
      Path solutions are joined together to compose the final result.
      Holistic Twig Join (HTJ) algorithm
      A specialized multi-way sort-merge join
      Guarantees I/O optimality for a certain subset of XML queries.
      The optimality depends on how the elements are partitioned.
      Uses stacks, and streams in which elements are sorted in document order.
      [Figure: stacks SA, SB, SC, SE and input streams for a twig query over A, B, C, and E.]
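The sketch below makes the decompose-then-join idea concrete. It is deliberately naive and is not HTJ:

- It finds path solutions by nested loops over the region labels.
- It then joins those solutions on the shared A node.

A holistic twig join would instead make one pass over sorted streams using stacks. The names related, match_path and join_on_root are hypothetical. The labels are those of the sample tree, with the fourth <a> subtree omitted.

```python
# (tag, start, end, level) per element, from the labeled sample tree; a4's subtree omitted.
NODES = [
    ('a', 1, 32, 1), ('a', 2, 11, 2), ('b', 3, 4, 3), ('c', 5, 10, 3),
    ('d', 6, 7, 4), ('e', 8, 9, 4), ('a', 12, 21, 2), ('b', 13, 16, 3),
    ('e', 14, 15, 4), ('d', 17, 20, 3), ('c', 18, 19, 4),
]

def related(a, d, axis):
    """Structural test on region labels: '//' = ancestor-descendant, '/' = parent-child."""
    inside = a[1] < d[1] and d[2] < a[2]
    return inside and (axis == '//' or d[3] == a[3] + 1)

def match_path(nodes, steps):
    """All path solutions for a linear pattern given as [(axis, tag), ...].
    The first step is treated as //tag, i.e. it may match anywhere."""
    solutions = [(n,) for n in nodes if n[0] == steps[0][1]]
    for axis, tag in steps[1:]:
        solutions = [s + (n,) for s in solutions for n in nodes
                     if n[0] == tag and related(s[-1], n, axis)]
    return solutions

def join_on_root(left, right):
    """Merge path solutions that agree on the shared branching node (the root here)."""
    return [l + r[1:] for l in left for r in right if l[0] == r[0]]

# //A[//B]//C decomposes into the path patterns //A//B and //A//C.
ab = match_path(NODES, [('//', 'a'), ('//', 'b')])
ac = match_path(NODES, [('//', 'a'), ('//', 'c')])
twig = join_on_root(ab, ac)
print(len(ab), len(ac), len(twig))   # 4 4 6
```

The naive join materializes every path solution before joining. Avoiding such useless intermediate results is what the holistic, stack-based approach aims at.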
    • Motivation
      Discrepancy between XML stored in an RDB and conventional HTJ algorithms
      Logical: streams vs. tables
      Physical: partitioned vs. record-oriented storage
      Supporting real data, including large volumes of text, requires references to records.
      How should tuples be fed to an HTJ algorithm?
      What is the best partitioning scheme for XML stored in an RDB?
      Bitmap index, a conventional index in RDBMSs
      An efficient way to identify tuples
      Efficient support for logical operations
      Can we use the bitmap index to support HTJ?
    • HTJ on Different Partitioning Schemes
      Tag-based partitioning
      Simple; a skipping technique can be used to read only useful elements.
      Only one stream is accessed per query node.
      Tag+Level partitioning
      I/O optimal for a larger class of queries; suitable for deep XML
      Several streams may be accessed for a single query node.
      Path-based partitioning
      I/O optimal for a larger class of queries; suitable for shallow XML
      A path with //-axes may require accessing many streams for a single query node.
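The sketch below shows why the number of streams per query node differs across the three schemes. It partitions the sample NODE tuples into streams and counts what a query node //c must read. The element c occurs at two levels and under two different paths. As a result, //c reads one stream with tag partitioning but two with either of the other schemes. The tuple layout and the helpers partition and streams_for are hypothetical.

```python
from collections import defaultdict

# (nodeId, tagName, level, path) of the sample NODE tuples in document order;
# the elided fourth <a> subtree is omitted.
ROWS = [
    (0, 'a', 1, '/a'), (1, 'a', 2, '/a/a'), (2, 'b', 3, '/a/a/b'),
    (3, 'c', 3, '/a/a/c'), (4, 'd', 4, '/a/a/c/d'), (5, 'e', 4, '/a/a/c/e'),
    (6, 'a', 2, '/a/a'), (7, 'b', 3, '/a/a/b'), (8, 'e', 4, '/a/a/b/e'),
    (9, 'd', 3, '/a/a/d'), (10, 'c', 4, '/a/a/d/c'),
]

# Partitioning key for each scheme
SCHEMES = {
    'tag': lambda r: r[1],
    'tag+level': lambda r: (r[1], r[2]),
    'path': lambda r: r[3],
}

def partition(rows, key):
    """Split tuples into streams; rows arrive in document order, so every stream stays sorted."""
    streams = defaultdict(list)
    for r in rows:
        streams[key(r)].append(r[0])
    return dict(streams)

def streams_for(streams, scheme, tag):
    """Streams that a query node //tag has to read under the given scheme."""
    if scheme == 'tag':
        return [k for k in streams if k == tag]
    if scheme == 'tag+level':
        return [k for k in streams if k[0] == tag]
    return [k for k in streams if k.rsplit('/', 1)[-1] == tag]   # paths ending in tag

for scheme, key in SCHEMES.items():
    streams = partition(ROWS, key)
    print(scheme, len(streams), streams_for(streams, scheme, 'c'))
```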
    • Bitmap Index
      How to partition the tuples in the NODE table
      By building a bitmap index on certain column(s) of the table:
      bitTag for the tagName column,
      bitTag+ for (tagName, Level),
      bitPath for the pathId column
      The choice of index determines the I/O optimality of holistic twig join algorithms.
      During the twig join, useful tuples are accessed via the bitmap index.
      [Figure: bit-vectors for tags A, B, E, ..., each pointing to NODE tuples stored in disk blocks.]
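A minimal sketch of the three bitmap indexes follows, with Python integers standing in for arbitrary-length bit-vectors. Bit i is set exactly when NODE tuple i holds the value. The column contents come from the sample document. The pathId values assume ids are assigned in order of first appearance. The names build_bitmap and rows_of are hypothetical.

```python
from collections import defaultdict

# tagName, level, and pathId columns of the sample NODE table (row i = node i);
# the pathId values assume ids assigned in order of first appearance.
TAG = ['a', 'a', 'b', 'c', 'd', 'e', 'a', 'b', 'e', 'd', 'c']
LEVEL = [1, 2, 3, 3, 4, 4, 2, 3, 4, 3, 4]
PATH = [0, 1, 2, 3, 4, 5, 1, 2, 6, 7, 8]

def build_bitmap(column):
    """One bit-vector per distinct value; bit i is set iff row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)

def rows_of(bv):
    """Row ids (tuple positions) whose bit is 1, in ascending order."""
    return [i for i in range(bv.bit_length()) if (bv >> i) & 1]

bitTag = build_bitmap(TAG)
bitTagPlus = build_bitmap(list(zip(TAG, LEVEL)))   # keyed by (tagName, level)
bitPath = build_bitmap(PATH)

print(rows_of(bitTag['a']), rows_of(bitTagPlus[('a', 2)]), rows_of(bitPath[2]))
```

Each distinct value yields one partition of the table. rows_of gives the tuple positions that a stream for that partition would fetch.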
    • Additional Indexes
      bitAnc: a bit-vector that marks the terminal elements of a given path and all their ancestors.
      bitDesc: a bit-vector that marks the terminal elements of a given path and all their descendants.
      [Figure: bitPath, bitAnc, and bitDesc for PathId=2, i.e., /A/A/B, and the subtree covered by these three bit-vectors.]
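The sketch below derives bitAnc and bitDesc for one path from its bitPath vector and the region labels. It assumes, as in the definitions above, that both vectors also include the terminal elements themselves. Bit positions are node numbers. The labels come from the sample tree, with a4's subtree omitted. The names build_anc_desc and ones are hypothetical. This illustrates the definitions at index-build time. It does not claim to be how the authors construct the index.

```python
# (start, end, level) of each node; position i in a bit-vector = node i.
LABELS = [(1, 32, 1), (2, 11, 2), (3, 4, 3), (5, 10, 3), (6, 7, 4), (8, 9, 4),
          (12, 21, 2), (13, 16, 3), (14, 15, 4), (17, 20, 3), (18, 19, 4)]

def build_anc_desc(labels, bit_path):
    """Derive bitAnc and bitDesc for one path from its bitPath vector."""
    bit_anc = bit_desc = 0
    for t, (ts, te, _) in enumerate(labels):
        if not (bit_path >> t) & 1:
            continue                      # t is not a terminal element of the path
        for i, (s, e, _) in enumerate(labels):
            if s <= ts and te <= e:       # i is t itself or an ancestor of t
                bit_anc |= 1 << i
            if ts <= s and e <= te:       # i is t itself or a descendant of t
                bit_desc |= 1 << i
    return bit_anc, bit_desc

def ones(bv):
    return [i for i in range(bv.bit_length()) if (bv >> i) & 1]

bit_path_p2 = (1 << 2) | (1 << 7)         # /A/A/B: b1 and b2
anc, desc = build_anc_desc(LABELS, bit_path_p2)
print(ones(anc), ones(desc))              # [0, 1, 2, 6, 7] [2, 7, 8]
```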
    • Two Types of Indexes
      Basic index
      Bit-vectors are built on a single column or a group of columns.
      Requires labeled values and reading records.
      Hybrid index
      A combination of two different indexes
      descTag: bitDesc & bitTag
      bitTwig: bitPath & bitAnc
      Does not require labeled values to compute twig solutions.
    • Identifying Element Relationships with Bit-vectors
      For a query //A//B, can the pairs (a1, b1) and (a2, b2) be solutions?
      [Figure: bit-vectors over positions 0 to 15 for P0: /A, P1: /A/A, and P2: /A/A/B. a1, a2, a3, a4 are at positions 0, 1, 6, 11 (1100001000010000), and b1, b2, b3 at positions 2, 7, 12.]
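One way to answer this question with bit-vectors alone is sketched below. It is our reading of the interval check on the next slide, not necessarily the authors' exact rule. It assumes that:

- bit positions are preorder node numbers, and
- the path of d extends the path Pa of a.

Under those assumptions, d is a descendant of a exactly when bitPath(Pa) has no 1 strictly between pos(a) and pos(d). Any 1 in that gap is another element on Pa. It has the same depth as a, so a's subtree has already closed before it. The helper names are hypothetical.

```python
def next_one(bv, pos):
    """First position >= pos holding a 1 in int bit-vector bv, or None at end of vector."""
    rest = bv >> pos
    if rest == 0:
        return None
    return pos + (rest & -rest).bit_length() - 1   # index of the lowest set bit

def is_descendant(bit_pa, pos_a, pos_d):
    """Assumes d's path extends a's path Pa; bit_pa is bitPath(Pa)."""
    if pos_d <= pos_a:
        return False                                # descendants follow a in preorder
    p = next_one(bit_pa, pos_a + 1)
    return p is None or p >= pos_d                  # no other Pa element in between

def vec(*positions):
    return sum(1 << p for p in positions)

P0, P1 = vec(0), vec(1, 6, 11)                      # /A and /A/A from the figure
print(is_descendant(P0, 0, 2))                      # (a1, b1): True
print(is_descendant(P1, 1, 7))                      # (a2, b2): False, a3 at 6 lies in between
```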
    • Advancing Cursors
      Choose the minimum position among the current 1s as the current element for a query node.
      Check whether a 1 exists in the interval between pos(a) and pos(d).
      Look ahead at the next 1.
      [Figure: cursors for q: //A over P0: /A and P1: /A/A, showing Currq, Current1, Next1, and eov (end of vector).]
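Suppose a query node such as //A maps to several bit-vectors (here P0: /A and P1: /A/A). A cursor for it can repeatedly take the smallest current 1 and then look ahead in the vector it came from. The result is the query node's elements in document order. The sketch below is a minimal version of that idea. The function query_node_cursor is hypothetical.

```python
def next_one(bv, pos):
    """First position >= pos holding a 1, or None at end of vector (eov)."""
    rest = bv >> pos
    return None if rest == 0 else pos + (rest & -rest).bit_length() - 1

def query_node_cursor(bit_vectors):
    """Yield the elements of a query node that maps to several bit-vectors,
    always taking the minimum current position (document order)."""
    current = [next_one(bv, 0) for bv in bit_vectors]
    while True:
        live = [(pos, i) for i, pos in enumerate(current) if pos is not None]
        if not live:
            return                                      # every vector reached eov
        pos, i = min(live)
        yield pos
        current[i] = next_one(bit_vectors[i], pos + 1)  # look ahead at the next 1

P0 = 1 << 0                                # /A
P1 = (1 << 1) | (1 << 6) | (1 << 11)       # /A/A
print(list(query_node_cursor([P0, P1])))   # q: //A -> [0, 1, 6, 11]
```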
    • Optimizations
      Early detection via an absent bit-vector
      Condensing query nodes
      For path-based partitioning
      Reduces |INDEX| and |RECORD|
      Skipping obsolete records with advance(k)
      For tag- and (tag, level)-based partitioning
      Reduces |RECORD|
      Moving cursors over compressed bit-vectors without decompression
      A composite cursor moving over a bit-vector compressed with a run-length encoding scheme
      Reduces |INDEX|
      [Figure: for P: //A/B/C, cursor CA = 11 on A's bit-vector 10000000000100000; cursor CB = 4 on B's bit-vector 00001000010000100 is moved by advance(11).]
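Below is a minimal sketch of advance(k) on the figure's bit-vectors, written as '0'/'1' strings with position 0 on the left. It assumes that B entries before A's current position (11) are obsolete. The B cursor can then jump straight to the first 1 at or after 11 without fetching the records in between. The class BitCursor is hypothetical.

```python
class BitCursor:
    """Cursor over a bit-vector written as a '0'/'1' string (position 0 is leftmost)."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = bits.find('1')          # -1 plays the role of end-of-vector

    def advance(self, k):
        """Move to the first 1 at position >= k without visiting the 1s before k."""
        if self.pos != -1 and self.pos < k:
            self.pos = self.bits.find('1', k)
        return self.pos

CA = 11                                    # A's cursor on 10000000000100000
cb = BitCursor("00001000010000100")        # B's bit-vector, CB = 4
print(cb.pos, cb.advance(CA))              # 4 14
```

The B entry at position 9 is never visited, which is where the |RECORD| savings come from.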
    • Compressed Bit-vector
      (a) An original bit-vector with 8,000 bits: a 31-bit group, 256 * 31 zero bits, another 31-bit group, and 2 remaining bits
      000100000000100000000000000011 00000000000 . . . 00000000000000 0000000000000000000000000000001 00
      (b) Grouping the bits into units of 31 bits and merging identical groups
      (c) Encoding each group as one word (4 bytes on a 32-bit machine)
      000010…010…011    uncompressed word: 31 literal bits
      100… 0100000000   compressed word: run length is 256
      000…001           uncompressed word: 31 literal bits
      000…000           remaining word
      Cursor C = {
        C.position, // integer position value (logical address)
        C.word,     // the word C is currently located in
        C.bit,      // the position of the bit C is visiting within C.word
        C.rest      // the bit position within the remaining word
      }
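The slide does not name its run-length scheme. Its layout resembles Word-Aligned Hybrid (WAH) encoding, which this sketch assumes:

- bits are split into 31-bit groups;
- a literal word has MSB 0;
- a fill word has MSB 1, a fill-bit value, and a run count;
- a trailing partial group is kept as the remaining word.

The sketch encodes an 8,000-bit vector shaped like example (a). It assumes a first group with a single 1, 256 zero groups, a group ending in 1, and 2 leftover bits. The names encode and decode are hypothetical.

```python
GROUP = 31                    # payload bits per 32-bit word
FILL_FLAG = 1 << 31           # MSB = 1 marks a fill (compressed) word
FILL_BIT = 1 << 30            # value of the bits in a fill run
MAX_RUN = FILL_BIT - 1        # largest run length the low 30 bits can hold

def encode(bits):
    """Compress a '0'/'1' string into (words, tail).

    Full 31-bit groups become literal words (MSB 0) or, when all bits are equal,
    fill words counting identical consecutive groups. The trailing partial group
    is kept as a plain string (the remaining word)."""
    words = []
    n_full = len(bits) // GROUP
    for g in range(n_full):
        chunk = bits[g * GROUP:(g + 1) * GROUP]
        if chunk == chunk[0] * GROUP:                  # all zeros or all ones
            fill = FILL_BIT if chunk[0] == '1' else 0
            last = words[-1] if words else 0
            if (last & FILL_FLAG) and (last & FILL_BIT) == fill and (last & MAX_RUN) < MAX_RUN:
                words[-1] += 1                         # extend the current run
            else:
                words.append(FILL_FLAG | fill | 1)     # start a new run of length 1
        else:
            words.append(int(chunk, 2))                # literal word, leftmost bit first
    return words, bits[n_full * GROUP:]

def decode(words, tail):
    """Inverse of encode, used here only to check the round trip."""
    out = []
    for w in words:
        if w & FILL_FLAG:
            out.append(('1' if w & FILL_BIT else '0') * GROUP * (w & MAX_RUN))
        else:
            out.append(format(w, '031b'))
    return ''.join(out) + tail

bits = ('0001' + '0' * 27) + '0' * (256 * GROUP) + ('0' * 30 + '1') + '00'
words, tail = encode(bits)
print(len(bits), len(words), hex(words[1]), tail)   # 8000 3 0x80000100 00
```

The 8,000 bits shrink to three 32-bit words plus the 2-bit remainder.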
    • Moving a Cursor over a Compressed Bit-vector
      a) Get the position of the next 1
      C = {31, 0, 31, 0}
      Skip the 31 * 256 bits of the fill word without examining them
      C = {7998, 2, 31, 0}
      b) Check the bit value at position 3,000
      C = {31, 0, 31, 0}
      Distance to move: 2,969 = 3000 - 31
      Since 31 * 256 = 7,936 > 2,969, the bit we are looking for is within word 1.
      [Figure: the encoded words 000010…010…011, 100… 0100000000 (run length 256), 000…001, and the remaining word 000…000.]
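Both cursor operations can be done word by word, skipping each word by the number of bit positions it covers rather than decompressing it. The sketch below shows this under the same WAH-style word layout assumed for the previous slide. It hard-codes the three words and 2-bit remainder of the 8,000-bit example. Positions are 0-based here, so they may differ by one from the slide's C.position values. The names span, get_bit and next_one are hypothetical.

```python
GROUP = 31
FILL_FLAG = 1 << 31          # MSB = 1: fill (compressed) word
FILL_BIT = 1 << 30           # value of the bits in the run
RUN_MASK = FILL_BIT - 1      # low 30 bits: run length in groups

# Encoded 8,000-bit vector: literal, fill of 256 zero groups, literal, 2 remaining bits.
WORDS = [1 << 27, FILL_FLAG | 256, 1]
TAIL = '00'

def span(w):
    """Number of logical bit positions a word covers."""
    return GROUP * (w & RUN_MASK) if w & FILL_FLAG else GROUP

def get_bit(words, tail, pos):
    """Bit at 0-based pos; whole words are skipped by their span, nothing is decompressed."""
    base = 0                                   # position of the first bit of the current word
    for w in words:
        if pos < base + span(w):
            if w & FILL_FLAG:
                return 1 if w & FILL_BIT else 0
            return (w >> (GROUP - 1 - (pos - base))) & 1   # leftmost payload bit = offset 0
        base += span(w)
    return int(tail[pos - base])               # inside the remaining word

def next_one(words, tail, pos):
    """0-based position of the first 1 at or after pos, or None at end of vector."""
    base = 0
    for w in words:
        end = base + span(w)
        if pos < end:
            if w & FILL_FLAG:
                if w & FILL_BIT:
                    return max(pos, base)      # a run of 1s: answer without scanning
            else:
                for off in range(max(pos - base, 0), GROUP):
                    if (w >> (GROUP - 1 - off)) & 1:
                        return base + off
        base = end
    for off in range(max(pos - base, 0), len(tail)):
        if tail[off] == '1':
            return base + off
    return None

print(next_one(WORDS, TAIL, 4), get_bit(WORDS, TAIL, 3000))   # 7997 0
```

For position 3,000, the loop passes word 0 (31 bits) and stops in word 1. That word's span of 7,936 bits covers the remaining distance of 2,969, matching the reasoning on the slide.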
    • Experiments
      Datasets
      Synthetic: XMark
      Real: DBLP, Treebank, Swiss-Prot
      Query sets
    • Statistics of Datasets and Indexes
      • The number of distinct paths varies widely across datasets.
      • The numbers of distinct tag names do not differ much.
      • Index build time is largely affected by attribute cardinality.
      • Index size is smaller than the size of the labeled values in most cases.
    • Query Execution Time
    • Input Data Size
    • Other Features of bitPath
      The bit-vectors used for a path pattern with //-axes are merged, and the result is added to the bitmap index for reuse next time.
      For the path pattern //A//B, the bit-vectors of P: /A/A/B and P: /A/B are merged.
      The merged bit-vector acts like a pre-computed join index.
      A path pattern with //-axes can thus be represented by a single bit-vector.
      Logical operations such as OR and NOT are supported directly by the bitwise operators &, |, and ^.
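The merge-and-reuse idea can be sketched as follows. The sketch assumes:

- the bitPath index is a dictionary from root-to-node path to an int bit-vector;
- patterns use only //-axes.

It uses the sample document's paths, so //a//c plays the role of //A//B on the slide. The XOR line shows NOT as set difference. This holds only because the second vector is a subset of the first. All names are hypothetical.

```python
# Hypothetical bitPath index for the sample document: path -> int bit-vector,
# where bit i stands for node i (the elided fourth <a> subtree is omitted).
PATH_NODES = {
    '/a': [0], '/a/a': [1, 6], '/a/a/b': [2, 7], '/a/a/c': [3],
    '/a/a/c/d': [4], '/a/a/c/e': [5], '/a/a/b/e': [8], '/a/a/d': [9],
    '/a/a/d/c': [10],
}
BIT_PATH = {p: sum(1 << i for i in ids) for p, ids in PATH_NODES.items()}
merged_index = {}   # pattern -> merged bit-vector, kept for the next query

def matches(path, tags):
    """True if a root-to-node path satisfies //t1//t2//...//tn."""
    steps = path.strip('/').split('/')
    if steps[-1] != tags[-1]:
        return False                                # the node itself must carry the last tag
    ancestors = iter(steps[:-1])
    return all(t in ancestors for t in tags[:-1])   # ordered subsequence test

def merged_bit_vector(pattern):
    """OR the bitPath vectors of every path matching a //-only pattern, caching the result."""
    if pattern not in merged_index:
        tags = pattern.strip('/').split('//')
        bv = 0
        for path, vec in BIT_PATH.items():
            if matches(path, tags):
                bv |= vec
        merged_index[pattern] = bv
    return merged_index[pattern]

ac = merged_bit_vector('//a//c')         # OR of /a/a/c and /a/a/d/c
adc = merged_bit_vector('//a//d//c')     # only /a/a/d/c
print(bin(ac), bin(ac ^ adc))            # XOR acts as "and not" because adc is a subset of ac
```

A second request for //a//c is answered from merged_index without consulting the path table again. That is the sense in which the merged vector behaves like a pre-computed join index.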
    • Twig Queries with Logical Operations
      //A[./B/C or ./B/D]//E
      Path patterns: P//A, P//A//B//X ≡ P//A//B//C ∨ P//A//B//D, and P//A//E
      [Figure: the twig pattern, with query node X standing for (C|D).]
      //A[./B/not(C)]//E
      Path patterns: P//A, P//A//E, and P//A/B ⓧ (P//A/B ⊙ P//A/B/C)
      [Figure: the twig pattern, with ¬C under B.]
    • Conclusions
      We investigated the potential of bitmap indexes for XML query processing.
      Partitioning XML stored in an RDB in various ways
      Cursor movements do not require decompression of bit-vectors.
      We devised a way to identify element relationships using only a bitmap index, bitTwig.
      Our experiments showed that bitTwig performed best for queries against shallow XML documents.
      For deep XML documents, bitTag with advance(k) showed the best performance.
      Future work: evaluating our system with more HTJ algorithms and other indexes
    • Thanks! Questions?