Bitmap Indexes for Relational XML Twig Query Processing


The slides I presented at CIKM'09

  1. 1. Bitmap Indexes for Relational XML Twig Query Processing
      Kyong-Ha Lee and Bongki Moon, The University of Arizona
  2. 2. XML Data and Queries (CIKM'09, Hong Kong)
      Sample XML document:
        <a>
          <a>
            <b>t1</b>
            <c>
              <d>t2</d>
              <e>t3</e>
            </c>
          </a>
          <a>
            <b>
              <e>t4</e>
            </b>
            <d>
              <c>t5</c>
            </d>
          </a>
          . . . . .
        </a>
      Figure: the document tree. Each element is numbered in document order (0, 1, 2, ...) and labeled with a (start, end, level) triple, e.g. a1 = (1, 32, 1), a2 = (2, 11, 2), a3 = (12, 21, 2), a4 = (22, 31, 2).
      Example twig queries, each drawn as a twig pattern: //A/B/C, //A[//B]//C, //A[./B/C]//E
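The (start, end, level) labels on this slide allow structural relationships to be tested by interval containment. The following is a minimal sketch of that test, not code from the slides. The names `Label`, `is_ancestor`, and `is_parent` are hypothetical, and the pairing of b1 with (3, 4, 3) is inferred from the sample document rather than stated on the slide.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """A (start, end, level) region label, as shown in the tree figure."""
    start: int
    end: int
    level: int


def is_ancestor(anc: Label, desc: Label) -> bool:
    # An ancestor's interval strictly contains the descendant's interval.
    return anc.start < desc.start and desc.end < anc.end


def is_parent(par: Label, child: Label) -> bool:
    # A parent is an ancestor exactly one level above the child.
    return is_ancestor(par, child) and child.level == par.level + 1


# Labels taken from the slide; the b1 pairing is an assumption.
a1 = Label(1, 32, 1)
a2 = Label(2, 11, 2)
a3 = Label(12, 21, 2)
b1 = Label(3, 4, 3)
```

With these labels, a1 and a2 are both ancestors of b1, only a2 is its parent, and a3 is unrelated to it.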
  3. 3. XML Stored in RDB
      Figure: the XML document stored in two relational tables, a NODE table and a PATH table (table contents are not included in the transcript).
  4. 4. Twig Join
      To answer a twig query:
        A twig pattern is decomposed into several path patterns.
        Path solutions are joined together to compose the final result.
      Holistic Twig Join (HTJ) algorithm:
        A specialized multi-way sort-merge join.
        Guarantees I/O optimality for a certain subset of XML queries.
        The optimality depends on how the elements are partitioned.
        Uses stacks and streams in which elements are kept in sorted order.
      Figure: a twig pattern over the query nodes A, B, C, and E, with the stacks and streams (SA, SB, SC, SE) associated with those nodes.
  5. 5. Motivation
      Discrepancy between XML stored in an RDB and conventional HTJ algorithms:
        Logical: streams vs. tables.
        Physical: partitioned vs. record-oriented storage.
        Supporting actual data, including large volumes of text, requires references to records.
      How should tuples be fed to an HTJ algorithm?
      What is the best partitioning scheme for XML stored in an RDB?
      Bitmap index, a conventional index in RDBMSs:
        An efficient way to indicate tuples.
        Efficient support for logical operations.
        Can we use the bitmap index to support HTJ?
  6. 6. HTJ on Different Partitioning Schemes
      Tag-based partitioning:
        Simple; a skipping technique can be used to read only the useful elements.
        Only one stream is accessed per query node.
      Tag+Level partitioning:
        Better I/O optimality; suitable for deep XML.
        Several streams may be accessed for a single query node.
      Path-based partitioning:
        Better I/O optimality; suitable for shallow XML.
        A path with //-axes may require accessing many streams for a single query node.
  7. 7. Bitmap Index
      How to partition tuples in the NODE table: build a bitmap index on certain column(s) of the table.
        bitTag on the tagName column,
        bitTag+ on (tagName, Level),
        bitPath on the pathId column.
      The choice of index determines the I/O optimality of holistic twig join algorithms.
      During the twig join, useful tuples are accessed via the bitmap index.
      Figure: one bit-vector per key (A, B, E, ...), for example 0010000100 and 0000010000, pointing to the tuples stored in disk blocks.
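As an illustration of how bitTag and bitTag+ partition a NODE table, the sketch below builds one bit-vector per distinct key over a small in-memory table. It is an assumption-laden sketch rather than the authors' implementation: `build_bitmap_index` and the `NODE` rows are hypothetical, the rows loosely follow the first eleven elements of the sample document, and bit-vectors are shown as strings with position 0 on the left.

```python
from collections import defaultdict


def build_bitmap_index(rows, key):
    """Return {key(row): bit-string}, one bit per row, position 0 leftmost."""
    n = len(rows)
    vectors = defaultdict(lambda: [0] * n)
    for pos, row in enumerate(rows):
        # Set the bit of this row in the vector of its key value.
        vectors[key(row)][pos] = 1
    return {k: "".join(map(str, bits)) for k, bits in vectors.items()}


# Hypothetical NODE rows as (tagName, level), listed in document order.
NODE = [
    ("a", 1), ("a", 2), ("b", 3), ("c", 3), ("d", 4), ("e", 4),
    ("a", 2), ("b", 3), ("e", 4), ("d", 3), ("c", 4),
]

bit_tag = build_bitmap_index(NODE, key=lambda row: row[0])        # bitTag
bit_tag_level = build_bitmap_index(NODE, key=lambda row: row)     # bitTag+
```

Here `bit_tag["a"]` marks every `a` element in one vector, while `bit_tag_level` splits the same elements into one vector per (tag, level) pair, which is the finer partitioning the slide refers to.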
  8. 8. Additional Indexes
      bitAnc: a bit-vector that represents the terminal elements corresponding to a certain path and all their ancestors.
      bitDesc: a bit-vector that represents the terminal elements corresponding to a certain path and all their descendants.
      Figure: the subtree covered by the three bit-vectors bitPath, bitAnc, and bitDesc for PathId=2, i.e. /A/A/B (elements a1 to a4, b1 to b3, e2, c3, and d3).
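The two definitions above can be made concrete on a small tree. The sketch below derives bitPath, bitAnc, and bitDesc for /A/A/B from a parent array. It is only an illustration under stated assumptions: the function names, the `PARENT` and `TAGS` arrays (modeled on the first eleven elements of the sample document), and the string rendering are all hypothetical.

```python
# Hypothetical tree in document (preorder) order; PARENT[i] is the position
# of the parent of element i, and -1 marks the root.
TAGS = list("AABCDEABEDC")
PARENT = [-1, 0, 1, 1, 3, 3, 0, 6, 7, 6, 9]


def path_of(pos):
    """Root-to-node path of the element at position pos, e.g. '/A/A/B'."""
    parts = []
    while pos != -1:
        parts.append(TAGS[pos])
        pos = PARENT[pos]
    return "/" + "/".join(reversed(parts))


def bit_path(path):
    """Positions of the terminal elements of the given path."""
    return {p for p in range(len(TAGS)) if path_of(p) == path}


def bit_anc(path):
    """Terminal elements of the path plus all their ancestors."""
    result = set()
    for p in bit_path(path):
        while p != -1:
            result.add(p)
            p = PARENT[p]
    return result


def bit_desc(path):
    """Terminal elements of the path plus all their descendants."""
    result = set(bit_path(path))
    for p in range(len(TAGS)):
        # Preorder guarantees a parent is visited before its children.
        if PARENT[p] in result:
            result.add(p)
    return result


def to_bits(positions):
    """Render a set of positions as a bit-string, position 0 leftmost."""
    return "".join("1" if p in positions else "0" for p in range(len(TAGS)))
```

For /A/A/B in this tree, the terminal elements sit at positions 2 and 7, bitAnc adds the `A` elements above them, and bitDesc adds the `E` element below position 7.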
  9. 9. Two Types of Indexes
      Basic index:
        Bit-vectors are built on a single column or a group of columns.
        Requires labeled values and reading records.
      Hybrid index:
        A combination of two different indexes.
        descTag: bitDesc & bitTag
        bitTwig: bitPath & bitAnc
        Does not require labeled values to compute twig solutions.
  10. 10. Identifying Element Relationship with Bit-vectors
      For a query //A//B, can the pairs (a1, b1) and (a2, b2) be solutions?
      Figure: bit-vectors over element positions 0 to 15 (for example 1100001000010000, with 1s at the positions of a1, a2, a3, and a4) for the paths P0: /A, P1: /A/A, and P2: /A/A/B, shown next to the document tree (a1 = 0, a2 = 1, b1 = 2, a3 = 6, b2 = 7, a4 = 11, b3 = 12).
  11. 11. Advancing Cursors
      Choose the minimum position value among the current 1s as the current element for a query node.
      Check whether a 1 exists in the interval between pos(a) and pos(d).
      Look ahead at the next 1.
      Figure: cursors over the bit-vectors of P0: /A and P1: /A/A for the query node q: //A, showing Currq, Current1, Next1, and the end-of-vector mark (eov).
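The first rule on this slide, picking the minimum position among the current 1s, amounts to a merge over several bit-vectors. The sketch below shows that merge for the query node //A, which is served by the vectors of P0: /A and P1: /A/A. The function names and the 16-bit example vectors are assumptions made for illustration; bit-vectors are plain strings with position 0 on the left.

```python
def next_one(bits, start):
    """Position of the first '1' at or after start, or None at end of vector."""
    pos = bits.find("1", start)
    return None if pos == -1 else pos


def stream_for_query_node(bit_vectors):
    """Yield element positions for one query node in document order.

    One cursor is kept per bit-vector; at each step the cursor with the
    minimum current position supplies the current element and then advances.
    """
    cursors = [next_one(bits, 0) for bits in bit_vectors]
    while any(c is not None for c in cursors):
        live = [k for k, c in enumerate(cursors) if c is not None]
        k = min(live, key=lambda i: cursors[i])
        yield cursors[k]
        # Look ahead to the next 1 in the vector that was just consumed.
        cursors[k] = next_one(bit_vectors[k], cursors[k] + 1)


# Hypothetical bit-vectors over 16 element positions.
P0 = "1000000000000000"   # /A   : a1 at position 0
P1 = "0100001000010000"   # /A/A : a2, a3, a4 at positions 1, 6, 11

positions = list(stream_for_query_node([P0, P1]))
```

The resulting stream visits the `A` elements in document order even though they are spread over two path-partitioned vectors.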
  12. 12. Optimizations
      Early detection through the absence of a bit-vector.
      Condensing query nodes:
        For path-based partitioning.
        Reduces |INDEX| and |RECORD|.
      Skipping obsolete records with advance(k):
        For tag-based and (tag, level)-based partitioning.
        Reduces |RECORD|.
      Moving cursors over compressed bit-vectors without decompression:
        A composite cursor moves over a bit-vector compressed with a run-length encoding scheme.
        Reduces |INDEX|.
      Figure: a twig pattern condensed to the path P: //A/B/C, and an advance(11) example with cursor CA = 11 on 10000000000100000 and cursor CB = 4 on 00001000010000100.
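The advance(11) example in the figure can be traced with a few lines of code. This is a sketch of the skipping idea only, assuming that positions follow document order so that a descendant always has a larger position than its ancestor; the function `advance` and its return convention are hypothetical, not taken from the slides.

```python
def advance(bits, k):
    """Move a cursor to the first '1' at position >= k.

    Returns (new_position, skipped), where skipped counts the 1s strictly
    between the start of the vector and k that never need to be read.
    new_position is None when no 1 remains.
    """
    pos = bits.find("1", k)
    skipped = bits.count("1", 0, k)
    return (None if pos == -1 else pos), skipped


# Bit-vectors from the slide's example.
A = "10000000000100000"   # cursor CA currently at position 11
B = "00001000010000100"   # cursor CB currently at position 4

# B elements before position 11 cannot be descendants of the A element at 11,
# so the B cursor jumps ahead instead of reading those records.
new_cb, skipped = advance(B, 11)
```

In this example the B cursor lands on position 14, and the records at positions 4 and 9 are never fetched, which is the |RECORD| reduction the slide mentions.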
  13. 13. Compressed Bit-vector
      (a) An original bit-vector with 8,000 bits: a 31-bit group, 256 * 31 zero bits, another 31-bit group, and 2 remaining bits.
      (b) Bits are grouped in units of 31 bits, and identical adjacent groups are merged.
      (c) Each group is encoded as one word (4 bytes on a 32-bit machine): an uncompressed word holds 31 literal bits, a compressed word holds a run length (here 256), and the leftover bits go into a remaining word.
      Cursor C = { C.position, // integer position value (logical address)
                   C.word,     // the current word C is located at
                   C.bit,      // the position of the bit C is visiting, in C.word
                   C.rest }    // the bit position in the remaining word
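Steps (a) to (c) can be sketched as a small run-length encoder over 31-bit groups. The code below is an illustrative approximation, not the encoding used in the talk: words are modeled as Python tuples rather than packed 32-bit integers, and the names `compress`, `"literal"`, and `"fill"` are assumptions. The example vector is constructed to have the same shape as the slide's 8,000-bit vector.

```python
GROUP = 31  # bits per group, as on the slide


def compress(bits):
    """Split a bit-string into 31-bit groups and merge uniform runs.

    Returns (words, tail): words is a list of ("literal", group) or
    ("fill", bit, run_length) tuples, and tail holds the leftover bits
    that do not fill a whole group (the "remaining word").
    """
    words = []
    full = len(bits) - len(bits) % GROUP
    for i in range(0, full, GROUP):
        group = bits[i:i + GROUP]
        if group in ("0" * GROUP, "1" * GROUP):
            last = words[-1] if words else None
            if last and last[0] == "fill" and last[1] == group[0]:
                # Extend the current run instead of storing another group.
                words[-1] = ("fill", group[0], last[2] + 1)
            else:
                words.append(("fill", group[0], 1))
        else:
            words.append(("literal", group))
    return words, bits[full:]


# An 8,000-bit vector shaped like the slide's example (contents assumed).
g1 = "0001000000001" + "0" * 16 + "11"      # 31 literal bits
g3 = "0" * 30 + "1"                          # 31 literal bits
bits = g1 + "0" * (256 * GROUP) + g3 + "00"  # 2 bits remain at the end

words, tail = compress(bits)
```

The 8,000 bits collapse into two literal words, one fill word with run length 256, and a two-bit remainder.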
  14. 14. Moving a Cursor over a Compressed Bit-vector
      (a) Get the position of the next 1:
        Start at C = {31, 0, 31, 0}.
        Skip examining the 31 * 256 bits covered by the compressed word (run length 256).
        The cursor ends at C = {7998, 2, 31, 0}.
      (b) Check the bit value at position 3,000:
        Start at C = {31, 0, 31, 0}; the distance to move is 2,969 = 3,000 - 31.
        Since 31 * 256 = 7,936 > 2,969, the bit we are looking for is within word 1.
  15. 15. Experiments
      Datasets:
        Synthetic: XMark
        Real: DBLP, Treebank, Swiss-prot
      Query sets (table not included in the transcript)
  16. 16. Statistics of Dataset and Indexes
      The number of distinct paths varies widely across the datasets.
      The number of distinct tag names does not differ much.
      Index build time is largely affected by attribute cardinality.
      Index size is smaller than the labeled value size in most cases.
  19. 19. Query Execution Time (chart not included in the transcript)
  20. 20. Input Data Size (chart not included in the transcript)
  21. 21. Other Features on bitPath
      The bit-vectors used for a path pattern with //-axes are merged, and the result is put into the bitmap index for next time:
        For a given path //A//B, merge the bit-vectors of P:/A/A/B and P:/A/B.
        The merged bit-vector acts like a pre-computed join index.
        A path pattern with //-axes can then be represented by a single bit-vector.
      Logical operations (OR, NOT) are supported simply by the bitwise logical operations &, |, ^.
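The merging step described on this slide can be sketched as an OR over every bitPath vector whose path matches the //-pattern, with the result cached under the pattern itself. Everything below is an assumption made for illustration: the `BIT_PATH` contents, the regex-based matcher, and the function names are hypothetical, and bit-vectors are modeled as Python integers where bit i stands for element position i.

```python
import re


def bits(*positions):
    """Build an integer bit-vector with the given positions set."""
    return sum(1 << p for p in positions)


# Hypothetical bitPath index: concrete path -> bit-vector.
BIT_PATH = {
    "/A": bits(0),
    "/A/A": bits(1, 6),
    "/A/A/B": bits(2, 7),
    "/A/B": bits(11),
    "/A/A/B/E": bits(8),
}


def to_regex(pattern):
    """Translate a path pattern with / and // axes into a regex over paths."""
    out = ""
    for axis, tag in re.findall(r"(//|/)([^/]+)", pattern):
        # '//' allows any number of intermediate steps before the tag.
        out += "/(?:[^/]+/)*" if axis == "//" else "/"
        out += re.escape(tag)
    return re.compile("^" + out + "$")


def merged_vector(pattern, index):
    """OR together all matching vectors and cache the result in the index."""
    if pattern not in index:
        rx = to_regex(pattern)
        merged = 0
        for path, vector in list(index.items()):
            if rx.match(path):
                merged |= vector
        index[pattern] = merged  # reused the next time the pattern appears
    return index[pattern]


all_b = merged_vector("//A//B", BIT_PATH)

# NOT via XOR and AND: remove from all_b the elements also set in `marked`.
marked = bits(2)
not_marked = all_b ^ (all_b & marked)
```

After the first call, //A//B is answered from a single cached vector, and the last two lines show how `^` and `&` combine to express a negation over such vectors.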
  22. 22. Twig Queries with Logical Operations
      //A[./B/C or ./B/D]//E uses the bit-vectors
        P//A,
        P//A//B//X ≡ P//A//B//C ∨ P//A//B//D,
        P//A//E
      where the query node X stands for (C|D).
      //A[./B/not(C)]//E uses the bit-vectors
        P//A,
        P//A//E,
        P//A/B ⓧ (P//A/B ⊙ A//A/B/C)
      Figure: the twig patterns of both queries, with the nodes X = (C|D) and ¬C.
  23. 23. Conclusions
      We investigated the possibilities of bitmap indexes for XML query processing:
        Partitioning XML stored in an RDB in various ways.
        Cursor movements do not require decompression of bit-vectors.
      We devised a way to identify element relationships with only a bitmap index, bitTwig.
      Our experiments showed that bitTwig was best for queries against shallow XML documents.
      For deep XML documents, bitTag with advance(k) showed the best performance.
      Future work: evaluating our system with more HTJ algorithms and other indexes.
  24. 24. Thanks! Questions?