Bitmap Indexes for Relational XML Twig Query Processing
The slides I presented at CIKM'09

Transcript

  • 1. Bitmap Indexes for Relational XML Twig Query Processing
       Kyong-Ha Lee and Bongki Moon
       The University of Arizona
  • 2. XML Data and Queries
       CIKM'09, Hong Kong
       Sample document:
         <a>
           <a>
             <b>t1</b>
             <c>
               <d>t2</d>
               <e>t3</e>
             </c>
           </a>
           <a>
             <b>
               <e>t4</e>
             </b>
             <d>
               <c>t5</c>
             </d>
           </a>
           . . . . .
         </a>
       [Figure: the document tree. Each element carries a document-order position and a label (start, end, level), e.g. a1 at position 0 with (1, 32, 1), a2 at position 1 with (2, 11, 2), a3 at position 6 with (12, 21, 2), and a4 at position 11 with (22, 31, 2). Labels of the level-3 and level-4 elements are shown but cannot be matched to nodes from the transcript.]
       Example queries, each drawn as a twig pattern: //A/B/C, //A[//B]//C, //A[./B/C]//E
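The three-number labels in the tree figure look like the usual region encoding, so a structural relationship can be tested with integer comparisons alone. The following is a minimal sketch under that assumption; the names `Region`, `is_ancestor`, and `is_parent` are illustrative and do not come from the slides.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """Assumed meaning of a slide label (start, end, level)."""
    start: int
    end: int
    level: int


def is_ancestor(a: Region, d: Region) -> bool:
    # a encloses d when d's interval lies strictly inside a's interval
    return a.start < d.start and d.end < a.end


def is_parent(a: Region, d: Region) -> bool:
    # a parent is an ancestor exactly one level above
    return is_ancestor(a, d) and a.level + 1 == d.level


# Labels taken from the figure on slide 2
a1 = Region(1, 32, 1)
a2 = Region(2, 11, 2)
a3 = Region(12, 21, 2)
leaf = Region(3, 4, 3)
```

With these labels, `a1` is the parent of `a2`, `a2` is an ancestor of the element labeled (3, 4, 3), and `a3` is not.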
  • 3. XML Stored in RDB
       [Tables: NODE table and PATH table; their contents are not captured in the transcript.]
  • 4. Twig Join
       To answer a twig query:
         - A twig pattern is decomposed into several path patterns.
         - Path solutions are joined together to compose the final result.
       Holistic Twig Join (HTJ) algorithm:
         - A specialized multi-way sort-merge join.
         - Guarantees I/O optimality for a certain subset of XML queries.
         - The optimality depends on how the elements are partitioned.
         - Uses stacks and streams in which elements are kept in sorted order.
       [Figure: a twig pattern over the query nodes A, B, C, and E, with one stack and one stream (SA, SB, SC, SE) per query node.]
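The first step above, decomposing a twig pattern into path patterns, can be sketched as a walk from the root of the pattern to each leaf. This is only an illustration: the tuple representation `(tag, axis, children)` and the function name `root_to_leaf_paths` are assumptions, not the authors' data structures.

```python
def root_to_leaf_paths(node, prefix=""):
    """Return one path pattern per leaf of a twig pattern.

    A node is (tag, axis, children), where axis is "/" or "//" and
    describes the edge that leads into the node.
    """
    tag, axis, children = node
    path = prefix + axis + tag
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths


# The twig //A[./B/C]//E from slide 2
twig = ("A", "//", [
    ("B", "/", [("C", "/", [])]),
    ("E", "//", []),
])
paths = root_to_leaf_paths(twig)
```

For this twig the result is the two path patterns `//A/B/C` and `//A//E`, whose solutions would then be joined on the shared node A.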
  • 5. Motivation
       Discrepancy between XML stored in an RDB and conventional HTJ algorithms:
         - Logical: streams vs. tables.
         - Physical: partitioned vs. record-oriented.
         - Supporting actual data, including a large volume of text, requires references to records.
       How to feed tuples to an HTJ algorithm?
       What is the best partitioning scheme for XML stored in an RDB?
       Bitmap index, a conventional index in RDBMSs:
         - An efficient way to indicate tuples.
         - Efficient support for logical operations.
       Can we use the bitmap index to support HTJ?
  • 6. HTJ on Different Partitioning Schemes
       Tag-based partitioning:
         - Simple, and a skipping technique can be used to read only the useful elements.
         - Only one stream is accessed per query node.
       Tag+Level partitioning:
         - Closer to I/O optimality; suitable for deep XML.
         - Several streams may be accessed for a single query node.
       Path-based partitioning:
         - Closer to I/O optimality; suitable for shallow XML.
         - A path with //-axes may require accessing many streams for a single query node.
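To make the trade-off concrete, the sketch below partitions the elements of the sample document from slide 2 in the three ways listed and counts how many streams a single query node would touch. The element list is derived by hand from that sample; `partition` and `streams_for_tag` are hypothetical helper names.

```python
from collections import defaultdict

# (tag, level, root-to-node path) for the sample document, in document order
ELEMENTS = [
    ("a", 1, "/a"),
    ("a", 2, "/a/a"),
    ("b", 3, "/a/a/b"),
    ("c", 3, "/a/a/c"),
    ("d", 4, "/a/a/c/d"),
    ("e", 4, "/a/a/c/e"),
    ("a", 2, "/a/a"),
    ("b", 3, "/a/a/b"),
    ("e", 4, "/a/a/b/e"),
    ("d", 3, "/a/a/d"),
    ("c", 4, "/a/a/d/c"),
]


def partition(elements, key):
    """Group document-order positions into streams by a partitioning key."""
    streams = defaultdict(list)
    for pos, element in enumerate(elements):
        streams[key(element)].append(pos)
    return dict(streams)


def streams_for_tag(streams, tag):
    """Keys of the streams that a query node with this tag has to read."""
    def tag_of(key):
        if isinstance(key, tuple):          # (tag, level) key
            return key[0]
        return key.rsplit("/", 1)[-1]       # tag key or path key
    return [key for key in streams if tag_of(key) == tag]


by_tag = partition(ELEMENTS, lambda e: e[0])
by_tag_level = partition(ELEMENTS, lambda e: (e[0], e[1]))
by_path = partition(ELEMENTS, lambda e: e[2])
```

For a query node `//c`, tag-based partitioning reads one stream, while the other two schemes each read two, because `c` occurs at two levels and under two distinct paths in this sample.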
  • 7. Bitmap Index
       How to partition tuples in the NODE table:
         - By building a bitmap index on certain column(s) of the table:
           bitTag for tagName, bitTag+ for (tagName, level), bitPath for the pathId column.
         - The choice determines the I/O optimality of holistic twig join algorithms.
         - During the twig join process, useful tuples are accessed via the bitmap index.
       [Figure: one bit-vector per tag (A, B, E, ...), with each bit referring to a tuple stored in the disk blocks.]
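As a rough illustration of the three indexes named on this slide, the sketch below builds one bit-vector per distinct key over a toy NODE table and then uses a bit-vector to pick out tuples. The table rows are derived by hand from the sample document on slide 2, the path string stands in for the pathId column, and `build_bitmap_index` and `fetch` are assumed names rather than the authors' API.

```python
from collections import defaultdict

# Toy NODE table: (tagName, level, path) in document order
NODE = [
    ("a", 1, "/a"),
    ("a", 2, "/a/a"),
    ("b", 3, "/a/a/b"),
    ("c", 3, "/a/a/c"),
    ("d", 4, "/a/a/c/d"),
    ("e", 4, "/a/a/c/e"),
    ("a", 2, "/a/a"),
    ("b", 3, "/a/a/b"),
    ("e", 4, "/a/a/b/e"),
    ("d", 3, "/a/a/d"),
    ("c", 4, "/a/a/d/c"),
]


def build_bitmap_index(rows, key):
    """Map each distinct key to a bit-vector with a 1 per matching row."""
    index = defaultdict(lambda: [0] * len(rows))
    for pos, row in enumerate(rows):
        index[key(row)][pos] = 1
    return {k: "".join(map(str, bits)) for k, bits in index.items()}


def fetch(rows, bits):
    """Read only the rows whose bit is set."""
    return [rows[i] for i, bit in enumerate(bits) if bit == "1"]


bit_tag = build_bitmap_index(NODE, key=lambda r: r[0])
bit_tag_plus = build_bitmap_index(NODE, key=lambda r: (r[0], r[1]))
bit_path = build_bitmap_index(NODE, key=lambda r: r[2])
```

Here `bit_tag["a"]` has 1s at positions 0, 1, and 6, and `fetch(NODE, bit_path["/a/a/b"])` returns just the two `b` tuples, without scanning the other rows by tag.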
  • 8. Additional Indexes
       bitAnc: a bit-vector that represents the terminal elements corresponding to a certain path, together with all their ancestors.
       bitDesc: a bit-vector that represents the terminal elements corresponding to a certain path, together with all their descendants.
       [Figure: the subtree covered by three bit-vectors (bitPath, bitAnc, and bitDesc) for PathId=2, i.e. /A/A/B. The tree fragment shows a1 at position 0; a2, a3, a4 at positions 1, 6, 11; and b1, b2, b3 at positions 2, 7, 12.]
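The two definitions can be sketched over a small tree with parent pointers. The tree below is the sample document from slide 2 (not the tree in this slide's figure), and the arrays `PARENT` and `PATH` as well as the function names are assumptions made for illustration.

```python
# Document-order positions of the sample document; PARENT[i] is the
# position of element i's parent (None for the root).
PARENT = [None, 0, 1, 1, 3, 3, 0, 6, 7, 6, 9]
PATH = ["/a", "/a/a", "/a/a/b", "/a/a/c", "/a/a/c/d", "/a/a/c/e",
        "/a/a", "/a/a/b", "/a/a/b/e", "/a/a/d", "/a/a/d/c"]


def to_bits(positions):
    return "".join("1" if i in positions else "0" for i in range(len(PATH)))


def bit_path(path):
    """Terminal elements: those whose root-to-node path equals `path`."""
    return {i for i, p in enumerate(PATH) if p == path}


def bit_anc(path):
    """Terminal elements plus all of their ancestors."""
    covered = set()
    for pos in bit_path(path):
        while pos is not None:
            covered.add(pos)
            pos = PARENT[pos]
    return covered


def bit_desc(path):
    """Terminal elements plus all of their descendants."""
    covered = set(bit_path(path))
    # In document order a parent always precedes its children,
    # so one forward pass is enough.
    for pos, parent in enumerate(PARENT):
        if parent in covered:
            covered.add(pos)
    return covered
```

For the path `/a/a/b`, the terminal elements sit at positions 2 and 7; the ancestor vector additionally covers positions 0, 1, and 6, and the descendant vector additionally covers position 8.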
  • 9. Two Types of Indexes
       Basic index:
         - Bit-vectors are built on a single column or a group of columns.
         - Requires labeled values and reading records.
       Hybrid index:
         - A combination of two different indexes:
           descTag: bitDesc & bitTag
           bitTwig: bitPath & bitAnc
         - Does not require labeled values to compute twig solutions.
  • 10. Identifying Element Relationship with Bit-vectors
        For the query //A//B, can the pairs (a1, b1) and (a2, b2) be solutions?
        [Figure: bit-vectors over positions 0 to 15 (one of them shown as 1100001000010000) for the paths P0: /A, P1: /A/A, and P2: /A/A/B, next to the tree fragment with a1 at position 0; a2, a3, a4 at positions 1, 6, 11; and b1, b2, b3 at positions 2, 7, 12.]
  • 11. Advancing Cursors
        Choose the minimum position value among the current 1's as the current element for a query node.
        Check whether a 1 exists in the interval between pos(a) and pos(d).
        Look ahead at the next 1.
        [Figure: the query node q: //A is served by the bit-vectors P0: /A and P1: /A/A. The cursors Current1 and Next1 move over the 1's (positions 0, 1, 6 are shown) until eov; Currq holds the current element, labeled (0,0,1).]
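One way to read the cursor rules above is as a merge over several bit-vectors that serve the same query node. The sketch below is an interpretation, not the authors' code: the vectors for P0, P1, and P2 are reconstructed from the positions in the tree figure, the interval check is taken to mean "strictly between two positions", and all function names are hypothetical.

```python
def next_one(bits, start):
    """Position of the first 1 at or after `start`, or None at end of vector."""
    pos = bits.find("1", start)
    return None if pos == -1 else pos


def elements_for_query_node(bit_vectors):
    """Yield positions in document order by always taking the minimum cursor."""
    cursors = [next_one(bits, 0) for bits in bit_vectors]
    while any(c is not None for c in cursors):
        live = [k for k, c in enumerate(cursors) if c is not None]
        k = min(live, key=lambda i: cursors[i])
        yield cursors[k]
        # look ahead to the next 1 in the vector that was just consumed
        cursors[k] = next_one(bit_vectors[k], cursors[k] + 1)


def has_one_between(bits, lo, hi):
    """True if some 1 lies strictly between positions lo and hi."""
    pos = next_one(bits, lo + 1)
    return pos is not None and pos < hi


P0 = "1000000000000000"   # /A      (a1 at position 0)
P1 = "0100001000010000"   # /A/A    (a2, a3, a4 at positions 1, 6, 11)
P2 = "0010000100001000"   # /A/A/B  (b1, b2, b3 at positions 2, 7, 12)

a_positions = list(elements_for_query_node([P0, P1]))
```

The query node //A then sees the positions 0, 1, 6, 11 in order, even though they come from two different bit-vectors.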
  • 12. Optimizations
        Early detection with a bit-vector absence.
        Condensing query nodes:
          - For path-based partitioning.
          - Reduces |INDEX| and |RECORD|.
        Skipping the reading of obsolete records with advance(k):
          - For tag-based and (tag, level)-based partitioning.
          - Reduces |RECORD|.
        Moving cursors over compressed bit-vectors with no decompression:
          - A composite cursor moves over a bit-vector compressed by a run-length encoding scheme.
          - Reduces |INDEX|.
        [Figure: for the path P: //A/B/C, cursor CA = 11 on the bit-vector 10000000000100000 and cursor CB = 4 on 00001000010000100, with advance(11).]
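The advance(k) example in the figure can be replayed on the two bit-vectors shown. This is a sketch of one plausible behaviour, namely jumping a cursor to the first 1 at or after position k; the function names and the choice to return the vector length at the end are assumptions.

```python
def advance(bits, k):
    """Move to the first 1 at or after position k (len(bits) if none)."""
    pos = bits.find("1", k)
    return len(bits) if pos == -1 else pos


def skipped_records(bits, cursor, k):
    """Number of 1's after `cursor` and before `k` that are never read."""
    return bits.count("1", cursor + 1, k)


A = "10000000000100000"   # cursor CA = 11
B = "00001000010000100"   # cursor CB = 4

cb_after = advance(B, 11)
```

With CA at 11 and CB at 4, `advance(B, 11)` lands on position 14, and the B record at position 9 is skipped without being read, which is where the reduction in |RECORD| would come from.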
  • 13. Compressed Bit-vector
        (a) An original bit-vector with 8,000 bits:
            000100000000100000000000000011 00000000000 . . . 00000000000000 0000000000000000000000000000001 00
            (31 bits, then 256 * 31 bits, then 31 bits, then 2 bits)
        (b) Grouping into units of 31 bits and merging identical groups.
        (c) Encoding each group as one word (4 bytes on a 32-bit machine):
            000010…010…011    uncompressed word (31 literal bits)
            100… 0100000000   compressed word (run-length is 256)
            000…001           uncompressed word (31 literal bits)
            000…000           remaining word
        Cursor C = { C.position,  // integer position value (logical address)
                     C.word,      // the current word C is located at
                     C.bit,       // the position of the bit C is visiting, in C.word
                     C.rest }     // the bit position in the remaining word
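Steps (b) and (c) can be sketched as follows. The scheme on the slide resembles word-aligned run-length encoding, but the sketch does not pack real 32-bit words: it represents each word as a Python tuple, and the names `compress` and `decompress` as well as the tuple layout are assumptions made for readability.

```python
GROUP = 31  # payload bits per 32-bit word


def compress(bits):
    """Return (words, rest).

    words: ("lit", 31-bit string) or ("fill", bit, run length in groups)
    rest:  trailing bits that do not fill a whole 31-bit group
    """
    full = len(bits) - len(bits) % GROUP
    words = []
    for i in range(0, full, GROUP):
        group = bits[i:i + GROUP]
        if group in ("0" * GROUP, "1" * GROUP):
            if words and words[-1][0] == "fill" and words[-1][1] == group[0]:
                # merge with the preceding identical group
                words[-1] = ("fill", group[0], words[-1][2] + 1)
            else:
                words.append(("fill", group[0], 1))
        else:
            words.append(("lit", group))
    return words, bits[full:]


def decompress(words, rest):
    parts = [w[1] if w[0] == "lit" else w[1] * (GROUP * w[2]) for w in words]
    return "".join(parts) + rest


# An 8,000-bit vector shaped like the one on the slide:
# 31 literal bits, 256 all-zero groups, 31 literal bits, 2 leftover bits.
first = "0" * 3 + "1" + "0" * 8 + "1" + "0" * 16 + "11"
original = first + "0" * (256 * GROUP) + "0" * 30 + "1" + "00"
words, rest = compress(original)
```

The 8,000 bits shrink to three words plus the two leftover bits: a literal, one fill word with run-length 256, and another literal.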
  • 14. Moving a Cursor over a Compressed Bit-vector
        Words: 000010…010…011 | 100… 0100000000 (run-length is 256) | 000…001 | 000…000 (remaining word)
        (a) Get the position of the next 1:
            C = {31, 0, 31, 0}; skip examining 31 * 256 bits; C = {7998, 2, 31, 0}.
        (b) Check the bit value at position 3,000:
            C = {31, 0, 31, 0}, with distance to move 2,969 = (3000 - 31).
            Since 31 * 256 > 2,969, the bit we are looking for is within word 1.
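Both cursor operations can be sketched over the compressed form without expanding the fill word. The word list below mirrors the slide's example using Python tuples rather than packed 32-bit words, and the functions use 0-based positions, so the slide's position 7998 appears here as 7997. All names are illustrative assumptions.

```python
GROUP = 31
FIRST = "0" * 3 + "1" + "0" * 8 + "1" + "0" * 16 + "11"
WORDS = [("lit", FIRST), ("fill", "0", 256), ("lit", "0" * 30 + "1")]
REST = "00"  # remaining bits that do not fill a 31-bit group


def word_span(word):
    """Number of original bits a word stands for."""
    return GROUP if word[0] == "lit" else GROUP * word[2]


def bit_at(words, rest, pos):
    """Bit value at 0-based position `pos`, stepping over whole words."""
    for word in words:
        span = word_span(word)
        if pos < span:
            return word[1][pos] if word[0] == "lit" else word[1]
        pos -= span
    return rest[pos]


def next_one(words, rest, start):
    """0-based position of the first 1 at or after `start`, or None."""
    base = 0
    for word in words:
        span = word_span(word)
        if start < base + span:
            offset = max(start - base, 0)
            if word[0] == "lit":
                hit = word[1].find("1", offset)
                if hit != -1:
                    return base + hit
            elif word[1] == "1":
                return base + offset
            # a zero-fill word is skipped in one step
        base += span
    hit = rest.find("1", max(start - base, 0))
    return None if hit == -1 else base + hit
```

Starting at position 31, `next_one` skips the 31 * 256 zero bits of the fill word in a single step and stops in the third word. For position 3,000, `bit_at` subtracts the 31 bits of the first word, finds that the remaining 2,969 is smaller than 31 * 256, and reads the answer directly from the fill word.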
  • 15. Experiments
        Datasets:
          - Synthetic: XMark
          - Real: DBLP, Treebank, Swiss-Prot
        Query sets: [table not captured in the transcript]
  • 16. Statistics of Dataset and Indexes
        The number of distinct paths varies considerably.
  • 17. The number of distinct tag names does not differ much.
  • 18. Index build time is largely affected by attribute cardinality.
        Index size is smaller than the labeled value size in most cases.
  • 19. Query Execution Time
        [Chart not captured in the transcript.]
  • 20. Input Data Size
        [Chart not captured in the transcript.]
  • 21. Other Features on bitPath
        Merging the bit-vectors used for a path pattern with //-axes and putting the result into the bitmap index for the next time:
          - For a given path //A//B, the bit-vectors involved are those of P: /A/A/B and P: /A/B.
          - The merged bit-vector acts like a pre-computed join index.
          - A path pattern with //-axes can then be represented by a single bit-vector.
        Logical operations such as OR and NOT are supported directly by the bitwise logical operations &, |, ^.
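The merge-and-reuse idea can be sketched with Python integers as bit-vectors, where bit i stands for position i. The positions below are made up for illustration, the list of paths matching `//A//B` is supplied by hand, and `vector`, `positions`, `lookup`, and `complement` are hypothetical names.

```python
def vector(*set_positions):
    """Bit-vector as an int with bit i set for each listed position i."""
    v = 0
    for p in set_positions:
        v |= 1 << p
    return v


def positions(v):
    return [i for i in range(v.bit_length()) if v >> i & 1]


N = 13  # number of tuples in this toy index
bit_path = {
    "/A/B": vector(3, 9),
    "/A/A/B": vector(5, 12),
    "/A/A/C": vector(6),
}
merged = {}  # merged vectors, keyed by the //-pattern they answer


def lookup(pattern, matching_paths):
    """OR the vectors of all matching paths once, then reuse the result."""
    if pattern not in merged:
        v = 0
        for path in matching_paths:
            v |= bit_path[path]
        merged[pattern] = v
    return merged[pattern]


def complement(v):
    """NOT, expressed as XOR against an all-ones mask of length N."""
    return v ^ ((1 << N) - 1)


b_under_a = lookup("//A//B", ["/A/B", "/A/A/B"])
```

After the first call, `//A//B` is answered from the single cached vector, and `complement` shows how a NOT can be expressed with `^`.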
  • 22. Twig Queries with Logical Operations
        //A[./B/C or ./B/D]//E is evaluated with:
          P//A,
          P//A//B//X ≡ P//A//B//C ∨ P//A//B//D,
          P//A//E
        //A[./B/not(C)]//E is evaluated with:
          P//A,
          P//A//E,
          P//A/B ⓧ (P//A/B ⊙ A//A/B/C)
        [Figure: the twig patterns of both queries, with X standing for (C|D) in the first and ¬C marking the negated node in the second.]
  • 23. Conclusions
        We investigated the possibilities of bitmap indexes for XML query processing:
          - Partitioning XML stored in an RDB in various ways.
          - Cursor movements do not require decompression of bit-vectors.
        We devised a way to identify element relationships with only a bitmap index, bitTwig.
        Our experiments showed that bitTwig was best for queries against shallow XML documents.
        For deep XML documents, bitTag with advance(k) showed the best performance.
        Future work: evaluating our system with more HTJ algorithms and other indexes.
  • 24. Thanks! Questions?
