Bitmap Indexes for Relational XML Twig Query Processing

The slides I presented at CIKM'09

Presentation Transcript

  • Kyong-Ha Lee and Bongki Moon
    The University of Arizona
    Bitmap Indexes For Relational XML Twig Query Processing
  • CIKM'09, Hong Kong
    XML Data and Queries
    <a>
    <a>
    <b>t1</b>
    <c>
    <d>t2</d>
    <e>t3</e>
    </c>
    </a>
    <a>
    <b>
    <e>t4</e>
    </b>
    <d>
    <c>t5</c>
    </d>
    </a>
    . . . . .
    </a>
    [Figure: the sample document drawn as a tree. Each element (a1 to a4, b1 to b3, c1 to c3, d1 to d3, e1 to e3) is annotated with its position number and a three-number label. The root a1 is at position 0 with label (1, 32, 1); a2, a3, a4 are at positions 1, 6, 11 with labels (2, 11, 2), (12, 21, 2), (22, 31, 2).]
    Example queries:
    //A/B/C
    //A[//B]//C
    //A[./B/C]//E
    [Figure: each query drawn as a twig pattern over the query nodes A, B, C, and E.]
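
The three-number labels in the tree figure look like region labels of the form (start, end, level). That reading is an assumption, since the transcript does not name the fields. Under it, structural relationships reduce to interval containment, as in this minimal Python sketch (the names `Label`, `is_ancestor`, and `is_parent` are illustrative, not taken from the slides):

```python
from typing import NamedTuple


class Label(NamedTuple):
    """Assumed region label: (start, end, level)."""
    start: int
    end: int
    level: int


def is_ancestor(anc: Label, desc: Label) -> bool:
    # An ancestor's interval strictly contains the descendant's interval.
    return anc.start < desc.start and desc.end < anc.end


def is_parent(par: Label, child: Label) -> bool:
    # A parent is an ancestor that sits exactly one level above.
    return is_ancestor(par, child) and child.level == par.level + 1


root = Label(1, 32, 1)    # label shown for a1
a2 = Label(2, 11, 2)      # label shown for a2
inner = Label(3, 4, 3)    # a level-3 label from the figure
other = Label(13, 16, 3)  # a level-3 label outside a2's interval

print(is_ancestor(root, inner))  # True
print(is_parent(a2, inner))      # True
print(is_parent(root, inner))    # False
print(is_ancestor(a2, other))    # False
```
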
  • XML Stored in RDB
    [Figure: the XML document stored in two relational tables, a NODE table and a PATH table. The table contents are not recoverable from the transcript.]
  • Twig Join
    To answer a twig query:
    A twig pattern is decomposed into several path patterns.
    Path solutions are joined together to compose the final result.
    Holistic Twig Join (HTJ) algorithm:
    A specialized multi-way sort-merge join.
    Guarantees I/O optimality for a certain subset of XML queries.
    The optimality depends on how the elements are partitioned.
    Uses stacks and streams in which elements are kept in sorted order.
    [Figure: a twig pattern over the query nodes A, B, C, and E, with a stack and an input stream for each query node (labels SA, SB, SC, SE).]
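
The decomposition step can be illustrated with a small Python sketch that enumerates the root-to-leaf path patterns of a twig. It covers only the decomposition, not the holistic join itself, and the `QNode` class and `root_to_leaf_paths` function are hypothetical names introduced for this example:

```python
from dataclasses import dataclass, field


@dataclass
class QNode:
    """A query node; `axis` is the axis linking it to its parent ('/' or '//')."""
    tag: str
    axis: str = "//"
    children: list = field(default_factory=list)


def root_to_leaf_paths(node, prefix=""):
    # Extend the path with this node, then recurse into every branch.
    path = f"{prefix}{node.axis}{node.tag}"
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths


# The twig //A[./B/C]//E from the earlier slide.
twig = QNode("A", "//", [QNode("B", "/", [QNode("C", "/")]), QNode("E", "//")])
print(root_to_leaf_paths(twig))  # ['//A/B/C', '//A//E']
```
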
  • Motivation
    Discrepancy between XML in RDB and conventional HTJ algorithms:
    Logical: streams vs. tables
    Physical: partitioned vs. record-oriented
    Supporting actual data, including large volumes of text, requires references to records.
    How should tuples be fed to an HTJ algorithm?
    What is the best partitioning scheme for XML stored in RDB?
    Bitmap index, a conventional index in RDBMSs:
    An efficient way to identify tuples.
    Efficient support for logical operations.
    Can we use the bitmap index to support HTJ?
  • HTJ on Different Partitioning Schemes
    Tag-based partitioning
    Simple; a skipping technique can be used to read only the useful elements.
    Only one stream is accessed per query node.
    Tag+Level partitioning
    Better I/O optimality; suitable for deep XML.
    Several streams may be accessed for a single query node.
    Path-based partitioning
    Better I/O optimality; suitable for shallow XML.
    A path with //-axes may require accessing many streams for a single query node.
  • Bitmap Index
    How to partition tuples in the NODE table:
    By building a bitmap index on certain column(s) of the table:
    bitTag for tagName,
    bitTag+ for (tagName, Level),
    bitPath for the pathId column.
    The choice determines the I/O optimality of holistic twig join algorithms.
    During the twig join, useful tuples are accessed via the bitmap index.
    [Figure: one bit-vector per key (A, B, E, . . .), for example 0010000100 and 0000010000, with each bit referring to a tuple stored in the disk blocks.]
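
As a rough illustration of how the three basic indexes partition the same tuples, the Python sketch below builds bit-vectors keyed by tag, by (tag, level), and by path. The `NODE` rows are a hypothetical excerpt that keeps only the A and B elements at the positions shown in the slides, and Python integers stand in for bit-vectors; none of this is the authors' implementation.

```python
from collections import defaultdict

N_BITS = 16

# Hypothetical NODE rows: (position, tagName, level, path).
NODE = [
    (0, "A", 1, "/A"),
    (1, "A", 2, "/A/A"),
    (2, "B", 3, "/A/A/B"),
    (6, "A", 2, "/A/A"),
    (7, "B", 3, "/A/A/B"),
    (11, "A", 2, "/A/A"),
    (12, "B", 3, "/A/A/B"),
]


def build_bitmap(rows, key):
    # One bit-vector per distinct key value; bit i is set for the tuple at position i.
    index = defaultdict(int)
    for row in rows:
        index[key(row)] |= 1 << row[0]
    return dict(index)


def to_bits(vec, n_bits=N_BITS):
    # Render a bit-vector left to right, position 0 first.
    return "".join("1" if (vec >> i) & 1 else "0" for i in range(n_bits))


bit_tag = build_bitmap(NODE, key=lambda r: r[1])
bit_tag_plus = build_bitmap(NODE, key=lambda r: (r[1], r[2]))
bit_path = build_bitmap(NODE, key=lambda r: r[3])

print(to_bits(bit_tag["A"]))            # 1100001000010000
print(to_bits(bit_tag_plus[("A", 1)]))  # 1000000000000000
print(to_bits(bit_path["/A/A"]))        # 0100001000010000
```

The coarser the key, the fewer bit-vectors a query node has to consult; the finer the key, the fewer irrelevant tuples each bit-vector selects.
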
  • Additional Indexes
    bitAnc: a bit-vector that represents the terminal elements corresponding to a certain path and all their ancestors.
    bitDesc: a bit-vector that represents the terminal elements corresponding to a certain path and all their descendants.
    [Figure: the subtree covered by the three bit-vectors bitPath, bitAnc, and bitDesc for PathId=2, i.e. /A/A/B. It shows a1 at position 0; a2, a3, a4 at positions 1, 6, 11; b1, b2, b3 at positions 2, 7, 12; and e2, d3, c3 at positions 8, 13, 14.]
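
The two definitions can be made concrete with a short Python sketch. The parent map below is a hypothetical tree loosely modeled on the figure (only the positions visible in the transcript are included), and `bit_anc` and `bit_desc` are illustrative helper names rather than the authors' code.

```python
# Hypothetical parent map: position -> parent position (None for the root).
PARENT = {0: None, 1: 0, 2: 1, 6: 0, 7: 6, 8: 7, 11: 0, 12: 11, 13: 12, 14: 13}
TERMINALS = [2, 7, 12]  # assumed positions of the elements on path /A/A/B


def bit_anc(terminals, parent):
    # Set the bit of every terminal element and of each of its ancestors.
    vec = 0
    for pos in terminals:
        while pos is not None:
            vec |= 1 << pos
            pos = parent[pos]
    return vec


def bit_desc(terminals, parent):
    # Set the bit of every element that is a terminal or lies below one.
    targets = set(terminals)
    vec = 0
    for pos in parent:
        cur = pos
        while cur is not None:
            if cur in targets:
                vec |= 1 << pos
                break
            cur = parent[cur]
    return vec


def positions(vec):
    return [i for i in range(vec.bit_length()) if (vec >> i) & 1]


print(positions(bit_anc(TERMINALS, PARENT)))   # [0, 1, 2, 6, 7, 11, 12]
print(positions(bit_desc(TERMINALS, PARENT)))  # [2, 7, 8, 12, 13, 14]
```
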
  • Two Types of Indexes
    Basic index
    Bit-vectors are built on a single column or a group of columns.
    Requires labeled values and reading records.
    Hybrid index
    A combination of two different indexes:
    descTag: bitDesc & bitTag
    bitTwig: bitPath & bitAnc
    Does not require labeled values to compute twig solutions.
  • Identifying Element Relationship with Bit-vectors
    For the query //A//B, can the pairs (a1, b1) and (a2, b2) be solutions?
    [Figure: the tree with a1 at position 0; a2, a3, a4 at positions 1, 6, 11; and b1, b2, b3 at positions 2, 7, 12, shown next to bit-vectors for the paths P0: /A, P1: /A/A, and P2: /A/A/B over bit positions 0 to 15. One bit-vector shown is 1100001000010000, with 1s at positions 0, 1, 6, and 11.]
  • Advancing Cursors
    Choose the minimum position among the current 1s as the current element for a query node.
    Check whether a 1 exists in the interval between pos(a) and pos(d).
    Look ahead at the next 1.
    [Figure: for the query node q: //A, cursors move over the bit-vectors of P0: /A and P1: /A/A. The labels Currq, Current1, and Next1 mark cursor positions, and eov marks the end of a bit-vector.]
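
A minimal Python sketch of these cursor rules follows, assuming bit-vectors are held as Python integers and that a query node such as //A is served by several path bit-vectors. The helper names `ones`, `merged_positions`, and `has_one_in` are hypothetical, and the interval check treats both endpoints as inclusive, which is also an assumption.

```python
import heapq


def ones(vec):
    # Yield the positions of the 1s in ascending order.
    pos = 0
    while vec:
        if vec & 1:
            yield pos
        vec >>= 1
        pos += 1


def merged_positions(vectors):
    # The query node's next element is always the minimum among the
    # current 1s of its bit-vectors, which is a k-way merge.
    return heapq.merge(*(ones(v) for v in vectors))


def has_one_in(vec, lo, hi):
    # True if any bit in the inclusive range [lo, hi] is set.
    mask = ((1 << (hi + 1)) - 1) & ~((1 << lo) - 1)
    return (vec & mask) != 0


P0 = 1 << 0                         # /A   : position 0
P1 = (1 << 1) | (1 << 6) | (1 << 11)  # /A/A : positions 1, 6, 11

print(list(merged_positions([P0, P1])))  # [0, 1, 6, 11]
print(has_one_in(P1, 2, 5))              # False
print(has_one_in(P1, 2, 6))              # True
```
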
  • Optimizations
    Early detection when a bit-vector is absent
    Condensing query nodes
    For path-based partitioning
    Reduces |INDEX| and |RECORD|
    Skipping obsolete records with advance(k)
    For tag-based and (tag, level)-based partitioning
    Reduces |RECORD|
    Moving cursors over compressed bit-vectors without decompression
    A composite cursor moves over a bit-vector compressed with a run-length encoding scheme
    Reduces |INDEX|
    [Figure: twig patterns over the nodes A, B, C, and E, and the path P: //A/B/C.]
    Example of advance(k): cursor CA = 11 is on the bit-vector 10000000000100000 and cursor CB = 4 is on 00001000010000100; advance(11) is then applied so that CB skips the 1s that precede position 11.
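
The skipping step can be sketched in Python as follows. Here `advance(vec, k)` returns the position of the first 1 at or after position k; whether the slide's advance(k) is inclusive of k is not stated, so that choice is an assumption, as are the helper names.

```python
def from_bits(text):
    # Parse a '0'/'1' string whose leftmost character is position 0.
    return sum(1 << i for i, ch in enumerate(text) if ch == "1")


def advance(vec, k):
    # Drop everything before position k, then isolate the lowest set bit.
    shifted = vec >> k
    if shifted == 0:
        return None  # no 1 at or after k
    lowest = shifted & -shifted
    return k + lowest.bit_length() - 1


vec_a = from_bits("10000000000100000")  # 1s at positions 0 and 11
vec_b = from_bits("00001000010000100")  # 1s at positions 4, 9, and 14

print(advance(vec_a, 1))   # 11
print(advance(vec_b, 11))  # 14
print(advance(vec_b, 15))  # None
```

With the cursor on A at 11, the 1s of B at positions 4 and 9 are skipped without reading their records.
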
  • Compressed Bit-vector
    (a) An original bit-vector with 8,000 bits:
    000100000000100000000000000011 00000000000 . . . 00000000000000 0000000000000000000000000000001 00
    (b) Grouping into units of 31 bits and merging identical groups:
    31 bits | 256 × 31 bits | 31 bits | 2 bits
    (c) Encoding each group as one word (4 bytes on a 32-bit machine):
    000010…010…011: uncompressed word holding 31 literal bits
    100…0100000000: compressed word; the run-length is 256
    000…001: uncompressed word holding 31 literal bits
    000…000: remaining word
    Cursor C = {
    C.position, // integer position value (logical address)
    C.word, // the word the cursor is currently located at
    C.bit, // the position, within C.word, of the bit the cursor is visiting
    C.rest // the bit position in the remaining word
    }
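
Steps (a) to (c) resemble a word-aligned run-length scheme. The Python sketch below reproduces the grouping and merging only; it represents each word as a tuple instead of packing it into 32 bits, because the exact bit layout of a word is not recoverable from the transcript. The input is a constructed 8,000-bit vector with the same shape as the slide's example, and all names are hypothetical.

```python
GROUP = 31  # bits per group, as on the slide


def compress(bits):
    """Split a '0'/'1' string into 31-bit groups and merge identical fill groups.

    Returns (words, rest): words are ('literal', group) or ('fill', bit, run),
    and rest holds the trailing bits that do not fill a whole group.
    """
    full = len(bits) - len(bits) % GROUP
    words = []
    for i in range(0, full, GROUP):
        group = bits[i:i + GROUP]
        if group in ("0" * GROUP, "1" * GROUP):
            bit = group[0]
            if words and words[-1][0] == "fill" and words[-1][1] == bit:
                # Extend the previous run instead of emitting a new word.
                words[-1] = ("fill", bit, words[-1][2] + 1)
            else:
                words.append(("fill", bit, 1))
        else:
            words.append(("literal", group))
    return words, bits[full:]


LIT_A = "0001" + "0" * 25 + "11"  # 31 literal bits
LIT_B = "0" * 30 + "1"            # 31 literal bits
vector = LIT_A + "0" * (256 * GROUP) + LIT_B + "00"

words, rest = compress(vector)
print(len(vector))   # 8000
print(len(words))    # 3
print(words[1])      # ('fill', '0', 256)
print(rest)          # 00
```
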
  • Moving a Cursor over a Compressed Bit-vector
    Words: 000010…010…011 (word 0) | 100…0100000000 (word 1, run-length is 256) | 000…001 (word 2) | 000…000 (remaining word)
    a) Get the position of the next 1
    Start at C = {31, 0, 31, 0}.
    Skip examining the 31 × 256 bits covered by word 1.
    The cursor ends at C = {7998, 2, 31, 0}.
    b) Check the bit value at position 3,000
    Start at C = {31, 0, 31, 0}; the distance to move is 2,969 = 3,000 - 31.
    Since 31 × 256 = 7,936 > 2,969, the bit we are looking for is within word 1.
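
Both cursor operations can be sketched in Python on a tuple-based word list, without expanding the run. This is an illustration under assumptions: the word representation and the names `bit_at` and `next_one` are hypothetical, and positions here count from 0, whereas the slide's cursor values appear to count from 1 (so the slide's 7998 corresponds to 7997 below).

```python
GROUP = 31
LIT_A = "0001" + "0" * 25 + "11"
LIT_B = "0" * 30 + "1"
# ('literal', bits) or ('fill', bit, run); the last entry is the remaining word.
WORDS = [("literal", LIT_A), ("fill", "0", 256), ("literal", LIT_B), ("literal", "00")]


def span(word):
    # Number of original bits a word stands for.
    return GROUP * word[2] if word[0] == "fill" else len(word[1])


def bit_at(words, pos):
    # Walk word by word, subtracting each span; a fill word answers in one step.
    for word in words:
        n = span(word)
        if pos < n:
            return word[1] if word[0] == "fill" else word[1][pos]
        pos -= n
    raise IndexError("position beyond the end of the bit-vector")


def next_one(words, start):
    # Position of the first 1 at or after `start`, skipping whole 0-fills.
    base = 0
    for word in words:
        n = span(word)
        if start < base + n:
            if word[0] == "fill":
                if word[1] == "1":
                    return max(start, base)
            else:
                offset = word[1].find("1", max(start - base, 0))
                if offset != -1:
                    return base + offset
        base += n
    return None


print(bit_at(WORDS, 2999))    # 0
print(next_one(WORDS, 31))    # 7997
print(next_one(WORDS, 7998))  # None
```
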
  • Experiments
    Datasets
    Synthetic: XMark
    Real: DBLP, Treebank, Swiss-prot
    Query sets
  • Statistics of Dataset and Indexes
    • The number of distinct paths varies widely.
    • The number of distinct tag names does not differ much.
    • Index build time is largely affected by attribute cardinality.
    • Index size is smaller than the labeled value size in most cases.
  • Query Execution Time
  • Input Data Size
  • Other Features on bitPath
    Merging the bit-vectors used for a path pattern with //-axes and storing the result in the bitmap index for later reuse:
    For a given path //A//B, the bit-vectors of matching paths such as P:/A/A/B and P:/A/B are merged.
    The merged bit-vector acts like a pre-computed join index.
    A path pattern with //-axes can then be represented by a single bit-vector.
    Logical operations such as OR and NOT are supported directly by the bitwise operations &, |, and ^.
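
The merging idea can be sketched in Python: find every stored path that matches a pattern with //-axes, OR the corresponding bit-vectors, and cache the result under the pattern. The `BIT_PATH` contents are made-up sample data, the regular-expression matching is one possible way to resolve the pattern, and the names are hypothetical rather than the authors' design.

```python
import re

# Hypothetical bitPath index: path -> bit-vector held as a Python int.
BIT_PATH = {
    "/A": 1 << 0,
    "/A/A": (1 << 1) | (1 << 6),
    "/A/A/B": (1 << 2) | (1 << 7),
    "/A/B": 1 << 12,
}
MERGED = {}  # cache: path pattern -> merged bit-vector


def pattern_to_regex(pattern):
    # '//' may skip any number of intermediate steps; '/' is a direct child step.
    steps = re.split(r"(//|/)", pattern)[1:]
    out = ""
    for axis, tag in zip(steps[0::2], steps[1::2]):
        out += ("/(?:[^/]+/)*" if axis == "//" else "/") + re.escape(tag)
    return re.compile(out)


def merged_vector(pattern):
    # OR together the bit-vectors of all matching paths, once per pattern.
    if pattern not in MERGED:
        regex = pattern_to_regex(pattern)
        vec = 0
        for path, bits in BIT_PATH.items():
            if regex.fullmatch(path):
                vec |= bits
        MERGED[pattern] = vec
    return MERGED[pattern]


def positions(vec):
    return [i for i in range(vec.bit_length()) if (vec >> i) & 1]


print(positions(merged_vector("//A//B")))  # [2, 7, 12]
print("//A//B" in MERGED)                  # True
```

A later query that contains //A//B can then reuse the cached vector instead of visiting /A/A/B and /A/B separately.
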
  • Twig Queries with Logical Operations
    //A[./B/C or ./B/D]//E
    Path patterns: P//A, P//A//B//X ≡ P//A//B//C ∨ P//A//B//D, P//A//E
    [Figure: the twig pattern over A, B, and E, with a node X standing for (C|D).]
    //A[./B/not(C)]//E
    Path patterns: P//A, P//A//E, P//A/B ⓧ (P//A/B ⊙ A//A/B/C)
    [Figure: the twig pattern over A, B, and E, with the negated node ¬C.]
  • Conclusions
    We investigated the potential of bitmap indexes for XML query processing:
    Partitioning XML stored in RDB in various ways.
    Cursor movements do not require decompression of bit-vectors.
    We devised a way to identify element relationships using only a bitmap index, bitTwig.
    Our experiments showed that bitTwig performed best for queries against shallow XML documents.
    For deep XML documents, bitTag with advance(k) showed the best performance.
    Future work: evaluating our system with more HTJ algorithms and other indexes.
  • Thanks! Questions?