Improved Query Performance With Variant Indexes - review presentation

Improved Query Performance
With Variant Indexes
paper by Patrick O’Neil & Dallan Quass
Presented by V B Wickramasinghe - 148245F

Overview
•What are database indexes?
•Value-List Index (B+
tree)
•Bitmap Indexes
•Projection Indexes
•Bit-Sliced Indexes
•Comparison of Index Types for different queries.
•Conclusion

What are database indexes?
“A database index is a data structure that improves the speed of data retrieval
operations on a database table at the cost of additional writes and the use of
more storage space to maintain the extra copy of data.”
- Wikipedia

Value-List Index (B+ tree)
Used to retrieve rows of a table with specified values involving one or more
indexed columns.
Leaf values of the B+ tree consists of a sequence of entries for index key
values. Each key value entry references the set of rows with that value.
----usually B+ trees references each row as RID(Row ID)
Problem-
In indexes with a relatively small number of key values compared to
the number of rows, most key values will have large number of
associated RID’s

Bitmap Indexes
•Bitmap is an alternate method of representing RID in a Value-List index A
“Bitmap B” is defined on table T as a sequence of M bits, where for each row
with ordinal number j, we set the jth
bit in B if that row satisfies the property of
the index.

Bitmap Indexes (contd..)
Bitmap Index Performance :-
• boolean operations such as AND, OR, and Not are extremely fast for
Bitmaps
--- AND
for ( I=0 ; I < len(B1) ; I++)
B3[I] = B1[I] & B2[I];
--- OR
for ( I=0 ; I < len(B1) ; I++)
B3[I] = B1[I] || B2[I];
---NOT
for ( I=0 ; I < len(B1) ; I++)
B3[I] = ~B1[I] & EBM[I];
These loops to calculate AND,OR, or NOT are extremely fast compared
to same done on RID lists as long as the Bitmaps involved have reasonably
high density.

Advantaged of Bitmap Indexes
•Fast operations
—The most common operations are the bitwise logical operations
—They are well supported by hardware
•Easy to compress, potentially small index size.
•Each individual bitmap is small and frequently used ones can be cached in memory.
•Efficient for read-mostly data: data produced from scientific experiments can be appended in large groups.
•Bitmaps are more space efficient than RID lists in a Value-List index.
•Disk savings are enormous in case when Bitmaps are dense i.e number of one bits as a proportion of all bits in a
Bitmap are large.

Projection Index
A projection index for column duplicates all column values for lookup by ordinal
number .

Projection Index (contd..)
----projection index turns out to be quite useful in cases where column values must be retrieved for all
rows of the foundset, where the foundset is dense enough such that several column values would
probably be found on the same disk page of the projection index, but rows themselves being larger,
would appear on several different pages
----faster to retrieve a foundset of column value from projection index
than from rows themselves

Bit-sliced Index
A Bit-Sliced index for a column creates Bitmaps for significant bits in
the column value,
Bnn
– bitmap representing set of non null values in the indexed column

Comparison of Indexes evaluating Single-Column Sum
Aggregates
Plan 1 : Direct access to the rows to calculate the Sum
-- costs includes I/O cost and CPU cost for getting proper row and
column value from buffer resident page.
Plan 2 : Calculating Sum through a Projection Index
Plan 3 : Calculating the Sum through a Bitmap Index
if (COUNT (Bf
AND Bnn
) = = 0)
Return null;
SUM = 0.0;
for each non-null value v in the index for C {
Designate the set of rows with value v as Bv
SUM += v * COUNT(Bf
AND BV
);
}
Return SUM;

Aggregates
Plan 4 : Calculating the SUM through a Bit-Sliced Index
if (COUNT (Bf
AND Bnn
) = = 0)
Return null;
SUM = 0.0;
for i = 0 to N
SUM += 2i
* COUNT(Bi
AND Bf
);
Return SUM;

Aggregates

Evaluating Range Predicates
SELECT target-list FROM T WHERE C-range AND <condition>
Plan 1 : Evaluating the Range using the Projection Index :-
--- we create BF
in a straightforward way by accessing each C value
in the index corresponding to an ordinal row number in Bf
and testing
whether it lies with the specified range
Plan 2 : Evaluating range predicates using Value-List index :-
Br
= empty set;
for each entry v in the index for C that satisfies the range specified
Designate the set of rows with the value v as Bv
Br
= Br
OR Bv
BF
= Bf
AND Br

Evaluating OLAP-style Queries
Data warehouses are often built to support OLAP. OLAP query performance depends on creating a
set of summary tables to efficiently evaluate an expected set of queries. This approach is possible
only when the expected set of queries is known in advance. Specifically, the OLAP Approach
addresses queries that group by different combination columns, known as dimensions. But when ad-
hoc queries must be issued that selects the rows on the criteria that are not part of the dimensional
scheme, summary tables that don’t foresee such selection cannot be used and in
such cases other indexes on the base data must be used.
➢ Join Indexes and Bitmap-Join-Indexes are tried to address this problem.

Join Indexes
A join index is an index on one table that involves a column value from different table through a
commonly encountered join.
but, if there are numerous columns used for restrictions in each dimension table, then the number of
star join indexes needed to be able to combine any single column choice from each dimension table is
a product of the number of columns in each dimension. There will be a combinatorial
explosion of join indexes in terms of the number of useful columns.
✓ bitmap join index addresses this problem

Bitmap Join Index
It is an index on a table T based on a single column of a table S, where S commonly joins with T in a
specified way.
Obviously the number of indexes of this kind increases linearly with the number of useful columns in
the dimension tables, but the speed of combining Bitmapped indexes to create ad-hoc combinations
takes care of the problem of explosion of star join indexes.

Segmentation: It is one of the methods used for speed up the query with one or more group-by
attributes.
Clustering: This is clustering for the finely divided fact table F to improve the performance. Not only
queries on fact tables, it also improves group-by queries on the dimensions.
Improved Grouping Efficiency Using Segmentation &
Clustering

● Read-mostly environment of data warehousing has made it feasible to use
more complex index structures to speed up the evaluation of queries.
● The paper shows how ad-hoc OLAP style queries involving aggregation
and grouping can be efficiently evaluated using indexing and clustering.
● The paper also introduces a new index type, Groupset indexes, that are
especially well-suited for evaluating OLAP type of query.
Conclusion

Improved Query Performance With Variant Indexes - review presentation

Recommended

Recommended

More Related Content

What's hot

What's hot (18)

Similar to Improved Query Performance With Variant Indexes - review presentation

Similar to Improved Query Performance With Variant Indexes - review presentation (20)

More from Vimukthi Wickramasinghe

More from Vimukthi Wickramasinghe (8)

Recently uploaded

Recently uploaded (20)

Improved Query Performance With Variant Indexes - review presentation