Your SlideShare is downloading. ×
0
OLAP Indexes
What Is a Data Warehouse?
• “A data warehouse is a subject-oriented,
  integrated, time-variant, and nonvolatile
  collect...
Index Requirements in OLAP
• Data is read only
   – No insertion or deletion
• Query types
   – Point query: look up one s...
OLAP Query Example
• In table (cust, gender, …), find the total
  number of male customers
• Method 1: scan the table once...
Bitmap Index
• For n tuples, a bitmap index has n bits and
  can be packed into ⎡n /8⎤ bytes and ⎡n /32⎤
  words
• From a ...
Using Bitmap to Count
• Shcount[] contains the number of bits in the
  entry subscript
   – shcount[01100101]=4
   count =...
Advantages of Bitmap Index
• Efficient in space
• Ready for logic composition
   – C = C1 AND C2
   – Bitmap operations ca...
Bit-Sliced Index
• A sale amount can be written as an integer
  number of pennies, and then represented as
  a binary numb...
Using Indexes
   SELECT SUM(sales) FROM Sales WHERE C;
   – Tuples satisfying C is identified by a bitmap B
• Direct acces...
Cost Comparison
• Traditional value-list index (B+ tree) is costly
  in both I/O and CPU time
   – Not good for OLAP
• Bit...
Horizontal or Vertical Storage
• A fact table for data warehousing is often fat
    – Tens of even hundreds of dimensions/...
Horizontal Versus Vertical
• Find the information of tuple t
    – Typical in OLTP
    – Horizontal storage: get the whole...
Outline
• OLAP and data warehousing: concepts
• Indexing for OLAP operations
• Data cube computation
   – Computing comple...
MOLAP
                                               Date
                                        2Qtr
           t
      ...
Pros and Cons
•     Easy to implement
•     Fast retrieval
•     Many entries may be empty if data is sparse
•     Space c...
ROLAP – Data Cube in Table
• A multi-dimensional database
            Base table
        Dimensions               Measure
...
Pros and Cons
• Compact in space
• Using mature relational technology
• Overhead in query answering




Jian Pei: Data Min...
HOLAP
• Hybrid OLAP
• Store detail data in ROLAP
   – Save space
• Store aggregates in MOLAP
   – Fast query answering



...
Cube Computation
• Given a base table B(D1, …, Dn, M),
  compute the set of all aggregates in the data
  cube using aggreg...
Aggregate Cells
• A cell c=(c1, …, cn) is called an aggregate
  cell, provided each ci is either in Di or ALL
   – General...
Example
• (S1, ALL, Spring), (S2, P1, Fall) and (S1, ALL,
  Winter) are aggregate cells
• (S2, P1, Fall) is a base cell
• ...
Cuboid Lattice
• All aggregate cells belong to the same
  group by form a cuboid
                                    (Stor...
Cube (Cell) Lattice
                                Tuples in the base table

                  (S1,P1,s):6               ...
A Naïve Approach
• Set one counter for every aggregate cell
• Scan the base table once, compute all
  aggregate cells
• Ti...
Real Data Sets Are Often Sparse
• Many aggregate cells are empty
• Example: 10 dimensions, each dimension
  has cardinalit...
Multiway Aggregate Computation
• Compute ABC
• Compute AB, AC, and                        ABC
  BC from ABC
• Compute A, B...
Main Memory Allocation
• Do we really need to allocate main memory for
  AB, AC, and BC, respectively?
   – The cuboids st...
Multi-way Array Aggregation (1)

                                          C      c3 61
                                  ...
Multi-way Array Aggregation (2)

                                          C      c3 61
                                  ...
Ideas
• Consider a base table (A, B, C), where |A|=10,
  |B|=100, |C|=1000           Order  A-B-C   C-B-A
                ...
The Array-based Approach
• Sort dimensions smartly
   – Keep the smaller aggregate planes in main
     memory
   – Output ...
Summary
• Indexes for OLAP
   – Bitmap index
   – Vertical storage
• Data cube computation – the multiway array
  approach...
Upcoming SlideShare
Loading in...5
×

Basics of Database Tuning

1,027

Published on

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,027
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
31
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide

Transcript of "Basics of Database Tuning"

  1. 1. OLAP Indexes
  2. 2. What Is a Data Warehouse? • “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” – W. H. Inmon • Data warehousing: the process of constructing and using data warehouses Jian Pei: Data Mining -- OLAP Indexes 2
  3. 3. Index Requirements in OLAP • Data is read only – No insertion or deletion • Query types – Point query: look up one specific tuple (rare) – Range query: return the aggregate of a (large) set of tuples, with group by – Complex queries: need specific algorithms and index structures, will be discussed later Jian Pei: Data Mining -- OLAP Indexes 3
  4. 4. OLAP Query Example • In table (cust, gender, …), find the total number of male customers • Method 1: scan the table once • Method 2: build a B+ tree index on attribute gender, still need to access all tuples of male customers • Can we get the count without scanning many tuples? Jian Pei: Data Mining -- OLAP Indexes 4
  5. 5. Bitmap Index • For n tuples, a bitmap index has n bits and can be packed into ⎡n /8⎤ bytes and ⎡n /32⎤ words • From a bit to the row-id: the j-th bit of the p- th byte row-id = p*8 +j cust gender … Jack M … Cathy F … … … … Nancy F … 1 0 … 0 Jian Pei: Data Mining -- OLAP Indexes 5
  6. 6. Using Bitmap to Count • Shcount[] contains the number of bits in the entry subscript – shcount[01100101]=4 count = 0; for (i = 0; i < SHNUM; i++) count += shcount[B[i]]; Jian Pei: Data Mining -- OLAP Indexes 6
  7. 7. Advantages of Bitmap Index • Efficient in space • Ready for logic composition – C = C1 AND C2 – Bitmap operations can be used • Bitmap index only works for categorical data with low cardinality – Naively, we need 50 bits per entry to represent the state of a customer in US – How to represent a sale in dollars? Jian Pei: Data Mining -- OLAP Indexes 7
  8. 8. Bit-Sliced Index • A sale amount can be written as an integer number of pennies, and then represented as a binary number of N bits – 24 bits is good for up to $167,772.15, appropriate for many stores • A bit-sliced index is N bitmaps – Tuple j sets in bitmap k if the k-th bit in its binary representation is on – The space costs of bit-sliced index is the same as storing the data directly Jian Pei: Data Mining -- OLAP Indexes 8
  9. 9. Using Indexes SELECT SUM(sales) FROM Sales WHERE C; – Tuples satisfying C is identified by a bitmap B • Direct access to rows to calculate SUM: scan the whole table once • B+ tree: find the tuples from the tree • Projection index: only scan attribute sales • Bit-sliced index: get the sum from ∑(B AND Bk)*2k Jian Pei: Data Mining -- OLAP Indexes 9
  10. 10. Cost Comparison • Traditional value-list index (B+ tree) is costly in both I/O and CPU time – Not good for OLAP • Bit-sliced index is efficient in I/O • Other case studies in [O’Neil and Quass, SIGMOD’97] Jian Pei: Data Mining -- OLAP Indexes 10
  11. 11. Horizontal or Vertical Storage • A fact table for data warehousing is often fat – Tens of even hundreds of dimensions/attributes • A query is often about only a few attributes • Horizontal storage: tuples are stored one by one • Vertical storage: tuples are stored by attributes A1 A2 … A100 A1 A2 … A100 x1 x2 … x100 x1 x2 … x100 … … … … … … … … z1 z2 … z100 z1 z2 … z100 Jian Pei: Data Mining -- OLAP Indexes 11
  12. 12. Horizontal Versus Vertical • Find the information of tuple t – Typical in OLTP – Horizontal storage: get the whole tuple in one search – Vertical storage: search 100 lists • Find SUM(a100) GROUP BY {a22, a83} – Typical in OLAP – Horizontal storage (no index): search all tuples O(100n), where n is the number of tuples – Vertical storage: search 3 lists O(3n), 3% of the horizontal storage method • Projection index: vertical storage Jian Pei: Data Mining -- OLAP Indexes 12
  13. 13. Outline • OLAP and data warehousing: concepts • Indexing for OLAP operations • Data cube computation – Computing complete data cubes – Compression Jian Pei: Data Mining -- OLAP Indexes 13
  14. 14. MOLAP Date 2Qtr t 1Qtr 3Qtr 4Qtr sum uc TV od PC U.S.A Pr VCR Country sum Canada Mexico sum Jian Pei: Data Mining -- OLAP Indexes 14
  15. 15. Pros and Cons • Easy to implement • Fast retrieval • Many entries may be empty if data is sparse • Space costly Jian Pei: Data Mining -- OLAP Indexes 15
  16. 16. ROLAP – Data Cube in Table • A multi-dimensional database Base table Dimensions Measure Store Product Season Sales Dimensions Measure S1 P1 Spring 6 Store Product Season AVG(Sales) S1 P2 Spring 12 S1 P1 Spring 6 S2 P1 Fall 9 S1 P2 Spring 12 S2 P1 Fall 9 S1 * Spring 9 … … … … * * * 9 Cubing Jian Pei: Data Mining -- OLAP Indexes 16
  17. 17. Pros and Cons • Compact in space • Using mature relational technology • Overhead in query answering Jian Pei: Data Mining -- OLAP Indexes 17
  18. 18. HOLAP • Hybrid OLAP • Store detail data in ROLAP – Save space • Store aggregates in MOLAP – Fast query answering Jian Pei: Data Mining -- OLAP Indexes 18
  19. 19. Cube Computation • Given a base table B(D1, …, Dn, M), compute the set of all aggregates in the data cube using aggregate function f() • D1, …, Dn are n dimensions and M is a measure Jian Pei: Data Mining -- OLAP Indexes 19
  20. 20. Aggregate Cells • A cell c=(c1, …, cn) is called an aggregate cell, provided each ci is either in Di or ALL – Generally, we assume that each tuple in B has distinct dimension values, i.e., (D1, …, Dn) is a key in B – Value ALL is also conventionally written as * • Cell c is a base cell if (c, m) is a tuple in the base table • A cell c is not empty if there are some base cells match all non-ALL attribute values in c Jian Pei: Data Mining -- OLAP Indexes 20
  21. 21. Example • (S1, ALL, Spring), (S2, P1, Fall) and (S1, ALL, Winter) are aggregate cells • (S2, P1, Fall) is a base cell • (S1, ALL, Winter) is empty • (S1, ALL, Spring) = (S1, *, Spring) Base table Dimensions Measure Store Product Season Sales S1 P1 Spring 6 S1 P2 Spring 12 S2 P1 Fall 9 Jian Pei: Data Mining -- OLAP Indexes 21
  22. 22. Cuboid Lattice • All aggregate cells belong to the same group by form a cuboid (Store,Product,Season) (Store,Product,*) (Store,*,Season) (*,Product,Season) Roll up Drill down (Store,*,*) (*,Product,*) (*,*,Season) (*,*,*) Jian Pei: Data Mining -- OLAP Indexes 22
  23. 23. Cube (Cell) Lattice Tuples in the base table (S1,P1,s):6 (S1,P2,s):12 (S2,P1,f):9 (S1,*,s):9 (S1,P1,*):6 (*,P1,s):6 (S1,P2,*):12 (*,P2,s):12 (S2,*,f):9 (S2,P1,*) (*,P1,f):9 (S1,*,*):9 (*,*,s):9 (*,P1,*):7.5 (*,P2,*):12 (*,*,f):9 (S2,*,*):9 (*,*,*):9 Jian Pei: Data Mining -- OLAP Indexes 23
  24. 24. A Naïve Approach • Set one counter for every aggregate cell • Scan the base table once, compute all aggregate cells • Time complexity O(n), where n is the number of tuples in the base table • Space overhead O(π(|Di|+1)) – 10 dimensions, each dimension has cardinality 10, (1110 – 1010) = 15,937,424,601 counters are needed! Jian Pei: Data Mining -- OLAP Indexes 24
  25. 25. Real Data Sets Are Often Sparse • Many aggregate cells are empty • Example: 10 dimensions, each dimension has cardinality 10 – 1010 possible base cells – Even a base table has 1 billion tuples, still at least 90% of the possible base cells are empty! Jian Pei: Data Mining -- OLAP Indexes 25
  26. 26. Multiway Aggregate Computation • Compute ABC • Compute AB, AC, and ABC BC from ABC • Compute A, B, and C AB AC from AB, AC, and BC BC • Compute ALL from A, C A B, C B ALL Jian Pei: Data Mining -- OLAP Indexes 26
  27. 27. Main Memory Allocation • Do we really need to allocate main memory for AB, AC, and BC, respectively? – The cuboids still can be very large • Observation: A, B, and C can be computed from any two of AB, AC, BC ABC AB AC BC A B C ALL Jian Pei: Data Mining -- OLAP Indexes 27
  28. 28. Multi-way Array Aggregation (1) C c3 61 c2 45 62 63 64 46 47 48 c1 29 30 31 32 c0 B13 14 15 16 60 b3 44 B b2 28 56 9 40 24 52 b1 5 36 20 b0 1 2 3 4 a0 a1 a2 a3 A Jian Pei: Data Mining -- OLAP Indexes 28
  29. 29. Multi-way Array Aggregation (2) C c3 61 c2 45 62 63 64 46 47 48 c1 29 30 31 32 c0 B13 14 15 16 60 b3 44 B b2 28 56 9 40 24 52 b1 5 36 20 b0 1 2 3 4 a0 a1 a2 a3 A Jian Pei: Data Mining -- OLAP Indexes 29
  30. 30. Ideas • Consider a base table (A, B, C), where |A|=10, |B|=100, |C|=1000 Order A-B-C C-B-A (A,B,C) 1 1 (A,B,*) 1 10*100 (A,*,C) 1000 10 ABC (*,B,C) 100*1000 1 (A,*,*) 1 10 AB AC BC (*,B,*) 100 100 (*,*,C) 1000 1 A B C (*,*,*) 1 1 Total 102,104 1,124 ALL Jian Pei: Data Mining -- OLAP Indexes 30
  31. 31. The Array-based Approach • Sort dimensions smartly – Keep the smaller aggregate planes in main memory – Output aggregates in larger planes once they are computed, and then reuse the counters • How to make it work when the main memory usage is still too large? – Details in [Zhao, Deshpande and Naughton, SIGMOD 97] Jian Pei: Data Mining -- OLAP Indexes 31
  32. 32. Summary • Indexes for OLAP – Bitmap index – Vertical storage • Data cube computation – the multiway array approach • Can you find some situations where bitmap index and vertical storage are useful? Jian Pei: Data Mining -- OLAP Indexes 32
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×