DATA WAREHOUSING
Physical Design
   Indexes provide efficient access to relevant records
     Based on values of particular attribute(s)
   Same idea as the index at the back of a book
   An index is a “thin” copy of a relation
     Not all columns from the relation are included
     The index is sorted in a particular way
   Index supports efficient lookup
     Useful when filters are selective
     Avoid scanning rows that will be filtered out
   Indexes are organized around a search key
     Column (or set of columns) whose values are used to access the index
     Organization can be sorting or hashing
   An index is built for one relation
     One index entry per record in the relation
   An index consists of <Value, RID> pairs
     Value = value of the search key for this record
     RID = record identifier
      ▪ Tells the DBMS where the record is stored
      ▪ Usually (page number, offset in page)
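The <Value, RID> structure described above can be sketched in a few lines. This is a minimal illustration, not how any particular DBMS stores its indexes: the `records` list, the helper names, and the use of a list position as the RID (in place of a page number and offset) are all assumptions made for the example. It shows both organizations mentioned above, sorting and hashing.

```python
from bisect import bisect_left, bisect_right

# Hypothetical relation. The list position stands in for the RID;
# a real DBMS would typically use (page number, offset in page).
records = [
    {"cust": "C1", "region": "Asia"},
    {"cust": "C2", "region": "Europe"},
    {"cust": "C3", "region": "Asia"},
    {"cust": "C4", "region": "America"},
    {"cust": "C5", "region": "Europe"},
]

def build_sorted_index(rows, key):
    """Return (value, rid) pairs sorted by value, one entry per record."""
    return sorted((row[key], rid) for rid, row in enumerate(rows))

def lookup(index, value):
    """Binary-search the sorted index for all RIDs with the given value."""
    lo = bisect_left(index, (value,))
    hi = bisect_right(index, (value, float("inf")))
    return [rid for _, rid in index[lo:hi]]

def build_hash_index(rows, key):
    """Hashing alternative: map each value to the list of RIDs holding it."""
    index = {}
    for rid, row in enumerate(rows):
        index.setdefault(row[key], []).append(rid)
    return index

region_index = build_sorted_index(records, "region")
hash_index = build_hash_index(records, "region")
asia_rids = lookup(region_index, "Asia")
```

Only the rows whose RIDs come back from `lookup` need to be fetched, which is the "avoid scanning rows that will be filtered out" benefit.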
   Traditional access methods
     B-trees, hash tables, R-trees, grids, …
   Popular in warehouses
     Covering indexes
     Multi-column indexes
     Join indexes
     Bitmap indexes

   Idea behind fact index:
     Thinner version of fact table
     Index takes up less space than fact table
     Fewer I/Os required to scan it
   Index has 1 index entry per fact table row
     Regardless of how many columns are in the index
   Sometimes an index has all the data you need
     Allows index-only query plan
     Not necessary to access the actual tuples
     Such an index is called a covering index

   SELECT COUNT(*) FROM R WHERE A=5
     Use index on A
     Count number of <5,RID> entries
     No need to look up records referenced by RIDs
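The index-only plan for the COUNT(*) query can be sketched as follows. The table contents and the function name are hypothetical; the point is that the count is computed from the index entries alone and the rows of R are never read.

```python
from bisect import bisect_left, bisect_right

# Hypothetical relation R; the list position stands in for the RID.
R = [
    {"A": 5, "B": 10, "C": "x"},
    {"A": 3, "B": 20, "C": "y"},
    {"A": 5, "B": 30, "C": "z"},
    {"A": 7, "B": 40, "C": "w"},
]

# Index on A: sorted <value, RID> pairs, one per row of R.
index_on_a = sorted((row["A"], rid) for rid, row in enumerate(R))

def count_where_a_equals(index, value):
    """Count the <value, RID> entries without dereferencing any RID."""
    lo = bisect_left(index, (value,))
    hi = bisect_right(index, (value, float("inf")))
    return hi - lo

# SELECT COUNT(*) FROM R WHERE A=5, answered from the index only.
count_a5 = count_where_a_equals(index_on_a, 5)
```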
   Multi-column indexes are very useful in data warehousing
     We say such an index has a composite key
   Example: B-Tree index on (A,B)
     Search key is (A,B) combination
     Index entries sorted by A value
     Entries with same A value are sorted by B value
     Called a lexicographic sort
   SELECT SUM(B) FROM R WHERE A=5
     Our (A,B) index covers this query!
   Coverage vs. size trade-off
     More attributes in search key → index covers more queries
     More attributes in search key → index takes up more disk space
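A composite-key index and the covered SUM(B) query might look like the sketch below. A sorted Python list is used here as a stand-in for the B-Tree leaf level, and the data and names are made up for illustration. Tuples compare lexicographically in Python, so sorting (A, B, RID) triples gives exactly the ordering described above.

```python
from bisect import bisect_left

# Hypothetical relation R; the list position stands in for the RID.
R = [
    {"A": 5, "B": 30, "C": "x"},
    {"A": 3, "B": 20, "C": "y"},
    {"A": 5, "B": 10, "C": "z"},
    {"A": 7, "B": 40, "C": "w"},
]

# Composite-key index on (A, B): sorted by A, then by B within equal A.
index_ab = sorted((row["A"], row["B"], rid) for rid, row in enumerate(R))

def sum_b_where_a_equals(index, a):
    """Scan the contiguous run of entries with A == a and add up B."""
    total = 0
    pos = bisect_left(index, (a,))
    while pos < len(index) and index[pos][0] == a:
        total += index[pos][1]  # B is stored in the index entry itself
        pos += 1
    return total

# SELECT SUM(B) FROM R WHERE A=5, covered by the (A,B) index.
sum_b = sum_b_where_a_equals(index_ab, 5)
```

Column C is not in the index, so a query that also needed C would not be covered and would have to follow the RIDs back to R. That is the coverage side of the trade-off; the extra B value stored in every entry is the size side.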
   Join indexes: advantages
     Efficient computation of joins involving the first index columns (or all columns)
   Join indexes: disadvantages
     Useful only for specific join combinations
      ▪ For general usage, a large number of indexes must be stored
     Required space may be significant
      ▪ Joins always involve the fact table

Base table                  Index on Region                     Index on Type
Cust   Region    Type       RecID   Asia   Europe   America     RecID   Retail   Dealer
C1     Asia      Retail     1       1      0        0           1       1        0
C2     Europe    Dealer     2       0      1        0           2       0        1
C3     Asia      Dealer     3       1      0        0           3       0        1
C4     America   Retail     4       0      0        1           4       1        0
C5     Europe    Dealer     5       0      1        0           5       0        1

       Query:
          Get customers with region = 'Asia' AND type = 'Dealer'

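The query on the example table can be answered from the two bitmap indexes alone. The sketch below stores each bit vector in a Python integer, with bit i standing for row i (RecID i+1). That encoding and the helper names are assumptions made for illustration; real systems use their own bitmap layouts.

```python
# Base table from the example: (Cust, Region, Type).
customers = [
    ("C1", "Asia", "Retail"),
    ("C2", "Europe", "Dealer"),
    ("C3", "Asia", "Dealer"),
    ("C4", "America", "Retail"),
    ("C5", "Europe", "Dealer"),
]

def build_bitmap_index(rows, column):
    """One bit vector per distinct value; bit i is set if row i has that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps

def set_positions(bitmap, n_rows):
    """Return the row positions whose bit is set."""
    return [i for i in range(n_rows) if (bitmap >> i) & 1]

region_index = build_bitmap_index(customers, 1)
type_index = build_bitmap_index(customers, 2)

# region = 'Asia' AND type = 'Dealer' becomes a single bitwise AND.
hits = region_index["Asia"] & type_index["Dealer"]
result = [customers[i][0] for i in set_positions(hits, len(customers))]
```

Only the rows selected by the combined bit vector are fetched from the base table; here that is the single customer C3.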
   Bitmap indexes are good if the domain cardinality is small
     Most useful for attributes with low or medium cardinality
      ▪ Not good for something like LastName

   Index intersection plans with bitmap indexes are fast
     Just perform a bitwise AND!
     Index intersection with B-Trees requires a join
   Bitmap indexes save space for low-cardinality attributes
     Compared with a B-Tree or hash index
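A rough back-of-the-envelope comparison shows where the space saving comes from. The numbers rest on assumptions that are not in the slides: bitmaps are uncompressed (one bit per row per distinct value), a RID takes 32 bits, and the stored key values and tree overhead of the B-Tree are ignored, which flatters the B-Tree.

```python
def bitmap_index_bits(n_rows, cardinality):
    """Uncompressed bitmap index: one bit per row for each distinct value."""
    return n_rows * cardinality

def rid_list_bits(n_rows, rid_bits=32):
    """Lower bound for a B-Tree or hash index: one RID per row.

    rid_bits=32 is an assumed RID width; key values and tree overhead are ignored.
    """
    return n_rows * rid_bits

n_rows = 1_000_000
region_bitmap_bits = bitmap_index_bits(n_rows, cardinality=3)        # e.g. Region
lastname_bitmap_bits = bitmap_index_bits(n_rows, cardinality=50_000)  # e.g. LastName
btree_bits = rid_list_bits(n_rows)
```

Under these assumptions the uncompressed bitmap index is smaller whenever the attribute has fewer distinct values than there are bits in a RID, and far larger for something like LastName.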
   Bit vectors can be compressed
   Compression pros and cons
     Reduced storage space → fewer I/Os required
     Need to compress/decompress → more CPU work required
     Each compression scheme negotiates this trade-off differently
     Operating directly on the compressed bitmap → improved performance

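To make the last point concrete, here is a sketch that uses simple run-length encoding. The slides do not name a compression scheme, so run-length encoding is chosen purely for illustration, and all function names are hypothetical. `rle_and` intersects two compressed bit vectors run by run, without expanding them first.

```python
def rle_encode(bits):
    """Compress a list of 0/1 values into (bit, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (bit, run_length) pairs back into a list of 0/1 values."""
    return [b for b, n in runs for _ in range(n)]

def rle_and(x, y):
    """AND two encoded vectors of equal length directly on their runs."""
    out = []
    i = j = 0
    left_x = x[0][1] if x else 0
    left_y = y[0][1] if y else 0
    while i < len(x) and j < len(y):
        step = min(left_x, left_y)        # overlap of the two current runs
        bit = x[i][0] & y[j][0]
        if out and out[-1][0] == bit:
            out[-1] = (bit, out[-1][1] + step)  # extend the previous run
        else:
            out.append((bit, step))
        left_x -= step
        left_y -= step
        if left_x == 0:
            i += 1
            left_x = x[i][1] if i < len(x) else 0
        if left_y == 0:
            j += 1
            left_y = y[j][1] if j < len(y) else 0
    return out

asia = [1, 0, 1, 0, 0]     # Region = Asia bit vector from the example table
dealer = [0, 1, 1, 0, 1]   # Type = Dealer bit vector from the example table
compressed_hits = rle_and(rle_encode(asia), rle_encode(dealer))
```

On five rows nothing is saved; the benefit appears on long vectors with long runs of equal bits, where the loop touches one entry per run rather than one per row.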
   Bitmapped join index: a bit matrix that precomputes the join between a dimension and the fact table
     One column for each dimension RID
     One row for each fact table RID
     Cell (i,j) is 1 if fact table tuple i joins dimension tuple j, 0 otherwise
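The bit matrix can be sketched on a toy star schema. The tables, column names, and helper functions below are invented for the example, and list positions stand in for RIDs. Once the matrix is built, a predicate on the dimension is turned into fact table RIDs without performing the join at query time.

```python
# Hypothetical dimension table; the list position is the dimension RID.
dim_customer = [
    {"cust_id": "C1", "region": "Asia"},
    {"cust_id": "C2", "region": "Europe"},
    {"cust_id": "C3", "region": "Asia"},
]

# Hypothetical fact table; the list position is the fact table RID.
fact_sales = [
    {"cust_id": "C2", "amount": 100},
    {"cust_id": "C1", "amount": 250},
    {"cust_id": "C3", "amount": 75},
    {"cust_id": "C1", "amount": 40},
]

def build_join_matrix(fact, dim, key):
    """matrix[i][j] is 1 if fact tuple i joins dimension tuple j, else 0."""
    return [[1 if f[key] == d[key] else 0 for d in dim] for f in fact]

join_matrix = build_join_matrix(fact_sales, dim_customer, "cust_id")

def fact_rids_for(matrix, dim, predicate):
    """Fact RIDs that join with any dimension tuple satisfying the predicate."""
    cols = [j for j, d in enumerate(dim) if predicate(d)]
    return [i for i, row in enumerate(matrix) if any(row[j] for j in cols)]

# Total sales for customers in Asia, using the precomputed join.
asia_fact_rids = fact_rids_for(join_matrix, dim_customer,
                               lambda d: d["region"] == "Asia")
asia_total = sum(fact_sales[i]["amount"] for i in asia_fact_rids)
```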
   Indexing dimensions
     attributes frequently involved in selection predicates
     if domain cardinality is high, then B-tree index
     if domain cardinality is low, then bitmap index
   Indices for join
     indexing only foreign keys in the fact table is rarely appropriate
     star join index should be used with caution (column order issue)
     bitmapped join index is suggested (if available)
   Indices for group by
     use materialized views
