Aggregates I



  1. Data Warehouse Aggregates. Ing. Julio Ernesto Carreño Vargas
  2. Using the Star Schema: Queries against a star schema follow a consistent pattern. One or more facts are typically requested, along with the dimensional attributes that provide the desired context. The facts are summarized as appropriate, based on the dimensions.
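
The query pattern described on this slide can be sketched with Python's built-in `sqlite3` module. The table names (`order_facts`, `day`, `product`), columns, and sample rows are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table, two dimension tables.
    CREATE TABLE day (day_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
    CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars REAL);

    INSERT INTO day VALUES (1, '2024-01-15', '2024-01'), (2, '2024-01-20', '2024-01'),
                           (3, '2024-02-03', '2024-02');
    INSERT INTO product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                               (3, 'Manual', 'Books');
    INSERT INTO order_facts VALUES (1, 1, 100.0), (1, 3, 20.0), (2, 2, 50.0),
                                   (3, 1, 70.0), (3, 3, 10.0);
""")

# The consistent pattern: select dimension attributes for context,
# summarize the fact, and group by those same attributes.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.order_dollars) AS order_dollars
    FROM order_facts f
    JOIN day d ON d.day_key = f.day_key
    JOIN product p ON p.product_key = f.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()

for row in rows:
    print(row)
```

Changing the context only changes the attributes in the `SELECT` and `GROUP BY` clauses; the shape of the query stays the same.
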
  3. Aggregate tables: Aggregate tables improve data warehouse performance by reducing the number of rows the RDBMS must access when responding to a query. [Figure: base schema and aggregate schema]
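
A minimal sketch of the idea, assuming a hypothetical base fact table at day and product grain and an aggregate at month and product grain. All table names, columns, and data are invented for illustration. The aggregate answers the same monthly question while holding far fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Base schema (hypothetical)
    CREATE TABLE day (day_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars REAL);
    -- Aggregate schema: an aggregate dimension with its own surrogate key,
    -- plus an aggregate fact table at month/product grain.
    CREATE TABLE month_dim (month_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE order_facts_agg (month_key INTEGER, product_key INTEGER, order_dollars REAL);
""")

# 60 days spread over two months, three products ordered every day.
days = [(k, "2024-01" if k <= 30 else "2024-02") for k in range(1, 61)]
conn.executemany("INSERT INTO day VALUES (?, ?)", days)
conn.executemany(
    "INSERT INTO order_facts VALUES (?, ?, ?)",
    [(k, p, 10.0 * p) for k, _ in days for p in (1, 2, 3)],
)

# Load the aggregate schema from the base schema.
conn.executescript("""
    INSERT INTO month_dim (month) SELECT DISTINCT month FROM day ORDER BY month;
    INSERT INTO order_facts_agg
    SELECT m.month_key, f.product_key, SUM(f.order_dollars)
    FROM order_facts f
    JOIN day d ON d.day_key = f.day_key
    JOIN month_dim m ON m.month = d.month
    GROUP BY m.month_key, f.product_key;
""")

base_rows = conn.execute("SELECT COUNT(*) FROM order_facts").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM order_facts_agg").fetchone()[0]

from_base = conn.execute("""
    SELECT d.month, SUM(f.order_dollars) FROM order_facts f
    JOIN day d ON d.day_key = f.day_key
    GROUP BY d.month ORDER BY d.month
""").fetchall()
from_agg = conn.execute("""
    SELECT m.month, SUM(a.order_dollars) FROM order_facts_agg a
    JOIN month_dim m ON m.month_key = a.month_key
    GROUP BY m.month ORDER BY m.month
""").fetchall()

print(base_rows, agg_rows)    # rows the database must read in each case
print(from_base == from_agg)  # both schemas give the same answer
```
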
  4. Aggregate dimension table
  5. Aggregate characteristic: The more highly summarized an aggregate table is, the fewer queries it will be able to accelerate. Choosing aggregates therefore involves making careful tradeoffs between the performance gain offered and the number of queries that will benefit.
  6. The Aggregate Navigator: To receive the performance benefit offered by an aggregate schema, a query must be written to use the aggregate. The aggregate navigator is a component of the data warehouse infrastructure that assumes the task of rewriting user queries to utilize aggregate tables.
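
The table-selection part of a navigator can be sketched as follows. This is a toy illustration, not how any particular product works: the `FactTable` class, the `choose_table` function, the table names, and the row counts are all hypothetical, and a real navigator would also rewrite the SQL itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FactTable:
    name: str
    attributes: frozenset  # dimensional attributes this table can report on
    row_count: int


def choose_table(requested, tables):
    """Return the smallest table whose attributes cover the requested ones."""
    candidates = [t for t in tables if set(requested) <= t.attributes]
    if not candidates:
        raise ValueError(f"no table covers {sorted(requested)}")
    return min(candidates, key=lambda t: t.row_count)


# Hypothetical base table and two aggregates with made-up row counts.
tables = [
    FactTable("order_facts",
              frozenset({"day", "month", "product", "category", "salesperson"}),
              1_000_000),
    FactTable("order_facts_agg_month_product",
              frozenset({"month", "product", "category"}), 40_000),
    FactTable("order_facts_agg_month_category",
              frozenset({"month", "category"}), 500),
]

print(choose_table({"month", "category"}, tables).name)
print(choose_table({"month", "product"}, tables).name)
print(choose_table({"day", "category"}, tables).name)
```

A query that needs day-level detail falls back to the base table, which is why users never have to know that the aggregates exist.
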
  7. Principles of Aggregation: An aggregate schema must always provide exactly the same results as the base schema. The attributes of each aggregate table must be a subset of those from a base schema table; the only exception to this rule is the surrogate key for an aggregate dimension table.
  8. Summarization techniques: aggregate tables; pre-joined aggregates; derived tables.
  9. Pre-Joined Aggregates: A pre-joined aggregate summarizes a fact across a set of dimension values. Unlike an aggregate star schema, however, the pre-joined aggregate places the results in a single table. By doing so, it eliminates the need for the RDBMS to perform a join operation at query time.
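
As a rough sketch, a pre-joined aggregate can be built with a `CREATE TABLE ... AS SELECT` statement and then queried without any join. The schema and data below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base star schema
    CREATE TABLE day (day_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE order_facts (day_key INTEGER, product_key INTEGER, order_dollars REAL);

    INSERT INTO day VALUES (1, '2024-01'), (2, '2024-01'), (3, '2024-02');
    INSERT INTO product VALUES (1, 'Hardware'), (2, 'Hardware'), (3, 'Books');
    INSERT INTO order_facts VALUES (1, 1, 100.0), (1, 3, 20.0), (2, 2, 50.0),
                                   (3, 1, 70.0), (3, 3, 10.0);

    -- Pre-joined aggregate: dimension attributes and the summarized fact
    -- live together in a single table.
    CREATE TABLE orders_by_month_category AS
    SELECT d.month, p.category, SUM(f.order_dollars) AS order_dollars
    FROM order_facts f
    JOIN day d ON d.day_key = f.day_key
    JOIN product p ON p.product_key = f.product_key
    GROUP BY d.month, p.category;
""")

# No join is needed at query time.
hardware_by_month = conn.execute("""
    SELECT month, order_dollars
    FROM orders_by_month_category
    WHERE category = 'Hardware'
    ORDER BY month
""").fetchall()
print(hardware_by_month)
```
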
  10. Derived Tables: Derived tables alter the structure of the tables summarized or change the scope of their content. Types: (a) the merged fact table, which combines facts from more than one fact table at a common grain; (b) the pivoted fact table, which transforms a set of metrics in a single row into multiple rows with a single metric, or vice versa; (c) the sliced fact table, which contains a subset of the records of the base fact table, usually in coordination with a specific dimension attribute. In all three cases, the derived fact tables are not expected to serve as invisible stand-ins for the base schema.
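
Of the three types, the pivoted fact table is the least intuitive, so here is a small sketch of the transformation in both directions. The function names, column names, and sample rows are hypothetical.

```python
# Rows with several metrics each (the "wide" form).
wide_rows = [
    {"month": "2024-01", "product": "Widget", "order_dollars": 100.0, "order_quantity": 4},
    {"month": "2024-02", "product": "Widget", "order_dollars": 70.0, "order_quantity": 3},
]
METRICS = ("order_dollars", "order_quantity")


def pivot_to_long(rows, metrics):
    """Turn one row with many metrics into many rows with one metric each."""
    long_rows = []
    for row in rows:
        keys = {k: v for k, v in row.items() if k not in metrics}
        for metric in metrics:
            long_rows.append({**keys, "metric": metric, "value": row[metric]})
    return long_rows


def pivot_to_wide(rows):
    """Inverse transformation: collect single-metric rows back into one row."""
    wide = {}
    for row in rows:
        keys = tuple((k, v) for k, v in row.items() if k not in ("metric", "value"))
        wide.setdefault(keys, dict(keys))[row["metric"]] = row["value"]
    return list(wide.values())


long_rows = pivot_to_long(wide_rows, METRICS)
for row in long_rows:
    print(row)
```

Both forms hold the same information; which one is more convenient depends on how the reports consume the metrics.
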
  11. Tables with New Facts: Semi-additive facts may not be added together across a particular dimension; non-additive facts are never added together. In these situations, you may choose to aggregate by means other than summation.
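
An inventory balance is a common example of a semi-additive fact: it can be summed across products but not across days. The sketch below, using made-up data, contrasts the misleading sum with two alternatives (an average and a period-ending balance) that an aggregate table could store as new facts.

```python
from collections import defaultdict

# Hypothetical daily inventory snapshot: (day, product, quantity_on_hand)
inventory = [
    ("2024-01-01", "A", 10), ("2024-01-01", "B", 5),
    ("2024-01-02", "A", 8),  ("2024-01-02", "B", 5),
    ("2024-01-03", "A", 12), ("2024-01-03", "B", 2),
]

# Summing across products within one day is valid.
on_hand_per_day = defaultdict(int)
for day, _product, quantity in inventory:
    on_hand_per_day[day] += quantity

# Summing across days is not meaningful: it counts the same stock repeatedly.
naive_sum = sum(on_hand_per_day.values())

# Aggregating across days by means other than summation.
average_on_hand = sum(on_hand_per_day.values()) / len(on_hand_per_day)
period_end_on_hand = on_hand_per_day[max(on_hand_per_day)]  # ISO dates sort correctly

print(naive_sum, average_on_hand, period_end_on_hand)
```
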
  12. Choosing Aggregates: One of the most vexing tasks in deploying dimensional aggregates is choosing which aggregates to design and deploy. Your aim is to strike the correct balance between the performance gain provided by aggregate schemas and their cost in terms of resource requirements.
  13. Choosing Aggregates. Topics: What Is a Potential Aggregate?; Identifying Potentially Useful Aggregates; Assessing the Value of Potential Aggregates.
  14. What Is a Potential Aggregate? Topics: Aggregate Fact Tables: A Question of Grain; Aggregate Dimensions Must Conform; Pre-Joined Aggregates Have Grain Too; Enumerating Potential Aggregates.
  15. What Is a Potential Aggregate? Express potential aggregates as fact table grain statements: orders by day, salesperson, and product; orders by day, customer, and product; orders by month, product, and salesperson.
  16. Enumerating Potential Aggregates: 6*4*4*4*2*2 = 1536; 1534 possible aggregates.
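
The multiplication on this slide can be checked directly. The sketch assumes that each factor is the number of summarization levels available for one of six dimensions; the slide does not say so explicitly, and it does not say which combinations are excluded to arrive at 1534.

```python
from itertools import product
from math import prod

# Assumed meaning: number of possible levels for each of six dimensions.
levels_per_dimension = [6, 4, 4, 4, 2, 2]

# Every combination picks one level per dimension.
combinations = prod(levels_per_dimension)

# The same count, obtained by enumerating the combinations explicitly.
enumerated = sum(1 for _ in product(*(range(n) for n in levels_per_dimension)))

print(combinations, enumerated)
```

Even a modest schema yields well over a thousand candidate grains, which is why the following slides concentrate on narrowing the pool.
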
  17. Identifying Potentially Useful Aggregates. Topics: Drawing on Initial Design (Design Decisions; Listening to Users); Where Subject Areas Meet (The Conformance Bus; Aggregates for Drilling Across); Query Patterns of an Existing System (Analyzing Reports for Potential Aggregates; Choosing Which Reports to Analyze).
  18. Identifying Potentially Useful Aggregates: Identify and document potential aggregates during schema design, even if the initial implementation will not include aggregates. This information will be useful in the future. Any decision to set the grain of a fact table at a finer level reveals a potential aggregate. Decisions about where to place groups of dimensional attributes reveal potential levels of aggregation. Discussion of hierarchies or drill paths points to potential aggregates. User work products reveal potential aggregates. These may include reports from operational systems, manually compiled briefings, or spreadsheets. Potential aggregates will also be revealed by manual processes and requirements not currently met.
  19. Aggregates for Drilling Across: The process of combining information from multiple fact tables is called drilling across. Consult the conformance bus to identify aggregates that will be used in drill-across reports. The lowest common dimensionality between two fact tables often suggests one or more aggregates.
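
A drill-across report can be sketched as two separate queries at a common grain whose results are then merged. The two fact tables below are hypothetical and, to keep the example short, carry their dimension attributes directly instead of through dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical fact tables with different dimensionality:
    -- shipments have a warehouse, orders do not.
    CREATE TABLE order_facts (month TEXT, product TEXT, order_dollars REAL);
    CREATE TABLE shipment_facts (month TEXT, product TEXT, warehouse TEXT,
                                 shipment_dollars REAL);

    INSERT INTO order_facts VALUES ('2024-01', 'Widget', 100.0),
                                   ('2024-01', 'Widget', 50.0),
                                   ('2024-02', 'Gadget', 30.0);
    INSERT INTO shipment_facts VALUES ('2024-01', 'Widget', 'East', 120.0),
                                      ('2024-02', 'Gadget', 'West', 30.0),
                                      ('2024-02', 'Widget', 'East', 25.0);
""")


def summarize(table, fact):
    """Query one fact table at the common grain (month, product)."""
    sql = f"SELECT month, product, SUM({fact}) FROM {table} GROUP BY month, product"
    return {(month, prod): total for month, prod, total in conn.execute(sql)}


orders = summarize("order_facts", "order_dollars")
shipments = summarize("shipment_facts", "shipment_dollars")

# Merge the two result sets on the shared dimension attributes.
report = [
    (month, prod, orders.get((month, prod)), shipments.get((month, prod)))
    for month, prod in sorted(orders.keys() | shipments.keys())
]
for line in report:
    print(line)
```

Here the lowest common dimensionality is month and product, so an aggregate of each fact table at that grain would serve both passes of the report.
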
  20. Analyzing Reports for Potential Aggregates: The detail rows require order facts by product and month. The summary rows require order facts by category and month. The grand total requires order facts by month.
  21. Drilling: Drill paths suggest aggregates.
  22. Assessing the Value of Potential Aggregates: After identifying a pool of potential aggregates, the next step is to sort through them and determine which ones to build.
  23. Assessing the Value of Potential Aggregates. Topics: Number of Aggregates (Presence of an Aggregate Navigator; Space Consumed by Aggregate Tables); How Many Rows Are Summarized (Examining the Number of Rows Summarized; The Cardinality Trap and Sparsity); Who Will Benefit from the Aggregate.
  24. Examining the Number of Rows Summarized: A good starting rule of thumb is to identify aggregate fact tables where each row summarizes an average of 20 rows. The savings afforded by aggregates can be lopsided, favoring a particular attribute value. Remember that, like a base fact table, a dimensional aggregate can be aggregated during a query. Aggregates may be competing with other aggregates to offer performance gains.
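
The 20-rows rule of thumb can be applied by dividing the number of base rows by the number of rows each candidate aggregate would contain. The sketch below uses an invented, fully dense base table (every day, product, and salesperson combination present), which real data rarely is; the candidate grains and the `month_of` helper are also hypothetical.

```python
from itertools import product as cartesian

# Hypothetical base fact table at day x product x salesperson grain.
days = range(1, 91)        # 90 days, i.e. three 30-day months
products = range(1, 11)    # 10 products
salespeople = range(1, 5)  # 4 salespeople
base = list(cartesian(days, products, salespeople))


def month_of(day):
    return (day - 1) // 30 + 1


# Each candidate maps a base row to the key of the aggregate row it lands in.
candidates = {
    "month, product, salesperson": lambda d, p, s: (month_of(d), p, s),
    "day, product": lambda d, p, s: (d, p),
    "month, product": lambda d, p, s: (month_of(d), p),
}

ratios = {}
for name, grain in candidates.items():
    aggregate_rows = len({grain(*row) for row in base})
    ratios[name] = len(base) / aggregate_rows  # average base rows per aggregate row

for name, ratio in ratios.items():
    verdict = "meets" if ratio >= 20 else "falls short of"
    print(f"{name}: {ratio:.0f}:1 ({verdict} the 20:1 rule of thumb)")
```
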
  25. The Cardinality Trap and Sparsity: Cardinality is the number of distinct values taken on by a given attribute. A fact table is sparse when not all combinations of keys are present. Don't assume aggregate fact tables will exhibit the same sparsity as the tables they summarize. The higher the degree of summarization, the more dense the aggregate fact table will be. The best way to get an idea of the relative size of the aggregate is to count the number of rows. As before, count the distinct combinations of keys and/or summarized dimension attributes.
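
The trap can be illustrated with a small, artificially sparse data set. The rule that decides which day and product combinations exist is arbitrary and chosen only so that the numbers are easy to follow; the point is that the estimate based on base-table sparsity is far from the counted size.

```python
# Hypothetical sparse base fact table: each day only 2 of 10 products sell.
days = range(1, 61)      # two 30-day months
products = range(1, 11)
base = {(d, p) for d in days for p in products if (d + p) % 5 == 0}


def month_of(day):
    return (day - 1) // 30 + 1


base_possible = len(days) * len(products)  # product of the cardinalities
base_density = len(base) / base_possible

# The aggregate by month and product, obtained by counting distinct keys.
aggregate = {(month_of(d), p) for d, p in base}
aggregate_possible = 2 * len(products)
aggregate_density = len(aggregate) / aggregate_possible

# The trap: assuming the aggregate is as sparse as the base table.
naive_estimate = aggregate_possible * base_density

print(f"base: {len(base)} of {base_possible} combinations ({base_density:.0%} dense)")
print(f"aggregate, naive estimate: {naive_estimate:.0f} rows")
print(f"aggregate, counted: {len(aggregate)} rows ({aggregate_density:.0%} dense)")
```

The counted aggregate is five times larger than the naive estimate, so each of its rows summarizes 6 base rows on average, not the 30 the estimate would suggest.
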
  26. Who Will Benefit from the Aggregate: The first aggregates you add to your implementation are those that offer benefits across the widest number of user requirements. Aggregates that fall in the 20:1 range of savings are compared with one another to identify those that support the most common user requirements. Start by selecting aggregates that provide solid performance boosts for a wide number of common queries. To this, add more powerful (but more narrowly used) aggregates as space permits. Use the relative importance of one aggregate over another in a tiebreaker situation.
  27. Bibliography: Christopher Adamson. Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance.