March 2010 ACS-4904 Ron McFadyen 1Aggregate managementReferences:Lawrence Corr• Aggregate improvementwww.intelligententerprise.com/011004/415warehouse1_1.jhtml• Lost, shrunken, and collapsedwww.intelligententerprise.com/020101/501warehouse1_1.jhtmlRalph Kimball• Aggregate navigation with (almost) no metadatawww.dbmsmag.com/9608d54.html
March 2010 ACS-4904 Ron McFadyen 2Aggregate NavigationFrom “20 criteria for dimensional friendly systems”#4. Open Aggregate Navigation.The system uses physically stored aggregates as a way toenhance performance of common queries. Theseaggregates, like indexes, are chosen silently by thedatabase if they are physically present. End users andapplication developers do not need to know whataggregates are available at any point in time, andapplications are not required to explicitly code the nameof an aggregate. All query processes accessing the data,even those from different application vendors, realize thefull benefit of aggregate navigation.
March 2010 ACS-4904 Ron McFadyen 3Aggregation• Created for performance reasons• Using one base star schema, many aggregates can be defined• Some dimensions could be lost, some shrunken or collapsed –see Corr’s articlesCustomer DateProductStoreTransactionPriceCostTaxQtyConsider theTransaction-level schema:
March 2010 ACS-4904 Ron McFadyen 4AggregationCustomer DateProductStoreTransactionPriceCostTaxQtyMonthProductStorePriceCostTaxQtyTransaction-level schema Aggregate schema with:•lost dimensions (transaction, customer)•shrunken dimension (month)•metrics are accumulated over thetransactions, customers, month
March 2010 ACS-4904 Ron McFadyen 5Aggregation• Suppose you must create the schema:• What are the steps to do so?– Product and Store already exist, but the others must becreatedMonthProductStorePriceCostTaxQty
March 2010 ACS-4904 Ron McFadyen 6AggregationDesign Goal 1• Aggregates must be stored in their own fact tables; eachdistinct aggregation level must occupy its own unique facttable
March 2010 ACS-4904 Ron McFadyen 7AggregationDesign Goal 2• The dimension tables attached to the aggregate fact tablesmust be shrunken versions of the dimension tablesassociated with the base fact tableProductProductSKSKUDescriptionBrandCategoryDepartmentCategorySK?CategoryDepartmentCategory is ashrunken versionof ProductAttributes namesfrom ProductWhat values doesthe PK of Categoryassume?What name shouldthe PK of Categoryhave?
March 2010 ACS-4904 Ron McFadyen 8AggregationDesign Goal 3• The base atomic fact table and all of its related aggregatefact tables must be associated together as a “family ofschemas” so that the aggregate navigator knows whichtables are related to one another.– Kimball’s paper presents an algorithm/technique formapping an SQL statement from a base schema to anaggregate schema
March 2010 ACS-4904 Ron McFadyen 9AggregationDesign Goal 4• Force all SQL created by any end-user data access tool orapplication to refer exclusively to the base fact table andits associated full-size dimension tables.
March 2010 ACS-4904 Ron McFadyen 10Kimball’s Aggregate Navigation Algorithm1. For any given SQL statement presented to the DBMS, find the smallestfact table that has not yet been examined in the family of schemasreferenced by the query. "Smallest" in this case means the leastnumber of rows. Finding the smallest unexamined schema is a simplelookup in the aggregate navigator metadata table. Choose the smallestschema and proceed to step 2.2. Compare the table fields in the SQL statement to the table fields in theparticular Fact and Dimension tables being examined.This is a series of lookups in the DBMS system catalog.If all of the fields in the SQL statement can be found in the Factand Dimension tables being examined, alter the originalSQL by simply substituting destination table names fororiginal table names. No field names need to change.If any field in the SQL statement cannot be found in the currentFact and Dimension tables, then go back to step 1 and findthe next larger Fact table.3. Run the altered SQL. It is guaranteed to return the correct answerbecause all of the fields in the SQL statement are present in thechosen schema.
March 2010 ACS-4904 Ron McFadyen 11Aggregate Navigationwww.intelligententerprise.com/000929/feat1.jhtml“Other data mart capabilities worth mentioning areaggregation and aggregate navigation. The nature ofmultidimensional analysis dictates the extensive use ofaggregation.…Both UDB and O8i provide a database-resident means fornavigating user queries to the optimum aggregated tables”
March 2010 ACS-4904 Ron McFadyen 12Materialized ViewsAn MV is a view where the result set is pre-computed andstored in the databasePeriodically an MV must be re-computed to account for updatesto its source tablesMVs represent a performance enhancement to a DBMS as theresult set is immediately available when the view is referencedin a SQL statementMVs are used in some cases to provide summary or aggregatesfor data warehousing transparently to the end-userQuery optimization is a DBMS feature that may re-write anSQL statement supplied by a user.
March 2010 ACS-4904 Ron McFadyen 13Materialized ViewsOracle example:CREATE MATERIALIZED VIEW hr.mview_employees ASSELECT employees.employee_id, employees.emailFROM employeesUNION ALLSELECT new_employees.employee_id, new_employees.emailFROM new_employees;SQL Server has “indexed views” … seehttp://www.databasejournal.com/features/mssql/article.php/2119721