Designing Aggregates

Ing. Julio Ernesto Carreño Vargas
Designing Aggregates
   Once you have chosen dimensional aggregates, they must
    be designed and documented. This is the point of greatest
    risk for aggregate implementation.

Defining the Base Schema

The Base Schema
   Declaration of grain is an essential part of schema design.
    Proper definition of grain not only enables the future
    identification of aggregates but is also crucial to the
    success of the base schema itself.
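Because a grain declaration names the columns that identify one row of the fact table, it can be checked against loaded data. The following is a minimal sketch using Python's built-in sqlite3 module; the sales_facts table, its columns, and the grain_violations helper are hypothetical. It returns every grain combination that appears more than once, which a correctly declared and loaded base schema should never produce.

```python
import sqlite3

# Hypothetical declared grain: one row per day, product, and store.
GRAIN = ("date_key", "product_key", "store_key")


def grain_violations(conn: sqlite3.Connection, table: str, grain: tuple) -> list:
    """Return grain combinations that occur more than once, with their row counts.

    Table and column names are interpolated into SQL, so they must come from
    trusted metadata, never from user input.
    """
    cols = ", ".join(grain)
    sql = f"SELECT {cols}, COUNT(*) FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_facts (date_key INTEGER, product_key INTEGER,"
    " store_key INTEGER, sales_dollars REAL)"
)
conn.executemany(
    "INSERT INTO sales_facts VALUES (?, ?, ?, ?)",
    [
        (20240101, 1, 1, 10.0),
        (20240101, 2, 1, 5.0),
        (20240101, 1, 1, 7.5),  # repeats the grain of the first row
    ],
)
print(grain_violations(conn, "sales_facts", GRAIN))  # [(20240101, 1, 1, 2)]
```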

Rollup dimensions
   Conforming rollup dimensions and their natural keys

Rollup dimensions
   Rollup dimensions should be sourced from the base
    dimensions, and their attributes must follow the same
    rules for slow change processing.
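A rollup dimension can be populated directly from its base dimension with a SELECT DISTINCT over the attributes it keeps. The sketch below assumes hypothetical day_dim and month_dim tables in an in-memory SQLite database. The month table reuses the base attribute names and types, receives its own surrogate key, and declares (year, month_number) as its natural key, which differs from the natural key of the day dimension. Slow change processing is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base dimension at day level.
conn.execute(
    "CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, full_date TEXT,"
    " month_name TEXT, month_number INTEGER, year INTEGER)"
)
conn.executemany(
    "INSERT INTO day_dim VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-01-30", "January", 1, 2024),
        (2, "2024-01-31", "January", 1, 2024),
        (3, "2024-02-01", "February", 2, 2024),
    ],
)
# Rollup dimension: same attribute names and types as the base dimension,
# its own surrogate key, and a different natural key (year, month_number).
conn.execute(
    "CREATE TABLE month_dim (month_key INTEGER PRIMARY KEY,"
    " month_name TEXT, month_number INTEGER, year INTEGER,"
    " UNIQUE (year, month_number))"
)
# Source the rollup from the base dimension, not from the source system.
conn.execute(
    "INSERT INTO month_dim (month_name, month_number, year)"
    " SELECT DISTINCT month_name, month_number, year FROM day_dim"
    " ORDER BY year, month_number"
)
rows = conn.execute(
    "SELECT month_key, month_name, year FROM month_dim ORDER BY month_key"
).fetchall()
print(rows)  # [(1, 'January', 2024), (2, 'February', 2024)]
```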

Hierarchies
   Documenting dimensional hierarchies
    may be important for business
    intelligence software and database
    features such as materialized views and
    materialized query tables.
   The hierarchies identify potential
    aggregation points and can aid in
    estimating degree of summarization.
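Hierarchy levels and their cardinalities give a rough upper bound on the size of an aggregate: the product of the distinct-value counts at the chosen levels. The sketch below uses illustrative, made-up cardinalities for hypothetical date, product, and store hierarchies. Real fact tables are usually sparse, so the estimate overstates actual row counts; it is meant only for comparing candidate aggregates.

```python
from math import prod

# Made-up distinct-value counts for each level of each hierarchy.
CARDINALITY = {
    "date": {"day": 365, "month": 12, "year": 1},
    "product": {"product": 2000, "brand": 80, "category": 10},
    "store": {"store": 150, "region": 5},
}


def estimated_rows(grain: dict) -> int:
    """Upper-bound row count: the product of the cardinalities at the chosen levels.

    A dimension left out of the grain is summarized away completely and
    contributes no factor.
    """
    return prod(CARDINALITY[dim][level] for dim, level in grain.items())


base = {"date": "day", "product": "product", "store": "store"}
month_brand_region = {"date": "month", "product": "brand", "store": "region"}

print(estimated_rows(base))                # 109500000
print(estimated_rows(month_brand_region))  # 4800
print(estimated_rows(base) / estimated_rows(month_brand_region))  # 22812.5
```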

Housekeeping Columns
   Housekeeping columns are present for purely technical
    reasons, such as supporting the load process, and carry no
    business meaning.

Design Principles for the Aggregate Schema

A Separate Star for Each Aggregation
   Dimensional aggregates should be stored in separate
    tables for each aggregation.

A Separate Star for Each Aggregation
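   Do not store different levels of aggregation in the same
    schema. Such a schema can return wrong results: a query that
    does not filter on the level of aggregation adds the summary
    rows to the detail rows they summarize, double-counting facts.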
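The danger can be seen in a small example. The sketch below, using Python's sqlite3 module, stores hypothetical daily rows and a monthly summary row in one table, distinguished by a hypothetical level column. An unfiltered SUM counts January's sales twice; only a query that remembers to filter on level returns the correct total. Storing each aggregation in its own tables removes that opportunity for error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Anti-pattern: daily detail rows and a monthly summary row in one table,
# told apart only by a hypothetical "level" column.
conn.execute("CREATE TABLE sales_facts (level TEXT, period TEXT, sales_dollars REAL)")
conn.executemany(
    "INSERT INTO sales_facts VALUES (?, ?, ?)",
    [
        ("day", "2024-01-30", 100.0),
        ("day", "2024-01-31", 50.0),
        ("month", "2024-01", 150.0),  # summarizes the two rows above
    ],
)
naive = conn.execute("SELECT SUM(sales_dollars) FROM sales_facts").fetchone()[0]
filtered = conn.execute(
    "SELECT SUM(sales_dollars) FROM sales_facts WHERE level = 'day'"
).fetchone()[0]
print(naive, filtered)  # 300.0 150.0
```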

Aggregate facts
   Aggregate facts should be stored in separate tables for
    each level of aggregation. These may be separate
    aggregate fact tables or separate prejoined aggregate
    tables.

Naming Conventions
   Facts and dimensional attributes should receive the same
    name in an aggregate schema as they do in the base
    schema.
   The name of an aggregate dimension table should
    describe the contents of its rows.
   The names of aggregate fact tables are always
    problematic. The best you can do is establish a convention
    and stick to it.
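One possible convention, shown here only as an illustration, appends the levels retained by an aggregate to the base fact table name. The function name and the _agg_ infix are hypothetical; what matters is that the rule is applied consistently, so the same grain always yields the same name.

```python
def aggregate_fact_name(base_table: str, grain: dict) -> str:
    """Name an aggregate fact table under a hypothetical convention.

    Pattern: <base table>_agg_<level>_<level>..., with dimensions taken in
    alphabetical order so that a given grain always produces the same name.
    """
    levels = [grain[dim] for dim in sorted(grain)]
    return f"{base_table}_agg_" + "_".join(levels)


print(aggregate_fact_name("sales_facts", {"product": "brand", "date": "month"}))
# sales_facts_agg_month_brand
```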

Aggregate Dimension Design
   Attributes of the aggregate dimension must be identical
    to those in the base dimension in name and data type.
    Slow change processing rules must be identical. The
    natural key of an aggregate dimension will be different
    from the base dimension.
   Source aggregate dimensions from the base dimension,
    rather than the original source system. This eliminates
    redundant processing and ensures uniform presentation
    of data values.
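Because the attributes of an aggregate dimension must match the base dimension in name and data type, the rule can be checked mechanically. The sketch below reads SQLite's PRAGMA table_info for two hypothetical tables, day_dim and month_dim, and reports any aggregate attribute whose name or declared type does not match; the surrogate key is excluded because it is expected to differ. Other databases expose the same information through their own catalog views.

```python
import sqlite3


def mismatched_attributes(conn: sqlite3.Connection, base_dim: str, agg_dim: str,
                          surrogate_keys: set) -> list:
    """Names of aggregate attributes missing from the base dimension or typed differently.

    Surrogate keys are skipped because they legitimately differ between tables.
    """
    def columns(table: str) -> dict:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

    base, agg = columns(base_dim), columns(agg_dim)
    return sorted(
        name for name, col_type in agg.items()
        if name not in surrogate_keys and base.get(name) != col_type
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE day_dim (day_key INTEGER PRIMARY KEY, month_name TEXT, year INTEGER)")
# Deliberate error: year is declared TEXT instead of INTEGER.
conn.execute("CREATE TABLE month_dim (month_key INTEGER PRIMARY KEY, month_name TEXT, year TEXT)")
print(mismatched_attributes(conn, "day_dim", "month_dim", {"month_key"}))  # ['year']
```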

Aggregate Dimension Design
   Aggregate dimension tables are often shared by multiple
    aggregates, and sometimes used by base fact tables. These
    shared dimension tables do not need to be built
    redundantly; the various fact tables can use the same
    dimension table. If the shared table is to be instantiated
    more than once, build it a single time and then replicate
    it.
       The documentation for a shared dimension must enumerate all
        dependent fact tables, whether part of the base schema or
        aggregates. In some cases, frequent updates to a dimension may
        require updates to fact tables outside their normal load
        windows.
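The list of dependent fact tables is what makes it possible to tell which tables are affected when a shared dimension changes. A minimal sketch, assuming the documentation is kept as a hypothetical Python mapping from each shared dimension table to every fact table that references it; all table names are made up, and budget_facts stands in for a base fact table that uses a rollup dimension.

```python
# Hypothetical documentation: shared dimension -> every dependent fact table,
# whether part of the base schema or an aggregate.
DIMENSION_DEPENDENTS = {
    "month_dim": ["sales_facts_agg_month_brand", "budget_facts"],
    "brand_dim": ["sales_facts_agg_month_brand"],
}


def fact_tables_affected(changed_dimensions: list) -> list:
    """Return the fact tables that may need loading after the given dimensions change."""
    affected = set()
    for dim in changed_dimensions:
        affected.update(DIMENSION_DEPENDENTS.get(dim, []))
    return sorted(affected)


print(fact_tables_affected(["month_dim"]))
# ['budget_facts', 'sales_facts_agg_month_brand']
```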

Aggregate Fact Table Design
   Aggregate Facts: Names and Data Types
       The aggregate fact should have the same business definition and
        column name as the base fact.
       Unlike dimensional attributes, the aggregate fact may have a different
        data type than its counterpart in the base schema, for example a
        wider numeric type to hold larger sums.
   No New Facts, Including Counts
       Counts cannot be accurately performed against aggregate schemas,
        even if all attributes are the same. All counts must be performed
        against the base schema.
       As a rule of thumb, the only count to be added to an
        aggregate should show the number of base rows summarized. If this
        fact is added to the aggregate, it should also appear in the base fact
        table with a constant value of 1. Counts of any other attribute should
        be directed to the base schema only.
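The row-count fact described above can be sketched as follows, using Python's sqlite3 module and hypothetical order tables. The base fact table carries row_count = 1 on every row, and the aggregate sums it. SUM(row_count) on the aggregate therefore returns the number of base rows summarized, while COUNT(*) returns only the number of aggregate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base fact table; row_count is always 1.
conn.execute(
    "CREATE TABLE order_facts (day TEXT, month TEXT, customer_id INTEGER,"
    " order_dollars REAL, row_count INTEGER)"
)
conn.executemany(
    "INSERT INTO order_facts VALUES (?, ?, ?, ?, 1)",
    [
        ("2024-01-30", "2024-01", 7, 20.0),
        ("2024-01-31", "2024-01", 7, 30.0),
        ("2024-01-31", "2024-01", 9, 10.0),
    ],
)
# Monthly aggregate: row_count is summed, so each row records how many base rows it summarizes.
conn.execute(
    "CREATE TABLE order_facts_agg_month AS"
    " SELECT month, customer_id, SUM(order_dollars) AS order_dollars,"
    " SUM(row_count) AS row_count"
    " FROM order_facts GROUP BY month, customer_id"
)
base_rows = conn.execute("SELECT SUM(row_count) FROM order_facts_agg_month").fetchone()[0]
agg_rows = conn.execute("SELECT COUNT(*) FROM order_facts_agg_month").fetchone()[0]
print(base_rows, agg_rows)  # 3 2
```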

Aggregate Fact Table Design
   Audit Dimension:
       The audit record associated with a row in the aggregate fact
        table does not summarize the audit data associated with the
        base fact table. It describes the process by which the aggregate
        row was inserted or updated.
   Sourcing Aggregate Fact Tables
       Facts will be sourced from the base fact table and aggregated
        by the load process as appropriate.
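Sourcing the aggregate from the base fact table and attaching a fresh audit record can be combined in a single load step. The sketch below uses Python's sqlite3 module; the tables, the load_month_aggregate function, and the drop-and-rebuild strategy are assumptions chosen for brevity, and a production load might apply incremental changes instead. Every aggregate row receives the key of an audit record that describes the aggregate load itself, not the audit keys of the base rows it summarizes.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_dim (audit_key INTEGER PRIMARY KEY, process_name TEXT, run_at TEXT)"
)
conn.execute(
    "CREATE TABLE sales_facts (month TEXT, product_key INTEGER,"
    " sales_dollars REAL, audit_key INTEGER)"
)
conn.execute(
    "CREATE TABLE sales_facts_agg_month (month TEXT, sales_dollars REAL, audit_key INTEGER)"
)
# Two hypothetical base loads, each with its own audit record (keys 1 and 2).
conn.executemany(
    "INSERT INTO audit_dim (process_name, run_at) VALUES (?, ?)",
    [("load_sales_facts", "2024-02-01T02:00:00"), ("load_sales_facts", "2024-03-01T02:00:00")],
)
conn.executemany(
    "INSERT INTO sales_facts VALUES (?, ?, ?, ?)",
    [("2024-01", 1, 100.0, 1), ("2024-01", 2, 50.0, 1), ("2024-02", 1, 75.0, 2)],
)


def load_month_aggregate(conn: sqlite3.Connection) -> int:
    """Rebuild the monthly aggregate from the base fact table.

    Every aggregate row gets the key of one new audit record describing this
    aggregate load, not the audit keys of the base rows.
    """
    cur = conn.execute(
        "INSERT INTO audit_dim (process_name, run_at) VALUES (?, ?)",
        ("load_sales_facts_agg_month", datetime.now(timezone.utc).isoformat()),
    )
    audit_key = cur.lastrowid
    conn.execute("DELETE FROM sales_facts_agg_month")
    conn.execute(
        "INSERT INTO sales_facts_agg_month"
        " SELECT month, SUM(sales_dollars), ? FROM sales_facts GROUP BY month",
        (audit_key,),
    )
    return audit_key


key = load_month_aggregate(conn)
rows = conn.execute(
    "SELECT month, sales_dollars, audit_key FROM sales_facts_agg_month ORDER BY month"
).fetchall()
print(key, rows)  # 3 [('2024-01', 150.0, 3), ('2024-02', 75.0, 3)]
```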

Documenting the Aggregate Schema

Documenting the Aggregate Schema
   Identify Schema Families
   Identify Dimensional Conformance


Documenting the Aggregate Schema
   Documenting Aggregate Dimension Tables
   Documenting Aggregate Fact Tables
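Documentation for aggregate fact tables can be kept in a structured form so that schema families can be derived from it. A minimal sketch, assuming a hypothetical AggregateFactDoc record whose fields (base fact table, grain, facts, dimension tables) are one plausible choice rather than a prescribed template; schema_families groups each base fact table with the aggregates that summarize it.

```python
from dataclasses import dataclass


@dataclass
class AggregateFactDoc:
    """Hypothetical documentation record for one aggregate fact table."""

    name: str
    base_fact_table: str     # the base star this aggregate summarizes
    grain: dict              # dimension -> level retained in the aggregate
    facts: list              # same names as the corresponding base facts
    dimension_tables: list   # base or aggregate dimension tables referenced


def schema_families(docs: list) -> dict:
    """Group aggregate fact tables under the base fact table they summarize."""
    families: dict = {}
    for doc in docs:
        families.setdefault(doc.base_fact_table, []).append(doc.name)
    return families


docs = [
    AggregateFactDoc("sales_facts_agg_month_brand", "sales_facts",
                     {"date": "month", "product": "brand"},
                     ["sales_dollars"], ["month_dim", "brand_dim"]),
    AggregateFactDoc("sales_facts_agg_month", "sales_facts",
                     {"date": "month"}, ["sales_dollars"], ["month_dim"]),
]
print(schema_families(docs))
# {'sales_facts': ['sales_facts_agg_month_brand', 'sales_facts_agg_month']}
```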


Bibliography
   Adamson, Christopher. Mastering Data Warehouse Aggregates:
    Solutions for Star Schema Performance.
