3. The Data Warehouse Must…
- Make information easily accessible
- Present information consistently
- Be adaptive & resilient to change
- Be secure
- Serve as the foundation for decision making
4. The Business Community Must…
- Accept and trust the data warehouse if it is to be successful
17. Consists of a series of conformed dimensional data marts
18. Each data mart represents a different business process.
20. In general, only a small subset of users will need true ad-hoc query capability.
22. Fact Table
- This is the primary table in a dimensional model
- The measurements of the dimensional model are stored here
- Each measurement is tracked at the intersection of several dimensions; this intersection is the “grain” of the model
- The most useful facts are additive
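As an illustration of grain and additivity, here is a minimal sketch assuming a hypothetical sales fact table whose grain is one row per date, product and store, with a single additive measure `sales_amount`. Because the measure is additive, it can be summed along any dimension and every rollup agrees on the grand total.

```python
from collections import defaultdict

# Hypothetical fact rows; the grain is (date_key, product_key, store_key).
fact_sales = [
    {"date_key": 1, "product_key": 10, "store_key": 100, "sales_amount": 50.0},
    {"date_key": 1, "product_key": 11, "store_key": 100, "sales_amount": 20.0},
    {"date_key": 2, "product_key": 10, "store_key": 101, "sales_amount": 30.0},
    {"date_key": 2, "product_key": 11, "store_key": 101, "sales_amount": 25.0},
]

def grain_is_unique(rows):
    """Each combination of dimension keys should appear at most once."""
    keys = [(r["date_key"], r["product_key"], r["store_key"]) for r in rows]
    return len(keys) == len(set(keys))

def total_by(rows, dimension_key):
    """Sum the additive measure grouped by one dimension's foreign key."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension_key]] += row["sales_amount"]
    return dict(totals)

by_date = total_by(fact_sales, "date_key")
by_product = total_by(fact_sales, "product_key")
```

Summing `by_date` or `by_product` gives the same grand total, which is exactly what makes additive facts so convenient to query.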
23. Dimension Table
- Contains the descriptors of each fact
- Tends to have many attributes but relatively few rows
- Tends to be used for query constraints
- The better the attribute descriptions, the better the warehouse
- Typically highly denormalized
24. Star Schema
- A fact table joined to a set of dimensions
- Relates data in a manner that is familiar to business users
- Its symmetrical nature allows the model to answer many different business questions
- One dimensional model will exist for each business process
- A single data warehouse can have dozens of these models
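A star schema can be sketched in a few lines with Python's built-in `sqlite3` module. The table names, columns and sample rows below are hypothetical. The fact table's primary key is the combination of its dimension foreign keys, and a typical query joins the fact to its dimensions, constrains on a dimension attribute, and groups by dimension attributes.

```python
import sqlite3

# In-memory star schema: one fact table surrounded by two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL,
    PRIMARY KEY (date_key, product_key)  -- composite of the foreign keys
);
INSERT INTO dim_date    VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_product VALUES (10, 'Bikes'), (11, 'Helmets');
INSERT INTO fact_sales  VALUES (1, 10, 50.0), (1, 11, 20.0), (2, 10, 30.0), (2, 11, 25.0);
""")

# Dimensions supply constraints and row headers; the fact supplies the measure.
rows = conn.execute("""
    SELECT d.month, p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    WHERE p.category = 'Bikes'
    GROUP BY d.month, p.category
    ORDER BY d.month
""").fetchall()
```

Swapping the `WHERE` and `GROUP BY` columns for other dimension attributes answers a different business question without changing the schema, which is the symmetry the slide refers to.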
26. Store the Most Atomic Data
- By storing the most detailed data possible, you ensure that users can drill down to the level they need
- It’s OK to provide aggregate facts as well to improve performance
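The sketch below, using hypothetical order-line data, shows why atomic data comes first: an aggregate table (month by category) can always be derived from the atomic rows, but the daily detail cannot be recovered from the aggregate alone.

```python
from collections import defaultdict

# Hypothetical atomic facts: (order_date, category, quantity, amount), one per order line.
order_lines = [
    ("2024-01-05", "Bikes", 2, 500.0),
    ("2024-01-19", "Bikes", 1, 250.0),
    ("2024-01-19", "Helmets", 3, 90.0),
    ("2024-02-02", "Bikes", 1, 250.0),
]

def build_monthly_aggregate(lines):
    """Derive an optional aggregate fact table keyed by (month, category)."""
    agg = defaultdict(lambda: [0, 0.0])
    for order_date, category, qty, amount in lines:
        month = order_date[:7]  # 'YYYY-MM'
        agg[(month, category)][0] += qty
        agg[(month, category)][1] += amount
    return {key: tuple(val) for key, val in agg.items()}

monthly = build_monthly_aggregate(order_lines)

# Drilling from the January/Bikes aggregate down to its individual order lines
# is only possible because the atomic rows were kept.
jan_bike_detail = [
    line for line in order_lines
    if line[0].startswith("2024-01") and line[1] == "Bikes"
]
```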
27. Conformed Dimensions
- By conforming your dimensions, you can correlate performance across business processes
- Conforming can be very painful (but worth it) when combining data from disparate systems
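Here is a minimal "drill-across" sketch, assuming two hypothetical business processes (sales and returns) that share one conformed product dimension. Each fact table is summarized separately by the same conformed attribute, and the results are then merged on that attribute. The merge is only meaningful because both processes use identical keys and attribute values.

```python
from collections import defaultdict

# Conformed product dimension shared by both processes (hypothetical values).
dim_product = {10: "Bikes", 11: "Helmets"}

fact_sales = [(10, 500.0), (11, 90.0), (10, 250.0)]   # (product_key, sales_amount)
fact_returns = [(10, 250.0), (11, 30.0)]              # (product_key, return_amount)

def summarize(fact_rows):
    """Aggregate one fact table by the conformed 'category' attribute."""
    totals = defaultdict(float)
    for product_key, measure in fact_rows:
        totals[dim_product[product_key]] += measure
    return totals

def drill_across(*summaries):
    """Merge per-process summaries on the shared attribute values."""
    labels = sorted(set().union(*summaries))
    return {label: tuple(s.get(label, 0) for s in summaries) for label in labels}

report = drill_across(summarize(fact_sales), summarize(fact_returns))
```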
28. Use Surrogate Keys
Always use an artificial key as the primary key. Surrogate keys allow you to:
- Protect your model from changes in the source system
- Integrate data from multiple sources
- Add rows that do not exist in the source system
- Track changes to dimensions over time
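A minimal sketch of surrogate key assignment during a dimension load, using a hypothetical `DimensionKeyMap` helper. It maps `(source_system, natural_key)` pairs to warehouse-owned integers, so identical codes coming from different systems cannot collide. It also reserves key 0 for an "Unknown" member that exists in no source system. The reserved value and the helper's name are assumptions for illustration.

```python
class DimensionKeyMap:
    """Hypothetical helper that assigns integer surrogate keys.

    Key 0 is reserved for an 'Unknown' member that does not exist in any
    source system, so facts with a missing reference can still be loaded.
    """

    UNKNOWN_KEY = 0

    def __init__(self):
        self._next_key = 1
        self._keys = {}

    def lookup_or_assign(self, source_system, natural_key):
        if natural_key is None:
            return self.UNKNOWN_KEY
        pair = (source_system, natural_key)
        if pair not in self._keys:
            self._keys[pair] = self._next_key
            self._next_key += 1
        return self._keys[pair]

customers = DimensionKeyMap()
k1 = customers.lookup_or_assign("crm", "C-001")
k2 = customers.lookup_or_assign("erp", "C-001")  # same code, different source system
k3 = customers.lookup_or_assign("crm", "C-001")  # repeated lookup returns the same key
k4 = customers.lookup_or_assign("crm", None)     # missing reference -> Unknown member
```

Because fact tables store only these integers, a source system that renumbers or reformats its codes affects the key map, not the facts.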
29. Slowly Changing Dimensions
- A key design consideration is what to do when dimension values change
- A change may or may not have business meaning
- There are three ways to handle changes
30. Slowly Changing Dimension Types
- Type I: Simply overwrite the old values. This is the simplest case, used when you don’t care about changes to the data.
- Type II: Create a new dimension row for the new values. Existing facts still relate to the old dimension row. Used when you do care about historical changes.
- Type III: Add a new column to the table to store the new value, so the row holds both the prior and current values. Rarely used.
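Type I and Type II handling can be sketched on a hypothetical customer dimension. Type I overwrites the attribute on every row for the natural key. Type II expires the current row and appends a new row with a new surrogate key, so facts loaded earlier keep pointing at the old row. The `valid_from`/`valid_to` columns and the use of `None` to mark the current row are assumptions; many designs use a current-row flag or a far-future end date instead.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class CustomerRow:
    customer_key: int          # surrogate key
    customer_id: str           # natural key from the source system
    name: str
    city: str
    valid_from: date
    valid_to: Optional[date]   # None marks the current row

def scd_type1(rows: List[CustomerRow], customer_id: str, name: str) -> List[CustomerRow]:
    """Type I: overwrite the attribute in place; no history is kept."""
    return [replace(r, name=name) if r.customer_id == customer_id else r for r in rows]

def scd_type2(rows: List[CustomerRow], customer_id: str, city: str, as_of: date) -> List[CustomerRow]:
    """Type II: expire the current row and add a new row with a new surrogate key."""
    next_key = max(r.customer_key for r in rows) + 1
    result = []
    for r in rows:
        if r.customer_id == customer_id and r.valid_to is None and r.city != city:
            result.append(replace(r, valid_to=as_of))
            result.append(CustomerRow(next_key, customer_id, r.name, city, as_of, None))
        else:
            result.append(r)
    return result

dim = [CustomerRow(1, "C-001", "Acme", "Boston", date(2020, 1, 1), None)]
dim = scd_type2(dim, "C-001", "Denver", date(2023, 6, 1))  # city change: keep history
dim = scd_type1(dim, "C-001", "Acme Corp")                  # name correction: overwrite
```

Facts recorded before the move still carry `customer_key` 1 and therefore still report Boston, while new facts use key 2.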
31. Date Dimensions
- Dates are a fundamental business concept, and nearly every DW has a date dimension
- The date dimension is the classic role-playing dimension: the same date table can serve, for example, as both the order date and the ship date of a fact
- Allows rollups and filters on any date-related attribute, such as month, quarter, or year
- Date dimension records still use a surrogate key to handle unknown dates
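A minimal date dimension generator, assuming sequential integer surrogate keys in date order and a reserved key 0 for unknown dates (both are illustrative choices, not prescribed by the slide). The last lines show role playing: one physical date table answers questions about two different dates on the same hypothetical fact row.

```python
from datetime import date, timedelta

def build_date_dimension(start, end):
    """One row per calendar day plus an 'Unknown' member with key 0."""
    rows = [{"date_key": 0, "full_date": None, "month": None,
             "quarter": None, "year": None}]
    day, key = start, 1
    while day <= end:
        rows.append({
            "date_key": key,
            "full_date": day,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
        key += 1
    return rows

dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
by_key = {row["date_key"]: row for row in dim_date}

# Role playing: the same dimension serves both the order-date and ship-date roles.
order_fact = {"order_date_key": 45, "ship_date_key": 50, "amount": 99.0}
order_quarter = by_key[order_fact["order_date_key"]]["quarter"]
ship_date = by_key[order_fact["ship_date_key"]]["full_date"]
```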
32. Snowflaking
- Snowflaking is the process of hooking up lookup tables to a dimension
- In a way, this re-normalizes the data
- Snowflaking is generally discouraged because it adds complexity to the model
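The sketch below, using a hypothetical product dimension, shows the denormalized alternative to snowflaking. The category name is copied from the lookup table into each dimension row during the load. Queries then need a single fact-to-dimension join instead of a chain of joins through the lookup table.

```python
# Hypothetical snowflaked product dimension: category lives in a separate lookup table.
category_lookup = {1: "Bikes", 2: "Accessories"}
product_snowflake = [
    {"product_key": 10, "product_name": "Road Bike", "category_id": 1},
    {"product_key": 11, "product_name": "Helmet", "category_id": 2},
]

def flatten(products, categories):
    """Denormalize: replace the lookup reference with the descriptive value."""
    return [
        {
            "product_key": p["product_key"],
            "product_name": p["product_name"],
            "category": categories[p["category_id"]],
        }
        for p in products
    ]

dim_product = flatten(product_snowflake, category_lookup)
```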
33. Many-to-Many Relationships
- Most relationships are one-to-many; this is the simplest case
- Real-world scenarios are often more complex
- Many-to-many relationships between facts and dimensions are represented by creating a bridge table between the fact table and the dimension
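A minimal bridge-table sketch, assuming a hypothetical case where one account's sales can belong to several customers. The bridge rows carry a weighting factor (one common technique; the weights per account sum to 1). Weighted rollups reproduce the true total. Unweighted "impact" rollups count shared amounts once per customer and therefore double count.

```python
from collections import defaultdict

# Hypothetical fact: (account_key, sales_amount).
fact_account = [("A1", 1000.0), ("A2", 400.0)]

# Bridge: (account_key, customer_key, weighting_factor); weights per account sum to 1.
bridge = [("A1", "alice", 0.5), ("A1", "bob", 0.5), ("A2", "bob", 1.0)]

def by_customer(facts, bridge_rows, weighted=True):
    """Roll the fact up to customers through the bridge table."""
    totals = defaultdict(float)
    for account, amount in facts:
        for b_account, customer, weight in bridge_rows:
            if b_account == account:
                totals[customer] += amount * (weight if weighted else 1.0)
    return dict(totals)

allocated = by_customer(fact_account, bridge)               # sums to the true total
impact = by_customer(fact_account, bridge, weighted=False)  # double counts shared accounts
```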
34. Hierarchies
- Hierarchies summarize or group the data within a dimension
- They are typically denormalized into the dimension table
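With the hierarchy denormalized into the dimension, every level is just another column, so a rollup to any level reads one attribute. The product, subcategory and category values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical product dimension with a flattened hierarchy:
# product -> subcategory -> category.
dim_product = {
    10: {"product": "Road Bike", "subcategory": "Bikes", "category": "Bikes & Gear"},
    11: {"product": "Mountain Bike", "subcategory": "Bikes", "category": "Bikes & Gear"},
    12: {"product": "Helmet", "subcategory": "Protection", "category": "Bikes & Gear"},
    13: {"product": "Water Bottle", "subcategory": "Hydration", "category": "Accessories"},
}
fact_sales = [(10, 500.0), (11, 700.0), (12, 60.0), (13, 10.0)]  # (product_key, amount)

def roll_up(level):
    """Aggregate the additive measure at any hierarchy level by reading one column."""
    totals = defaultdict(float)
    for product_key, amount in fact_sales:
        totals[dim_product[product_key][level]] += amount
    return dict(totals)
```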
35. Types of Fact Tables
There are three types of fact tables:
- Transaction: Tracks each transaction as it occurs
- Periodic Snapshot: Captures cumulative performance over a specific period of time; often used for periodic rollups
- Accumulating Snapshot: Updated over time; each row is revisited as the process it tracks reaches each milestone
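The three fact table types can be contrasted in one small sketch using hypothetical data. Transaction rows record events as they occur. A periodic snapshot is derived here as a month-end cumulative balance per account (a real snapshot would also emit rows for months with no activity, which this sketch skips). An accumulating snapshot row is updated as each milestone of an order completes.

```python
from collections import defaultdict
from datetime import date

# Transaction grain: one row per event (hypothetical data).
transactions = [
    (date(2024, 1, 3), "ACCT-1", 100.0),
    (date(2024, 1, 20), "ACCT-1", -40.0),
    (date(2024, 2, 7), "ACCT-1", 25.0),
]

def monthly_snapshot(txns):
    """Periodic snapshot: cumulative balance per account at each month end."""
    net = defaultdict(float)
    for day, account, amount in txns:
        net[(account, (day.year, day.month))] += amount
    balances, running = {}, defaultdict(float)
    for account, month in sorted(net):
        running[account] += net[(account, month)]
        balances[(account, month)] = running[account]
    return balances

snapshot = monthly_snapshot(transactions)

# Accumulating snapshot: one row per order, revisited as milestones complete.
order_row = {"order_id": "O-1", "ordered": date(2024, 1, 3), "shipped": None, "delivered": None}

def record_milestone(row, milestone, when):
    """Return a copy of the row with the milestone date filled in."""
    updated = dict(row)
    updated[milestone] = when
    return updated

order_row = record_milestone(order_row, "shipped", date(2024, 1, 5))
order_row = record_milestone(order_row, "delivered", date(2024, 1, 9))
days_to_deliver = (order_row["delivered"] - order_row["ordered"]).days
```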
37. System Sizing Considerations
- Storage
  - The fact table volumes will drive storage requirements
  - Don’t forget to account for staging storage needs
- Performance
  - Understand the usage complexity of your user community
  - Predefined reports and queries can be cached or pre-aggregated
  - The more ad-hoc analysis your users perform, the greater the hardware requirements
  - You must understand how many simultaneous users the DW will be asked to support
- Memory
  - All the BI components love RAM
  - Use 64-bit hardware to address more memory
38. System Configuration Considerations
All-In-One Configuration
- All components are hosted on a single server
- Appropriate for small deployments or proofs of concept (POCs)
40. System Configuration Considerations
Scale-Out Deployment
- Reporting Services and Analysis Services each have their own servers
- Appropriate for larger deployments
- Can be scaled out much further from here
42. Common Pitfalls
- Becoming overly focused on technology rather than business requirements and goals
- Failing to embrace an influential management visionary as the business sponsor
- Tackling a huge multiyear project rather than smaller iterative development efforts
- Paying more attention to back-end issues and ease of development than to front-end performance and simplicity
43. Common Pitfalls (Continued)
- Making the queryable data overly complex
- Populating the model without properly conforming your dimensions
- Loading only summary data into your models
- Presuming that the business requirements are static
- Neglecting to understand that data warehouse success is tied to user acceptance
Editor's Notes
These may seem simple, but these principles are the foundation for the design methodology. For business users to be able to navigate the system, the tools and, most importantly, the data must be simple and easy to use. Consistency requires a thorough ETL process to cleanse and conform the data. Change is inevitable, so we need a design that is resilient to change. Security … We must have the right data in order to support decisions; this means up-front analysis focuses on the business need.
Ultimately if any system doesn’t satisfy some business need, it is of no value and is a failure.
Go through each component.
Discuss each bullet point.
Discuss each bullet point. Examples of cleansing activities:
- Misspellings
- Formatting
- Capitalization
- Conformance
Emphasize that users are forbidden from executing queries on these data. 3NF data is too complex for most users, and 3NF is not optimized for query performance.
Discuss each bullet point. Discuss what it means to be a conformed data mart. Point out that dimensional modeling will be discussed in detail later on.
Discuss each bullet point. Specify the examples in this diagram and what role they play.
Discuss why additive facts are most useful. Describe semi-additive facts. Note that the primary key is the combination of all the foreign keys. A ROWID adds little value, and an index on it probably would not be of any use either.
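For the semi-additive discussion, a small sketch using a hypothetical daily balance snapshot may help. A balance can be summed across accounts for a single date. Across time it must instead be taken at period end (or averaged), because summing daily balances produces a meaningless number.

```python
# Hypothetical daily balance snapshot: (date_key, account, balance).
balances = [
    (1, "A", 100.0), (1, "B", 50.0),
    (2, "A", 120.0), (2, "B", 50.0),
    (3, "A", 90.0), (3, "B", 70.0),
]

def total_on(date_key):
    """Valid: add balances across accounts for a single date."""
    return sum(b for d, _, b in balances if d == date_key)

def period_end_balance(account):
    """Across time, take the latest balance instead of summing."""
    return max((d, b) for d, a, b in balances if a == account)[1]

def average_balance(account):
    """Averaging over the period is another common treatment."""
    values = [b for _, a, b in balances if a == account]
    return sum(values) / len(values)

# Summing account A's balances over time is NOT a meaningful balance.
naive_sum_a = sum(b for _, a, b in balances if a == "A")
```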
Attribute descriptions should avoid cryptic abbreviations. Minimize the use of codes. Show the denormalized nature of one of these dimensions. Denormalized dimensions provide the following benefits:
- Simplified structure for non-technical users
- Better query performance
Since dimensions typically have relatively few rows, the impact of reduced storage efficiency is minimal.
Walk through the SCD Type II example using the dimensional model above.
Walk through the date dimension in the POC example
Point out that the POCGLTransaction fact table is a transaction fact table and the budget table is a periodic snapshot.