2. Objectives:
• 9.1 Introduction
• 9.2 Concepts and Activities
• 9.2.1 Data Warehousing- A Brief Retrospective and Historical Tour
• 9.2.1.1 Classic Characteristics of a Data Warehouse- Inmon Version
• 9.2.1.2 Classic characteristics of a Data Warehouse – Kimball Version
• 9.2.2 DW/ BI Architecture and Components
• 9.2.2.1 Inmon’s Corporate Information Factory
• 9.2.2.2 Kimball’s Business Dimensional Lifecycle and DW Chess Pieces
• 9.2.3 Tactical, Strategic and Operational BI
• 9.2.4 Types of Data Warehousing
• 9.2.4.1 Active Data Warehousing
• 9.2.4.2 Multi-dimensional Analysis – OLAP
• 9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
• 9.2.5 Dimensional Data Modeling Concepts and Terminology
• 9.2.5.1 Fact Tables
• 9.2.5.2 Dimension Tables (9.2.5.2.1 Surrogate Keys, 9.2.5.2.2 Natural Keys)
• 9.2.5.3 Dimension Attribute Types (9.2.5.3.1 Type 1 Overwrite, 9.2.5.3.2 Type 2 New Row, 9.2.5.3.3 Type 3 New Column, 9.2.5.3.4 Type 4 New Table, 9.2.5.3.5 Type 6 1+2+3)
• 9.2.5.4 Star Schema
• 9.2.5.5 Snowflaking
• 9.2.5.6 Grain
• 9.2.5.7 Conformed Dimensions
• 9.2.5.8 Conformed Facts
• 9.2.5.9 DW-Bus Architecture and Bus Matrix
3. Objectives:
• 9.3 DW-BIM Activities
• 9.3.1 Understand Business Intelligence Information Needs
• 9.3.2 Define and Maintain the DW-BI Architecture
• 9.3.3 Implement Data Warehouses and Data Marts
• 9.3.4 Implement Business Intelligence Tools and User Interfaces
• 9.3.4.1 Query and Reporting Tools
• 9.3.4.2 On Line Analytical Processing (OLAP) Tools
• 9.3.4.3 Analytic Applications
• 9.3.4.4 Implementing Management Dashboards and Scorecards
• 9.3.4.5 Performance Management Tools
• 9.3.4.6 Predictive Analytics and Data Mining Tools
• 9.3.4.7 Advanced Visualization and Discovery Tools
• 9.3.5 Process Data for Business Intelligence
• 9.3.5.1 Staging Areas
• 9.3.5.2 Mapping Sources and Targets
• 9.3.5.3 Data Cleansing and Transformations (Data Acquisition)
• 9.3.6 Monitor and Tune Data Warehousing Processes
• 9.3.7 Monitor and Tune BI Activity and Performance
4. 9 Data Warehousing and Business Intelligence
Management
• Data Warehousing and Business Intelligence Management
is the seventh data management function in the data
management framework introduced in Chapter 1.
• It is the sixth data management function that interacts
with, and is influenced by, the Data Governance function.
• This chapter defines the Data Warehousing and Business
Intelligence Management function and explains the
concepts and activities involved.
5. 9.1 Introduction:
• A Data Warehouse (DW) is a combination of two primary components:
• An integrated decision support database
• The related software programs used to collect, cleanse, transform, and store data from a variety of sources
• An Enterprise Data Warehouse (EDW) is a centralized data warehouse designed to
service the BI needs of the entire organization.
• Data warehousing:
• The term used to describe the operational extract, cleansing, transformation, and load processes,
and associated control processes, that maintain the data contained within a DW.
• The process focuses on enabling an integrated and historical business context on operational data
by enforcing business rules and maintaining appropriate business data relationships.
• It is the technology solution supporting BI.
6. 9.1 Introduction:
• Business Intelligence (BI) is a set of business capabilities. The term means many things,
including:
1. Query, analysis, and reporting activity by knowledge workers to monitor and understand the
financial and operational health of, and make business decisions about, the enterprise.
2. Query, analysis, and reporting processes and procedures.
3. A synonym for the BI environment.
4. The market segment for BI software tools.
5. Strategic and operational analytics and reporting on corporate operational data to support
business decisions, risk management, and compliance.
6. A synonym for Decision Support System (DSS).
• Data Warehousing and Business Intelligence Management (DW-BIM) is:
• The collection, integration, and presentation of data to knowledge workers for the purpose of
business analysis and decision-making.
• Composed of activities supporting all phases of the decision support life cycle: providing context,
moving and transforming data from sources to a common target data store, and then providing
knowledge workers various means of access, manipulation, and reporting of the integrated data.
7. 9.1 Introduction:
• Objectives for DW-BIM include:
• Providing integrated storage of required current and historical data, organized by subject areas.
• Ensuring credible, quality data for all appropriate access capabilities.
• Ensuring a stable, high-performance, reliable environment for data acquisition, management, and
access.
• Providing an easy-to-use, flexible, and comprehensive data access environment.
• Delivering both content and access to the content in increments appropriate to organization’s
objectives.
• Leveraging, rather than duplicating, relevant data management component functions such as
Reference and Master Data Management, Data Governance (DG), Data Quality (DQ), and
Meta-data (MD).
• Providing an enterprise focal point for data delivery in support of the decisions, policies,
procedures, definitions, and standards that arise from DG.
• Defining, building, and supporting all data stores, processes, infrastructure, and tools that contain
integrated, post-transactional, and refined data used for information viewing, analysis, or data
request fulfillment.
• Integrating newly discovered data as a result of BI processes into the DW for further analytics and
BI use.
9. 9.2 Concepts and Activities
• This section serves several purposes:
• The history of DW-BIM and an overview of typical DW-BIM
components.
• An explanation of some general DW-BIM terminology.
• A brief introduction and overview of dimensional modeling and
its terminology, leading into the activities identified in Figure 9.1.
10. 9.2.1 Data Warehousing – a brief Retrospective
and Historical Tour
• Two names have made significant contributions to advancing
and shaping the practice of data warehousing:
• Bill Inmon
• Ralph Kimball
• This section gives a brief introduction to their major
contributions, along with some comparisons and contrasts
of their approaches.
11. Historical Tour
9.2.1.1 Inmon Version – classic characteristics of a Data Warehouse
• In the early 1990s, Bill Inmon defined the DW as a “subject-oriented, integrated, time-
variant, and non-volatile collection of summary and detailed historical data
used to support the strategic decision-making processes for the corporation”.
• These key characteristics give a clear distinction between the nature of a DW and a
typical operational system:
• Subject-Oriented: the DW is designed to meet the data needs of the corporation.
• Integrated: concerns how the data is stored in the DW: “key structure, encoding,
decoding of structure, definitions of the data, and so on”.
• Time Variant: refers to how every record in the DW is accurate as of a moment in time.
• Non-Volatile: updates to records during normal processing do not occur; if updates
occur at all, they occur on an exception basis.
• Summarized and Detail Data: the DW must contain detailed data, along with summaries.
• Historical: the DW contains a vast amount of historical data (5 to 10 years’
worth of data).
12. Historical Tour
9.2.1.2 Kimball Version – classic characteristics of a Data Warehouse
• Ralph Kimball took a different approach, defining a data warehouse simply as:
• “A copy of transaction data specifically structured for query and analysis.”
• It has a different structure than an operational system (the dimensional data model).
• Data warehouses always contain more than just transactional data:
• Reference data is necessary to give context to the transactions.
• Dimensional data models are relational data models:
• They just do not consistently comply with normalization rules.
• They reflect business processes more simply than normalized models.
13. 9.2.2 DW / BI Architecture and Components
• This section introduces the major components found in most DW/BI
environments, through an overview from both the
• Inmon and Kimball perspectives.
• Inmon’s approach:
• The Corporate Information Factory
• Kimball’s approach:
• The “DW Chess Pieces”
• Both views and their components are described and
contrasted.
14. DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• The CIF (Corporate Information Factory) is a corporate data architecture for DW-BIM:
• Identified and written about by Claudia Imhoff and Ryan Sousa.
• Figure 9.2: shows the components of the CIF.
• Table 9.1: lists and describes the basic components of the Corporate
Information Factory view of DW/BI architecture.
• Table 9.2: provides context for the reporting scope and purpose of each
of the Corporate Information Factory components, with some
explanatory notes.
• Table 9.3: provides a compare-and-contrast from a business and application
perspective between the four major components of the CIF: the
applications, ODS, DW, and Data Marts.
• Table 9.4: provides a compare-and-contrast from a data perspective between the
four major components of the CIF: the applications, ODS, DW,
and Data Marts.
16. DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• Raw Detailed Data: Operational/transactional application data of the enterprise.
Provides the source data to be integrated into the ODS and DW.
Can also be in a DB or other storage or file format.
• Integration and Transformation: This layer of the architecture is where the
un-integrated data from the various application source stores is
combined/integrated and transformed into the corporate representation in the DW.
• Reference Data: A precursor to what is currently referred to as MDM.
The purpose was to allow common storage and access for
important and frequently used common data.
Focus and shared understanding on data upstream of the DW
simplifies the integration task in the DW.
• Historical Reference Data: While current-value reference data is necessary for
transactional applications, at the same time it is critical to have accurate
integration and presentation of historical data.
Table 9.1 Corporate Information Factory Component Descriptions
17. DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• Operational Data Store (ODS): The main distinguishing data characteristics of the
ODS compared to the DW include current-valued vs. the DW’s historical
data, and volatile vs. the DW’s non-volatile data.
• Operational Data Mart (Oper-Mart): A data mart focused on tactical decision support.
Distinguishing characteristics include current-valued vs. the DW’s
historical data, tactical vs. the DW’s strategic analysis, and sourcing
of data from the ODS rather than just the DW.
• Data Warehouse (DW): A large, comprehensive corporate resource.
Its primary purpose is to provide a single integration point for
corporate data in order to serve management decision-making, and
strategic analysis and planning. Data flows into and out of the DW in
one direction only; data that needs correction is rejected,
corrected at its source, and re-fed through the system.
• Data Marts (DM): Their purpose is to provide DSS/information processing and
access that is customized and tailored for the needs of a
particular department or common analytic need.
Table 9.1 Corporate Information Factory Component Descriptions
18. DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• Applications: Isolated Operational Reports. Limited to data within one application
instance.
• ODS: Integrated Operational Reports. Reports requiring data from multiple source
systems.
• DW: Exploratory Analysis. The complete set of corporate data allows for
discovery of new relationships and information.
Many BI data mining tools work with flat-file
extracts from the DW, which can also offload
the processing burden from the DW.
• Oper-Mart: Tactical Analytics. Analytic reporting based on current values
with a tactical focus.
• Data Marts: Analytics (classical management decision support, and strategic
analytics). Originally “departmental analysis”, such as for political and
funding expediency; later work expanded the concept to common
analytic needs crossing departmental boundaries.
Table 9.2 CIF Reporting Scope and Purpose
19. • About Table 9.3:
• Note the following general observations about the contrast between the
information on the right-hand side for the DW and Data Marts, compared to the
left-hand side for applications, in particular:
• The purpose shifts from execution to analysis.
• End users are typically decision makers instead of doers (front-line workers).
• System usage is more ad hoc than the fixed operations of the transactional systems.
• Response time requirements are relaxed, because strategic decisions allow more time than
daily operations.
• Much more data is involved in each operation, query, or process.
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
20. • DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• Business Purpose: Application Data = specific business function; ODS = corporate
integrated operational needs; DW = central data repository, integration and reuse;
Data Mart = analysis: departmental (Inmon), business process (Kimball),
business measures (Wells)
• System Orientation: Application Data = operations (execution); ODS = operations
(reports); DW = infrastructure; Data Mart = informational, analytic (DSS)
• Target Users: Application Data = end users: clerical (daily operations); ODS = line
managers: tactical decision makers; DW = systems: data marts, data mining;
Data Mart = executives (performance/metrics), senior and mid-level managers,
knowledge workers
• How System Is Used: Application Data = fixed operations; ODS = operational
reporting; DW = stage, store, feed; Data Mart = ad hoc
Table 9.3 CIF Components – Business / Application View
21. • DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• System Availability: Application Data = fixed ops; ODS = medium; DW = varies;
Data Mart = relaxed
• Typical Response Time: Application Data = seconds; ODS = seconds to minutes;
DW = longer (batch); Data Mart = seconds to hours
• # Records in an Operation: Application Data = limited; ODS = small to medium;
DW = large; Data Mart = large
• Amount of Data per Process: Application Data = small; ODS = medium; DW = large;
Data Mart = large
• SDLC: Application Data = classic; ODS = classic; DW = classic; Data Mart = modified
Table 9.3 CIF Components – Business / Application View
22. • DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• Table 9.4:
• Considers a compare-and-contrast from a data perspective between the four
components: applications, ODS, DW, and Data Marts.
• The majority of DW processes involve higher latency, often overnight
batch processing.
23. • DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• Orientation: Application = functional; ODS = subject; DW = subject;
Data Mart = limited subject
• View: Application = application; ODS = corporate (ops); DW = corporate (historical);
Data Mart = focused analysis
• Integration: Application = not integrated, application-specific; ODS = integrated
corporate data; DW = integrated corporate data; Data Mart = integrated subset
• Volatility: Application = CRUD; ODS = volatile; DW = non-volatile;
Data Mart = non-volatile
• Time: Application = current only; ODS = current value; DW = time variant;
Data Mart = time variant
• Detail Level: Application = detail only; ODS = detail only; DW = detail + summary;
Data Mart = detail + summary
• Amount of History*: Application = 30 to 180 days; ODS = 30 to 180 days;
DW = 5 to 10 years; Data Mart = 1 to 5 years
• Latency*: Application = real time to NRT; ODS = NRT; DW = > 24 hours;
Data Mart = 1 day to 1 month
• Normalized?: Application = yes; ODS = yes; DW = yes; Data Mart = no
• Modeling: Application = relational; ODS = relational; DW = relational;
Data Mart = dimensional
Table 9.4 CIF Components – Data View
24. • In Table 9.4, in comparing the DW and Data Marts against the applications,
note in particular:
• Data has a subject vs. functional orientation.
• Data is integrated vs. stove-piped or siloed.
• Data is time-variant vs. current-valued only.
• There is higher latency in the data.
• Significantly more history is available.
• DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
25. • Called the “Business Dimensional Lifecycle” approach, but commonly referred to as the
“Kimball approach”.
• His Design Tip #49: “We chose the Business Dimensional Lifecycle label instead,
because it reinforced our core tenets about successful data warehousing based on
our collective experiences since the mid-1980s.”
• The basis of this approach is three tenets (principles):
• Business Focus: both immediate business requirements and more long-term, broad data
integration and consistency.
• Atomic Dimensional Data Models: both for ease of business user understanding and for query
performance.
• Iterative Evolution Management: manage changes and enhancements to the DW as individual,
finite projects.
• Using conformed dimensions and facts in the design, “business rules parts of the DW
become re-usable components that are already integrated”.
• Figure 9.3 is a representation of the “Data Warehouse Chess Pieces”, more inclusive
and expansive than that of Inmon.
• It uses the term Data Warehouse to encompass everything in both the data staging and data
presentation areas.
• DW/BI Architecture and Components
9.2.2.2 Kimball’s Business Dimensional Lifecycle and DW Chess Pieces
26. • DW/BI Architecture and Components
9.2.2.2 Kimball’s Business Dimensional Lifecycle and DW Chess Pieces
27. • Operational Source Systems: Operational/transactional applications of the enterprise.
Integrated into the ODS and DW components.
Equivalent to the application systems in the CIF diagram.
• Data Staging Area: Referred to as the “kitchen”, the “area behind the scenes”.
Smaller than in Inmon’s diagram: the
“eclectic set of processes needed to integrate and transform data for
presentation”.
Similar to “Integration and Transformation” in the CIF.
• Data Presentation Area: Similar to the Data Marts in the CIF,
differing only in the conformed dimensions unifying the multiple data marts (the “DW
Bus”).
• Data Access Tools: Focus on the needs and requirements of the end customers /
consumers of the data.
These needs translate into the selection, from a broad range of
data access tools, of the right tools for the right task.
In the CIF model, the access tools sit outside the DW architecture.
Table 9.5 Kimball’s DW Chess Pieces – Component Descriptions
• DW/BI Architecture and Components
9.2.2.2 Kimball’s Business Dimensional Lifecycle and DW Chess Pieces
28. • Tactical BI:
• BI tools to analyze trends by comparing a metric to the same metric from a
previous month or year, etc. Used to support short-term business decisions.
• Strategic BI:
• Provides metrics to executives, with a formal method of business performance
management, to help them determine whether the corporation is on target for
meeting its goals. Used to support long-term corporate goals and
objectives.
• Operational BI:
• Provides BI to the front lines of the business, used to manage and optimize
business operations. Couples BI applications with operational functions
and processes, with very low tolerance for latency (near
real-time data capture and data delivery).
• A service-oriented architecture (SOA) is necessary to support this.
9.2.3 Tactical, Strategic and Operational BI
29. • Three major types of Data Warehousing are described:
• 9.2.4.1 Active Data Warehousing
• 9.2.4.2 Multi-dimensional Analysis – OLAP
• 9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
9.2.4 Types of Data Warehousing
30. • DWs serving tactical and strategic BI with non-volatile data have
existed for many years.
• New architectural approaches are emerging to deal with the inclusion of
volatile data.
• An example of these applications: automated banking machine (ABM) data
provisioning. When making a banking transaction, historical balances, and the
new balances resulting from immediate banking actions, need to be
presented to the banking customer in real time.
• Two of the key design concepts required:
• Isolation of change
• Alternatives to batch ETL
• Changes from new volatile data must be isolated from the bulk of
the historical, non-volatile DW data, e.g. by building partitions and using
union queries across the different partitions when necessary.
• Trickle feeds, pipelining, and SOA are alternatives to batch
ETL.
• These approaches support shorter latency requirements for data availability in the DW.
DW types
9.2.4.1 Active Data Warehousing
31. • OLAP (“Online Analytical Processing”) (the analytical cube):
• An approach to providing fast performance for multi-dimensional analytic
queries.
• The term originated, in part, to make a distinction from OLTP (“Online Transactional
Processing”).
• The typical output of queries is in matrix format:
• The dimensions form the rows and columns of the matrix.
• The facts or measures are the values inside the matrix.
• Multi-dimensional analysis with cubes is useful for looking at summaries of data.
• A common application is financial analysis, where analysts want to
repeatedly traverse known hierarchies to analyze data:
• Date (such as Year, Quarter, Month, Week, Day)
• Organization (such as Region, Country, Business Unit, Department)
• Product (such as Product Category, Product Line, Product)
DW types
9.2.4.2 Multi-dimensional Analysis - OLAP
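As a rough sketch of the matrix output described above, the following uses Python's built-in sqlite3 module as a stand-in relational engine; the table, dimension, and measure names are illustrative assumptions, not from the text:

```python
import sqlite3

# In-memory database standing in for a small sales cube.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (quarter TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Q1", "East", 100.0), ("Q1", "West", 150.0),
     ("Q2", "East", 120.0), ("Q2", "West", 180.0)],
)

# The dimensions (quarter, region) form the rows and columns of the matrix;
# the aggregated measure (SUM of amount) fills the cells.
rows = conn.execute(
    "SELECT quarter, region, SUM(amount) FROM sales "
    "GROUP BY quarter, region ORDER BY quarter, region"
).fetchall()

# Pivot the flat result set into a matrix: quarters as rows, regions as columns.
matrix = {}
for quarter, region, total in rows:
    matrix.setdefault(quarter, {})[region] = total

print(matrix)
```

Traversing a hierarchy (e.g. drilling from Year down to Quarter) would simply add or remove columns from the GROUP BY list.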
32. • Three classic implementation approaches support Online Analytical
Processing, named for the underlying DB implementation approach:
• Relational Online Analytical Processing (ROLAP): supports OLAP using techniques in the
two-dimensional tables of an RDBMS. Star schema joins are a common DB
design.
• Multi-dimensional Online Analytical Processing (MOLAP): supports OLAP by
using proprietary, specialized multi-dimensional DB technology.
• Hybrid Online Analytical Processing (HOLAP): simply a combination of ROLAP
and MOLAP, allowing part of the data to be stored in MOLAP form and another
part in ROLAP form. The designer has to vary the mix of
partitioning.
• Database Online Analytical Processing (DOLAP): a virtual OLAP cube
available as a special proprietary function of a classic relational DB.
DW types
9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
33. 9.2.5 Dimensional Data Modeling Concepts and Terminology
• Dimensional data modeling is the preferred technique for
designing data marts.
• It focuses on making it simple for the end user to understand and access
the data.
• This simplicity helps contribute to the fact that the majority of data mart
design work ends up being in the ETL processing.
• It is a subset of entity-relationship data modeling: it has entities,
attributes, and relationships.
• The entities come in two types:
• Facts, which provide the measurements, and Dimensions, which provide the context.
• Relationships are constrained to all go through the fact table, and all
dimension-to-fact relationships are one-to-many (1:M).
• Table 9.6 compares relational models built for transactional
applications vs. those built with dimensional data
modeling for data marts.
35. Dimensional Data Modeling
9.2.5.1 Fact Tables
• Fact tables represent and contain important business measures.
• Fact tables (entities) contain one or more facts (attributes representing
measures).
• The rows of a fact table correspond to particular measurements, which are
numeric, such as amounts, quantities, or counts.
• Fact tables express or resolve many-to-many relationships between the
dimensions.
• They often have a number of control columns that express when the row
was loaded.
• These fields help the programmers, operators, and super-users
navigate and validate the data.
36. Dimensional Data Modeling
9.2.5.2 Dimension Tables
• Dimension tables represent the important objects of the business and contain textual
descriptions of the business.
• They act as the entry points or links into the fact tables, and their
contents provide report groupings and report labels.
• They typically have a small number of rows and a large number of columns.
The main contents of a dimension table are:
• A surrogate or non-surrogate key.
• The primary key, representing what is used to link to the other tables in the DW.
• Descriptive elements, including codes, descriptions, names, statuses, and so
on.
• Any hierarchy information, including multiple hierarchies and often a ‘type’
breakdown.
• The business key that the business user uses to identify a unique row.
• The source system key identification fields, for traceability.
• Control fields geared to the type of dimension history capture (Types 1 through 3, 4,
and 6).
• Each row must have a unique identifier; the two main types are surrogate and natural keys.
37. Dimension Tables
9.2.5.2.1 Surrogate Keys
• A “surrogate key” or “anonymous key” is a single primary key,
populated by a number unrelated to the actual data.
• It can be either a sequential number or a truly random number.
• The advantages of using surrogate keys include:
• Performance: number fields are searched faster than other types of fields.
• Isolation: it is a buffer from business key field changes, and may not need changing
if a field type or length changes on the source system.
• Integration: enables combining data from different sources, which usually do not
share the same key structures.
• Enhancement: values such as ‘Unknown’ or ‘Not applicable’ get their own
specific key values, in addition to all of the keys for valid rows.
• Interoperability: data access libraries and GUI functions work better with
surrogate keys, because they do not need additional knowledge about the
underlying system to function properly.
• Versioning: enables multiple instances of the same dimension value, which is
necessary for tracking changes over time.
• De-bugging: supports load issue analysis and re-run capability.
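The sequential-number and 'Unknown'-member advantages above can be sketched with Python's sqlite3; all table and column names are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Surrogate key: a meaningless sequential number, independent of the
# business (natural) key that comes from the source system.
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_nk TEXT,                               -- natural / business key
        name        TEXT
    )
""")

# Reserve a specific key value for the 'Unknown' member, so fact rows
# with unmatched source keys can still join to the dimension.
conn.execute("INSERT INTO dim_customer VALUES (0, 'N/A', 'Unknown')")

# Normal rows simply take the next sequential number.
conn.execute("INSERT INTO dim_customer (customer_nk, name) VALUES ('C-1001', 'Acme')")
conn.execute("INSERT INTO dim_customer (customer_nk, name) VALUES ('C-1002', 'Globex')")

keys = [r[0] for r in conn.execute(
    "SELECT customer_sk FROM dim_customer ORDER BY customer_sk")]
print(keys)
```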
38. Dimension Tables
9.2.5.2.2 Natural Keys
• Natural keys are used where it is not preferred to create additional key fields;
unique rows are identified by joining multiple fields in each query.
• They are business-driven.
• The advantages of using natural keys are:
• Lower overhead: the key fields are already present, requiring no
additional modeling to create or processing to populate.
• Ease of change: in an RDBMS where the concept of a domain exists, it is easy to
make global changes due to changes on the source system.
• Performance advantage: using the values in the unique keys may eliminate
some joins entirely, improving performance.
• Data lineage: easier to track across systems, especially where the data travels
through more than two systems.
39. Dimensional Data Modeling
9.2.5.3 Dimension Attribute Types
• The three main types of dimension attributes are:
• Type 1
• Type 2 (and 2a)
• Type 3
• They are differentiated by the need to retain historical copies.
• There are two other types that do not appear very often:
• Type 4
• Type 6 (1+2+3)
• Types 1 through 3 can co-exist within the same table, and the actions taken
during an update depend on which fields, with which types, are having
updates applied.
40. Dimension Attributes Types
9.2.5.3.1 Type 1 Overwrite
• Type 1 fields have no need for any historical records at all.
• The only interest is in the current value, so any update completely
overwrites the prior value in the field in that row.
• Example: “hair color”. When an update occurs, there is no need to
retain the prior value.
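A minimal sketch of a Type 1 overwrite, using Python's sqlite3 (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_person (person_sk INTEGER PRIMARY KEY, name TEXT, hair_color TEXT)")
conn.execute("INSERT INTO dim_person VALUES (1, 'Pat', 'brown')")

# Type 1: the new value simply overwrites the old one in place;
# no history of the previous hair color is kept, and no new row is added.
conn.execute("UPDATE dim_person SET hair_color = 'grey' WHERE person_sk = 1")

row = conn.execute(
    "SELECT hair_color FROM dim_person WHERE person_sk = 1").fetchone()
print(row[0])  # the prior value 'brown' is gone
```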
41. Dimension Attributes Types
9.2.5.3.2 Type 2 New Row
• Type 2 fields need all historical records.
• For any change to a Type 2 field, a new row with the current
information is appended to the table.
• The previous current row’s expiration date field is updated to expire it.
• Example: when the billing address changes, the row with the old address
expires and a new row with the current billing address is appended.
• The table’s key must handle multiple instances of the same natural
key, either through the use of surrogate keys, or by:
• Adding an index value to the primary key
• Adding a date value (effective, expiration, insert, and so on) to the primary
key.
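The expire-and-append mechanics above can be sketched as follows with Python's sqlite3; the table layout, the '9999-12-31' open-end convention, and the helper name are illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_nk TEXT,           -- natural key (repeats across versions)
        billing_address TEXT,
        effective_date TEXT,
        expiration_date TEXT        -- '9999-12-31' marks the current row
    )
""")
conn.execute("""INSERT INTO dim_customer
    (customer_nk, billing_address, effective_date, expiration_date)
    VALUES ('C-1001', '1 Old Road', '2020-01-01', '9999-12-31')""")

def apply_type2_change(conn, nk, new_address, change_date):
    """Expire the current row, then append a new current row."""
    conn.execute(
        "UPDATE dim_customer SET expiration_date = ? "
        "WHERE customer_nk = ? AND expiration_date = '9999-12-31'",
        (change_date, nk),
    )
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_nk, billing_address, effective_date, expiration_date) "
        "VALUES (?, ?, ?, '9999-12-31')",
        (nk, new_address, change_date),
    )

apply_type2_change(conn, "C-1001", "2 New Street", "2024-06-01")

versions = conn.execute(
    "SELECT billing_address, expiration_date FROM dim_customer "
    "WHERE customer_nk = 'C-1001' ORDER BY customer_sk"
).fetchall()
print(versions)
```

A point-in-time lookup would then filter on effective_date <= date < expiration_date.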
42. Dimension Attributes Types
9.2.5.3.3 Type 3 New Column
• In Type 3, multiple fields in the same row contain the historical values.
• Only a selected, known portion of history is needed.
• When an update occurs, the current value is moved to the next
appropriate field, and the last, no-longer-necessary value drops off.
• Example: credit score, where only the original score when the account was
opened, the most current score, and the immediately prior score are valuable.
An update would move the current score to the prior score.
• Example: monthly bill totals (12 fields), named Month01, Month02, etc., or
January, February, etc.
• One useful purpose of Type 3 is attribute value migrations.
• Example: a company decides to reorganize its product hierarchy but wants to
see sales figures for both the old hierarchy and the new one for a year, to make
sure that all sales are being recorded appropriately.
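A minimal sketch of the Type 3 shift for the credit-score example, with the row modeled as a plain Python dict (field names are illustrative):

```python
# Type 3 keeps a fixed window of history as extra columns in the same row.
def apply_type3_update(row, new_score):
    """Shift the current value into the 'prior' column; the old prior drops off."""
    row["prior_score"] = row["current_score"]   # old prior is overwritten here
    row["current_score"] = new_score
    return row

# original_score is never touched; only current/prior participate in the shift.
account = {"original_score": 640, "current_score": 700, "prior_score": 680}
apply_type3_update(account, 720)
print(account)
```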
43. Dimension Attributes Types
9.2.5.3.4 Type 4 New Table
• Type 4 initiates a move of the expired row into a ‘history’ table, while the row
in the ‘current’ table is updated with the current information.
• Example: a supplier table, where expired supplier rows roll off into
the history table after an update, so the main dimension table contains only
current supplier rows. (The latter is sometimes called a Type 2a dimension.)
• Retrievals involving timelines are more complex in a Type 4 design,
since the current and history tables need to be joined before joining
with the fact table. Therefore, it is optimal when the vast majority of
access uses current dimension data, and the historical table is
maintained more for audit purposes than for active retrievals.
44. Dimension Attributes Types
9.2.5.3.5 Type 6 1+2+3
• Type 6 is the same as Type 2 (“new row”), where any change to any value
creates a new row, but the key value (surrogate or natural) does
not change.
• There are two ways to implement Type 6:
• Add three fields to each row: effective date, expiration date, and current row
indicator.
• Add an index field: the updated row gets the index value of zero, and all other rows
add 1 to their index values to move them down the line.
• With the current row indicator:
• Queries looking for data as of any particular point in time check whether the
desired data falls between the effective and expiration dates. Drawback: it requires
additional knowledge to create queries that correctly ask for the proper row
by period value or indicator.
• With the index field:
• Queries looking for the current values set the filter for the index value
equal to zero; queries looking for prior times still use the effective and
expiration dates. Drawback: all fact rows link automatically to index
version 0, so joining to the fact table will not find any prior values of the
dimension unless the dimension’s effective and expiration dates are included
in the query.
45. Dimensional Data Modeling
9.2.5.4 Star Schema
• A star schema is the representation of a dimensional data model with a single fact
table in the center, connecting to a number of surrounding dimension
tables.
• It is also referred to as a star join schema; joins from the central fact
table are via single primary keys to each of the surrounding
dimension tables. The central fact table has a compound key
composed of the dimension keys.
• Figure 9.4: Example of Star Schema
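A minimal star schema and star join can be sketched as follows in Python's sqlite3; the table and column names are illustrative assumptions, not taken from Figure 9.4:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two dimension tables surrounding a central fact table; the fact table's
# compound primary key is composed of the single-column dimension keys.
conn.executescript("""
    CREATE TABLE dim_date    (date_sk INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_sk INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE fact_sales (
        date_sk      INTEGER REFERENCES dim_date (date_sk),
        product_sk   INTEGER REFERENCES dim_product (product_sk),
        sales_amount REAL,
        PRIMARY KEY (date_sk, product_sk)
    );
    INSERT INTO dim_date    VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO dim_product VALUES (10, 'Widget'), (20, 'Gadget');
    INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0);
""")

# A star join: the fact table joins out to each dimension via its single key;
# dimension columns supply the groupings and labels, the fact supplies the measure.
result = conn.execute("""
    SELECT d.year, d.month, p.product_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date d    ON f.date_sk = d.date_sk
    JOIN dim_product p ON f.product_sk = p.product_sk
    GROUP BY d.year, d.month, p.product_name
    ORDER BY d.month, p.product_name
""").fetchall()
print(result)
```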
47. Dimensional Data Modeling
9.2.5.5 Snowflaking
• Snowflaking is the term given to normalizing the flat, single-table,
dimensional structure in a star schema into the component hierarchical or
network structures.
• Kimball’s design methods discourage snowflaking on two main
principles:
• It dilutes the simplicity and end-user understandability of the star schema.
• The space savings are typically minimal.
• Three types of snowflake tables are recognized:
• Snowflake tables: formed when a hierarchy is resolved into level tables.
• Outrigger tables: formed when attributes in one dimension table link to
rows in another dimension table.
• Bridge tables: formed in two situations. The first is when a many-to-many
relationship between two dimensions is not, or cannot be, resolved
through a fact table relationship.
48. Dimensional Data Modeling
9.2.5.6 Grain
• Grain is the meaning or description of a single row of data in a
fact table.
• It refers to the atomic level of the data for a transaction.
• Defining the grain of a fact table is one of the key steps in Kimball’s
dimensional design method.
• Example: if the fact table has data for one store for all transactions
for a month, we know the grain, or the limits of the data in the fact table,
will not include data for prior years.
49. Dimensional Data Modeling
9.2.5.7 Conformed Dimensions
• Conformed dimensions are the common or shared dimensions across
multiple data marts in Kimball’s design method.
• The practical importance is that the row headers of any answer
sets from conformed dimensions must match exactly.
• Example: think of multiple data marts or fact tables, all linking directly to the
same dimension table, or a direct copy of that dimension table. Updates to
that dimension table automatically show in all queries for those data marts.
• Reuse of conformed dimensions in other star schemas allows for
modular development of the DW.
• Ultimately, queries can walk across subject areas, unifying data access to
the DW across the entire enterprise.
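The "row headers must match exactly" point can be sketched as follows: two marts' fact tables plug into the same (conformed) product dimension, so the same grouping query over either mart yields identical row headers. Table and column names are hypothetical.

```python
import sqlite3

# Two data marts (sales, inventory) share one conformed product dimension.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);
CREATE TABLE fact_inventory (product_key INTEGER, qty_on_hand INTEGER);
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0), (1, 60.0);
INSERT INTO fact_inventory VALUES (1, 7), (2, 3);
""")

def row_headers(fact_table):
    # Row headers come from the shared dimension, so the same query
    # pattern over either mart produces matching headers.
    sql = f"""SELECT p.product_name
              FROM {fact_table} f JOIN dim_product p USING (product_key)
              GROUP BY p.product_name ORDER BY p.product_name"""
    return [r[0] for r in con.execute(sql)]

print(row_headers("fact_sales"))      # headers from the sales mart
print(row_headers("fact_inventory"))  # identical headers from inventory
```

Because both answer sets carry the same headers, results from the two marts can be combined ("drilled across") into one report.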
50. Dimensional Data Modeling
9.2.5.8 Conformed Facts
• Conformed facts use standardized definitions of terms across individual
marts.
• Different business users may use the same term in different ways.
• Does “customer additions” refer to “gross additions” or “adjusted additions”?
• Does “orders processed” refer to the entire order, or to the sum of individual
line items?
• Developers need to be keenly aware of things that may be called the
same but are actually different concepts across organizations.
51. Dimensional Data Modeling
9.2.5.9 DW-Bus Architecture and Bus Matrix
• The DW-bus architecture of conformed dimensions is what allows
multiple data marts to co-exist and be shared, by plugging into a bus of
shared or conformed dimensions.
• The DW-bus matrix is a tabular way of showing the intersection of
data marts, data processes, or data subject areas with the shared
conformed dimensions (Table 9.7).
• It is a very effective communication and planning tool.
• As new design pieces are added, the existing dimensions and facts,
complete with their sources, update logic, and schedules, need to be
reviewed for possible re-use.
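A bus matrix of the kind described above can be represented as a simple table: rows are data marts (or business processes), columns are conformed dimensions, and a mark shows each intersection. The mart and dimension names below are invented for illustration.

```python
# Hypothetical DW-bus matrix: which conformed dimensions each mart uses.
dimensions = ["Date", "Product", "Store", "Customer"]
bus_matrix = {
    "Retail Sales": {"Date", "Product", "Store", "Customer"},
    "Inventory":    {"Date", "Product", "Store"},
    "Promotions":   {"Date", "Product", "Customer"},
}

# Render the matrix: an 'X' marks each mart/dimension intersection.
print(f"{'Data Mart':<14}" + "".join(f"{d:>10}" for d in dimensions))
for mart, used in bus_matrix.items():
    cells = "".join(f"{('X' if d in used else ''):>10}" for d in dimensions)
    print(f"{mart:<14}" + cells)
```

Reading down a column shows every mart that must agree on one dimension's definition, which is why the matrix is such an effective planning tool.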
53. 9.3 DW-BIM Activities
• DW “data content”: concerned primarily with the part of the DW-
BIM lifecycle from the data sources to a common data store across all
relevant departments.
• BIM “data presentation”: concerned with the portion of the lifecycle
from the common data store to the targeted audience.
• BIM capability is directly dependent upon the provision of data from
the DW that is timely, relevant, and integrated, and that has other quality
factors controlled for and documented as required.
• DW-BIM activities overlap with many of the data management
functions.
54. 9.3 DW-BIM Activities
• DW “Data content”: concerned primarily with the part of the DW-
BIM lifecycle from data source to a common data store across all
relevant departments.
• BIM “Data presentation”: concerned with the portion of lifecycle
from common data store to targeted audience.
• BIM capability is directly dependent upon the provision of data from
DW that is Timely, relevant, integrated, and has other quality factors
controlled for and documented as required.
• DW-BIM activities overlap with many of the data management
functions.
55. DW-BIM Activities
9.3.1 Understand Business Intelligence Information Needs
• Begin by keeping a consistent focus on the business value of the
organization (the “value chain”).
• In contrast to operational system projects, which gather requirements
about specific details of operations and reports, DW-BIM analysis
projects are ad-hoc and involve asking questions to “slice and dice”
the data.
• Identify and scope the business area through interviews, and ask the
right people.
• Capturing the actual business vocabulary and terminology is a key to
success.
• Document the business context, then explore the details of the
actual source data (the ETL portion is often cited as 67% of DW-BIM
project effort).
56. DW-BIM Activities
9.3.1 Understand Business Intelligence Information Needs
• Poor-quality data is the first and most apparent cause of poor DW
functionality; collaboration with the data governance function is critical.
• Creating an executive summary of the identified business
intelligence needs is best practice.
• When starting a DW-BIM program, use a simple assessment of
business impact and technical feasibility. Three critical factors for the
assessment:
• Business sponsorship: an identified and engaged steering committee.
• Business goals and scope: is there a clearly identified business need,
purpose, and scope for the effort?
• Business resources: commitment by business management to the
availability and engagement of the appropriate business SMEs.
57. DW-BIM Activities
9.3.2 Define and Maintain the DW-BI Architecture
• Roles required for a successful DW-BIM architecture are:
• Technical Architect: hardware, OS, DB, and DW-BIM architecture.
• Data Architect: data analysis, systems of record, data modeling, and data
mapping.
• ETL Architect / Design Lead: staging and transformation, data marts, and
schedules.
• Meta-data Specialist: meta-data interfaces, meta-data architecture, and
contents.
• BI Application Architect / Design Lead: BI tool interfaces and report design,
meta-data delivery, and data and report navigation.
• DW-BIM needs to leverage many of the disciplines and components
of a company’s IT department from the perspectives of:
• Business processes
• Architecture
• Technology standards, including servers, DBs, security, etc.
• Availability and timing needs are key drivers in developing the DW-
BIM architecture (“technical requirements”).
58. DW-BIM Activities
9.3.2 Define and Maintain the DW-BI Architecture
• The design decisions and principles for what data detail the DW
contains are a key design priority for the DW-BIM architecture.
• The DW-BIM architecture must integrate with the overall corporate
reporting architecture, with a focus on defining appropriate service
level agreements (SLAs).
• Another success factor is to identify a plan for data re-use, sharing,
and extension.
• Finally, no DW-BIM effort can be successful without business
acceptance of the data. Consider, up-front, a few critically important
architectural sub-components, along with their supporting activities:
• Data quality feedback loop: how easily are needed corrections integrated
into the operational systems?
• End-to-end meta-data: is there an integrated, end-to-end flow of meta-data
that is easy to access?
• End-to-end verifiable data lineage: to use modern, popular TV parlance, is
the evidence chain-of-custody for all DW-BIM data readily verifiable? Is a
system of record identified for all data?
59. DW-BIM Activities
9.3.3 Implement Data Warehouses and Data Marts
• The DW and data marts are the two major classes of formal data stores
in the DW-BIM landscape.
• The DW is a relational DB designed with normalization techniques; it
integrates data from multiple source systems and serves data to
multiple data marts.
• The primary purpose of data marts is to provide data for analysis to
knowledge workers.
• Design the DW and data marts beginning with the end in mind (one of
Covey’s Seven Habits):
• Identify the business problem to solve.
• Identify the details of what would be used in the end solution (the piece of
software and associated data marts).
• Continue to work back into the integrated data required (the DW).
• Ultimately, work all the way back to the data sources.
60. DW-BIM Activities
9.3.4 Implement BI Tools and User Interfaces
• The purpose of this section is to introduce the types of tools
available in the BI marketplace and review their characteristics.
• Implementing the right BI tools or user interfaces (UI) is about
identifying the right tools for the right user set.
• Almost all BI tools come with their own meta-data repositories to manage
their internal data maps and statistics.
• Some vendors make these repositories open to the end user; others allow
business meta-data to be entered.
• Enterprise meta-data repositories must link to and copy these repositories to
get a complete view of reporting and analysis activity.
61. BI Tools & UI
9.3.4.1 Query and Reporting Tools
• Query and reporting is the process of querying a data source, then
formatting the results to create a “report”: either a production-style
report such as an invoice, or a management report.
• The needs within business operations reporting are often different
from the needs within business query and reporting.
• Table 9.8 helps distinguish business operations-style reports from
business query and reporting.
• Figure 9.5 relates the classes of BI tools to the respective classes of
BI users for those tools.
• Different users may use different, and overlapping, query and reporting
capabilities of BI tools.
62. BI Tools & UI
9.3.4.1 Query and Reporting Tools
63. BI Tools & UI
9.3.4.1 Query and Reporting Tools
64. BI Tools & UI
9.3.4.2 OLAP Tools
• This section covers OLAP tools, which arrange data into OLAP
cubes for fast analysis.
• Cubes in BI tools are generated from star (or snowflake) DB schemas.
• The OLAP cubes consist of measures (“numeric facts”) from fact
tables.
• The value of OLAP tools and cubes is the reduction of the chance of
confusion and erroneous interpretation, by aligning the data content
with the analyst’s mental model.
• Common OLAP operations include:
• Slice: a subset of a multi-dimensional array corresponding to a single value
for one or more members of the dimensions not in the subset.
• Dice: a “slice” on more than two dimensions of a data cube, or more than
two consecutive slices.
• Drill Down / Up: a specific analytical technique whereby the user navigates
among levels of data, from the most summarized (up) to the most detailed
(down).
• Roll-up: involves computing all of the data relationships for one or more
dimensions. To do this, define a computational relationship or formula.
• Pivot: to change the dimensional orientation of a report or page display.
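The slice, dice, and roll-up operations above can be sketched as SQL over a tiny cube of (region, product, month) measures. The cube, its dimensions, and its data are all invented for illustration.

```python
import sqlite3

# A toy cube: three dimensions (region, product, month), one measure.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cube (region TEXT, product TEXT, month TEXT, sales REAL)")
con.executemany("INSERT INTO cube VALUES (?,?,?,?)", [
    ("East", "Widget", "Jan", 10.0), ("East", "Widget", "Feb", 20.0),
    ("East", "Gadget", "Jan", 5.0),  ("West", "Widget", "Jan", 8.0),
    ("West", "Gadget", "Feb", 12.0),
])

# Slice: fix a single value of one dimension (region = 'East').
slice_east = con.execute(
    "SELECT product, month, sales FROM cube WHERE region = 'East'").fetchall()

# Dice: restrict several dimensions at once, yielding a sub-cube.
dice = con.execute(
    "SELECT * FROM cube WHERE region = 'East' AND month = 'Jan'").fetchall()

# Roll-up: aggregate away dimensions (total sales per region).
rollup = con.execute(
    "SELECT region, SUM(sales) FROM cube GROUP BY region ORDER BY region").fetchall()
print(rollup)
```

Drilling down is the reverse of the roll-up: add product or month back into the GROUP BY to navigate to a more detailed level.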
65. BI Tools & UI
9.3.4.3 Analytic Applications
• Analytic applications include the logic and processes to extract data from
well-known source systems, such as vendor ERP systems, a data model for
the data mart, and pre-built reports and dashboards.
• Different types include customer, financial, supply chain,
manufacturing, and HR applications.
• Analytic applications can be approached from both “buy” and
“build” perspectives.
• Some key questions for evaluating analytic applications are:
1. Do we have the standard source systems for which ETL is supplied? If yes,
how much have we modified them? Less modification equals more value
and a better fit.
2. How many other source systems do we need to integrate? The fewer the
sources, the better the value and fit.
3. How well do the canned industry queries, reports, and dashboards match
our business? Involve your business users and customers and let them
answer that.
4. How much of the analytic application’s infrastructure matches your existing
infrastructure? The better the match, the better the value and fit.
66. BI Tools & UI
9.3.4.4 Implementing Management Dashboards and Scorecards
• Both are ways of efficiently presenting performance information.
• Dashboards are oriented more toward dynamic presentation of operational
information.
• Scorecards are more static representations of longer-term organizational,
tactical, or strategic goals.
• Scorecards are divided into four views: finance, customer,
environment, and employees.
• Each view has a number of metrics that are reported and
trended against various targets set by senior executives.
• An example of the way various BI techniques combine to
create a BI environment is presented in Wayne Eckerson’s book
on performance dashboards (Figure 9.6).
67. BI Tools & UI
9.3.4.4 Implementing Management Dashboards and Scorecards
68. BI Tools & UI
9.3.4.5 Performance Management Tools
• Performance management tools include budgeting, planning, and
financial consolidation. There have been several major acquisitions in
this segment.
• On the buying side, the degree to which customers buy BI and
performance management tools from the same vendor depends on:
• Product capabilities.
• The degree to which the CFO and CIO co-operate.
• It is important to note that budgeting and planning apply not only to
financial metrics, but to workforce, capital, and so on, as well.
69. BI Tools & UI
9.3.4.6 Predictive Analytics and Data Mining Tools
• Data mining: a type of analysis that reveals patterns in data using various
algorithms. It helps users discover relationships or see patterns in a
more exploratory fashion.
• Predictive analytics (“what-if” analysis) allows users to create a model,
test the model against actual data, and then project future results. The
underlying engines may be neural networks or inference engines.
• Data mining is used in predictive analysis, fraud detection, root cause
analysis, customer segmentation and scoring, etc.
• A good strategy for interfacing with many data mining tools is to work
with the business analysts to define the data set needed for analysis, and
then arrange for a periodic file extract, because:
• Data mining is an intense, multi-pass style of analysis.
• Running it directly against the DW can affect performance.
• Many data mining tools work with file-based input.
70. BI Tools & UI
9.3.4.7 Advanced Visualization and Discovery Tools
• Advanced visualization and discovery tools use an in-memory
architecture to allow users to interact with the data in a highly visual,
interactive way.
• Patterns in a large dataset can be difficult to recognize in a numeric
display.
• A pattern can be picked up visually fairly quickly when thousands of
data points are loaded into a sophisticated display on a single page.
• The difference between these tools and most dashboard products is:
1. The degree of sophistication of the analysis and visualization types, such as
small multiples, sparklines, heat maps, histograms, waterfall charts, and
bullet graphs.
2. Adherence to best practices according to the visualization community.
3. The degree of interactivity and visual discovery, versus creating a chart from
a tabular data display.
71. 9.3.5 Process Data for BI
• The biggest part of any DW-BIM effort is the preparation and
processing of the data.
• This section introduces some of the architectural components and
sub-activities involved in processing data for BI:
• Staging areas
• Mapping sources and targets
• Data cleansing and transformation (data acquisition)
72. Process Data for BI
9.3.5.1 Staging Areas
• A staging area is the intermediate data store between an original data
source and the centralized data repository.
• All required cleansing, transformation, reconciliation and
relationship-building happen in this area.
• Dividing the work reduces the overall complexity and makes debugging
much simpler.
• A change-capture mechanism reduces the volume of transmitted
data sets. Several months to a few years of data can be stored in
this initial staging area. Benefits of this approach include:
• Improving performance on the source system by allowing limited history to
be stored there.
• Pro-active capture of a full set of data, allowing for future needs.
• Minimizing the time and performance impact on the source system by
having a single extract.
• Pro-active creation of a data store that is not subject to transactional system
limitations.
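A change-capture mechanism of the kind mentioned above can be sketched by diffing the previous staged snapshot against the current extract on a natural key, and transmitting only new or changed rows. The keys and row structures here are hypothetical.

```python
# Previous snapshot as last staged, keyed on a natural key.
previous = {
    "C001": {"name": "Acme",   "city": "Oslo"},
    "C002": {"name": "Globex", "city": "Bergen"},
}
# Today's full source extract.
current = {
    "C001": {"name": "Acme",    "city": "Oslo"},        # unchanged
    "C002": {"name": "Globex",  "city": "Stavanger"},   # changed
    "C003": {"name": "Initech", "city": "Tromso"},      # new
}

# Keep only rows that are new or differ from the staged snapshot.
delta = {key: row for key, row in current.items()
         if previous.get(key) != row}
print(sorted(delta))  # keys to transmit downstream
```

Only the delta rows move downstream, which is what reduces the volume of transmitted data sets.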
73. Process Data for BI
9.3.5.2 Mapping Sources and Targets
• Source-to-target mapping is the documentation activity that
defines data type details and transformation rules for all required
entities and data elements, from each individual source to each
individual target.
• Determining valid links between data elements in multiple
equivalent systems is considered the most difficult part.
• A solid taxonomy is necessary to match data elements in different
systems into a consistent structure in the EDW.
• The gold source, or system-of-record source or sources, must be signed
off by the business.
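The documentation activity above can be sketched as a machine-readable mapping: each target element records its source, data type, and transformation rule. All system, table, and field names below are invented for illustration.

```python
# A source-to-target mapping: one entry per target data element.
mapping = [
    {"target": "dim_customer.customer_name", "source": "CRM.cust.cname",
     "type": "VARCHAR(100)", "rule": "trim and title-case"},
    {"target": "dim_customer.country_code",  "source": "CRM.cust.ctry",
     "type": "CHAR(2)",      "rule": "map to ISO 3166-1 alpha-2"},
    {"target": "fact_sales.sales_amount",    "source": "ERP.orders.amt",
     "type": "DECIMAL(12,2)", "rule": "convert to corporate currency"},
]

def sources_for(target_table):
    # List the distinct source systems feeding one target table.
    return sorted({m["source"].split(".")[0] for m in mapping
                   if m["target"].startswith(target_table + ".")})

print(sources_for("dim_customer"))
print(sources_for("fact_sales"))
```

Keeping the mapping as data rather than prose lets the team generate lineage reports and validate that every target element has a signed-off source.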
74. Process Data for BI
9.3.5.3 Data Cleansing and Transformation (Data Acquisition)
• Data cleansing: activities that correct and enhance the domain
values of individual data elements, including enforcement of
standards.
• Necessary for initial loads where significant history is involved.
• The preferred strategy is to push data cleansing activities back to the
source systems.
• Strategies must be developed for rows of data that are loaded but later
found to be incorrect:
• Deleting old records may cause havoc with related tables and
surrogate keys.
• Expiring the row and inserting a new one may be the better option.
• Data transformation: activities that provide organizational context
between data elements, entities, and subject areas.
• Organizational context includes cross-referencing, reference and master data
management, and complete and correct relationships.
• An essential component of being able to integrate data from multiple
sources.
• Requires extensive involvement with Data Governance.
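The "expire a row, insert a new row" correction strategy mentioned above can be sketched as follows (in the style of a Type 2 dimension change): the old row's surrogate key stays valid for existing fact references instead of being deleted. The dimension structure and field names are hypothetical.

```python
from datetime import date

# A dimension with one current row (effective_to is None = current).
dim_customer = [
    {"surrogate_key": 1, "natural_key": "C001", "city": "Oslo",
     "effective_from": date(2020, 1, 1), "effective_to": None},
]

def correct_city(rows, natural_key, new_city, as_of):
    next_key = max(r["surrogate_key"] for r in rows) + 1
    for r in rows:
        if r["natural_key"] == natural_key and r["effective_to"] is None:
            r["effective_to"] = as_of            # expire, do not delete
    rows.append({"surrogate_key": next_key, "natural_key": natural_key,
                 "city": new_city, "effective_from": as_of,
                 "effective_to": None})          # new current row

correct_city(dim_customer, "C001", "Bergen", date(2024, 6, 1))
for r in dim_customer:
    print(r["surrogate_key"], r["city"], r["effective_to"])
```

Because surrogate key 1 survives, facts loaded against the old row keep their historical context while new facts attach to the new row.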
75. 9.3.6 Monitor and Tune DW Processes
• Transparency and visibility are the key principles that drive DW-BIM
monitoring.
• Providing dashboards and drill-down capabilities is best practice.
• The addition of data quality measures will enhance the value of monitoring;
performance is more than just speed and timing.
• Processing should be monitored across the system for bottlenecks
and dependencies among processes.
• Apply DB tuning techniques such as partitioning, and tune backup and
recovery strategies.
• Users often consider the DW an active archive due to the long histories
it holds.
• Management by exception is a great policy to apply here.
• Sending attention messages upon failure is a prudent addition to the
monitoring dashboard.
76. 9.3.7 Monitor and Tune BI Activity and Performance
• A best practice for BI monitoring and tuning is to define and display a
set of customer-facing satisfaction metrics. Examples of metrics:
• Average query response time.
• The number of users per day/ week/ month.
• The statistical measures available from the system.
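The first two metrics above can be sketched from a query log. The log structure, user names, and timings below are all hypothetical.

```python
from datetime import date
from statistics import mean

# A toy BI query log: one entry per executed query.
query_log = [
    {"user": "ana", "day": date(2024, 6, 3), "seconds": 1.2},
    {"user": "bob", "day": date(2024, 6, 3), "seconds": 4.8},
    {"user": "ana", "day": date(2024, 6, 4), "seconds": 2.0},
]

# Average query response time across the log.
avg_response = mean(q["seconds"] for q in query_log)

# Distinct users per day.
users_per_day = {}
for q in query_log:
    users_per_day.setdefault(q["day"], set()).add(q["user"])

print(f"average query response time: {avg_response:.2f}s")
print({d.isoformat(): len(u) for d, u in sorted(users_per_day.items())})
```

Trending these figures week over week turns raw usage statistics into the customer-facing satisfaction metrics the slide recommends.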
• Regular review of usage statistics and patterns is essential. Reports
providing the frequency of use, the data resources consumed, and the
queries run allow for prudent enhancement.
• Tuning BI activity is analogous to the principle of profiling applications
in order to know where the bottlenecks are and where to apply
optimization efforts:
• Creating indexes and aggregations.
• Simple solutions, such as posting the results of an expensive query to a
daily report.