Data Warehousing and Business Intelligence
Management
Ahmed Alorage
Objectives:
• 9.1 Introduction
• 9.2 Concepts and Activities
• 9.2.1 Data Warehousing- A Brief Retrospective and Historical Tour
• 9.2.1.1 Classic Characteristics of a Data Warehouse- Inmon Version
• 9.2.1.2 Classic characteristics of a Data Warehouse – Kimball Version
• 9.2.2 DW/ BI Architecture and Components
• 9.2.2.1 Inmon’s Corporate Information Factory
• 9.2.2.2 Kimball’s Business Development Lifecycle and DW Chess Pieces
• 9.2.3 Tactical, Strategic and Operational BI
• 9.2.4 Types of Data Warehousing
• 9.2.4.1 Active Data Warehousing
• 9.2.4.2 Multi-dimensional Analysis – OLAP
• 9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
• 9.2.5 Dimensional Data Modeling Concepts and Terminology
• 9.2.5.1 Fact Tables
• 9.2.5.2 Dimension Tables (9.2.5.2.1 Surrogate Keys, 9.2.5.2.2 Natural Keys)
• 9.2.5.3 Dimension Attribute Types (9.2.5.3.1 Type 1 Overwrite, 9.2.5.3.2 Type 2 New Row, 9.2.5.3.3 Type 3 New Column, 9.2.5.3.4 Type 4 New Table, 9.2.5.3.5 Type 6 1+2+3)
• 9.2.5.4 Star Schema
• 9.2.5.5 Snowflaking
• 9.2.5.6 Grain
• 9.2.5.7 Conformed Dimensions
• 9.2.5.8 Conformed Facts
• 9.2.5.9 DW-Bus Architecture and Bus Matrix
Objectives:
• 9.3 DW-BIM Activities
• 9.3.1 Understand Business Intelligence Information Needs
• 9.3.2 Define and Maintain the DW-BI Architecture
• 9.3.3 Implement Data Warehouses and Data Marts
• 9.3.4 Implement Business Intelligence Tools and User Interfaces
• 9.3.4.1 Query and Reporting Tools
• 9.3.4.2 On Line Analytical Processing (OLAP) Tools
• 9.3.4.3 Analytic Applications
• 9.3.4.4 Implementing Management Dashboards and Scorecards
• 9.3.4.5 Performance Management Tools
• 9.3.4.6 Predictive Analytics and Data Mining Tools
• 9.3.4.7 Advanced Visualization and Discovery Tools
• 9.3.5 Process Data for Business Intelligence
• 9.3.5.1 Staging Areas
• 9.3.5.2 Mapping Sources and Targets
• 9.3.5.3 Data Cleansing and Transformations ( Data Acquisitions )
• 9.3.6 Monitor and Tune Data Warehousing Processes
• 9.3.7 Monitor and Tune BI Activity and Performance
9 Data Warehousing and Business Intelligence
Management
• Data Warehousing and Business Intelligence Management
is the seventh data management function in the data
management framework introduced in Chapter 1.
• It is the sixth data management function that interacts
with, and is influenced by, the Data Governance function.
• This chapter defines the Data Warehousing and Business
Intelligence Management function and explains the
concepts and activities involved.
9.1 Introduction:
• A Data Warehouse (DW) is a combination of two primary components:
• An integrated decision support database
• The related software programs used to collect, cleanse, transform, and store data from a variety of sources
• An Enterprise Data Warehouse (EDW) is a centralized data warehouse designed to
service the BI needs of the entire organization.
• Data warehousing:
• The term used to describe the operational extract, cleansing, transformation, and load processes
(and associated control processes) that maintain the data contained within a DW.
• The process focuses on enabling an integrated and historical business context on operational data
by enforcing business rules and maintaining appropriate business data relationships.
• It is the technology solution supporting BI.
9.1 Introduction:
• Business Intelligence (BI) is a set of business capabilities. The term means many things,
including:
1. Query, analysis, and reporting activity by knowledge workers to monitor and understand the
financial and operational health of, and make business decisions about, the enterprise.
2. Query, analysis, and reporting processes and procedures.
3. A synonym for the BI environment.
4. The market segment for BI software tools.
5. Strategic and operational analytics and reporting on corporate operational data to support
business decisions, risk management, and compliance.
6. A synonym for Decision Support System (DSS).
• Data Warehousing and Business Intelligence Management (DW-BIM) is:
• The collection, integration, and presentation of data to knowledge workers for the purpose of
business analysis and decision-making.
• Composed of activities supporting all phases of the decision support life cycle: providing context,
moving and transforming data from sources to a common target data store, and then providing
knowledge workers various means of access, manipulation, and reporting of the integrated data.
9.1 Introduction:
• Objectives for DW-BIM include:
• Providing integrated storage of required current and historical data, organized by subject areas.
• Ensuring credible, quality data for all appropriate access capabilities.
• Ensuring a stable, high-performance, reliable environment for data acquisition, management, and
access.
• Providing an easy-to-use, flexible, and comprehensive data access environment.
• Delivering both content and access to the content in increments appropriate to the organization’s
objectives.
• Leveraging, rather than duplicating, relevant data management component functions such as
Reference and Master Data Management, Data Governance (DG), Data Quality (DQ), and Meta-
data (MD).
• Providing an enterprise focal point for data delivery in support of the decisions, policies,
procedures, definitions, and standards that arise from DG.
• Defining, building, and supporting all data stores, processes, infrastructure, and tools that contain
integrated, post-transactional, and refined data used for information viewing, analysis, or data
request fulfillment.
• Integrating newly discovered data as a result of BI processes into the DW for further analytics and
BI use.
9.2 Concepts and Activities
• This section provides:
• The history of DW-BIM and an overview of typical DW-BIM
components.
• An explanation of some general DW-BIM terminology.
• A brief introduction to dimensional modeling and its
terminology, leading into the activities identified in Figure 9.1.
9.2.1 Data Warehousing – a brief Retrospective
and Historical Tour
• Two names stand out for their significant contributions to
advancing and shaping the practice of data warehousing:
• Bill Inmon
• Ralph Kimball
• This section gives a brief introduction to their major
contributions, along with some comparisons and contrasts
of their approaches.
Historical Tour
9.2.1.1 Inmon Version – classic characteristics of a Data Warehouse
• In the early 1990s, Bill Inmon defined the DW as a “subject-oriented, integrated, time-
variant, and non-volatile collection of summary and detailed historical data
used to support the strategic decision-making processes for the corporation”.
• These key characteristics clearly distinguish the nature of a DW from a
typical operational system:
• Subject-Oriented: the DW is designed around the data needs of the corporation.
• Integrated: concerns consistency of the data stored in the DW: key structures, encoding and
decoding of structures, definitions of the data, and so on.
• Time-Variant: every record in the DW is accurate as of a moment in time.
• Non-Volatile: updates to records do not occur during normal processing; when updates
occur at all, they occur on an exception basis.
• Summarized and Detailed Data: the DW must maintain detailed data as well as summaries.
• Historical: the DW contains a vast amount of historical data (typically 5 to 10 years’
worth of data).
Historical Tour
9.2.1.2 Kimball Version – classic characteristics of a Data Warehouse
• Ralph Kimball took a different approach, defining the DW simply as:
• A copy of transaction data specifically structured for query and analysis.
• It has a different structure than operational systems (the dimensional data model).
• Data warehouses always contain more than just transaction data:
• Reference data is necessary to give context to the transactions.
• Dimensional data models are relational data models:
• They just do not consistently comply with normalization rules.
• They reflect business processes more simply than normalized models.
9.2.2 DW / BI Architecture and Components
• This section introduces the major components found in most DW/BI
environments, through an overview from both the
• Inmon and Kimball perspectives
• Inmon’s approach:
• The Corporate Information Factory
• Kimball’s approach:
• The “DW Chess Pieces”
• Both views and their components are described and
contrasted.
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• The CIF (Corporate Information Factory) is a corporate data architecture for DW-BIM:
• Identified and written about by Claudia Imhoff and Ryan Sousa
• Figure 9.2: shows the components of the CIF.
• Table 9.1: lists and describes the basic components of the Corporate
Information Factory view of DW/BI architecture.
• Table 9.2: provides context for the reporting scope and purpose of each of
the Corporate Information Factory components, with some explanatory notes.
• Table 9.3: provides a compare-and-contrast from a business and application
perspective between the four major components of the CIF: the applications,
ODS, DW, and Data Marts.
• Table 9.4: provides a compare-and-contrast from a data perspective between
the four major components of the CIF: the applications, ODS, DW, and Data
Marts.
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
Raw Detailed Data: Operational/transactional application data of the enterprise.
Provides the source data to be integrated into the ODS and DW. Can also be in a
database or other storage or file format.
Integration and Transformation: This layer of the architecture is where the
un-integrated data from the various application source stores is
combined/integrated and transformed into the corporate representation in the DW.
Reference Data: A precursor to what is currently referred to as MDM. The purpose
was to allow common storage of and access to important and frequently used
common data. Focus and shared understanding on data upstream of the DW
simplifies the integration task in the DW.
Historical Reference Data: Current-value reference data is necessary for
transactional applications; at the same time, it is critical to have accurate
integration and presentation of historical data.
Table 9.1 Corporate Information Factory Component Descriptions
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
Operational Data Store (ODS): The main distinguishing data characteristics of
the ODS compared to the DW are current-valued vs. DW historical data, and
volatile vs. DW non-volatile data.
Operational Data Mart (Oper-Mart): A data mart focused on tactical decision
support. Distinguishing characteristics include current-valued vs. DW historical
data, tactical vs. DW strategic analysis, and sourcing of data from the ODS
rather than just the DW.
Data Warehouse (DW): A large, comprehensive corporate resource. Its primary
purpose is to provide a single integration point for corporate data in order to
serve management decision-making, and strategic analysis and planning. Data
flows into and out of the DW in one direction only; data that needs correction
is rejected, corrected at its source, and re-fed through the system.
Data Marts (DM): Their purpose is to provide for DSS/information processing and
access that is customized and tailored to the needs of a particular department
or common analytic need.
Table 9.1 Corporate Information Factory Component Descriptions
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
| Component | Reporting Scope / Purpose | Notes |
|---|---|---|
| Applications | Isolated operational reports | Limited to data within one application instance. |
| ODS | Integrated operational reports | Reports requiring data from multiple source systems. |
| DW | Exploratory analysis | The complete set of corporate data allows for discovery of new relationships and information. Many BI data mining tools work with flat-file extracts from the DW, which can also offload the processing burden from the DW. |
| Oper-Mart | Tactical analytics | Analytic reporting based on current values with a tactical focus. |
| Data Marts | Analytics: classical management decision support, and strategic analytics | Originally “departmental analysis”, driven by such things as political and funding expediency. Later work expanded the concept to common analytic needs crossing departmental boundaries. |
Table 9.2 CIF Reporting Scope and Purpose
• About Table 9.3:
• Note the following general observations about the contrast between the
right-hand side (DW and Data Marts) and the left-hand side (applications),
in particular:
• The purpose shifts from execution to analysis.
• End users are typically decision makers rather than front-line workers.
• System usage is more ad hoc than the fixed operations of transactional systems.
• Response time requirements are relaxed, because strategic decisions allow more time than
daily operations.
• Much more data is involved in each operation, query, or process.
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory

| | Application Data | ODS | DW | Data Mart |
|---|---|---|---|---|
| Business Purpose | Specific business function | Corporate integrated operational needs | Central data repository; integration and reuse | Analysis: departmental (Inmon), business process (Kimball), business measures (Wells) |
| System Orientation | Operations (execution) | Operations (reports) | Infrastructure | Informational, analytic (DSS) |
| Target Users | End users: clerical (daily operations) | Line managers: tactical decision makers | Systems: data marts, data mining | Executives: performance/metrics; senior and mid-level managers; knowledge workers |
| How system is used | Fixed ops | Operational reporting | Stage, store, feed | Ad hoc |
Table 9.3 CIF Components – Business / Application View
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory

| | Application Data | ODS | DW | Data Mart |
|---|---|---|---|---|
| System Availability | Fixed ops | Medium | Varies | Relaxed |
| Typical Response Time | Seconds | Seconds to minutes | Longer (batch) | Seconds to hours |
| # Records in an Op. | Limited | Small to medium | Large | Large |
| Amount of Data Per Process | Small | Medium | Large | Large |
| SDLC | Classic | Classic | Classic | Modified |
Table 9.3 CIF Components – Business / Application View
• DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory
• Table 9.4:
• Considers a compare-and-contrast from a data perspective between the four
components: applications, ODS, DW, and Data Marts.
• The majority of DW processes run at higher latency, often as overnight
batch processing.
DW/BI Architecture and Components
9.2.2.1 Inmon’s Corporate Information Factory

| | Application | ODS | DW | Data Mart |
|---|---|---|---|---|
| Orientation | Functional | Subject | Subject | Limited subject |
| View | Application | Corporate (ops) | Corporate (historical) | Focused analysis |
| Integration | Not integrated; application-specific | Integrated corporate data | Integrated corporate data | Integrated subset |
| Volatility | CRUD | Volatile | Non-volatile | Non-volatile |
| Time | Current only | Current value | Time-variant | Time-variant |
| Detail Level | Detail only | Detail only | Detail + summary | Detail + summary |
| Amount of History* | 30 to 180 days | 30 to 180 days | 5 to 10 years | 1 to 5 years |
| Latency* | Real time to NRT | NRT | > 24 hours | 1 day to 1 month |
| Normalized? | Yes | Yes | Yes | No |
| Modeling | Relational | Relational | Relational | Dimensional |
Table 9.4 CIF Components – Data View
• In Table 9.4, note the comparisons between the DW and Data Marts and the
applications, in particular:
• Data has a subject vs. functional orientation.
• Data is integrated vs. stove-piped or siloed.
• Data is time-variant vs. current-valued only.
• There is higher latency in the data.
• Significantly more history is available.
• DW/BI Architecture and Components
9.2.2.2 Kimball’s Business Development Lifecycle and DW Chess Pieces
• Kimball called his approach the “Business Dimensional Lifecycle”, but it is
commonly referred to as the “Kimball approach”.
• His Design Tip #49: “We chose the Business Dimensional Lifecycle label instead,
because it reinforced our core tenets about successful data warehousing based on
our collective experiences since the mid-1980s”.
• The basis of this approach is three tenets (principles):
• Business Focus: both immediate business requirements and more long-term, broad data
integration and consistency.
• Atomic Dimensional Data Models: both for ease of business user understanding and for query
performance.
• Iterative Evolution Management: manage changes and enhancements to the DW as individual,
finite projects.
• By using a conformed dimensions and facts design, the business-rule parts of the DW
become re-usable components that are already integrated.
• Figure 9.3 is a representation of the “Data Warehouse Chess Pieces” view, which is more
inclusive and expansive than that of Inmon.
• It uses the term Data Warehouse to encompass everything in both the data staging and data
presentation areas.
• DW/BI Architecture and Components
9.2.2.2 Kimball’s Business Development Lifecycle and DW chess Pieces
• DW/BI Architecture and Components
9.2.2.2 Kimball’s Business Development Lifecycle and DW chess Pieces
Operational Source Systems: Operational/transactional applications of the
enterprise. Integrated into the ODS and DW components. Equivalent to the
application systems in the CIF diagram.
Data Staging Area: Kimball refers to this as the “kitchen”, the “area behind the
scenes”. Smaller in scope than in Inmon’s diagram: the “eclectic set of
processes needed to integrate and transform data for presentation”. Similar to
“integration and transformation” in the CIF.
Data Presentation Area: Similar to the Data Marts in the CIF. The main
difference is the dimensions unifying the multiple data marts: the “DW Bus”.
Data Access Tools: Focus on the needs and requirements of the end customers /
consumers of the data. These needs translate into selection criteria for
matching a broad range of data access tools to the right tools for the right
tasks. In the CIF model, the access tools are outside the DW architecture.
Table 9.5 Kimball’s DW Chess Pieces – Component Descriptions
• DW/BI Architecture and Components
9.2.2.2 Kimball’s Business Development Lifecycle and DW chess Pieces
• Tactical BI:
• BI tools used to analyze trends by comparing a metric to the same metric from a
previous month or year, etc.; used to support short-term business decisions.
• Strategic BI:
• Provides metrics to executives, often alongside a formal method of business
performance management, to help them determine whether the corporation is on
target for meeting its goals; used to support long-term corporate goals and
objectives.
• Operational BI:
• Provides BI to the front lines of the business, where it is used to manage and optimize
business operations. It couples BI applications with operational functions
and processes, with a requirement for very low tolerance for latency (near
real-time data capture and data delivery).
• A service-oriented architecture (SOA) becomes necessary to support this fully.
9.2.3 Tactical, Strategic and Operational BI
• Three major types of data warehousing are described:
• 9.2.4.1 Active Data Warehousing
• 9.2.4.2 Multi-dimensional Analysis – OLAP
• 9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
9.2.4 Types of Data Warehousing
• DWs serving tactical and strategic BI with non-volatile data have
existed for many years.
• New architectural approaches are emerging to deal with the inclusion of
volatile data.
• An example of these applications is automated banking machine (ABM) data
provisioning: when making a banking transaction, historical balances and the
new balances resulting from immediate banking actions need to be presented
to the banking customer in real time.
• Two of the key design concepts required are:
• Isolation of change
• Alternatives to batch ETL
• Changes from new volatile data must be isolated from the bulk of
the historical, non-volatile DW data, e.g. by building partitions and using
union queries across the different partitions when necessary.
• Trickle feeds, pipelining, and SOA are alternatives to batch
ETL.
• Active data warehousing implies shorter latency requirements for data
availability in the DW.
DW types
9.2.4.1 Active Data Warehousing
• OLAP “Online Analytical Processing”: (the analytical cube)
• An approach to providing fast performance for multi-dimensional analytic
queries.
• The term originated, in part, to make a distinction from OLTP “Online
Transactional Processing”.
• The typical output of OLAP queries is in matrix format:
• The dimensions form the rows and columns of the matrix.
• The factors or measures are the values inside the matrix.
• Multi-dimensional analysis with cubes is useful for looking at summaries of data.
• A common application is financial analysis, where analysts want to
repeatedly traverse known hierarchies to analyze data:
• Date (such as Year, Quarter, Month, Week, Day)
• Organization (such as Region, Country, Business Unit, Department)
• Product (such as Product Category, Product Line, Product)
DW types
9.2.4.2 Multi-dimensional Analysis - OLAP
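The matrix output described above can be sketched in a few lines of Python: two hypothetical dimensions (region and quarter) form the cell coordinates, and the measure is summed into each cell. All names and figures are illustrative, not taken from the text.

```python
from collections import defaultdict

# Hypothetical transaction rows: (region, quarter, amount).
rows = [
    ("East", "Q1", 100.0), ("East", "Q2", 150.0),
    ("West", "Q1", 200.0), ("West", "Q2", 50.0),
    ("East", "Q1", 25.0),
]

# Aggregate the measure into matrix cells keyed by the two dimensions:
# the dimensions form the rows and columns, the summed measure fills the cells.
cube = defaultdict(float)
for region, quarter, amount in rows:
    cube[(region, quarter)] += amount

print(cube[("East", "Q1")])  # 125.0
```

Rolling up a hierarchy level (e.g. quarter to year) just means re-keying the cells on the coarser dimension value.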
• Three classic implementation approaches support Online Analytical
Processing, named for their relationship to the underlying database implementation:
• Relational Online Analytical Processing (ROLAP): supports OLAP using techniques in the
two-dimensional tables of an RDBMS. Star schema joins are a common database
design.
• Multi-dimensional Online Analytical Processing (MOLAP): supports OLAP by
using proprietary, specialized multi-dimensional database technology.
• Hybrid Online Analytical Processing (HOLAP): simply a combination of ROLAP
and MOLAP, allowing part of the data to be stored in MOLAP form and another
part in ROLAP form; the designer can vary the mix of partitioning.
• Database Online Analytical Processing (DOLAP): a virtual OLAP cube made
available as a special proprietary function of a classic relational database.
DW types
9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
9.2.5 Dimensional Data Modeling Concepts and Terminology
• Dimensional data modeling is the preferred technique for
designing data marts.
• It focuses on making it simple for the end user to understand and access
the data.
• This simplicity helps contribute to the fact that the majority of data mart
design work ends up being in ETL processing.
• It is a subset of entity-relationship data modeling: it has entities,
attributes, and relationships.
• The entities come in two types:
• Facts, which provide the measurements, and dimensions, which provide the context.
• Relationships are constrained to all go through the fact table, and all
dimension-to-fact relationships are one-to-many (1:M).
• Table 9.6 compares relational modeling for transactional applications
with dimensional data modeling for data marts.
9.2.5 Dimensional Data Modeling Concepts and Terminology
Dimensional Data Modeling
9.2.5.1 Fact Tables
• Fact tables represent and contain important business measures.
• Fact tables (entities) contain one or more facts (attributes representing
measures).
• The rows of a fact table correspond to particular measurements, which are
numeric, such as amounts, quantities, or counts.
• Fact tables express or resolve many-to-many relationships between the
dimensions.
• They often have a number of control columns that express when the row
was loaded.
• These fields help the programmers, operators, and super-users
navigate and validate the data.
Dimensional Data Modeling
9.2.5.2 Dimension Tables
• Dimension tables represent the important objects of the business and contain
textual descriptions of the business.
• They act as the entry points or links into the fact tables, and their
contents provide report groupings and report labels.
• They typically have a small number of rows and a large number of columns.
The main contents of a dimension table are:
• A surrogate or non-surrogate key.
• The primary key, representing what is used to link to other tables in the DW.
• Descriptive elements, including codes, descriptions, names, statuses, and so
on.
• Any hierarchy information, including multiple hierarchies and often ‘type’
breakdowns.
• The business key that the business user uses to identify a unique row.
• The source system key identification fields, for traceability.
• Control fields geared to the type of dimension history capture (Types 1-3, 4,
and 6).
• Dimension tables must have unique identifiers for each row; the two main kinds
are surrogate and natural keys.
Dimension Tables
9.2.5.2.1 Surrogate Keys
• A “surrogate key” or “anonymous key” is a single primary key,
populated by a number unrelated to the actual data.
• It can be either a sequential number or a truly random number.
• The advantages of using surrogate keys include:
• Performance: number fields search faster than other types of fields.
• Isolation: it is a buffer from business key field changes, and may not need changing
if a field type or length changes on the source system.
• Integration: enables combining data from different sources, which usually do
not share the same key structures.
• Enhancement: values such as ‘Unknown’ or ‘Not Applicable’ have their own
specific key values in addition to all of the keys for valid rows.
• Interoperability: some data access libraries and GUI functions work better with
surrogate keys, because they don’t need additional knowledge about the
underlying system to function properly.
• Versioning: enables multiple instances of the same dimension value, which is
necessary for tracking changes over time.
• De-bugging: supports load-issue analysis and re-run capability.
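A minimal sketch of surrogate key assignment, assuming a simple lookup table: each (source system, natural key) pair receives the next sequential number on first sight, and a reserved key stands in for the ‘Unknown’ member. The function and key values are hypothetical.

```python
import itertools

# Reserved surrogate key for the special 'Unknown' member (assumed convention: -1).
surrogate_map = {("UNKNOWN", "UNKNOWN"): -1}
next_key = itertools.count(1)

def get_surrogate(source_system, natural_key):
    """Return a stable surrogate key for a (source system, natural key) pair,
    assigning the next sequential number the first time a pair is seen."""
    pair = (source_system, natural_key)
    if pair not in surrogate_map:
        surrogate_map[pair] = next(next_key)
    return surrogate_map[pair]

get_surrogate("CRM", "CUST-0042")   # first sight: assigned 1
get_surrogate("ERP", "0042")        # different source key structure: assigned 2
get_surrogate("CRM", "CUST-0042")   # stable on re-lookup: still 1
```

This illustrates the integration and enhancement advantages: sources with incompatible key structures map into one keyspace, and special members get reserved values.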
Dimension Tables
9.2.5.2.2 Natural Keys
• Natural keys are used where it is not preferred to create additional key fields;
unique rows are identified by joining on multiple existing fields in each query.
• They are business driven.
• The advantages of using natural keys are:
• Lower overhead: the key fields are already present, not requiring any
additional modeling to create or processing to populate.
• Ease of change: in an RDBMS where the concept of a domain exists, it is easy to
make global changes due to changes on the source system.
• Performance advantage: using the values in the unique keys may eliminate
some joins entirely, improving performance.
• Data lineage: easier to track across systems, especially where the data travels
through more than two systems.
Dimensional Data Modeling
9.2.5.3 Dimension Attribute Types
• The three main types of dimension attributes are:
• Type 1
• Type 2 (and 2a)
• Type 3
• They are differentiated by the need to retain historical copies.
• There are two other types that do not appear very often:
• Type 4
• Type 6 (1+2+3)
• Types 1 through 3 can co-exist within the same table, and the actions
taken during an update depend on which fields, with which types, are having
updates applied.
Dimension Attributes Types
9.2.5.3.1 Type 1 Overwrite
• Type 1 attributes have no need for any historical records at all.
• The only interest is in the current value, so any update completely
overwrites the prior value in the field in that row.
• Example: hair color. When an update occurs, there is no need to
retain the prior value.
Dimension Attributes Types
9.2.5.3.2 Type 2 New Row
• Type 2 attributes need all historical records.
• For any change to a Type 2 field, a new row with the current
information is appended to the table.
• The previous current row’s expiration date field is updated to expire it.
• Example: when the billing address changes, the row with the old address
expires and a new row with the current billing address is appended.
• The table’s key should handle multiple instances of the same natural
key, either through the use of surrogate keys or by:
• Adding an index value to the primary key
• Adding a date value (effective, expiration, insert, and so on) to the primary
key.
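The expire-and-append behavior can be sketched as follows, assuming a dimension table with surrogate keys and effective/expiration date fields. The column names and the open-ended sentinel date are illustrative assumptions, not from the text.

```python
from datetime import date

OPEN_ENDED = date(9999, 12, 31)  # assumed sentinel meaning "still current"

# Dimension table as a list of row dicts (hypothetical columns).
customer_dim = [
    {"key": 1, "customer_id": "C1", "billing_address": "1 Old Rd",
     "effective": date(2020, 1, 1), "expiration": OPEN_ENDED},
]

def apply_type2_change(dim, customer_id, new_address, change_date):
    """Expire the current row for this customer, then append a new
    current row carrying the changed attribute value."""
    for row in dim:
        if row["customer_id"] == customer_id and row["expiration"] == OPEN_ENDED:
            row["expiration"] = change_date  # expire the prior current row
    dim.append({"key": max(r["key"] for r in dim) + 1,
                "customer_id": customer_id,
                "billing_address": new_address,
                "effective": change_date,
                "expiration": OPEN_ENDED})

apply_type2_change(customer_dim, "C1", "2 New St", date(2021, 6, 1))
# The table now holds the expired original row and a new current row.
```

Note the surrogate key is what allows two rows with the same natural key ("C1") to co-exist.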
Dimension Attributes Types
9.2.5.3.3 Type 3 New Column
• In a Type 3 design, multiple fields in the same row contain the historical values.
• Only a selected, known portion of history is needed.
• When an update occurs, the current value is moved to the next
appropriate field, and the last, no-longer-necessary value drops off.
• Example: credit score, where only the original score when the account
opened, the most current score, and the immediately prior score are valuable.
An update would move the current score to the prior score.
• Example: monthly bill totals (12 fields, named Month01, Month02, etc., or
January, February, etc.).
• One useful purpose of Type 3 is attribute value migration.
• Example: a company decides to reorganize its product hierarchy but wants to
see sales figures for both the old hierarchy and the new one for a year, to make
sure that all sales are being recorded appropriately.
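The credit score example can be sketched as a simple column shift within one row; the field names are hypothetical.

```python
# One dimension row with a fixed set of history columns: only the original,
# prior, and current credit scores are retained (hypothetical field names).
account = {"account_id": "A1",
           "original_score": 700, "prior_score": 710, "current_score": 720}

def apply_type3_update(row, new_score):
    """Shift the current value into the prior column; the old prior
    value drops off. The original score is never touched."""
    row["prior_score"] = row["current_score"]
    row["current_score"] = new_score

apply_type3_update(account, 735)
# account now holds: original 700, prior 720, current 735; the 710 dropped off.
```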
Dimension Attributes Types
9.2.5.3.4 Type 4 New Table
• A Type 4 change initiates a move of the expired row into a ‘history’ table, and
the row in the ‘current’ table is updated with the current information.
• Example: a supplier table, where expired supplier rows roll off into
the history table after an update, so the main dimension table only contains
current supplier rows. (The latter is sometimes called a Type 2a dimension.)
• Retrievals involving timelines are more complex in a Type 4 design,
since the current and history tables need to be joined before joining
with the fact table. Therefore, it is optimal when the vast majority of
access uses current dimension data, and the historical table is
maintained more for audit purposes than for active retrievals.
Dimension Attributes Types
9.2.5.3.5 Type 6 1+2+3
• Type 6 is the same as Type 2 (new row), where any change to any value
creates a new row, but the key value (surrogate or natural) does
not change.
• There are two ways to implement Type 6:
• Add three fields to each row: effective date, expiration date, and current row
indicator.
• Add an index field: the updated row gets an index value of zero, and all prior
rows add 1 to their index values to move them down the line.
• With the current row indicator:
• Queries looking for data as of any particular point in time check whether the
desired data falls between the effective and expiration dates. The drawback is
requiring additional knowledge to create queries that correctly ask for the
proper row by period value or indicator.
• With the index field:
• Queries looking for the current values set the filter for index value equal to
zero; queries looking for prior times still use the effective and expiration
dates. The drawback is that all fact rows link automatically to the index-0
version; joining to the fact table will not find any prior values of the
dimension unless the dimension’s effective and expiration dates are included
in the query.
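The index-field variant can be sketched as follows, with hypothetical column names: on each change, existing versions of the member shift down one slot and the new row takes index zero, while the key value stays fixed.

```python
# Type 6 with an index field: the natural key never changes, the current
# row is always version_index 0, and each update shifts old rows down.
dim_rows = [
    {"customer_id": "C1", "status": "Gold", "version_index": 0},
]

def apply_type6_change(rows, customer_id, new_status):
    """Bump every existing version of this member down one slot,
    then insert the new current row at index 0."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["version_index"] += 1
    rows.append({"customer_id": customer_id, "status": new_status,
                 "version_index": 0})

apply_type6_change(dim_rows, "C1", "Platinum")
# The Gold row is now version_index 1; the Platinum row is the current one.
```

A current-values query simply filters on `version_index == 0`, which is the drawback noted above: without date fields in the query, facts always resolve to the index-0 version.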
Dimensional Data Modeling
9.2.5.4 Star Schema
• A star schema is the representation of a dimensional data model with a single
fact table in the center connecting to a number of surrounding dimension
tables.
• It is also referred to as a star join schema; joins from the central fact
table are via single primary keys to each of the surrounding
dimension tables, and the central fact table has a compound key
composed of the dimension keys.
• Figure 9.4: Example of Star Schema
Dimensional Data Modeling
9.2.5.4 Star Schema
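A minimal star schema can be sketched with SQLite (Python’s stdlib sqlite3): two dimension tables surround a central fact table holding the measure and the dimension keys, and a star join aggregates the measure by a dimension attribute. All table and column names are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Dimension tables carry descriptive context; the fact table in the center
# holds the measure plus a compound key composed of the dimension keys.
cur.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount REAL,
                          PRIMARY KEY (date_key, product_key));
""")
cur.executemany("INSERT INTO dim_date VALUES (?,?,?)", [(1, 2023, 1), (2, 2023, 2)])
cur.executemany("INSERT INTO dim_product VALUES (?,?)", [(10, "Widget"), (11, "Gadget")])
cur.executemany("INSERT INTO fact_sales VALUES (?,?,?)",
                [(1, 10, 100.0), (1, 11, 40.0), (2, 10, 60.0)])

# A star join: every relationship goes through the central fact table.
cur.execute("""
SELECT p.product_name, SUM(f.amount)
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2023
GROUP BY p.product_name
ORDER BY p.product_name
""")
result = cur.fetchall()
print(result)  # [('Gadget', 40.0), ('Widget', 160.0)]
```

Note how the dimensions supply the report grouping (`product_name`) and filter (`year`) while the fact table supplies the measure, exactly the division of labor described above.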
Dimensional Data Modeling
9.2.5.5 Snowflaking
• Snowflaking is the term given to normalizing the flat, single-table,
dimensional structure in a star schema into the component hierarchical or
network structures.
• Kimball’s design methods discourage snowflaking on two main
principles:
• It dilutes the simplicity and end-user understandability of the star schema.
• The space savings are typically minimal.
• Three types of snowflake tables are recognized:
• Snowflake tables: formed when a hierarchy is resolved into level tables.
• Outrigger tables: formed when attributes in one dimension table link to
rows in another dimension table.
• Bridge tables: formed in two situations. The first is when a many-to-many
relationship exists between two dimensions that is not, or cannot be, resolved
through a fact table relationship.
Dimensional Data Modeling
9.2.5.6 Grain
• Grain stands for the meaning or description of a single row of data in a
fact table.
• It refers to the atomic level of the data for a transaction.
• Defining the grain of a fact table is one of the key steps in Kimball’s
dimensional design method.
• For example: if the fact table has data for a store for all transactions
for a month, we know the grain, or limits, of the data; the fact table
will not include data from previous years.
Dimensional Data Modeling
9.2.5.7 Conformed Dimensions
• The common or shared dimensions across multiple data marts in
Kimball’s design method.
• The practical importance is that the row headers from any answer
sets drawn from conformed dimensions must be able to match exactly.
• Example: think of multiple data marts or fact tables, all linking directly to the
same dimension table, or a direct copy of that dimension table. Updates to
that dimension table automatically show in all queries for those data marts.
• Reuse of conformed dimensions in other star schemas allows for
modular development of the DW.
• Ultimately, queries walk across subject areas to unify data access to
the DW across the entire enterprise.
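A sketch of why conformance matters (all mart and column names are hypothetical): two marts keyed to the same shared dimension produce answer sets whose row headers match exactly, so results can be combined across subject areas.

```python
# One shared (conformed) date dimension used by two different marts.
date_dim = {1: "2024-01", 2: "2024-02"}

sales_fact = [{"date_key": 1, "amount": 100.0}, {"date_key": 2, "amount": 150.0}]
ship_fact = [{"date_key": 1, "units": 7}, {"date_key": 2, "units": 4}]

def totals_by_month(fact, measure):
    """Aggregate a mart's measure by the shared dimension's row headers."""
    out = {}
    for row in fact:
        month = date_dim[row["date_key"]]   # identical labels for every mart
        out[month] = out.get(month, 0) + row[measure]
    return out
```

Because both marts resolve labels through the same dimension table, an update to that table shows in the queries of every mart at once.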
Dimensional Data Modeling
9.2.5.8 Conformed Facts
• Conformed facts use standardized definitions of terms across individual
data marts.
• Different business users may use the same term in different ways.
• Does “Customer additions” refer to “gross additions” or “adjusted additions”?
• Does “Orders processed” refer to the entire order, or the sum of individual
line items?
• Developers need to be keenly aware of things that may be called the
same but are different concepts across organizations.
Dimensional Data Modeling
9.2.5.9 DW-Bus Architecture and Bus Matrix
• The DW-bus architecture of conformed dimensions is what allows
multiple data marts to co-exist and share data by plugging into a bus of
shared or conformed dimensions.
• The DW-bus matrix is a tabular way of showing the intersection of
data marts, data processes, or data subject areas with the shared
conformed dimensions. “Table 9.7”
• Very effective communication and planning tool.
• As new design pieces are added, the existing dimensions and facts,
complete with their sources, update logic, and schedule, need to be
reviewed for possible re-use.
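A bus matrix is simple enough to represent directly. A tiny, hypothetical example (subject areas and dimension names are illustrative) where a planning check finds the dimensions every mart shares, the prime candidates for re-use review:

```python
# Rows are data marts / subject areas; columns mark the conformed
# dimensions each one plugs into.
bus_matrix = {
    "Sales":     {"Date": True, "Store": True, "Product": True, "Customer": True},
    "Inventory": {"Date": True, "Store": True, "Product": True, "Customer": False},
}

def shared_dimensions(matrix):
    """Dimensions used by every mart -- candidates for conformance review."""
    marts = list(matrix.values())
    return sorted(d for d in marts[0] if all(m[d] for m in marts))
```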
Dimensional Data Modeling
9.2.5.9 DW-Bus Architecture and Bus Matrix
9.3 DW-BIM Activities
• DW “Data content”: concerned primarily with the part of the DW-
BIM lifecycle from data source to a common data store across all
relevant departments.
• BIM “Data presentation”: concerned with the portion of lifecycle
from common data store to targeted audience.
• BIM capability is directly dependent upon the provision of data from the
DW that is timely, relevant, integrated, and has other quality factors
controlled for and documented as required.
• DW-BIM activities overlap with many of the data management
functions.
DW-BIM Activities
9.3.1 Understand Business Intelligence Information Needs
• Begin by keeping a consistent focus on the business value of the
organization. “Value chain”
• In contrast to operational systems, DW-BIM projects do not gather
requirements as specific details of operations and reports.
• DW-BIM analysis is ad-hoc and involves asking questions that
“slice and dice” the data.
• Identify and scope the business area through interviews with business
people.
• Capturing the actual business vocabulary and terminology is a key to
success.
• Document the business context, then explore the details of the
actual source data. “67% of DW-BIM project effort is the ETL portion”.
DW-BIM Activities
9.3.1 Understand Business Intelligence Information Needs
• Poor-quality data is the first and most apparent cause of poor DW
functionality. Collaboration with the data governance function is critical.
• Creating an executive summary of the identified business
intelligence needs is best practice.
• When starting a DW-BIM program, use a simple assessment of
business impact and technical feasibility. Three critical factors to
assess:
• Business Sponsorship: is there an identified and engaged steering committee?
• Business Goals and Scope: is there a clearly identified business need,
purpose, and scope for the effort?
• Business Resources: is there commitment by business management to the
availability and engagement of the appropriate business SMEs?
DW-BIM Activities
9.3.2 Define and Maintain the DW-BI Architecture
• Roles required for a successful DW-BIM architecture are:
• Technical Architect: Hardware, OS, DB, and DW-BIM architecture.
• Data Architect: Data analysis, system of record, data modeling, and data
mapping.
• ETL Architect / Design Lead : Staging and transform, data marts, and schedules
• Meta-data Specialist: Meta-data interfaces, meta-data architecture, and
contents.
• BI Application Architect / Design Lead: BI tool interfaces and report design,
meta-data delivery, and data and report navigation and delivery.
• DW-BIM needs to leverage many of the disciplines and components
of a company’s IT department from perspectives of:
• Business process
• Architecture
• Technology standards, including servers, DBs, security, etc.
• Availability and timing needs are key drivers in developing the DW-
BIM architecture. “technical requirements”
DW-BIM Activities
9.3.2 Define and Maintain the DW-BI Architecture
• The design decisions and principles for what data detail the DW
contains are a key design priority for DW-BIM architecture.
• DW-BIM architecture must integrate with the overall corporate reporting
architecture, with a focus on defining appropriate service level agreements (SLAs).
• Another success factor is to identify a plan for data re-use, sharing,
and extension.
• Finally, no DW-BIM effort can be successful without business
acceptance of data. Consider, up-front, a few critically important
architectural sub-components, along with their supporting activities:
• Data quality feedback loop: how easy is the integration of needed changes
into the operational systems?
• End-to-end meta-data: is there an integrated end-to-end flow of meta-data
that is easy to access?
• End-to-end verifiable data lineage: To use modern, popular TV parlance, is
the evidence chain-of-custody for all DW-BIM data readily verifiable? Is a
system of record for all data identified?
DW-BIM Activities
9.3.3 Implement Data Warehouses and Data Marts
• The DW and data marts are the two major classes of formal data stores
in the DW-BIM landscape.
• The DW is a relational DB designed with normalization techniques. It
integrates data from multiple source systems, and serves data to multiple
data marts.
• The primary purpose of data marts is to provide data for analysis to
knowledge workers.
• DW and data mart design (following Covey’s Seven Habits: begin with the
end in mind):
• Identify the business problem to solve.
• Identify the details of what would be used (the end-solution piece of software
and associated data marts).
• Continue to work back into the integrated data required (the DW).
• Ultimately, work all the way back to the data sources.
DW-BIM Activities
9.3.4 Implement BI Tools and User Interfaces
• The purpose of this section is to introduce the types of tools
available in BI marketplace and review their characteristics.
• Implementing the right BI tools or user interfaces (UI) is about identifying the
right tools for the right user set.
• Almost all BI tools come with their own meta-data repositories to manage
their internal data maps and statistics.
• Some vendors make these repositories open to the end user; others allow
business meta-data to be entered.
• Enterprise meta-data repositories must link to and copy these repositories to
get a complete view of reporting and analysis activity.
BI Tools & UI
9.3.4.1 Query and Reporting Tools
• Query and Reporting is the process of querying a data source, then
formatting it to create a “report”, either a production style report
such as an invoice, or a management report.
• The needs within business operations reporting are often different
from the needs within business query and reporting.
• Table 9.8 helps distinguish business operations-style reports from
business query and reporting.
• Figure 9.5 relates the classes of BI tools to the respective classes of
BI users for those tools.
• Different users may use different, and overlapping, query and reporting
capabilities in BI tools.
BI Tools & UI
9.3.4.1 Query and Reporting Tools
BI Tools & UI
9.3.4.1 Query and Reporting Tools
BI Tools & UI
9.3.4.2 OLAP Tools
• This section covers OLAP tools, which provide the arrangement of data into
OLAP cubes for fast analysis.
• Cubes in BI tools are generated from star (or snowflake) DB schemas.
• OLAP cubes consist of measures (“numeric facts”) from fact
tables.
• The value of OLAP tools and cubes is the reduction of the chance of
confusion and erroneous interpretation, by aligning the data content
with the analyst’s mental model.
• Common OLAP operations include:
• Slice: a subset of a multi-dimensional array corresponding to a single value for
one or more members of the dimensions not in the subset.
• Dice: a “slice” on more than two dimensions of a data cube, or more than
two consecutive slices.
• Drill Down / Up: a specific analytical technique whereby the user navigates
among levels of data.
• Roll-up: involves computing all of the data relationships for one or more
dimensions; to do this, define a computational relationship or formula.
• Pivot: to change the dimensional orientation of a report or page display.
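The operations above can be sketched over a toy cube (dimension and measure names are illustrative, not from any real OLAP engine):

```python
# A toy cube held as a list of cells: dimension coordinates plus a measure.
cube = [
    {"year": 2023, "region": "EU", "product": "A", "sales": 10},
    {"year": 2023, "region": "US", "product": "A", "sales": 20},
    {"year": 2024, "region": "EU", "product": "B", "sales": 30},
]

def slice_(cells, **fixed):
    """Slice: fix a single value for one or more dimensions."""
    return [c for c in cells if all(c[d] == v for d, v in fixed.items())]

def dice(cells, **allowed):
    """Dice: keep a sub-cube by allowing a set of values per dimension."""
    return [c for c in cells if all(c[d] in vals for d, vals in allowed.items())]

def roll_up(cells, dim, measure="sales"):
    """Roll-up: total the measure across all other dimensions, per value of dim."""
    out = {}
    for c in cells:
        out[c[dim]] = out.get(c[dim], 0) + c[measure]
    return out
```

Pivoting is then just presenting the same roll-up with rows and columns swapped; drill down/up corresponds to rolling up at a finer or coarser dimension level.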
BI Tools & UI
9.3.4.3 Analytic Applications
• Analytic applications include the logic and processes to extract data from
well-known source systems, such as vendor ERP systems, a data model for
the data mart, and pre-built reports and dashboards.
• Different types include customer, financial, supply chain,
manufacturing, and HR applications.
• Analytic applications can be approached from both “buy” and
“build” perspectives.
• Some Key questions for evaluation of analytic applications are:
1. Do we have the standard source systems for which ETL is supplied? If yes,
how much have we modified it? Less modification equals more value and
better fit.
2. How many other source systems do we need to integrate? The fewer the
sources, the better the value and fit.
3. How much do the canned industry queries, reports, and dashboards match
our business? Involve your business users and customers and let them answer
that.
4. How much of analytic application’s infrastructure matches your existing
infrastructure ? The better the match, the better value and fit.
BI Tools & UI
9.3.4.4 Implementing Management Dashboards and Scorecards
• Both are ways of efficiently presenting performance information.
• Dashboards are oriented more toward dynamic presentation of operational
information.
• Scorecards are more static representations of longer-term organizational,
tactical, or strategic goals.
• Scorecards are divided into 4 views: Finance, Customer,
Environment, and Employees.
• Each view has a number of metrics that are reported and
trended against various targets set by senior executives.
• An example of the way various BI techniques combine to
create a BI environment is presented in Wayne Eckerson’s book
Performance Dashboards. “Figure 9.6”
BI Tools & UI
9.3.4.4 Implementing Management Dashboards and Scorecards
BI Tools & UI
9.3.4.5 Performance Management Tools
• Include budgeting, planning, and financial consolidation.
• There have been several major acquisitions in this segment.
• On the buying side, the degree to which customers buy BI and
performance management from the same vendor:
• Depends on product capabilities.
• Depends on the degree to which the CFO and CIO co-operate.
• It is important to note that budgeting and planning do not apply
only to financial metrics, but to workforce, capital, and so on, as well.
BI Tools & UI
9.3.4.6 Predictive Analytics and Data Mining Tools
• Data Mining: a type of analysis that reveals patterns in data using various
algorithms. It helps users discover relationships or patterns in a
more exploratory fashion.
• Predictive analytics (“what-if” analysis) allows users to create a model,
test the model based on actual data, and then project future results.
Underlying engines may be neural networks or inference engines.
• Data mining is used in predictive analysis, fraud detection, root cause
analysis, customer segmentation and scoring, etc.
• A good strategy for interfacing with many data mining tools is to work
with the business analysts to define the data set needed for analysis, and
then arrange for a periodic file extract.
• This strategy works well because:
• Data mining from the DW is an intense, multi-pass activity.
• Most data mining tools work with file-based input.
BI Tools & UI
9.3.4.7 Advanced Visualization and Discovery Tools
• These tools use an in-memory architecture to allow users to interact with the
data in a highly visual, interactive way.
• Patterns in a large dataset can be difficult to recognize in a numeric
display.
• A pattern can be picked up visually fairly quickly when thousands of
data points are loaded into a sophisticated display on a single
page.
• The difference between these tools and most dashboard products is:
1. The degree of sophisticated analysis and visualization types, such as small
multiples, sparklines, heat maps, histograms, waterfall charts, and bullet graphs.
2. Adherence to best practices according to the visualization community.
3. The degree of interactivity and visual discovery versus creating a chart on a
tabular data display.
9.3.5 Process Data for BI
• The biggest part of any DW-BIM effort is the preparation and
processing of the data.
• This section introduces some of the architectural components and
sub-activities involved in processing data for BI:
• Staging Areas
• Mapping sources and Targets
• Data Cleansing and Transformation ( Data Acquisition)
Process Data for BI
9.3.5.1 Staging Areas
• The staging area is the intermediate data store between an original data
source and the centralized data repository.
• All required cleansing, transformation, reconciliation, and
relationship-building happen in this area.
• Dividing the work reduces the overall complexity and makes debugging
much simpler.
• A change-capture mechanism reduces the volume of transmitted
data sets. Several months to a few years of data can be stored in
this initial staging area. Benefits of this approach include:
• Improving performance on the source system by allowing limited history to
be stored there.
• Pro-active capture of a full set of data, allowing for future needs.
• Minimizing the time and performance impact on the source system by
having a single extract.
• Pro-active creation of a data store that is not subject to transactional system
limitations.
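A minimal change-capture sketch for the initial staging area (the natural key `id` and the row contents are hypothetical): compare today's source extract against yesterday's staged copy and transmit only new or changed rows downstream.

```python
# Yesterday's staged copy, keyed by natural key, and today's full extract.
staged = {1: {"id": 1, "name": "Acme"}, 2: {"id": 2, "name": "Beta"}}
extract = [
    {"id": 1, "name": "Acme"},        # unchanged -- not transmitted
    {"id": 2, "name": "Beta Corp"},   # changed
    {"id": 3, "name": "Gamma"},       # new
]

def changed_rows(prev, current):
    """Return only the rows that are new or differ from the staged copy."""
    return [row for row in current if prev.get(row["id"]) != row]
```

Only the delta moves on to cleansing and transformation, which is how change capture minimizes the single-extract impact on the source system.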
Process Data for BI
9.3.5.2 Mapping Sources and Targets
• Source-to-target mapping is the documentation activity that
defines data type details and transformation rules for all required
entities and data elements, and from each individual source to each
individual target.
• Determining valid links between data elements in multiple
equivalent systems is considered the most difficult part.
• A solid taxonomy is necessary to match data elements in different
systems into a consistent structure in the EDW.
• Gold sources, or system-of-record sources, must be signed
off by the business.
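In executable form, a mapping document boils down to rows of (target element, source element, transformation rule). A sketch with hypothetical system and field names:

```python
# Each mapping entry names a target element, its source (system.field),
# and the transformation rule applied in between.
MAPPING = [
    {"target": "customer_name", "source": "CRM.cust_nm", "rule": str.strip},
    {"target": "country_code",  "source": "CRM.ctry",    "rule": str.upper},
]

def apply_mapping(source_row):
    """Produce one target row from one source row per the documented mapping."""
    out = {}
    for m in MAPPING:
        field = m["source"].split(".", 1)[1]   # drop the source-system prefix
        out[m["target"]] = m["rule"](source_row[field])
    return out
```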
Process Data for BI
9.3.5.3 Data Cleansing and Transformation ( Data Acquisition)
• Data cleansing: activities that correct and enhance the domain
values of individual data elements, including enforcement of
standards.
• Necessary for initial loads where significant history is involved.
• The preferred strategy is to push data cleansing activities back to the source
system.
• Strategies must be developed for rows of data that are loaded but later
found to be incorrect:
• Deleting old records may cause havoc with related tables and
surrogate keys.
• Expiring the row and inserting a new row may be the better option.
• Data transformation: activities that provide organizational context
between data elements, entities, and subject areas.
• Organizational context includes cross-referencing, reference and master data
management, and complete and correct relationships.
• An essential component of being able to integrate data from multiple sources.
• Requires extensive involvement with Data Governance.
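The "expire rather than delete" correction strategy can be sketched as follows (column names, the surrogate key scheme, and the date are all hypothetical): the bad row is end-dated and a corrected row inserted under a new surrogate key, so existing references to the old key remain intact.

```python
# One active (unexpired) row with a misspelled city; other tables may
# already reference surrogate key 1, so it must never be deleted.
rows = [{"sk": 1, "customer": "Acme", "city": "Londn", "expiry": None}]

def correct_row(table, sk, fixes, today="2024-06-01"):
    """Expire the bad row and insert a corrected one under a new surrogate key."""
    old = next(r for r in table if r["sk"] == sk and r["expiry"] is None)
    old["expiry"] = today                     # expire, never delete
    new = {**old, **fixes, "sk": max(r["sk"] for r in table) + 1, "expiry": None}
    table.append(new)
    return new
```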
9.3.6 Monitor and Tune DW Processes
• Transparency and visibility are the key principles that drive DW-BIM
monitoring.
• Providing dashboards and drill-down activities is the best practice.
• The addition of data quality measures will enhance the value of monitoring;
performance is more than just speed and timing.
• Processing should be monitored across the system for bottlenecks
and dependencies among processes.
• Employ DB tuning techniques such as partitioning; tune backup and recovery
strategies.
• Users often consider the DW an active archive due to the long histories it
holds.
• Management by exception is a great policy to apply here.
• Sending attention messages upon failure is a prudent addition to the
monitoring dashboard.
9.3.7 Monitor and Tune BI Activity and Performance
• A best practice for BI monitoring and tuning is to define and display a
set of customer-facing satisfaction metrics. Examples of metrics:
• Average query response time.
• The number of users per day/ week/ month.
• The statistical measures available from the system.
• Regular review of usage statistics and patterns is essential.
• Reports providing the frequency and resource usage of data and queries
allow for prudent enhancement.
• Tuning BI activity is analogous to profiling applications
in order to know where the bottlenecks are and where to apply
optimization efforts:
• Creating indexes and aggregations.
• Simple solutions, such as posting daily results in a report.
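The customer-facing metrics above are easy to compute from a query log. A sketch over a hypothetical log (field names and values illustrative): average query response time and distinct users per day.

```python
# A hypothetical BI query log: who ran a query, when, and how long it took.
query_log = [
    {"user": "amy", "day": "2024-06-01", "seconds": 1.2},
    {"user": "bob", "day": "2024-06-01", "seconds": 3.4},
    {"user": "amy", "day": "2024-06-02", "seconds": 2.0},
]

def avg_response_time(log):
    """Average query response time in seconds."""
    return sum(q["seconds"] for q in log) / len(log)

def users_per_day(log):
    """Count of distinct users per day."""
    days = {}
    for q in log:
        days.setdefault(q["day"], set()).add(q["user"])
    return {d: len(u) for d, u in days.items()}
```

Trending these over weeks is what makes bottlenecks, and the payoff of tuning work, visible.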

OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 

chapter9-220725121547-5ed13e4d.pdf

  • 1. Data Warehousing and Business Intelligence Management Ahmed Alorage
  • 2. Objectives: • 9.1 Introduction • 9.2 Concepts and Activities • 9.2.1 Data Warehousing – A Brief Retrospective and Historical Tour • 9.2.1.1 Classic Characteristics of a Data Warehouse – Inmon Version • 9.2.1.2 Classic Characteristics of a Data Warehouse – Kimball Version • 9.2.2 DW/BI Architecture and Components • 9.2.2.1 Inmon’s Corporate Information Factory • 9.2.2.2 Kimball’s Business Development Lifecycle and DW Chess Pieces • 9.2.3 Tactical, Strategic and Operational BI • 9.2.4 Types of Data Warehousing • 9.2.4.1 Active Data Warehousing • 9.2.4.2 Multi-dimensional Analysis – OLAP • 9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP • 9.2.5 Dimensional Data Modeling Concepts and Terminology • 9.2.5.1 Fact Tables • 9.2.5.2 Dimension Tables (9.2.5.2.1 Surrogate Keys, 9.2.5.2.2 Natural Keys) • 9.2.5.3 Dimension Attribute Types (9.2.5.3.1 Type 1 Overwrite, 9.2.5.3.2 Type 2 New Row, 9.2.5.3.3 Type 3 New Column, 9.2.5.3.4 Type 4 New Table, 9.2.5.3.5 Type 6 1+2+3) • 9.2.5.4 Star Schema • 9.2.5.5 Snowflaking • 9.2.5.6 Grain • 9.2.5.7 Conformed Dimensions • 9.2.5.8 Conformed Facts • 9.2.5.9 DW-Bus Architecture and Bus Matrix
  • 3. Objectives: • 9.3 DW-BIM Activities • 9.3.1 Understand Business Intelligence Information Needs • 9.3.2 Define and Maintain the DW-BI Architecture • 9.3.3 Implement Data Warehouses and Data Marts • 9.3.4 Implement Business Intelligence Tools and User Interfaces • 9.3.4.1 Query and Reporting Tools • 9.3.4.2 On Line Analytical Processing (OLAP) Tools • 9.3.4.3 Analytic Applications • 9.3.4.4 Implementing Management Dashboards and Scorecards • 9.3.4.5 Performance Management Tools • 9.3.4.6 Predictive Analytics and Data Mining Tools • 9.3.4.7 Advanced Visualization and Discovery Tools • 9.3.5 Process Data for Business Intelligence • 9.3.5.1 Staging Areas • 9.3.5.2 Mapping Sources and Targets • 9.3.5.3 Data Cleansing and Transformations ( Data Acquisitions ) • 9.3.6 Monitor and Tune Data Warehousing Processes • 9.3.7 Monitor and Tune BI Activity and Performance
  • 4. 9 Data Warehousing and Business Intelligence Management • Data Warehousing and Business Intelligence Management is the seventh data management function in the data management framework introduced in Chapter 1. • It is the sixth data management function that interacts with, and is influenced by, the Data Governance function. • In this chapter, we define the Data Warehousing and Business Intelligence Management function and explain the concepts and activities involved.
  • 5. 9.1 Introduction: • A Data Warehouse (DW) is a combination of two primary components: • An integrated decision support database • The related software programs used to collect, cleanse, transform, and store data from a variety of sources • An Enterprise Data Warehouse (EDW) is a centralized data warehouse designed to service the BI needs of the entire organization. • Data Warehousing: • The term used to describe the operational extract, cleansing, transformation, and load processes – and associated control processes – that maintain the data contained within a DW. • The process focuses on enabling an integrated and historical business context on operational data by enforcing business rules and maintaining appropriate business data relationships. • It is the technology solution supporting BI.
  • 6. 9.1 Introduction: • Business Intelligence (BI) is a set of business capabilities. The term means many things, including: 1. Query, analysis, and reporting activity by knowledge workers to monitor and understand the financial and operational health of, and make business decisions about, the enterprise. 2. Query, analysis, and reporting processes and procedures. 3. A synonym for the BI environment. 4. The market segment for BI software tools. 5. Strategic and operational analytics and reporting on corporate operational data to support business decisions, risk management, and compliance. 6. A synonym for Decision Support System (DSS). • Data Warehousing and Business Intelligence Management (DW-BIM) is: • The collection, integration, and presentation of data to knowledge workers for the purpose of business analysis and decision-making. • Composed of activities supporting all phases of the decision support life cycle: providing context, moving and transforming data from sources to a common target data store, and then providing knowledge workers various means of access, manipulation, and reporting of the integrated data.
  • 7. 9.1 Introduction: • Objectives for DW-BIM include: • Providing integrated storage of required current and historical data, organized by subject areas. • Ensuring credible, quality data for all appropriate access capabilities. • Ensuring a stable, high-performance, reliable environment for data acquisition, management, and access. • Providing an easy-to-use, flexible, and comprehensive data access environment. • Delivering both content and access to the content in increments appropriate to the organization’s objectives. • Leveraging, rather than duplicating, relevant data management component functions such as Reference and Master Data Management, Data Governance (DG), Data Quality (DQ), and Meta-data (MD). • Providing an enterprise focal point for data delivery in support of the decisions, policies, procedures, definitions, and standards that arise from DG. • Defining, building, and supporting all data stores, processes, infrastructure, and tools that contain integrated, post-transactional, and refined data used for information viewing, analysis, or data request fulfillment. • Integrating newly discovered data resulting from BI processes into the DW for further analytics and BI use.
  • 8.
  • 9. 9.2 Concepts and Activities • This section provides: • The history of DW-BIM and an overview of typical DW-BIM components. • An explanation of some general DW-BIM terminology. • A brief introduction to dimensional modeling and its terminology, leading into the activities identified in Figure 9.1.
  • 10. 9.2.1 Data Warehousing – A Brief Retrospective and Historical Tour • Two names have made significant contributions to advancing and shaping the practice of data warehousing: • Bill Inmon • Ralph Kimball • This section gives a brief introduction to their major contributions, along with some comparisons and contrasts of their approaches.
  • 11. Historical Tour 9.2.1.1 Inmon Version – Classic Characteristics of a Data Warehouse • In the early 1990s, Bill Inmon defined the DW as a “subject-oriented, integrated, time variant, and non-volatile collection of summary and detailed historical data used to support the strategic decision-making processes for the corporation”. • These key characteristics give a clear distinction of the nature of a DW compared to a typical operational system. • Subject Oriented: the DW is designed to meet the data needs of the corporation, organized around its subjects. • Integrated: refers to the consistency of the data stored in the DW (“key structure, encoding and decoding of structure, definitions of the data, and so on”). • Time Variant: refers to how every record in the DW is accurate as of some moment in time. • Non-Volatile: refers to the fact that updates to records during normal processing do not occur, and if updates occur at all, they occur on an exception basis. • Summarized and Detail Data: the DW must maintain detailed data, as well as summaries. • Historical: the DW contains a vast amount of historical data (5 to 10 years’ worth of data).
  • 12. Historical Tour 9.2.1.2 Kimball Version – Classic Characteristics of a Data Warehouse • Ralph Kimball took a different approach, defining the Data WH simply as: • A copy of transaction data specifically structured for query and analysis. • It has a different structure than operational systems (the dimensional data model). • Data WHs always contain more than just transactional data. • Reference data is necessary to give context to the transactions. • Dimensional data models are relational data models. • They just do not consistently comply with normalization rules. • They reflect business processes more simply than normalized models.
  • 13. 9.2.2 DW / BI Architecture and Components • Introduces the major components found in most DW/BI environments, through an overview from both the Inmon and Kimball perspectives. • Inmon’s approach: • The Corporate Information Factory • Kimball’s approach: • The “DW Chess Pieces” • Both views and their components are described and contrasted.
  • 14. DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory • The CIF (Corporate Information Factory) is a corporate data architecture for DW-BIM: • Identified and written about by Claudia Imhoff and Ryan Sousa. • Figure 9.2: Shows the components of the CIF. • Table 9.1: Lists and describes the basic components of the Corporate Information Factory view of DW/BI architecture. • Table 9.2: Provides context for the reporting scope and purpose of each of the Corporate Information Factory components, with some explanatory notes. • Table 9.3: Provides a compare-and-contrast from a business and application perspective between the four major components of the CIF: applications, ODS, DW, and Data Marts. • Table 9.4: Provides the same compare-and-contrast from a data perspective.
  • 15. DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
  • 16. DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
Table 9.1 Corporate Information Factory Component Descriptions
• Raw Detailed Data: Operational/transactional application data of the enterprise. Provides the source data to be integrated into the ODS and DW. Can be in a database or other storage or file format.
• Integration and Transformation: The layer of the architecture where the un-integrated data from the various application source stores is combined/integrated and transformed into the corporate representation in the DW.
• Reference Data: A precursor to what is currently referred to as MDM. The purpose was to allow common storage and access for important and frequently used common data. Focus and shared understanding on data upstream of the DW simplifies the integration task in the DW.
• Historical Reference Data: Current-value reference data is necessary for transactional applications; at the same time, it is critical to have accurate integration and presentation of historical data.
  • 17. DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
Table 9.1 Corporate Information Factory Component Descriptions (continued)
• Operational Data Store (ODS): The main distinguishing data characteristics of the ODS compared to the DW are current-valued vs. the DW’s historical data, and volatile vs. the DW’s non-volatile data.
• Operational Data Mart (Oper-Mart): A data mart focused on tactical decision support. Distinguishing characteristics include current-valued vs. the DW’s historical data, tactical vs. the DW’s strategic analysis, and sourcing of data from the ODS rather than just the DW.
• Data Warehouse (DW): A large, comprehensive corporate resource. Its primary purpose is to provide a single integration point for corporate data in order to serve management decision-making, and strategic analysis and planning. Data flows into and out of the DW in one direction only. Data that needs correction is rejected, corrected at its source, and re-fed through the system.
• Data Marts (DM): Their purpose is to provide for DSS/information processing and access that is customized and tailored for the needs of a particular department or common analytic need.
  • 18. DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
Table 9.2 CIF Reporting Scope and Purpose
• Applications (isolated operational reports): Limited to data within one application instance.
• ODS (integrated operational reports): Reports requiring data from multiple source systems.
• DW (exploratory analysis): The complete set of corporate data allows for discovery of new relationships and information. Many BI data mining tools work with flat-file extracts from the DW, which can also offload the processing burden from the DW.
• Oper-Mart (tactical analytics): Analytic reporting based on current values with a tactical focus.
• Data Marts (analytics): Classical management decision support and strategic analytics. Originally “departmental analysis”, reflecting political and funding expediency; later work expanded the concept to common analytic needs crossing departmental boundaries.
  • 19. • About Table 9.3: • Note the following general observations about the contrast between the information on the right-hand side for the DW and Data Marts, compared to the left-hand side for applications, in particular: • The purpose shifts from execution to analysis. • End users are typically decision makers instead of front-line workers. • System usage is more ad hoc than the fixed operations of the transactional applications. • Response time requirements are relaxed, because strategic decisions allow more time than daily operations. • Much more data is involved in each operation, query, or process. DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
  • 20. • DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
Table 9.3 CIF Components – Business / Application View
• Business Purpose: Application Data = specific business function; ODS = corporate integrated operational needs; DW = central data repository, integration and reuse; Data Mart = analysis: departmental (Inmon), business process (Kimball), business measures (Wells).
• System Orientation: Application Data = operations (execution); ODS = operations (reports); DW = infrastructure; Data Mart = informational, analytic (DSS).
• Target Users: Application Data = end users, clerical (daily operations); ODS = line managers, tactical decision makers; DW = systems (data marts, data mining); Data Mart = executives (performance/metrics), senior and mid-level managers, knowledge workers.
• How the system is used: Application Data = fixed operations; ODS = operational reporting; DW = stage, store, feed; Data Mart = ad hoc.
  • 21. • DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
Table 9.3 CIF Components – Business / Application View (continued)
• System Availability: Application Data = fixed ops; ODS = medium; DW = varies; Data Mart = relaxed.
• Typical Response Time: Application Data = seconds; ODS = seconds to minutes; DW = longer (batch); Data Mart = seconds to hours.
• # Records in an Operation: Application Data = limited; ODS = small to medium; DW = large; Data Mart = large.
• Amount of Data per Process: Application Data = small; ODS = medium; DW = large; Data Mart = large.
• SDLC: Application Data = classic; ODS = classic; DW = classic; Data Mart = modified.
  • 22. • DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory • Table 9.4: • Considers a compare-and-contrast from a data perspective between the four components: applications, ODS, DW, and Data Marts. • The majority of DW processes operate at higher latency, often with over-night batch processing.
  • 23. • DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
Table 9.4 CIF Components – Data View
• Orientation: Application = functional; ODS = subject; DW = subject; Data Mart = limited subject.
• View: Application = application; ODS = corporate (ops); DW = corporate (historical); Data Mart = focused analysis.
• Integration: Application = not integrated, application-specific; ODS = integrated corporate data; DW = integrated corporate data; Data Mart = integrated subset.
• Volatility: Application = CRUD; ODS = volatile; DW = non-volatile; Data Mart = non-volatile.
• Time: Application = current only; ODS = current value; DW = time variant; Data Mart = time variant.
• Detail Level: Application = detail only; ODS = detail only; DW = detail + summary; Data Mart = detail + summary.
• Amount of History*: Application = 30 to 180 days; ODS = 30 to 180 days; DW = 5-10 years; Data Mart = 1-5 years.
• Latency*: Application = real time to near-real-time (NRT); ODS = NRT; DW = > 24 hours; Data Mart = 1 day to 1 month.
• Normalized?: Application = yes; ODS = yes; DW = yes; Data Mart = no.
• Modeling: Application = relational; ODS = relational; DW = relational; Data Mart = dimensional.
  • 24. • In Table 9.4, note the following comparisons between the DW and Data Marts on one side, and the applications on the other, in particular: • Data is subject- vs. functionally-oriented. • Data is integrated vs. stove-piped or siloed. • Data is time-variant vs. current-valued only. • There is higher latency in the data. • Significantly more history is available. • DW/BI Architecture and Components 9.2.2.1 Inmon’s Corporate Information Factory
  • 25. • Called the “Business Dimensional Lifecycle” approach, but commonly referred to as the “Kimball approach”. • From his Design Tip #49: “We chose the Business Dimensional Lifecycle label instead, because it reinforced our core tenets about successful data warehousing based on our collective experiences since the mid-1980s”. • The basis of this approach is three tenets (principles): • Business Focus: both immediate business requirements and more long-term, broad data integration and consistency. • Atomic Dimensional Data Models: both for ease of business user understanding and for query performance. • Iterative Evolution Management: manage changes and enhancements to the DW as individual, finite projects. • Using conformed dimension and fact design, “business rules parts of the DW become re-usable components that are already integrated”. • Figure 9.3 is a representation of the “Data Warehouse Chess Pieces”, more inclusive and expansive than that of Inmon. • It uses the term Data Warehouse to encompass everything in both the data staging and data presentation areas. • DW/BI Architecture and Components 9.2.2.2 Kimball’s Business Development Lifecycle and DW Chess Pieces
  • 26. • DW/BI Architecture and Components 9.2.2.2 Kimball’s Business Development Lifecycle and DW Chess Pieces
  • 27. Table 9.5 Kimball’s DW Chess Pieces – Component Descriptions
• Operational Source Systems: Operational/transactional applications of the enterprise, integrated into the ODS and DW components. Equivalent to the application systems in the CIF diagram.
• Data Staging Area: Kimball refers to this as the “kitchen”, the “area behind the scenes”. Smaller in scope than in Inmon’s diagram: the “eclectic set of processes needed to integrate and transform data for presentation”. Similar to “integration and transformation” in the CIF.
• Data Presentation Area: Similar to the Data Marts in the CIF. The main difference is the conformed dimensions unifying the multiple data marts (the “DW Bus”).
• Data Access Tools: Focus on the needs and requirements of the end customers/consumers of the data. These needs translate into the selection, from a broad range of data access tools, of the right tools for the right task. In the CIF model, the access tools are outside the DW architecture.
• DW/BI Architecture and Components 9.2.2.2 Kimball’s Business Development Lifecycle and DW Chess Pieces
  • 28. • Tactical BI: • BI tools used to analyze trends by comparing a metric to the same metric from a previous month or year, etc. Used to support short-term business decisions. • Strategic BI: • Provides metrics to executives, with a formal method of business performance management, to help them determine whether the corporation is on target for meeting its goals. Used to support long-term corporate goals and objectives. • Operational BI: • Provides BI to the front lines of the business, where it is used to manage and optimize business operations. Couples BI applications with operational functions and processes, with a requirement for very low tolerance for latency (near real-time data capture and data delivery). • A service-oriented architecture (SOA) is necessary to support this. 9.2.3 Tactical, Strategic and Operational BI
  • 29. • Three major types of Data Warehousing are described: • 9.2.4.1 Active Data Warehousing • 9.2.4.2 Multi-dimensional Analysis – OLAP • 9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP 9.2.4 Types of Data Warehousing
  • 30. • DWs serving tactical and strategic BI, holding non-volatile data, have existed for many years. • New architectural approaches are emerging to deal with the inclusion of volatile data. • An example of these applications: automated banking machine (ABM) data provisioning. When making a banking transaction, historical balances and the new balances resulting from immediate banking actions need to be presented to the banking customer in real time. • Two of the key design concepts required: • Isolation of change • Alternatives to batch ETL • Changes from new volatile data must be isolated from the bulk of the historical, non-volatile DW data (e.g. by building partitions and using union queries across the different partitions when necessary). • Trickle feeds, pipelining, and SOA are used as alternatives to batch ETL. • Shorter latency requirements for data availability in the DW. DW Types 9.2.4.1 Active Data Warehousing
  • 31. • OLAP (“Online Analytical Processing”; the analytical cube): • An approach to providing fast performance for multi-dimensional analytic queries. • The term originated, in part, to make a distinction from OLTP (“Online Transactional Processing”). • The typical output of queries is in matrix format: • The dimensions form the rows and columns of the matrix. • The factors or measures are the values inside the matrix. • Multi-dimensional analysis with cubes is useful for looking at summaries of data. • A common application is financial analysis, where analysts want to repeatedly traverse known hierarchies to analyze data: • Date (such as Year, Quarter, Month, Week, Day) • Organization (such as Region, Country, Business Unit, Department) • Product (such as Product Category, Product Line, Product). DW Types 9.2.4.2 Multi-dimensional Analysis – OLAP
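The matrix-style summarization described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not any product's API; the fact rows, dimension names, and the `cube` helper are all hypothetical.

```python
# A minimal sketch of OLAP-style multi-dimensional aggregation in pure
# Python. Fact rows carry dimension values (month, region) and a measure
# (amount); the result is a matrix keyed by (row dimension, column dimension).
from collections import defaultdict

facts = [
    {"month": "Jan", "region": "East", "amount": 100},
    {"month": "Jan", "region": "West", "amount": 150},
    {"month": "Feb", "region": "East", "amount": 120},
    {"month": "Feb", "region": "East", "amount": 30},
]

def cube(rows, row_dim, col_dim, measure):
    """Summarize a measure into a matrix keyed by two dimensions."""
    matrix = defaultdict(float)
    for r in rows:
        matrix[(r[row_dim], r[col_dim])] += r[measure]
    return dict(matrix)

sales = cube(facts, "month", "region", "amount")
# The dimension values label the rows and columns; cells hold the measure.
print(sales[("Feb", "East")])  # 150.0
```

Swapping `row_dim` and `col_dim` pivots the same facts along different hierarchies, which is the traversal pattern the financial-analysis example relies on.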
  • 32. • Three classic implementation approaches support Online Analytical Processing, named for the underlying database implementation approach: • Relational Online Analytical Processing (ROLAP): supports OLAP using techniques in the two-dimensional tables of an RDBMS. Star schema joins are a common database design. • Multi-dimensional Online Analytical Processing (MOLAP): supports OLAP by using proprietary and specialized multi-dimensional database technology. • Hybrid Online Analytical Processing (HOLAP): simply a combination of ROLAP and MOLAP, allowing part of the data to be stored in MOLAP form and another part in ROLAP; the designer can vary the mix of partitioning. • Database Online Analytical Processing (DOLAP): a virtual OLAP cube available as a special proprietary function of a classic relational database. DW Types 9.2.4.3 ROLAP, MOLAP, HOLAP and DOLAP
  • 33. 9.2.5 Dimensional Data Modeling Concepts and Terminology • Dimensional data modeling is the preferred technique for designing data marts. • It focuses on making it simple for the end user to understand and access the data. • This simplicity helps explain why the majority of data mart design work ends up being in the ETL processing. • It is a subset of entity-relationship data modeling: it has entities, attributes, and relationships. • The entities come in two types: • Facts, which provide the measurements, and Dimensions, which provide the context. • Relationships are constrained to all go through the fact table, and all dimension-to-fact relationships are one-to-many (1:M). • Table 9.6 compares models built with relational modeling for transactional applications vs. those built with dimensional data modeling for data marts.
  • 34. 9.2.5 Dimensional Data Modeling Concepts and Terminology
  • 35. Dimensional Data Modeling 9.2.5.1 Fact Tables • Represent and contain important business measures. • Fact tables (entities) contain one or more facts (attributes representing measures). • The rows of a fact table correspond to particular measurements, which are numeric, such as amounts, quantities, or counts. • Fact tables express or resolve many-to-many relationships between the dimensions. • They often have a number of control columns that express when the row was loaded. • These fields help the programmers, the operators, and the super-users navigate and validate the data.
  • 36. Dimensional Data Modeling 9.2.5.2 Dimension Tables • Represent the important objects of the business and contain textual descriptions of the business. • They act as the entry points or links into the fact tables, and their contents provide report groupings and report labels. • They typically have a small number of rows and a large number of columns. The main contents of a dimension table are: • A surrogate or non-surrogate key: the primary key representing what is used to link to other tables in the DW. • Descriptive elements, including codes, descriptions, names, statuses, and so on. • Any hierarchy information, including multiple hierarchies and often a ‘type’ breakdown. • The business key that the business user uses to identify a unique row. • The source system key identification fields, for traceability. • Control fields geared to the type of dimension history capture (Types 1-3, 4 and 6). • Dimension tables must have unique identifiers for each row; the two candidates are surrogate and natural keys.
  • 37. Dimension Tables 9.2.5.2.1 Surrogate Keys • A “surrogate key” or “anonymous key” is a single primary key field, populated by a number unrelated to the actual data. • It can be either a sequential number or a truly random number. • The advantages of using surrogate keys include: • Performance: number fields search faster than other types of fields. • Isolation: the surrogate key is a buffer from business key field changes; it may not need changing if a field type or length changes on the source system. • Integration: enables combinations of data from different sources, which usually do not share the same key structure for identifying the same thing. • Enhancement: special values, such as ‘Unknown’ or ‘Not applicable’, can have their own specific key value in addition to all of the keys for valid rows. • Interoperability: data access libraries and GUI functions work better with surrogate keys, because they don’t need additional knowledge about the underlying system to function properly. • Versioning: enables multiple instances of the same dimension value, which is necessary for tracking changes over time. • De-bugging: supports load issue analysis and re-run capability.
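A minimal sketch of sequential surrogate key assignment may make two of the advantages above concrete: keys are stable across re-loads (Isolation), and a reserved key stands in for ‘Unknown’ members (Enhancement). The class and the natural-key values are hypothetical, not from the source.

```python
# A minimal sketch of surrogate key assignment during dimension loading,
# assuming sequential numbering. Key 0 is reserved for the 'Unknown' member.
class SurrogateKeyGenerator:
    UNKNOWN_KEY = 0  # reserved key for 'Unknown' / 'Not applicable' rows

    def __init__(self):
        self._next = 1          # sequential numbers, unrelated to the data
        self._by_natural = {}   # natural key -> surrogate key lookup

    def key_for(self, natural_key):
        """Return the surrogate key for a natural key, assigning a new one
        on first sight; a missing natural key maps to the Unknown key."""
        if natural_key is None:
            return self.UNKNOWN_KEY
        if natural_key not in self._by_natural:
            self._by_natural[natural_key] = self._next
            self._next += 1
        return self._by_natural[natural_key]

gen = SurrogateKeyGenerator()
print(gen.key_for("CUST-042"))  # 1
print(gen.key_for("CUST-042"))  # 1 (stable on re-load)
print(gen.key_for(None))        # 0 (Unknown member)
```

Because fact rows carry only the surrogate key, a change to the source system's business key format would touch the lookup table, not the facts.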
  • 38. Dimension Tables 9.2.5.2.2 Natural Keys • Used by systems that prefer not to create additional key fields, instead identifying unique rows by joining on multiple fields in each query. • Business driven. • The advantages of using natural keys are: • Lower overhead: the key fields are already present, not requiring any additional modeling to create or processing to populate. • Ease of change: in an RDBMS where the concept of a domain exists, it is easy to make global changes due to changes on the source system. • Performance advantage: using the values in the unique keys may eliminate some joins entirely, improving performance. • Data lineage: easier to track across systems, especially where the data travels through more than two systems.
  • 39. Dimensional Data Modeling 9.2.5.3 Dimension Attribute Types • The three main types of dimension attributes are: • Type 1 • Type 2 (and 2a) • Type 3 • They are differentiated by the need to retain historical copies. • There are two other types that do not appear very often: • Type 4 • Type 6 (1+2+3) • Types 1 through 3 can co-exist within the same table, and the actions during an update depend on which fields, with which types, are having updates applied.
  • 40. Dimension Attribute Types 9.2.5.3.1 Type 1 Overwrite • Used where there is no need for any historical records at all. • The only interest is in the current value, so any update completely overwrites the prior value in the field in that row. • Example: “hair color”. When an update occurs, there is no need to retain the prior value.
  • 41. Dimension Attribute Types 9.2.5.3.2 Type 2 New Row • Used where all historical records are needed. • For any change to a Type 2 field, a new row with the current information is appended to the table. • The previous current row’s expiration date field is updated to expire it. • Example: when the billing address changes, the row with the old address expires and a new row with the current billing address is appended. • The table’s key must handle multiple instances of the same natural key, either through the use of surrogate keys, by: • Adding an index value to the primary key, or • Adding a date value (effective, expiration, insert, and so on) to the primary key.
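The expire-and-append mechanics above can be sketched as follows. This is a simplified illustration, assuming effective/expiration date control fields and a surrogate key; the row layout and field names are hypothetical.

```python
# A minimal sketch of a Type 2 (new row) dimension update: expire the
# current row for the natural key, then append a new current row.
import datetime

OPEN_ENDED = datetime.date(9999, 12, 31)  # sentinel for "still current"

def apply_type2(rows, natural_key, new_values, today):
    """Expire the current row for natural_key and append a new current row."""
    next_sk = max((r["sk"] for r in rows), default=0) + 1
    for r in rows:
        if r["natural_key"] == natural_key and r["expiration"] == OPEN_ENDED:
            r["expiration"] = today  # expire the previous current row
    rows.append({"sk": next_sk, "natural_key": natural_key,
                 "effective": today, "expiration": OPEN_ENDED, **new_values})
    return rows

dim = [{"sk": 1, "natural_key": "CUST-1",
        "effective": datetime.date(2020, 1, 1),
        "expiration": OPEN_ENDED, "billing_address": "1 Old Rd"}]
apply_type2(dim, "CUST-1",
            {"billing_address": "2 New St"}, datetime.date(2021, 6, 1))
current = [r for r in dim if r["expiration"] == OPEN_ENDED]
print(len(dim), current[0]["billing_address"])  # 2 2 New St
```

Note the surrogate key changes with each version while the natural key repeats, which is exactly why the table's primary key must accommodate multiple instances of the same natural key.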
  • 42. Dimension Attribute Types 9.2.5.3.3 Type 3 New Column • Multiple fields in the same row contain the historical values. • Used where only a selected, known portion of history is needed. • When an update occurs, the current value is moved to the next appropriate field, and the last, no-longer-necessary value drops off. • An example is credit score, where only the original score when the account opened, the most current score, and the immediate prior score are valuable. An update would move the current score to the prior score. • Another example: monthly bill totals in 12 fields, named Month01, Month02, etc., or January, February, etc. • One useful purpose of Type 3 is attribute value migrations. • Example: a company decides to reorganize its product hierarchy, but wants to see sales figures for both the old hierarchy and the new one for a year, to make sure that all sales are being recorded appropriately.
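The column-shift behavior of the credit-score example above can be sketched in a couple of lines. The field names are hypothetical illustrations, not from the source.

```python
# A minimal sketch of a Type 3 (new column) update for the credit score
# example: the current value shifts into the prior slot, the old prior
# value drops off, and the original value is never touched.
def apply_type3(row, new_score):
    """Shift current -> prior, then store the new current value."""
    row["prior_score"] = row["current_score"]
    row["current_score"] = new_score
    return row

acct = {"original_score": 640, "current_score": 700, "prior_score": 680}
apply_type3(acct, 720)
print(acct)
# {'original_score': 640, 'current_score': 720, 'prior_score': 700}
```

Only the fixed, known slots survive: the score of 680 drops off because Type 3 retains only the selected portion of history.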
  • 43. Dimension Attribute Types 9.2.5.3.4 Type 4 New Table • An update initiates a move of the expired row into a ‘history’ table, and the row in the ‘current’ table is updated with the current information. • Example: a supplier table, where expired supplier rows roll off into the history table after an update, so the main dimension table contains only current supplier rows. “The latter is sometimes called a Type 2a dimension.” • Retrievals involving timelines are more complex in a Type 4 design, since the current and history tables need to be joined before joining with the fact table. Type 4 is therefore optimal when the vast majority of access uses current dimension data, and the history table is maintained more for audit purposes than for active retrievals.
  • 44. Dimension Attribute Types 9.2.5.3.5 Type 6 1+2+3 • The same as Type 2 “new row”, where any change to any value creates a new row, but the key value (surrogate or natural) does not change. • Two ways to implement Type 6: • Add three fields to each row: effective date, expiration date, and current-row indicator. • Add an index field: the updated row gets an index value of zero, and all prior rows add 1 to their index values to move them down the line. • With the current-row indicator: queries looking for data as of any particular point in time check whether the desired date is between the effective and expiration dates. Drawback: additional knowledge is required to create queries that correctly ask for the proper row by period value or indicator. • With the index field: queries looking for current values set the filter to index value equal to zero, while queries looking for prior times still use the effective and expiration dates. Drawback: all fact rows link automatically to index version 0, so joining to the fact table will not find any prior values of the dimension unless the dimension’s effective and expiration dates are included in the query.
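The two query patterns mentioned above, point-in-time lookup by effective/expiration dates versus current-row lookup by indicator, can be sketched like this. The product dimension, its column names, and its values are invented for the example; only the lookup logic reflects the text.

```python
from datetime import date

# Illustrative Type 6 rows: the key (product_id) stays the same across
# versions; effective/expiration dates and a current-row indicator
# distinguish the versions.
rows = [
    {"product_id": "P1", "category": "Toys",
     "effective_date": date(2020, 1, 1), "expiration_date": date(2021, 3, 31),
     "current_row": False},
    {"product_id": "P1", "category": "Games",
     "effective_date": date(2021, 4, 1), "expiration_date": date(9999, 12, 31),
     "current_row": True},
]

def as_of(rows, product_id, when):
    """Point-in-time query: the desired date must fall between the
    effective and expiration dates of exactly one version."""
    for r in rows:
        if (r["product_id"] == product_id
                and r["effective_date"] <= when <= r["expiration_date"]):
            return r
    return None

def current(rows, product_id):
    """Current-value query: filter on the current-row indicator."""
    return next(r for r in rows
                if r["product_id"] == product_id and r["current_row"])
```

The extra knowledge the text warns about is visible here: a report writer must know to call `as_of` rather than `current` when historical alignment matters.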
  • 45. Dimensional Data Modeling 9.2.5.4 Star Schema • The representation of a dimensional data model with a single fact table in the center connecting to a number of surrounding dimension tables. • Also referred to as a star join schema: joins from the central fact table go via single primary keys to each of the surrounding dimension tables. “The central fact table has a compound key composed of the dimension keys.” • Figure 9.4: Example of Star Schema
  • 47. Dimensional Data Modeling 9.2.5.5 Snowflaking • The term given to normalizing the flat, single-table dimensional structure in a star schema into the component hierarchical or network structures. • Kimball’s design methods discourage snowflaking on two main principles: • It dilutes the simplicity and end-user understandability of the star schema. • The space savings are typically minimal. • Three types of snowflake tables are recognized: • Snowflake tables: formed when a hierarchy is resolved into level tables. • Outrigger tables: formed when attributes in one dimension table link to rows in another dimension table. • Bridge tables: formed in two situations; one is when a many-to-many relationship between two dimensions is not or cannot be resolved through a fact table relationship.
  • 48. Dimensional Data Modeling 9.2.5.6 Grain • Grain stands for the meaning or description of a single row of data in a fact table; it refers to the atomic level of the data for a transaction. • Defining the grain of a fact table is one of the key steps in Kimball’s dimensional design method. • Example: if the fact table holds a store’s transactions for a month, we know the grain, or limits, of the data in the fact table will not include data from prior years.
  • 49. Dimensional Data Modeling 9.2.5.7 Conformed Dimensions • The common or shared dimensions across multiple data marts in Kimball’s design method. • The practical importance is that row headers from any answer sets drawn from conformed dimensions must match exactly. • Example: think of multiple data marts or fact tables all linking directly to the same dimension table, or to a direct copy of that dimension table. Updates to that dimension table automatically show in all queries for those data marts. • Reuse of conformed dimensions in other star schemas allows for modular development of the DW. • Ultimately, queries can walk across subject areas, unifying data access to the DW across the entire enterprise.
  • 50. Dimensional Data Modeling 9.2.5.8 Conformed Facts • Use standardized definitions of terms across individual marts. • Different business users may use the same term in different ways: • Does “customer additions” refer to “gross additions” or “adjusted additions”? • Does “orders processed” refer to the entire order, or to the sum of individual line items? • Developers need to be keenly aware of things that may be called the same but are different concepts across organizations.
  • 51. Dimensional Data Modeling 9.2.5.9 DW-Bus Architecture and Bus Matrix • The DW-bus architecture of conformed dimensions is what allows multiple data marts to co-exist and share data by plugging into a “bus” of shared or conformed dimensions. • The DW-bus matrix is a tabular way of showing the intersection of data marts, data processes, or data subject areas with the shared conformed dimensions. “Table 9.7” • A very effective communication and planning tool. • As new design pieces are added, the existing dimensions and facts, complete with their sources, update logic, and schedules, need to be reviewed for possible re-use.
  • 52. Dimensional Data Modeling 9.2.5.9 DW-Bus Architecture and Bus Matrix • [Table 9.7: example DW-bus matrix — image not captured in this transcript]
  • 53. 9.3 DW-BIM Activities • DW “data content”: concerned primarily with the part of the DW-BIM lifecycle from data source to a common data store across all relevant departments. • BIM “data presentation”: concerned with the portion of the lifecycle from the common data store to the targeted audience. • BIM capability is directly dependent upon the provision of data from the DW that is timely, relevant, and integrated, and that has other quality factors controlled for and documented as required. • DW-BIM activities overlap with many of the other data management functions.
  • 55. DW-BIM Activities 9.3.1 Understand Business Intelligence Information Needs • Begin by keeping a consistent focus on the business value of the organization (“value chain”). • In contrast to operational systems, where requirements gathering captures specific details of operations and reports, a DW-BIM analysis project is ad-hoc and involves asking questions that “slice and dice” the data. • Identify and scope the business area through interviews, asking the people involved. • Capturing the actual business vocabulary and terminology is a key to success. • Document the business context, then explore the details of the actual source data. “67% of DW-BIM project effort is the ETL portion.”
  • 56. DW-BIM Activities 9.3.1 Understand Business Intelligence Information Needs • Poor DW functionality is often the first and most apparent symptom of poor-quality data, so collaboration with the data governance function is critical. • Creating an executive summary of the identified business intelligence needs is a best practice. • When starting a DW-BIM program, use a simple assessment of business impact and technical feasibility. Three critical factors in the assessment: • Business sponsorship: an identified and engaged steering committee. • Business goals and scope: is there a clearly identified business need, purpose, and scope for the effort? • Business resources: commitment by business management to the availability and engagement of the appropriate business SMEs.
  • 57. DW-BIM Activities 9.3.2 Define and Maintain the DW-BI Architecture • Roles required for a successful DW-BIM architecture: • Technical Architect: hardware, OS, DB, and DW-BIM architecture. • Data Architect: data analysis, systems of record, data modeling, and data mapping. • ETL Architect / Design Lead: staging and transformation, data marts, and schedules. • Meta-data Specialist: meta-data interfaces, meta-data architecture, and contents. • BI Application Architect / Design Lead: BI tool interfaces and report design, meta-data delivery, and data and report navigation. • DW-BIM needs to leverage many of the disciplines and components of a company’s IT department from the perspectives of: • Business process • Architecture • Technology standards, including servers, DBs, security, etc. • Availability and timing needs are key drivers in developing the DW-BIM architecture (“technical requirements”).
  • 58. DW-BIM Activities 9.3.2 Define and Maintain the DW-BI Architecture • The design decisions and principles for what data detail the DW contains are a key design priority for the DW-BIM architecture. • The DW-BIM architecture should integrate with the overall corporate reporting architecture; focus on defining appropriate service level agreements (SLAs). • Another success factor is to identify a plan for data re-use, sharing, and extension. • Finally, no DW-BIM effort can be successful without business acceptance of data. Consider, up-front, a few critically important architectural sub-components, along with their supporting activities: • Data quality feedback loop: how easy is the integration of needed changes into the operational systems? • End-to-end meta-data: is there an integrated end-to-end flow of meta-data that is easy to access? • End-to-end verifiable data lineage: to use modern, popular TV parlance, is the evidence chain-of-custody for all DW-BIM data readily verifiable? Is a system of record identified for all data?
  • 59. DW-BIM Activities 9.3.3 Implement Data Warehouses and Data Marts • DWs and data marts are the two major classes of formal data stores in the DW-BIM landscape. • The DW is a relational DB design using normalization techniques, intended to integrate data from multiple source systems and serve data to multiple data marts. • The primary purpose of data marts is to provide for analysis by knowledge workers. • DW and data mart design (per Covey’s Seven Habits: begin with the end in mind): • Identify the business problem to solve. • Identify the details of what would be used (the end-solution piece of software and associated data marts). • Continue to work back into the integrated data required (the DW). • Ultimately, work all the way back to the data sources.
  • 60. DW-BIM Activities 9.3.4 Implement BI Tools and User Interfaces • The purpose of this section is to introduce the types of tools available in the BI marketplace and review their characteristics. • Implementing the right BI tools or user interfaces (UIs) is about identifying the right tools for the right user set. • Almost all BI tools come with their own meta-data repositories to manage their internal data maps and statistics. • Some vendors make these repositories open to the end user; others allow business meta-data to be entered. • Enterprise meta-data repositories must link to and copy from these tool repositories to get a complete view of reporting and analysis activity.
  • 61. BI Tools & UI 9.3.4.1 Query and Reporting Tools • Query and reporting is the process of querying a data source, then formatting the result to create a “report”: either a production-style report, such as an invoice, or a management report. • The needs within business operations reporting are often different from the needs within business query and reporting. • Table 9.8 helps distinguish business operations-style reports from business query and reporting. • Figure 9.5 relates the classes of BI tools to the respective classes of BI users for those tools. • Different users may use different, and overlapping, query and reporting capabilities of the BI tools.
  • 62. BI Tools & UI 9.3.4.1 Query and Reporting Tools • [Table 9.8: business operations reporting vs. business query and reporting — image not captured in this transcript]
  • 63. BI Tools & UI 9.3.4.1 Query and Reporting Tools • [Figure 9.5: classes of BI tools related to classes of BI users — image not captured in this transcript]
  • 64. BI Tools & UI 9.3.4.2 OLAP Tools • Covers OLAP tools, which provide the arrangement of data into OLAP cubes for fast analysis. • Cubes in BI tools are generated from star (or snowflake) DB schemas. • OLAP cubes consist of measures, the numeric facts from the fact tables, categorized by dimensions. • The value of OLAP tools and cubes is the reduction of the chance of confusion and erroneous interpretation, by aligning the data content with the analyst’s mental model. • Common OLAP operations include: • Slice: a subset of the multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset. • Dice: a “slice” on more than two dimensions of a data cube, or more than two consecutive slices. • Drill Down / Up: a specific analytical technique whereby the user navigates among levels of data, from summarized (up) to detailed (down). • Roll-up: involves computing all of the data relationships for one or more dimensions; to do this, define a computational relationship or formula. • Pivot: to change the dimensional orientation of a report or page display.
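The slice, dice, and roll-up operations listed above can be illustrated over a tiny in-memory "cube": a list of facts with three dimensions (region, product, year) and one measure (sales). The dimension names and values are invented for the example.

```python
from collections import defaultdict

facts = [
    {"region": "East", "product": "A", "year": 2020, "sales": 10},
    {"region": "East", "product": "B", "year": 2020, "sales": 20},
    {"region": "West", "product": "A", "year": 2020, "sales": 30},
    {"region": "West", "product": "A", "year": 2021, "sales": 40},
]

def slice_cube(facts, **fixed):
    """Slice (one fixed dimension) or dice (several fixed dimensions):
    keep only the facts matching the fixed dimension values."""
    return [f for f in facts
            if all(f[dim] == val for dim, val in fixed.items())]

def roll_up(facts, *dims):
    """Roll-up: aggregate the sales measure over the listed dimensions,
    collapsing (summing away) every dimension not listed."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["sales"]
    return dict(totals)

east = slice_cube(facts, region="East")            # slice on one dimension
west_2020 = slice_cube(facts, region="West", year=2020)  # dice on two
by_region = roll_up(facts, "region")               # roll product/year up
```

A pivot is then just a choice of which rolled-up dimensions become rows versus columns of the display; the aggregation itself is unchanged.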
  • 65. BI Tools & UI 9.3.4.3 Analytic Applications • Analytic applications include the logic and processes to extract data from well-known source systems, such as vendor ERP systems, a data model for the data mart, and pre-built reports and dashboards. • Different types include customer, financial, supply chain, manufacturing, and HR applications. • Analytic applications can be approached from both “buy” and “build” perspectives. • Some key questions for evaluating analytic applications: 1. Do we have the standard source systems for which ETL is supplied? If yes, how much have we modified them? Less modification equals more value and better fit. 2. How many other source systems do we need to integrate? The fewer the sources, the better the value and fit. 3. How well do the canned industry queries, reports, and dashboards match our business? Involve your business users and customers and let them answer that. 4. How much of the analytic application’s infrastructure matches your existing infrastructure? The better the match, the better the value and fit.
  • 66. BI Tools & UI 9.3.4.4 Implementing Management Dashboards and Scorecards • Both are ways of efficiently presenting performance information. • Dashboards are oriented more toward dynamic presentation of operational information. • Scorecards are more static representations of longer-term organizational, tactical, or strategic goals. • Scorecards are divided into four views: Finance, Customer, Environment, and Employees. • Each view has a number of metrics that are reported and trended against various targets set by senior executives. • An example of the way various BI techniques combine to create a BI environment is presented in Wayne Eckerson’s book on performance dashboards. “Figure 9.6”
  • 67. BI Tools & UI 9.3.4.4 Implementing Management Dashboards and Scorecards • [Figure 9.6: example BI environment combining dashboards and scorecards — image not captured in this transcript]
  • 68. BI Tools & UI 9.3.4.5 Performance Management Tools • Include budgeting, planning, and financial consolidation tools. • There have been several major acquisitions in this segment. • On the customer buying side, the degree to which customers buy BI and performance management tools from the same vendor depends on: • Product capabilities. • The degree to which the CFO and CIO co-operate. • It is important to note that budgeting and planning apply not only to financial metrics, but to workforce, capital, and so on, as well.
  • 69. BI Tools & UI 9.3.4.6 Predictive Analytics and Data Mining Tools • Data mining: a type of analysis that reveals patterns in data using various algorithms, helping users discover relationships or patterns in a more exploratory fashion. • Predictive analytics (“what-if” analysis) allows users to create a model, test the model against actual data, and then project future results. The underlying engines may be neural networks or inference engines. • Data mining and predictive analytics are used in fraud detection, root cause analysis, customer segmentation and scoring, etc. • A good strategy for interfacing with many data mining tools is to work with the business analysts to define the data set needed for analysis, and then arrange for a periodic file extract. This strategy is preferred because: • Data mining from the DW tends to be an intense, multi-pass process. • Many data mining tools work with file-based input.
  • 70. BI Tools & UI 9.3.4.7 Advanced Visualization and Discovery Tools • Use an in-memory architecture to allow users to interact with the data in a highly visual, interactive way. • Patterns in a large dataset can be difficult to recognize in a numeric display, but can be picked up visually fairly quickly when thousands of data points are loaded into a sophisticated display on a single page. • These tools differ from most dashboard products in: 1. The degree of sophistication of the analysis and visualization types, such as small multiples, sparklines, heat maps, histograms, waterfall charts, and bullet graphs. 2. Adherence to best practices according to the visualization community. 3. The degree of interactivity and visual discovery, versus creating a chart on a tabular data display.
  • 71. 9.3.5 Process Data for BI • The biggest part of any DW-BIM effort is the preparation and processing of the data. • This section introduces some of the architectural components and sub-activities involved in processing data for BI: • Staging Areas • Mapping Sources and Targets • Data Cleansing and Transformation (Data Acquisition)
  • 72. Process Data for BI 9.3.5.1 Staging Areas • A staging area is the intermediate data store between an original data source and the centralized data repository. • All required cleansing, transformation, reconciliation, and relationship-building happens in this area. • Dividing the work this way reduces the overall complexity and makes debugging much simpler. • A change-capture mechanism reduces the volume of transmitted data sets. Several months to a few years of data can be stored in this initial staging area. Benefits of this approach include: • Improving performance on the source system by allowing only limited history to be stored there. • Pro-active capture of a full set of data, allowing for future needs. • Minimizing the time and performance impact on the source system by having a single extract. • Pro-active creation of a data store that is not subject to transactional system limitations.
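The change-capture mechanism mentioned above can be sketched in its simplest snapshot-comparison form: diff the latest source extract against the prior staged snapshot and transmit only new or changed rows. The row shapes and status values here are invented for the illustration; real implementations more often use log- or timestamp-based capture.

```python
def changed_rows(prior, current):
    """Return rows that are new, or whose values differ from the prior
    snapshot, keyed by the source's natural key."""
    deltas = []
    for key, row in current.items():
        if key not in prior or prior[key] != row:
            deltas.append(row)
    return deltas

# Prior staged snapshot vs. latest source extract (keyed by id).
prior = {1: {"id": 1, "status": "open"},
         2: {"id": 2, "status": "closed"}}
current = {1: {"id": 1, "status": "closed"},   # changed
           2: {"id": 2, "status": "closed"},   # unchanged -> not transmitted
           3: {"id": 3, "status": "open"}}     # new
deltas = changed_rows(prior, current)
```

Only the changed and new rows move downstream, which is exactly the transmission-volume benefit the slide describes.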
  • 73. Process Data for BI 9.3.5.2 Mapping Sources and Targets • Source-to-target mapping is the documentation activity that defines data type details and transformation rules for all required entities and data elements, from each individual source to each individual target. • Determining the valid links between data elements in multiple equivalent systems is considered the most difficult part. • A solid taxonomy is necessary to match data elements in different systems into a consistent structure in the EDW. • The “gold” source, or system-of-record source or sources, must be signed off by the business.
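A source-to-target mapping document of the kind described above can be represented, in sketch form, as a list of entries recording source system, source field, target field, target data type, and transformation rule. All field names, systems, and rules below are illustrative assumptions, not from the text.

```python
MAPPING = [
    {"source_system": "CRM", "source_field": "cust_nm",
     "target_field": "customer_name", "target_type": "VARCHAR(100)",
     "transform": lambda v: v.strip().title()},   # trim + title-case names
    {"source_system": "CRM", "source_field": "st_cd",
     "target_field": "state_code", "target_type": "CHAR(2)",
     "transform": lambda v: v.strip().upper()},   # standardize state codes
]

def apply_mapping(source_row, mapping):
    """Produce a target row by applying each documented transformation
    rule to its source field."""
    return {m["target_field"]: m["transform"](source_row[m["source_field"]])
            for m in mapping}

target = apply_mapping({"cust_nm": "  acme corp ", "st_cd": "ny"}, MAPPING)
```

Keeping the mapping as data rather than buried in ETL code is one way to make the documentation activity and the implementation agree by construction.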
  • 74. Process Data for BI 9.3.5.3 Data Cleansing and Transformation (Data Acquisition) • Data cleansing: activities that correct and enhance the domain values of individual data elements, including enforcement of standards. • Necessary for initial loads where significant history is involved. • The preferred strategy is to push data cleansing activities back to the source systems. • Strategies must be developed for rows of data that are loaded but later found to be incorrect: • Deleting old records may cause havoc with related tables and surrogate keys. • Expiring the row and adding a new row may be the better option. • Data transformation: activities that provide organizational context between data elements, entities, and subject areas. • Organizational context includes cross-referencing, reference and master data management, and complete and correct relationships. • An essential component of being able to integrate data from multiple sources. • Requires extensive involvement with data governance.
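A minimal cleansing sketch, under assumed rules: standardize domain values (a one-letter gender code, a zero-padded postal code — both invented for the example) and route rows that fail validation to a reject list for review rather than silently loading or deleting them.

```python
VALID_GENDERS = {"M", "F", "U"}

def cleanse(row):
    """Standardize domain values; return (clean_row, None) on success
    or (None, error_message) when a standard cannot be enforced."""
    clean = dict(row)
    clean["gender"] = clean.get("gender", "U").strip().upper()[:1] or "U"
    if clean["gender"] not in VALID_GENDERS:
        return None, f"invalid gender: {row['gender']}"
    clean["postal_code"] = clean["postal_code"].strip().zfill(5)  # pad to 5
    return clean, None

loaded, rejected = [], []
for row in [{"gender": "male", "postal_code": "732"},
            {"gender": "x", "postal_code": "10001"}]:
    clean, err = cleanse(row)
    if err is None:
        loaded.append(clean)
    else:
        rejected.append((row, err))   # keep the raw row plus the reason
```

The reject list is the point: it gives data governance something concrete to feed back to the source system, which is where the slide says cleansing should ultimately happen.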
  • 75. 9.3.6 Monitor and Tune DW Processes • Transparency and visibility are the key principles that drive DW-BIM monitoring. • Providing dashboards and drill-down activities is the best practice. • The addition of data quality measures enhances the value of monitoring; performance is more than just speed and timing. • Processing should be monitored across the system for bottlenecks and dependencies among processes. • Employ DB tuning techniques such as partitioning, and tune backup and recovery strategies. • Users often treat the DW as an active archive due to the long histories it holds. • Management by exception is a great policy to apply here. • Sending attention messages upon failure is a prudent addition to the monitoring dashboard.
  • 76. 9.3.7 Monitor and Tune BI Activity and Performance • A best practice for BI monitoring and tuning is to define and display a set of customer-facing satisfaction metrics. Examples of metrics: • Average query response time. • The number of users per day / week / month. • The statistical measures available from the system. • Regular review of usage statistics and patterns is essential. Reports providing the frequency and resource usage of data, queries, and reports allow for prudent enhancement. • Tuning BI activity is analogous to the principle of profiling applications in order to know where the bottlenecks are and where to apply optimization efforts, for example by: • Creating indexes and aggregations. • Applying simple solutions, such as posting daily results to a report.