DATA WAREHOUSE
Data Warehouse 
• Pool of data to support decision making. 
• Structured to be available in a ready-to-use form
• Subject Oriented 
• Integrated 
• Time-variant 
• Nonvolatile 
• Additional characteristics:
1. Web based
2. Relational/multidimensional
3. Client/server
4. Real time
5. Includes metadata
Types of Data Warehouse
Data Mart
• Dependent
– Created from the warehouse
– Replicated functional subset of the warehouse
• Independent
– Scaled-down, less expensive version of a data warehouse
– Designed for a department or strategic business unit (SBU)
– Organization may have multiple data marts, which are
difficult to integrate
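A minimal sketch of how a dependent data mart could be created from the warehouse, assuming the warehouse is a SQLite table. The table names `warehouse_sales` and `department_mart`, the columns, and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical warehouse table holding rows for every department.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [("Sales", "East", 100.0), ("Sales", "West", 250.0), ("HR", "East", 40.0)],
)


def build_dependent_mart(conn, department):
    """Replicate one department's subset of the warehouse into a mart table."""
    conn.execute("DROP TABLE IF EXISTS department_mart")
    conn.execute("CREATE TABLE department_mart (region TEXT, amount REAL)")
    # The mart is filled only from the warehouse, which is what makes it dependent.
    conn.execute(
        "INSERT INTO department_mart "
        "SELECT region, amount FROM warehouse_sales WHERE department = ?",
        (department,),
    )
    return conn.execute(
        "SELECT region, amount FROM department_mart ORDER BY region"
    ).fetchall()


sales_mart = build_dependent_mart(conn, "Sales")
print(sales_mart)
```

Because the mart is derived from the warehouse, its definitions stay consistent with the warehouse. An independent mart loaded straight from source systems would not have that guarantee.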
• Operational data stores (ODS): Provide a fairly
recent form of customer information file (CIF)
• Enterprise data warehouses (EDW): Used across the
enterprise for decision support
• Metadata: Describes the structure and
meaning of data, contributing to its
effective use.
Data warehousing process overview 
Major components 
• Data sources 
• Data extraction 
• Data loading 
• Comprehensive database 
• Metadata 
• Middleware tools
Data Warehousing Architectures 
• May have one or more tiers 
– Determined by warehouse, data acquisition (back 
end), and client (front end) 
• One tier, where everything runs on the same platform, is rare
• Two tier usually combines the DSS engine (client) with the
warehouse
– More economical
• Three tier separates these functional parts
Architecture considerations 
• Which DBMS to use? 
• Parallel processing 
• Partitioning 
• Which data migration tools should be used?
• Which tools support data retrieval and analysis?
Alternative Architectures for data 
warehousing
Architecture Selection Factors 
• Information interdependence 
• Senior management information needs
• Urgency for a DW 
• Nature of end user tasks 
• Constraints on resources 
• Strategic view 
• Compatibility with existing systems 
• Ability of in-house IT staff 
• Technical and political factors
Enterprise Data Warehouse
Data Integration, Extraction and Load
Process
1. Data Integration
Comprises three major processes
• Data access: ability to access and extract data
from any data source
• Data federation: integration of business views
across multiple data stores
• Change capture: identification,
capture, and delivery of the changes made to
enterprise data sources.
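As a rough illustration of change capture, the sketch below compares two snapshots of a source table and reports what was inserted, updated, or deleted. Snapshot comparison is only one possible technique and is an assumption here, as are the function name `capture_changes` and the sample records.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by record id and return the changes."""
    # Keys present only in the newer snapshot are inserts.
    inserted = {k: current[k] for k in current.keys() - previous.keys()}
    # Keys present only in the older snapshot are deletes.
    deleted = {k: previous[k] for k in previous.keys() - current.keys()}
    # Keys present in both snapshots with different values are updates.
    updated = {
        k: current[k]
        for k in current.keys() & previous.keys()
        if current[k] != previous[k]
    }
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


# Hypothetical customer-city records before and after a business day.
before = {1: "Pune", 2: "Delhi", 3: "Mumbai"}
after = {1: "Pune", 2: "Noida", 4: "Chennai"}

changes = capture_changes(before, after)
print(changes)
```

Only the captured changes then need to be delivered to the warehouse, rather than the full source table.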
2. Extraction, Transformation and Load (ETL)
• An integral component of any data-centric
project.
• ETL consists of:
– Extraction: reading data from all relevant sources
– Transformation: converting the extracted data into a
form in which it can be placed in the data warehouse or
another database
– Load: inserting the data into the data warehouse.
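The three ETL steps can be sketched as three small functions. This is a minimal illustration, not a production pipeline: the in-memory source list, the `sales` table, and the cleansing rule (dropping rows with no customer name) are all assumptions made for the example.

```python
import sqlite3

# Hypothetical raw records standing in for legacy systems or
# packaged applications.
SOURCE_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"},
    {"customer": "BOB", "amount": "80", "date": "2024-01-06"},
    {"customer": "", "amount": "15.00", "date": "2024-01-07"},  # fails cleansing
]


def extract(rows):
    """Extract: pull raw records from a source (here, an in-memory list)."""
    return list(rows)


def transform(rows):
    """Transform: cleanse and convert records into the warehouse format."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            continue  # drop records with no customer name
        cleaned.append((name, float(row["amount"]), row["date"]))
    return cleaned


def load(conn, rows):
    """Load: insert the transformed records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, source_rows):
    """Run extract, transform, and load in order and return what was loaded."""
    load(conn, transform(extract(source_rows)))
    return conn.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()


warehouse = sqlite3.connect(":memory:")
result = run_etl(warehouse, SOURCE_ROWS)
print(result)
```

Keeping the steps separate makes it easy to change a cleansing rule or add a source without touching the load logic.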
ETL Process
[Figure: packaged applications, legacy systems, and other internal applications pass through Extract, Transform, Cleanse, and Load steps, with a transient data source, into the data warehouse and data mart.]
Benefits of Data Warehouse 
• Allows extensive analysis in numerous ways. 
• A consolidated view of corporate data. 
• Better and more timely information. 
• Enhances system performance.
• Simplification of data access.
• Enhances business knowledge, enhances
customer service and satisfaction, and facilitates
decision making.
Assignment 
• Data warehousing vendors? 
• Data warehousing case study found on the 
internet.
Data Warehouse development 
Approaches 
The Inmon Model: The EDW Approach 
• Emphasizes top-down development 
• Employs established database development
methodologies and tools
The Kimball Model: The Data Mart Approach
• Plan big, build small
• Subject oriented or department oriented
• Focuses on the requests of a specific department.
Data Warehouse Structure 
(The Star Schema)
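A star schema places a central fact table of measures in the middle, surrounded by dimension tables that describe each measure. The sketch below builds a tiny one in SQLite. The tables `fact_sales`, `dim_product`, and `dim_date` and their rows are hypothetical and exist only to show the shape of the schema and a typical query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes used to slice the facts.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, name TEXT, category TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER
    );
    -- Fact table: numeric measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units INTEGER,
        revenue REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Laptop', 'Hardware'),
        (2, 'Mouse', 'Hardware'),
        (3, 'Antivirus', 'Software');
    INSERT INTO dim_date VALUES (10, 2023, 12), (11, 2024, 1);
    INSERT INTO fact_sales VALUES
        (1, 10, 2, 2000.0),
        (2, 11, 5, 100.0),
        (3, 11, 1, 50.0),
        (1, 11, 1, 1000.0);
    """
)

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate a measure by dimension attributes.
totals = conn.execute(
    """
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
    """
).fetchall()
print(totals)
```

Every analytical query follows the same pattern of one join per dimension, which is a large part of why this layout suits decision support.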
Successful Implementation of Data 
warehouse 
• Establishment of service-level agreements and data-refresh 
requirements. 
• Identification of data sources and their governance 
policies. 
• Data quality planning and data model design.
• ETL tool selection. 
• Relational database software and platform selection. 
• Data transport and data conversion. 
• Reconciliation process 
• End-user support
Issues in implementation of data 
warehouse 
• Starting with the wrong sponsorship chain. 
• Setting expectations that you cannot meet and
frustrating executives at the moment of truth.
• Engaging in politically naive behavior.
• Loading the warehouse with information just 
because it is available. 
• Believing that data warehousing database design 
is the same as transactional database design. 
• Choosing a data warehouse manager who is 
technology oriented rather than user oriented 
• Focusing on traditional internal record-oriented
data and ignoring the value of external data and of
text, images, and, perhaps, sound and video.
• Delivering data with overlapping and confusing
definitions.
• Believing promises of performance, capacity, and
scalability.
• Believing that your problems are over when the
data warehouse is up and running.
Risks in Data Warehouse Projects 
• No mission or objective 
• Quality of source data 
unknown 
• Skills not in place 
• Inadequate budget 
• Lack of supporting software 
• Source data not understood 
• Weak sponsor 
• Users not computer literate 
• Geographically distributed 
environment 
• Unrealistic user expectations 
• Architectural and design risks 
• Scope creep and changing 
requirements 
• Vendors out of control 
• Multiple platforms 
• Key people leaving project 
• Loss of the sponsor 
• Too much new technology 
• Having to fix an operational 
system 
• Team geography and 
language culture
Massive Data Warehouse And 
Scalability 
• A data warehouse needs to be scalable.
• Good scalability means that queries and other
data access functions grow (ideally) linearly with the
size of the warehouse.
• Specialized methods have been developed to
create scalable data warehouses.
• Scalability is difficult to achieve when managing hundreds of
terabytes.
Issues pertaining to scalability 
• The amount of data in the warehouse.
• How quickly the warehouse is expected to 
grow. 
• The number of concurrent users. 
• The complexity of user queries.
Real-Time Data warehousing 
• Also known as active data warehousing.
• Process of loading and providing data via the
data warehouse as it becomes available.
• Evolved from the EDW (Enterprise Data Warehousing)
concept.
• Allows information-based decision making at
one's fingertips.
• Positively affects almost all aspects of customer
service, supply chain management (SCM), and logistics.
Comparison between Traditional and
Active Data Warehousing Environments
Traditional Data Warehouse
Environment
• Strategic decisions only
• Results sometimes hard to
measure
• Moderate user concurrency
• Highly restrictive reporting
used to confirm or check
existing processes and
patterns
• Power users, knowledge
workers, internal users
Active Data Warehouse
Environment
• Strategic and tactical decisions
• Results measured with
operations
• High number of users accessing
simultaneously
• Flexible ad hoc reporting, as well
as machine-assisted modeling to
discover new hypotheses
• Operational staff, call centers,
external users
Data Warehouse Administration 
• Because of its huge size, a data warehouse requires
strong monitoring.
• A data warehouse administrator (DWA) should
have the following qualities:
1. Should be familiar with high-performance software,
hardware, and networking technologies.
2. Should be familiar with decision-making processes.
3. Should keep the existing requirements and
capabilities of the data warehouse stable.
4. Must possess excellent communication skills.
Data Warehouse Security issues 
• Security and privacy of information are a significant
concern.
• Companies must create effective and flexible
security procedures.
• Effective security in a data warehouse focuses on:
1. Establishing effective corporate and security policies and 
procedures. 
2. Implementing logical security procedures and techniques to 
restrict access. 
3. Limiting physical access to the data center environment. 
4. Establishing an effective internal control review process with 
an emphasis on security and privacy.
