PRESENTED BY
VASANTHKUMAR C (1DA12CS118)
VEERABHADRAPPA KS (1DA12CS120)
DWH
Dr. AMBEDKAR INSTITUTE OF TECHNOLOGY
Loosely speaking, a data warehouse refers to a database that is
maintained separately from an organization’s operational
database.
It is of practical interest in many applications, such as decision
making in companies by senior database administrators and data
analysis.
Selecting and handling particular queries well gives better
results overall.
INTRODUCTION
DATA WAREHOUSE vs OLTP
DATA WAREHOUSE vs DATA MARTS
DISCUSSION(Data Warehouse terminology)
METHODOLOGY
ETL LIFE CYCLE
FUTURE ENHANCEMENTS
Data Warehouse Concepts and ETL Tool
INTRODUCTION
What is a Data Warehouse?
A single, complete and consistent store of data obtained from a
variety of different sources, made available to end users in a way
they can understand and use in a business context.
[Barry Devlin]
Definition of a Data Warehouse
“An enterprise structured repository of
subject-oriented, time-variant, historical data
used for information retrieval and decision
support. The data warehouse stores atomic
and summary data.”
Warehouses are Very Large Databases
[Figure: percentage of survey respondents (0% to 35%) by warehouse size, in bands from 5 GB through 5-9 GB, 10-19 GB, 20-49 GB, 50-99 GB, 100-249 GB and 250-499 GB up to 500 GB-1 TB; series: Initial and Projected 2Q96. Source: META Group, Inc.]
Data Warehouse Properties
A data warehouse is:
- Subject oriented
- Integrated
- Time variant
- Non volatile
Subject-Oriented
Data is categorized and stored by business subject
rather than by application
[Figure: OLTP applications contrasted with data warehouse subjects; labels include Supplier, Customers, Whole Sale, Marketing, Company, Products, Employees and Shippers.]
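Organizing by subject can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two applications (`order_entry`, `marketing`), the row layout `(application, subject, key, attribute, value)` and the helper `organize_by_subject` are all hypothetical, chosen only to show rows being regrouped by business subject instead of by the application that produced them.

```python
from collections import defaultdict

# Hypothetical rows captured by two separate OLTP applications.
# Assumed layout: (application, subject, key, attribute, value).
APP_ROWS = [
    ("order_entry", "Customers", "C1", "last_order", "1997-01-15"),
    ("order_entry", "Products", "P7", "unit_price", 12.5),
    ("marketing", "Customers", "C1", "segment", "Retail"),
    ("marketing", "Products", "P7", "campaign", "Spring-97"),
]


def organize_by_subject(rows):
    """Regroup application rows into {subject: {key: {attribute: value}}}."""
    subjects = defaultdict(lambda: defaultdict(dict))
    for _application, subject, key, attribute, value in rows:
        # The source application is ignored: only the business subject matters.
        subjects[subject][key][attribute] = value
    # Convert the nested defaultdicts into plain dicts.
    return {s: {k: dict(attrs) for k, attrs in keys.items()} for s, keys in subjects.items()}


if __name__ == "__main__":
    for subject, entities in organize_by_subject(APP_ROWS).items():
        print(subject, entities)
```

Facts about customer `C1` that came from both applications end up together under the single `Customers` subject.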
Integrated
Data on a given subject is defined and stored once.
[Figure: OLTP applications versus the data warehouse; labels: Products, Order Detail, Order, Customer.]
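Integration usually means reconciling different encodings of the same facts before storing them once. The sketch below assumes two hypothetical source applications that identify customers differently (`"c-001"` strings versus integers) and code gender differently (`"M"`/`"F"` versus `1`/`0`, with `1` assumed to mean male). It converts both into one agreed definition and keeps a single row per customer; all names are illustrative.

```python
# Hypothetical customer rows from two OLTP applications that encode
# the same facts in different ways.
ORDERS_APP = [{"cust_id": "c-001", "gender": "M"}, {"cust_id": "c-002", "gender": "F"}]
BILLING_APP = [{"customer_no": 1, "sex": 1}, {"customer_no": 3, "sex": 0}]


def from_orders(row):
    """Map the order application's format to the warehouse definition."""
    return {"customer_id": int(row["cust_id"].split("-")[1]), "gender": row["gender"]}


def from_billing(row):
    """Map the billing application's format (assumed: 1 = male, 0 = female)."""
    return {"customer_id": row["customer_no"], "gender": "M" if row["sex"] == 1 else "F"}


def integrate_customers(orders_rows, billing_rows):
    """Return one conformed row per customer, ordered by customer_id."""
    conformed = [from_orders(r) for r in orders_rows] + [from_billing(r) for r in billing_rows]
    customers = {}
    for row in conformed:
        # Keyed by customer_id, so each customer is stored exactly once.
        customers[row["customer_id"]] = row
    return [customers[cid] for cid in sorted(customers)]


if __name__ == "__main__":
    for customer in integrate_customers(ORDERS_APP, BILLING_APP):
        print(customer)
```

A real integration step would also need a rule for conflicting values; in this sketch the later source simply wins.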
Time-Variant
Data is stored as a series of snapshots, each
representing a period of time
[Figure: one snapshot per period: Jan-97 holds January data, Feb-97 holds February data, Mar-97 holds March data.]
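A time-variant store can be sketched as a set of snapshots keyed by period, using the same `Jan-97` style labels as the slide. The class `SnapshotStore` and its methods are hypothetical; the point is that each period keeps its own copy, so later changes in the source do not rewrite earlier periods.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def period_key(as_of: date) -> str:
    """Label a date by its month, e.g. date(1997, 1, 31) -> 'Jan-97'."""
    return f"{MONTHS[as_of.month - 1]}-{as_of.year % 100:02d}"


class SnapshotStore:
    """Keeps one copy of the data for each period (hypothetical helper)."""

    def __init__(self):
        self._snapshots = {}

    def take_snapshot(self, as_of: date, data: dict) -> str:
        key = period_key(as_of)
        if key in self._snapshots:
            raise ValueError(f"snapshot for {key} already exists")
        # Copy, so later changes in the source do not alter history.
        self._snapshots[key] = dict(data)
        return key

    def as_of(self, key: str) -> dict:
        return dict(self._snapshots[key])

    def periods(self) -> list:
        return list(self._snapshots)


if __name__ == "__main__":
    store = SnapshotStore()
    balances = {"C1": 100}
    store.take_snapshot(date(1997, 1, 31), balances)
    balances["C1"] = 150
    store.take_snapshot(date(1997, 2, 28), balances)
    print(store.periods(), store.as_of("Jan-97"), store.as_of("Feb-97"))
```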
Nonvolatile
Typically, data in the data warehouse is not updated or deleted.
[Figure: the operational database supports insert, update, delete and read; the warehouse only supports load and read.]
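The load-and-read access pattern of a nonvolatile warehouse can be sketched as a table object that only exposes bulk loading and reading. `WarehouseTable` is a hypothetical illustration, not a real database API: update and delete are deliberately refused, and rows are stored as tuples so they cannot be changed in place.

```python
class WarehouseTable:
    """A load-and-read table: rows can be bulk loaded and queried, never changed."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Store rows as tuples so callers cannot mutate them in place.
        self._rows.extend(tuple(row) for row in rows)

    def read(self, predicate=lambda row: True):
        return [row for row in self._rows if predicate(row)]

    def update(self, *args, **kwargs):
        raise PermissionError("warehouse rows are not updated")

    def delete(self, *args, **kwargs):
        raise PermissionError("warehouse rows are not deleted")


if __name__ == "__main__":
    sales = WarehouseTable()
    sales.load([("Jan-97", "East", 100.0), ("Feb-97", "East", 150.0)])
    print(sales.read(lambda row: row[2] > 120))
    try:
        sales.delete()
    except PermissionError as exc:
        print("rejected:", exc)
```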
Changing Data
[Figure: the operational database populates the warehouse database with a first-time load, followed by repeated refreshes.]
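One common way to implement "first-time load, then refresh" is a high-water mark: remember the latest modification time already loaded and copy only rows changed after it. The slides do not name a technique, so this is a swapped-in illustration. It assumes each hypothetical source row carries a reliable last-modified timestamp, and it appends changed rows as new versions rather than overwriting, in keeping with the nonvolatile property.

```python
from datetime import datetime


def refresh(warehouse, source_rows, last_loaded=None):
    """Append source rows changed after last_loaded to the warehouse list.

    last_loaded=None means a first-time (full) load. Rows are assumed to be
    (row_id, amount, modified_at) tuples. Returns the new high-water mark.
    """
    high_water = last_loaded
    for row_id, amount, modified_at in source_rows:
        if last_loaded is None or modified_at > last_loaded:
            # New versions are appended, never overwritten.
            warehouse.append((row_id, amount, modified_at))
            if high_water is None or modified_at > high_water:
                high_water = modified_at
    return high_water


if __name__ == "__main__":
    source = [(1, 10.0, datetime(1997, 1, 5)), (2, 20.0, datetime(1997, 1, 9))]
    warehouse = []
    mark = refresh(warehouse, source)              # first-time load
    source[0] = (1, 12.0, datetime(1997, 2, 3))    # row 1 changes
    source.append((3, 5.0, datetime(1997, 2, 1)))  # row 3 is new
    mark = refresh(warehouse, source, mark)        # refresh
    print(len(warehouse), mark)
```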
Data Warehouse Versus OLTP
Property          | Operational            | Data Warehouse
Response time     | Sub-seconds to seconds | Seconds to hours
Operations        | DML                    | Primarily read only
Nature of data    | 30-60 days             | Snapshots over time
Data organization | Applications           | Subject, time
Size              | Small to large         | Large to very large
Data sources      | Operational, internal  | Operational, internal, external
Activities        | Processes              | Analysis
Data Warehouses Versus Data Marts
Property            | Data Warehouse   | Data Mart
Scope               | Enterprise       | Department
Subject             | Multiple         | Single-subject, LOB
Data sources        | Many             | Few
Size (typical)      | 100 GB to >1 TB  | <100 GB
Implementation time | Months to years  | Months
Dependent Data Mart
[Figure: operational systems, flat files and external data are loaded into the data warehouse, from which the dependent data marts are built; labels include Marketing, Sales, Human Resources (Employees), Shipper, Categories and Orders.]
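What makes a data mart "dependent" is that it is fed only from the warehouse, not directly from operational systems. The sketch below assumes a hypothetical in-memory warehouse keyed by subject; the helper `build_dependent_mart` copies a single subject, a subset of columns and optionally a subset of rows for one department.

```python
# Hypothetical warehouse contents, keyed by subject.
WAREHOUSE = {
    "Orders": [
        {"order_id": 1, "region": "East", "amount": 100.0, "customer": "C1"},
        {"order_id": 2, "region": "West", "amount": 50.0, "customer": "C2"},
    ],
    "Employees": [
        {"emp_id": 10, "dept": "Sales", "salary": 3000},
        {"emp_id": 11, "dept": "HR", "salary": 2800},
    ],
}


def build_dependent_mart(warehouse, subject, columns, keep=lambda row: True):
    """Copy one subject, selected columns and (optionally) selected rows
    out of the warehouse. The mart never reads the operational systems."""
    return [{col: row[col] for col in columns} for row in warehouse[subject] if keep(row)]


if __name__ == "__main__":
    sales_mart = build_dependent_mart(WAREHOUSE, "Orders", ["region", "amount"])
    hr_mart = build_dependent_mart(WAREHOUSE, "Employees", ["emp_id", "dept"],
                                   keep=lambda row: row["dept"] == "HR")
    print(sales_mart)
    print(hr_mart)
```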
Data Warehouse
Terminology
Operational data store (ODS)
Stores tactical data from production systems
that are subject-oriented and integrated to
address operational needs
Metadata
Data Warehouse
Terminology
- Data integration
- Enterprise data warehouse
- Business area warehouse
- Source data
- Architecture
Methodology
Ensures a successful data warehouse
Encourages incremental development
Provides a staged approach to an enterprise-wide
warehouse:
- Safe
- Manageable
- Proven
- Recommended
Modeling
Warehouses differ from operational structures:
- Analytical requirements
- Subject orientation
Data must map to subject oriented information:
- Identify business subjects
- Define relationships between subjects
- Name the attributes of each subject
Modeling is iterative
Modeling tools are available
Components of the Warehouse
Data Extraction and Loading
The Warehouse
Analyze and Query -- OLAP Tools
Metadata
Data Mining tools
Loading the Warehouse
Cleaning the data before it is loaded
Extraction, Transformation & Loading
Purchase specialist tools, or develop programs
Extraction-- select data using different methods
Transformation--validate, clean, integrate, and
time stamp data
Loading--move data into the warehouse
[Figure: OLTP Databases -> ETL Tool -> Warehouse Database]
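The three ETL stages can be sketched with Python's built-in `sqlite3` module, using two in-memory databases as stand-ins for an OLTP database and the warehouse. The table names (`orders`, `fact_orders`), the sample rows and the cleaning rules (reject missing or negative amounts, trim and upper-case customer names) are assumptions made for illustration only.

```python
import sqlite3


def extract(oltp):
    """Extraction: select the rows of interest from the OLTP database."""
    return oltp.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


def transform(rows, load_ts):
    """Transformation: validate, clean and time stamp each row."""
    clean = []
    for order_id, customer, amount in rows:
        if amount is None or amount < 0:  # validate: reject missing/negative amounts
            continue
        clean.append((order_id, customer.strip().upper(), round(amount, 2), load_ts))
    return clean


def load(dw, rows):
    """Loading: move the transformed rows into the warehouse table."""
    dw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    dw.commit()


def demo_databases():
    """Build two hypothetical in-memory databases standing in for real systems."""
    oltp = sqlite3.connect(":memory:")
    oltp.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
    oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, " acme ", 120.5), (2, "globex", None), (3, "Initech", 80.0)])
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE fact_orders "
               "(order_id INTEGER, customer TEXT, amount REAL, load_ts TEXT)")
    return oltp, dw


if __name__ == "__main__":
    oltp, dw = demo_databases()
    load(dw, transform(extract(oltp), "1997-03-31"))
    print(dw.execute("SELECT * FROM fact_orders").fetchall())
```

Order 2 is dropped during transformation because its amount is missing; a production tool would usually write such rows to a reject table rather than silently skip them.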
ETL Life Cycle
The typical real-life ETL cycle consists of the
following execution steps:
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for
data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with
business rules; in case of failure, they also help to
diagnose and repair)
8. Publish (to target tables)
9. Archive
10. Clean up
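The ten steps can be sketched as an ordered list of functions sharing one context dictionary, with a runner that logs each step and stops at the first failure, which is what makes the audit trail useful for diagnosis. Every step body here is a hypothetical placeholder (for example, the reference list of valid regions and the title-casing business rule are assumptions), so treat this as an outline of the control flow rather than a real ETL tool.

```python
# Each step is a hypothetical function that reads and updates a shared context dict.
def initiate(ctx):
    ctx.setdefault("target", [])
    ctx.setdefault("archive", [])


def build_reference_data(ctx):
    ctx["valid_regions"] = {"EAST", "WEST"}  # assumed reference list


def extract(ctx):
    ctx["raw"] = list(ctx["source"])


def validate(ctx):
    valid = ctx["valid_regions"]
    ctx["valid"] = [r for r in ctx["raw"] if r["region"] in valid]
    ctx["rejected"] = [r for r in ctx["raw"] if r["region"] not in valid]


def transform(ctx):
    # Assumed business rule: title-case region names, round amounts to cents.
    ctx["rows"] = [{"region": r["region"].title(), "amount": round(r["amount"], 2)}
                   for r in ctx["valid"]]


def stage(ctx):
    ctx["staging"] = list(ctx["rows"])


def audit_reports(ctx):
    ctx["audit"] = {"staged": len(ctx["staging"]), "rejected": len(ctx["rejected"])}


def publish(ctx):
    ctx["target"].extend(ctx["staging"])


def archive(ctx):
    ctx["archive"].append(list(ctx["raw"]))


def clean_up(ctx):
    for key in ("raw", "valid", "rows", "staging"):
        ctx.pop(key, None)


CYCLE = [
    ("cycle initiation", initiate),
    ("build reference data", build_reference_data),
    ("extract", extract),
    ("validate", validate),
    ("transform", transform),
    ("stage", stage),
    ("audit reports", audit_reports),
    ("publish", publish),
    ("archive", archive),
    ("clean up", clean_up),
]


def run_cycle(ctx, steps=CYCLE):
    """Run the steps in order; stop at the first failure and log which step failed."""
    log = []
    for name, step in steps:
        try:
            step(ctx)
        except Exception as exc:
            log.append(f"{name}: FAILED ({exc!r})")
            break
        log.append(f"{name}: ok")
    return log


if __name__ == "__main__":
    ctx = {"source": [{"region": "EAST", "amount": 10.004}, {"region": "NORTH", "amount": 5.0}]}
    print(run_cycle(ctx))
    print(ctx["target"], ctx["audit"])
```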
Data Access and Reporting
Tools that retrieve data for business analysis
Imperatives
- Ease of use
- Intuitive
- Metadata
- Training
More than one tool may be required
[Figure: tools retrieve data from the warehouse database for charts, forecasting and drill-down.]
Snowflake schema
Represent dimensional hierarchy directly by
normalizing tables.
Easy to maintain and saves storage
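[Figure: snowflake schema with a central fact table (date, custno, prodno, cityname, ...) linked to time, prod, cust and city dimension tables; the city dimension is normalized further into a region table.]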
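The normalized city-to-region hierarchy can be sketched with `sqlite3`. Only the fact columns `date`, `custno`, `prodno` and `cityname` come from the slide; the `amount` measure, the `region` and `city` table layouts and the sample rows are assumptions. Answering "sales by region" walks the hierarchy with one join per level.

```python
import sqlite3

# Snowflake schema sketch: the city dimension is normalized, so region
# details live in their own table (layouts beyond the slide are assumed).
SCHEMA = """
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE city (cityname TEXT PRIMARY KEY,
                   region_id INTEGER REFERENCES region(region_id));
CREATE TABLE sales_fact (date TEXT, custno INTEGER, prodno INTEGER,
                         cityname TEXT REFERENCES city(cityname), amount REAL);
"""


def build_demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO region VALUES (?, ?)", [(1, "North"), (2, "South")])
    db.executemany("INSERT INTO city VALUES (?, ?)",
                   [("CityA", 1), ("CityB", 2), ("CityC", 2)])
    db.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
        ("1997-01-10", 1, 100, "CityA", 30.0),
        ("1997-01-11", 2, 101, "CityB", 50.0),
        ("1997-01-12", 3, 100, "CityC", 20.0),
    ])
    return db


def sales_by_region(db):
    """Walk the hierarchy fact -> city -> region with one join per level."""
    return db.execute("""
        SELECT r.region_name, SUM(f.amount)
        FROM sales_fact AS f
        JOIN city AS c ON c.cityname = f.cityname
        JOIN region AS r ON r.region_id = c.region_id
        GROUP BY r.region_name
        ORDER BY r.region_name
    """).fetchall()


if __name__ == "__main__":
    print(sales_by_region(build_demo()))
```

The trade-off matches the slide: renaming a region touches a single row in `region`, and region names are not repeated per city, at the cost of an extra join in queries.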
Oracle Warehouse Components
[Figure: operational and external data of any type (relational/multidimensional, text, image, spatial, web, audio, video) accessed through relational tools, OLAP tools and applications/web.]
Any Data, Any Source, Any Access
Oracle Data Mart Suite
[Figure: Oracle Data Mart Suite components: data modeling with Oracle Data Mart Designer; data extraction from OLTP engines and OLTP databases with Oracle Data Mart Builder; warehousing engines and the data mart database, with SQL*Plus; data management with Oracle Enterprise Manager; data access and analysis with Discoverer and Oracle Reports.]
Oracle Business Intelligence Tools
[Figure: Reports, Discoverer and Express positioned along a current, tactical and strategic scale, from reports that IS develops, through users' views, to business users' analysis.]
Data Mining Works with Warehouse Data
Data warehousing provides the enterprise with a memory.
Data mining provides the enterprise with intelligence.
The Tool for Each Task
Tool       | Task                      | Question
Reports    | Production reporting      | What were sales by region last quarter?
Discoverer | Ad hoc query and analysis | What is driving the increase in North American sales?
Express    | Advanced analysis         | Given the rapid increase in Web sales, what will total sales be for the rest of the year?
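The production-reporting question "What were sales by region last quarter?" reduces to a filter on the previous calendar quarter followed by a group-by. The sketch below assumes hypothetical fact rows of the form `(sale_date, region, amount)` and calendar quarters; a real report tool would push the same logic into SQL against the warehouse.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d: date):
    """Return (year, quarter) with quarters numbered 1-4."""
    return d.year, (d.month - 1) // 3 + 1


def previous_quarter(today: date):
    """Return (year, quarter) of the quarter before the one containing today."""
    year, quarter = quarter_of(today)
    return (year - 1, 4) if quarter == 1 else (year, quarter - 1)


def sales_by_region_last_quarter(rows, today: date):
    """rows are assumed to be (sale_date, region, amount) tuples."""
    target = previous_quarter(today)
    totals = defaultdict(float)
    for sale_date, region, amount in rows:
        if quarter_of(sale_date) == target:
            totals[region] += amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    sales = [
        (date(1997, 1, 15), "East", 100.0),
        (date(1997, 3, 2), "West", 40.0),
        (date(1997, 4, 1), "East", 999.0),  # current quarter: excluded
    ]
    print(sales_by_region_last_quarter(sales, today=date(1997, 5, 20)))
```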
Reporting Tools
Andyne Computing -- GQL
Brio -- BrioQuery
Business Objects -- Business Objects
Cognos -- Impromptu
Information Builders Inc. -- Focus for Windows
Oracle -- Discoverer2000
Platinum Technology -- SQL*Assist, ProReports
PowerSoft -- InfoMaker
SAS Institute -- SAS/Assist
Software AG -- Esperant
Sterling Software -- VISION:Data
Extraction and Transformation Tools
Carleton Corporation -- Passport
Evolutionary Technologies Inc. -- Extract
Informatica -- OpenBridge
Informatica PowerCenter
Information Builders Inc. -- EDA Copy Manager
Platinum Technology -- InfoRefiner
Prism Solutions -- Prism Warehouse Manager
Red Brick Systems -- DecisionScape Formation
Warehouse Services (for customers)
- Education
- Consulting
- Support Services
OLAP constructs in RDBMS:
A relational database designed for OLTP will not serve well as a
database for data analysis. Optimization techniques such as
aggregating fact tables, partitioning fact tables, and denormalizing
relational tables all provide significant improvements in
performance.
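The first optimization listed, aggregating fact tables, can be sketched as a summary table materialized once from the detailed facts, so that monthly queries scan a few summary rows instead of every sale. The table `sales_fact(sale_date, prodno, amount)`, the summary `sales_by_month` and the sample rows are assumptions; partitioning and denormalization are not shown.

```python
import sqlite3


def build_monthly_aggregate(db):
    """(Re)build a summary table so monthly queries skip the detail rows."""
    db.execute("DROP TABLE IF EXISTS sales_by_month")
    db.execute("""
        CREATE TABLE sales_by_month AS
        SELECT substr(sale_date, 1, 7) AS month, prodno,
               SUM(amount) AS total_amount, COUNT(*) AS n_sales
        FROM sales_fact
        GROUP BY substr(sale_date, 1, 7), prodno
    """)
    db.commit()


def demo_db():
    """Hypothetical detailed fact table with ISO-formatted dates."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales_fact (sale_date TEXT, prodno INTEGER, amount REAL)")
    db.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
        ("1997-01-03", 100, 10.0),
        ("1997-01-20", 100, 15.0),
        ("1997-01-07", 200, 8.0),
        ("1997-02-05", 100, 5.0),
    ])
    return db


if __name__ == "__main__":
    db = demo_db()
    build_monthly_aggregate(db)
    print(db.execute("SELECT * FROM sales_by_month ORDER BY month, prodno").fetchall())
```

The summary must be rebuilt (or incrementally maintained) after each warehouse refresh, which is the usual price of pre-aggregation.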
No Future Without Data Warehousing:
Summary
The following topics were covered:
Identifying a common, broadly accepted definition
of the data warehouse
Distinguishing the differences between OLTP
systems and analytical systems
Defining some of the common data warehouse
terminology
Identifying some of the elements and processes in
a data warehouse
Identifying and positioning the Oracle Warehouse
vision, products, and services