SEMINAR ON
DATA
WAREHOUSING
GUIDED BY :
DR. JIBITESH MISHRA SIR
PRESENTED BY :
SANGRAM KESHARI SWAIN
ADMN.NO. – 14MCA/02
M C A 2nd YR. ( 4th SEM.)
INTRODUCTION :
In 1992, W. H. Inmon characterized
a data warehouse as “a subject-oriented,
integrated, non-volatile, time-variant
collection of data in support of
management’s decisions”. Data warehouses
provide access to data for complex analysis,
knowledge discovery, and decision making.
Data Warehousing
The primary concept behind
data warehousing is that the
nonvolatile data stored for
business analysis can be most
effectively managed by separating
it from the active data in the
operational systems. Nonvolatile
data is data that is not modified or
rarely modified after being moved
from operational systems to a data
warehouse.
Data Warehousing (Data Warehouse)
It is a single view of your enterprise data
- optimized for reporting and analysis.
- a copy of transaction and non-transaction data.
- data and information are extracted from heterogeneous
production data sources as they are generated, or in
periodic stages, making it simpler and more efficient to
run queries over data that originally came from
different sources.
- interactive content can be delivered to anyone in the
extended enterprise (customers, partners, employees,
managers, and executives) anytime, anywhere.
FUNDAMENTAL :
        IS
        │
   ┌────┴────┐
  OLTP       DW
OVERALL PROCESS OF DW
[Figure: overall process of data warehousing. Labels in the figure: databases, other data inputs, updates/new data, cleaning, reformatting, metadata, data, back flushing, OLAP, DSSI, EIS.]
OLTP :
Traditional databases support
on-line transaction processing (OLTP),
which includes insertions, updates, and
deletions, while also supporting information
query requirements. They are optimized for
queries that touch a small part of the
database and for transactions that insert or
update a few tuples per relation. Thus they
cannot be optimized for OLAP, DSS, or data
mining.
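The contrast between a transaction that touches a single tuple and an analytical query that scans a whole relation can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module; the account table and its values are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical table and values, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(1, "north", 100.0), (2, "north", 250.0), (3, "south", 75.0)],
)

# OLTP-style transaction: touches a single tuple, located by its key.
with conn:
    conn.execute("UPDATE account SET balance = balance - 20 WHERE id = ?", (1,))

# Analytical query: scans and aggregates the whole relation.
totals = dict(conn.execute("SELECT branch, SUM(balance) FROM account GROUP BY branch"))
```

The first statement is the kind of work an operational system is tuned for; the second is the kind of work a warehouse takes over.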
DATA WAREHOUSE
TERMINOLOGY AND DEFINITION
A data warehouse is…
- the data (i.e. meta, fact, dimension,
aggregation)
- and the process managers (load,
warehouse, query) that make information
available for taking informed decisions.
- it is designed for query and analysis rather
than transaction processing.
What is OLAP :
OLAP (on-line analytical processing) is
used to describe the analysis of complex
data from the data warehouse. In the hands
of skilled knowledge workers, OLAP tools
use distributed computing capabilities for
analyses that require more storage and
processing power than can be economically
and efficiently located on an individual
desktop.
WHAT IS DSS :
DSS (decision support systems) are
also known as EIS (executive
information systems). They support an
organization's leading decision
makers with higher-level data for
complex and important decisions.
DATA MINING :
Data mining is used for knowledge
discovery, the process of searching data
for unanticipated new knowledge.
By contrast, data warehouses are
designed precisely to support efficient
extraction, processing, and presentation
for analytic and decision-making
purposes. In comparison to traditional
databases, data warehouses generally
contain very large amounts of data from
multiple sources that may include
databases from different data models
and sometimes files acquired from
independent systems and platforms.
INMON’S CHARACTERISTICS :
* SUBJECT ORIENTED : Data is organized
around a topic to be analyzed.
* INTEGRATED : Data from disparate
sources must be put into a consistent
format.
* NON-VOLATILE : Data should not
change once it is in the warehouse.
* TIME VARIANT : Data is kept with a
time dimension so that changes can be
analyzed over time.
DATA WAREHOUSE
ARCHITECTURE
COMPONENTS OF DATA
WAREHOUSE
* WHEN AND HOW TO GATHER DATA :
In a source-driven architecture for
gathering data, the data sources
transmit new information, either
continually or periodically. In a
destination-driven architecture, the
data warehouse periodically sends
requests for new data to the sources.
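The two gathering styles can be sketched in a few lines. The Source and Warehouse classes below, and their method names, are hypothetical and serve only to contrast a source that pushes data with a warehouse that pulls it.

```python
class Source:
    """A hypothetical data source that accumulates new records."""

    def __init__(self, name):
        self.name = name
        self.pending = []       # records not yet delivered to the warehouse
        self.subscriber = None  # set only in the source-driven case

    def record(self, row):
        if self.subscriber is not None:
            # Source-driven: the source pushes new data as it is generated.
            self.subscriber.load(self.name, [row])
        else:
            self.pending.append(row)

    def fetch_new(self):
        # Destination-driven: the source hands over new data on request.
        rows, self.pending = self.pending, []
        return rows


class Warehouse:
    def __init__(self):
        self.rows = []

    def load(self, source_name, rows):
        self.rows.extend((source_name, row) for row in rows)

    def poll(self, sources):
        # Would be called periodically, e.g. by a scheduler (not shown).
        for source in sources:
            self.load(source.name, source.fetch_new())


warehouse = Warehouse()

pushing = Source("orders")
pushing.subscriber = warehouse
pushing.record({"order_id": 1})   # arrives at the warehouse immediately

polled = Source("inventory")
polled.record({"sku": "A7"})      # waits at the source
rows_before_poll = len(warehouse.rows)
warehouse.poll([polled])          # the warehouse requests new data
rows_after_poll = len(warehouse.rows)
```

In the source-driven case the warehouse sees the record as soon as it is produced; in the destination-driven case it sees it only after the next poll.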
* WHAT SCHEMA TO USE :
Data sources that have been
constructed independently are likely to
have different schemas. In fact, they
may even use different data models.
Part of the task of a warehouse is to
perform schema integration, and to
convert data to the integrated schema
before they are stored. As a result, the
data stored in the warehouse are not
just a copy of the data at
the sources.
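A minimal sketch of schema integration, assuming two hypothetical sources (a CRM system and a billing system) that describe the same customer with different field names and types. The field names and the integrated schema are invented for illustration.

```python
# Two hypothetical sources describing customers with different schemas.
crm_rows = [{"cust_no": 17, "full_name": "A. Rao", "town": "Pune"}]
billing_rows = [{"id": "17", "first": "A.", "last": "Rao", "city": "Pune"}]


def from_crm(row):
    # Map the CRM schema onto the integrated schema.
    return {"customer_id": int(row["cust_no"]), "name": row["full_name"], "city": row["town"]}


def from_billing(row):
    # Map the billing schema onto the integrated schema:
    # convert the string id and join the split name fields.
    return {
        "customer_id": int(row["id"]),
        "name": f'{row["first"]} {row["last"]}',
        "city": row["city"],
    }


# Rows are converted to the integrated schema before they are stored.
integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

After conversion both rows have the same shape, which is why the warehouse contents are more than a plain copy of the sources.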
* DATA CLEANSING :
- The task of correcting and pre-processing
data is called data cleansing.
- Records for multiple individuals in a house
may be grouped together so that only one mailing
is sent to each house (householding).
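Householding can be sketched as grouping records by a cleaned address. The records and the very simple normalize function below are assumptions made for illustration; real address matching is considerably more involved.

```python
from collections import defaultdict

# Hypothetical customer records; the address field is the grouping key.
records = [
    {"name": "R. Das", "address": "12 Lake Road, Cuttack"},
    {"name": "S. Das", "address": "12  lake road, cuttack "},
    {"name": "P. Sahu", "address": "4 Hill Street, Puri"},
]


def normalize(address):
    # Minimal cleaning: lower-case and collapse repeated whitespace.
    return " ".join(address.lower().split())


households = defaultdict(list)
for rec in records:
    households[normalize(rec["address"])].append(rec["name"])

# One mailing per household rather than one per individual.
mailings = [{"address": addr, "recipients": names} for addr, names in households.items()]
```

The two Das records differ only in case and spacing, so after cleaning they fall into the same household and receive a single mailing.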
* HOW TO PROPAGATE DATA :
Updates on relations at the sources must
be propagated to the warehouse. If the
relations at the data warehouse are exactly
the same as those at the data sources, the
propagation is straightforward. If they are
not, the problem of propagating updates is
basically the view-maintenance problem.
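To illustrate the view-maintenance case, the sketch below assumes the warehouse stores a derived relation (total amount per store) rather than a copy of the source relation. The data and function names are hypothetical; the point is that each source update is applied as a small delta instead of recomputing the view.

```python
from collections import Counter

# Source relation (hypothetical): individual sales as (store_id, amount).
source_sales = [("s1", 10), ("s2", 5), ("s1", 7)]


def recompute(sales):
    """Rebuild the warehouse view from scratch: total amount per store."""
    view = Counter()
    for store_id, amount in sales:
        view[store_id] += amount
    return view


view = recompute(source_sales)


def apply_insert(view, store_id, amount):
    # Incremental maintenance: adjust only the affected group.
    view[store_id] += amount


def apply_delete(view, store_id, amount):
    view[store_id] -= amount


# Updates at the source, propagated as deltas to the warehouse view.
source_sales.append(("s2", 4))
apply_insert(view, "s2", 4)
source_sales.remove(("s1", 7))
apply_delete(view, "s1", 7)
```

After both updates, the incrementally maintained view matches what a full recomputation over the source would give.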
* WHAT DATA TO SUMMARIZE :
The raw data generated by a
transaction-processing system may be too
large to store on-line. However, we can
answer many queries by maintaining
just summary data obtained by
aggregation on a relation, rather than
maintaining the entire relation.
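A minimal sketch of keeping summary data, using Python's built-in sqlite3 module. The sales relation, the summary table name sales_by_item, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item_id TEXT, store_id TEXT, number INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("pen", "s1", 2, 1.5), ("pen", "s2", 4, 1.5), ("ink", "s1", 1, 4.0)],
)

# Summary relation: one row per item instead of one row per sale.
conn.execute(
    "CREATE TABLE sales_by_item AS "
    "SELECT item_id, SUM(number) AS units, SUM(number * price) AS revenue "
    "FROM sales GROUP BY item_id"
)

# This query needs only the summary, not the raw relation.
pen_units, pen_revenue = conn.execute(
    "SELECT units, revenue FROM sales_by_item WHERE item_id = ?", ("pen",)
).fetchone()
```

Queries about totals per item can be answered from the summary alone; queries that need individual sales cannot, which is the trade-off in choosing what to summarize.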
STAR SCHEMA FOR A DATA WAREHOUSE
Fact table : ITEM-ID, STORE-ID, CUSTOMER-ID, DATE, NUMBER, PRICE
Dimension tables :
- Item info : ITEM-ID, ITEM NAME, COLOR, SIZE, CATEGORY
- Store : STORE-ID, CITY, STATE, COUNTRY
- Date info : DATE, MONTH, QUARTER, YEAR
- Customer : CUSTOMER-ID, NAME, STREET, CITY, STATE, COUNTRY
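The star schema in the figure can be written out as tables and queried with joins from the fact table to its dimension tables. The sketch below uses Python's built-in sqlite3 module; the fact table name sales, the column types, and the sample rows are assumptions, since the figure gives only the attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_info (item_id TEXT PRIMARY KEY, item_name TEXT, color TEXT, size TEXT, category TEXT);
CREATE TABLE store     (store_id TEXT PRIMARY KEY, city TEXT, state TEXT, country TEXT);
CREATE TABLE date_info (date TEXT PRIMARY KEY, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, name TEXT, street TEXT, city TEXT, state TEXT, country TEXT);
-- Fact table; the name "sales" is assumed, the figure does not label it.
CREATE TABLE sales (
    item_id TEXT REFERENCES item_info, store_id TEXT REFERENCES store,
    customer_id TEXT REFERENCES customer, date TEXT REFERENCES date_info,
    number INTEGER, price REAL
);
INSERT INTO item_info VALUES ('i1', 'shirt', 'blue', 'M', 'clothing');
INSERT INTO store VALUES ('s1', 'Bhubaneswar', 'Odisha', 'India');
INSERT INTO date_info VALUES ('2016-02-10', 2, 1, 2016);
INSERT INTO customer VALUES ('c1', 'A. Rao', '1 Main St', 'Cuttack', 'Odisha', 'India');
INSERT INTO sales VALUES ('i1', 's1', 'c1', '2016-02-10', 3, 20.0);
INSERT INTO sales VALUES ('i1', 's1', 'c1', '2016-02-10', 1, 20.0);
""")

# Revenue by item category and quarter: the fact table joined to two dimensions.
rows = conn.execute("""
    SELECT i.category, d.quarter, SUM(f.number * f.price)
    FROM sales AS f
    JOIN item_info AS i ON i.item_id = f.item_id
    JOIN date_info AS d ON d.date = f.date
    GROUP BY i.category, d.quarter
""").fetchall()
```

Each dimension table is reached from the fact table by a single join on its key, which is what gives the schema its star shape.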
Data Warehousing with NGS-IQ
SUMMARY
Data warehousing is a technology that
helps gather and archive
operational data. Warehouses are used
for decision support and analysis of
historical data, for instance to predict
trends.