Data Warehousing

    Kavisha Uniyal
         And
    Gunjan Bhandari
 DEFINITION

• "A DATA WAREHOUSE is a
subject-oriented, integrated,
time-variant, non-volatile
collection of data in support
of management decisions"
 Data Warehouse Features
•   Subject-oriented: the warehouse (WH) is organized around the major subjects of
    the enterprise rather than the major application areas. This reflects the need
    to store decision-support data rather than application-oriented data.
•   Integrated: the source data comes together from different enterprise-wide
    application systems. The source data is often inconsistent, so the integrated
    data source must be made consistent to present a unified view of the data to
    the users.
•   Time-variant: the source data in the WH is only accurate and valid at some
    point in time or over some time interval. The time-variance of the data
    warehouse also shows in the extended time for which the data is held, the
    implicit or explicit association of time with all data, and the fact that the
    data represents a series of snapshots.
•   Non-volatile: data is not updated in real time but is refreshed from the
    operational systems (OS) on a regular basis. New data is always added as a
    supplement to the database (DB) rather than as a replacement. The DB
    continually absorbs this new data, incrementally integrating it with the
    previous data.
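
The time-variant and non-volatile features can be illustrated with a minimal Python sketch. The `warehouse` list, the `load_snapshot` helper and the stock figures are assumptions made up for illustration; they do not come from any particular product.

```python
from datetime import date

# Hypothetical warehouse table: each row is a dated snapshot and is never overwritten.
warehouse = []

def load_snapshot(rows, as_of):
    """Append a periodic refresh from an operational system; existing rows stay untouched."""
    for row in rows:
        warehouse.append({**row, "as_of": as_of})

# Two monthly refreshes (made-up figures). The second one supplements the first.
load_snapshot([{"branch": "Mumbai", "stock": 120}], date(2011, 1, 31))
load_snapshot([{"branch": "Mumbai", "stock": 95}], date(2011, 2, 28))

# The history for one branch is a series of snapshots, each tied to a point in time.
history = [(r["as_of"].isoformat(), r["stock"]) for r in warehouse if r["branch"] == "Mumbai"]
print(history)
```

Because every load only appends, the January figure is still available after the February refresh, which is what makes analysis over time possible.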
Why Data Warehousing??


“Necessity is the
 mother of
 invention”
 Scenario 1
Cola Pvt Ltd is a company with branches at
Mumbai, Delhi, Chennai and Bangalore. The
Sales Manager wants an annual sales report to
plan the future production quantity for each
branch. Each branch has a separate
operational system.
Scenario 1: Cola Pvt Ltd.

[Figure: the separate Mumbai, Delhi, Chennai and Bangalore systems each have to supply "sales per item type per branch for a year" to the Sales Manager.]
Solution 1: Cola Pvt Ltd.

[Figure: the Mumbai, Delhi, Chennai and Bangalore systems feed a Data Warehouse; Query & Analysis tools on top of it produce the Report for the Sales Manager.]
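
A rough Python sketch of what the warehouse does for the Sales Manager in this scenario: rows from the four branch systems are consolidated into one structure and totalled per item type per branch for the year. The item types, months and unit counts are made-up assumptions, and `annual_report` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical extracts from the four branch systems: (item_type, month, units_sold).
branch_sales = {
    "Mumbai": [("cola", 1, 100), ("cola", 2, 120), ("diet", 1, 40)],
    "Delhi": [("cola", 1, 90), ("diet", 2, 30)],
    "Chennai": [("cola", 3, 70)],
    "Bangalore": [("diet", 1, 25), ("diet", 4, 35)],
}

def annual_report(sales_by_branch):
    """Total the units sold per (branch, item type) over the whole year."""
    totals = defaultdict(int)
    for branch, rows in sales_by_branch.items():
        for item_type, _month, units in rows:
            totals[(branch, item_type)] += units
    return dict(totals)

report = annual_report(branch_sales)
for (branch, item_type), units in sorted(report.items()):
    print(f"{branch:10} {item_type:5} {units}")
```

Without the consolidated store, the same report would need a separate query against each branch's operational system.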
Need Of Data Warehouse
• Business Users: business users need a data warehouse to view summarized
  data from the past. The data is presented in a simple form so that the
  facts and figures are easy to understand.
• Store Historic Data: a data warehouse is required to store
  time-variant data from the past.
• Selective Data: data stored in the DWH may not be the full data,
  because the DWH contains summarized data.
• Differentiate analytical and operational processing: the
  operational DB is meant for online transactions and day-to-day
  operations, while the analytical DB is used for analysing the
  summarized data.
• Make Strategic Decisions: some strategies may depend
  on the data in the DWH.
 3 Tier Architecture
1.   Extraction and Transformation Tier: data is collected from
     various sources and refined. Non-useful data is eliminated, the
     rest is transformed into a standard format, and the result is
     loaded into the data warehouse.
2.   Connective Tier: the data mart server serves as the connective
     or middle tier. The extracted and transformed data from the
     source DBs is stored in the DWH (central storage).
3.   Data Access and Retrieval Tier: the end user enters a query
     through an OLAP tool. The query is processed by the WH, and the
     results, graphical or complex, are displayed.
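
The three tiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the central storage, a plain SQL query stands in for the OLAP tool, and the raw rows and the `transform` helper are made up.

```python
import sqlite3

# Hypothetical raw rows from source systems with inconsistent formats.
raw_rows = [
    {"branch": " mumbai ", "amount": "1200"},
    {"branch": "DELHI", "amount": "950"},
    {"branch": "Delhi", "amount": None},  # unusable row, dropped during transformation
]

def transform(rows):
    """Tier 1: discard unusable rows and bring the rest into one standard format."""
    clean = []
    for row in rows:
        if row["amount"] is None:
            continue
        clean.append((row["branch"].strip().title(), int(row["amount"])))
    return clean

# Tier 2: central storage (in-memory SQLite used as a stand-in for the warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (branch TEXT, amount INTEGER)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", transform(raw_rows))

# Tier 3: an end-user query against the warehouse.
result = warehouse.execute(
    "SELECT branch, SUM(amount) FROM sales GROUP BY branch ORDER BY branch"
).fetchall()
print(result)
```

The point of the separation is that cleaning happens once, before loading, so every later query sees the same standardized data.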
DATA MART
• Definition of Data Mart
   A subset of a data warehouse that stores only the relevant data

   A data mart is of two types: dependent and independent
• Dependent data mart
  A subset that is created directly from a data warehouse
• Independent data mart
  A small data warehouse designed for a strategic business
  unit or a department
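
A dependent data mart can be pictured as a filtered copy of warehouse rows. In the sketch below, the `department` field, the sample rows and the `build_dependent_mart` helper are all assumptions chosen for illustration.

```python
# Hypothetical warehouse rows; the "department" field is an assumption.
warehouse = [
    {"department": "sales", "branch": "Mumbai", "amount": 1200},
    {"department": "hr", "branch": "Mumbai", "amount": 300},
    {"department": "sales", "branch": "Delhi", "amount": 950},
]

def build_dependent_mart(rows, department):
    """Copy only the rows relevant to one department out of the warehouse."""
    return [dict(row) for row in rows if row["department"] == department]

sales_mart = build_dependent_mart(warehouse, "sales")
print(len(sales_mart), "of", len(warehouse), "rows are in the sales mart")
```

An independent data mart would instead be loaded straight from the source systems, with no central warehouse behind it.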
Scenario 2
• There are three companies X, Y and Z arranged
  along the x-axis. There are three countries, India,
  China and Japan, arranged along the y-axis. The two years
  2010 and 2011 are shown along the z-axis. The
  intersection of one element from each of the x, y and z axes
  gives the sales in lakhs of a particular company in a
  particular country in a particular year. 'All' on
  an axis gives the sum of sales over all entities
  of that dimension for the intersecting entries.
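[Figure: data cube with axes company (X, Y, Z, All), country (INDIA, CHINA, JAPAN, All) and year (2010, 2011, All). Values shown on the visible face:]

          country          X          Y          Z        All

          All              31         46         24       101
          JAPAN            4          9          3        16
          CHINA            7          12         5        24
          INDIA            20         25         16       61

                           company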
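
The 'All' entries on the cube face can be checked with a short Python calculation. The per-cell figures are the ones shown on the cube; the variable names are only illustrative.

```python
# Sales per country and company, as shown on the visible face of the cube.
sales = {
    "INDIA": {"X": 20, "Y": 25, "Z": 16},
    "CHINA": {"X": 7, "Y": 12, "Z": 5},
    "JAPAN": {"X": 4, "Y": 9, "Z": 3},
}

# 'All' along the company axis: sum over companies for each country.
country_totals = {country: sum(row.values()) for country, row in sales.items()}

# 'All' along the country axis: sum over countries for each company.
company_totals = {
    company: sum(sales[country][company] for country in sales)
    for company in ("X", "Y", "Z")
}

# The corner cell: sum over both dimensions.
grand_total = sum(country_totals.values())

print(country_totals)
print(company_totals)
print(grand_total)
```

The results agree with the cube: 61, 24 and 16 per country, 31, 46 and 24 per company, and 101 overall.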
Online Analytical Processing (OLAP)
•   It enables analysts, managers and executives to gain insight into data
    through fast, consistent, interactive access to a wide variety of possible
    views of information that has been transformed from raw data to
    reflect the real dimensionality of the enterprise as understood by the
    user.

[Figure: Data Warehouse]
OLAP Operations
1. ROLL UP: Roll-up uses a concept hierarchy, which maps
   lower-level details to higher, more generalized details.
    For example:
     STREET -> AREA -> CITY -> STATE
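
Roll-up can be sketched in Python as aggregation along such a hierarchy. All street, area, city and state names and all sales figures below are made up, and `roll_up` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical street-level sales; all names and numbers are made up.
sales_by_street = {"Street A1": 10, "Street A2": 5, "Street B1": 8}

# Hypothetical concept hierarchy: each street maps to its area, city and state.
hierarchy = {
    "Street A1": {"area": "Area North", "city": "City P", "state": "State S"},
    "Street A2": {"area": "Area North", "city": "City P", "state": "State S"},
    "Street B1": {"area": "Area East", "city": "City Q", "state": "State S"},
}

def roll_up(facts, level):
    """Aggregate street-level facts to a higher level of the hierarchy."""
    totals = defaultdict(int)
    for street, amount in facts.items():
        totals[hierarchy[street][level]] += amount
    return dict(totals)

by_area = roll_up(sales_by_street, "area")
by_state = roll_up(sales_by_street, "state")
print(by_area)
print(by_state)
```

Each step up the hierarchy gives fewer, more general rows, while the overall total stays the same.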
2. DRILL DOWN: It is the opposite of roll-up. It
  goes from higher-level details to lower-level
  details. For example:
  CONTINENT -> COUNTRY -> STATE
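
Drill-down can be sketched in Python as viewing the same facts at increasing depth of the hierarchy. The facts and the `view_at_depth` helper are assumptions for illustration; the state names and figures are made up.

```python
from collections import defaultdict

# Hypothetical facts: (continent, country, state, sales). All values are made up.
facts = [
    ("Asia", "India", "State 1", 4),
    ("Asia", "India", "State 2", 6),
    ("Asia", "China", "State 3", 5),
]

def view_at_depth(rows, depth):
    """Total sales grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for *path, amount in rows:
        totals[tuple(path[:depth])] += amount
    return dict(totals)

by_continent = view_at_depth(facts, 1)
by_country = view_at_depth(facts, 2)  # drill down one level
by_state = view_at_depth(facts, 3)    # drill down one more level
print(by_continent)
print(by_country)
print(by_state)
```

Drill-down only works if the detailed rows were kept; a store that holds nothing but the continent totals cannot be drilled into.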
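3. SLICE AND DICE: These operations are used for searching and
     accessing data in the cube. A slice fixes a single value on one
     dimension; a dice selects a sub-cube by restricting two or more
     dimensions.

[Figure: a cube with YEAR, COUNTRY and COMPANY axes, and a COUNTRY by COMPANY slice taken from it.]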
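
A minimal Python sketch of slice and dice on a company, country and year cube. The cell values are made-up assumptions (the slides do not give per-year figures), and `slice_cube` and `dice_cube` are hypothetical helper names.

```python
# Hypothetical cube cells keyed by (company, country, year); values are made up.
cube = {
    ("X", "INDIA", 2010): 9,
    ("X", "INDIA", 2011): 11,
    ("Y", "INDIA", 2010): 12,
    ("Y", "INDIA", 2011): 13,
    ("X", "CHINA", 2010): 3,
    ("X", "CHINA", 2011): 4,
    ("Z", "JAPAN", 2010): 1,
    ("Z", "JAPAN", 2011): 2,
}

def slice_cube(cells, year):
    """Slice: fix one value on the year dimension, leaving a company x country table."""
    return {(co, c): v for (co, c, y), v in cells.items() if y == year}

def dice_cube(cells, companies, countries):
    """Dice: keep a sub-cube by restricting the company and country dimensions."""
    return {k: v for k, v in cells.items() if k[0] in companies and k[1] in countries}

slice_2010 = slice_cube(cube, 2010)
diced = dice_cube(cube, {"X", "Y"}, {"INDIA"})
print(slice_2010)
print(diced)
```

The slice drops a dimension (the result has no year), while the dice keeps all three dimensions but fewer cells.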
4. PIVOT OR ROTATE: This operation is
  used when a user wants to change the
  orientation of the view of the cube. In this
  operation the positions of some rows and
  columns may be exchanged.

[Figure: the cube before and after rotation; the year axis is unchanged while the company and country axes exchange positions.]
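
For a two-dimensional view, pivoting amounts to swapping rows and columns. The sketch below uses the country by company figures from Scenario 2; the `pivot` helper is a hypothetical name.

```python
# Country rows by company columns, using the figures from Scenario 2.
view = {
    "INDIA": {"X": 20, "Y": 25, "Z": 16},
    "CHINA": {"X": 7, "Y": 12, "Z": 5},
    "JAPAN": {"X": 4, "Y": 9, "Z": 3},
}

def pivot(table):
    """Rotate the view: the column keys become row keys and vice versa."""
    rotated = {}
    for row_key, row in table.items():
        for col_key, value in row.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated

by_company = pivot(view)
print(by_company["X"])
```

No value changes during a pivot; only the orientation in which the same cells are presented.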
Difference Between Data Warehouse and Database

                   Database                             Data Warehouse

Orientation        Application oriented                 Subject oriented
Amount of Detail   Detailed data                        Summarized data
Time Dependence    Gives data at the moment of access   Gives data over time
Community served   Clerical community                   Managerial community
Volatility         Volatile                             Non-volatile
Availability       Highly available                     Relaxed availability
Redundancy         Non-redundant                        Some redundancies
REFERENCE

BOOK:
• Kanika Lakhani and Gaurav Girdhar, Data Mining and Warehousing.



SEARCH ENGINE:
• Google
