DATA WAREHOUSE
By: Ravi Ranjan
DEFINITION
 Data Warehouse
 A collection of corporate
 information, derived directly
 from operational systems
 and some external data
 sources. Its specific purpose
 is to support business
 decisions, not business
 operations.
THE PURPOSE OF DATA WAREHOUSING

• Realize the value of data
      • Data / information is an asset
      • Methods to realize the value (Reporting, Analysis, etc.)

• Make better decisions
      • Turn data into information
      • Create competitive advantage
      • Methods to support the decision-making process (EIS, DSS, etc.)
Data Warehouse Components

• Staging Area
      • A preparatory repository where transaction data
        can be transformed for use in the data warehouse
• Data Mart
      • Traditional dimensionally modeled set of dimension
        and fact tables
      • Per Kimball, a data warehouse is the union of a set
        of data marts
• Operational Data Store (ODS)
      • Modeled to support near real-time reporting needs.
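
The "dimension and fact tables" of a data mart can be illustrated with a very small star schema. The sketch below is only an illustration using Python's built-in sqlite3 module; the table names (dim_product, dim_date, fact_sales), the columns, and the sample rows are all hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240101, 2024, 1), (20240201, 2024, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 20240101, 10.0), (1, 20240201, 15.0), (2, 20240101, 7.5)])


def sales_by_product(connection):
    """Aggregate fact rows by an attribute of the product dimension."""
    rows = connection.execute("""
        SELECT p.name, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.name
        ORDER BY p.name
    """).fetchall()
    return dict(rows)


totals = sales_by_product(conn)
```

The fact table holds only keys and measures, while descriptive attributes live in the dimension tables; reports join the two and group by whichever dimension attribute is of interest.
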
DATA WAREHOUSE FUNCTIONALITY


[Diagram: relational databases, ERP systems, purchased data, and legacy data pass through extraction and cleansing; an optimized loader feeds the data warehouse engine, which is supported by a metadata repository and serves analysis and query.]
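
The extraction, cleansing, and loading stages named on this slide can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the presenter's implementation: the source names, the customer_id and name fields, and the cleansing rule (drop records without an id, normalize names) are all hypothetical.

```python
def extract(sources):
    """Pull raw records from every source system into one stream."""
    for source_name, records in sources.items():
        for record in records:
            # Tag each record with the system it came from.
            yield {**record, "source": source_name}


def cleanse(records):
    """Drop records with no customer id and normalize the name field."""
    for record in records:
        if record.get("customer_id") is None:
            continue
        name = (record.get("name") or "").strip().title()
        yield {**record, "name": name}


def load(records, warehouse):
    """Append cleansed records to the warehouse (a plain list here)."""
    count = 0
    for record in records:
        warehouse.append(record)
        count += 1
    return count


# Hypothetical source systems and sample records.
sources = {
    "erp": [{"customer_id": 1, "name": "  alice smith "}],
    "legacy": [
        {"customer_id": None, "name": "unknown"},
        {"customer_id": 2, "name": "BOB JONES"},
    ],
}
warehouse = []
loaded = load(cleanse(extract(sources)), warehouse)
```

Each stage is a generator, so records flow through one at a time and the stages can be tested separately.
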
EVOLUTION OF DATA WAREHOUSE ARCHITECTURE

• Top-Down Architecture
• Bottom-Up Architecture
• Enterprise Data Mart Architecture
• Data Stage/Data Mart Architecture
VERY LARGE DATABASES

  WAREHOUSES ARE VERY LARGE DATABASES

 Terabytes  -- 10^12 bytes: Wal-Mart -- 24 Terabytes

 Petabytes  -- 10^15 bytes: Geographic Information Systems

 Exabytes   -- 10^18 bytes: National Medical Records

 Zettabytes -- 10^21 bytes: Weather images

 Yottabytes -- 10^24 bytes: Intelligence Agency Videos
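
The powers of ten above can be checked with a small helper that picks the largest decimal unit for a given byte count. The function name describe_size and the UNITS list are illustrative choices, not part of the slides; the units follow the standard decimal (SI) prefixes.

```python
# Decimal (SI) units, each 1000 times larger than the previous one.
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]


def describe_size(num_bytes):
    """Return (value, unit) using the largest unit that keeps value >= 1."""
    index = 0
    scale = 1  # integer scale avoids floating-point drift for huge counts
    while num_bytes >= scale * 1000 and index < len(UNITS) - 1:
        scale *= 1000
        index += 1
    return (num_bytes / scale, UNITS[index])


walmart = describe_size(24 * 10**12)
largest = describe_size(10**24)
```
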
COMPLEXITIES OF CREATING A DATA WAREHOUSE

• Incomplete errors
      • Missing Fields
      • Records or Fields That, by Design, are not Being Recorded

• Incorrect errors
      • Wrong Calculations, Aggregations
      • Duplicate Records
      • Wrong Information Entered into Source System
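
Two of the error types listed above, missing fields and duplicate records, are easy to detect mechanically. The following is a rough sketch only: the required field names and the sample records are hypothetical, and a real warehouse would likely define duplicates by business key rather than by exact match.

```python
# Hypothetical list of fields every record is expected to carry.
REQUIRED_FIELDS = ("customer_id", "amount")


def find_incomplete(records):
    """Return indexes of records with a missing or empty required field."""
    return [
        i for i, record in enumerate(records)
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS)
    ]


def find_duplicates(records):
    """Return indexes of records that exactly repeat an earlier record."""
    seen = set()
    duplicates = []
    for i, record in enumerate(records):
        key = tuple(sorted(record.items()))  # order-independent fingerprint
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


records = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": None},   # incomplete: empty amount
    {"customer_id": 1, "amount": 10.0},   # duplicate of the first record
    {"amount": 5.0},                      # incomplete: no customer_id
]
incomplete = find_incomplete(records)
duplicates = find_duplicates(records)
```
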
SUCCESS & FUTURE OF DATA WAREHOUSE

• The Data Warehouse has successfully supported the
    increased needs of the State over the past eight years.
• The need for growth continues, however, as the desire for
    more integrated data increases.
• The Data Warehouse has software and tools in place to
    provide the functionality needed to support new
    enterprise Data Warehouse projects.
• The future capabilities of the Data Warehouse can be
    expanded to include other programs and agencies.
DATA WAREHOUSE PITFALLS


• You are going to spend a lot of time
  extracting, cleaning, and loading data
• You are going to find problems with systems feeding the
  data warehouse
• You will find the need to store/validate data not being
  captured/validated by any existing system
• Large-scale data warehousing can become an exercise
  in data homogenizing
DATA WAREHOUSE PITFALLS…

• The time it takes to load the warehouse will expand
  to fill the available load window...
  and then some
• You are building a HIGH maintenance system

• You will fail if you concentrate on resource
  optimization to the neglect of project, data, and
  customer management issues and an understanding
  of what adds value to the customer
BEST PRACTICES


• Complete requirements and design

• Prototyping is key to business understanding

• Use proper aggregations and detailed data

• Training is an ongoing process

• Build data integrity checks into your system.
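
One possible integrity check, tying together the points about aggregations, detailed data, and built-in checks, is to reconcile stored aggregates against the detail rows they summarize. This is a sketch under assumptions: the region/amount layout, the function names, and the tolerance are hypothetical and not taken from the slides.

```python
import math
from collections import defaultdict


def summarize(detail_rows):
    """Roll (region, amount) detail rows up to one total per region."""
    totals = defaultdict(float)
    for region, amount in detail_rows:
        totals[region] += amount
    return dict(totals)


def reconcile(detail_rows, aggregate_table, tolerance=1e-9):
    """Return regions whose stored aggregate disagrees with the detail data."""
    expected = summarize(detail_rows)
    regions = set(expected) | set(aggregate_table)
    return sorted(
        region for region in regions
        if not math.isclose(expected.get(region, 0.0),
                            aggregate_table.get(region, 0.0),
                            abs_tol=tolerance)
    )


# Hypothetical detail rows and a stored summary with one wrong total.
detail = [("north", 100.0), ("north", 50.0), ("south", 75.0)]
stored = {"north": 150.0, "south": 80.0}
mismatches = reconcile(detail, stored)
```

Running a check like this after each load flags aggregates that have drifted from their detail data before users see them in reports.
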
Top-Down Architecture
[Diagram not reproduced]

Bottom-Up Architecture
[Diagram not reproduced]

Enterprise Data Mart Architecture
[Diagram not reproduced]

Data Stage/Data Mart Architecture
[Diagram not reproduced]


Editor's Notes

  • #6 Legacy data is historical data: the working information of a staff member, such as working hours or time-off hours within the fiscal period, up to the current date. Working Hours = Overtime, etc. Time-Off Hours = Vacation, Sick Leave, etc.
  • #7 DataStage: a database tool. A tool set for designing, developing, and running applications that populate one or more tables in a data warehouse.