Data Warehouse/Data Mart
 Components
  Concepts
Characteristics
Overview
• Operational vs Informational Systems
• Data Warehouse components
• Data Marts
Basic Data Warehouse Architecture
[Figure: Source OLTP Systems, the Enterprise Data Warehouse ("One Version of the Truth"), and Subset Data Marts. Copyright © 1997, Enterprise Group, Ltd.]
Operational vs. Informational Systems
[Figure: Operational Systems (Order Entry, Manf.). Caption: Information Access Today]
Operational vs. Informational Systems
[Figure: Operational Systems and Informational Systems. Caption: Information Access Today]
Operational vs. Informational Systems
• Most of the advances in end-user programming have run into
  difficulty in actually accessing data that exists in backbone,
  operational data bases.


• Operational data bases have a very, very long life. Large operational
  systems are converted from one technology to a more advanced one
  very infrequently (typically every eight to twenty years).


• Therefore, why not create specific DBs whose role is to make
  large-scale end-user access easy and to isolate the operational DBs,
  i.e. a Data Warehouse?
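The isolation idea in the bullets above can be illustrated with a minimal sketch. It assumes two SQLite databases standing in for the operational system and the warehouse; the `orders` and `order_facts` tables, their columns, and the `load_warehouse` helper are hypothetical names chosen for illustration, not part of the original slides.

```python
import sqlite3


def load_warehouse(operational, warehouse):
    """Copy order rows from the operational DB into a separate warehouse DB."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS order_facts ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    rows = operational.execute(
        "SELECT order_id, region, amount FROM orders"
    ).fetchall()
    # INSERT OR REPLACE keeps the load idempotent if it is rerun.
    warehouse.executemany(
        "INSERT OR REPLACE INTO order_facts VALUES (?, ?, ?)", rows
    )
    warehouse.commit()
    return len(rows)


# Hypothetical operational system with a small order-entry table.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0)],
)
operational.commit()

# The warehouse is a separate database, so end-user queries never touch
# the operational one after the load has finished.
warehouse = sqlite3.connect(":memory:")
loaded = load_warehouse(operational, warehouse)

sales_by_region = dict(
    warehouse.execute(
        "SELECT region, SUM(amount) FROM order_facts GROUP BY region"
    ).fetchall()
)
```

In this sketch the only load placed on the operational database is the single extract query; all analysis runs against the copy.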
Operational vs. Informational Systems
[Figure: Operational Systems, Information Delivery System, Informational Systems]
Operational vs. Informational Systems
[Figure: Operational Systems, Data Warehouse, Information Delivery System, Informational Systems]
Operational vs. Informational Systems
  Notice that one of the big impacts of Data Warehousing is to eliminate
  large numbers of existing DSS systems! Y2000 will make this essential!!!
[Figure: Operational Systems, Data Warehouse, Information Delivery System, Informational Systems]
Operational vs. Informational Systems
[Figure: Operational Systems, Data Warehouse, Data Marts, Information Delivery System, Informational Systems]
Data Marts vs Data Warehouses
[Figure: layered data warehouse architecture. The Core DW layer shows four options (Virtual DW, Coarse DW, Central DW, Distributed DW); direct, virtual, and ad hoc queries are shown; the presentation layer shows a sample "United States by Sales" map.]
Layers shown in the figure:
  Layer 1   Presentation/Desktop Access
  Layer 2a  Operational Data
  Layer 2b  External Data
  Layer 2c  Non-operational Data
  Layer 3   Core DW
  Layer 4   Data Mart
  Layer 5   Data Staging and Quality
  Layer 6   Data Feed/Data Mining/Indexing
  Layer 7   Data Access
  Layer 8   Meta-data Repository
  Layer 9   Warehouse Management
  Layer 10  Application Messaging (Transport)
  Layer 11  Internet/Intranet
Central Data Warehouse
[Figure: layered architecture (Layers 1 to 11) with only the Central DW shown at the Core DW layer (Layer 3); Tracking DB and Lawson DB appear as Operational Data (Layer 2a).]
Virtual Data Warehouse
• A Virtual Data Warehouse approach is often
  chosen when there are infrequent demands for
  data and management wants to determine if/how
  users will use operational data.
• One of the weaknesses of a Virtual Data
  Warehouse approach is that user queries are made
  against operational DBs.
• One way to minimize this problem is to build a
  “Query Monitor” to check the performance
  characteristics of a query before executing it.
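A "Query Monitor" of this kind could be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite's `EXPLAIN QUERY PLAN` as the pre-execution check and treats any plan step whose description starts with `SCAN` as too expensive. The function names, the `orders` table, and the rejection rule are hypothetical, and the wording of SQLite's plan output is not guaranteed to be stable across versions.

```python
import sqlite3


def plan_has_full_scan(conn, sql, params=()):
    """Inspect the query plan without running the query itself."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is a text description of the step;
    # a step beginning with "SCAN" reads a whole table or index.
    return any(str(row[-1]).startswith("SCAN") for row in plan)


def monitored_query(conn, sql, params=()):
    """Run the query only if its plan passes the (assumed) cost rule."""
    if plan_has_full_scan(conn, sql, params):
        raise PermissionError("query rejected: plan contains a full scan")
    return conn.execute(sql, params).fetchall()


# Hypothetical operational table used to exercise the monitor.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 100.0), (2, "globex", 250.0)],
)

# A lookup by primary key is accepted.
keyed = monitored_query(conn, "SELECT amount FROM orders WHERE order_id = ?", (2,))

# A filter on an unindexed column would scan the whole table, so it is rejected.
try:
    monitored_query(conn, "SELECT * FROM orders WHERE customer = ?", ("acme",))
    rejected = False
except PermissionError:
    rejected = True
```

A production monitor would more likely use the optimizer's cost estimates from the operational DBMS; the string check here simply keeps the sketch self-contained.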
• A Coarse Data Warehouse is often chosen when the
  organization has a relatively clean/new operational
  system and management wants to make the operational
  data more easily available for just that system.
• A Central Data Warehouse is often chosen when the organization
  has a clear understanding of its Information Access needs and
  wants to provide “quality”, “integrated” information to
  its knowledge workers.
• A Distributed Data Warehouse is similar in most respects
  to a Central Data Warehouse, except that the data is
  distributed to separate mini-Data Warehouses (Data
  Marts) on local or specialized servers.
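The distribution of data into Data Marts can be sketched as carving a subset out of a central store into its own database. The example below is a minimal illustration, assuming SQLite in-memory databases stand in for the central warehouse and a local server; the `sales` table, its columns, and the `build_data_mart` helper are hypothetical.

```python
import sqlite3


def build_data_mart(warehouse, region):
    """Create a separate database holding only one region's sales rows."""
    mart = sqlite3.connect(":memory:")
    mart.execute(
        "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    rows = warehouse.execute(
        "SELECT order_id, region, amount FROM sales WHERE region = ?", (region,)
    ).fetchall()
    mart.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    mart.commit()
    return mart


# Hypothetical central warehouse covering every region.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "East", 100.0), (2, "West", 250.0), (3, "East", 50.0)],
)
warehouse.commit()

# The East mart is a subset of the warehouse, so totals computed from it
# agree with the same totals computed centrally.
east_mart = build_data_mart(warehouse, "East")
east_rows = east_mart.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
east_total = east_mart.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Because each mart is derived from the same central data, marts built this way stay consistent with one another, which is the property the "subset" arrangement is meant to preserve.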
Central Data Warehouse
[Figure: layered architecture (Layers 1 to 11) showing the Virtual DW, Coarse DW, Central DW, and Distributed DW options at the Core DW layer (Layer 3).]
Data Marts Only
[Figure: layered architecture (Layers 1 to 11) showing the Virtual DW, Coarse DW, Central DW, and Distributed DW options at the Core DW layer (Layer 3) and the Data Mart layer (Layer 4).]
Heterogeneity - The Reality
[Figure. Sources: i2 Supply Chain, Oracle Financials, Siebel CRM, 3rd Party Data. Stores: Packaged I2 Supply Chain Non-Architected Data Mart; Packaged Oracle Financial Data Warehouse; Custom Marketing Data Warehouse; Subset Data Marts.]
Federated BI Architecture
[Figure. Sources: i2 Supply Chain, Oracle Financials, Siebel CRM, 3rd Party, e-commerce. Components: Common Staging Area; Real Time ODS; Federated Financial Data Warehouse; Federated Marketing Data Warehouse; Federated Packaged I2 Supply Chain Data Marts; Subset Data Marts; Analytical Applications; Real Time Data Mining and Analytics; Real Time Segmentation, Classification, Qualification, Offerings, etc.]
Benefits of Data Warehouse Architecture
• Provides an organizing framework
• Gives flexibility for changes and allows
  simplified maintenance
• Speeds up future development by aiding
  understanding of the data warehouse
• Serves as a communication tool for roles and
  requirements
• Coordinates data marts
Primary Technical Challenge Axis
[Figure: chart with one axis running from Easy to Hard and the other from Fast to Slow. Labels plotted on the chart: Dirty Data, Large Co., Parallel ERP DW, Near Real Time, Custom ERP DW, Monthly Freq, VLDB, Turnkey ERP DW, Finance, Multi-Source, Small DB, Mid-Size Co., Marketing, Single Source, Clean Data.]
Prerequisites for Success

•   Pain driven
•   Sponsorship at the highest levels
•   Sustainable political will
•   Iterative methodology
•   Manageable scope
•   User driven design
•   Service business mindset
•   Sustainability

Cs753 2a
