Information & Knowledge
  Management - Class 3
          Data-warehousing
        Marielba Zacarias
       Prof. Auxiliar DEEI
    FCT I, Gab 2.69, Ext. 7749
          mzacaria@ualg.pt
    http://w3.ualg.pt/~mzacaria
Summary

Data-warehouses
 The architected environment
 Design Process
 Data-modeling schemas
Data Warehousing
Data collection for analysis and
reporting tasks
Historical data
Stored in a distinct environment from
operational data
Structured differently from operational databases
Why
Operational and analytical data have
different requirements in terms of
 usage (frequency, response time)
 hardware
 software
 structure
Data-warehousing
     Users
Before Data-Warehouses...
      The "spider web"
The "architected" environment
 Operational
   Detailed, daily, current value
   High access probability
   Application oriented
 Atomic dw
   More granular, temporal, integrated
   Subject oriented, summarized
 Departmental dw ("data-marts")
   Derived, some primitive
   Typical of Marketing, Engineering, Production, Accounting
 Individual dw
   Temporal, ad-hoc, heuristic
   Non-repetitive
   Oriented to PC or workstations
Type of questions
 Operational
   J. Jones, 123 Main St., Credit - AA
   Question: Jones credit?
 Atomic dw
   1986-87: J. Jones, 456 High St., Credit - B
   1987-89: J. Jones, 456 High St., Credit - A
   1989 - present: J. Jones, 123 Main St., Credit - AA
   Question: Jones credit history?
 Departmental dw
   Jan - 4101, Feb - 4209, Mar - 4175, Apr - 4215
   Question: Monthly sales?
 Individual dw
   Customers since 1982 with balances > 5,000 and credit >= B
   Question: Client types in analysis?
Architected Environment
 Production environment
 Operational environment
 Analytical environment
Data-warehouse design
 Requirement Gathering
 Physical Environment Setup
 Data Modeling
 ETL
 OLAP Cube Design
 Front End Development
 Report Development
 Performance Tuning
 Query Optimization
 Quality Assurance
 Rolling out to Production
 Production Maintenance
 Incremental Enhancements
Requirements Gathering
Take into account users
  Executives with little time and little knowledge of
  technical terms
  Interviews, JAD sessions
    User Reporting/Analysis Requirements
    Hardware, training requirements
    Data source identification
    Concrete project plan
Physical Environment Setup
Set up servers, DBMS and databases,
ETL, OLAP cubes and reporting services
Create three environments
 development, testing, production
Data-modeling
 Depends on initial data source identification
 Conceptual, logical and physical data modeling
 Should be related to the information architecture!
Data Modeling
  Dimensional Approach
Transactional data is partitioned into facts and dimensions
Facts
  Numeric transaction data
    products ordered, price
Dimensions
  provide context for facts
    order date, customer name, product
    number, location info, salesperson
Dimensional Approaches
 Star
   Fact table (typically a transaction)
   Dimensions (context of the transaction)
 Snowflake
   Dimensions are normalized, so some are
   linked to the fact table only indirectly
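
The two approaches above can be sketched with Python's built-in sqlite3 module. This is only an illustration: all table names, column names and sample rows (for example fact_order, dim_category, "A. Smith") are made up for the example and do not come from the slides. The customer and date dimensions are linked directly to the fact table (star), while the product category is reached through dim_product (snowflake).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions: context of each transaction
    CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, order_date TEXT, month TEXT);
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    -- Snowflake step: the category is normalized out of dim_product,
    -- so it is linked to the fact table only indirectly.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT,
                               category_id INTEGER REFERENCES dim_category(category_id));
    -- Fact table: one row per order line, numeric measures plus dimension keys
    CREATE TABLE fact_order (
        date_id     INTEGER REFERENCES dim_date(date_id),
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        product_id  INTEGER REFERENCES dim_product(product_id),
        quantity    INTEGER,
        price       REAL
    );
    -- Made-up sample data
    INSERT INTO dim_date VALUES (1, '2011-01-15', '2011-01'), (2, '2011-02-03', '2011-02');
    INSERT INTO dim_customer VALUES (1, 'J. Jones'), (2, 'A. Smith');
    INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO dim_product VALUES (1, 'Disk', 1), (2, 'Compiler', 2);
    INSERT INTO fact_order VALUES (1, 1, 1, 2, 50.0), (1, 2, 2, 1, 200.0), (2, 1, 2, 1, 200.0);
""")

# Star-style query: the dimension joins directly to the fact table.
revenue_by_customer = conn.execute("""
    SELECT c.customer_name, SUM(f.quantity * f.price)
    FROM fact_order f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

# Snowflake-style query: the category is reached through dim_product.
revenue_by_category = conn.execute("""
    SELECT g.category_name, SUM(f.quantity * f.price)
    FROM fact_order f
    JOIN dim_product p  ON p.product_id = f.product_id
    JOIN dim_category g ON g.category_id = p.category_id
    GROUP BY g.category_name
    ORDER BY g.category_name
""").fetchall()

print(revenue_by_customer)
print(revenue_by_category)
```

The snowflake query needs one more join than the star query, which is the usual trade-off between less redundancy in the dimensions and simpler, faster queries.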
Star Metaphor
Star Schema
Relational model
Star schema
Snow-flake schema
OLAP Cube Design
Specification of detailed reporting needs
in terms of the multi-dimensional
structure previously defined (star or
snowflake), but regarded as an
n-dimensional cube
star/snowflake and cubes are pretty
much the same thing
cubes are more appropriate for non-IT
users
The Cube Metaphor
Slicing
Dicing
Rotating
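
The three cube operations can be illustrated on a tiny in-memory cube. This is a plain-Python sketch, not how an OLAP server stores or processes cubes; the dimensions (product, region, month), the cell values and the function names are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, region, month) -> units sold
DIMS = ("product", "region", "month")
cube = {
    ("Disk", "North", "Jan"): 10,
    ("Disk", "South", "Jan"): 5,
    ("Disk", "North", "Feb"): 7,
    ("Compiler", "North", "Jan"): 3,
    ("Compiler", "South", "Feb"): 4,
}

def slice_cube(cube, dim, value):
    """Slicing: fix one dimension to a single value and keep the matching cells."""
    i = DIMS.index(dim)
    return {key: v for key, v in cube.items() if key[i] == value}

def dice_cube(cube, **allowed):
    """Dicing: keep a sub-cube by restricting several dimensions to sets of values."""
    return {
        key: v for key, v in cube.items()
        if all(key[DIMS.index(d)] in values for d, values in allowed.items())
    }

def rotate(cube, row_dim, col_dim):
    """Rotating (pivoting): choose which dimensions form rows and columns,
    summing over the remaining dimension."""
    r, c = DIMS.index(row_dim), DIMS.index(col_dim)
    table = defaultdict(lambda: defaultdict(int))
    for key, v in cube.items():
        table[key[r]][key[c]] += v
    return {row: dict(cols) for row, cols in table.items()}

january = slice_cube(cube, "month", "Jan")
north_only = dice_cube(cube, region={"North"}, month={"Jan", "Feb"})
by_region_month = rotate(cube, "region", "month")
by_month_region = rotate(cube, "month", "region")  # same data, axes swapped

print(january)
print(north_only)
print(by_region_month)
print(by_month_region)
```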
ETL

Extraction
Transformation
Loading
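
A minimal sketch of the three ETL steps, using only the Python standard library. The CSV content, the column names, the cleaning rules and the target table "sales" are assumptions made for the illustration; a real ETL job would read from the operational systems and apply rules agreed during requirements gathering.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system (made-up data).
SOURCE_CSV = """order_date,customer,product,quantity,price
2011-01-15, j. jones ,Disk,2,50.0
2011-01-15,A. Smith,Compiler,1,200.0
2011-02-03,J. JONES,Compiler,,200.0
"""

def extract(text):
    """Extraction: read the raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: normalize names, cast types, reject incomplete rows."""
    clean = []
    for row in rows:
        if not row["quantity"].strip():
            continue  # missing measure: rejected rather than guessed
        clean.append((
            row["order_date"].strip(),
            row["customer"].strip().title(),
            row["product"].strip(),
            int(row["quantity"]),
            float(row["price"]),
        ))
    return clean

def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_date TEXT, customer TEXT, product TEXT, quantity INTEGER, price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
customers = [r[0] for r in conn.execute("SELECT customer FROM sales ORDER BY customer")]
print(loaded_count, customers)
```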
SQL Server Integration Services
SQL Server Integration Examples
SQL Server Integration Examples II
    Qualitative data
                 Description term                 ActionId
                 team meeting                          18
                 hr distribution                       19
                 project list                          19
                 team meeting                          19
                 hr distribution                       26
                 project list                          26
                 claims application                    27
                 claims application                    28
                 cards application maintenance         29
                 claims application integration        30
                 hr distribution                       31
                 project list                          31
                 claims application                    34
                 claims application                    35
                 hr distribution                       36
                 project list                          36
SQL Server Integration Examples III
   Fuzzy Transformations
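
The idea behind a fuzzy transformation is to match misspelled or slightly different terms to a clean reference value. The sketch below uses Python's difflib from the standard library; it is not the algorithm used by SQL Server Integration Services, and the threshold of 0.8 and the misspelled inputs are assumptions chosen for the example. The reference terms are taken from the description terms listed earlier.

```python
from difflib import SequenceMatcher

# Reference values, taken from the description terms in the example table.
REFERENCE = ["team meeting", "hr distribution", "project list", "claims application"]

def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fuzzy_lookup(term, reference_terms, threshold=0.8):
    """Return (best_match, score); best_match is None below the threshold."""
    best = max(reference_terms, key=lambda ref: similarity(term, ref))
    score = similarity(term, best)
    return (best, score) if score >= threshold else (None, score)

# Hypothetical dirty inputs
for dirty in ["team meting", "Claims aplication", "payroll"]:
    match, score = fuzzy_lookup(dirty, REFERENCE)
    print(f"{dirty!r} -> {match!r} (score {score:.2f})")
```

Terms that fall below the threshold are returned as unmatched so they can be reviewed by hand instead of being assigned to the wrong reference value.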
Front-end development
 Front-ends range from
   in-house development with scripting
   languages such as PHP, ASP, or Perl
   to off-the-shelf products such as Crystal
   Reports or higher-end products such as
   Actuate
   OLAP vendors also offer front-ends of their
   own
Report Development
Derived from requirements
Main point of contact between the data-
warehouse and users
User customization
Report Delivery (web, e-mail, SMS, file
formats)
Access privileges
Performance Tuning
ETL
Query Processing
 Users lose interest after 30 seconds!
 Query optimization
Report Delivery
Query Optimization
Understand how your DBMS executes queries
Store intermediate results in temporary tables
Query Optimization tips
  Use indexes
  Partition tables (vertically and horizontally)
  De-normalize (fewer joins)
  Server Tuning
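
Two of the tips above (understanding the execution plan and using indexes, plus storing intermediate results in a temporary table) can be tried out with SQLite from Python. This is a sketch under stated assumptions: the table "sales", the index name and the generated data are made up, and other DBMSs expose their plans through different commands and formats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Made-up data: 10,000 sales spread over 100 customers
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM sales WHERE customer_id = ?"

def query_plan(conn, sql, params):
    """Ask SQLite how it would execute the query (last column is the plan text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = query_plan(conn, QUERY, (42,))   # full table scan
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))    # search through the index

# Intermediate result stored once in a temporary table and reused afterwards
conn.execute(
    "CREATE TEMP TABLE customer_totals AS "
    "SELECT customer_id, SUM(amount) AS total FROM sales GROUP BY customer_id"
)
top_customer = conn.execute(
    "SELECT customer_id, total FROM customer_totals ORDER BY total DESC LIMIT 1"
).fetchone()

print(plan_before)
print(plan_after)
print(top_customer)
```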
Quality Assurance
Test plan with quality criteria for data
Critical success factor
Often overlooked
Performed by people who know the
business data, not data-warehouses
  Resistance
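
A test plan with quality criteria for data can be expressed as a set of named rules that every loaded row must satisfy. The rules, field names and sample rows below are assumptions made for the illustration; the real criteria would be defined by the people who know the business data.

```python
def check_quality(rows, rules):
    """Return (row_index, rule_name) for every rule a row violates."""
    failures = []
    for i, row in enumerate(rows):
        for name, rule in rules.items():
            if not rule(row):
                failures.append((i, name))
    return failures

# Hypothetical quality criteria
RULES = {
    "customer present": lambda r: bool(r.get("customer")),
    "quantity positive": lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0,
    "price non-negative": lambda r: isinstance(r.get("price"), (int, float)) and r["price"] >= 0,
}

# Made-up rows: the first is valid, the other two violate some rules
rows = [
    {"customer": "J. Jones", "quantity": 2, "price": 50.0},
    {"customer": "", "quantity": 1, "price": 200.0},
    {"customer": "A. Smith", "quantity": 0, "price": -1.0},
]

failures = check_quality(rows, RULES)
for index, rule_name in failures:
    print(f"row {index}: failed '{rule_name}'")
```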
Rolling out to production

Seems easy, but...
Putting everyone online may take a full
week in some cases
Online access can be as simple as
sending a link by e-mail
Production Maintenance
 Backup and recovery processes
 Crisis Management
 Monitoring end-user usage
  Catch runaway queries before the
  whole system slows down
  Measure usage for ROI calculations
  and future enhancements
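
Both monitoring goals above can be sketched with SQLite's progress handler, which Python exposes as Connection.set_progress_handler. The function name run_with_budget, the time budgets and the log format are assumptions for the example; production DBMSs normally provide their own query governors and usage statistics.

```python
import sqlite3
import time

def run_with_budget(conn, sql, budget_seconds, usage_log):
    """Run a query, abort it if it exceeds its time budget, and log the usage."""
    start = time.monotonic()

    def over_budget():
        # A non-zero return value tells SQLite to interrupt the running query.
        return 1 if time.monotonic() - start > budget_seconds else 0

    conn.set_progress_handler(over_budget, 10_000)  # checked every 10,000 VM steps
    try:
        rows, aborted = conn.execute(sql).fetchall(), False
    except sqlite3.OperationalError:
        if time.monotonic() - start <= budget_seconds:
            raise  # a genuine SQL error, not our interruption
        rows, aborted = [], True
    finally:
        conn.set_progress_handler(None, 1)  # remove the handler again
    usage_log.append(
        {"sql": sql, "seconds": time.monotonic() - start, "aborted": aborted}
    )
    return rows

conn = sqlite3.connect(":memory:")
usage_log = []

fast_rows = run_with_budget(conn, "SELECT 1", 1.0, usage_log)

# A deliberately expensive query standing in for a runaway report
RUNAWAY = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 20000000) "
    "SELECT COUNT(*) FROM c"
)
run_with_budget(conn, RUNAWAY, 0.05, usage_log)

for entry in usage_log:
    print(entry["aborted"], round(entry["seconds"], 3), entry["sql"][:40])
```

The same log that flags aborted queries also records who ran what and for how long, which is the raw material for usage and ROI figures.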
Incremental enhancements

  Make small changes, such as
  changing original geographical
  designations
   A company may add new sales regions
  No matter how simple, never make them
  directly in the production environment
Architected environment
Tools for unstructured
information management
 Content Management Systems
 Record Management Systems
 Digital Image Management Systems
 Digital Asset Management Systems
 Digital Imaging Systems

Gic2011 aula3-ingles
