Data Warehousing
Introduction
Definition
Simplistic perception: a data warehouse is no more than a collection of the key pieces of information used to manage and direct the business for the most profitable outcome.
Precise definition: it concentrates on the data. The data should be subject oriented, consistent across sources, and so on.
Pearson's definition: it is more than a vast store of data; it is also the process involved in getting that data from source to table, and from table to analyst.
In other words: "A DWH is the data (meta/fact/dimension/aggregate) and the process managers (load/warehouse/query) that make information available, enabling people to make informed decisions."
Data Warehousing Architecture: a DWH must be architected to support three major driving factors.
1) Populating the DWH.
2) Day-to-day management of the DWH.
3) The ability to cope with requirement evolution.
Typical process flow within a DWH (diagram labels): Source; Extract & load; Warehouse; Data transformation and movement; User query; Archive data.
Processes:
1) Extract and load the data.
2) Clean and transform the data into a form that can cope with large data volumes and provide good query performance.
3) Back up and archive the data.
4) Manage queries and direct them to the appropriate data sources.
Extract & load process:
Operational data: suitable for the operational system; may have been modified and extended over the years to support performance.
DWH: reconstructed.
1) Extract & load process:
Controlling the process: determine when to start extracting the data, run the transformations, perform the consistency checks, and so on. E.g. retail sales analysis.
When to initiate the extract: the data should be in a consistent state, and all extracts should refer to the same instant of time. E.g. telecom.
Loading the data: load into a temporary data store, then clean up and check consistency. E.g. current subscriber and current event DBs.
Copy management tools and data clean-up: coding.
2) Clean & transform process:
a. Clean and transform the data into a structure that speeds up queries.
b. Partition the data in order to speed up queries, optimize hardware performance, and simplify the management of the DWH.
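As a minimal sketch of the partitioning idea, assuming sales rows carry an ISO-formatted sale_date and that the month is the partition key (both hypothetical choices, not taken from the slides), rows can be grouped so that a query for one month touches only that partition:

```python
from collections import defaultdict

def partition_by_month(rows):
    """Group rows into partitions keyed by 'YYYY-MM' (hypothetical partition key)."""
    partitions = defaultdict(list)
    for row in rows:
        month = row["sale_date"][:7]  # '2024-01-15' -> '2024-01'
        partitions[month].append(row)
    return dict(partitions)

# Illustrative data only.
rows = [
    {"sale_date": "2024-01-15", "amount": 19.99},
    {"sale_date": "2024-01-20", "amount": 5.00},
    {"sale_date": "2024-02-03", "amount": 12.50},
]
partitions = partition_by_month(rows)

# A query restricted to January reads one partition instead of every row.
january_total = sum(r["amount"] for r in partitions["2024-01"])
```

A real warehouse would normally rely on the database's own table partitioning; the dictionary here only illustrates why a well-chosen key narrows the data a query has to scan.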
Clean & transform: clean and transform the data into a structure that speeds up queries.
a. Make sure the data is consistent within itself, e.g. within a row.
b. Make sure the data is consistent with other data within the same source.
c. Make sure the data is consistent with data in the other source systems.
d. Make sure the data is consistent with the information already in the warehouse.
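Two of these consistency levels can be sketched as follows. This is an illustration under stated assumptions, not the method from the slides: the field names (quantity, order_date, ship_date, product_id) and the rules themselves are hypothetical.

```python
from datetime import date

def check_row(row):
    """Within-row consistency: return a list of problems (empty if the row is fine)."""
    problems = []
    if row["quantity"] < 0:
        problems.append("negative quantity")
    if row["ship_date"] < row["order_date"]:
        problems.append("shipped before ordered")
    return problems

def unknown_products(rows, warehouse_product_ids):
    """Consistency with the warehouse: rows whose product is not already known there."""
    return [r for r in rows if r["product_id"] not in warehouse_product_ids]

# Illustrative data only.
rows = [
    {"product_id": "P1", "quantity": 2,
     "order_date": date(2024, 1, 10), "ship_date": date(2024, 1, 12)},
    {"product_id": "P9", "quantity": -1,
     "order_date": date(2024, 1, 10), "ship_date": date(2024, 1, 9)},
]
row_problems = [check_row(r) for r in rows]
orphans = unknown_products(rows, {"P1", "P2"})
```

Checks against the same source and against other source systems follow the same pattern, with the reference set taken from that source instead of from the warehouse.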
3) Backup & archive process: back up regularly, so that the warehouse can be recovered after data loss or failure. In archiving, older data is removed from the system.
4) Query management process: direct each query to the most effective data source.
Process Architecture
Process | Function | System manager
Extract & load | Extracts and loads the data, performing simple transformations before and during the load | Load manager
Clean & transform data | Transforms and manages the data | Warehouse manager
Backup & archive | Backs up and archives the data warehouse | Warehouse manager
Query management | Directs and manages queries | Query manager
Architecture of a data warehouse (diagram labels): Operational data (two sources); Load manager; Warehouse manager with detailed information, summary info, and metadata; Query manager; Data dipper; OLAP tools; Data, Information, Decision.
Load Manager
The system component that performs all the operations necessary to support the extract and load process.
Built from off-the-shelf tools, bespoke coding, C programs, and shell scripts.
Its size and complexity vary from one DWH to another: the larger the degree of overlap between source systems, the larger the load manager will be.
Third-party tools provide at most 20 to 25% of the total system functionality.
Load Manager Architecture
1) Extract the data from the source systems.
2) Fast load the extracted data into a temporary data store.
3) Perform simple transformations into a structure similar to the one in the data warehouse.
Each of these functions has to operate automatically and recover from any error it encounters, to a very large extent with no human intervention.
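The requirement that each function runs automatically and recovers from errors can be sketched as a retrying step runner. This is a minimal sketch, assuming a fixed retry count is an acceptable recovery policy; run_step and flaky_extract are hypothetical names introduced only for illustration.

```python
import time

def run_step(name, func, retries=3, delay_seconds=0.0):
    """Run one load-manager step, retrying on failure without human intervention."""
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as exc:
            print(f"{name}: attempt {attempt} failed: {exc}")
            if attempt < retries:
                time.sleep(delay_seconds)
    raise RuntimeError(f"step {name!r} failed after {retries} attempts")

# Hypothetical extract that fails once (e.g. source system unavailable), then succeeds.
attempts = {"extract": 0}

def flaky_extract():
    attempts["extract"] += 1
    if attempts["extract"] < 2:
        raise ConnectionError("source system unavailable")
    return ["row1", "row2"]

extracted = run_step("extract", flaky_extract)
```

Only when every retry has failed does the step raise, which is the point at which an operator would need to be alerted.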
Extract data from the source systems
To get hold of the source data, it has to be transferred from the source systems and made available to the DWH.
ASCII files are transferred by FTP across the LAN.
Current gateway technology operates too slowly to compete with FTP.
Fast Load
Data should be loaded into the warehouse in the fastest possible time, in order to minimize the total load window.
This becomes critical as the number of data sources increases and the time window shrinks.
In practice it is more effective to load the data (ASCII files) into a relational DB prior to applying transformations and checks.
Simple Transformation
Before or during the load there is an opportunity to perform simple transformations on the data.
Here we perform those transformations that do not require complex logic or the use of relational set operators.
E.g. for a retail management system:
1) Strip out all the columns that are not required in the DWH.
2) Convert all the values to the required data types.
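The two simple transformations above can be sketched in a few lines. The column names, target types, and sample record are assumptions made for illustration; they are not taken from the slides.

```python
import csv
import io
from datetime import datetime

# Hypothetical target structure: required column -> converter to the required type.
REQUIRED_COLUMNS = {
    "store_id": int,
    "sale_date": lambda s: datetime.strptime(s, "%Y-%m-%d").date(),
    "amount": float,
}

def simple_transform(record):
    """Keep only the required columns and convert each value to its required type."""
    return {col: convert(record[col]) for col, convert in REQUIRED_COLUMNS.items()}

# Illustrative ASCII extract with one column (cashier_name) the warehouse does not need.
raw = "store_id,sale_date,amount,cashier_name\n7,2024-01-15,19.99,Ann\n"
rows = [simple_transform(r) for r in csv.DictReader(io.StringIO(raw))]
```

Neither step needs to look at more than one record at a time, which is what keeps it cheap enough to run before or during the load.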
Load Manager Architecture (diagram labels): File structure; Temporary data store; Warehouse structure; Load manager: controlling process, stored procedures, copy management tools, fast loader.
Warehouse Manager
The system component that performs all the operations necessary to support the warehouse management process.
Built from third-party system management tools, bespoke coding, C programs, and shell scripts.
As with the load manager, the size and complexity of the warehouse manager vary between specific solutions. Unlike the load manager, the complexity of the warehouse manager is driven by the extent to which the operational management of the DWH has been automated.
Third-party tools provide at most 40% of the total system functionality.
Warehouse Manager Architecture
1) Analyze the data to perform consistency and referential integrity checks.
2) Transform and merge the source data in the temporary data store into the published DWH.
3) Create indexes, business views, partition views, and so on.
4) Generate denormalizations if appropriate.
Warehouse Manager Architecture (diagram labels): Temporary data store; Starflake schema; Summary tables; Warehouse manager: controlling process, stored procedures, backup/recovery tool, SQL scripts.
Using temporary destination tables:
Once the data is in the temporary store, the next step is to create a set of tables identical to the destination tables in the DWH. Ex: if the data in the DWH is highly partitioned….
As we are about to execute substantial consistency checks, data should not be loaded until it has been cleaned up.
If a consistency check fails: although relational databases provide some form of rollback, in practice it is easier to load the data into a temporary area, clean it up, and then publish it to the DWH.
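The load, check, clean, publish sequence can be sketched with Python's built-in sqlite3 module. The table names (sales, sales_tmp), columns, and the single consistency rule are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales     (sale_id INTEGER PRIMARY KEY, store_id INTEGER NOT NULL, amount REAL NOT NULL);
    CREATE TABLE sales_tmp (sale_id INTEGER, store_id INTEGER, amount REAL);
""")

# Load raw data into the temporary table first; it has no constraints to reject rows.
conn.executemany(
    "INSERT INTO sales_tmp VALUES (?, ?, ?)",
    [(1, 7, 19.99), (2, 7, 5.00), (3, None, 12.50)],
)

def publish(conn):
    """Publish the temporary table to the warehouse only if every row passes the check."""
    bad = conn.execute(
        "SELECT COUNT(*) FROM sales_tmp WHERE store_id IS NULL OR amount < 0"
    ).fetchone()[0]
    if bad:
        return False  # nothing has touched the published table
    with conn:  # commit both statements together, or roll both back on error
        conn.execute("INSERT INTO sales SELECT sale_id, store_id, amount FROM sales_tmp")
        conn.execute("DELETE FROM sales_tmp")
    return True

first_attempt = publish(conn)  # fails: one row has no store_id
conn.execute("DELETE FROM sales_tmp WHERE store_id IS NULL")  # clean-up step
second_attempt = publish(conn)
published = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```

Because the failed check happens entirely in the temporary table, the published table never sees inconsistent rows and no rollback of warehouse data is needed.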
Complex transformation: reconcile the data.
Transform into a starflake schema:
Transform the data into a form suitable for decision support queries.
Transform it into a form in which the bulk of the factual data lies in the center.
Options: star schema, snowflake schema, starflake schema.
Create indexes & views:
One would expect the index creation time to be significant, even if we only need to create indexes against a fact table partition.
Because of this, most relational technologies have facilities to create indexes in parallel, distributing the load across the hardware and significantly reducing the elapsed time.
Indexes also add to the overhead of inserting a row into a table.
Generate the summaries:
The warehouse manager has to create a set of aggregations to speed up query performance.
These are generated automatically.
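A summary table of this kind might be generated as follows. This is a sketch using Python's built-in sqlite3 module; the fact table sales_fact, the summary table sales_by_store_day, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (store_id INTEGER, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10.0), (1, "2024-01-01", 5.0), (2, "2024-01-01", 7.5)],
)

# Pre-aggregate the fact table once, so queries at this grain avoid scanning the detail.
conn.execute("""
    CREATE TABLE sales_by_store_day AS
    SELECT store_id, sale_date, COUNT(*) AS n_sales, SUM(amount) AS total_amount
    FROM sales_fact
    GROUP BY store_id, sale_date
""")

summary = conn.execute(
    "SELECT store_id, sale_date, n_sales, total_amount "
    "FROM sales_by_store_day ORDER BY store_id, sale_date"
).fetchall()
```

In an automated warehouse manager this statement would be run as part of each load, rebuilding or extending the summary after new fact rows are published.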
Query Manager
The system component that performs all the operations necessary to support the query management process.
Built from user access tools, specialist data warehousing monitoring tools, native database facilities, bespoke coding, C programs, and shell scripts.
Its size and complexity vary between specific solutions.
Unlike the load manager, the complexity of the query manager is driven by the extent to which facilities are provided by the user access tools or by native DB facilities.
Query Manager Architecture
1) Direct queries to the appropriate tables.
2) Schedule the execution of the user queries.
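Directing a query to the appropriate table can be sketched as a lookup over the available summary tables. The table names, and the rule that the summary with the fewest dimensions is the cheapest one to read, are assumptions made for illustration only.

```python
# Hypothetical catalogue: summary table -> dimensions it is aggregated by.
SUMMARY_TABLES = {
    "sales_by_store_day": {"store_id", "sale_date"},
    "sales_by_store": {"store_id"},
}
DETAIL_TABLE = "sales_fact"

def route(group_by):
    """Pick the smallest summary table that covers the requested dimensions.

    Falls back to the detailed fact table when no summary is sufficient.
    """
    candidates = [
        (len(dims), name)
        for name, dims in SUMMARY_TABLES.items()
        if set(group_by) <= dims
    ]
    return min(candidates)[1] if candidates else DETAIL_TABLE

per_store = route({"store_id"})
per_store_day = route({"store_id", "sale_date"})
per_product = route({"product_id"})
```

A production query manager would also weigh table sizes and current load; the sketch covers only the routing decision itself.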
