DATA WAREHOUSING & DATA MINING
Submitted To: Johannes Hoppe
Submitted By: Jayant Shah (M1000624), Ketan Sood (M1001626), Tarun Dahiya (M1001303)
INTRODUCTION
A data warehouse architecture is primarily based on the business processes of an enterprise. It takes into consideration data consolidation across the enterprise with adequate security, data modelling and organization, the extent of query requirements, metadata management and application, planning of the warehouse staging area for optimum bandwidth utilization, and full technology implementation.
PROCESS ARCHITECTURE
The process architecture describes the number of stages and how data is processed to convert raw/transactional data into information for end use. The data staging process has three main sub-processes to plan for when designing a data warehouse architecture: "Extract", "Transform" and "Load". Together these interrelated processes are referred to as the "ETL" process.

Extract
The data for the data warehouse can come from different sources and may be of different types.

Transform
Transforming the data with appropriate conversion, aggregation and cleaning is also an important process to plan when building a data warehouse.

Load
Loading must be planned with optimization in mind, considering the multiple areas where the data is to be loaded and from which it will be retrieved.
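The three ETL stages above can be illustrated with a minimal Python sketch. It assumes the source sheet has been exported to CSV (a small hypothetical sample is embedded as a string), uses hypothetical column names (`region`, `product`, `amount`), and uses SQLite from the standard library as a stand-in for the MySQL target. It is not the Pentaho transformation used in the project.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical source extract: the project's source was an Excel sheet;
# this small CSV string only stands in for it.
SOURCE_CSV = """region,product,amount
North,Widget,120.50
north ,Widget,79.50
South,Gadget,
South,Gadget,200
"""

def extract(text):
    """Extract: read the raw source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: clean, convert types and aggregate amount per (region, product)."""
    totals = defaultdict(float)
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # cleaning: skip rows with a missing measure
            continue
        region = row["region"].strip().title()  # cleaning: normalise spacing and case
        product = row["product"].strip()
        totals[(region, product)] += float(amount)  # conversion + aggregation
    return [(r, p, round(t, 2)) for (r, p), t in sorted(totals.items())]

def load(conn, facts):
    """Load: write the aggregated rows into a target table in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_summary ("
        "region TEXT, product TEXT, total_amount REAL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO sales_summary VALUES (?, ?, ?)", facts)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT * FROM sales_summary").fetchall())
```

In Pentaho Data Integration the same stages are built graphically as steps connected by hops inside a transformation, rather than written as code.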
TOOLS USED
MySQL Database
MySQL Workbench
Pentaho Data Integration (open-source ETL tool)

STEPS USED
1. DATA PREPARATION
   1.1 Verifying the data in the Excel sheet for different types of errors.
   1.2 Preparing the database structure using MySQL.
2. DATA INTEGRATION
   2.1 Extract the data.
   2.2 Transform the data.
   2.3 Load the data.
1.1 Verifying the data in Excel (source)
Categories of errors dealt with in the source file (a few examples):
Incomplete
Incorrect
Inconsistent

1.2 Preparing the database structure
Steps:
Creating the schema.
Creating the table.
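Steps 1.1 and 1.2 can also be sketched in Python. The row layout, the reference list of country names and the rules used to flag each error category are hypothetical choices for illustration, not the checks actually applied to the project's Excel sheet. SQLite again stands in for MySQL: in MySQL the schema would first be created with `CREATE SCHEMA` (a synonym for `CREATE DATABASE`) before `CREATE TABLE`, whereas SQLite has no such statement, so an in-memory database plays that role here.

```python
import sqlite3
from datetime import date

# Hypothetical rows as they might arrive from the Excel source sheet.
ROWS = [
    {"order_id": "1", "country": "Germany", "order_date": "2015-03-01", "quantity": "5"},
    {"order_id": "2", "country": "", "order_date": "2015-03-02", "quantity": "3"},
    {"order_id": "3", "country": "Germany", "order_date": "2015-02-30", "quantity": "-4"},
    {"order_id": "4", "country": "germany", "order_date": "2015-03-04", "quantity": "2"},
]
REQUIRED = ("order_id", "country", "order_date", "quantity")
KNOWN_COUNTRIES = {"Germany", "India"}  # hypothetical reference list

def classify(row):
    """Return the error categories (incomplete/incorrect/inconsistent) for one row."""
    errors = []
    # Incomplete: a required field is missing or blank.
    if any(not row.get(f, "").strip() for f in REQUIRED):
        errors.append("incomplete")
    # Incorrect: an invalid calendar date or a non-positive quantity.
    try:
        date.fromisoformat(row["order_date"])
        if int(row["quantity"]) <= 0:
            raise ValueError("quantity must be positive")
    except (KeyError, ValueError):
        errors.append("incorrect")
    # Inconsistent: a known value written in a non-canonical form.
    country = row.get("country", "").strip()
    if country and country not in KNOWN_COUNTRIES and country.title() in KNOWN_COUNTRIES:
        errors.append("inconsistent")
    return errors

def create_target_table(conn):
    """Create the target table for cleaned rows (SQLite stand-in for MySQL)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS orders (
               order_id   INTEGER PRIMARY KEY,
               country    TEXT    NOT NULL,
               order_date TEXT    NOT NULL,
               quantity   INTEGER NOT NULL CHECK (quantity > 0)
           )"""
    )

conn = sqlite3.connect(":memory:")
create_target_table(conn)
clean = [
    (int(r["order_id"]), r["country"].strip(), r["order_date"], int(r["quantity"]))
    for r in ROWS
    if not classify(r)
]
with conn:  # only rows without any flagged error are loaded
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", clean)

for r in ROWS:
    print(r["order_id"], classify(r) or "ok")
```

Flagged rows are kept out of the target table here; whether such rows are corrected, logged or rejected in practice is a design decision for the data preparation step.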

DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)
