DMDW 5. Student Presentation - Pentaho Data Integration (Kettle)

5. ETL project by

DMDW 5. Student Presentation - Pentaho Data Integration (Kettle) Presentation Transcript

  • 1. DATA WAREHOUSING & DATA MINING
    Submitted To: Johannes Hoppe
    Submitted By: Jayant Shah (M1000624), Ketan Sood (M1001626), Tarun Dahiya (M1001303)
  • 2. INTRODUCTION
    A data warehouse architecture is based primarily on the business processes of an enterprise. It takes into account data consolidation across the enterprise with adequate security, data modelling and organization, the extent of query requirements, metadata management and application, staging-area planning for optimum bandwidth utilization, and full technology implementation.
  • 3. PROCESS ARCHITECTURE
    Describes the number of stages and how data is processed to convert raw/transactional data into information for end use. The data staging process includes three main areas of concern, or sub-processes, for planning a data warehouse architecture: “Extract”, “Transform” and “Load”. These interrelated processes are often referred to together as the “ETL” process.
    • Extract
    The data for the data warehouse can come from different sources and may be of different types.
    • Transform
    Transforming the data with appropriate conversion, aggregation and cleaning is also an important process to plan when building a data warehouse.
    • Load
    The steps needed to load the data efficiently, taking into account the multiple targets where the data is to be loaded and later retrieved.
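The three sub-processes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the presenters' Pentaho transformation: the inline CSV text, the `sales` table, and the column names are hypothetical, and an in-memory SQLite database stands in for the MySQL target.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an exported spreadsheet.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,BOB,20
3,carol,
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types, clean names, and drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning: skip incomplete rows
        cleaned.append((int(row["id"]), row["name"].strip().title(), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table (hypothetical 'sales')."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

# SQLite in memory is an assumption; the slides load into MySQL.
conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

Keeping each stage in its own function mirrors how an ETL tool chains separate steps, so one stage can be changed without touching the others.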
  • 4. TOOLS USED
    • MySQL Database
    • MySQL Workbench
    • Pentaho Data Integration (open source ETL tool)
  • STEPS USED
    1. DATA PREPARATION
    1.1 Verifying the data in the Excel sheet for different types of errors.
    1.2 Preparing the database structure using MySQL.
    2. DATA INTEGRATION
    2.1 Extract the Data.
    2.2 Transform the Data.
    2.3 Load the Data.
  • 7. 1.1 Verifying the data in Excel (Source)
    Categories of errors dealt with in the source file (a few examples):
    • Incomplete
    • Incorrect
    • Inconsistent
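A check for these three error categories might look like the following sketch. The rows, the age range, and the `KNOWN_COUNTRIES` reference list are all hypothetical; the slides do not say which columns or rules were actually used.

```python
# Hypothetical rows standing in for the Excel source; None marks an empty cell.
rows = [
    {"id": 1, "name": "Alice", "age": 30, "country": "Germany"},
    {"id": 2, "name": None, "age": 25, "country": "Germany"},     # incomplete
    {"id": 3, "name": "Carol", "age": -4, "country": "Germany"},  # incorrect
    {"id": 4, "name": "Dave", "age": 41, "country": "germany "},  # inconsistent
]

KNOWN_COUNTRIES = {"Germany", "India"}  # assumed reference list

def check_row(row):
    """Return the error categories that apply to one source row."""
    problems = []
    # Incomplete: a required cell is empty.
    if any(value is None or value == "" for value in row.values()):
        problems.append("incomplete")
    # Incorrect: a value falls outside a plausible range (assumed 0 to 120).
    age = row.get("age")
    if isinstance(age, int) and not 0 <= age <= 120:
        problems.append("incorrect")
    # Inconsistent: the same value written in a different form.
    country = row.get("country")
    if (
        isinstance(country, str)
        and country not in KNOWN_COUNTRIES
        and country.strip().title() in KNOWN_COUNTRIES
    ):
        problems.append("inconsistent")
    return problems

report = {row["id"]: check_row(row) for row in rows}
print(report)
```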
  • 1.2 Preparing the database structure
    STEPS:
    • Creating the schema.
    • Creating the table.
    • Creating the columns and assigning the primary key.
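As a rough illustration of these steps, the sketch below creates a table with typed columns and a primary key. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, so the in-memory database plays the role of the schema; the `customer` table and its columns are hypothetical and not taken from the presentation.

```python
import sqlite3

# Assumption: SQLite replaces MySQL here; the database itself acts as the schema.
conn = sqlite3.connect(":memory:")

# Create the table, its columns, and the primary key in one statement.
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT
    )
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Alice', 'Heidelberg')")

# The primary key rejects a second row with the same customer_id.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Bob', 'Mannheim')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# List the column names that were created.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns, duplicate_rejected)
```

In MySQL the schema would be created first with a separate `CREATE SCHEMA` statement, which MySQL Workbench can also generate from its table editor.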
  • 2. Data Integration
  • 12. Q & A