ETL Process in Data Warehouse

  By: Komal Choudhary
Outline
 ETL
 Extraction
 Transformation
 Loading
ETL Overview
 Extraction Transformation Loading – ETL
 ETL gets data out of the source and loads it into the
  data warehouse.
 Data is extracted from an OLTP database,
  transformed to match the data warehouse
  schema, and loaded into the data warehouse
  database.
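The three steps above can be sketched end to end. This is a minimal illustration, not a production design: the in-memory SQLite databases stand in for the OLTP source and the warehouse, and the table names (`orders`, `fact_orders`), column names, and the cents-to-units conversion are all assumptions made for the example.

```python
import sqlite3

def extract(oltp):
    """Extract: read raw rows from the (hypothetical) OLTP orders table."""
    return oltp.execute("SELECT id, customer, amount_cents FROM orders").fetchall()

def transform(rows):
    """Transform: reshape rows to match the assumed warehouse schema."""
    return [(oid, customer.strip().upper(), cents / 100)
            for oid, customer, cents in rows]

def load(dw, rows):
    """Load: write the transformed rows into the warehouse table."""
    dw.executemany(
        "INSERT INTO fact_orders (order_id, customer_name, amount) VALUES (?, ?, ?)",
        rows,
    )
    dw.commit()

def run_etl():
    # Stand-in OLTP source with two sample rows (illustrative data only).
    oltp = sqlite3.connect(":memory:")
    oltp.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER)")
    oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(1, " alice ", 1250), (2, "bob", 300)])
    # Stand-in data warehouse with its own schema.
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE fact_orders (order_id INTEGER, customer_name TEXT, amount REAL)")
    load(dw, transform(extract(oltp)))
    return dw.execute(
        "SELECT order_id, customer_name, amount FROM fact_orders ORDER BY order_id"
    ).fetchall()
```

Keeping each step in its own function is one way to make the process easier to modify when a source or the warehouse schema changes.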
Process
Why???
 As data sources change, the data warehouse will be
  periodically updated.
 Also, as the business changes, the DW system needs
  to change in order to maintain its value as a tool
  for decision makers. As a result, the ETL
  also changes and evolves. The ETL processes
  must be designed for ease of modification. A
  solid, well-designed, and documented ETL
  system is necessary for the success of a data
  warehouse project.
 An ETL system consists of three consecutive
  functional steps: extraction, transformation,
  and loading.
Extraction
Extract Process
 The Extract step covers the data extraction from
  the source system and makes it accessible for
  further processing. The main objective of the
  extract step is to retrieve all the required data
  from the source system with as few resources as
  possible.
 There are several ways to perform the extract:
1. Update notification
2. Incremental extract
3. Full extract
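The difference between a full and an incremental extract can be sketched as follows. This assumes the usual reading of the terms: a full extract copies every source row, while an incremental extract takes only rows changed since the last run. The `updated_at` column, the sample rows, and the function names are hypothetical; update notification, where the source itself announces changes, is not shown.

```python
from datetime import datetime

# Illustrative source rows; a real source would be a database table.
SOURCE_ROWS = [
    {"id": 1, "name": "alice", "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "name": "bob", "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "name": "carol", "updated_at": datetime(2024, 1, 9)},
]

def full_extract(rows):
    """Full extract: copy every row, regardless of when it changed."""
    return list(rows)

def incremental_extract(rows, last_run):
    """Incremental extract: keep only rows modified after the previous run."""
    return [row for row in rows if row["updated_at"] > last_run]
```

For example, `incremental_extract(SOURCE_ROWS, datetime(2024, 1, 4))` keeps only the rows with ids 2 and 3, which is why an incremental extract usually needs fewer resources than a full one.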
Clean
 The cleaning step is one of
     the most important as it
     ensures the quality of the data
     in the data warehouse.
    Cleaning should perform basic
     data unification rules, such as:
1.      Make identifiers unique
2.      Convert null values into a
        standardized form
3.      Convert phone numbers,
        ZIP codes to a standardized
        form
4.      Validate address fields,
        convert them into proper
        naming, e.g.
        Street/St/St./Str./Str
5.      Validate address fields
        against each other.
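A few of the unification rules above can be sketched as small functions. The placeholder `"Not Available"`, the set of null markers, and the digits-only phone format are assumptions chosen for illustration. The street rule is deliberately naive (it would also rewrite "St" where it means "Saint").

```python
import re

STREET_VARIANTS = {"street", "st", "st.", "str", "str."}
NULL_MARKERS = {"", "null", "n/a", "none", "-"}
NOT_AVAILABLE = "Not Available"  # assumed standardized placeholder

def clean_null(value):
    """Map missing or null-like values to one standardized placeholder."""
    if value is None or value.strip().lower() in NULL_MARKERS:
        return NOT_AVAILABLE
    return value.strip()

def clean_phone(value):
    """Standardize a phone number by keeping digits only."""
    return re.sub(r"\D", "", value)

def clean_street(address):
    """Replace Street/St/St./Str./Str with the single form 'Street'."""
    words = address.split()
    return " ".join("Street" if w.lower() in STREET_VARIANTS else w for w in words)
```

Each rule is a plain function of one value, so it can be applied column by column and tested in isolation.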
Transformation
 The transform step applies a set of rules
  to transform the data
  from the source to the
  target.
 This includes
  converting any
  measured data to the
  same dimension using
  the same units so that
  they can later be
  joined.
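Converting measured data to the same units might look like the sketch below. The choice of kilograms as the target unit, the `TO_KILOGRAMS` table, and the function name are assumptions for the example; a real transform would cover whatever units the sources actually use.

```python
# Conversion factors to the assumed target unit (kilograms).
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kilograms(value, unit):
    """Convert a weight in the given unit to kilograms."""
    try:
        factor = TO_KILOGRAMS[unit.lower()]
    except KeyError:
        # Fail loudly rather than load a value in an unknown unit.
        raise ValueError(f"unknown unit: {unit!r}")
    return value * factor
```

Once every source reports weights in kilograms, rows from different systems can be compared and joined on that measure.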
Problems???
 There are classes of conflicts
  and problems that can
  be distinguished at
  two levels: the
  schema and the
  instance level.
1. Schema-level
    problems.
2. Record-level
    problems.
3. Value-level
    problems.
Solution…
 To deal with such
 issues, the integration
 and transformation
 tasks involve a wide
 variety of functions,
 such as normalizing,
 de-normalizing,
 reformatting,
 recalculating,
 summarizing, merging
 data from multiple
 sources, modifying key
 structures, adding an
 element of time,
 identifying default
 values, supplying
 decision commands to
 choose between
Loading
 Loading data to the
 target
 multidimensional
 structure is the final
 ETL step. In this step,
 extracted and
 transformed data is
 written into the
 dimensional
 structures actually
 accessed by the end
 users and application
 systems. The loading step
 includes both loading
 dimension tables and
Thanks!!!!!
