ETL Overview
Extraction, Transformation, Loading (ETL)
The goal of ETL is to get data out of the source and load it into the data warehouse. Data is extracted from an OLTP database, transformed to match the data warehouse schema, and loaded into the data warehouse database.
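The flow described above can be sketched end to end in a few lines. This is only an illustrative sketch, not a reference implementation: the `orders` source table, the `fact_orders` warehouse table, their columns, and the in-memory SQLite databases are all assumptions made up for the example.

```python
import sqlite3

def extract(source):
    # Extract: pull raw rows from the (hypothetical) OLTP "orders" table.
    return source.execute(
        "SELECT id, customer, amount_cents FROM orders ORDER BY id"
    ).fetchall()

def transform(rows):
    # Transform: match the assumed warehouse schema, i.e. a trimmed,
    # upper-cased customer name and the amount in currency units.
    return [(oid, customer.strip().upper(), cents / 100.0)
            for oid, customer, cents in rows]

def load(warehouse, rows):
    # Load: write the transformed rows into the (hypothetical) fact table.
    warehouse.executemany(
        "INSERT INTO fact_orders (order_id, customer_name, amount) VALUES (?, ?, ?)",
        rows,
    )
    warehouse.commit()

def run_etl(source, warehouse):
    # The three steps run consecutively: extract, then transform, then load.
    rows = transform(extract(source))
    load(warehouse, rows)
    return len(rows)

def demo():
    # Build a tiny source and an empty warehouse, both in memory.
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " alice ", 1250), (2, "Bob", 399)],
    )
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (order_id INTEGER, customer_name TEXT, amount REAL)"
    )
    run_etl(source, warehouse)
    return warehouse.execute(
        "SELECT order_id, customer_name, amount FROM fact_orders ORDER BY order_id"
    ).fetchall()
```

Keeping each step in its own function means a change in the source or in the warehouse schema usually touches only one of them.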
Why???
As data sources change, the data warehouse must be updated periodically. As the business changes, the DW system also needs to change in order to maintain its value as a tool for decision makers; as a result, the ETL system changes and evolves too. ETL processes must therefore be designed for ease of modification. A solid, well-designed, and documented ETL system is necessary for the success of a data warehouse project. An ETL system consists of three consecutive functional steps: extraction, transformation, and loading.
Extract Process
The extract step covers extracting the data from the source system and making it accessible for further processing. Its main objective is to retrieve all the required data from the source system using as few resources as possible. There are several ways to perform the extract:
1. Update notification
2. Incremental extract
3. Full extract
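The three extract strategies differ mainly in how many rows they read. The sketch below illustrates one plausible reading of each, assuming a hypothetical `customers` table with an `updated_at` column holding ISO dates, and assuming that an update notification arrives as a list of changed ids. None of these names come from the slides.

```python
import sqlite3

def full_extract(conn):
    # Full extract: copy every row, whether or not it has changed.
    return conn.execute(
        "SELECT id, name, updated_at FROM customers ORDER BY id"
    ).fetchall()

def incremental_extract(conn, last_run):
    # Incremental extract: only rows modified after the previous run.
    # ISO-formatted date strings compare correctly as plain text.
    return conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY id",
        (last_run,),
    ).fetchall()

def extract_notified(conn, changed_ids):
    # Update notification: the source reports which ids changed,
    # so only those rows are fetched.
    if not changed_ids:
        return []
    placeholders = ",".join("?" for _ in changed_ids)
    return conn.execute(
        f"SELECT id, name, updated_at FROM customers "
        f"WHERE id IN ({placeholders}) ORDER BY id",
        list(changed_ids),
    ).fetchall()

def make_source():
    # Small in-memory source system used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            (1, "Alice", "2024-01-01"),
            (2, "Bob", "2024-02-15"),
            (3, "Carol", "2024-03-10"),
        ],
    )
    return conn
```

With `conn = make_source()`, calling `incremental_extract(conn, "2024-02-01")` reads only the two rows changed after that date, whereas `full_extract(conn)` reads all three.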
Clean
The cleaning step is one of the most important, as it ensures the quality of the data in the data warehouse. Cleaning should apply basic data unification rules, such as:
1. Making identifiers unique
2. Converting null values into a standardized form
3. Converting phone numbers and ZIP codes to a standardized form
4. Validating address fields and converting them to consistent naming, e.g. Street/St/St./Str./Str
5. Validating address fields against each other
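A few of the cleaning rules above can be sketched as small functions. The specifics here are assumptions chosen for illustration: the `UNKNOWN` placeholder, the set of strings treated as null, the digits-only phone format, and the choice to rewrite a street suffix only at the end of the address line.

```python
import re

# Strings treated as "no value" (an assumed list, extend as needed).
NULL_MARKERS = {"", "n/a", "na", "null", "none", "-"}

# Street suffix variants (Street/St/St./Str./Str) at the end of a line.
STREET_SUFFIX = re.compile(r"\b(?:street|str|st)\.?$", re.IGNORECASE)

def clean_null(value):
    # Map every null-like input to one standardized placeholder.
    if value is None or value.strip().lower() in NULL_MARKERS:
        return "UNKNOWN"
    return value.strip()

def normalize_phone(raw):
    # Keep digits only, so "(555) 123-4567" and "555.123.4567" agree.
    return re.sub(r"\D", "", raw)

def normalize_street(address):
    # Rewrite any trailing suffix variant to the single form "Street".
    return STREET_SUFFIX.sub("Street", address.strip())
```

For example, `normalize_street("12 Main St.")` and `normalize_street("12 Main Str")` both give `12 Main Street`. Matching only at the end of the line avoids rewriting a leading "St" that stands for "Saint", but it also misses suffixes followed by an apartment or unit number.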
Transformation
The transformation step applies a set of rules to transform the data from the source to the target. This includes converting measured data to the same dimension, using the same units, so that the data can later be joined.
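As a minimal sketch of unit unification, the example below converts weights recorded in different units to kilograms before they are joined. The `(item, value, unit)` row layout and the set of supported units are assumptions for the example.

```python
# Conversion factors to the common target unit (kilograms).
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kilograms(value, unit):
    # Look up the factor for the source unit; reject unknown units
    # rather than silently loading unconverted values.
    try:
        factor = TO_KILOGRAMS[unit.lower()]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return value * factor

def unify_weights(rows):
    # rows: iterable of (item, value, unit) tuples from different sources.
    # Returns (item, value_in_kg) tuples that share one unit.
    return [(item, to_kilograms(value, unit)) for item, value, unit in rows]
```

Raising an error on an unknown unit is a deliberate choice here: a value loaded in the wrong unit would be hard to spot once it is joined with correctly converted data.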
Problems???
The classes of conflicts and problems can be distinguished at two levels: the schema level and the instance level, with the instance level covering both record-level and value-level problems:
1. Schema-level problems
2. Record-level problems
3. Value-level problems
Solution…
To deal with such issues, the integration and transformation tasks involve a wide variety of functions, such as normalizing, de-normalizing, reformatting, recalculating, summarizing, merging data from multiple sources, modifying key structures, adding an element of time, identifying default values, supplying decision commands to choose between
Loading
Loading data into the target multidimensional structure is the final ETL step. In this step, the extracted and transformed data is written into the dimensional structures actually accessed by the end users and application systems. The loading step includes both loading dimension tables and