Tools used in Data Warehousing

Component | Product used | Purpose
Reporting | Crystal Reports | Create presentation-style reports with charts and graphs
Querying | Access 2000 | Create complex ad-hoc queries against a variety of data sources
OLAP | Crystal Analysis Professional | Access data cubes for designing views to pivot, filter and aggregate facts on pre-defined dimensions for specific subject areas
Data Mining/Statistical Analysis | SAS | Statistical analysis and churn analysis
Extract, Transform and Load (ETL) is a process that involves extracting data from multiple sources in various formats, transforming it to fit business needs, and finally loading it into a target system.
The target system will generally be configured as a data warehouse or data mart, though ETL can refer to a process that loads to any type of data storage structure.
The structure itself will typically be a database, but may also be an application, file or other storage facility.
The purpose of ETL is to reformat, cleanse and standardize data so that it can be analyzed or exchanged to address business needs and/or promote interoperability.
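Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) may be used synonymously with ETL. ELT (extraction, load, transform) is often mentioned alongside them, although strictly it loads the data into the target before transforming it.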
It is the process in data warehousing responsible for pulling data out of the source systems and placing it into a data warehouse.
It involves the following tasks:
1. Extracting the data from source systems (SAP, ERP, other operational systems). Data from the different source systems is converted into one consolidated data warehouse format that is ready for transformation processing.
2. Transforming the data, which may include:
   - applying business rules (such as derivations, or calculating new measures and dimensions),
   - cleaning (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F"),
   - filtering (e.g., selecting only certain columns to load),
   - splitting a column into multiple columns and vice versa,
   - joining together data from multiple sources (e.g., lookup, merge),
   - transposing rows and columns,
   - applying any kind of simple or complex data validation (e.g., if the first 3 columns in a row are empty, reject the row from processing).
3. Loading the data into a data warehouse, data repository or other reporting application.
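The three tasks above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the source data, the column names, the `customer_fact` table and the function names `extract`, `transform` and `load` are all assumptions made for the example, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; the columns and values are made up for illustration.
SOURCE_CSV = """customer_id,full_name,gender,amount
1,Asha Rao,Female,1200
2,Ravi Kumar,Male,
,,,
3,Meena Iyer,Female,800
"""

GENDER_CODES = {"Male": "M", "Female": "F"}


def extract(text):
    """Extract: read the source rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: validate, split and clean each row."""
    out = []
    for row in rows:
        # Validation: reject the row if its first 3 columns are all empty.
        if all(not value for value in list(row.values())[:3]):
            continue
        # Splitting: one column (full_name) becomes two.
        first, _, last = row["full_name"].partition(" ")
        out.append({
            "customer_id": int(row["customer_id"]),
            "first_name": first,
            "last_name": last,
            # Cleaning: map "Male"/"Female" to "M"/"F".
            "gender": GENDER_CODES.get(row["gender"], row["gender"]),
            # Cleaning: map a missing amount to 0.
            "amount": float(row["amount"]) if row["amount"] else 0.0,
        })
    return out


def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_fact ("
        "customer_id INTEGER, first_name TEXT, last_name TEXT, "
        "gender TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO customer_fact VALUES "
        "(:customer_id, :first_name, :last_name, :gender, :amount)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute(
    "SELECT customer_id, first_name, last_name, gender, amount "
    "FROM customer_fact ORDER BY customer_id"
).fetchall()
```

In this sketch the all-empty row is rejected during transformation, so only three rows reach the target table. A production ETL tool would add logging, error tables and incremental loading on top of the same three stages.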
ETL Tools
- Informatica PowerCenter
- IBM WebSphere DataStage (formerly known as Ascential DataStage)
- SAP BusinessObjects Data Integrator
- IBM Cognos Data Manager (formerly known as Cognos DecisionStream)
- Microsoft SQL Server Integration Services
- Oracle Data Integrator (formerly known as Sunopsis Data Conductor)
- SAS Data Integration Studio
- Oracle Warehouse Builder
- Ab Initio
- Information Builders Data Migrator
- Pentaho: Pentaho Data Integration
- Embarcadero Technologies: DT/Studio
- IKAN: ETL4ALL
- IBM DB2 Warehouse Edition
- Pervasive Data Integrator
- ETL Solutions Ltd.: Transformation Manager
- Group 1 Software (Sagent): DataFlow
- Sybase Data Integrated Suite ETL
- Talend: Talend Open Studio
- Expressor Software: Expressor Semantic Data Integration System
- Elixir: Elixir Repertoire
- OpenSys: CloverETL
Example:
The invoice / bill amount for a specific customer, looked up by CAF Number or MDN, is found from a transactional system, which is ADC.
The number of customers whose invoice / bill is greater than Rs. 1000.00 for the past three months needs an OLAP system, which is DSS.
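The difference between the two questions in this example can be sketched with an in-memory SQLite database. The `invoice` table, its columns, the MDN values and the amounts are all made up for illustration, and the sketch assumes that "greater than Rs. 1000.00 for the past three months" means the bill exceeded 1000 in each of the three months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical invoice table: one row per customer (MDN) per billing month.
conn.execute("CREATE TABLE invoice (mdn TEXT, bill_month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [
        ("9000000001", "2024-01", 1500.0),
        ("9000000001", "2024-02", 1200.0),
        ("9000000001", "2024-03", 1100.0),
        ("9000000002", "2024-01", 1800.0),
        ("9000000002", "2024-02", 400.0),
        ("9000000002", "2024-03", 2000.0),
        ("9000000003", "2024-01", 300.0),
        ("9000000003", "2024-02", 250.0),
        ("9000000003", "2024-03", 275.0),
    ],
)

# Transactional-style question: one customer's bill, found by key lookup.
bill = conn.execute(
    "SELECT amount FROM invoice WHERE mdn = ? AND bill_month = ?",
    ("9000000002", "2024-02"),
).fetchone()[0]

# Analytical-style question: how many customers were billed more than
# 1000 in every one of the past three months (an aggregate over all rows).
months = ("2024-01", "2024-02", "2024-03")
high_value_customers = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT mdn
        FROM invoice
        WHERE bill_month IN (?, ?, ?) AND amount > 1000
        GROUP BY mdn
        HAVING COUNT(DISTINCT bill_month) = 3
    )
    """,
    months,
).fetchone()[0]
```

The first query touches a single row, which suits an operational system. The second scans and aggregates history across all customers, which is the kind of workload a decision support system is designed for.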
Agree upon the business logic and time line for implementation of reports in a phased manner
Logical & Physical data model
Database to suit the business need
Multiple programs are required to develop the database. This involves integration of programs in an optimized manner
Data validation with reference to source system and business rules agreed upon with users
This could be an iterative process till final acceptance by the user
Application development is in accordance with the development process defined at DSS
Delivery of reports in a consistent manner
Release indicates the report is productionised
Necessary user guide and training are given to the users to facilitate the use of reports
Creation of user IDs and assignment of access rights for reports
Stages: Requirement Analysis, Application Development, Exhaustive Testing, Quality Assurance, Release Report
Application Areas

Industry | Application
Finance | Credit card analysis
Insurance | Claims, fraud analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | Promotion analysis
Data service providers | Value added data
Utilities | Power usage analysis
Generic two-level architecture
Periodic extraction: data is not completely current in the warehouse

Independent data mart
Separate ETL for each independent data mart
Data access complexity due to multiple data marts

Dependent data mart with operational data store
Single ETL for enterprise data warehouse (EDW)
Dependent data marts loaded from EDW

Logical data mart and @active data warehouse
Near real-time ETL for @active Data Warehouse
Data marts are NOT separate databases, but logical views of the data warehouse
Easier to create new data marts
ODS and data warehouse are one and the same
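The idea of a logical data mart, a view over the warehouse rather than a separate database, can be sketched with SQLite. The `call_fact` table, the `billing_mart` view and the sample rows are hypothetical names and data chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
conn.execute(
    "CREATE TABLE call_fact (region TEXT, subject_area TEXT, minutes REAL)"
)
conn.executemany(
    "INSERT INTO call_fact VALUES (?, ?, ?)",
    [
        ("North", "billing", 10.0),
        ("North", "network", 5.0),
        ("South", "billing", 7.5),
    ],
)

# The "data mart" is just a view: no separate database and no separate ETL.
conn.execute(
    """
    CREATE VIEW billing_mart AS
    SELECT region, SUM(minutes) AS total_minutes
    FROM call_fact
    WHERE subject_area = 'billing'
    GROUP BY region
    """
)

before = conn.execute(
    "SELECT region, total_minutes FROM billing_mart ORDER BY region"
).fetchall()

# A new row loaded into the warehouse is visible through the view immediately.
conn.execute("INSERT INTO call_fact VALUES ('South', 'billing', 2.5)")
after = conn.execute(
    "SELECT region, total_minutes FROM billing_mart ORDER BY region"
).fetchall()
```

Because the view is evaluated against the warehouse tables at query time, creating another mart is a matter of defining another view, which is why this architecture makes new data marts easier to create.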