INFORMATION FLOW
MECHANISM
By-
K M Thakur
Information Flow
1. Source Data
2. Data Staging Area
3. Data Warehouse
4. Data Warehouse User
TRANSFORMATION OF
DATA INTO INFORMATION
Steps-
1. Select the source data
2. Extract the data from the source systems
3. Transform the extracted data
4. Load the transformed data into the data warehouse
5. Deliver the information to the end users
This process is also called the ETL (extract, transform, load) process.
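The five steps above can be sketched as a small pipeline. This is a minimal illustration only, assuming in-memory lists stand in for the source system and the warehouse; the function names, field names, and region codes are hypothetical and do not come from any particular ETL tool.

```python
# Step 1: the selected source data (hypothetical rows from an operational system).
source_rows = [
    {"id": 1, "region": "N", "amount": "120.50"},
    {"id": 2, "region": "S", "amount": "80.00"},
    {"id": 3, "region": "N", "amount": None},  # incomplete record
]

def extract(rows):
    """Step 2: extract only the rows that carry an amount."""
    return [r for r in rows if r["amount"] is not None]

def transform(rows):
    """Step 3: translate region codes and convert amounts to numbers."""
    region_names = {"N": "North", "S": "South"}  # assumed code table
    return [
        {"id": r["id"],
         "region": region_names[r["region"]],
         "amount": float(r["amount"])}
        for r in rows
    ]

def load(rows, warehouse):
    """Step 4: append the transformed rows to the warehouse table."""
    warehouse.extend(rows)
    return warehouse

def deliver(warehouse):
    """Step 5: answer an end-user question (total amount per region)."""
    totals = {}
    for r in warehouse:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

warehouse = load(transform(extract(source_rows)), [])
report = deliver(warehouse)
```

Keeping each step as a separate function mirrors the slide: any one stage can be replaced (for example, a database loader instead of a list) without touching the others.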
TYPES OF SOURCE DATA
1. Production Data
• Main source of data.
• Comes from the
operational systems
2. Internal Data
• Taken from internal private files
• Includes data that could not be
stored in the computer.
3. External Data
• Collected from external
sources like magazines,
survey results, etc.
• Sources outside of the
organization.
4. Archived Data
• Comprises all historical data
that exists on tape drives.
• This data may go back as far as
10 years.
EXTRACTION OF DATA
• Identify the sources of data.
• Finalize the filters that will be applied to each source system.
• Produce automatic extract files from the operational systems.
• Generate intermediary files to store selected data to be merged later.
• Provide automated job control services for creating extract files.
• Reformat input from outside sources.
• Reformat and standardize the input from data sources.
• Produce common application code for data extraction.
• Resolve inconsistencies in common data that will be extracted from multiple sources.
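As a rough sketch of the extraction tasks above, the example below applies one filter per source system, standardizes names and dates, and resolves an inconsistency in data common to two sources. The source names (billing, crm), the fields, and the "most recent update wins" rule are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical extracts from two operational systems describing the same customers.
billing = [
    {"cust_id": "C1", "name": "ASHA RAO", "updated": "2023-01-10", "active": "Y"},
    {"cust_id": "C2", "name": "R. MEHTA", "updated": "2022-05-02", "active": "N"},
]
crm = [
    {"cust_id": "C1", "name": "Asha Rao", "updated": "14/03/2023", "active": "Y"},
]

# One filter and one date format per source system (assumed values).
SOURCES = {
    "billing": {"rows": billing, "date_fmt": "%Y-%m-%d",
                "keep": lambda r: r["active"] == "Y"},
    "crm": {"rows": crm, "date_fmt": "%d/%m/%Y",
            "keep": lambda r: True},
}

def extract_source(name):
    """Common extraction code: filter, then reformat to one standard layout."""
    cfg = SOURCES[name]
    out = []
    for r in cfg["rows"]:
        if not cfg["keep"](r):
            continue
        out.append({
            "cust_id": r["cust_id"],
            "name": r["name"].title(),
            "updated": datetime.strptime(r["updated"], cfg["date_fmt"]).date(),
            "source": name,
        })
    return out

def resolve(records):
    """Resolve conflicts: keep the most recently updated record per customer."""
    best = {}
    for r in records:
        current = best.get(r["cust_id"])
        if current is None or r["updated"] > current["updated"]:
            best[r["cust_id"]] = r
    return best

# Intermediary result holding selected data from every source, merged afterwards.
intermediate = extract_source("billing") + extract_source("crm")
customers = resolve(intermediate)
```

Other resolution rules (for example, always trusting one designated system of record) are equally plausible; the point is that the rule is decided once and applied uniformly.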
DATA STAGING AREA
1. Data Extraction
2. Data Transformation
3. Data Loading
DATA PROCESSING AT STAGING AREA
• Standardization of data
• Sorting of records
• Comparing and merging
• Aggregation and
summarization of data
• Creation of surrogate
keys
• Filling in missing values
• Converting the data to
the format required by
the warehouse server
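Some of the staging operations listed above (filling in missing values, sorting, creating surrogate keys, aggregating) might look like the sketch below. The rows, the default of 0 for a missing quantity, and the key numbering starting at 1 are assumptions for illustration only.

```python
from itertools import count

# Hypothetical staged records; product_code is the natural key from the source.
staged = [
    {"product_code": "P-20", "store": "B", "qty": 3},
    {"product_code": "P-10", "store": "A", "qty": None},
    {"product_code": "P-10", "store": "B", "qty": 5},
]

# Filling in missing values: a missing quantity is assumed to mean zero.
for row in staged:
    if row["qty"] is None:
        row["qty"] = 0

# Sorting of records by product, then store.
staged.sort(key=lambda r: (r["product_code"], r["store"]))

# Creation of surrogate keys: one warehouse-generated integer per natural key.
next_key = count(1)
product_keys = {}
for row in staged:
    code = row["product_code"]
    if code not in product_keys:
        product_keys[code] = next(next_key)
    row["product_key"] = product_keys[code]

# Aggregation and summarization: total quantity per surrogate key.
qty_by_product = {}
for row in staged:
    key = row["product_key"]
    qty_by_product[key] = qty_by_product.get(key, 0) + row["qty"]
```

The surrogate key is independent of the source system's own codes, so the warehouse keeps working if a source renumbers its products.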
TRANSFORM THE EXTRACTED DATA
• Translating coded values.
• Deriving new calculated values.
• Merging and splitting fields.
• Aggregating and summarizing data rows.
• Generating primary and foreign keys.
• Applying data validation rules.
• Resolving synonyms and homonyms.
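A few of the transformations above can be shown on a single record. This is a sketch under stated assumptions: the code table, field names, and validation rules are hypothetical. The two code systems (M/F and 1/2) stand for synonyms used by different sources for the same value.

```python
# Hypothetical code table: two source systems use different codes (synonyms)
# for the same values, so both map to one warehouse value.
GENDER_CODES = {"M": "Male", "F": "Female", "1": "Male", "2": "Female"}

def transform_record(raw):
    """Return the transformed record and a list of validation errors."""
    # Splitting of fields: one full-name field becomes two.
    first, _, last = raw["full_name"].partition(" ")

    record = {
        "first_name": first,
        "last_name": last,
        # Translating coded values.
        "gender": GENDER_CODES.get(raw["gender"], "Unknown"),
        # Deriving a new calculated value.
        "line_total": round(raw["unit_price"] * raw["qty"], 2),
    }

    # Applying data validation rules (assumed rules).
    errors = []
    if raw["qty"] <= 0:
        errors.append("qty must be positive")
    if record["gender"] == "Unknown":
        errors.append("unrecognized gender code")
    return record, errors

record, errors = transform_record(
    {"full_name": "Asha Rao", "gender": "2", "unit_price": 2.5, "qty": 4}
)
bad_record, bad_errors = transform_record(
    {"full_name": "R Mehta", "gender": "X", "unit_price": 1.0, "qty": 0}
)
```

Returning the errors alongside the record, rather than raising, lets the staging job decide whether to reject, repair, or quarantine a row.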
LOAD THE TRANSFORMED DATA INTO
THE DATA WAREHOUSE
FUNCTIONS OF DATA
WAREHOUSE REPOSITORY
• Load data for full refreshes of the tables.
• Perform periodic incremental loads.
• Provide support for loading into multiple tables.
• Ensure optimization of the loading process.
• Provide automated job control services for loading.
• Provide backup and recovery services.
• Provide security for the stored data.
• Monitor and fine-tune the data warehouse database.
• Periodically archive data from the database.
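The difference between a full refresh and an incremental load can be sketched with Python's built-in sqlite3 module. The sales table, its columns, and the sample rows are assumptions; a production warehouse would normally use its own bulk-loading utilities.

```python
import sqlite3

# In-memory database standing in for the warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL, loaded_on TEXT)"
)

def full_refresh(conn, rows):
    """Full refresh: empty the table, then reload every row."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM sales")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

def incremental_load(conn, rows):
    """Incremental load: add new rows and overwrite changed ones by key."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)

# Initial full load, followed by a later periodic increment.
full_refresh(conn, [(1, 100.0, "2024-01-01"), (2, 50.0, "2024-01-01")])
incremental_load(conn, [(2, 55.0, "2024-01-02"), (3, 70.0, "2024-01-02")])

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Running each load inside a single transaction means a failed load leaves the table as it was, which supports the recovery function listed above.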
DELIVER INFORMATION TO
END-USERS
INFORMATION DELIVERY
FUNCTION OF DELIVERY
SYSTEM
• Provides security to control information access by different users.
• Monitors user access patterns to improve service and plan future
enhancements.
• Enables users to browse the contents of the warehouse.
• Simplifies data access by hiding the complexities of storage from users.
• Optimizes performance by reformatting queries.
• Delivers faster query results.
• Enables users to perform complex analysis.
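Two of the functions above, controlling access and hiding storage complexity, can be sketched with a database view and a simple permission check. The table names, the view name, and the roles (analyst, guest) are hypothetical; real warehouses usually rely on the database's own grants and roles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (store_key INTEGER, amount REAL);
INSERT INTO dim_store VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 20.0), (2, 50.0);

-- The view hides the join between fact and dimension tables from users.
CREATE VIEW sales_by_region AS
    SELECT d.region AS region, SUM(f.amount) AS total
    FROM fact_sales f JOIN dim_store d USING (store_key)
    GROUP BY d.region;
""")

# Assumed mapping of roles to the views each role may browse.
ALLOWED_VIEWS = {"analyst": {"sales_by_region"}, "guest": set()}

def browse(role, view):
    """Return the rows of a view if the role is permitted to see it."""
    if view not in ALLOWED_VIEWS.get(role, set()):
        raise PermissionError(f"{role} may not read {view}")
    # The view name is safe to interpolate: it was checked against the allowlist.
    return conn.execute(
        f"SELECT region, total FROM {view} ORDER BY region"
    ).fetchall()

rows = browse("analyst", "sales_by_region")
```

Users query one simple view by name and never need to know how the underlying tables are organized.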
INFORMATION DELIVERY
METHODS
• Queries
• Reports
• Analysis
• Applications
• On-line analytical
processing (OLAP)
• Data mining
Accessing warehouse through
Information flow in case of data mining application
COURTESY
For All the
Images and
texts.
THANK YOU !
www.gunjanshree.com
