2. Overview
A data warehouse is a collection of subject-oriented
databases, together with the processes, procedures
and tools (hardware and software) that support it.
Data flows from the data warehouse to various
customized databases. When such a local database
is loaded periodically with data extracted from the
data warehouse, it is called a data mart.
3. Complete Warehouse Solution Architecture
[Diagram: architecture flowing from Data Sources through Data Management to Access (Data, Information, Knowledge). Labels: Metadata; Legacy Data; Operational Data; The Post; VISA; External Data Sources; Extract, Transform, Load; Enterprise Data Warehouse (organizationally structured); Sales, Inventory and Purchase Data Marts (departmentally structured); Asset Assembly (and Management); Asset Exploitation.]
5. Overview
The data in the data warehouse comes from
various sources running on different platforms. An
ETL tool is used to integrate data from these
sources and load it into the data warehouse.
INFORMATICA is an ETL tool used for extracting
data, transforming it and loading it into the data
warehouse. INFORMATICA has two products that
carry out this ETL process:
•PowerCenter
•PowerMart
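As a rough illustration of the extract, transform, load flow described above, here is a minimal sketch in plain Python with an in-memory SQLite database. It is not Informatica code; the table name, column names and sample rows are hypothetical.

```python
import sqlite3

def run_etl(source_rows):
    """Extract rows, transform them, and load them into a warehouse table."""
    warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
    warehouse.execute("CREATE TABLE sales (customer TEXT, amount_usd REAL)")

    # Extract: source_rows stands in for data read from an operational system.
    for name, amount_cents in source_rows:
        # Transform: normalise the name and convert cents to dollars.
        row = (name.strip().upper(), amount_cents / 100.0)
        # Load: write the transformed row to the target table.
        warehouse.execute("INSERT INTO sales VALUES (?, ?)", row)
    warehouse.commit()
    return warehouse

wh = run_etl([(" alice ", 1250), ("Bob", 300)])
loaded = wh.execute(
    "SELECT customer, amount_usd FROM sales ORDER BY customer"
).fetchall()
```

An ETL tool performs the same three steps, but the sources, targets and transformation rules are defined as metadata rather than written by hand.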
7. Components
INFORMATICA PowerCenter has the following components:
•ODBC
•PowerCenter Server: An application that reads,
transforms and writes data to the target.
9. Components
•PowerCenter Client: The client has five different
tools:
Source Analyzer: Used to add source definitions
to the repository.
Warehouse Designer: Used to create targets and
add their definitions to the repository.
Transformation Developer: Used to create
reusable transformations.
10. Components
Mapplet Designer: Used to create mapplets.
Mapping Designer: Used to create mappings
from sources to targets.
13. Configuring Server Manager
• Informatica Server name
• Type of network protocol to access the server
– TCP/IP or IPX/SPX
• Port number on which the client
communicates (for TCP/IP) - 4001
• Address of machine on which the server runs
(for IPX/SPX)
• Timeout – number of seconds the Server Manager
waits for a response from the Informatica Server
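The settings on this slide can be pictured as a small record with a few consistency rules. The sketch below is purely illustrative: the class name, field names and the timeout value are assumptions, not part of the Server Manager itself.

```python
from dataclasses import dataclass

@dataclass
class ServerRegistration:
    """Hypothetical container for the server settings listed above."""
    name: str
    protocol: str = "TCP/IP"    # or "IPX/SPX"
    port: int = 4001            # used only with TCP/IP
    address: str = ""           # machine address, used only with IPX/SPX
    timeout_seconds: int = 60   # assumed value; the slide gives no number

    def validate(self):
        """Return a list of problems; an empty list means the settings are consistent."""
        problems = []
        if self.protocol not in ("TCP/IP", "IPX/SPX"):
            problems.append("unknown protocol")
        if self.protocol == "TCP/IP" and not (0 < self.port < 65536):
            problems.append("invalid port")
        if self.protocol == "IPX/SPX" and not self.address:
            problems.append("address required for IPX/SPX")
        if self.timeout_seconds <= 0:
            problems.append("timeout must be positive")
        return problems
```

For example, ServerRegistration("dev_server").validate() returns an empty list, while an IPX/SPX registration without an address reports the missing address.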
14. Configuring Server Manager
• Default directories for session files and
caches, e.g. $PMRootDir, $PMSessionLogDir,
$PMBadFileDir
• Defining Database Connections
• Defining FTP connections
15. Features
•INFORMATICA Server : Reads data from
sources, transforms data as instructed by
repository metadata and writes it to target.
16. Features
•Repository Manager: Used to create and
manage repositories.
A repository is a database containing a set
of instructions that specify where to get data
(source), how to process/transform it and where
to write it (target). This set of instructions is called
metadata.
17. Features
With the Repository Manager you can create
repository users and groups, assign privileges and
permissions, manage folders and locks, and import
and export from ODBC data sources.
•Designer: Used to create mappings and target
tables.
•Server Manager: Used to create sessions and
configure the schedule on which they run.
18. Repository User Management
Multiple developers can use the same repository
to create and manage multiple projects or the
same project.
Informatica allows a separate user profile to be
created for each developer, with its own
username and password.
19. Repository User Management
Privileges such as Administer Server, Create Sessions
and Use Designer can be assigned to each user of the
repository.
Groups of users can be created, and privileges can
be granted to the groups.
A user can be a member of one or more groups.
20. Repository User Management
Access can be restricted to individual folders within a
repository.
The following permissions can be granted on folders
to the Owner, the Owner’s group and Repository
users:
Read: Allows viewing the folder and the objects
within it.
Write: Allows creating and editing objects within
the folder.
Execute: Allows executing or scheduling a session
in the folder.
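The folder permission model above can be sketched as a simple check. This is an illustration only: the class, the default permission sets and the rule that a user is matched first as owner, then as a member of the owner's group, then as an ordinary repository user are assumptions, not documented Informatica behaviour.

```python
from dataclasses import dataclass, field

@dataclass
class Folder:
    owner: str
    owner_group: str
    # Permission sets for the three classes named above (defaults are hypothetical).
    owner_perms: set = field(default_factory=lambda: {"read", "write", "execute"})
    group_perms: set = field(default_factory=lambda: {"read", "execute"})
    repo_perms: set = field(default_factory=lambda: {"read"})

def is_allowed(folder, user, user_groups, action):
    """Check an action against the first class the user falls into."""
    if user == folder.owner:
        return action in folder.owner_perms
    if folder.owner_group in user_groups:
        return action in folder.group_perms
    return action in folder.repo_perms

sales = Folder(owner="anna", owner_group="etl_dev")
```

With these defaults, a member of etl_dev can run sessions in the folder but cannot edit its objects, and any other repository user can only view them.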
21. Designer
• Creation of mappings
Mapping: a type of metadata that you create to
specify how to move and transform data between
sources and targets. It is stored in the repository.
23. Mapping
A mapping describes how to move and transform
data from sources to targets. A mapping includes:
Sources
Targets
Transformations
26. Transformations
There are two categories of transformations,
depending on their scope:
Standard Transformation: Created in a mapping and
exists only within that mapping. It cannot be used in
other mappings.
Reusable Transformation: Created and stored
independently in the repository. It can be used by all
mappings.
27. Transformations
The types of transformations are:
Expression – Calculate a value or modify text.
Operates on individual rows.
Aggregator – Perform aggregate calculations.
Operates on sets of rows.
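The difference between row-level and set-level operation can be shown with a short Python sketch. The sample rows and column names are hypothetical, and this is ordinary Python rather than Informatica's transformation language.

```python
from collections import defaultdict

rows = [
    {"region": "east", "price": 10.0, "qty": 3},
    {"region": "west", "price": 4.0, "qty": 5},
    {"region": "east", "price": 2.5, "qty": 4},
]

# Expression-style: one output row per input row, computed from that row alone.
with_total = [dict(r, total=r["price"] * r["qty"]) for r in rows]

# Aggregator-style: one output row per group of input rows.
totals = defaultdict(float)
for r in with_total:
    totals[r["region"]] += r["total"]
totals_by_region = dict(totals)
```

The expression step keeps the row count unchanged (three rows in, three rows out), whereas the aggregate step collapses the rows into one result per region.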
28. Transformations
Source Qualifier – Filters records as they are read
(relational sources only) and orders the records
queried by the Informatica Server.
Filter – Filters records sent to the targets.
Applicable to any source.
Stored Procedure – Calls a stored procedure.
External Procedure/Advanced External
Procedure – Calls a procedure in a shared library
(e.g. a DLL) or in the COM layer on Windows NT.
29. Transformations
Sequence Generator – Generates primary
keys.
Rank – Limit records to a top or bottom range.
Normalizer – Normalize records including those
read from COBOL sources.
Lookup – Get related values.
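A minimal Python sketch of what three of these transformations do to a stream of rows follows. The data, the lookup table and the choice of "top 2" are hypothetical; the Normalizer is not shown.

```python
from itertools import count

orders = [
    {"cust_id": "C2", "amount": 250},
    {"cust_id": "C1", "amount": 900},
    {"cust_id": "C3", "amount": 40},
]
customers = {"C1": "Acme", "C2": "Globex"}  # hypothetical lookup table

# Sequence Generator: attach a generated key to each row.
next_key = count(start=1)
keyed = [dict(o, order_key=next(next_key)) for o in orders]

# Lookup: fetch a related value; None when there is no match.
looked_up = [dict(o, cust_name=customers.get(o["cust_id"])) for o in keyed]

# Rank: keep only the top 2 rows by amount.
top2 = sorted(looked_up, key=lambda o: o["amount"], reverse=True)[:2]
```

Here the row for C3 gets no customer name because the lookup finds no match, and it is also dropped by the rank step because it has the smallest amount.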
30. Transformations
Update Strategy – Determine whether to insert,
update, delete or reject data.
Joiner – Join records from different databases
or flat file systems.
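The idea behind an update strategy is that each row is flagged with the operation to perform on the target. The sketch below shows one way such a decision could look in Python; the enum, the function and the rules themselves are illustrative assumptions.

```python
from enum import Enum

class Action(Enum):
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"
    REJECT = "reject"

def choose_action(row, existing_keys):
    """Flag a row for the target; the rules here are illustrative only."""
    if row.get("id") is None:
        return Action.REJECT            # no key, so the row cannot be applied
    known = row["id"] in existing_keys
    if row.get("deleted"):
        return Action.DELETE if known else Action.REJECT
    return Action.UPDATE if known else Action.INSERT

existing = {1, 2}
```

With these rules, a row with id 3 is inserted, a row with id 1 is updated, and a row with no id is rejected.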
31. Transformations
Every mapping needs at least one
Source Qualifier transformation, or a
Normalizer transformation for COBOL
sources.
32. Ports
A port represents a single column of data.
Every source definition, target definition and
transformation contains a collection of ports.
33. Ports
There are four types of ports:
Input port – Receives data.
Output port – Provides data.
Input/Output port – Passes data through.
Variable port – Stores intermediate components
of an expression.
34. Ports
Source definitions contain only output ports, since
they provide data.
Target definitions contain only input ports, since they
receive data.
Transformations contain a combination of input,
output and input/output ports, since, depending on
its type, a transformation can pass data through
unchanged or modify it.
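The port types can be pictured with a small Python function that stands in for one transformation. The function and column names are hypothetical; the comments mark which value plays the role of which kind of port.

```python
def name_transformation(row):
    """Stand-in for a transformation with all four kinds of ports."""
    # Input ports: values the transformation only receives.
    first, last = row["first_name"], row["last_name"]
    # Variable port: an intermediate value that is not passed downstream.
    v_full = f"{first} {last}".strip()
    return {
        "emp_id": row["emp_id"],       # input/output port: passed through unchanged
        "full_name": v_full.title(),   # output port: a newly computed value
    }

result = name_transformation(
    {"emp_id": 7, "first_name": "ada", "last_name": "lovelace"}
)
```

Only emp_id and full_name leave the function, just as only output and input/output ports provide data to the next object in a mapping.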
35. Transformation Language
Transformation Language is used to write
expressions for Transformations. It consists of
functions (similar to SQL) used to modify the
data or validate the data.
36. Transformation Language
Expressions can be written in the following
types of transformations:
Aggregator
Expression
Filter
Rank
Update Strategy.
37. Transformation Language
The Transformation Language consists of the
following components:
Functions: e.g. AVG, COUNT, ISNULL,
SUBSTR, IIF.
Operators: e.g. addition, subtraction,
multiplication, division.
Constants: e.g. built-in constants such as TRUE.
Variables: e.g. SYSDATE, which represents the
current date.
Return values.
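To give a feel for what such functions do, here are rough Python analogues of IIF, ISNULL, SUBSTR and AVG. These are sketches under stated assumptions, not the Transformation Language itself: substr uses a 1-based start position as SQL does and handles only positive positions, and avg skips missing values as SQL's AVG does.

```python
def iif(condition, when_true, when_false=None):
    """Return one of two values depending on a condition."""
    return when_true if condition else when_false

def isnull(value):
    """Treat Python's None as the null value."""
    return value is None

def substr(text, start, length=None):
    """Substring with a 1-based start position (positive positions only)."""
    if start < 1:
        raise ValueError("this sketch handles only start >= 1")
    begin = start - 1
    end = None if length is None else begin + length
    return text[begin:end]

def avg(values):
    """Average of the non-null values, or None if there are none."""
    present = [v for v in values if not isnull(v)]
    return sum(present) / len(present) if present else None

label = iif(isnull(None), "missing", "ok")
```

Unlike an expression in an ETL tool, the Python iif evaluates both result arguments before choosing one, which matters only if they have side effects.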
38. Mapplets
A Mapplet is a reusable object created in a
repository that represents a set of
transformations.
39. Summary
Basic steps to create a project:
Create the database that will contain the repository.
Create the data model for the target.
Create repositories.
Create folders within the repositories.
Import the source definitions.
Create the targets that will receive data.
40. Summary
Create mappings between sources and targets,
including transformations that modify the data.
Create source and target connections in the Server
Manager.
Create sessions for transferring data between
sources and targets.
Schedule and run the sessions.