
Management in Informatica Power Center



Informatica provides the market's leading data integration platform. Tested on nearly 500,000 combinations of platforms and applications, the platform interoperates with the broadest possible range of disparate standards, systems, and applications. This unbiased and universal view makes Informatica unique in today's data integration market. It also makes Informatica the ideal strategic platform for companies looking to solve data integration issues of any size.

Published in: Technology

  1. 1. Webinar session Management in Informatica Power Center
  2. 2. Slide 2 Webinar Topics
     • Topic 1 » Informatica PowerCenter 9.X – An overview
     • Topic 2 » Error Handling In Informatica
     • Topic 3 » Informatica Domain & Repository Management
     • Topic 4 » Informatica Recovery Concepts
     • Topic 5 » Informatica PowerCenter Log Management
  3. 3. Slide 3 Informatica PowerCenter 9.X – An overview
  4. 4. Slide 4 Objectives At the end of this module, you will be able to:
     • Understand Informatica & the Informatica Product Suite
     • Explain Error Handling in Informatica
     • Understand Informatica Domain & Repository Management
     • Understand Informatica Recovery Concepts
     • Understand PowerCenter Log Management
  5. 5. Slide 5 Informatica – A Product Company Informatica Corp. provides data integration software and services for various businesses, industries and government organizations including telecommunication, health care, financial and insurance services
  6. 6. Slide 6 Informatica Products & Their Functionalities
     • A wide range of products is available under the Informatica product suite to help satisfy data integration requirements within the enterprise and beyond
     • Informatica's product portfolio is focused on Data Integration: » Data Integration & ETL » Information Lifecycle Management » Complex Event Processing » Data Masking » Data Quality » Data Replication » Data Virtualization » Master Data Management » Ultra Messaging
     • Currently at version 9.6, these components form a toolset for establishing and maintaining enterprise-wide data warehouses
  7. 7. Slide 7 Informatica Products & Their Functionalities (Contd.)
  8. 8. Slide 8 Informatica Products & Their Functionalities (Contd.)
  9. 9. Slide 9 Informatica Products & Their Functionalities (Contd.)
     • PowerCenter - A fully integrated, end-to-end data integration platform. Informatica PowerCenter Enterprise converts raw data into information to drive analysis, daily operations, and data governance initiatives
     • Information Lifecycle Management - Informatica’s Information Lifecycle Management software empowers IT organizations to cost-effectively handle data growth, safely retire legacy systems and applications, optimize test data management and protect sensitive data
     • Complex Event Processing - Informatica RulePoint is complex event processing software that delivers real-time alerts and insight into pertinent information, helping organizations operate in a smarter, faster, more efficient and more competitive way
     • Data Masking - Informatica Data Masking products dynamically mask sensitive production data against unauthorized access and permanently and irreversibly mask nonproduction data. This helps IT organizations comply with data privacy regulations and organization-wide data privacy mandates, and reduces the risk of a data breach
  10. 10. Slide 10 Informatica Products & Their Functionalities (Contd.)
     • Data Quality - Informatica Data Quality provides clean, high-quality data to the business regardless of size, data format, platform, or technology. It helps validate and improve address information, profile and cleanse business data, implement a data governance practice and ensure that data quality requirements are met
     • Data Replication - Informatica Data Replication is database-agnostic, real-time transaction replication software that’s highly scalable, reliable, and non-disruptive to the performance of operational source systems
     • Data Virtualization - Informatica Data Services provides a single scalable architecture for both data integration and data federation, creating a data virtualization layer that hides and handles the complexity of accessing underlying data sources, all while insulating them from change
     • Master Data Management - The Informatica Master Data Management (MDM) product family delivers consolidated and reliable business-critical data (also known as master data) to the applications that employees rely on every day
     • Ultra Messaging - Informatica Ultra Messaging is a family of next-generation, low-latency messaging middleware products. With very high throughput and 24x7 reliability, they deliver extremely low-latency application messaging over both network-based and shared-memory (inter-process) transports
  11. 11. Slide 11 Informatica Resources
     • Informatica Corporate Website
     • Informatica University
     • Customer Portal
     • Product Documentation
     • Knowledge Base
     • Technical Support
     • Informatica Product Certification
  12. 12. Slide 12 Introduction to PowerCenter PowerCenter:
     • It is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover and integrate data from virtually any business system, in any format, and deliver that data throughout the enterprise at any speed
     • It is an ETL (Extract, Transform and Load) tool
     • PowerCenter can read from a variety of different sources and write to as many targets, while transforming data in between
     • The main advantages of PowerCenter over other ETL tools, and hence the reasons for its popularity, are as follows: » It is robust, and can be used on both Windows and UNIX based systems » It is high performing yet very simple to develop with, maintain and administer
  13. 13. Slide 13 Versions of PowerCenter PowerCenter Version History:
     • The current version of PowerCenter is Informatica PowerCenter 9.6.1 HF2 (as of Feb ’15)
     • From version 9.x onwards, PowerCenter has become service oriented, with each server component being identified as a service (e.g. Repository service, Integration service)
     • Earlier versions are no longer in use or supported by Informatica
     • For more information please visit
  14. 14. Slide 14 PowerCenter Architecture - SOA
     • The architecture of Informatica PowerCenter (version 9.x onwards) is based on the Service Oriented Architecture (SOA) concept
     • A service oriented architecture (SOA) can be defined as a group of services that communicate with each other. The communication involves either simple data passing or two or more services coordinating the same activity
     • Informatica 9.x represents a major change in the architecture of the product line
     Aim: To provide improved performance and high availability
     Approach: Through reengineering, the underlying architecture has been made even more service-based
  15. 15. Slide 15 PowerCenter Architecture - Single Unified Architecture
  16. 16. Slide 16 Error Handling In Informatica
  17. 17. Slide 17 Error Handling In Informatica Error handling is one of the must-have components in any Data Warehouse or Data Integration project. When we start such a project, business users come up with a set of exceptions to be handled in the ETL process. This section covers how to handle these user-defined errors easily. Identifying errors and creating an error handling strategy is very important. The two types of errors in an ETL process are Data Errors and Process Errors.
     Data Errors : To handle data errors we can use the Row Error Logging feature. The errors are captured in the error tables. We can then analyse, correct and reprocess them.
     Process Errors : To handle process errors we can configure an email task to send a notification when a session fails.
  18. 18. Slide 18 Error Handling In Informatica INFORMATICA FUNCTIONS USED We use two Informatica PowerCenter functions to define our user-defined error capture logic.
     • ERROR() : Causes the PowerCenter Integration Service to skip a row and issue an error message, which you define. The error message is displayed in the session log or written to the error log tables, depending on the error logging type configured in the session.
     • ABORT() : Stops the session and issues a specified error message, which is written to the session log file or to the error log tables, depending on the error logging type configured in the session. When the PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that row. It processes any rows read before the session aborts.
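The difference between the two functions can be illustrated with a small Python model. This is only a sketch of the semantics described on the slide, not PowerCenter code: the exception classes, the `validate` rules and the sample rows are all hypothetical, and the real functions are called from within PowerCenter expressions.

```python
class RowError(Exception):
    """Models ERROR(): skip the current row and record a message."""


class SessionAbort(Exception):
    """Models ABORT(): stop transforming data at the current row."""


def validate(row):
    # Hypothetical validation rules, chosen only for illustration.
    if row.get("customer_id") is None:
        raise RowError("customer_id is NULL")
    if row.get("amount", 0) < 0:
        raise SessionAbort("negative amount, stopping the session")
    return row


def run_session(rows):
    """Process rows in order and return (loaded_rows, error_log)."""
    loaded, error_log = [], []
    for row_number, row in enumerate(rows, start=1):
        try:
            loaded.append(validate(row))
        except RowError as exc:
            # ERROR(): log the message, skip this row, keep going.
            error_log.append((row_number, str(exc)))
        except SessionAbort as exc:
            # ABORT(): log the message and stop; earlier rows stay loaded.
            error_log.append((row_number, str(exc)))
            break
    return loaded, error_log


rows = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": None, "amount": 5},   # skipped by the ERROR() rule
    {"customer_id": 2, "amount": 7},
    {"customer_id": 3, "amount": -1},     # triggers the ABORT() rule
    {"customer_id": 4, "amount": 9},      # never reached
]
loaded, error_log = run_session(rows)
```

In this model the skipped row does not stop the run, while the aborting row does, which mirrors the statement that rows read before the abort are still processed.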
  19. 19. Slide 19 Error Handling In Informatica
  20. 20. Slide 20 Error Handling In Informatica INFORMATICA ERROR TABLES
     • Once error logging is configured, Informatica PowerCenter creates four tables for error logging. The table details are as follows:
     • ETL_PMERR_DATA : Stores data about a transformation row error and its corresponding source row.
     • ETL_PMERR_MSG : Stores metadata about an error and the error message.
     • ETL_PMERR_SESS : Stores metadata about the session.
     • ETL_PMERR_TRANS : Stores metadata about the source and transformation ports when an error occurs.
     • With this, we are done with the settings required to capture user-defined errors. Any data record that violates our data validation checks is captured in the PMERR tables mentioned above.
  21. 21. Slide 21 Error Handling In Informatica REPORT THE ERROR DATA
     • Now that the error data is stored in the error tables, we can pull an error report using an SQL query.
     • By joining the error tables, we can get more information about each error:
       select sess.FOLDER_NAME     as "Folder Name",
              sess.WORKFLOW_NAME   as "WorkFlow Name",
              sess.TASK_INST_PATH  as "Session Name",
              data.SOURCE_ROW_DATA as "Source Data",
              msg.ERROR_MSG        as "Error MSG"
       from ETL_PMERR_SESS sess
       left outer join ETL_PMERR_DATA data
              on data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
             and data.SESS_INST_ID = sess.SESS_INST_ID
       left outer join ETL_PMERR_MSG msg
              on msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
             and msg.SESS_INST_ID = sess.SESS_INST_ID
       where sess.FOLDER_NAME = <Project Folder Name>
         and sess.WORKFLOW_NAME = <Workflow Name>
         and sess.TASK_INST_PATH = <Session Name>
         and sess.SESS_START_TIME = <Session Run Time>
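To see the shape of the report, the query can be tried against toy copies of the error tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table definitions are cut down to the columns the query uses, the sample values (`SALES`, `wf_load_orders` and so on) are invented, and the session start time filter is left out for brevity; the real PMERR tables contain more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified stand-ins for the error tables (assumption: only the columns
# referenced by the report query are modeled here).
conn.executescript("""
CREATE TABLE ETL_PMERR_SESS (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, FOLDER_NAME TEXT,
    WORKFLOW_NAME TEXT, TASK_INST_PATH TEXT, SESS_START_TIME TEXT);
CREATE TABLE ETL_PMERR_DATA (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, SOURCE_ROW_DATA TEXT);
CREATE TABLE ETL_PMERR_MSG (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, ERROR_MSG TEXT);

INSERT INTO ETL_PMERR_SESS VALUES
    (101, 7, 'SALES', 'wf_load_orders', 's_load_orders', '2015-02-01 02:00:00');
INSERT INTO ETL_PMERR_DATA VALUES (101, 7, '42|NULL|19.99');
INSERT INTO ETL_PMERR_MSG VALUES (101, 7, 'customer_id is NULL');
""")

REPORT_SQL = """
SELECT sess.FOLDER_NAME, sess.WORKFLOW_NAME, sess.TASK_INST_PATH,
       data.SOURCE_ROW_DATA, msg.ERROR_MSG
FROM ETL_PMERR_SESS sess
LEFT OUTER JOIN ETL_PMERR_DATA data
       ON data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
      AND data.SESS_INST_ID = sess.SESS_INST_ID
LEFT OUTER JOIN ETL_PMERR_MSG msg
       ON msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
      AND msg.SESS_INST_ID = sess.SESS_INST_ID
WHERE sess.FOLDER_NAME = ?
  AND sess.WORKFLOW_NAME = ?
  AND sess.TASK_INST_PATH = ?
"""

# Bind the filter values as parameters instead of pasting them into the SQL.
report = conn.execute(
    REPORT_SQL, ("SALES", "wf_load_orders", "s_load_orders")
).fetchall()
```

Each row of `report` pairs the session identification with the offending source row and its error message.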
  22. 22. Slide 22 Informatica Domain & Repository Management
  23. 23. Slide 23 Overview of PowerCenter Architecture The PowerCenter tool consists of:
     • Client components
     • Server components
  24. 24. Slide 24 Client Components of PowerCenter
     • PowerCenter Repository Manager
     • PowerCenter Designer
     • PowerCenter Workflow Manager
     • PowerCenter Workflow Monitor
     • PowerCenter Administration Console (browser based)
  25. 25. Slide 25 Server Components of PowerCenter The PowerCenter server components comprise the following services:
     • Repository service: The Repository service manages the repository. It retrieves, inserts, and updates metadata in the repository database tables
     • Integration service: The Integration service runs sessions and workflows
     • SAP BW service: The SAP BW service listens for RFC requests from SAP BW and initiates workflows to extract data from, or load data into, SAP BW
     • Web services hub: The Web services hub receives requests from web service clients and exposes PowerCenter workflows as services
  26. 26. Slide 26 Overall Architecture of PowerCenter PowerCenter 9.x Architecture
  27. 27. Slide 27 Informatica - Domain & Nodes
     The salient features of a Domain are as follows:
     • A Domain is a logical collection or set of nodes and services
     • The PowerCenter Domain is the fundamental administrative unit of PowerCenter
     • A Domain can be a single PowerCenter installation, or it can consist of multiple PowerCenter installations
     The salient features of a node are as follows:
     • A node is a logical representation of a physical machine. It has physical attributes such as a hostname and a port number
     • Each node runs a service manager which is responsible for the application and core services
     • A node can be a gateway node or a worker node, but it can belong to only one Domain
  28. 28. Slide 28 Gateway Node A gateway node can be described as follows:
     • The gateway node is the node where all core services are meant to run
     • The primary function of a gateway node is to route all service requests from the PowerCenter client to other available nodes
     • If the gateway node is unavailable, the Domain cannot accept any service requests
     • Only one node within the Domain can act as the gateway at any given point in time
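The relationship between a Domain, its nodes and the gateway can be sketched as a small data model. This is an illustrative Python sketch only: the class names, the `route` method, the host names and the port number are assumptions made for the example and do not correspond to any Informatica API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Logical representation of a physical machine in the Domain."""
    hostname: str
    port: int
    is_gateway: bool = False
    services: set = field(default_factory=set)
    available: bool = True


@dataclass
class Domain:
    """Logical collection of nodes and services."""
    name: str
    nodes: list = field(default_factory=list)

    def gateway(self):
        # Only one node acts as the gateway at any given point in time.
        gateways = [n for n in self.nodes if n.is_gateway and n.available]
        return gateways[0] if gateways else None

    def route(self, service_name):
        """Route a client request through the gateway to a node running the service."""
        if self.gateway() is None:
            raise RuntimeError("gateway unavailable: domain cannot accept requests")
        for node in self.nodes:
            if node.available and service_name in node.services:
                return node
        raise LookupError(f"no available node runs {service_name}")


domain = Domain("Domain_Dev", [
    Node("gw01", 6005, is_gateway=True, services={"Repository service"}),
    Node("wk01", 6005, services={"Integration service"}),
])
target = domain.route("Integration service")
```

Marking the gateway node as unavailable makes `route` fail for every service, which reflects the point that a Domain without a gateway cannot accept service requests.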
  29. 29. Slide 29 Informatica- Domain & Nodes (Summarization)
  30. 30. Slide 30 How different components of PowerCenter interact
  31. 31. Slide 31 Informatica Repository Management
     #1. A repository is a generic term for a container or place where something is stored.
     #2. The Informatica repository is a set of database tables where Informatica stores its metadata. Metadata is data that describes other data; more specifically, it is data about data.
     #3. The Informatica repository holds Informatica metadata: information about different types of objects, for example mappings, transformations, folders, connections and user privileges.
     #4. In industry, the Informatica repository metadata tables are also called OPB tables/views or REP tables/views.
     #5. The repository is managed with the client tool “Informatica PowerCenter Repository Manager”, which is useful for admin activities:
        1 You can create, edit and delete folders.
        2 You can manage object and user permissions.
        3 You can back up the repository to a local machine and restore it to another server.
        4 You can create deployment groups.
        5 You can view objects and their locks, and disable the write-intent lock on objects locked by you.
        6 You can import and export objects.
        7 You can copy objects from one folder to another.
  32. 32. Slide 32 Informatica Recovery Concepts
  33. 33. Slide 33 Informatica Recovery Concepts # Informatica Recovery Strategies • Workflow Configuration for Recovery • Session Recovery
  34. 34. Slide 34 Informatica Recovery Concepts # Workflow Recovery
     • Workflow recovery allows you to continue processing the workflow and workflow tasks from the point of interruption.
     • During the workflow recovery process, the Integration Service accesses the workflow state, which is stored in memory or on disk depending on the recovery configuration.
     • The workflow state of operation includes the status of tasks in the workflow and workflow variable values.
     • The configuration includes:
        1. Workflow Configuration for Recovery
        2. Session and Tasks Configuration for Recovery
        3. Recovering the Workflow from Failure
  35. 35. Slide 35 Informatica Recovery Concepts 1. Workflow Configuration for Recovery To configure a workflow for recovery, we must enable the workflow for recovery or configure the workflow to suspend on task error. Enable Recovery : When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. You can recover the workflow if it terminates, stops, or aborts. The workflow does not have to be running.
  36. 36. Slide 36 Informatica Recovery Concepts 1. Workflow Configuration for Recovery Suspend : When you configure a workflow to suspend on error, the Integration Service stores the workflow state of operation in memory. You can recover the suspended workflow if a task fails: fix the task error and then recover the workflow. If the workflow is not able to recover automatically from failure within the maximum allowed number of attempts, it goes to the 'suspended' state.
  37. 37. Slide 37 Informatica Recovery Concepts 2. Session and Tasks Configuration for Recovery Each session or task in a workflow has its own recovery strategy. When the Integration Service recovers a workflow, it recovers each task or session according to the recovery strategy specified for it. Three different options are available:
        1. Restart task
        2. Fail task and continue workflow
        3. Resume from the last checkpoint
  38. 38. Slide 38 Informatica Recovery Concepts
        1. Restart task : This recovery strategy is available for all types of workflow tasks. When the Integration Service recovers a workflow, it restarts each recoverable task that is configured with a restart strategy. You can configure Session and Command tasks with a restart recovery strategy. All other tasks have a restart recovery strategy by default.
        2. Fail task and continue workflow : This recovery strategy is only available for Session and Command tasks. When the Integration Service recovers a workflow, it does not recover the task. The task status becomes failed, and the Integration Service continues running the workflow. Configure a fail recovery strategy if you want to complete the workflow, but you do not want to recover the task.
        3. Resume from the last checkpoint : This recovery strategy is only available for Session tasks. The Integration Service saves the session state of operation and maintains target recovery tables. If the session aborts, stops, or terminates, the Integration Service uses the saved recovery information to resume the session from the point of interruption.
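The three task recovery strategies can be summarized as a dispatch on the strategy configured for each task. The following Python sketch is a simplified model under stated assumptions: the `Strategy` enum, the `recover_task` function, the task names and the row-count checkpoint are all hypothetical, and real checkpoints are kept in target recovery tables rather than as a single number.

```python
from enum import Enum


class Strategy(Enum):
    RESTART = "restart task"
    FAIL_AND_CONTINUE = "fail task and continue workflow"
    RESUME_FROM_CHECKPOINT = "resume from the last checkpoint"


def recover_task(strategy, checkpoint=0):
    """Return (status, first_row_to_process) for one interrupted task."""
    if strategy is Strategy.RESTART:
        # Run the task again from the beginning.
        return "running", 0
    if strategy is Strategy.FAIL_AND_CONTINUE:
        # Do not recover the task; the workflow moves on without it.
        return "failed", None
    if strategy is Strategy.RESUME_FROM_CHECKPOINT:
        # Continue from the saved point of interruption.
        return "running", checkpoint
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical interrupted tasks: (task name, strategy, rows already committed).
tasks = [
    ("cmd_archive_files", Strategy.RESTART, 0),
    ("s_load_dimension", Strategy.FAIL_AND_CONTINUE, 0),
    ("s_load_fact", Strategy.RESUME_FROM_CHECKPOINT, 5000),
]
plan = {name: recover_task(strategy, cp) for name, strategy, cp in tasks}
```

Here only the session configured to resume from the last checkpoint avoids reprocessing rows that were already committed.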
  39. 39. Slide 39 Informatica Recovery Concepts 3. Recovering the Workflow from Failure A workflow can be recovered either automatically or manually, depending on the workflow recovery strategy.
     Recovering Automatically : If you have a High Availability (HA) licence and the workflow is configured to recover automatically as described above, the Integration Service automatically attempts to recover the workflow based on the recovery strategy set for each session or task in the workflow. If the workflow is not able to recover automatically from failure within the maximum allowed number of attempts, it goes to the 'suspended' state, from which it can be recovered manually.
     Recovering Manually : If you do not have a High Availability (HA) licence, you can manually recover the workflow, or individual tasks within a workflow separately. You can access these options from the Workflow Manager or from the Workflow Monitor.
  40. 40. Slide 40 Informatica Recovery Concepts 3. Recovering the Workflow from Failure Recovering Manually
     • Recover workflow : Continue processing the workflow from the point of interruption.
     • Recover task : Recover a session but not the rest of the workflow.
     • Recover workflow from a task : Recover a session and continue processing the workflow.
  41. 41. Slide 41 PowerCenter Log Management
  42. 42. Slide 42 Informatica Log Management The Integration Service generates two logs when the mapping runs:
        1) Session log - Has the details of the task, session errors and load statistics.
        2) Workflow log - Has the details of the workflow processing and workflow errors.
     The workflow log is generated when the workflow starts, and the session log is generated once the session is initiated.
  43. 43. Slide 43 Informatica Log Management The following process takes place when the workflow is initiated:
        1. The Integration Service writes binary log files on the node. It sends information about the sessions and workflows to the Log Manager.
        2. The Log Manager stores information about workflow and session logs in the domain configuration database. The domain configuration database stores information such as the path to the log file location, the node that contains the log, and the Integration Service that created the log.
        3. When you view a session or workflow in the Log Events window, the Log Manager retrieves the information from the domain configuration database to determine the location of the session or workflow logs.
        4. The Log Manager dispatches a Log Agent to retrieve the log events on each node to display in the Log Events window.
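The four steps above amount to a two-stage lookup: metadata first, log events second. The Python sketch below models that flow with plain dictionaries. Everything in it is a stand-in chosen for illustration: the file path, the node and service names, and the `log_agent` and `view_log` functions are hypothetical and not part of any PowerCenter interface.

```python
# Step 1: the Integration Service writes binary log files on the node.
# Modeled here as node -> {file path -> list of log events}.
node_files = {
    "node01": {
        "/logs/s_load_orders.bin": ["session started", "loaded 3 rows"],
    },
}

# Step 2: the domain configuration database stores only metadata about
# each log: the file path, the node, and the creating Integration Service.
domain_config_db = {
    "s_load_orders": {
        "path": "/logs/s_load_orders.bin",
        "node": "node01",
        "integration_service": "IS_Dev",
    },
}


def log_agent(node, path):
    """Step 4: retrieve the log events from the node that holds the file."""
    return node_files[node][path]


def view_log(session_name):
    """Steps 3 and 4: look up the log location, then fetch the events."""
    entry = domain_config_db[session_name]
    return log_agent(entry["node"], entry["path"])


events = view_log("s_load_orders")
```

The point of the split is that the domain configuration database never holds the log events themselves, only where to find them.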
  44. 44. Slide 44 Informatica Log Management When a workflow is invoked, the Integration Service creates the following output files:
     • Workflow log : The Integration Service process creates a workflow log for each workflow it runs. It writes information in the workflow log such as initialization of processes, workflow task run information, errors encountered, and workflow run summary.
     • Session log : The Integration Service process creates a session log for each session it runs. It writes information in the session log such as initialization of processes, session validation, creation of SQL commands for reader and writer threads, errors encountered, and load summary.
     • Session detail : When you run a session, the Workflow Manager creates session details that provide load statistics for each target in the mapping.
     • Performance detail : Performance details provide transformation-by-transformation information on the flow of data through the session.
     • Reject files : By default, the Integration Service process creates a reject file for each target in the session. The reject file contains rows of data that the writer does not write to targets.
  45. 45. Slide 45 Informatica Log Management When a workflow is invoked, the Integration Service creates the following output files (Contd.):
     • Row error logs : When a row error occurs, the Integration Service process logs error information that allows you to determine the cause and source of the error.
     • Recovery tables : The Integration Service process creates recovery tables on the target database system when it runs a session enabled for recovery. When you run a session in recovery mode, the Integration Service uses the information in these tables to resume the session.
     • Indicator file : If you use a flat file as a target, you can configure the Integration Service to create an indicator file for target row type information. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete, or reject.
     • Cache files : When the Integration Service process creates a memory cache, it also creates cache files. The Integration Service process creates cache files for the following mapping objects: Aggregator transformation, Joiner transformation, Rank transformation, Lookup transformation, Sorter transformation, XML target.
  46. 46. Slide 46 Informatica Log Management Informatica Logs - Different Types of Tracing Levels In Informatica Tracing levels can be configured at the transformation level and/or the session level. There are four tracing levels, plus None, which applies only at the session level:
     • None: Applicable only at session level. The Integration Service uses the tracing levels configured in the mapping.
     • Terse: The Integration Service logs initialization information, error messages, and notification of rejected data in the session log file.
     • Normal: The Integration Service logs initialization and status information, errors encountered, and rows skipped due to transformation row errors. It summarizes session results, but not at the level of individual rows.
     • Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details, the names of index and data files used, and detailed transformation statistics.
     • Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. It also notes where the Integration Service truncates string data to fit the precision of a column, and provides detailed transformation statistics. When you configure the tracing level to verbose data, the Integration Service writes row data for all rows in a block when it processes a transformation.
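Because each level adds to the one before it, the tracing levels can be modeled as an ordered scale. The Python sketch below is a hypothetical model, not an Informatica API. It assumes that a session-level setting other than None takes precedence over the level configured on the transformation, which the slide implies but does not state outright.

```python
from enum import IntEnum


class Tracing(IntEnum):
    """Tracing levels ordered from least to most detailed."""
    TERSE = 1
    NORMAL = 2
    VERBOSE_INITIALIZATION = 3
    VERBOSE_DATA = 4


def effective_level(transformation_level, session_level=None):
    """Resolve the level actually used for a transformation.

    session_level=None models the session-level 'None' setting: the level
    configured in the mapping is used. Any other session-level value is
    assumed to override the transformation's own level.
    """
    if session_level is None:
        return transformation_level
    return session_level


def logs_each_row(level):
    """Only Verbose Data writes individual rows to the session log."""
    return level >= Tracing.VERBOSE_DATA


from_mapping = effective_level(Tracing.NORMAL)
overridden = effective_level(Tracing.NORMAL, Tracing.VERBOSE_DATA)
```

A common reason to care about the resolved level is log volume: row-level logging applies only at Verbose Data, so it is usually enabled for targeted debugging rather than routine runs.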
  47. 47. Slide 47 Informatica Log Management
  48. 48. Slide 48 Informatica Log Management
  49. 49. Slide 49 Course Topics
     • Module 1 » Informatica PowerCenter 9.X – An overview
     • Module 2 » ETL Fundamentals
     • Module 3 » PowerCenter Designer
     • Module 4 » PowerCenter Workflow Manager & Monitor
     • Module 5 » Advanced Transformation Techniques
     • Module 6 » Parameters & Variables
     • Module 7 » Debugging Troubleshooting Error Handling & Recovery
     • Module 8 » Cache
     • Module 9 » Performance Tuning & Optimization
     • Module 10 » PowerCenter Repository Manager
     • Module 11 » Informatica Administration Console & Security
     • Module 12 » Informatica 9.X - Technical Architecture
     • Module 13 » Informatica Installation & Operations Manual
     • Module 14 » Command line utilities
     • Module 15 » ETL Scenarios using Informatica
     • Module 16 » Best Practices & Velocity Methodologies
  50. 50. Slide 50 Questions
  51. 51. Slide 51