Informatica partitions


Types of Partitions in Informatica 8

Published in: Education, Technology, Business
  • Memory Optimization 15. This diagram simplifies how the DTM uses memory. The DTM Buffer lets each thread pass data to the next thread, and lets the Writer receive data to pass to the target. The DTM Buffer is divided into blocks, and different threads control different blocks. If there are multiple transformation threads, each requires its own set of blocks to pass data to the next thread. The number of required blocks is therefore a function of the number of sources, targets, and stages in your pipeline. In addition to the DTM Buffer, certain transformations require memory known as transformation caches. These caches reside outside the DTM Buffer, so they are an additional memory requirement beyond it.
  • Memory Optimization 15. The transformation caches are separate from the DTM Buffer.
  • Memory Optimization 15. Use the auto settings as a starting point. Check the session log to see the actual runtime allocations. Note that each transformation stage also requires a minimum of 2 blocks.

    1. Informatica Partitioning
    2. Partitioning Sessions
       • You can improve performance by creating multiple partitions of the pipeline so that a single session processes data in parallel. If the PowerCenter partitioning option is available, you can increase the number of partitions in a pipeline to improve session performance. With more partitions, the Integration Service can create multiple connections to sources and process partitions of source data concurrently.
    3. Session Partition
       [Diagram: source data passes through READER, TRANSFORMATION, and WRITER threads to target data, shown for THREAD 1 and THREAD 2.]
    4. Partition Points & Partitions
    5. Partition Types
       • Round-robin Partitioning
       • Hash Partitioning
       • Key Range Partitioning
       • Pass-through Partitioning
    6. Partition Types
       • Round-robin Partitioning: The Integration Service distributes rows evenly among all partitions. Use round-robin partitioning when you need to distribute rows evenly and do not need to group data among partitions.
       • Hash Partitioning: The Integration Service uses a hash function to group rows of data among partitions. It groups the data based on a partition key. There are two types of hash partitioning:
    7. Partition Types
       • Hash auto-keys: The Integration Service uses all grouped or sorted ports as a compound partition key. You can use hash auto-keys partitioning at or before Rank, Sorter, and unsorted Aggregator transformations to ensure that rows are grouped properly before they enter these transformations.
       • Hash user keys: The Integration Service uses a hash function to group rows of data among partitions based on a user-defined partition key. You choose the ports that define the partition key.
    8. Partition Types
       • Key Range Partitioning: You specify one or more ports to form a compound partition key for a source or target. The Integration Service then passes data to each partition depending on the ranges you specify for each port.
       • Pass-through Partitioning: The Integration Service passes all rows at one partition point to the next partition point without redistributing them.
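
The difference between round-robin and key range distribution can be illustrated with a small sketch. This is not Informatica code: the function names, the `order_id` port, and the boundary semantics (each boundary is the inclusive start of the next range) are assumptions made only for illustration.

```python
from bisect import bisect_right

def round_robin(rows, n_partitions):
    """Deal rows out one at a time so partition sizes differ by at most one."""
    partitions = [[] for _ in range(n_partitions)]
    for i, row in enumerate(rows):
        partitions[i % n_partitions].append(row)
    return partitions

def key_range(rows, key, boundaries):
    """Route each row by where its key falls among sorted boundaries.

    boundaries=[100, 200] gives three ranges (an assumed convention):
    key < 100, 100 <= key < 200, and key >= 200.
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, row[key])].append(row)
    return partitions

# Hypothetical sample rows with a single 'order_id' port.
rows = [{"order_id": i} for i in (5, 150, 99, 100, 250, 42, 201)]

rr = round_robin(rows, 3)
kr = key_range(rows, "order_id", [100, 200])

print([len(p) for p in rr])                       # sizes are as even as possible
print([[r["order_id"] for r in p] for p in kr])   # rows grouped by key range
```

Round-robin ignores row content, so it balances load but does not keep related rows together. Key range keeps each range together, but the partition sizes then depend on how the key values are distributed. Pass-through needs no routing function at all, because rows stay in the partition they are already in.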
    9. Optimizing Sorter/Aggregator with Partitions
       • Add a hash auto-keys partition point at the Sorter/Aggregator transformation.
       • To obtain the expected results and the best performance when partitioning a Sorter/Aggregator transformation, you must group and sort data. To group data, ensure that rows with the same key value are routed to the same partition. The best way to ensure that data is grouped and distributed evenly among partitions is to add a hash auto-keys partition.
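
The reason grouping matters for an Aggregator can be shown with a small sketch. This is a toy model rather than Informatica behavior: the `dept` and `amount` ports, the helper names, and the use of `zlib.crc32` as the hash function are all assumptions for illustration.

```python
import zlib
from collections import defaultdict

def round_robin(rows, n):
    """Distribute rows evenly without looking at their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, key, n):
    """Route rows so that equal key values always share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode("utf-8")) % n].append(row)
    return parts

def split_keys(partitions, key):
    """Return the key values that appear in more than one partition."""
    seen = defaultdict(set)
    for idx, part in enumerate(partitions):
        for row in part:
            seen[row[key]].add(idx)
    return sorted(k for k, idxs in seen.items() if len(idxs) > 1)

def aggregate(partition, key, value):
    """Sum 'value' per key inside a single partition."""
    totals = defaultdict(int)
    for row in partition:
        totals[row[key]] += row[value]
    return dict(totals)

# Hypothetical input rows.
rows = [{"dept": d, "amount": a}
        for d, a in [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 2), ("A", 3)]]

rr_parts = round_robin(rows, 2)
hash_parts = hash_partition(rows, "dept", 2)

print(split_keys(rr_parts, "dept"))    # groups spread across partitions
print(split_keys(hash_parts, "dept"))  # no group is split

# Because no group is split, per-partition totals can simply be combined.
merged = {}
for part in hash_parts:
    merged.update(aggregate(part, "dept", "amount"))
print(merged)
```

With round-robin, each partition sees only part of a group, so a per-partition aggregate would emit partial totals for the same key. With hash routing on the group key, every partition holds complete groups and its totals are final.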
    10. How Hash Key Partitioning Works
       • Hash partitioning maps data to partitions based on a hashing algorithm applied to the specified partitioning keys.
       • The hash function is deterministic: rows with the same key values always produce the same hash value, so they are always placed in the same partition (bucket). Rows whose keys are merely similar are not guaranteed to share a partition.
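
A minimal sketch of the idea, assuming a compound key built from two hypothetical ports (`cust_id`, `region`) and `zlib.crc32` as a stand-in hash function. The hash function that the Integration Service actually uses is not described in these slides.

```python
import zlib

def partition_for(row, key_ports, n_partitions):
    """Map a row to a partition index from its compound partition key."""
    # Join the key port values into one string; '|' is an arbitrary separator.
    compound = "|".join(str(row[port]) for port in key_ports)
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(compound.encode("utf-8")) % n_partitions

# Hypothetical rows: the first two share the same compound key.
rows = [
    {"cust_id": 17, "region": "EU", "amount": 40},
    {"cust_id": 17, "region": "EU", "amount": 25},
    {"cust_id": 18, "region": "EU", "amount": 90},
]

assignments = [partition_for(r, ["cust_id", "region"], 4) for r in rows]
print(assignments)
```

The first two rows always receive the same partition index because their key values are identical. The third row may or may not land in the same partition, since different keys can collide after the modulo step.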
    11. Summary
       • This presentation covered:
       • The problem definition
       • Informatica partitions
       • How to approach the performance tuning challenge