SlideShare a Scribd company logo
1 of 49
DATA STAGE:
Data stage is an ETL tool where it uses a client/server design
where jobs are created and administered via windows client against
central repository on a server.
Purpose of using data stage:
1.design jobs for execution , Transformation and loading.
2.schedule,run and monitor jobs all within the data stage.
3.create batch controlling jobs.
Data stage client
tools
Data stage
administrato
r
Data stage
manager
Data stage server
Data stage
Designer
Data stage
Director
Data
base
Data
ware
house
Data
mart
Data stage
components
DATA STAGE CLIENT COMPONENTS:
• Data Stage Administrator : This components will be used for to perform create or
delete the projects. , cleaning metadata stored in repository.
• Data stage Manager: it will be used for to perform the following task like..
a) Create the table definitions.
b) Metadata back-up and recovery can be performed.
c) Create the customized components.
DATA STAGE CLIENT COMPONENTS:
• Data stage Director: It is used to validate,
schedule,
run and
monitor the Data stage jobs.
• Data stage Designer: It is used to create the Data stage application known as job. The following activities can be performed with
designer window.
a) Create the source definition.
b) Create the target definition.
c) Develop Transformation Rules
d) Design Jobs.
• Data Stage Repository: It is one of the server side components which is defined to store the information to build out Data Ware
House.
• Data Stage Server: This is defined to execute the job while we are creating Data stage jobs.
DATA STAGE JOB’S:
Job : Job is nothing but it is ordered series of individual stages which are linked
together to describe the flow of data from source and target.
Types of jobs:
• Server jobs
• Parallel Jobs
• Mainframe Jobs
DATA STAGE JOB’S:
Server jobs:
• In server jobs it handle less volume of data
with more performance.
• It is having less number of components.
• Data processing will be slow.
• It work’s on SMP (Symmetric Multi
Processing).
• It is highly impact usage of transformer stage.
Parallel jobs:
• It handles high volume of data.
• It work’s on parallel processing concepts.
• It applies parallelism techniques.
• It follows MPP (Massively parallel
Processing).
• It is having more number of components
compared to server jobs.
• It work’s on orchestrate framework
STAGE:
A stage defines a database, file and processing.
Built-in stage:- These stages defines the extraction, Transformation, and loading.
They are divided into two types
• Passive stages:- which defines read and write access are known as passive stages.
EX:- All Database stages in palette window by designer.
• Active stages:- which defines the data transformation and filtering the data known as active stage.
Ex:- All Processing Stages.
PARALLELISM TECHNIQUES:
It is a process to perform ETL task in parallel approach need to build the data
warehouse. The parallel jobs support the following hardware system like SMP, MPP to
achieve the parallelism.
There are two types of parallelism techniques:
• Pipeline parallelism.
• Partition parallelism.
TYPES OF PARALLELISM TECHNIQUES:
• Pipeline parallelism : the data flow continuously throughout it pipeline . All stages in the job are
Operating simultaneously.
EX: my source is having 4 records as soon as first record starts processing, then all remaining
records processing simultaneously.
• Partition parallelism : in this parallelism , the same job would effectively be run simultaneously by
several processors. Each processors handles separate subset of total records.
EX: my source is having 100 records and 4 partitions. The data will be equally partition across 4
partitions that mean the partitions will get 25 records. Whenever the first partition starts, the
remaining three partitions start processing simultaneously and parallel.
DATA PARTITIONING TECHNIQUES IN DATA
STAGE:
In this data partitioning method the data splits into various partitions distribute across the
processors.
The data partitioning techniques are:
• Auto
• Hash
• Modulus
• Random
• Range
• Round Robin
DATA PARTITIONING TECHNIQUES:
• Round Robin: the first record goes to first processing node, second record goes to the
second processing node and so on….. This method is useful for creating equal size of
partition.
• Hash: The records with the same values for the hash-key field given to the same
processing node.
• Modulus: This partition is based on key column module. This partition is similar to hash
partition.
• Random: The records are randomly distributed across all processing nodes.
• Range: The related records are distributed across the one node . The range is specified
based on key column.
• Auto: This is most common method. The data stage determines the best partition
method to use depending upon the type of stage.
FILE STAGES:
• Sequential File stage: - it is one of the file stages which it can be used to reading the
data from file or writing the data to file. It can support single Input link or single
output link and as well as reject link.
Properties Tab
DATASET:
• Dataset: It is also one of the file stages which it can be used to store the data on
internal format, it is related operating system. So, it will take less time to read or
write the data.
Properties Tab
FILE SET:
• File Set: It is also one of the file stage which it can be used to read or write the data
on file set. The file it can be saved with the extension of ‘.fs’. it operating parallel
Properties Tab
Data set
• It stores data in binary in the internal
format of Data Stage so, it takes less
time to read/write from dataset than any
other source/target.
• It preserves the partitioning schemes so
that you don't have to partition it again.
• You cannot view data without data stage
File set
• It stores data in the format similar to a
sequential file.
• Only advantage of using file set over a
sequential file is "it preserves
partitioning scheme".
• You can view the data but in the order
defined in partitioning scheme.
COMPLEX FLAT FILE:
• This file is used to read the data form Mainframe file. By using CFF we
can read ASCII or EBCDIC (Extended Binary coded Decimal Interchange
Code) data. We can select the required columns and can omit the
remaining. We can collect the rejects (bad formatted records) by setting the
property reject to save (other options: continue fail). We can flatten the
arrays (COBOL files).
Properties Tab
TYPES OF PROCESSING STAGES:
• Aggregator stage: It is one the processing stage which it can be used to perform the
summaries for the group of input data. It can support single input link which carries
the input data and it can support single out put link which carries aggregated data to
output link.
Properties Tab
• Copy stage: It is also one of the processing stages which it can be used just to copy the
input data to number of output links. It can support single input link and number of
output links.
Properties Tab
• Filter stage: it is also one of the processing stage which it can be used to perform the
filter the data based on given condition. It can support single input link and N no of
output links and optionally it support one reject link.
Properties Tab
• Switch stage: it is also one of the processing stage which it can be used to filter the
input data based on given conditions. It can support single input link and 128 output
links .
Properties Tab
• Join stage: It is also one of the processing stages which can be used to combine two or
more input datasets based on key field. It can support two or more input datasets and
one output dataset, and doesn‟t support reject link.
• Join can be performing inner join, left outer, right outer, full-outer joins.
1.Inner join – is to display the matched records from both
the side tables.
2.Left-outer join- is to show the matched records from
both sides as well as unmatched records from left side
table.
3.Right-outer join – is to show the matched records from
both sides as well as unmatched records from right side
table.
4.Full-outer join- is to show the matched as well as
unmatched records from both sides.
Properties Tab
• Merge stage: It is also one of the processing stages which it can be used to merge the
multiple input data. It can support multiple input links, the first input link is called master
input link and remaining links are called Updated links.
• It can be perform inner join and left-outer join only.
Properties Tab
• Look-up stage: This is also one of the processing stages which can be used to look-up
on relational tables. It can support multiple input links and single output link and
support single reject link.
• This stage will be performing inner join and left-outer join.
Properties Tab
It is having two options like:
• Condition not met.
• Lookup Failure.
• Default it is having fail. If the condition not met =“drop”, it will be perform inner join.
If the condition not met option = continue, it will be perform left-outer join. Default it
having lookup Failure=Fail. If it is having continue option will be support Reject link.
• Funnel stage: it is also one of Active processing stage which can be used to combined the
multiple input datasets into single output datasets.
Properties Tab
• Remove Duplicate Stage: it is also one of processing stage which it can be used to remove
the duplicates data based on key field. It can support single input link and single output
link.
Properties Tab
• Sort stage: It is also one of the processing stage which can be used to sort data based on
key field, either ascending order or descending order. It can be support single input link
and single output link.
Properties Tab
• Modify Stage: It is also one of the processing stages which it can be used to
when you are able to handle Null handling and Data type changes. It is
used to change the data types if the source contains the varchar and the
target contains integer then we have to use this Modify Stage and we
have to change according to the requirement. And we can do some
modification in length also.
Properties Tab
• Pivot stage: it is also one of the processing stage. Pivot stage only CONVERTS
COLUMNS INTO ROWS and nothing else. Some DS Professionals refer to this as
NORMALIZATION.
Properties Tab
• Surrogate key Generator: It is also one important stage on processing stage which it can
be used to generate the sequence numbers while implementing slowly changing
dimension. It is a system generated key on dimensional tables.
Properties Tab
• Transformer Stage: It is an active processing stage which allows filtering the data based
on given condition and can derive new data definitions by developing an expression. This
stage uses Microsoft.net framework environment for it’s compilation. The transformer
stage can be performing data cleaning and data scrubbing operation. It can have single
input link and number of output links and also reject link.
Properties Tab
From the above slide we can see it is having stage variables, Derivations, and Constraints.
• Stage Variable - An intermediate processing variable that retains value during read and
doesn’t pass the value into target column.
• Derivation - Expression that specifies value to be passed on to the target column.
• Constraints - Conditions that are either true or false that specifies flow of data with a
link.
Stage Properties Tab
Stage variables
Derivations
constraints
Order of execution:
• Change Capture Stage: This is also one of the active processing stage which it can be
used to capture the changes between two sources like After and Before. The source which
is used as reference to capture the change is called after dataset. The source which we are
looking for the change is called before dataset. The change code will be added in output
dataset. So, by this change code will be recognizing delete, insert or update.
Properties Tab
• Row generator Stage: it produces set of data fitting specified meta data. It is useful
where you want to test your job but have no real data available to process. It is having
no input links and a single output link.
Properties Tab
• Column generator stage: This stage adds the columns to incoming data and generates
mock data for these columns for each data row processed. It can have single input link
and single output link.
• Ex: if input file has some records and now we need to generate unique id for the input
file by using column generator stage we can add unique id for the input file.
Properties Tab
• Head Stage: This stage helpful for testing .and debug the application with large
datasets. This stage selects TOP N rows from the input dataset and copies the selected
rows to an output datasets. It can have a single input link and single output link.
Properties Tab
• Tail Stage: This stage helpful for testing and debug the application with large datasets.
This stage selects BOTTOM N rows from the input dataset and copies the selected rows
to an output datasets. It can have a single input link and single output link.
Properties Tab
• Peek Stage: it can have a single input link and any number output link. It can be used
to print the record column values to the job log view.
Properties Tab
• The Slowly Changing Dimension (SCD): stage is a processing stage that works within
the context of a star schema database. The SCD stage has a single input link, a single
output link, a dimension reference link, and a dimension update link.
• The SCD stage reads source data on the input link, performs a dimension table lookup
on the reference link, and writes data on the output link. The output link can pass data
to another SCD stage, to a different type of processing stage, or to a fact table. The
dimension update link is a separate output link that carries changes to the dimension.
You can perform these steps in a single job or a series of jobs, depending on the number
of dimensions in your database and your performance requirements.
Example:
Properties Tab
SCD type-2:
• Adds a new row to a dimension table.
• Each SCD stage processes a single dimension and performs lookups by using an equality
matching technique. If the dimension is a database table, the stage reads the database
to build a lookup table in memory. If a match is found, the SCD stage updates rows in
the dimension table to reflect the changed data. If a match is not found, the stage
creates a new row in the dimension table. All of the columns that are needed to create a
new dimension row must be present in the source data
Example:
• Ex : The source is having EMP table with 100 records. When I was run the job, the
records was initially loaded into target insert stage, how it means, First compare two
input datasets, in first time there is no change in the records.
• So, the change capture stage gives the change code=1. The transformer stage
transforms the records from source to target by generating sequence to the records
by using stored procedure stage.
• If any update is occurred at source, that update records will be stored to target side
(TGT_UPDATE). How it will be store means, first two compare two input datasets,
changes is occurred at source level then change capture stage gives the change
code=3 , by using this change code, the transformer stage transform the records to
join stage through the update link.
• Join stage joins the updated records and target records by removing duplicate
records using remove duplicate stage. The output of the join stage to connected to
transform stage which was transforming update records to target update stage.
Data stage

More Related Content

What's hot

Physical architecture of sql server
Physical architecture of sql serverPhysical architecture of sql server
Physical architecture of sql serverDivya Sharma
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
How Impala Works
How Impala WorksHow Impala Works
How Impala WorksYue Chen
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceCloudera, Inc.
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
 
Introduction of ssis
Introduction of ssisIntroduction of ssis
Introduction of ssisdeepakk073
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfEric Xiao
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDBvaluebound
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionCloudera, Inc.
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergWalaa Eldin Moustafa
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Beat Signer
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in ImpalaCloudera, Inc.
 
jpa-hibernate-presentation
jpa-hibernate-presentationjpa-hibernate-presentation
jpa-hibernate-presentationJohn Slick
 
Best Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesBest Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesDATAVERSITY
 

What's hot (20)

MS-SQL SERVER ARCHITECTURE
MS-SQL SERVER ARCHITECTUREMS-SQL SERVER ARCHITECTURE
MS-SQL SERVER ARCHITECTURE
 
Physical architecture of sql server
Physical architecture of sql serverPhysical architecture of sql server
Physical architecture of sql server
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
How Impala Works
How Impala WorksHow Impala Works
How Impala Works
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
 
Achieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on TezAchieving 100k Queries per Hour on Hive on Tez
Achieving 100k Queries per Hour on Hive on Tez
 
Sqoop
SqoopSqoop
Sqoop
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
 
Introduction of ssis
Introduction of ssisIntroduction of ssis
Introduction of ssis
 
6.hive
6.hive6.hive
6.hive
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 
Chicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An IntroductionChicago Data Summit: Apache HBase: An Introduction
Chicago Data Summit: Apache HBase: An Introduction
 
Incremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and IcebergIncremental View Maintenance with Coral, DBT, and Iceberg
Incremental View Maintenance with Coral, DBT, and Iceberg
 
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
Structured Query Language (SQL) - Lecture 5 - Introduction to Databases (1007...
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
 
jpa-hibernate-presentation
jpa-hibernate-presentationjpa-hibernate-presentation
jpa-hibernate-presentation
 
Sql server basics
Sql server basicsSql server basics
Sql server basics
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Best Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar DatabasesBest Practices in the Use of Columnar Databases
Best Practices in the Use of Columnar Databases
 

Similar to Data stage

Etl - Extract Transform Load
Etl - Extract Transform LoadEtl - Extract Transform Load
Etl - Extract Transform LoadABDUL KHALIQ
 
Sap abap
Sap abapSap abap
Sap abapnrj10
 
__Stude_DATA FLOW DIAGRAMS.ppt
__Stude_DATA FLOW DIAGRAMS.ppt__Stude_DATA FLOW DIAGRAMS.ppt
__Stude_DATA FLOW DIAGRAMS.pptVinayShekarReddy
 
Pl sql best practices document
Pl sql best practices documentPl sql best practices document
Pl sql best practices documentAshwani Pandey
 
Tableau Basic Questions
Tableau Basic QuestionsTableau Basic Questions
Tableau Basic QuestionsSooraj Vinodan
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
Introduction to STATA(2).pdf
Introduction to STATA(2).pdfIntroduction to STATA(2).pdf
Introduction to STATA(2).pdfYomif3
 
Sap abap questions
Sap abap questionsSap abap questions
Sap abap questionsrasikaj123
 
data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...aasifkuchey85
 
MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...
MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...
MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...Jitendra Bafna
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...Amazon Web Services
 
FME Server Workspace Patterns - Continued
FME Server Workspace Patterns - ContinuedFME Server Workspace Patterns - Continued
FME Server Workspace Patterns - ContinuedSafe Software
 
Oracle Course
Oracle CourseOracle Course
Oracle Courserspaike
 
SFDC Other Platform Features
SFDC Other Platform FeaturesSFDC Other Platform Features
SFDC Other Platform FeaturesSujit Kumar
 
Data Preprocessing&tools
Data Preprocessing&toolsData Preprocessing&tools
Data Preprocessing&toolsAmandeep Gill
 
123448572 all-in-one-informatica
123448572 all-in-one-informatica123448572 all-in-one-informatica
123448572 all-in-one-informaticahomeworkping9
 

Similar to Data stage (20)

Etl - Extract Transform Load
Etl - Extract Transform LoadEtl - Extract Transform Load
Etl - Extract Transform Load
 
Sap abap
Sap abapSap abap
Sap abap
 
__Stude_DATA FLOW DIAGRAMS.ppt
__Stude_DATA FLOW DIAGRAMS.ppt__Stude_DATA FLOW DIAGRAMS.ppt
__Stude_DATA FLOW DIAGRAMS.ppt
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data WarehousingDatastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
Bdc details
Bdc detailsBdc details
Bdc details
 
Pl sql best practices document
Pl sql best practices documentPl sql best practices document
Pl sql best practices document
 
Tableau Basic Questions
Tableau Basic QuestionsTableau Basic Questions
Tableau Basic Questions
 
Data warehouse physical design
Data warehouse physical designData warehouse physical design
Data warehouse physical design
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
Introduction to STATA(2).pdf
Introduction to STATA(2).pdfIntroduction to STATA(2).pdf
Introduction to STATA(2).pdf
 
Sap abap questions
Sap abap questionsSap abap questions
Sap abap questions
 
data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...
 
MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...
MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...
MuleSoft Surat Virtual Meetup#30 - Flat File Schemas Transformation With Mule...
 
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
AWS re:Invent 2016| DAT318 | Migrating from RDBMS to NoSQL: How Sony Moved fr...
 
The ABAP Query
The ABAP QueryThe ABAP Query
The ABAP Query
 
FME Server Workspace Patterns - Continued
FME Server Workspace Patterns - ContinuedFME Server Workspace Patterns - Continued
FME Server Workspace Patterns - Continued
 
Oracle Course
Oracle CourseOracle Course
Oracle Course
 
SFDC Other Platform Features
SFDC Other Platform FeaturesSFDC Other Platform Features
SFDC Other Platform Features
 
Data Preprocessing&tools
Data Preprocessing&toolsData Preprocessing&tools
Data Preprocessing&tools
 
123448572 all-in-one-informatica
123448572 all-in-one-informatica123448572 all-in-one-informatica
123448572 all-in-one-informatica
 

Recently uploaded

VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Recently uploaded (20)

VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Data stage

  • 1.
  • 2. DATA STAGE: Data stage is an ETL tool where it uses a client/server design where jobs are created and administered via windows client against central repository on a server. Purpose of using data stage: 1.design jobs for execution , Transformation and loading. 2.schedule,run and monitor jobs all within the data stage. 3.create batch controlling jobs.
  • 3. Data stage client tools Data stage administrato r Data stage manager Data stage server Data stage Designer Data stage Director Data base Data ware house Data mart Data stage components
  • 4. DATA STAGE CLIENT COMPONENTS: • Data Stage Administrator : This components will be used for to perform create or delete the projects. , cleaning metadata stored in repository. • Data stage Manager: it will be used for to perform the following task like.. a) Create the table definitions. b) Metadata back-up and recovery can be performed. c) Create the customized components.
  • 5. DATA STAGE CLIENT COMPONENTS: • Data stage Director: It is used to validate, schedule, run and monitor the Data stage jobs. • Data stage Designer: It is used to create the Data stage application known as job. The following activities can be performed with designer window. a) Create the source definition. b) Create the target definition. c) Develop Transformation Rules d) Design Jobs. • Data Stage Repository: It is one of the server side components which is defined to store the information to build out Data Ware House. • Data Stage Server: This is defined to execute the job while we are creating Data stage jobs.
  • 6. DATA STAGE JOB’S: Job : Job is nothing but it is ordered series of individual stages which are linked together to describe the flow of data from source and target. Types of jobs: • Server jobs • Parallel Jobs • Mainframe Jobs
  • 7. DATA STAGE JOB’S: Server jobs: • In server jobs it handle less volume of data with more performance. • It is having less number of components. • Data processing will be slow. • It work’s on SMP (Symmetric Multi Processing). • It is highly impact usage of transformer stage. Parallel jobs: • It handles high volume of data. • It work’s on parallel processing concepts. • It applies parallelism techniques. • It follows MPP (Massively parallel Processing). • It is having more number of components compared to server jobs. • It work’s on orchestrate framework
  • 8. STAGE: A stage defines a database, file and processing. Built-in stage:- These stages defines the extraction, Transformation, and loading. They are divided into two types • Passive stages:- which defines read and write access are known as passive stages. EX:- All Database stages in palette window by designer. • Active stages:- which defines the data transformation and filtering the data known as active stage. Ex:- All Processing Stages.
  • 9. PARALLELISM TECHNIQUES: It is a process to perform ETL task in parallel approach need to build the data warehouse. The parallel jobs support the following hardware system like SMP, MPP to achieve the parallelism. There are two types of parallelism techniques: • Pipeline parallelism. • Partition parallelism.
  • 10. TYPES OF PARALLELISM TECHNIQUES: • Pipeline parallelism : the data flow continuously throughout it pipeline . All stages in the job are Operating simultaneously. EX: my source is having 4 records as soon as first record starts processing, then all remaining records processing simultaneously. • Partition parallelism : in this parallelism , the same job would effectively be run simultaneously by several processors. Each processors handles separate subset of total records. EX: my source is having 100 records and 4 partitions. The data will be equally partition across 4 partitions that mean the partitions will get 25 records. Whenever the first partition starts, the remaining three partitions start processing simultaneously and parallel.
  • 11. DATA PARTITIONING TECHNIQUES IN DATA STAGE: In this data partitioning method the data splits into various partitions distribute across the processors. The data partitioning techniques are: • Auto • Hash • Modulus • Random • Range • Round Robin
  • 12. DATA PARTITIONING TECHNIQUES: • Round Robin: the first record goes to first processing node, second record goes to the second processing node and so on….. This method is useful for creating equal size of partition. • Hash: The records with the same values for the hash-key field given to the same processing node. • Modulus: This partition is based on key column module. This partition is similar to hash partition. • Random: The records are randomly distributed across all processing nodes. • Range: The related records are distributed across the one node . The range is specified based on key column. • Auto: This is most common method. The data stage determines the best partition method to use depending upon the type of stage.
  • 13. FILE STAGES: • Sequential File stage: - it is one of the file stages which it can be used to reading the data from file or writing the data to file. It can support single Input link or single output link and as well as reject link. Properties Tab
  • 14. DATASET: • Dataset: It is also one of the file stages which it can be used to store the data on internal format, it is related operating system. So, it will take less time to read or write the data. Properties Tab
  • 15. FILE SET: • File Set: It is also one of the file stage which it can be used to read or write the data on file set. The file it can be saved with the extension of ‘.fs’. it operating parallel Properties Tab
  • 16. Data set • It stores data in binary in the internal format of Data Stage so, it takes less time to read/write from dataset than any other source/target. • It preserves the partitioning schemes so that you don't have to partition it again. • You cannot view data without data stage File set • It stores data in the format similar to a sequential file. • Only advantage of using file set over a sequential file is "it preserves partitioning scheme". • You can view the data but in the order defined in partitioning scheme.
  • 17. COMPLEX FLAT FILE: • This file is used to read the data form Mainframe file. By using CFF we can read ASCII or EBCDIC (Extended Binary coded Decimal Interchange Code) data. We can select the required columns and can omit the remaining. We can collect the rejects (bad formatted records) by setting the property reject to save (other options: continue fail). We can flatten the arrays (COBOL files). Properties Tab
  • 18. TYPES OF PROCESSING STAGES: • Aggregator stage: It is one the processing stage which it can be used to perform the summaries for the group of input data. It can support single input link which carries the input data and it can support single out put link which carries aggregated data to output link. Properties Tab
  • 19. • Copy stage: It is also one of the processing stages which it can be used just to copy the input data to number of output links. It can support single input link and number of output links. Properties Tab
  • 20. • Filter stage: it is also one of the processing stage which it can be used to perform the filter the data based on given condition. It can support single input link and N no of output links and optionally it support one reject link. Properties Tab
  • 21. • Switch stage: it is also one of the processing stage which it can be used to filter the input data based on given conditions. It can support single input link and 128 output links . Properties Tab
  • 22. • Join stage: It is also one of the processing stages which can be used to combine two or more input datasets based on key field. It can support two or more input datasets and one output dataset, and doesn‟t support reject link. • Join can be performing inner join, left outer, right outer, full-outer joins.
  • 23. 1.Inner join – is to display the matched records from both the side tables. 2.Left-outer join- is to show the matched records from both sides as well as unmatched records from left side table. 3.Right-outer join – is to show the matched records from both sides as well as unmatched records from right side table. 4.Full-outer join- is to show the matched as well as unmatched records from both sides.
  • 25. • Merge stage: It is also one of the processing stages which it can be used to merge the multiple input data. It can support multiple input links, the first input link is called master input link and remaining links are called Updated links. • It can be perform inner join and left-outer join only. Properties Tab
  • 26. • Look-up stage: This is also one of the processing stages which can be used to look-up on relational tables. It can support multiple input links and single output link and support single reject link. • This stage will be performing inner join and left-outer join. Properties Tab
  • 27. It is having two options like: • Condition not met. • Lookup Failure. • Default it is having fail. If the condition not met =“drop”, it will be perform inner join. If the condition not met option = continue, it will be perform left-outer join. Default it having lookup Failure=Fail. If it is having continue option will be support Reject link.
  • 28. • Funnel stage: it is also one of Active processing stage which can be used to combined the multiple input datasets into single output datasets. Properties Tab
  • 29. • Remove Duplicate Stage: it is also one of processing stage which it can be used to remove the duplicates data based on key field. It can support single input link and single output link. Properties Tab
  • 30. • Sort stage: It is also one of the processing stage which can be used to sort data based on key field, either ascending order or descending order. It can be support single input link and single output link. Properties Tab
  • 31. • Modify Stage: It is also one of the processing stages which it can be used to when you are able to handle Null handling and Data type changes. It is used to change the data types if the source contains the varchar and the target contains integer then we have to use this Modify Stage and we have to change according to the requirement. And we can do some modification in length also. Properties Tab
  • 32. • Pivot stage: it is also one of the processing stage. Pivot stage only CONVERTS COLUMNS INTO ROWS and nothing else. Some DS Professionals refer to this as NORMALIZATION. Properties Tab
  • 33. • Surrogate key Generator: It is also one important stage on processing stage which it can be used to generate the sequence numbers while implementing slowly changing dimension. It is a system generated key on dimensional tables. Properties Tab
  • 34. • Transformer Stage: It is an active processing stage which allows filtering the data based on given condition and can derive new data definitions by developing an expression. This stage uses Microsoft.net framework environment for it’s compilation. The transformer stage can be performing data cleaning and data scrubbing operation. It can have single input link and number of output links and also reject link.
  • 36. From the above slide we can see it is having stage variables, Derivations, and Constraints. • Stage Variable - An intermediate processing variable that retains value during read and doesn’t pass the value into target column. • Derivation - Expression that specifies value to be passed on to the target column. • Constraints - Conditions that are either true or false that specifies flow of data with a link. Stage Properties Tab Stage variables Derivations constraints Order of execution:
  • 37. • Change Capture Stage: This is also one of the active processing stage which it can be used to capture the changes between two sources like After and Before. The source which is used as reference to capture the change is called after dataset. The source which we are looking for the change is called before dataset. The change code will be added in output dataset. So, by this change code will be recognizing delete, insert or update. Properties Tab
  • 38. • Row generator Stage: it produces set of data fitting specified meta data. It is useful where you want to test your job but have no real data available to process. It is having no input links and a single output link. Properties Tab
  • 39. • Column generator stage: This stage adds the columns to incoming data and generates mock data for these columns for each data row processed. It can have single input link and single output link. • Ex: if input file has some records and now we need to generate unique id for the input file by using column generator stage we can add unique id for the input file. Properties Tab
  • 40. • Head Stage: This stage helpful for testing .and debug the application with large datasets. This stage selects TOP N rows from the input dataset and copies the selected rows to an output datasets. It can have a single input link and single output link. Properties Tab
  • 41. • Tail Stage: This stage helpful for testing and debug the application with large datasets. This stage selects BOTTOM N rows from the input dataset and copies the selected rows to an output datasets. It can have a single input link and single output link. Properties Tab
  • 42. • Peek Stage: it can have a single input link and any number output link. It can be used to print the record column values to the job log view. Properties Tab
  • 43. • The Slowly Changing Dimension (SCD): stage is a processing stage that works within the context of a star schema database. The SCD stage has a single input link, a single output link, a dimension reference link, and a dimension update link. • The SCD stage reads source data on the input link, performs a dimension table lookup on the reference link, and writes data on the output link. The output link can pass data to another SCD stage, to a different type of processing stage, or to a fact table. The dimension update link is a separate output link that carries changes to the dimension. You can perform these steps in a single job or a series of jobs, depending on the number of dimensions in your database and your performance requirements.
  • 46. SCD type-2: • Adds a new row to a dimension table. • Each SCD stage processes a single dimension and performs lookups by using an equality matching technique. If the dimension is a database table, the stage reads the database to build a lookup table in memory. If a match is found, the SCD stage updates rows in the dimension table to reflect the changed data. If a match is not found, the stage creates a new row in the dimension table. All of the columns that are needed to create a new dimension row must be present in the source data
  • 48. • Ex : The source is having EMP table with 100 records. When I was run the job, the records was initially loaded into target insert stage, how it means, First compare two input datasets, in first time there is no change in the records. • So, the change capture stage gives the change code=1. The transformer stage transforms the records from source to target by generating sequence to the records by using stored procedure stage. • If any update is occurred at source, that update records will be stored to target side (TGT_UPDATE). How it will be store means, first two compare two input datasets, changes is occurred at source level then change capture stage gives the change code=3 , by using this change code, the transformer stage transform the records to join stage through the update link. • Join stage joins the updated records and target records by removing duplicate records using remove duplicate stage. The output of the join stage to connected to transform stage which was transforming update records to target update stage.