A Comparative Study of ETL Tools
Sana Yousuf
Department of Computer Science
Military College of Signals, National University of
Sciences & Technology
Islamabad, Pakistan
sn_ysf@yahoo.com
Sanam Shahla Rizvi
Department of Computer Science
Military College of Signals, National University of
Sciences & Technology
Islamabad, Pakistan
ssrizvi@mcs.edu.pk
Abstract—In many organizations valuable data is wasted
because it is scattered across different formats and
sources. Data warehouses (DWs) are complex systems that hold
consolidated data with the objective of assisting knowledge
workers in the decision-making process. The key components of
DWs are the Extraction-Transformation-Loading (ETL)
processes. Since incorrect or misleading data may lead to
wrong decisions, appropriate ETL tools must be selected
for a DW to improve data quality. The selection of
an ETL tool is a complex and important issue in data
warehousing because it directly affects the quality of the data
warehouse. This paper first briefly describes the ETL process,
then discusses some of the available ETL tools along with
general criteria that can be used as measuring parameters for
selecting an appropriate ETL tool. Finally, an analysis of the
tools based on these criteria is presented to give an insight
into which tool suits which circumstance.
Keywords: data warehouses, ETL tools, complex systems,
enterprise systems
I. INTRODUCTION
A data warehouse is a large data repository that
consolidates various types of data transformed into a single
suitable format. Depending on specific business needs, it can
be architected differently. In general, however, data stored
in operational databases is transferred to a data warehouse
pre-processing platform, also known as the staging area; after
processing it is moved into the data warehouse, and lastly it
is transformed into sets of conformed data marts.
A. ETL Process and Concepts
Extract, Transform and Load (ETL) is an important
component of the data warehousing architecture. The
process includes extraction of data from various data
sources, transformation of the extracted data according to
business requirements, and loading of that data into the
data warehouse.
Any programming language can be used to build an ETL
process; however, building one from scratch is quite
complex. Various ETL tools are available in the market,
allowing an enterprise to select one based on its requirements
and needs. Over time these tools have matured
and now provide much more than just extraction,
transformation and loading of data. The improvements
include capabilities such as “data profiling, data quality
control, monitoring and cleansing, real-time and on-demand
data integration in a service oriented architecture, and
metadata management” [12]. Moreover, ETL tools are now
customizable according to the functional requirements of an
enterprise data warehouse.
a) Extraction
Being the first step in the ETL process, extraction focuses on
retrieving data from different source systems. These sources
may be internal or external, structured or unstructured, i.e.
of any type. Thus source systems could be mainframe
applications, flat files, ERP applications, relational
databases, non-relational databases, CRM tools or even message
queues. These sources may hold data in different formats, i.e.
different internal representations, which makes extraction a
difficult process.
An extraction tool should therefore be able to:
- Understand all the different data storage formats
- Communicate with various relational databases
- Read and understand the different file formats used in an
organization
- Extract only relevant data before bringing it into
the DW
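The requirements listed above can be illustrated with a minimal Python sketch. It assumes two hypothetical sources, a flat file and a relational table held in an in-memory SQLite database; the table name, column names and sample values are invented for illustration and do not come from any of the tools discussed in this paper.

```python
import csv
import io
import sqlite3

def extract_csv(text, wanted):
    """Read CSV text and keep only the wanted columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [{col: row[col] for col in wanted} for row in reader]

def extract_sqlite(conn, table, wanted):
    """Read only the wanted columns from a relational table.

    The table and column names are hard-coded in this sketch;
    untrusted input must never be interpolated into SQL like this.
    """
    cols = ", ".join(wanted)
    cur = conn.execute(f"SELECT {cols} FROM {table}")
    return [dict(zip(wanted, r)) for r in cur.fetchall()]

# Hypothetical flat-file source with an irrelevant "fax" column.
flat_file = "id,name,fax\n1,Acme,555-0100\n2,Globex,555-0101\n"

# Hypothetical relational source with an irrelevant "notes" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, notes TEXT)")
conn.execute("INSERT INTO customers VALUES (3, 'Initech', 'n/a')")

# Extract only the relevant fields from both sources.
wanted = ["id", "name"]
rows = extract_csv(flat_file, wanted) + extract_sqlite(conn, "customers", wanted)
```

Note that the two sources still disagree on the type of `id` (text from the flat file, integer from the database). Reconciling such differences is the job of the transformation step.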
b) Transformation
The transformation phase ensures data consistency
and performs data cleansing before the data is loaded into the
data warehouse. To transform the data properly, a number
of rules and business calculations are applied to the extracted
data so that different data formats are mapped into a single
format. Transformation can be integrated with the extraction or
the loading phase, depending on when it is performed.
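As a minimal sketch of mapping different source formats into a single format, the following Python example applies a few cleansing rules to raw records. The field names, the list of accepted date formats and the sample records are assumptions made for illustration only.

```python
from datetime import datetime

# Assumed date formats used by the hypothetical source systems.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")

def parse_date(value):
    """Try each known source format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {value!r}")

def transform(record):
    """Map one raw record onto a single target format."""
    return {
        "customer_id": int(record["id"]),
        # Collapse repeated whitespace and normalise capitalisation.
        "name": " ".join(record["name"].split()).title(),
        "order_date": parse_date(record["date"]),
        "amount": round(float(record["amount"]), 2),
    }

raw = [
    {"id": "1", "name": "  acme   corp ", "date": "2010-03-05", "amount": "10.5"},
    {"id": 2, "name": "GLOBEX", "date": "05/03/2010", "amount": 7},
]
clean = [transform(r) for r in raw]
```

Both records end up with the same types and the same date representation, whichever source they came from.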
c) Loading
After the extracted data has been transformed and cleansed, it
is loaded into the fact and dimension tables of the data
warehouse to be used for various analytical purposes. Loading
is done regularly to prevent unloaded data from piling up. It
is required in one of two situations:
- Loading new data that is currently contained in the
operational database
- Loading the updates that correspond to changes
that occurred in the operational database
Reference [3] states that “incremental loading is the
preferred approach to data warehouse refreshment because it
generally reduces the amount of data that has to be extracted,
transformed, and loaded by the ETL system. ETL jobs for
incremental loading require access to source data that has
been changed since the previous loading cycle. For this
purpose, so called Change Data Capture (CDC) mechanisms
at the sources can be exploited, if available. Additionally,
ETL jobs for incremental loading potentially require access
to the overall data content of the operational sources.”
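The idea of incremental loading can be sketched in Python as follows. This is not a real Change Data Capture mechanism: as a simple stand-in it assumes that every source row carries an `updated_at` timestamp and that the ETL job remembers the highest timestamp it has already loaded (a "watermark"). The table names, columns and sample data are hypothetical.

```python
import sqlite3

# Hypothetical operational source and warehouse, both in memory.
src = sqlite3.connect(":memory:")
dw = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
dw.execute("CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")

def incremental_load(src, dw, watermark):
    """Copy rows changed after `watermark`; return the new watermark."""
    changed = src.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # New rows are inserted; rows that already exist are overwritten.
    dw.executemany(
        "INSERT OR REPLACE INTO fact_orders (id, amount, updated_at) VALUES (?, ?, ?)",
        changed,
    )
    dw.commit()
    return changed[-1][2] if changed else watermark

src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2010-01-01T00:00:00"), (2, 20.0, "2010-01-02T00:00:00")],
)
wm = incremental_load(src, dw, "")  # first cycle: every row is new

# Between cycles, row 1 is updated and row 3 is added at the source.
src.execute("UPDATE orders SET amount = 15.0, updated_at = '2010-01-03T00:00:00' WHERE id = 1")
src.execute("INSERT INTO orders VALUES (3, 30.0, '2010-01-04T00:00:00')")
wm = incremental_load(src, dw, wm)  # second cycle: only rows 1 and 3 are read
```

A timestamp comparison of this kind cannot detect rows deleted at the source, which is one reason an incremental job may still need access to the overall content of the operational sources.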
The rest of the paper is organized as follows. Section II
gives the background of ETL tools. Section III presents a brief
overview of various ETL tools. Section IV sets out the
criteria used to rank the available tools. Section V
presents a comparative analysis of the tools. The paper
ends with a conclusion of the overall study in Section VI.
II. BACKGROUND OF ETL TOOLS
An ETL tool must provide a certain set of basic ETL
processing facilities, as explained in Section I, to qualify as
a proper ETL tool. Since 2003 Passionned, a consultancy and
research firm, has been closely monitoring the market for
both ETL and data integration tools [4]. Earlier surveys
were based on the main market-driving entities,
also known as visionaries. Many organizations used to
assume that they had automatically made the right choice if
they purchased a tool from one of the market leaders.
However, the trend changed over time and organizations
started building ETL tools themselves according to their own
requirements.
Since the late nineties, all the major business intelligence
(BI) vendors have purchased or developed their own ETL
tools. BI tools had more reliable ETL processes and a
well-designed method of maintaining the data warehouse. BI
provided a better solution, but it consumed 70-80% of the costs
involved in a successful BI system.
In its ETL Tools Survey 2009, Passionned stressed the
importance of evaluating and promoting ETL tools because many
organizations still built their data warehouses by hand, i.e.
by writing complex PL/SQL or SQL and stored procedures. The
survey argued that developer productivity
would increase by a factor of 3-5 if a proper ETL
tool was used. Thus, if proper guidance were available to
enterprises, choosing the right product would become
easier and less risky for the organization itself. As
explained in [5], constructing data warehouses
with ETL tools results in a better, more stable and more
reliable data warehouse that allows more aspects to be
checked and monitored in relation to each other. Companies
also present comparisons of their products with those of
market competitors on their official websites; Adeptia
[10], Microsoft SSIS and Informatica [3] are such examples.
III. SOME FAMOUS ETL TOOLS
Some well-known ETL tools available in the market are as follows:
A. Pentaho Data Integration
Pentaho [12] is a commercial open-source business
intelligence suite that includes a data integration product
named Kettle. It uses a metadata-driven approach, is
fast and has an easy-to-use GUI. Started in 2001, it
has grown and today has a strong community of 13,500
registered users. It supports multi-format data and
allows data movement between many different databases and
files.
B. Talend Open Studio
Talend Open Studio (TOS) [10] is another open-source tool that
supports data integration. Started in
2006, it has a smaller community of followers but still has a
considerable market share, as two of its backers are finance
companies. Rather than a metadata-driven approach, it uses a
code-driven approach and has a GUI for user interaction. Its
code generation feature produces executable Java and Perl code
that can be run later on a server.
C. Informatica Power Center
Informatica Power Center (IPC) [3] is not open-source
software; it is a commercially recommended data
integration suite and the market share leader among data
integration tools. Founded in 1993, Informatica has established
its place in the market through consistency and leadership;
today it has 2,600 registered users, of which 100 are companies
listed on the stock exchange. The main focus of IPC is data
integration, with numerous capabilities, e.g. enterprise-scale
architecture, data cleansing, data profiling, web services and
interoperability with current and legacy systems.
D. Inaplex Inaport
Inaplex [12] provides mid-market solutions focused on
integrating customer data for customer relationship management
(CRM). Besides CRM, it also emphasizes simple solutions for
data integration and accounting.
E. Oracle Warehouse Builder
The Oracle Warehouse Builder (OWB) [13] is “a
comprehensive tool for ETL, relational and dimensional
modeling, data quality, data auditing, and full lifecycle
management of data and metadata” [13]. It offers high
performance, security and scalability by using the Oracle
database as the metadata repository and transformation engine.
F. IBM Information Server
IBM Information Server (IS DataStage) [10] is an IBM product
well known for its services. Its capabilities include data
consolidation, synchronization and distribution across
disparate databases; automatic data profiling and analysis in
terms of content and structure; data quality enhancement;
and transformation and delivery to and from complex sources,
i.e. the capability to get data from any source format and
deliver it to any target, within or outside the enterprise, at
the right time.
It also allows integration and information access for
diverse data and content regardless of where the data resides.
With its data replication services, customer information
management can be done quickly.
G. Microsoft SQL Server Integration Services
Microsoft SQL Server Integration Services (MS SSIS)
[14] allows run-time data transfer and management.
Designed for enterprise-wide application support, it provides
a platform for performing ETL functions and for creating and
controlling data packages. It allows scripting
on the .NET platform, offers increased scalability
through thread pooling, and includes a more advanced import and
export wizard. It also allows packages to be customized to suit
specific organizational needs, supports digital signing for
security, and supports service-oriented architecture.
IV. ETL TOOL FEATURES
Given the broad span of functionality and the large
number of ETL tool vendors, it is difficult to rank all
the tools, as every tool has some special features
of its own. Some generic characteristics have been identified
by [5], on the basis of which the following comparison and
graphs are made.
The following general aspects can be kept in mind when
evaluating an ETL tool.
A. Architecture
When evaluating a tool with respect to architecture,
aspects such as support for parallel processing, symmetric
multiprocessing, massively parallel processing, clustering,
load balancing and suitability for grid computing should be
considered. Support for multi-user management of ETL
processes running on multiple machines and support for a
common meta-model, i.e. allowing exchange of metadata
with tools of the same brand and of other brands, should be
considered too.
B. Functionality
Two main aspects of an ETL tool's functionality
are important: metadata support and the overall
functionality provided by the tool.
The overall functionality concerns whether the tool is data
cleansing oriented, data transformation oriented, or
performs both equally. This gives a clear picture of which
tool to select depending on the nature of the data that will be
fed into it. Support for direct connection to data
sources for input is also an important aspect of functionality.
Metadata support is a key aspect
too. An ETL tool is also responsible for using metadata to map
source data to the destination. Thus choosing a tool that
conforms to the organization's metadata strategy is very
important.
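As a minimal sketch of using metadata to map source data to a destination, the Python example below keeps the mapping in a plain dictionary that is separate from the code applying it. The source field names, target column names and converters are hypothetical and are not taken from any of the tools surveyed here.

```python
# Hypothetical mapping metadata: target column -> (source field, converter).
MAPPING = {
    "customer_id": ("CUST_NO", int),
    "full_name": ("NAME", str.strip),
    "revenue": ("REV", float),
}

def apply_mapping(source_row, mapping):
    """Build a destination row by following the mapping metadata."""
    return {
        target: convert(source_row[field])
        for target, (field, convert) in mapping.items()
    }

row = {"CUST_NO": "42", "NAME": " Acme ", "REV": "1250.50", "UNUSED": "x"}
mapped = apply_mapping(row, MAPPING)
```

Because the mapping is data rather than code, it can be changed, stored or exchanged without touching `apply_mapping`, which is the kind of property a metadata strategy relies on.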
C. Usability
Usability is one of the important factors of any tool.
The tool should be easy to use and understand, and quick to
learn. In this regard, the tool should have a well-balanced
interface and must support the typical task sequence of ETL
usage.
D. Reusability
Reusability requires that the components of a data
warehouse architecture constructed using the ETL
tool be reusable and able to handle parameters. The tool
should be capable of dividing the process into small building
blocks, and should allow users to define their own functions
and use these functions in the process flow.
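The notion of small, parameterized building blocks combined with user-defined functions can be sketched in Python as below. The step names, parameters and sample rows are illustrative assumptions and do not reflect how any particular ETL tool implements reuse.

```python
from functools import reduce

def make_filter(field, minimum):
    """Reusable block: keep rows whose `field` is at least `minimum`."""
    def step(rows):
        return [r for r in rows if r[field] >= minimum]
    return step

def make_scaler(field, factor):
    """Reusable block: multiply `field` by `factor` in every row."""
    def step(rows):
        return [{**r, field: r[field] * factor} for r in rows]
    return step

def pipeline(*steps):
    """Chain building blocks into a single process flow."""
    def run(rows):
        return reduce(lambda acc, step: step(acc), steps, rows)
    return run

def add_tax_flag(rows):
    """A user-defined function plugged into the flow like any other block."""
    return [{**r, "taxable": r["amount"] > 100} for r in rows]

flow = pipeline(make_filter("amount", 50), make_scaler("amount", 2), add_tax_flag)
result = flow([{"amount": 40}, {"amount": 60}])
```

The same `make_filter` or `make_scaler` block can be reused in other flows with different parameters.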
E. Connectivity
The main aspects to consider include the native
connections the tool supports, the packages it can read
metadata from, the types of message queuing products the
tool can connect to, the capability to graphically join tables,
support for the changed data capture principle, transformation
matching and address cleansing ability, as well as options for
data profiling such as uniqueness and distribution.
F. Interoperability
Last but not least, the tool should be able to run on
a number of platforms and also on different versions of a
product.
V. ANALYSIS OF ETL TOOLS
With all the aspects discussed in Section IV in mind,
an analysis of the services provided by the tools is presented
below. In choosing any tool, these aspects
should be considered. The following graph-based analysis
supports that decision making. For this analysis,
various websites, vendors' white papers, web blogs,
comparisons and previous surveys were consulted, and the
analysis was conducted on the basic set of features discussed
in Section IV.
Each of the ETL tools discussed in
Section III is graded with points according to the
level of services supported; the vendors are denoted by
acronyms in the graphs instead of full names.
A. Architectural Aspects
The following graph depicts the services currently
supported by the tools, based on support for enterprise
architecture, clustering, data separation into groups,
web-based application interfaces and cloud computing
deployment.
IPC and OWB are strong in architectural support, with
SSIS following right behind.
B. ETL Functionality
Points have been awarded according to the completeness of each
tool in terms of functionality. Support for data
cleansing, transformation, integration services
and a common metadata model are the main aspects
considered. The graph is drawn by granting each tool
one point for each aspect it supports and then adding up the
points that fall into one category. The same was done for both
trends, i.e. basic functionality in 2007 and improvements up to
2010.
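The point-based grading described above can be sketched in Python as follows. The tool names and the sets of supported aspects are placeholders invented for illustration; they are not the data behind the graphs in this paper.

```python
# The four functionality aspects named in the text.
ASPECTS = [
    "data cleansing",
    "transformation",
    "integration services",
    "common metadata model",
]

# Placeholder support data for two made-up tools (not survey results).
support = {
    "Tool A": {"data cleansing", "transformation", "integration services"},
    "Tool B": {"transformation"},
}

def score(supported, aspects):
    """One point for each aspect the tool supports."""
    return sum(1 for aspect in aspects if aspect in supported)

scores = {tool: score(features, ASPECTS) for tool, features in support.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Running the same calculation once with the 2007 feature sets and once with the 2010 feature sets would give the two series compared in the functionality graph.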
[Bar chart "Architectural Aspects". Vendors: IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS, Others. Series: Web-based UI; Clustering and Job Distribution; Enables SOA; Deploy in Cloud Option.]
Figure 1. Architectural Support
[Bar chart "ETL Functionality Provided". X axis: Vendors (IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS, Others). Y axis: Points. Series: 2007; 2010 improvement.]
Figure 2. Functionality
C. Usability
This graph covers the points awarded to a tool on the
basis of an easy-to-use, well-designed and balanced
interface. What you see is what you get (WYSIWYG) and
task compatibility are further grading criteria. Points
accumulate according to the presence of a subset of
services necessary for ease of use and understanding. The
ease of training new users to become familiar with the
interface is also part of the criteria.
D. Reusability
The following graph depicts a comparison and point
grading on the basis of the reusability support offered, the
capability of data stream splitting, automatic documentation,
and support for defining user-defined functions and using them
in the process flow.
[Bar chart "Ease Of Use". X axis: Vendors (IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS, Others). Y axis: Points. Series: Original 2007; Improvement 2010.]
Figure 3. Usability
[Bar chart "Reusability". Vendors: IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS, Others. Series: Reusable service Repository; Split Data Streams; Data Partitioning; Automatic Documentation.]
Figure 4. Reusability
E. Connectivity
Connectivity, as the name indicates, is calculated by
aggregating the points granted to a tool on the following
aspects: the total number of sources that
can be read without any additional middleware, the
enterprise applications supported by the tool, the platforms it
can run on and, last but not least, support for
messaging (i.e. real-time data handling).
F. Interoperability
Detailed platform support is shown in the
following graph. Here all Windows versions and all Linux
versions are each considered as one, while UNIX variants are
treated separately.
[Bar chart "Connectivity". X axis: Vendors (IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS, Others). Y axis: Points. Series: Platforms; Data Sources; Packages; Messages.]
Figure 5. Connectivity
[Bar chart "Interoperability". Vendors: IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS, Others. Series: Windows; Linux; Sun Solaris; HP-UX; IBM AIX; IBM iSeries OS400; IBM zSeries MVS; HP Tru64; Open VMS.]
Figure 6. Interoperability
From the analysis conducted, it is still hard to
generalize which tool is the best. Although Informatica proves
to be better in many features, MS SSIS and OWB
have improved considerably over time and are now keeping pace
with the top contenders. Overall, when
considering pure ETL tools, IPC can still be ranked as
the market leader, with IBM IS coming second alongside
Talend OS. When it comes to database-integrated tools,
however, OWB and SSIS follow IPC directly. Thus one should be
careful in selecting a tool, as it may not be the best for the
organization simply because of the vendor's name. The
capabilities of the tool should be reviewed before selection.
VI. CONCLUSION
Important data in most organizations is
underutilized simply because it is scattered across different
formats and sources. Data warehouses (DWs) are complex
systems that hold consolidated data with the main objective of
assisting knowledge workers in the decision-making process.
The key components of DWs are the
Extraction-Transformation-Loading (ETL) processes. The goal of
this paper is to explain the ETL process and its importance to
data warehouses, and to provide a comparison based on
some generalized criteria to determine the suitability of a
tool for a certain category of consumers. The paper provides a
brief overview of the ETL tools available in the market,
specifies some key criteria for generalizing the
capabilities provided by a tool, and uses graph-based
analysis on a grade-point scale to grade the selected
tools. Together these provide a comparison of the available
tools in terms of the features they offer, helping an
organization choose the tool that best suits its needs.
REFERENCES
[1] T.Y. Wah, H. Peng, and C.S. Hok, “Building Data Warehouse,” Proc.
24th South East Asia Regional Computer Conference, November 18-19,
2007, Bangkok, Thailand.
[2] T. M. Nguyen and A. M. Tjoa, “Zero-Latency Data Warehousing for
Heterogeneous Data Sources and Continuous Data Streams,” Institute
of Software Technology and Interactive Systems, Favoritenstr.
9-11/188, 2003.
[3] T. Jörg and S. Dessloch, “Near Real-Time Data Warehousing Using
State-of-the-Art ETL Tools,” University of Kaiserslautern, 67653
Kaiserslautern, Germany, 2009.
[4] Passionned, “The BI Tool survey report”, 2008.
[5] Passionned, “ETL Tools survey report”, 2009.
[6] J. Levin, “ETL Tools Comparison”, March 2008.
[7] R. Chillar and B. Kochar, “Extraction Transformation Loading – A
Road to Data Warehouse,” 2nd National Conference Mathematical
Techniques: Emerging Paradigms for Electronics and IT Industries.
[8] Guide to Data Warehousing and Business Intelligence, available at
http://data-warehouses.net/architecture/etlprocess.html.
[9] Pervasive Systems, Extraordinarily Flexible ETL Platform, available at
http://www.pervasiveintegration.com/scenarios/Pages/etl_tools_data_aggregation.aspx.
[10] Adeptia Incorporation, ETL Vendors Comparison, available at
http://www.adeptia.com/products/etl_vendor_comparison.html.
[11] Guide to Data Warehousing and Business Intelligence, Architectural
Overview, available at
http://data-warehouses.net/architecture/overview.html.
[12] ETL Tools Survey, available at
http://www.etltool.com/what-is-etl.htm.
[13] Oracle Warehouse Builder 11g, A technical overview, available at
http://www.oracle.com/technology/products/warehouse/index.html.
[14] ETL data warehouse concepts, available at
http://etl-information.blogspot.com/2007_07_01_archive.htm

Printable Notebook Papers Activity Shelter - Cute Printa
 
How To Write Synthesis Essay Synthesis Essay Examples Synthesis
How To Write Synthesis Essay Synthesis Essay Examples SynthesisHow To Write Synthesis Essay Synthesis Essay Examples Synthesis
How To Write Synthesis Essay Synthesis Essay Examples Synthesis
 
Blank Chinese Pinyin Tian Zi Ge Writing Practice Paper
Blank Chinese Pinyin Tian Zi Ge Writing Practice PaperBlank Chinese Pinyin Tian Zi Ge Writing Practice Paper
Blank Chinese Pinyin Tian Zi Ge Writing Practice Paper
 
Myself Writer Essay How To Write An Essay About Your
Myself Writer Essay How To Write An Essay About YourMyself Writer Essay How To Write An Essay About Your
Myself Writer Essay How To Write An Essay About Your
 
English Grammar And Essay Writing, Workbook 2 (Colle
English Grammar And Essay Writing, Workbook 2 (ColleEnglish Grammar And Essay Writing, Workbook 2 (Colle
English Grammar And Essay Writing, Workbook 2 (Colle
 
Basic 3 Paragraph Essay - Write A Three Paragraph Es
Basic 3 Paragraph Essay - Write A Three Paragraph EsBasic 3 Paragraph Essay - Write A Three Paragraph Es
Basic 3 Paragraph Essay - Write A Three Paragraph Es
 

Recently uploaded

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 

Recently uploaded (20)

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 

Comparative Study of Popular ETL Tools

A Comparative Study of ETL Tools

Sana Yousuf
Department of Computer Science
Military College of Signals, National University of Sciences & Technology
Islamabad, Pakistan
sn_ysf@yahoo.com

Sanam Shahla Rizvi
Department of Computer Science
Military College of Signals, National University of Sciences & Technology
Islamabad, Pakistan
ssrizvi@mcs.edu.pk

Abstract—In many organizations valuable data is wasted because it is scattered across different formats and resources. Data warehouses (DWs) are complex systems that hold consolidated data with the objective of assisting knowledge workers in the decision-making process. The key components of DWs are the Extraction-Transformation-Loading (ETL) processes. Since incorrect or misleading data may produce wrong decisions, appropriate ETL tools must be selected for a DW to improve data quality. The selection of an ETL tool is a complex and important issue in data warehousing because it bears directly on the quality of the data warehouse. This paper first briefly highlights the ETL process, then discusses some of the available ETL tools along with general criteria used as measuring parameters for selecting appropriate ETL tools. Finally, an analysis of the tools based on the generalized criteria is presented to give insight into which tool is better suited to which circumstance.

Keywords: Data warehouses, ETL tools, complex systems, enterprise systems

I. INTRODUCTION

A data warehouse is a large data repository that consolidates various types of data transformed into a single suitable format. Depending on specific business needs, it can be architected differently. In general, however, data stored in operational databases is transferred to a data warehouse preprocessing platform, also known as the staging area; after processing it is moved into the data warehouse, and lastly it is transformed into sets of conformed data marts.

A. ETL Process and Concepts

Extract, Transform and Load (ETL) is an important component of the data warehousing architecture. The process includes extraction of data from various data sources, transformation of the extracted data according to business requirements, and loading of that data into the data warehouse. Any programming language can be used to build an ETL process; however, building it from bits and pieces is quite complex. Various ETL tools are available in the market, allowing an enterprise to select one based on its requirements and needs. With the passage of time these tools have matured and now provide much more than just extraction, transformation and loading of data. The improvements include capabilities such as “data profiling, data quality control, monitoring and cleansing, real-time and on-demand data integration in a service oriented architecture, and metadata management” [12]. Moreover, ETL tools are now customizable according to the functional requirements of an enterprise data warehouse.

a) Extraction

As the first step in the ETL process, extraction focuses on obtaining data from different source systems. These sources are referred to generically as source systems because they can be internal or external, structured or unstructured, i.e. of any type. Source systems could thus be mainframe applications, flat files, ERP applications, relational databases, non-relational databases, CRM tools or even message queues. These sources may hold data in different formats, i.e. different internal representations, which makes extraction a difficult process. An extraction tool should therefore be able to:
- Understand all the different data storage formats
- Communicate with various relational databases
- Read and understand the different file formats used in an organization
- Extract only relevant data before bringing it into the DW

b) Transformation

The transformation phase ensures data consistency and performs data cleansing before the data is loaded into the data warehouse.
In order to transform the data properly, a number of rules and business calculations are applied to the extracted data so that different data formats are mapped into a single format. Transformation can be integrated with the extraction phase or the loading phase, depending on when it is performed.

c) Loading

After the extracted data has been transformed and cleansed, it is loaded into the fact and dimension tables of the data warehouse to be used for various analytical purposes. Loading is done regularly to prevent a backlog of data from piling up. It is required in one of two situations:
- Load the new data that is currently contained in the operational database
- Load the updates corresponding to the changes that occurred in the operational database

Reference [3] states that “incremental loading is the preferred approach to data warehouse refreshment because it generally reduces the amount of data that has to be extracted, transformed, and loaded by the ETL system. ETL jobs for incremental loading require access to source data that has been changed since the previous loading cycle. For this purpose, so called Change Data Capture (CDC) mechanisms at the sources can be exploited, if available. Additionally, ETL jobs for incremental loading potentially require access to the overall data content of the operational sources.”

The following section provides background on ETL tools. Section III presents a brief overview of various ETL tools. Section IV sets the criteria used to rank the available tools. Section V presents a comparative analysis of the tools. The paper ends with a conclusion of the overall study in Section VI.

II. BACKGROUND OF ETL TOOLS

An ETL tool must provide a certain set of basic ETL processing facilities, as explained in Section I, to rank as a proper ETL tool. Since 2003 Passionned, a consultancy and research firm, has been closely monitoring the market for both ETL and data integration tools [4]. Earlier surveys were based on the main market-driving entities, also known as visionaries. Many organizations used to assume that they had automatically made the right choice if they purchased a tool from one of the market leaders. However, the trend changed over time, and organizations started building ETL tools themselves according to their own requirements. Since the late nineties, all the major business intelligence (BI) vendors have purchased or developed their own ETL tools. BI tools had more reliable ETL processes and a well-designed method of maintaining the data warehouse.
BI provided a better solution, but it consumed 70-80% of the costs involved in a successful BI system. Passionned, in its ETL Tools Survey 2009, described the importance of evaluating and promoting ETL tools because many organizations still built their data warehouses by hand, i.e. by writing complex PL/SQL or SQL and stored procedures. The surveyors argued that developer productivity would increase by a factor of 3-5 if a proper ETL tool were used. Thus, if proper guidance were available to enterprises, choosing the right product would become easier and less risky for the organization itself. As explained by reference [5], constructing data warehouses with ETL tools resulted in a better, more stable and more reliable data warehouse that allowed more aspects to be checked and monitored in relation to each other. Companies also present comparisons of their products with those of market competitors on their official websites; Adeptia [10], Microsoft SSIS and Informatica [3] are such examples.

III. SOME FAMOUS ETL TOOLS

Some famous ETL tools available in the market are as follows:

A. Pentaho Data Integration

Pentaho [12] is a commercial open-source business intelligence suite that includes a data integration product named Kettle. It uses an innovative metadata-driven approach, is fast, and has an easy-to-use GUI. Started in 2001, it has grown into a strong community of 13,500 registered users. It also supports multi-format data and allows data movement between many different databases and files.

B. Talend Open Studio

Talend Open Studio (TOS) [10] is another open-source tool with support for data integration. Started in 2006, it has a smaller community of followers but still holds a considerable market share, as two of its supporters are finance companies. Rather than a metadata-driven approach it uses a code-driven approach, and it has a GUI for user interaction.
The code generation property allows executable Java and Perl code to be generated that can be run later on a server.

C. Informatica Power Center

Informatica Power Center (IPC) [3] is not open-source software but a commercially recommended data integration suite, and it is the market share leader among data integration tools. Founded in 1993, it has secured its place in the market through consistency and leadership; today it has 2600 registered users, of which 100 are companies listed on stock exchanges. The main focus of IPC is on data integration, with numerous capabilities such as enterprise-size architecture, data cleansing, data profiling, web servicing and interoperability with current and legacy systems.

D. Inaplex Inaport

Inaplex [12] provides mid-market data integration solutions focused on customer relationship management. Besides customer relationship management, it also emphasizes simple solutions for data integration and accountancy handling.

E. Oracle Warehouse Builder

The Oracle Warehouse Builder (OWB) [13] is “a comprehensive tool for ETL, relational and dimensional modeling, data quality, data auditing, and full lifecycle management of data and metadata” [13]. It provides high performance, security and scalability by using the Oracle DB as the metadata repository and transformation engine.

F. IBM Information Server

IBM Information Server (IS Datastage) [10] is a product by IBM and is well known for its services. The capabilities of the tool include data consolidation, synchronization and distribution across disparate databases; automatic data profiling and analysis in terms of content and structure; data quality enhancement; and transformation and delivery to and from complex sources, i.e. the capability to take data in any source format and deliver it to any target, within or outside the enterprise, at the right time. It also allows integration and information access for diverse data and content regardless of where the data is located. With its data replication services, customer information management can be done quickly.

G. Microsoft SQL Server Integration Services

Microsoft SQL Server Integration Services (MS SSIS) [14] allows run-time data transfer and management. Designed for enterprise-wide application support, it provides a platform for performing ETL functions and for creating and controlling data packages. It allows script applications to be built with .NET platform support, offers increased scalability through thread pooling, and includes a more advanced import and export wizard. It also allows packages to be customized to suit specific organizational needs, supports the use of digital signatures for security, and supports service-oriented architecture.

IV. ETL TOOL FEATURES

Given the available span of functionality and the large number of ETL tool vendors, it is quite difficult to rank the full variety of tools, as every tool has some special features too. Some generic behaviour has been identified by [5], on the basis of which the following comparison and graphs are made. The following general aspects can be kept in mind when evaluating an ETL tool.

A. Architecture

When evaluating any tool with respect to architecture, aspects such as support for parallel processing, symmetric multiprocessing, massively parallel processing, clustering, load balancing and feasibility for grid computing should be considered. Support for multi-user management of ETL processes running on multiple machines, and support for a common metamodel, i.e. allowing the exchange of metadata with tools of the same brand and of other brands, should be considered too.

B. Functionality

Two main aspects relating to the functionality of an ETL tool are important: the metadata support and the overall functionality provided by the tool. The main functionality concerns whether the tool is data cleansing oriented or data transformation oriented, or performs both equally. Thus one gets a clear picture of which tool to select depending on the nature of the data that will be put into the tool.
Support for direct connection to the data source for input is also an important aspect of functionality. Metadata support, on the other hand, is a key aspect too. An ETL tool is also responsible for using metadata to map source data to its destination. Thus choosing a tool that conforms to the organization's metadata strategy is very important.

C. Usability

Usability is one of the important factors of any tool. The tool should be easy to use, easy to understand and quick to get used to. In this regard the tool should have a well-balanced interface and must support the typical task sequence of ETL usage.

D. Reusability

Reusability requires that the components of a data warehouse architecture constructed using the ETL tool be reusable and able to handle parameters. The tool should be capable of dividing the process into small building blocks, and should allow users to define their own functions and use them in the process flow.

E. Connectivity

The main aspects to consider include the native connections the tool supports, the packages it can read metadata from, the types of message queuing products the tool can connect to, the capability to graphically join tables, support for the changed data capture principle, transformation matching and address cleansing ability, as well as options for data profiling such as uniqueness and distribution.

F. Interoperability

Last but not least, the tool should be capable of running on a number of platforms and also on different versions of a product.

V. ANALYSIS OF ETL TOOLS

With all the aspects discussed in Section IV in mind, an analysis of the services provided by the tools follows. In choosing any tool, its respective aspects should be considered. The following graph-based analysis provides support for decision making.
For this analysis various websites, vendors' white papers, web blogs, comparisons and previous surveys were consulted, and the analysis was conducted on the basic set of features discussed in Section IV. Each of the ETL tools discussed in Section III is graded with points according to the level of services supported; vendors are denoted by acronyms in the graphs instead of full names.

A. Architectural Aspects

The following graph (Figure 1) depicts the services currently supported by the tools, based on support for enterprise architecture, clustering, data separation into groups, web-based application interface support and cloud computing deployment support. IPC and OWB are strong in architectural support, with SSIS coming right behind.

B. ETL Functionality

Points have been awarded according to the completeness of the tools in terms of functionality. Support for data cleansing, transformation, integration services and a common metadata model are the main aspects considered. The graph is drawn by granting each tool one point for each aspect it supports and then adding up the points that fall into one category. The same was done for both trends, i.e. basic functionalities in 2007 and improvements up to 2010.
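The point-grading rule described above (one point per supported aspect, summed per category) can be illustrated with a minimal Python sketch. The aspect names, the aspect-to-category mapping and the two placeholder tools are assumptions made purely for illustration; they are not the survey data behind the graphs.

```python
from collections import defaultdict

# Hypothetical checklist: aspect -> category (assumed for illustration only).
ASPECT_CATEGORY = {
    "data_cleansing": "functionality",
    "data_transformation": "functionality",
    "integration_services": "functionality",
    "common_metadata_model": "functionality",
    "clustering": "architecture",
    "web_based_ui": "architecture",
}

def score_tool(supported_aspects):
    """Return points per category: one point for each supported aspect."""
    points = defaultdict(int)
    for aspect in supported_aspects:
        category = ASPECT_CATEGORY.get(aspect)
        if category is not None:  # ignore aspects outside the checklist
            points[category] += 1
    return dict(points)

# Made-up support lists for two placeholder tools (not real vendor data).
tools = {
    "Tool A": {"data_cleansing", "data_transformation", "clustering"},
    "Tool B": {"data_transformation", "common_metadata_model",
               "web_based_ui", "clustering"},
}

scores = {name: score_tool(aspects) for name, aspects in tools.items()}
for name, per_category in scores.items():
    print(name, per_category)
```

Keeping the checklist separate from the per-tool support lists means the same grading could be repeated for a different year by swapping in another set of support lists.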
Figure 1. Architectural Support
(Chart "Architectural Aspects": points per vendor for IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS and Others; series: Web-based UI, Clustering and Job Distribution, Enables SOA, Deploy in Cloud Option.)

Figure 2. Functionality
(Chart "ETL Functionality Provided": points per vendor; series: 2007, 2010 improvement.)

C. Usability

This graph covers the points awarded to a tool for an easy-to-use, well-designed and balanced interface. What-you-see-is-what-you-get (WYSIWYG) support and task compatibility are further bases for grading. Points accumulate according to the presence of a subset of services necessary for ease of use and understanding. The ease of training new users to become familiar with the interface is also part of the criterion.

D. Reusability

The following graph depicts a comparison and point grading on the basis of the reusability factor supported, the capability of data stream splitting, automatic documentation, and support for defining user-defined functions and using them in the process flow.

Figure 3. Usability
(Chart "Ease Of Use": points per vendor; series: Original 2007, Improvement 2010.)

Figure 4. Reusability
(Chart "Reusability": points per vendor; series: Reusable service Repository, Split Data Streams, Data Partitioning, Automatic Documentation.)

E. Connectivity

Connectivity, as the name indicates, is calculated by aggregating the points granted to a tool on the following aspects: the total number of sources that can be read without any additional middleware, the enterprise applications supported by the tool, the platforms it can run on and, last but not least, the support for messaging (i.e. real-time data handling).

F. Interoperability

The support for various platforms is detailed in the following graph. Here all Windows versions and all Linux versions are each counted as one platform, while UNIX versions are treated separately.
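The counting rule just described can be sketched as follows. This is only an illustration under stated assumptions: the platform labels and the prefix-matching rule are hypothetical, and the survey's actual procedure is not documented here.

```python
# Families whose versions are all collapsed into a single entry
# (assumption based on the rule stated in the text).
COLLAPSED_FAMILIES = ("Windows", "Linux")

def platform_key(platform):
    """Map a platform label to the key used when counting platforms."""
    for family in COLLAPSED_FAMILIES:
        if platform.lower().startswith(family.lower()):
            return family
    return platform  # UNIX variants are kept as separate entries

def count_platforms(platforms):
    """Count distinct platforms after collapsing Windows and Linux versions."""
    return len({platform_key(p) for p in platforms})

# Illustrative list of supported platforms for a placeholder tool.
supported = ["Windows XP", "Windows Server 2008", "Linux Red Hat",
             "Sun Solaris", "HP-UX"]
print(count_platforms(supported))
```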
Figure 5. Connectivity
(Chart "Connectivity": points per vendor for IBM IS, IPC, Talend OS, OWB, MS SSIS, BO SAP, SAS DIS and Others; series: Platforms, Data Sources, Packages, Messages.)

Figure 6. Interoperability
(Chart "Interoperability": points per vendor; series: Windows, Linux, Sun Solaris, HP-UX, IBM AIX, IBM iSeries OS400, IBM zSeries MVS, HP Tru64, Open VMS.)

From all the analysis conducted, it is still hard to generalize which tool is the best. Although Informatica proves to be better in quite a few features, MS SSIS and OWB have improved well over time and are now keeping pace with the top contenders. Overall, when considering pure ETL tools, IPC can still be ranked as the market leader, with IBM IS coming second alongside Talend OS. When it comes to DB-integrated tools, however, OWB and SSIS follow IPC directly. Thus one should be careful in selecting a tool, as it may not be the best for an organization merely because of the vendor's name. The capabilities of the tool should be reviewed before selection.

VI. CONCLUSION

Important data in most organizations is underutilized simply because it exists in different formats and in various resources. Data warehouses (DWs) are complex systems that hold consolidated data with the main objective of assisting knowledge workers in the decision-making process. The key components of DWs are the Extraction-Transformation-Loading (ETL) processes. The goal of this paper is to elaborate on the ETL process and its importance to data warehouses, and to provide a comparison based on some generalized criteria to determine the suitability of a tool for a certain category of consumers. The paper provides a brief overview of the ETL tools available in the market, specifies some key points for generalizing the capabilities provided by a tool, and uses graph-based analysis on a grade point scale to grade the specific tools selected.
Together, this provides a comparison of the available tools in terms of the features they offer, helping an organization choose the tool that best suits its needs.

REFERENCES

[1] T.Y. Wah, H. Peng, and C.S. Hok, “Building Data Warehouse,” Proc. 24th South East Asia Regional Computer Conference, November 18-19, 2007, Bangkok, Thailand.
[2] Tho, M. Nguyen, Tjoa, A. Min, Zero-Latency Data Warehousing for Heterogeneous Data Sources and Continuous Data Streams, Institute of Software Technology and Interactive Systems, Favoritenstr. 9-11/188, 2003.
[3] T. Jörg, S. Dessloch, Near Real-Time Data Warehousing Using State-of-the-Art ETL Tools, University of Kaiserslautern, 67653 Kaiserslautern, Germany, 2009.
[4] Passionned, “The BI Tool survey report”, 2008.
[5] Passionned, “ETL Tools survey report”, 2009.
[6] J. Levin, “ETL Tools Comparison”, March 2008.
[7] Dr. R. Chillar, B. Kochar, Extraction Transformation Loading: A Road to Data Warehouse, 2nd National Conference Mathematical Techniques: Emerging Paradigms for Electronics and IT Industries.
[8] Guide to Data Warehousing and Business Intelligence, available at http://data-warehouses.net/architecture/etlprocess.html.
[9] Pervasive Systems, Extraordinarily Flexible ETL Platform, available at http://www.pervasiveintegration.com/scenarios/Pages/etl_tools_data_aggregation.aspx.
[10] Adeptia Incorporation, ETL Vendors Comparison, available at http://www.adeptia.com/products/etl_vendor_comparison.html.
[11] Guide to Data Warehousing and Business Intelligence, Architectural Overview, available at http://data-warehouses.net/architecture/overview.html.
[12] ETL Tools Survey, available at http://www.etltool.com/what-is-etl.htm.
[13] Oracle Warehouse Builder 11g, A Technical Overview, available at http://www.oracle.com/technology/products/warehouse/index.html.
[14] ETL Data Warehouse Concepts, available at http://etl-information.blogspot.com/2007_07_01_archive.htm