Running head: ETL DEVELOPER AT GLOBAL POINT
ETL DEVELOPER AT GLOBAL POINT
Introduction and Methodology
Naveen Kumar Nagulapalli.
Ms. Wilmore, Cheryl k.
IST 8101
Wilmington University.
Introduction
ETL stands for Extract, Transform and Load, a methodology that combines three distinct functions. ETL is used with databases in general and in data warehousing in particular, where it supports effective management of the data. It covers the whole process of migrating data from different source databases to a data warehouse (Oracle, n.d.). Many ETL tools are available today; they extract data from various sources, clean, customize, restructure and integrate it, and finally load it into the data warehouse. Building an ETL process for a data warehouse is very hard (Shaker, Adeltawad & Hamed, 2011). ETL has been in use for a long time and acts as the backbone of data warehousing. It is not limited to these three steps; the overall transformation involves many other processes. ETL is a time-consuming, complex process that requires a substantial budget and resources during implementation (Shaker et al., 2011).
When building a data warehouse we have to look closely at three areas, because data from various sources is brought into a single store and converted into one standardized format for the organization's operations (Oracle, n.d.). The three areas are the source area, the destination area and the mapping area. The source area consists of the databases from which the data is obtained, each with its own entities and relationships between tables. The destination area is the data warehouse, which will use a star schema model. The mapping area is where the different data models are linked to each other using a specific set of methods. There is no standard model for the mapping area, however, and this is why ETL tools are used to create a standard format for a specific business (Shaker et al., 2011).
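The idea of a mapping area can be illustrated with a minimal Python sketch. The two source systems ("crm" and "billing"), their column names and the warehouse column names are all hypothetical and are not taken from the cited sources; the sketch only shows how rows with different layouts can be renamed into one standardized format.

```python
# Hypothetical rows from two source systems that name the same facts differently.
CRM_ROW = {"cust_id": 7, "full_name": "Ada Lovelace", "country_cd": "GB"}
BILLING_ROW = {"customer": 7, "amount_usd": "120.50", "invoice_dt": "2015-03-01"}

# The mapping area: for each source, which source column feeds which
# warehouse column. All names here are assumptions made for illustration.
MAPPINGS = {
    "crm": {
        "cust_id": "customer_key",
        "full_name": "customer_name",
        "country_cd": "country",
    },
    "billing": {
        "customer": "customer_key",
        "amount_usd": "amount",
        "invoice_dt": "invoice_date",
    },
}


def map_row(source, row):
    """Rename the columns of one source row to the warehouse column names."""
    mapping = MAPPINGS[source]
    return {target: row[src] for src, target in mapping.items()}


if __name__ == "__main__":
    print(map_row("crm", CRM_ROW))
    print(map_row("billing", BILLING_ROW))
```

After mapping, both sources share the key customer_key, which is what allows their rows to be integrated in the destination area.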
The steps in the ETL process are described below:
Extract
In this step the data is extracted from the different sources without affecting the operation of the source databases. The main aim is to extract as little data as possible from the source databases. The volume of extracted data varies widely and may range from kilobytes to gigabytes. Extraction can be performed in several ways: as updates, incrementally or as a full extract (Oracle, n.d.).
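The difference between a full extract and an incremental extract can be sketched in Python with the standard sqlite3 module. The orders table, its columns and the use of an updated_at timestamp to detect changed rows are assumptions made for this illustration; real sources may track changes differently.

```python
import sqlite3


def make_source():
    """Build a small in-memory source table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "2015-01-01"), (2, 25.5, "2015-01-05"), (3, 40.0, "2015-01-09")],
    )
    conn.commit()
    return conn


def full_extract(conn):
    """Full extract: read every row from the source table."""
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders ORDER BY id"
    ).fetchall()


def incremental_extract(conn, last_run):
    """Incremental extract: read only rows changed after the previous run.

    Dates are stored as ISO strings, so plain string comparison orders them.
    """
    return conn.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ? ORDER BY id",
        (last_run,),
    ).fetchall()


if __name__ == "__main__":
    source = make_source()
    print(len(full_extract(source)))
    print(incremental_extract(source, "2015-01-04"))
```

The incremental query reads only the rows changed since the last run, which is one way of keeping the amount of extracted data small.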
Transform
This is an important step in the data warehouse process because here the data is transformed to suit the destination. Various rules are applied to the data to convert all of it to the same dimension. The transformation step as a whole includes cleaning, transformation and integration of the data. The resulting data should be unambiguous and clean (Shaker et al., 2011).
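A few typical cleaning rules can be sketched in Python as follows. The field names (name, country, amount) and the specific rules (trim whitespace, normalize case, reject unusable rows, drop duplicates) are assumptions chosen for illustration; the rules an organization actually applies depend on its destination model.

```python
def clean_row(row):
    """Return a cleaned copy of one extracted row, or None if it is unusable."""
    name = (row.get("name") or "").strip()
    if not name:
        return None  # reject rows without a usable name
    country = (row.get("country") or "").strip().upper()
    try:
        amount = round(float(row.get("amount")), 2)
    except (TypeError, ValueError):
        return None  # reject rows whose amount is missing or not numeric
    return {"name": name.title(), "country": country, "amount": amount}


def transform(rows):
    """Clean every row and drop duplicates that appear after cleaning."""
    cleaned = []
    seen = set()
    for row in rows:
        out = clean_row(row)
        if out is None:
            continue
        key = (out["name"], out["country"], out["amount"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    raw = [
        {"name": " ada lovelace ", "country": "gb", "amount": "120.5"},
        {"name": "Ada Lovelace", "country": "GB", "amount": 120.50},
        {"name": "", "country": "US", "amount": "3"},
    ]
    print(transform(raw))
```

The first two sample rows describe the same record in different spellings, so only one cleaned row survives, and the row without a name is rejected.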
Load
This is the final step of the ETL process, in which the data is
loaded into the target dimensional structure. The goal of this
step is to ensure that all of the data is loaded correctly. The
loaded data is accessed by all users and applications
(Shaker et al., 2011).
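One simple way to check that the data was loaded properly is to
compare row counts before and after the load. The Python sketch
below does this with the standard sqlite3 module. The fact_sales
table and its columns are hypothetical, and a row-count check is
only one of several possible verification rules.

```python
import sqlite3


def load(conn, rows):
    """Insert rows into a hypothetical fact table and verify the row count.

    Returns the number of rows added. If the count does not match, the
    insert is rolled back and an error is raised.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer_key INTEGER, amount REAL)"
    )
    before = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
        after = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
        if after - before != len(rows):
            raise RuntimeError("row count mismatch after load")
    return after - before


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(load(warehouse, [(1, 10.0), (2, 5.5)]))
```

Running the insert inside a transaction means that a failed
check leaves the warehouse table unchanged.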
All these steps or functions are combined into a single
solution called ETL. Various tools support this process, and
most companies find them useful for extracting data from
various sources and building a data warehouse. It is advisable
to build our own ETL tool if the data is small. If there is a
large amount of data that becomes hard to manage, it is
advisable to adopt an existing ETL tool. Many factors need to
be identified when choosing an ETL tool. In this action
research I will learn about ETL, why it is important in data
warehousing and what role it plays in current organizations. I
will then study the various ETL models and the different types
of ETL tools, compare the tools and suggest the best ones for
organizations. Through this research I will learn how to
implement an ETL process to build a data warehouse.
Methodology
ETL is very important in data warehousing, and various models,
methodologies and tools are currently available. Most companies
are finding it hard to analyze data, create reports and make
decisions with the data they have. A company first needs to
build a data warehouse, and to build one it must first perform
the ETL process. The data warehouse will be clean and usable if
the organization follows a proper ETL process and uses proper
tools (Sweety, Piyush & Saumil, 2012).
The organization should follow a process for finding the best
methodologies and tools to build a proper data warehouse.
Various research methods can be used to find solutions. I
recommend action research, which will be very useful for
learning more about ETL and finding the best solution for its
implementation. This research will improve the quality of the
organization and increase the performance of the ETL process.
Action research is a systematic, collaborative and critical
approach followed by participants who are conducting an
inquiry into a particular topic. It is performed to improve the
quality of a particular approach or technology that is
currently in use. While performing action research, the
participants put questions to the different stakeholders who
are directly or indirectly related to the topic (Richard,
2000). In an action research project the researcher takes the
help of a subject matter expert (SME) in learning and in making
important decisions on the project. This research process has
many advantages: it helps to find the issues in a particular
topic, to make decisions on those problems and thereby to
implement proper solutions (Richard, 2000).
The action research process follows the same seven steps,
whatever concept the researcher is dealing with. They are:
1. Select a focus (Richard, 2000).
2. Clarify theories.
3. Identify the questions that need to be put to the SMEs.
4. Collect data from various resources, which can be articles,
books or questionnaires.
5. Analyze the data collected from the various resources.
6. Report the data to all the people who are connected to the
research.
7. Take action on the particular problem or improvement for
which the action research is being performed (Richard, 2000).
History of Action Research
Kurt Lewin is credited with originating action research. He
also showed that action research can develop relationships
between various groups and increase communication and
cooperation between them. Lewin described action research as a
spiral process, an approach that is useful for identifying
problems and exploring solutions to them (Adelman, 1993).
Figure 1: Lewin’s model of action research. Dickens, L., &
Watkins, K. (1999). Action Research: Rethinking Lewin.
Management Learning, 30(2), 127-140.
For my topic I will use this action research methodology, and
my study will gather both qualitative and quantitative
information. Action research fits my ETL research because it
will address the problems that organizations face in finding
the best ETL tool and implementing it. Action research follows
a specific process for finding solutions and implementing them
(Regina, 2002). The process mainly consists of the following
steps:
1. Plan
2. Action
3. Observation
4. Reflection
In the planning step, I will identify what needs to be done to
select the ETL tools and models. In the action step I will
carry out everything planned in the planning phase. In the next
step I will collect and analyze the evidence found during the
action step. I will then reflect on all the evidence. In this
way I will go through several iterations until I find the right
solution for my research (Regina, 2002).
References
Adelman, C. (1993). Kurt Lewin and the Origins of Action
Research. Educational Action Research, 1(1), 7-24.
doi:10.1080/0965079930010102
Oracle. (n.d.). Overview of Extraction, Transformation and
Loading. Retrieved from
https://docs.oracle.com/cd/B19306_01/server.102/b14223/ettover.htm
Regina, R. (2002). Supporting Technology Integration through
Action Research. The Clearing House, 75(5), 233-237.
Richard, S. (2000). Guiding School Improvement with Action
Research. Retrieved from