Running head: LITERATURE REVIEW AND PROPOSAL
Literature Review and Proposal
Naveen Kumar Nagulapalli
IST 8101
Wilmington University
Literature Review
To support business analysis, a data warehouse must be loaded regularly (Egger 2006). Before data can be copied into the warehouse, it must first be extracted from the different operational systems. A constant challenge in data warehouse environments is the rearrangement, integration, and consolidation of large volumes of data across numerous systems, so that the warehouse provides unified information and, as a result, an intelligent information base for the business (Goto 1999).
The term ETL stands for extraction, transformation, and loading. It describes a broad process rather than three well-defined steps. Because the acronym omits the transportation phase, it can be misread as implying that the remaining phases are separate and self-contained. To integrate systems or applications, data must be shared between them so that the applications involved have the same picture of the world. This data sharing was in most cases addressed by mechanisms similar to ETL (SAS Institute 2004).
Extraction of the data:
During data extraction, the desired data is identified and extracted from different source systems, including database systems and applications. It is often not possible to identify the specific subset of interest in advance, so more data than necessary has to be extracted and the relevant data is identified at a later point. The time delta between two logically identical extractions also varies: the time span can range from days or hours to minutes or near real time.
Figure 1: Egger’s Data Extraction Model.
Some transformations are likely to take place during the extraction process itself. The size of the extracted data varies from hundreds of kilobytes to hundreds of gigabytes. Web server log files, for example, can easily grow to hundreds of megabytes in a very short time. The volume depends on the specific source system and the prevailing business situation (Casters 2010).
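The varying time delta between two logically identical extractions can be illustrated with a small incremental extraction. The following is only a minimal sketch under stated assumptions: the `orders` table, its `updated_at` column, the sample rows, and the function names are all hypothetical and do not come from the cited sources. An in-memory SQLite database stands in for a real source system.

```python
import sqlite3


def demo_source():
    """Build a hypothetical source system with a change timestamp per row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "acme", 120.0, "2024-01-01T08:00:00"),
            (2, "globex", 75.5, "2024-01-02T09:30:00"),
            (3, "initech", 310.0, "2024-01-03T11:15:00"),
        ],
    )
    conn.commit()
    return conn


def extract_since(conn, watermark):
    """Return only the rows changed after the previous extraction.

    `watermark` is the ISO 8601 timestamp of the last successful run.
    ISO 8601 strings in the same format sort chronologically, so a plain
    string comparison is sufficient here.
    """
    cursor = conn.execute(
        "SELECT id, customer, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    source = demo_source()
    for row in extract_since(source, "2024-01-01T23:59:59"):
        print(row)
```

How often `extract_since` is called determines the time delta: a nightly job yields a delta of one day, while a job scheduled every few minutes approaches near real time.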
The data then has to be physically transported to the target system or to an intermediate system for further processing. Depending on the chosen means of transportation, some transformations can also be performed during this step.
Data transportation:
After data extraction, the data has to be moved to the destination database for further processing. Some transformations are likely to occur at this stage, and they depend largely on the chosen method of transportation (Kimball n.d.). For example, a SQL statement that directly accesses a remote target through a gateway can concatenate two columns as part of the SELECT statement (Casters 2010). Scalability is the main emphasis in most such examples. As with extraction, it is often not possible to identify the specific subset of interest in advance, so more data than necessary has to be transported and the relevant data is identified later.
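The idea of transforming data while it is being transported can be sketched as a single SQL statement that moves rows and concatenates two columns on the way. This is an illustration only, not the setup described in the cited sources: the attached in-memory database named `warehouse` merely stands in for a remote target reached through a gateway, and the table names, column names, and sample rows are hypothetical.

```python
import sqlite3


def transport_with_concat():
    """Copy customers into a second database, merging two name columns in transit."""
    conn = sqlite3.connect(":memory:")

    # Hypothetical source table with the name split over two columns.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")],
    )
    conn.commit()

    # Assumption: a second in-memory database plays the role of the remote target.
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute(
        "CREATE TABLE warehouse.customers (id INTEGER PRIMARY KEY, full_name TEXT)"
    )

    # Transport and transformation happen in one statement:
    # the SELECT concatenates two columns while the rows are moved.
    conn.execute(
        "INSERT INTO warehouse.customers (id, full_name) "
        "SELECT id, first_name || ' ' || last_name FROM main.customers"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    connection = transport_with_concat()
    query = "SELECT id, full_name FROM warehouse.customers ORDER BY id"
    for row in connection.execute(query):
        print(row)
```

Doing the concatenation inside the SELECT avoids a separate pass over the data after it arrives, which is one reason this approach is attractive when scalability matters.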
Figure 2: Data Transportation Model.
Management of the ETL Process:
The ETL process usually appears quite straightforward. As with any process, however, it can fail. Failures are typically caused by extracts missing from one of the systems, by missing values in one of the reference tables, or simply by a connection failure or power outage. It is therefore important to design the ETL process with failure recovery in mind (Rizzi n.d.).
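One common way to design for recovery is to make each load step all-or-nothing, so that a failed run leaves the target unchanged and can simply be repeated. The sketch below is an assumption-laden illustration rather than a method taken from the cited sources: the `regions` reference table, the `fact_orders` table, and both function names are hypothetical, and SQLite stands in for the warehouse.

```python
import sqlite3


def make_target():
    """Create a hypothetical warehouse with one reference table and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE regions (code TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE fact_orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO regions VALUES (?)", [("EU",), ("US",)])
    conn.commit()
    return conn


def load_batch(conn, rows):
    """Load every row of the batch or none of them.

    Used as a context manager, the connection commits if the block
    finishes normally and rolls back if an exception is raised.
    """
    with conn:
        for order_id, region_code, amount in rows:
            known = conn.execute(
                "SELECT 1 FROM regions WHERE code = ?", (region_code,)
            ).fetchone()
            if known is None:
                # A missing value in the reference table aborts the whole batch.
                raise ValueError(f"unknown region code: {region_code!r}")
            conn.execute(
                "INSERT INTO fact_orders VALUES (?, ?, ?)",
                (order_id, region_code, amount),
            )


if __name__ == "__main__":
    target = make_target()
    try:
        load_batch(target, [(1, "EU", 10.0), (2, "XX", 5.0)])
    except ValueError as error:
        print("load failed, nothing was written:", error)
    # After the reference data problem is fixed, the same batch can be rerun.
    load_batch(target, [(1, "EU", 10.0), (2, "US", 5.0)])
    print(target.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0])
```

Because the failed batch is rolled back in full, rerunning it does not produce duplicates or partially loaded data.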
The Proposal
Project Summary:
The purpose of this proposal is to analyze the use of ETL in data warehousing. The aim is to determine how ETL methods have contributed to making warehousing simpler, given that they have long been used for effective database management (Caserta 2004).
Project background:
ETL has been used in data warehousing for a long time, which has been attributed to its sustained effectiveness. This effectiveness is the reason the study aims to examine ETL in detail (Caserta 2004).
Project objectives:
The objective of the project is to analyze the various functions of ETL and their effectiveness in the warehousing process.
Project methodology:
The project's primary objective will be achieved through an intensive analysis of the use of ETL in warehousing, with a case study providing direction and an appropriate source of data.
Project risks:
There are no recognizable risks for the project.
Cost of the project:
The cost of conducting the project will include transportation, supplies, and equipment.
References
Egger, N. (2006). SAP BW Data Retrieval: Mastering the ETL Process. Galileo Press.
Oracle. (n.d.). Overview of Extraction, Transformation and Loading. Retrieved from https://docs.oracle.com/cd/B19306_01/server.102/b14223/ettover.htm
Goto, M. (1999). Theory and structure of the automatic relay computer E.T.L. Mark II. Electrotechnical Laboratory.
SAS Institute. (2004). SAS 9.1.3 ETL Studio: User's Guide. SAS Institute.
Casters, M. (2010). Pentaho Kettle Solutions: Building Open Source ETL