SlideShare a Scribd company logo
1 of 5
Download to read offline
International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015
ISSN: 2395-1303 http://www.ijetjournal.org Page 37
A Comparative Study on Operational Database, Data Warehouse
and Hadoop File System
T.Jalaja1
, M.Shailaja2
1,2
(Department of Computer Science, Osmania University/Vasavi College of Engineering, Hyderabad, India)
I. INTRODUCTION
One of the largest technological challenges in
software systems research today is to provide
mechanisms for storage, manipulation, and
information retrieval on large amounts of data. A
database is a collection of related data and a
database system is a database and database software
together. Operational Databases are transactional
databases, which supports on-line transaction
processing (OLTP) that includes insertions, updates,
deletions and also supports information query
requirements.
Operational Database is designed to make
transactional systems run efficiently. It is used to
store detailed and current data. The main emphasis
of this system is on very fast query processing,
maintaining data integrity in multi-access
environments. Thus databases must strike a balance
between efficiency in transaction processing and
supporting query requirements .They can't further
be optimized for theapplications such as OLAP,
DSS and data mining.
A Data Warehouse [1] is a database that is
designed for facilitating querying and analysis. In
contrast to databases, data warehouses generally
contain very large amounts of data from multiple
sources that may include databasesfrom different
data models and sometimes files acquired from
independent systems and platforms. Often it is
designed as OLAP (On-Line Analytical Processing)
systems.
Big data comes from relatively new types of
data sources like social media, public filings,
content available in the public domain through
agencies or subscriptions, documents and e-mails
including both structured and unstructured
texts,digital devices and sensors including location
based smart phone, weather and telemetric data
.There are several approaches to collecting,
storing,processing, and analyzing big data.Hadoop
[5] is a free, Java-based programming framework
that supports the processing of large data sets in a
distributed computing environment. Hadoop is not
atype of database, but rather a software ecosystem
that allows massive parallel computing. The main
focus of this paper is to present a clear
understanding on operational database, data
warehouse and Hadoop technology.
This paper is organized as follows: In section II, we
present literature survey on Operational database,
Datawarehouse and Hadoop Distributed File
System. In Section III, we present the comparative
RESEARCH ARTICLE OPEN ACCESS
Abstract:
A Computer database is a collection of logically related data that is stored in a computer system,
so that a computer program or person using a query language can use it to answer queries. An operational
database (OLTP) contains up-to-date, modifiable application specific data. A data warehouse (OLAP) is a
subject-oriented, integrated, time-variant and non-volatile collection of data used to make business
decisions. Hadoop Distributed File System (HDFS) allows storing large amount of data on a cloud of
machines. In this paper, we surveyed the literature related to operational databases, data warehouse and
hadoop technology.
Keywords — Operational Database, Data warehouse, Hadoop, OLTP, OLAP, Map Reduce.
International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015
ISSN: 2395-1303 http://www.ijetjournal.org Page 38
studies among the works cited above. In section IV,
we summarize the work.
II. LITERATURE SURVEY
Day-by-day technology in software systems is
improving and is becoming very challenging in
terms of storage, manipulation, and information
retrieval. There are different types of storage
systems having the capability of storing data from
few Megabytes to Peta bytes.In this paper we
discuss about the traditional/operational database,
which is capable of storing the transactional data of
an organization in part A. Then in part B,we discuss
about the datawarehouse, which is capable of
analyzing the business data which is used for
decision making. Later in part C we come across
the way to store large volume of structured as well
as unstructured data, which is further utilized for
analytics.
A. Operational Database (OLTP)
An operational database contains data about
the things that go on inside an organization or
enterprise. For example, an operational database
might contain fact and dimension data describing
transactions, data on customer complaints,
employee information, taking order and fulfilling
them in a store , track of payments and inventory
,information and amounts from credit cards etc.
Operational systems maintain records of daily
business transactions.Traditional databases support
on-line transaction processing (OLTP), which
includes insertions, updates, and deletions, while
also supporting information query requirements. An
operational database contains enterprise data which
are up to date and modifiable. As the name implies,
it is the database that is currently and progressive in
use capturing real time data and supplying data for
real time computations and other analyzing
processes.
A database structure commonly used in GIS in
which data is stored based on 2 dimensional tables
where multiple relationships between data elements
can be defined and established in an adhoc manner.
Relational Database Management System - a
database system made up of files with data elements
in two-dimensional array (rows and columns).This
database management system has the capability to
recombine data elements to form different relations
resulting in a great flexibility of data usage. SQL is
used to manipulate relational databases. The
relational model contains the following
components:
•Collection of objects or relations.
•Set of operations to act on the relations.
•Data integrity for accuracy and consistency.
All other database structures can be reduced
to a set of relational tables. It is easy to use when
compared to other database systems.Traditional
databases are optimized to process queries that may
touch a small part of the database and transactions
that deal with insertions or updates of a few tuples
per relation to process.
Because of the very dynamic nature of an
operational database, there are certain issues that
need to be addressed appropriately. An operational
database can grow very fast in size and bulk so
database administrations and IT analysts must
purchase high powered computer hardware and top
notch database management systems. The
operational database is the source of data for the
data warehouse.
B. Datawarehouse (OLAP)
According to Surajit Chaudhuri and
Umeshwar Dayal as in [6], a data warehouse is a
“subject-oriented, integrated, timevarying,non-
volatile collection of data that is used primarily in
organizational decision making as in [7]. Typically,
the data warehouse is maintained separately from
the organization’s operational databases. The data
warehouse supports on-line analytical processing
(OLAP), the functional and performance
requirements of which are quite different from
those of the on-line transaction processing (OLTP)
applications traditionally supported by the
operational databases.
A data warehouse is a centralized repository
that stores data from multiple information sources
and transforms them into a common,
International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015
ISSN: 2395-1303 http://www.ijetjournal.org Page 39
multidimensional data model for efficient querying
and analysis as shown in Figure 1.
Figure 1: Data Warehouse Architecture
Historical, summarized and consolidated
data is more important than detailed, individual
records. Since data warehouses contain
consolidated data, perhaps from several operational
databases, over potentially long periods of time,
they tend to be orders of magnitude larger than
operational databases; enterprise data warehouses
are projected to be hundreds of gigabytes to
terabytes in size. The workloads are query intensive
with mostly ad hoc, complex queries that can access
millions of records and perform a lot of scans, joins,
and aggregates. Query throughput and response
times are more important than transaction
throughput. To facilitate complex analyses and
visualization, the data in a warehouse is typically
modeled multidimensional.
The 3-D data cube of the sales data with respect to
time, item, and location is shown in Figure 2.
Figure 2: Multidimensional data (Data Cube)
Typical OLAP operations include rollup (increasing
the level of aggregation) and drill-down (decreasing
the level of aggregation or increasing detail) along
one or more dimension hierarchies, slice and dice
(selection and projection), and pivot (re-orienting
the multidimensional view of data).
Decision support usually requires consolidating
data from many heterogeneous sources: these might
include external sources such as stock market feeds,
in addition to several operational databases. The
different sources might contain data of varying
quality, or use inconsistent representations, codes
and formats, which have to be reconciled. Finally,
supporting the multidimensional data models and
operations typical of OLAP requires special data
organization, access methods, and implementation
methods, not generally provided by commercial
DBMSs targeted for OLTP. It is for all these
reasons that data warehouses are implemented
separately from operational databases.
A popular conceptual model that influences the
front-end tools, database design, and the query
engines for OLAP is the multidimensional view of
data in the warehouse.
C. Hadoop Distributed Filesystem (HDFS)
According to P.Sarada Devi, V.Visweswara
Rao and K.Raghavender as in [8], Data warehouse
is a collection of heterogeneous data which can
effectively and easily manage the data from
multiple databases in a uniform fashion. In addition
to data warehousing ETL, many technologies such
as ERP (Enterprise Resource Planning), SCM,SAP,
BI tools are used in the world market for handling
structured data efficiently and effectively.In order
to handle the large amount of structured, semi
structured and unstructured data generated by
different social network or industries, Hadoop is
being used.
Hadoop is a framework developed as an
Open Source Software based on papers published in
2004 by Google Inc. that deal with the “Map
Reduce” distributed processing and the “Google
File System”, a system they had used to scale their
data processing needs. Hadoop is an open source
framework for writing and running distributed
applications that process large amounts of data''. It
consists of two main components –
Storage: The Hadoop Distributed File System
Processing: "Map Reduce''
International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015
ISSN: 2395-1303 http://www.ijetjournal.org Page 40
Figure 3: Hadoop Architecture
HDFS is the storage component of Hadoop. It’s a
distributed file system that’s modeled after the
Google File System (GFS) paper [11]. Files in
HDFS are stored across one or more blocks, and
each block is typically 64 MB or larger. Blocks are
replicated across multiple hosts in the hadoop
cluster to help with availability and fault tolerance.
HDFS is able to store huge amounts of information,
scale up incrementally and survive the failure of
significant parts of the storage infrastructure
without losing data. Hadoop creates clusters of
machines and coordinates work among them.
Clusters can be built with inexpensive computers. If
one fails, Hadoop continues to operate the cluster
without losing data or interrupting work, by shifting
work to the remaining machines in the cluster.
HDFS manages storage on the cluster by breaking
incoming files into pieces, called “blocks,” and
storing each of the blocks redundantly across the
pool of servers. In the common case, HDFS stores
three complete copies of each file by copying each
piece to three different servers [9].
Map Reduce is a batch-based, distributed
computing framework modeled after Google’s
paper on Map Reduce [12]. Map reduce data
processing model has two main phases -
''MAPPING'' and ''REDUCING''.
Mapping Phase: Map Reduce takes the input data
and feeds each data element to the mapper which
filters and transforms the input data into the desired
format for business use.
Reducing Phase: The reducer processes all the
outputs from the mapper and arrives as a final result
by performing business logic to get the desired
output.
Figure 4: MapReduce Architecture
MapReduce Architecture
The processing pillar in the Hadoop
ecosystem is the MapReduce framework. The
framework allows the specification of an operation
to be applied to a huge data set, divide the problem
and data, and run it in parallel. From an analyst’s
point of view, this can occur on multiple
dimensions. For example, a very large dataset can
be reduced into a smaller subset where analytics can
be applied [10].
In Hadoop, these kinds of operations are written as
MapReduce jobs in Java. There are a number of
higher level languages like Hive and Pig that make
writing these programs easier. The outputs of these
jobs can be written back to either HDFS or placed
in a traditional data warehouse.
III. COMPARISION
TABLE 1
COMPARISION OF OLTP, OLAP, HDFS
Database
(OLTP)
Datawarehouse
(OLAP)
Hadoop
(HDFS)
Type
of data
Transactiona
l data,
detailed
data.
Historical data,
Derived data,
Metadata
Real time
data,
derived,
enterprise,
Simple
and
complex
data
Data
Storag
e
Stores
operational
data –
structured
data
Stores historical
information –
structured data
Stores
enterprise
data -
structured,
semi
structured
and
Unstructur
ed data.
Data
Modeli
Network ,
hierarchical,
Conceptual,
Logical, And
File
system(Sc
International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015
ISSN: 2395-1303 http://www.ijetjournal.org Page 41
ng
techni
ques
object
oriented,
object
relational
and
Relational
modeling
techniques
(Schema-on-
write)
Physical Data
modeling
techniques.
hema-on-
read)
Optimi
zation
Efficient for
performing
read-write
operations of
single point
transactions.
Efficient for
performing
Reading/retrieving
large data sets and
for aggregating
data.
HDFS is
efficient
for
performin
g reading
and
writing
large files
(gigabytes
and
larger).
High
throughput
Data
Access
techni
ques
Access to
few records
at once by
predefined
transactions
Access a lot of
records in each
access by ad hoc
queries and
periodic reports
Access
Streaming
data, large
volumes
of data
Data warehousing is faster and cost effective when
compared with Apache Hadoop.Unlike traditional
ETL tools, Hadoop persists the raw data and can be
used to re-process it repeatedly in a very efficient
manner .Hadoop mostly process semi structured
and unstructured data also, whereas Data
Warehouse is best suited for processing only
structured data efficiently.
Figure 5: Showing OLTP, OLAP, Hadoop
IV. CONCLUSION
Thus, the operational/traditional databases are used
to store an application specific data pertaining to an
Organization or Enterprise.This data is used for
query purpose.But, in order to store large amount of
data compared to Traditional database, which is
further used for analysis purpose, based on which
the business decisions are made, Datawarehouses
are maintained. Incontrast, to store
Zettabytes of structured/unstructured dynamic data,
Hadoop distributed File systems are used. These are
more efficient and fault tolerant.
REFERENCES
1. Wided Oueslati and Jalel Akaichi, “A Survey on Data
Warehouse Evolution”, Vol.2, No.4, November 2010,
IJDMS.
2. T.K.Das1 and AratiMohapatro “A Study on Big Data
Integration with Data Warehouse”International Journal of
Computer Trends and Technology (IJCTT) – volume 9
number 4– Mar 2014.
3. Dr. Mrs.Pushpa Suri, Mrs. Meenakshi Sharma” A
Comparative Studybetween The Performance of
Relational& object Oriented Database In Data
Warehouse".
4. Inmon, W. et.al. (2000) Exploration warehousing: turning
business information into business opportunity. John Wiley
& Sons.
5. Hadoop. http://hadoop.apache.org
6. Surajit Chaudhuri and Umeshwar Dayal “An Overview of
Data Warehousing and OLAP Technology”, ACM Sigmod
Record, March 1997).
7. Inmon, W.H., Building the Data Warehouse. John Wiley,
1992.
8. P.SaradaA Devi, V.Viswewara Rao and
K.Raghavender“Emerging Technology Big Data-Hadoop
Over Data Warehousing Using ETL” Proceedings of 2nd
IRF International Conference, 21st September-2014,
Vizag, India.
9. MrigankMridul, AkashdeepKhajuria, Snehasish Dutta,
Kumar N “ Analysis of Bidgata using Apache Hadoop and
Map Reduce” Volume 4, Issue 5, May 2014”.
10.Harshawardhan S. Bhosale, Prof. DevendraP. Gadekar
“Harshawardhan S. Bhosale1, Prof. Devendra P.
Gadekar” International JournalofScientific and Research
Publications, Volume 4, Issue 10, October 2014.
11. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
– “The Google File System.
12. Jeffrey Dean and Sanjay Ghemawat ,Google,
Inc –“MapReduce: Simplified Data Processing on
Large Clusters“.

More Related Content

What's hot

Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guidethomasmary607
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Miningcpjcollege
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousingumesh patil
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Harish Chand
 
Data Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture NotesData Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture NotesFellowBuddy.com
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data WarehousingAswathy S Nair
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSINGKing Julian
 
Data Warehousing Overview
Data Warehousing OverviewData Warehousing Overview
Data Warehousing OverviewAhmed Gamal
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaRadhika Kotecha
 
Introduction to data warehousing
Introduction to data warehousingIntroduction to data warehousing
Introduction to data warehousinguncleRhyme
 
Data warehouseconceptsandarchitecture
Data warehouseconceptsandarchitectureData warehouseconceptsandarchitecture
Data warehouseconceptsandarchitecturesamaksh1982
 
Difference between data warehouse and data mining
Difference between data warehouse and data miningDifference between data warehouse and data mining
Difference between data warehouse and data miningmaxonlinetr
 

What's hot (20)

Data Warehouse Basic Guide
Data Warehouse Basic GuideData Warehouse Basic Guide
Data Warehouse Basic Guide
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Hadoop & Data Warehouse
Hadoop & Data Warehouse Hadoop & Data Warehouse
Hadoop & Data Warehouse
 
Issue in Data warehousing and OLAP in E-business
Issue in Data warehousing and OLAP in E-businessIssue in Data warehousing and OLAP in E-business
Issue in Data warehousing and OLAP in E-business
 
DATA Warehousing & Data Mining
DATA Warehousing & Data MiningDATA Warehousing & Data Mining
DATA Warehousing & Data Mining
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
Data mining and data warehousing
Data mining and data warehousingData mining and data warehousing
Data mining and data warehousing
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
Data Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture NotesData Mining & Data Warehousing Lecture Notes
Data Mining & Data Warehousing Lecture Notes
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data Warehousing
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Data Warehousing Overview
Data Warehousing OverviewData Warehousing Overview
Data Warehousing Overview
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika Kotecha
 
Introduction to data warehousing
Introduction to data warehousingIntroduction to data warehousing
Introduction to data warehousing
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehouseconceptsandarchitecture
Data warehouseconceptsandarchitectureData warehouseconceptsandarchitecture
Data warehouseconceptsandarchitecture
 
Difference between data warehouse and data mining
Difference between data warehouse and data miningDifference between data warehouse and data mining
Difference between data warehouse and data mining
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Ppt
PptPpt
Ppt
 

Viewers also liked

150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)
150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)
150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)Seungsoo Park
 
Section 4 (Player Bios)
Section 4 (Player Bios)Section 4 (Player Bios)
Section 4 (Player Bios)Dion R. Dargin
 
Interaction presentation 2
Interaction presentation 2Interaction presentation 2
Interaction presentation 2Liz Armstrong
 
Source Citation & Credibility
Source Citation & CredibilitySource Citation & Credibility
Source Citation & Credibilitywarc2015
 
Contributing - Behind the Scenes of the Joomla! Project
Contributing - Behind the Scenes of the Joomla! ProjectContributing - Behind the Scenes of the Joomla! Project
Contributing - Behind the Scenes of the Joomla! ProjectTessa Mero
 
09e02236 101015112940-phpapp02
09e02236 101015112940-phpapp0209e02236 101015112940-phpapp02
09e02236 101015112940-phpapp02MarrCenllon Hia
 
Insurance Sector in 2015
Insurance Sector in 2015Insurance Sector in 2015
Insurance Sector in 2015Baibhav Agrawal
 
01 gdl-nimadearth-40-1-nimade-i
01 gdl-nimadearth-40-1-nimade-i01 gdl-nimadearth-40-1-nimade-i
01 gdl-nimadearth-40-1-nimade-iFajar Deilova
 
Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com
Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com
Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com Lusani Dias
 

Viewers also liked (14)

GreshamMigration6a
GreshamMigration6aGreshamMigration6a
GreshamMigration6a
 
150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)
150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)
150507 2015년 춘계 한국자원리싸이클링학회 발표자료 (박승수)
 
ijazah
ijazahijazah
ijazah
 
Apartment Moving Tips
Apartment Moving TipsApartment Moving Tips
Apartment Moving Tips
 
Section 4 (Player Bios)
Section 4 (Player Bios)Section 4 (Player Bios)
Section 4 (Player Bios)
 
FINAL REPORT
FINAL REPORTFINAL REPORT
FINAL REPORT
 
Interaction presentation 2
Interaction presentation 2Interaction presentation 2
Interaction presentation 2
 
[IJET-V1I4P9] Author :Su Hlaing Win
[IJET-V1I4P9] Author :Su Hlaing Win[IJET-V1I4P9] Author :Su Hlaing Win
[IJET-V1I4P9] Author :Su Hlaing Win
 
Source Citation & Credibility
Source Citation & CredibilitySource Citation & Credibility
Source Citation & Credibility
 
Contributing - Behind the Scenes of the Joomla! Project
Contributing - Behind the Scenes of the Joomla! ProjectContributing - Behind the Scenes of the Joomla! Project
Contributing - Behind the Scenes of the Joomla! Project
 
09e02236 101015112940-phpapp02
09e02236 101015112940-phpapp0209e02236 101015112940-phpapp02
09e02236 101015112940-phpapp02
 
Insurance Sector in 2015
Insurance Sector in 2015Insurance Sector in 2015
Insurance Sector in 2015
 
01 gdl-nimadearth-40-1-nimade-i
01 gdl-nimadearth-40-1-nimade-i01 gdl-nimadearth-40-1-nimade-i
01 gdl-nimadearth-40-1-nimade-i
 
Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com
Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com
Herbalife Catálogo julho 2016 - ENCOMENDAS> lu.pegorini@hotmail.com
 

Similar to Comparing Operational Databases, Data Warehouses and Hadoop File Systems

Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfDatacademy.ai
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forAyushMeraki1
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse conceptsobieefans
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxDURGADEVIL
 
Implementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware houseImplementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware houseIJARIIT
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyKate Campbell
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Materialobieefans
 
SAP BODS -quick guide.docx
SAP BODS -quick guide.docxSAP BODS -quick guide.docx
SAP BODS -quick guide.docxKen T
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingsumit621
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseSOMASUNDARAM T
 
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptxUNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptxshruthisweety4
 
Data warehousing interview_questionsandanswers
Data warehousing interview_questionsandanswersData warehousing interview_questionsandanswers
Data warehousing interview_questionsandanswersSourav Singh
 

Similar to Comparing Operational Databases, Data Warehouses and Hadoop File Systems (20)

Top 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdfTop 60+ Data Warehouse Interview Questions and Answers.pdf
Top 60+ Data Warehouse Interview Questions and Answers.pdf
 
DATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining forDATAWAREHOUSE MAIn under data mining for
DATAWAREHOUSE MAIn under data mining for
 
DW 101
DW 101DW 101
DW 101
 
Data warehouse concepts
Data warehouse conceptsData warehouse concepts
Data warehouse concepts
 
UNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docxUNIT-5 DATA WAREHOUSING.docx
UNIT-5 DATA WAREHOUSING.docx
 
Implementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware houseImplementation of Data Marts in Data ware house
Implementation of Data Marts in Data ware house
 
Advances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing TechnologyAdvances And Research Directions In Data-Warehousing Technology
Advances And Research Directions In Data-Warehousing Technology
 
Informatica and datawarehouse Material
Informatica and datawarehouse MaterialInformatica and datawarehouse Material
Informatica and datawarehouse Material
 
SAP BODS -quick guide.docx
SAP BODS -quick guide.docxSAP BODS -quick guide.docx
SAP BODS -quick guide.docx
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Database Vs.pptx
Database Vs.pptxDatabase Vs.pptx
Database Vs.pptx
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptxUNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx
 
Unit Ii
Unit IiUnit Ii
Unit Ii
 
Data warehousing interview_questionsandanswers
Data warehousing interview_questionsandanswersData warehousing interview_questionsandanswers
Data warehousing interview_questionsandanswers
 
CTP Data Warehouse
CTP Data WarehouseCTP Data Warehouse
CTP Data Warehouse
 
DMDW 1st module.pdf
DMDW 1st module.pdfDMDW 1st module.pdf
DMDW 1st module.pdf
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Dwbasics
DwbasicsDwbasics
Dwbasics
 

More from IJET - International Journal of Engineering and Techniques

More from IJET - International Journal of Engineering and Techniques (20)

healthcare supervising system to monitor heart rate to diagonize and alert he...
healthcare supervising system to monitor heart rate to diagonize and alert he...healthcare supervising system to monitor heart rate to diagonize and alert he...
healthcare supervising system to monitor heart rate to diagonize and alert he...
 
verifiable and multi-keyword searchable attribute-based encryption scheme for...
verifiable and multi-keyword searchable attribute-based encryption scheme for...verifiable and multi-keyword searchable attribute-based encryption scheme for...
verifiable and multi-keyword searchable attribute-based encryption scheme for...
 
Ijet v5 i6p18
Ijet v5 i6p18Ijet v5 i6p18
Ijet v5 i6p18
 
Ijet v5 i6p17
Ijet v5 i6p17Ijet v5 i6p17
Ijet v5 i6p17
 
Ijet v5 i6p16
Ijet v5 i6p16Ijet v5 i6p16
Ijet v5 i6p16
 
Ijet v5 i6p15
Ijet v5 i6p15Ijet v5 i6p15
Ijet v5 i6p15
 
Ijet v5 i6p14
Ijet v5 i6p14Ijet v5 i6p14
Ijet v5 i6p14
 
Ijet v5 i6p13
Ijet v5 i6p13Ijet v5 i6p13
Ijet v5 i6p13
 
Ijet v5 i6p12
Ijet v5 i6p12Ijet v5 i6p12
Ijet v5 i6p12
 
Ijet v5 i6p11
Ijet v5 i6p11Ijet v5 i6p11
Ijet v5 i6p11
 
Ijet v5 i6p10
Ijet v5 i6p10Ijet v5 i6p10
Ijet v5 i6p10
 
Ijet v5 i6p2
Ijet v5 i6p2Ijet v5 i6p2
Ijet v5 i6p2
 
IJET-V3I2P24
IJET-V3I2P24IJET-V3I2P24
IJET-V3I2P24
 
IJET-V3I2P23
IJET-V3I2P23IJET-V3I2P23
IJET-V3I2P23
 
IJET-V3I2P22
IJET-V3I2P22IJET-V3I2P22
IJET-V3I2P22
 
IJET-V3I2P21
IJET-V3I2P21IJET-V3I2P21
IJET-V3I2P21
 
IJET-V3I2P20
IJET-V3I2P20IJET-V3I2P20
IJET-V3I2P20
 
IJET-V3I2P19
IJET-V3I2P19IJET-V3I2P19
IJET-V3I2P19
 
IJET-V3I2P18
IJET-V3I2P18IJET-V3I2P18
IJET-V3I2P18
 
IJET-V3I2P17
IJET-V3I2P17IJET-V3I2P17
IJET-V3I2P17
 

Recently uploaded

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 

Recently uploaded (20)

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 

Comparing Operational Databases, Data Warehouses and Hadoop File Systems

  • 1. International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015 ISSN: 2395-1303 http://www.ijetjournal.org Page 37 A Comparative Study on Operational Database, Data Warehouse and Hadoop File System T.Jalaja1 , M.Shailaja2 1,2 (Department of Computer Science, Osmania University/Vasavi College of Engineering, Hyderabad, India) I. INTRODUCTION One of the largest technological challenges in software systems research today is to provide mechanisms for storage, manipulation, and information retrieval on large amounts of data. A database is a collection of related data and a database system is a database and database software together. Operational Databases are transactional databases, which supports on-line transaction processing (OLTP) that includes insertions, updates, deletions and also supports information query requirements. Operational Database is designed to make transactional systems run efficiently. It is used to store detailed and current data. The main emphasis of this system is on very fast query processing, maintaining data integrity in multi-access environments. Thus databases must strike a balance between efficiency in transaction processing and supporting query requirements .They can't further be optimized for theapplications such as OLAP, DSS and data mining. A Data Warehouse [1] is a database that is designed for facilitating querying and analysis. In contrast to databases, data warehouses generally contain very large amounts of data from multiple sources that may include databasesfrom different data models and sometimes files acquired from independent systems and platforms. Often it is designed as OLAP (On-Line Analytical Processing) systems. Big data comes from relatively new types of data sources like social media, public filings, content available in the public domain through agencies or subscriptions, documents and e-mails including both structured and unstructured texts,digital devices and sensors including location based smart phone, weather and telemetric data .There are several approaches to collecting, storing,processing, and analyzing big data.Hadoop [5] is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Hadoop is not atype of database, but rather a software ecosystem that allows massive parallel computing. The main focus of this paper is to present a clear understanding on operational database, data warehouse and Hadoop technology. This paper is organized as follows: In section II, we present literature survey on Operational database, Datawarehouse and Hadoop Distributed File System. In Section III, we present the comparative RESEARCH ARTICLE OPEN ACCESS Abstract: A Computer database is a collection of logically related data that is stored in a computer system, so that a computer program or person using a query language can use it to answer queries. An operational database (OLTP) contains up-to-date, modifiable application specific data. A data warehouse (OLAP) is a subject-oriented, integrated, time-variant and non-volatile collection of data used to make business decisions. Hadoop Distributed File System (HDFS) allows storing large amount of data on a cloud of machines. In this paper, we surveyed the literature related to operational databases, data warehouse and hadoop technology. Keywords — Operational Database, Data warehouse, Hadoop, OLTP, OLAP, Map Reduce.
  • 2. International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015 ISSN: 2395-1303 http://www.ijetjournal.org Page 38 studies among the works cited above. In section IV, we summarize the work. II. LITERATURE SURVEY Day-by-day technology in software systems is improving and is becoming very challenging in terms of storage, manipulation, and information retrieval. There are different types of storage systems having the capability of storing data from few Megabytes to Peta bytes.In this paper we discuss about the traditional/operational database, which is capable of storing the transactional data of an organization in part A. Then in part B,we discuss about the datawarehouse, which is capable of analyzing the business data which is used for decision making. Later in part C we come across the way to store large volume of structured as well as unstructured data, which is further utilized for analytics. A. Operational Database (OLTP) An operational database contains data about the things that go on inside an organization or enterprise. For example, an operational database might contain fact and dimension data describing transactions, data on customer complaints, employee information, taking order and fulfilling them in a store , track of payments and inventory ,information and amounts from credit cards etc. Operational systems maintain records of daily business transactions.Traditional databases support on-line transaction processing (OLTP), which includes insertions, updates, and deletions, while also supporting information query requirements. An operational database contains enterprise data which are up to date and modifiable. As the name implies, it is the database that is currently and progressive in use capturing real time data and supplying data for real time computations and other analyzing processes. A database structure commonly used in GIS in which data is stored based on 2 dimensional tables where multiple relationships between data elements can be defined and established in an adhoc manner. Relational Database Management System - a database system made up of files with data elements in two-dimensional array (rows and columns).This database management system has the capability to recombine data elements to form different relations resulting in a great flexibility of data usage. SQL is used to manipulate relational databases. The relational model contains the following components: •Collection of objects or relations. •Set of operations to act on the relations. •Data integrity for accuracy and consistency. All other database structures can be reduced to a set of relational tables. It is easy to use when compared to other database systems.Traditional databases are optimized to process queries that may touch a small part of the database and transactions that deal with insertions or updates of a few tuples per relation to process. Because of the very dynamic nature of an operational database, there are certain issues that need to be addressed appropriately. An operational database can grow very fast in size and bulk so database administrations and IT analysts must purchase high powered computer hardware and top notch database management systems. The operational database is the source of data for the data warehouse. B. Datawarehouse (OLAP) According to Surajit Chaudhuri and Umeshwar Dayal as in [6], a data warehouse is a “subject-oriented, integrated, timevarying,non- volatile collection of data that is used primarily in organizational decision making as in [7]. Typically, the data warehouse is maintained separately from the organization’s operational databases. The data warehouse supports on-line analytical processing (OLAP), the functional and performance requirements of which are quite different from those of the on-line transaction processing (OLTP) applications traditionally supported by the operational databases. A data warehouse is a centralized repository that stores data from multiple information sources and transforms them into a common,
  • 3. International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015 ISSN: 2395-1303 http://www.ijetjournal.org Page 39 multidimensional data model for efficient querying and analysis as shown in Figure 1. Figure 1: Data Warehouse Architecture Historical, summarized and consolidated data is more important than detailed, individual records. Since data warehouses contain consolidated data, perhaps from several operational databases, over potentially long periods of time, they tend to be orders of magnitude larger than operational databases; enterprise data warehouses are projected to be hundreds of gigabytes to terabytes in size. The workloads are query intensive with mostly ad hoc, complex queries that can access millions of records and perform a lot of scans, joins, and aggregates. Query throughput and response times are more important than transaction throughput. To facilitate complex analyses and visualization, the data in a warehouse is typically modeled multidimensional. The 3-D data cube of the sales data with respect to time, item, and location is shown in Figure 2. Figure 2: Multidimensional data (Data Cube) Typical OLAP operations include rollup (increasing the level of aggregation) and drill-down (decreasing the level of aggregation or increasing detail) along one or more dimension hierarchies, slice and dice (selection and projection), and pivot (re-orienting the multidimensional view of data). Decision support usually requires consolidating data from many heterogeneous sources: these might include external sources such as stock market feeds, in addition to several operational databases. The different sources might contain data of varying quality, or use inconsistent representations, codes and formats, which have to be reconciled. Finally, supporting the multidimensional data models and operations typical of OLAP requires special data organization, access methods, and implementation methods, not generally provided by commercial DBMSs targeted for OLTP. It is for all these reasons that data warehouses are implemented separately from operational databases. A popular conceptual model that influences the front-end tools, database design, and the query engines for OLAP is the multidimensional view of data in the warehouse. C. Hadoop Distributed Filesystem (HDFS) According to P.Sarada Devi, V.Visweswara Rao and K.Raghavender as in [8], Data warehouse is a collection of heterogeneous data which can effectively and easily manage the data from multiple databases in a uniform fashion. In addition to data warehousing ETL, many technologies such as ERP (Enterprise Resource Planning), SCM,SAP, BI tools are used in the world market for handling structured data efficiently and effectively.In order to handle the large amount of structured, semi structured and unstructured data generated by different social network or industries, Hadoop is being used. Hadoop is a framework developed as an Open Source Software based on papers published in 2004 by Google Inc. that deal with the “Map Reduce” distributed processing and the “Google File System”, a system they had used to scale their data processing needs. Hadoop is an open source framework for writing and running distributed applications that process large amounts of data''. It consists of two main components – Storage: The Hadoop Distributed File System Processing: "Map Reduce''
  • 4. International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015 ISSN: 2395-1303 http://www.ijetjournal.org Page 40 Figure 3: Hadoop Architecture HDFS is the storage component of Hadoop. It’s a distributed file system that’s modeled after the Google File System (GFS) paper [11]. Files in HDFS are stored across one or more blocks, and each block is typically 64 MB or larger. Blocks are replicated across multiple hosts in the hadoop cluster to help with availability and fault tolerance. HDFS is able to store huge amounts of information, scale up incrementally and survive the failure of significant parts of the storage infrastructure without losing data. Hadoop creates clusters of machines and coordinates work among them. Clusters can be built with inexpensive computers. If one fails, Hadoop continues to operate the cluster without losing data or interrupting work, by shifting work to the remaining machines in the cluster. HDFS manages storage on the cluster by breaking incoming files into pieces, called “blocks,” and storing each of the blocks redundantly across the pool of servers. In the common case, HDFS stores three complete copies of each file by copying each piece to three different servers [9]. Map Reduce is a batch-based, distributed computing framework modeled after Google’s paper on Map Reduce [12]. Map reduce data processing model has two main phases - ''MAPPING'' and ''REDUCING''. Mapping Phase: Map Reduce takes the input data and feeds each data element to the mapper which filters and transforms the input data into the desired format for business use. Reducing Phase: The reducer processes all the outputs from the mapper and arrives as a final result by performing business logic to get the desired output. Figure 4: MapReduce Architecture MapReduce Architecture The processing pillar in the Hadoop ecosystem is the MapReduce framework. The framework allows the specification of an operation to be applied to a huge data set, divide the problem and data, and run it in parallel. From an analyst’s point of view, this can occur on multiple dimensions. For example, a very large dataset can be reduced into a smaller subset where analytics can be applied [10]. In Hadoop, these kinds of operations are written as MapReduce jobs in Java. There are a number of higher level languages like Hive and Pig that make writing these programs easier. The outputs of these jobs can be written back to either HDFS or placed in a traditional data warehouse. III. COMPARISION TABLE 1 COMPARISION OF OLTP, OLAP, HDFS Database (OLTP) Datawarehouse (OLAP) Hadoop (HDFS) Type of data Transactiona l data, detailed data. Historical data, Derived data, Metadata Real time data, derived, enterprise, Simple and complex data Data Storag e Stores operational data – structured data Stores historical information – structured data Stores enterprise data - structured, semi structured and Unstructur ed data. Data Modeli Network , hierarchical, Conceptual, Logical, And File system(Sc
  • 5. International Journal of Engineering and Techniques - Volume I Issue 5, September-October 2015 ISSN: 2395-1303 http://www.ijetjournal.org Page 41 ng techni ques object oriented, object relational and Relational modeling techniques (Schema-on- write) Physical Data modeling techniques. hema-on- read) Optimi zation Efficient for performing read-write operations of single point transactions. Efficient for performing Reading/retrieving large data sets and for aggregating data. HDFS is efficient for performin g reading and writing large files (gigabytes and larger). High throughput Data Access techni ques Access to few records at once by predefined transactions Access a lot of records in each access by ad hoc queries and periodic reports Access Streaming data, large volumes of data Data warehousing is faster and cost effective when compared with Apache Hadoop.Unlike traditional ETL tools, Hadoop persists the raw data and can be used to re-process it repeatedly in a very efficient manner .Hadoop mostly process semi structured and unstructured data also, whereas Data Warehouse is best suited for processing only structured data efficiently. Figure 5: Showing OLTP, OLAP, Hadoop IV. CONCLUSION Thus, the operational/traditional databases are used to store an application specific data pertaining to an Organization or Enterprise.This data is used for query purpose.But, in order to store large amount of data compared to Traditional database, which is further used for analysis purpose, based on which the business decisions are made, Datawarehouses are maintained. Incontrast, to store Zettabytes of structured/unstructured dynamic data, Hadoop distributed File systems are used. These are more efficient and fault tolerant. REFERENCES 1. Wided Oueslati and Jalel Akaichi, “A Survey on Data Warehouse Evolution”, Vol.2, No.4, November 2010, IJDMS. 2. T.K.Das1 and AratiMohapatro “A Study on Big Data Integration with Data Warehouse”International Journal of Computer Trends and Technology (IJCTT) – volume 9 number 4– Mar 2014. 3. Dr. Mrs.Pushpa Suri, Mrs. Meenakshi Sharma” A Comparative Studybetween The Performance of Relational& object Oriented Database In Data Warehouse". 4. Inmon, W. et.al. (2000) Exploration warehousing: turning business information into business opportunity. John Wiley & Sons. 5. Hadoop. http://hadoop.apache.org 6. Surajit Chaudhuri and Umeshwar Dayal “An Overview of Data Warehousing and OLAP Technology”, ACM Sigmod Record, March 1997). 7. Inmon, W.H., Building the Data Warehouse. John Wiley, 1992. 8. P.SaradaA Devi, V.Viswewara Rao and K.Raghavender“Emerging Technology Big Data-Hadoop Over Data Warehousing Using ETL” Proceedings of 2nd IRF International Conference, 21st September-2014, Vizag, India. 9. MrigankMridul, AkashdeepKhajuria, Snehasish Dutta, Kumar N “ Analysis of Bidgata using Apache Hadoop and Map Reduce” Volume 4, Issue 5, May 2014”. 10.Harshawardhan S. Bhosale, Prof. DevendraP. Gadekar “Harshawardhan S. Bhosale1, Prof. Devendra P. Gadekar” International JournalofScientific and Research Publications, Volume 4, Issue 10, October 2014. 11. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung – “The Google File System. 12. Jeffrey Dean and Sanjay Ghemawat ,Google, Inc –“MapReduce: Simplified Data Processing on Large Clusters“.