SlideShare a Scribd company logo
1 of 6
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1111
BIG DATA WITH HADOOP – FOR DATA MANAGEMENT, PROCESSING
AND STORING
Revathi.V1, Rakshitha.K.R2, Sruthi.K3, Guruprasaath.S4
1Assistant Professor, Department of BCA & M.Sc SS, Sri Krishna Arts and Science College, Coimbatore, India
2,3,4 III BCA ‘A’, Department of BCA & M.Sc SS, Sri Krishna Arts and Science College, Coimbatore, India
----------------------------------------------------------------------------***-----------------------------------------------------------------------------
Abstract - Big data is a term that describes a large amount of
data that are generated from every digital and social media
exchange. Its size and rate of growth make it more complex to
maintain for this purpose Hadoop technology can be used.
Hadoop is an open source software project that enables the
distributed processing of big data sets across clusters of
commodity servers. It is planned to scale up from a single
server to thousands of machines, with a very high degree of
fault tolerance. Along with Hadoop, some techniques are also
used to handle the massive amount of data. Some commonly
used techniques are Map Reduce, HDFS, Apache Hive etc.
Key Words: Big data, Hadoop, HDFS, Map Reduce,
Hadoop Components
1. INTRODUCTION
Companies across the globe have been using data for a long
time to help them make better decisions in order to enhance
their performances. It is the inaugural decade of the 21st
century that actually showcased a rapid shift in the
availability of data and its applicability for improving the
overall effectiveness of the organization. This change that
was to revolutionize the role of data brought into advent the
concept that became popular as Big Data. Big data is a large
amount of data that floods from the daily uses of human life
like telephone, mobiles, traveling, shopping, and computer,
organization which evokes complication for storing,
processing or accessing data using a traditional database.
This is in all likelihood one of the important reasonswhythe
concept of Big data was first embraced by online firms like
Google, eBay, Facebook,Linkedin,etc.,astheseorganizations
were constructed with the foundation concept of using
rapidly changing data. They did not plausibly face the
challenge of integrating the raw and unstructured data with
the already available ones. Since HADOOP has emerged as a
popular tool for BIG DATA to analyze these data using map-
reduce to get the desired outputwiththehelpof programs.It
is a freely available platform to perform these type of
performance. The design purpose of it is to perform millions
of data in a single executive server. [1]
2. CHARACTERISTICS OF DATA
Fig -1: Four V’s of Big Data
 Volume
Volume refers to the amount of data. The size of the data is
represented as a volume. The size of the information is
represented in terabytes and petabytes.
 Variety
Variety forms the data too big. The files arrive in various
formats and of any type, it may be structured or
unstructured, such as text, audio, videos, log files and more.
 Velocity
Velocity refers to the speed of information processing. The
data adds up at high speed. Sometimes 1 minute is also late,
so big data are time sensitive.
 Value
The possible value of Big data is huge. Value is the main root
for big data because it is important for businesses, IT
infrastructure system to store a lot of values in the database.
 Veracity
Veracity refers to noise, biases, and abnormality when we
are handling with high volume, velocity, and a variety of
data. All data not going to be 100% correct, there will be
dirty data. [2]
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1112
3. TECHNOLOGY USED IN BIG DATA
For the determination of processing a large amount of data,
the big data require exceptional technologies. The various
techniques and technologies have been introduced for
manipulating, examining, and visualizing the big data. On
that point are many solutions to handle the Big Data, but the
Hadoop is one of the most widely used technologies.
3.1 HADOOP
Fig -2: Hadoop architecture
Hadoop is an open source project which is developed for
maintaining big data. This is implemented by Apache
Software Foundation. It consists of many small sub projects
which belong to the categoryofinfrastructurefordistributed
computing. Hadoop mainly consists of:
 File System (The Hadoop File System [HDFS]) -
Storage part
 Programming Paradigm (Map Reduce) - Processing
part
The other sub projects allow correlative services or they are
built along the core to add higher-level abstractions. There
exist several problems in dealing with storage of vast
amounts of data. Though the memorycapacities ofthedrives
have expanded massively, therateofreadingdata fromthem
hasn't shown that considerable progress. The reading
process takes a vast amount of time and the process of
writing is also slower. The time can be decreased by taking
from multiple disks at once. Just using one hundredth of a
disk may seem wasteful. Only if there are one hundred
datasets, each of which is one terabyte and providingshared
access to them is also a solution. There also exist many
problems with using several pieces of hardware as it
increases the chances of failure. Replication method that is
creating redundant copies of similar data on the different
devices so that if any failure occurs the copy of the data will
be available. The primary problem is combining the data
being read from different machines. There are so many
methods are able in distributed computing to handle this
problem, but even so, it is quite challenging. All the
complication discussed is well managed by Hadoop.Hadoop
Distributed File System identifies the failure problem and
the Map reduce programming paradigm identifies the
problem of combining data. Map Reduce diminishes the
complication of disk records and writes by giving a
programming model dealing in computation with keys and
values. [3]
4. HDFS (Hadoop Distributed File System)
Fig -3: HDFS system
HDFS is a file system which is constructed of storing a very
large number of files by using streaming data access rule by
running clusters on commodity hardware. HDFS manages
storage on the cluster by breaking incoming files into pieces
called blocks and string each block redundantly across the
pocket billiards on the server. HDFS stores, three copies of
each file by copying each piece to three different hosts. The
size of each block 64MB. HDFS is divided into three modes
which are Name node, Data Node, HDFS Clients/Edge Node.
 Name node
It is a centrally placed node, which holds information about
the Hadoop file system. The principal task of name node is
that it records all the metadata & attributes and specific
locations of files & data blocks in the data nodes. Name node
acts as the master node as it stores all the information
around the system and provides information which isnewly
added, modified and removed from data nodes.
 Data node
It functions as a slave node. Hadoop environment may
comprise more than one data node based on capacity and
performance. A data node performs twomaintasksstoringa
block in HDFS and acts as the political program for running
jobs.
 HDFS Clients/Edge node
HDFS Clients sometimes also know as Edgenode.It works as
a linker between name node and data nodes. Hadoopcluster
at that point is only one client, but there are also many
depending upon performance needs. [4]
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1113
5. MAP REDUCE
Fig -4: Map Reduce
Hadoop Map Reduce is an implementation of the algorithm
developed and managed by the Apache Hadoop project.Map
Reduce (map + reduce) Is a theoretical explanation for
writing applications that treat large amounts of structured
and unstructured data stored in the Hadoop Distributed File
System (HDFS). Map reduce is an instrument which is very
crucial to the analysis of the data which are structured or
unstructured. In which the data are splits in the minuscule
parts of the cluster to take the original form of the data. Map
reduce is 100 times quicker than the Spark and 1000 times
quicker than the Drill, now Drill is also working platform.
For EX: CPU is the mental capacity of computer likethatMap
reduction is the heart of the Hadoop. [5]
6. BIG DATA MAPPING USING MAP REDUCE
TECHNIQUE
 Get the big data ready
When a request is made by the client to run Map Reduce
program, the initial step is to search and understand about
the input file. The file pattern is a random type so it is
necessary to change data that can be processed by the data.
This process of input process and Record reader. Input
Format chooses how the file is going to be smashed into
smaller pieces for processing using a function called Input
Split. It then assigns a Record Reader to transform the raw
data for processing by the map. Dissimilar types of Record
Readers are supplied with Hadoop, offeringa widevarietyof
conversion options.
 Let the Big Data Map Begin
Client's information is now in a mode tolerable to map. For
all input pairs, a specific instance of a map is called to swear
out the data for each input pair. Map and reduce the need to
work together to maintain the data, the program should
collect output information from the separate mappers and
pass it to the reducers. This action can be performed by an
Output Collector. A Reporter function also present
information collected from map tasks. This entire project is
being done on multiple nodes in the Hadoop cluster
simultaneously. Later all the map tasks are complete, the
common results are collected in the partition and a shuffling
occurs, sorting the output for excellent processing by
reducing.
 Reduce and combine for big data
For each output pair, Reduce is called to do its task. In a
similar pattern to map, reduce collect its output while all the
mappings are processed. Reduce can't begin as far as all the
mapping is made out. The output of reducing is also a key
and a value. Hadoop provides an Output Format feature,and
it performs very much like Input Format. Output Format
takes the key-value pair and formulates the output for
writing to HDFS. The last task is performed to write the data
to HDFS. Record Writer is used for writing and also to
perform similarly as Record Reader exceptinreverse.Ituses
up the Output Format data and writes it to HDFS. [6]
7. HADOOP ECOSYSTEM
Fig -5: Components of Hadoop
7.1 DATA MANAGEMENT
 Ambari
It allows installation of services, so that Pig, Hive, scoop,
HBase can be picked and put in. It will work across all the
nodes in the cluster and also can manage the services from
one centralized location like starting up, stopping,
reconfiguring.
 Zookeeper
Zookeeper is a distributed coordination and setting up
service for Hadoop cluster It is a centralized service that
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1114
provides distributed synchronization and providing group
services and maintains the configuration information. With
Hadoop, this will be useful to track if a particular client is
down and plan necessary communication protocol around
node failure.
 Oozie
It is a workflow library that permits us to play and to
connect lots of those essential projects for instances, Pig,
Hive, and Sqoop.
7.2 DATA INTEGRATION
 Sqoop
Sqoop is a command-line interface platform which can be
employed to transfer the data from relational database
environments like Oracle, MySQL, and PostgreSQL into
Hadoop environment.
 Flume and Chukwa
Various applications, operating systems, web-services
generate plenty of log data, for processing these logs a
framework can be developed by using Flume and Chukwa
tools. It is a way to push real time data, information right
into Hadoop and also to execute real time analysis.
7.3 DATA ACCESS
 Pig
Pig is a platform for processing and maintaining data with
easy techniques. By just writing half dozen lines of code pig
can process terabytes of data.
 Hive
Hive is Data warehousing application that provides the SQL
interface to process and analyze the big data storedinHDFS.
Hive infrastructure is made along the top of Hadoop that
help in providing summarization, query, and analysis.
7.4 DATA STORAGE
 HBase
HBase is a distributed column oriented database where as
HDFS is a file system. Simply it is built on top of HDFS
system. It is completely written in Java, which is a non-
relational, distributed database system. It can suffice as the
input and output for the MapReduce. For example, read and
write operations affect all rows but just a small subset of all
columns.
 Cassandra
Apache Cassandra is a no SQLdatabaseformanaginga heavy
quantity of data across many commodity servers with high
availability and no single instance of failure.
7.5 INTERACTION, VISUALIZATION, EXECUTION,
DEVELOPMENT
 Hcatalog
It is known as metadata table and storage management
scheme. It is a way for tools like a Pig, Hive for interoperable
also to hold a consistent view of data across those tools.
 Lucene
Lucent is there for full text searching an API loaded with
algorithms to answer thingslikestandardfull-textsearching,
wild card searching, phrase searching, range searching kind
of stuff.
 Hama
Hama is there for BSP (Book Synchronous Processing). To
turn with a large amount of scientific data home is used.
 Crunch
Crunch is there for writing and testing and running map
reduces pipeline. It basically gives full control to overall four
phrases, which is going like 1. Map, 2. Reduce, 3. Shuffle, and
4. Combine. It is there to assist in joining and aggregation
that is very hard to do with low-level map reduce, so Crunch
is there to make you little easier inside map reduce pipeline.
7.6 DATA SERIALIZATION
 Avro
Avro is a data serialization format which brings data
interoperability among multiple components of Apache
Hadoop. Most of the components in Hadoop started
supporting the Avro data format. It starts with the basic
premise of data produced by component should be readily
consumed by another component Avro has following
features Rich data types, Fast and compact serialization,
Support many programming languages like Java, Python.
 Thrift
It is more specific for creating flexible schemes that work
with Hadoop data. It is specifically because it is intended for
cross-languagecompatibility,sowebuildanapplicationwith
Hadoop data in Java and if we want to use the same object in
an application that you built on Ruby, Python or C++.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1115
7.7 DATA INTELLIGENCE
 Drill
The drill is actually an incubator project and is designed to
perform interactive analysis of nested data.
 Mahout
Mahout is a library for gaining knowledge about machines
and data mining. It is divided into four different groups:
collective filtering, categorization, clustering, and mining of
parallel frequent patterns. TheMahoutlibrary belongsto the
subset that can be performed in a distributed mode and can
be executed by MapReduce. [7]
8. CHALLENGES
 Heterogeneity and Incompleteness
If we need to analyze the data, it should be structured, but
when we deal with the Big Data, data may be structured or
unstructured as well. Heterogeneity is the big challenge in
data Analysis and analysts need to cope with it. Consider an
example of a patient in the Hospital. We will arrive at each
disc for each medical test. And we will likewise create a
record for a hospital stay. This will be different for all
patients. This figure is not well structured. Hence it goes
with the team you manage heterogeneous and incompleteis
required. A secure data analysis should be applied to this.
 Scale
As the name says Big Data is having a large size of data sets.
Managing with large data sets has been a large problem for
teens. Earlier, this difficulty was solved by the processors
getting faster, but now data volumes are becoming hugeand
processors are static. The World is being given towards the
Cloud technology, due to this shifted data is generated at a
very high rate. This high rate of increasing data is becoming
a challenging problem with the data analysts. Hard disksare
used to store the Data. They are slowerI/Operformance.But
now Hard Disks are replaced by the solid state drives and
other applied sciences. These are not at a slower rate like
Hard disks, so the new storage scheme should be designed.
 Timeliness
Another challenge with the size is speed. If the data sets are
heavy in size, longer the time it will take to analyze it. Any
system which deals effectively with the size is likely to
execute well in term of speed. There are caseswhenwe need
the analysis results directly. For example, If there is any
fraud transaction, It should be dissected before the
transaction is completed. And then some newsystemshould
be designed to meet this challenge in data analysis.
 Privacy
Privacy of data is another big problem with large data. In
some nations, there are strict laws regarding the data
privacy, for example, in the USA there are strict laws for
health records, but for others, it is less forceful. Forexample,
in social media, we cannot fix the private posts of users for
sentiment analysis.
 Human Collaborations
In malice of the advanced computational models, there are
many patterns that a computer cannot detect. A novel
method of harnessing human ingenuity to solvea problemis
crowdsourcing. The best example is Wikipedia. We are
relying on the information given by the strangers, however,
most of the time they are correct. But there may be other
people who provide false information.Forhandlingthiskind
of complexity we require a technology model to cope with
this. As mankind, we can look the review of the book and
find that some are positive and some are negative and come
up with a decision whether to buy or not. We need systems
to be that intelligent to determine. [8]
9. OPPORTUNITIES
 Technology
Nearly every top organization like Facebook,IBM,Yahoo has
adopted Big Data and are investing in big data. Facebook
handles 50 Billion photos of users. Every month Google
handles 100 billion searches. From these stats, we can state
that there are a lot of opportunities oninternet,social media.
 Government
Big data can be applied to handle the problems faced by the
government. Obama government announced a big data
research and development initiative in 2012. Big data
analysis played an important role of BJP winning the
elections in 2014 and Indian government is putting on big
data analysis in Indian elections.
 Healthcare
Many health care organizations are adopting big data
techniques for deriving each and every information about
patients. To ameliorate the health care and low down the
cost big data analysis is required and certain technology
should be adapted.
 Science and Research
Big data is the latest topic of inquiry. Many researchers are
solving with big data. There are so many papers being
printed on big data. NASA center for climate simulation
stores 32 petabytes of observations.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1116
 Media
Media is using big data for the promotions and selling of
products by targeting the interest of the user on the net. For
instance, social media posts, data analysts get the number of
posts and then analyze the interest of the user. It can as well
be done by getting the positive or negative reviews on the
social media. [9]
10. CONCLUSION
Big Data (Hadoop) is in huge demand in the marketplace
nowadays. As there a huge amount of data is lying in the
industry, but there is no tool to wield it and Hadoop can
implement on low-cost hardware and can be used by a large
set of the audience on a large number of datasets. In Hadoop
map reduce is the most important component in Hadoop.
Many techniques are used to make the efficient plan for the
map reduce so that it can speed up the system or data
retrieval Technique like Quincy, Asynchronous Processing,
Speculative Execution, Job Awareness, Delay Scheduling,
Copy Compute Splitting had made the schedule effective for
faster processing. [10]
REFERENCES
[1] Bijesh Dhyani, Anurag Barthwal, “Big Data Analytics
using Hadoop”, International Journal of Computer
Applications (0975 – 8887) Volume 108 – No 12, December
2014
[2] Prof. Devendra P. Gadekar,HarshawardhanS.Bhosale,“A
Review Paper on Big Data and Hadoop”, International
Journal of Scientific and Research Publications [IJSRP],
Volume 4, Issue 10, October 2014 1 ISSN 2250-3153
[3] Ms. Gurpreet Kaur, Ms. Manpreet Kaur, “REVIEW PAPER
ON BIG DATA USING HADOOP”, International Journal of
Computer Engineering & Technology (IJCET), Volume 6,
Issue 12, Dec 2015, pp. 65-71, Article ID: IJCET_06_12_008
ISSN Print: 0976-6367 and ISSN Online: 0976–6375
[4] Konstantin Shvachko, Hairong Kuang, Sanjay Radia,
Robert Chandler, "The Hadoop Distributed File System",
October 2010, 978-1-4244-7153, IEEE
[5] Dr. Sanjay Srivastava, Swami Singh, Viplav Mandal, “The
Big Data analytics with Hadoop”, International Journal of
Research in Applied Science & Engineering Technology
(IJRASET), Volume 4 Issue III, March 2016, ISSN:2321-9653
[6] Varsha B. Bobade, “Survey Paper on Big Data and
Hadoop”, International Research Journal of Engineeringand
Technology (IRJET) e-ISSN: 2395-0056, Volume: 03, Issue:
01, Jan-2016
[7] Poonam S. Patil, Rajesh. N. Phursule, “Survey Paper on
Big Data Processing andHadoopComponents”, International
Journal of Science and Research (IJSR), Volume 3 Issue 10,
October 2014, ISSN (Online): 2319-7064
[8] Rahul Beakta, “Big Data And Hadoop: A Review Paper”,
RIEECE -2015, Volume 2, Spl. Issue 2 (2015), e-ISSN: 1694-
2329, p-ISSN: 1694-2345
[9] Ivanilton Polatoa, ReginaldoReb,AlfredoGoldman,Fabio
Kona, "A Comprehensive View of Hadoop Research - A
Systematic Literature Review", Journal of Network and
Computer Applications,Volume46,PP1-25,November2014
[10] Dr. Madhu Goel, Suman Arora, “Survey Paper on
Scheduling in Hadoop”, International Journal of Advanced
Research in Computer Science and Software Engineering
[IJARCSSE], Volume 4, Issue 5, May 2014, ISSN: 2277 128X

More Related Content

What's hot

Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An OverviewArvind Kalyan
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCEAM Publications,India
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyonddatasalt
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET Journal
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkLaxmi8
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET Journal
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreTrendwise Analytics
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabatinabati
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopEdureka!
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in detailsMahmoud Yassin
 

What's hot (20)

Hadoop
HadoopHadoop
Hadoop
 
Big data
Big dataBig data
Big data
 
Big Data - An Overview
Big Data -  An OverviewBig Data -  An Overview
Big Data - An Overview
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Big data, map reduce and beyond
Big data, map reduce and beyondBig data, map reduce and beyond
Big data, map reduce and beyond
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
 
RDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs SparkRDBMS vs Hadoop vs Spark
RDBMS vs Hadoop vs Spark
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Big data abstract
Big data abstractBig data abstract
Big data abstract
 
Big Data
Big DataBig Data
Big Data
 
Bigdata
Bigdata Bigdata
Bigdata
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Hadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and MoreHadoop,Big Data Analytics and More
Hadoop,Big Data Analytics and More
 
Big data analytics, survey r.nabati
Big data analytics, survey r.nabatiBig data analytics, survey r.nabati
Big data analytics, survey r.nabati
 
Whatisbigdataandwhylearnhadoop
WhatisbigdataandwhylearnhadoopWhatisbigdataandwhylearnhadoop
Whatisbigdataandwhylearnhadoop
 
Big data introduction, Hadoop in details
Big data introduction, Hadoop in detailsBig data introduction, Hadoop in details
Big data introduction, Hadoop in details
 

Similar to Big Data with Hadoop – For Data Management, Processing and Storing

Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewIJERA Editor
 
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET Journal
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopIRJET Journal
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation Shivanee garg
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoopAnusha sweety
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...IJECEIAES
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET Journal
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET Journal
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...IRJET Journal
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeSysfore Technologies
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challengesijcisjournal
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopIOSR Journals
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training reportSarvesh Meena
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformIRJET Journal
 

Similar to Big Data with Hadoop – For Data Management, Processing and Storing (20)

Big Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –ReviewBig Data and Big Data Management (BDM) with current Technologies –Review
Big Data and Big Data Management (BDM) with current Technologies –Review
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and ToolsIRJET- A Comparative Study on Big Data Analytics Approaches and Tools
IRJET- A Comparative Study on Big Data Analytics Approaches and Tools
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
 
Big data Hadoop presentation
Big data  Hadoop  presentation Big data  Hadoop  presentation
Big data Hadoop presentation
 
Big data with hadoop
Big data with hadoopBig data with hadoop
Big data with hadoop
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
Performance evaluation of Map-reduce jar pig hive and spark with machine lear...
 
Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
 
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
Social Media Market Trender with Dache Manager Using Hadoop and Visualization...
 
Hadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | SysforeHadoop and Big Data Analytics | Sysfore
Hadoop and Big Data Analytics | Sysfore
 
A Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and ChallengesA Comprehensive Study on Big Data Applications and Challenges
A Comprehensive Study on Big Data Applications and Challenges
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
 
G017143640
G017143640G017143640
G017143640
 
IJSRED-V2I3P43
IJSRED-V2I3P43IJSRED-V2I3P43
IJSRED-V2I3P43
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
Big Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop PlatformBig Data Testing Using Hadoop Platform
Big Data Testing Using Hadoop Platform
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptSAURABHKUMAR892774
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfAsst.prof M.Gokilavani
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixingviprabot1
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIkoyaldeepu123
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 

Recently uploaded (20)

CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Arduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.pptArduino_CSE ece ppt for working and principal of arduino.ppt
Arduino_CSE ece ppt for working and principal of arduino.ppt
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
EduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AIEduAI - E learning Platform integrated with AI
EduAI - E learning Platform integrated with AI
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 

Big Data with Hadoop – For Data Management, Processing and Storing

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1111 BIG DATA WITH HADOOP – FOR DATA MANAGEMENT, PROCESSING AND STORING Revathi.V1, Rakshitha.K.R2, Sruthi.K3, Guruprasaath.S4 1Assistant Professor, Department of BCA & M.Sc SS, Sri Krishna Arts and Science College, Coimbatore, India 2,3,4 III BCA ‘A’, Department of BCA & M.Sc SS, Sri Krishna Arts and Science College, Coimbatore, India ----------------------------------------------------------------------------***----------------------------------------------------------------------------- Abstract - Big data is a term that describes a large amount of data that are generated from every digital and social media exchange. Its size and rate of growth make it more complex to maintain for this purpose Hadoop technology can be used. Hadoop is an open source software project that enables the distributed processing of big data sets across clusters of commodity servers. It is planned to scale up from a single server to thousands of machines, with a very high degree of fault tolerance. Along with Hadoop, some techniques are also used to handle the massive amount of data. Some commonly used techniques are Map Reduce, HDFS, Apache Hive etc. Key Words: Big data, Hadoop, HDFS, Map Reduce, Hadoop Components 1. INTRODUCTION Companies across the globe have been using data for a long time to help them make better decisions in order to enhance their performances. It is the inaugural decade of the 21st century that actually showcased a rapid shift in the availability of data and its applicability for improving the overall effectiveness of the organization. This change that was to revolutionize the role of data brought into advent the concept that became popular as Big Data. Big data is a large amount of data that floods from the daily uses of human life like telephone, mobiles, traveling, shopping, and computer, organization which evokes complication for storing, processing or accessing data using a traditional database. This is in all likelihood one of the important reasonswhythe concept of Big data was first embraced by online firms like Google, eBay, Facebook,Linkedin,etc.,astheseorganizations were constructed with the foundation concept of using rapidly changing data. They did not plausibly face the challenge of integrating the raw and unstructured data with the already available ones. Since HADOOP has emerged as a popular tool for BIG DATA to analyze these data using map- reduce to get the desired outputwiththehelpof programs.It is a freely available platform to perform these type of performance. The design purpose of it is to perform millions of data in a single executive server. [1] 2. CHARACTERISTICS OF DATA Fig -1: Four V’s of Big Data  Volume Volume refers to the amount of data. The size of the data is represented as a volume. The size of the information is represented in terabytes and petabytes.  Variety Variety forms the data too big. The files arrive in various formats and of any type, it may be structured or unstructured, such as text, audio, videos, log files and more.  Velocity Velocity refers to the speed of information processing. The data adds up at high speed. Sometimes 1 minute is also late, so big data are time sensitive.  Value The possible value of Big data is huge. Value is the main root for big data because it is important for businesses, IT infrastructure system to store a lot of values in the database.  Veracity Veracity refers to noise, biases, and abnormality when we are handling with high volume, velocity, and a variety of data. All data not going to be 100% correct, there will be dirty data. [2]
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1112 3. TECHNOLOGY USED IN BIG DATA For the determination of processing a large amount of data, the big data require exceptional technologies. The various techniques and technologies have been introduced for manipulating, examining, and visualizing the big data. On that point are many solutions to handle the Big Data, but the Hadoop is one of the most widely used technologies. 3.1 HADOOP Fig -2: Hadoop architecture Hadoop is an open source project which is developed for maintaining big data. This is implemented by Apache Software Foundation. It consists of many small sub projects which belong to the categoryofinfrastructurefordistributed computing. Hadoop mainly consists of:  File System (The Hadoop File System [HDFS]) - Storage part  Programming Paradigm (Map Reduce) - Processing part The other sub projects allow correlative services or they are built along the core to add higher-level abstractions. There exist several problems in dealing with storage of vast amounts of data. Though the memorycapacities ofthedrives have expanded massively, therateofreadingdata fromthem hasn't shown that considerable progress. The reading process takes a vast amount of time and the process of writing is also slower. The time can be decreased by taking from multiple disks at once. Just using one hundredth of a disk may seem wasteful. Only if there are one hundred datasets, each of which is one terabyte and providingshared access to them is also a solution. There also exist many problems with using several pieces of hardware as it increases the chances of failure. Replication method that is creating redundant copies of similar data on the different devices so that if any failure occurs the copy of the data will be available. The primary problem is combining the data being read from different machines. There are so many methods are able in distributed computing to handle this problem, but even so, it is quite challenging. All the complication discussed is well managed by Hadoop.Hadoop Distributed File System identifies the failure problem and the Map reduce programming paradigm identifies the problem of combining data. Map Reduce diminishes the complication of disk records and writes by giving a programming model dealing in computation with keys and values. [3] 4. HDFS (Hadoop Distributed File System) Fig -3: HDFS system HDFS is a file system which is constructed of storing a very large number of files by using streaming data access rule by running clusters on commodity hardware. HDFS manages storage on the cluster by breaking incoming files into pieces called blocks and string each block redundantly across the pocket billiards on the server. HDFS stores, three copies of each file by copying each piece to three different hosts. The size of each block 64MB. HDFS is divided into three modes which are Name node, Data Node, HDFS Clients/Edge Node.  Name node It is a centrally placed node, which holds information about the Hadoop file system. The principal task of name node is that it records all the metadata & attributes and specific locations of files & data blocks in the data nodes. Name node acts as the master node as it stores all the information around the system and provides information which isnewly added, modified and removed from data nodes.  Data node It functions as a slave node. Hadoop environment may comprise more than one data node based on capacity and performance. A data node performs twomaintasksstoringa block in HDFS and acts as the political program for running jobs.  HDFS Clients/Edge node HDFS Clients sometimes also know as Edgenode.It works as a linker between name node and data nodes. Hadoopcluster at that point is only one client, but there are also many depending upon performance needs. [4]
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1113 5. MAP REDUCE Fig -4: Map Reduce Hadoop Map Reduce is an implementation of the algorithm developed and managed by the Apache Hadoop project.Map Reduce (map + reduce) Is a theoretical explanation for writing applications that treat large amounts of structured and unstructured data stored in the Hadoop Distributed File System (HDFS). Map reduce is an instrument which is very crucial to the analysis of the data which are structured or unstructured. In which the data are splits in the minuscule parts of the cluster to take the original form of the data. Map reduce is 100 times quicker than the Spark and 1000 times quicker than the Drill, now Drill is also working platform. For EX: CPU is the mental capacity of computer likethatMap reduction is the heart of the Hadoop. [5] 6. BIG DATA MAPPING USING MAP REDUCE TECHNIQUE  Get the big data ready When a request is made by the client to run Map Reduce program, the initial step is to search and understand about the input file. The file pattern is a random type so it is necessary to change data that can be processed by the data. This process of input process and Record reader. Input Format chooses how the file is going to be smashed into smaller pieces for processing using a function called Input Split. It then assigns a Record Reader to transform the raw data for processing by the map. Dissimilar types of Record Readers are supplied with Hadoop, offeringa widevarietyof conversion options.  Let the Big Data Map Begin Client's information is now in a mode tolerable to map. For all input pairs, a specific instance of a map is called to swear out the data for each input pair. Map and reduce the need to work together to maintain the data, the program should collect output information from the separate mappers and pass it to the reducers. This action can be performed by an Output Collector. A Reporter function also present information collected from map tasks. This entire project is being done on multiple nodes in the Hadoop cluster simultaneously. Later all the map tasks are complete, the common results are collected in the partition and a shuffling occurs, sorting the output for excellent processing by reducing.  Reduce and combine for big data For each output pair, Reduce is called to do its task. In a similar pattern to map, reduce collect its output while all the mappings are processed. Reduce can't begin as far as all the mapping is made out. The output of reducing is also a key and a value. Hadoop provides an Output Format feature,and it performs very much like Input Format. Output Format takes the key-value pair and formulates the output for writing to HDFS. The last task is performed to write the data to HDFS. Record Writer is used for writing and also to perform similarly as Record Reader exceptinreverse.Ituses up the Output Format data and writes it to HDFS. [6] 7. HADOOP ECOSYSTEM Fig -5: Components of Hadoop 7.1 DATA MANAGEMENT  Ambari It allows installation of services, so that Pig, Hive, scoop, HBase can be picked and put in. It will work across all the nodes in the cluster and also can manage the services from one centralized location like starting up, stopping, reconfiguring.  Zookeeper Zookeeper is a distributed coordination and setting up service for Hadoop cluster It is a centralized service that
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1114 provides distributed synchronization and providing group services and maintains the configuration information. With Hadoop, this will be useful to track if a particular client is down and plan necessary communication protocol around node failure.  Oozie It is a workflow library that permits us to play and to connect lots of those essential projects for instances, Pig, Hive, and Sqoop. 7.2 DATA INTEGRATION  Sqoop Sqoop is a command-line interface platform which can be employed to transfer the data from relational database environments like Oracle, MySQL, and PostgreSQL into Hadoop environment.  Flume and Chukwa Various applications, operating systems, web-services generate plenty of log data, for processing these logs a framework can be developed by using Flume and Chukwa tools. It is a way to push real time data, information right into Hadoop and also to execute real time analysis. 7.3 DATA ACCESS  Pig Pig is a platform for processing and maintaining data with easy techniques. By just writing half dozen lines of code pig can process terabytes of data.  Hive Hive is Data warehousing application that provides the SQL interface to process and analyze the big data storedinHDFS. Hive infrastructure is made along the top of Hadoop that help in providing summarization, query, and analysis. 7.4 DATA STORAGE  HBase HBase is a distributed column oriented database where as HDFS is a file system. Simply it is built on top of HDFS system. It is completely written in Java, which is a non- relational, distributed database system. It can suffice as the input and output for the MapReduce. For example, read and write operations affect all rows but just a small subset of all columns.  Cassandra Apache Cassandra is a no SQLdatabaseformanaginga heavy quantity of data across many commodity servers with high availability and no single instance of failure. 7.5 INTERACTION, VISUALIZATION, EXECUTION, DEVELOPMENT  Hcatalog It is known as metadata table and storage management scheme. It is a way for tools like a Pig, Hive for interoperable also to hold a consistent view of data across those tools.  Lucene Lucent is there for full text searching an API loaded with algorithms to answer thingslikestandardfull-textsearching, wild card searching, phrase searching, range searching kind of stuff.  Hama Hama is there for BSP (Book Synchronous Processing). To turn with a large amount of scientific data home is used.  Crunch Crunch is there for writing and testing and running map reduces pipeline. It basically gives full control to overall four phrases, which is going like 1. Map, 2. Reduce, 3. Shuffle, and 4. Combine. It is there to assist in joining and aggregation that is very hard to do with low-level map reduce, so Crunch is there to make you little easier inside map reduce pipeline. 7.6 DATA SERIALIZATION  Avro Avro is a data serialization format which brings data interoperability among multiple components of Apache Hadoop. Most of the components in Hadoop started supporting the Avro data format. It starts with the basic premise of data produced by component should be readily consumed by another component Avro has following features Rich data types, Fast and compact serialization, Support many programming languages like Java, Python.  Thrift It is more specific for creating flexible schemes that work with Hadoop data. It is specifically because it is intended for cross-languagecompatibility,sowebuildanapplicationwith Hadoop data in Java and if we want to use the same object in an application that you built on Ruby, Python or C++.
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1115 7.7 DATA INTELLIGENCE  Drill The drill is actually an incubator project and is designed to perform interactive analysis of nested data.  Mahout Mahout is a library for gaining knowledge about machines and data mining. It is divided into four different groups: collective filtering, categorization, clustering, and mining of parallel frequent patterns. TheMahoutlibrary belongsto the subset that can be performed in a distributed mode and can be executed by MapReduce. [7] 8. CHALLENGES  Heterogeneity and Incompleteness If we need to analyze the data, it should be structured, but when we deal with the Big Data, data may be structured or unstructured as well. Heterogeneity is the big challenge in data Analysis and analysts need to cope with it. Consider an example of a patient in the Hospital. We will arrive at each disc for each medical test. And we will likewise create a record for a hospital stay. This will be different for all patients. This figure is not well structured. Hence it goes with the team you manage heterogeneous and incompleteis required. A secure data analysis should be applied to this.  Scale As the name says Big Data is having a large size of data sets. Managing with large data sets has been a large problem for teens. Earlier, this difficulty was solved by the processors getting faster, but now data volumes are becoming hugeand processors are static. The World is being given towards the Cloud technology, due to this shifted data is generated at a very high rate. This high rate of increasing data is becoming a challenging problem with the data analysts. Hard disksare used to store the Data. They are slowerI/Operformance.But now Hard Disks are replaced by the solid state drives and other applied sciences. These are not at a slower rate like Hard disks, so the new storage scheme should be designed.  Timeliness Another challenge with the size is speed. If the data sets are heavy in size, longer the time it will take to analyze it. Any system which deals effectively with the size is likely to execute well in term of speed. There are caseswhenwe need the analysis results directly. For example, If there is any fraud transaction, It should be dissected before the transaction is completed. And then some newsystemshould be designed to meet this challenge in data analysis.  Privacy Privacy of data is another big problem with large data. In some nations, there are strict laws regarding the data privacy, for example, in the USA there are strict laws for health records, but for others, it is less forceful. Forexample, in social media, we cannot fix the private posts of users for sentiment analysis.  Human Collaborations In malice of the advanced computational models, there are many patterns that a computer cannot detect. A novel method of harnessing human ingenuity to solvea problemis crowdsourcing. The best example is Wikipedia. We are relying on the information given by the strangers, however, most of the time they are correct. But there may be other people who provide false information.Forhandlingthiskind of complexity we require a technology model to cope with this. As mankind, we can look the review of the book and find that some are positive and some are negative and come up with a decision whether to buy or not. We need systems to be that intelligent to determine. [8] 9. OPPORTUNITIES  Technology Nearly every top organization like Facebook,IBM,Yahoo has adopted Big Data and are investing in big data. Facebook handles 50 Billion photos of users. Every month Google handles 100 billion searches. From these stats, we can state that there are a lot of opportunities oninternet,social media.  Government Big data can be applied to handle the problems faced by the government. Obama government announced a big data research and development initiative in 2012. Big data analysis played an important role of BJP winning the elections in 2014 and Indian government is putting on big data analysis in Indian elections.  Healthcare Many health care organizations are adopting big data techniques for deriving each and every information about patients. To ameliorate the health care and low down the cost big data analysis is required and certain technology should be adapted.  Science and Research Big data is the latest topic of inquiry. Many researchers are solving with big data. There are so many papers being printed on big data. NASA center for climate simulation stores 32 petabytes of observations.
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 08 | Aug -2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1116  Media Media is using big data for the promotions and selling of products by targeting the interest of the user on the net. For instance, social media posts, data analysts get the number of posts and then analyze the interest of the user. It can as well be done by getting the positive or negative reviews on the social media. [9] 10. CONCLUSION Big Data (Hadoop) is in huge demand in the marketplace nowadays. As there a huge amount of data is lying in the industry, but there is no tool to wield it and Hadoop can implement on low-cost hardware and can be used by a large set of the audience on a large number of datasets. In Hadoop map reduce is the most important component in Hadoop. Many techniques are used to make the efficient plan for the map reduce so that it can speed up the system or data retrieval Technique like Quincy, Asynchronous Processing, Speculative Execution, Job Awareness, Delay Scheduling, Copy Compute Splitting had made the schedule effective for faster processing. [10] REFERENCES [1] Bijesh Dhyani, Anurag Barthwal, “Big Data Analytics using Hadoop”, International Journal of Computer Applications (0975 – 8887) Volume 108 – No 12, December 2014 [2] Prof. Devendra P. Gadekar,HarshawardhanS.Bhosale,“A Review Paper on Big Data and Hadoop”, International Journal of Scientific and Research Publications [IJSRP], Volume 4, Issue 10, October 2014 1 ISSN 2250-3153 [3] Ms. Gurpreet Kaur, Ms. Manpreet Kaur, “REVIEW PAPER ON BIG DATA USING HADOOP”, International Journal of Computer Engineering & Technology (IJCET), Volume 6, Issue 12, Dec 2015, pp. 65-71, Article ID: IJCET_06_12_008 ISSN Print: 0976-6367 and ISSN Online: 0976–6375 [4] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chandler, "The Hadoop Distributed File System", October 2010, 978-1-4244-7153, IEEE [5] Dr. Sanjay Srivastava, Swami Singh, Viplav Mandal, “The Big Data analytics with Hadoop”, International Journal of Research in Applied Science & Engineering Technology (IJRASET), Volume 4 Issue III, March 2016, ISSN:2321-9653 [6] Varsha B. Bobade, “Survey Paper on Big Data and Hadoop”, International Research Journal of Engineeringand Technology (IRJET) e-ISSN: 2395-0056, Volume: 03, Issue: 01, Jan-2016 [7] Poonam S. Patil, Rajesh. N. Phursule, “Survey Paper on Big Data Processing andHadoopComponents”, International Journal of Science and Research (IJSR), Volume 3 Issue 10, October 2014, ISSN (Online): 2319-7064 [8] Rahul Beakta, “Big Data And Hadoop: A Review Paper”, RIEECE -2015, Volume 2, Spl. Issue 2 (2015), e-ISSN: 1694- 2329, p-ISSN: 1694-2345 [9] Ivanilton Polatoa, ReginaldoReb,AlfredoGoldman,Fabio Kona, "A Comprehensive View of Hadoop Research - A Systematic Literature Review", Journal of Network and Computer Applications,Volume46,PP1-25,November2014 [10] Dr. Madhu Goel, Suman Arora, “Survey Paper on Scheduling in Hadoop”, International Journal of Advanced Research in Computer Science and Software Engineering [IJARCSSE], Volume 4, Issue 5, May 2014, ISSN: 2277 128X