SlideShare a Scribd company logo
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1864
Big Data-A Review Study with Comparitive Analysis of Hadoop
Himani Tyagi1
1Department of CSE, BSAITM, Faridabad, Hariyana
---------------------------------------------------------------------***----------------------------------------------------------------------
Abstract - Data generated in daily life by human being is very large, this complete data is called big data. The data in this form
is made useful for people by preprocessing it. To process this terabytes of data specialized hardwares and softwares are
needed. And this data is going to get increased day by day. Therefore, big data analysis is a challenging and current area of
research and development. The basic objective of this paper is to explore the potential impact, challenges, architectures, and
various tools associated with it. As a result, this article providesa platformto explore bigdata atnumerousstages.Additionally,
it opens a new horizon for researchers to develop the solution, based on the challenges and open research issues. A
comparative study of hadoop, spark is also shown.
Keywords- spark, hadoop, big data analysis, challenge.
1. INTRODUCTION
Big data is a wide term that covers the non traditional strategiesandtechnologiesusedtoprocess,organizeandgatherinsights
from large datasets . It is not shocking or new that to work with the data that is notinthecapabilityofa singlecomputersystem
was a tedious task. With the introduction of big data with hadoop, gave a lot of ease and flexibility to store this ‘big data’.
The data which follows following criteria is considered as big data,
1. Velocity:- it is defined as the speed of generation of data for example 500TB data per day can be considered as big
data.
2. Volume:- this is defined as the size of the data for example 500TB data .
3. Variety :- it re presents the variety of data. there are three categories of data that is structured, semi-structured and
unstructured. mostly the data is unstructured.
4. Veracity :-it can be defined as the truthfulness of data.
The most important feature of hadoop which makes it different from spark is hadoop works on batch processing and
spark works on stream processing.
Fig1:big data concept
There are two types of data in big data[2]:
1. Unstructured Data:. The data generated from the social medias like Facebook, twitter, LinkedIn, instagram, Google+ like
audios, videos etc.
2. Machine data: this data is generated from RFID chip readings and global positioning results(GPS).
3. Structured data: The data collected by the companies on their operations or the sales or the data stored in the form oftables
is called structured data. These data are structured because data has a defined length and format for big data. Examples of
structured data include numbers, dates, and groups of words and numbers called strings.
1. HADOOP COMPONENTS
Hadoop has two components HDFS and map reduce
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1865
A view of hadoop distributed file system(HDFS):-
The difference between regular file systems and HDFS is that in regular file systems each block of data is small,approximately
51 bytes and the problem for multiple seek because of large access for Input/Output unlike HDFS where each block of data is
large,64 MB and a single seek is required for reading huge data sequentially.
Fig2:distributed file system
The architecture of HDFS is considered as master/slave.it consists of a SingleNameNode,a masterserverthatmanagesthefile
system namespace and regulates access to files by clients. The data node is usually responsible for management ofthestorage
attached to the nodes. Due to the replication of data on different data nodes Hadoop distributed file system is called as highly
fault tolerant. The important feature of HDFS is that it is the storage system for the Map reduce jobs, both for inputandoutput.
Fig3:representation of data nodes
There are three modes of Hadoop configuration
1. standalone mode:-in this mode, all Hadoop services runs in a single JVM on a single machine.
2. pseudo-distributed mode:- in pseudo-distributed mode, each Hadoop runs on its own JVM,but on a single machine.
3.Fully Distributed mode:-in Hadoop distributed mode, Hadoop services runs on individual JVM, buttheseresideinseparate
commodity machine in single cluster.
Hadoop services- the main services of Hadoop involves
Data node:-data nodes in Hadoop distributed file system(HDFS) are the slaves[10] that is responsible for storing blocks of
data.
Name node:-It is the master node that is responsible for the managementofdata blocksthatresidesindata node.Itiscentrally
placed node, which contains information about Hadoop file system[10].
Secondary name node:-the secondary name node is a specially dedicated node in HDFS totakea checkpointofthe filesystem
metadata that is present in name node. It keeps track of the data that it is alive.it cannot be considered as the replacement for
name node but can be considered as the helper node. if the name node fails, the data can be recovered from secondary name
node’s logs.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1866
In Hadoop 1.0 map reducer was responsible for performing all the tasks like “job tracker”, allocation of resources so, what if
map reducer crashes so here comes YARN with Hadoop 2.0 that brings many flexibilities for map reducer and the allocation
and management of resources is been allocated to YARN and map reduces is supposed to do only processing of data.
YARN(yet another resource navigator) it is a resource management framework forschedulingandhandlingresourcerequests
from distributed applications and can be called as resource manager. YARN split the two major functionalities of job tracker
into two separate daemons :
1. Application manager
2. Application master
The application master runs the job and give results to applicationmanager.oneapplicationmasterwill beallocatedforone job
submission.it is responsible for containers and managingtheapplication.italsomanagestheresourcesrequiredforthe joband
give the report to the resource manager.
Fig 4:Hadoop cluster architecture
B. MAP REDUCE
The other component of hadoop apart from HDFS is map reduce. It is the programming model and an implementation for
processing and generating large data sets with parallel and distributed algorithms on a cluster. Map reduce has become a
ubiquitous framework for large-scale data processing [3].It is an initial ingestion and transformation step, where individual
input records can be processed in parallel. Reduce process is the aggregation or summation step, in which all associated
records must be processed together in a group. Task tracker keeps track of individual map tasks, and can run parallel. A map
reduce job runs on a particular task tracker slave node. Jobs of a mapreducer
Fig5:map reduce
The steps involved in map reducer job
1.input
2.split
3.map
4.shuffle
5.Reduction
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1867
split map reduction
fig6:steps involved in map reduce
2. THE HADOOP ECOSYSTEM
Hadoop is a free and open source software framework. Hadoop is a Programmingframework usedtosupportthe processingof
large data sets in a distributed computing environment. Hadoop was developed by Google’s MapReduce that is a software
framework where an application break down into various parts. The Current Appache Hadoop ecosystem consists of the
Hadoop Kernel, MapReduce, HDFS and numbers of various components like Apache Hive, Base and Zookeeper,HDFSandMap
Reduce.[1].The Hadoop architecture can now be concluded and the place of HDFS,Map Reduce andYARN canbeseeninthefig
there are two more tools in Hadoop ecosystem that are important are FLUME and SQOOP known as the data ingestion tools.
Fig7:hadoop architecture
Table1:Difference between traditional database and Hadoop
Traditional DBMS Hadoop
Processing Traditional RDBMScannotbe
used to process and store
large amount of data or big
data
but Hadoop has two main components HDFS that is
responsible for storage of big data and Map Reduce that is
responsible for processing large data by splitting it into
several blocks of data and then distributing these blocks
across the nodes on different machines.
Throughput throughput can be defined as
the total amount of data
processed in a particulartime
period so that the output is
maximum.RDBMS could not
achieve higher throughput in
any case when the data is
very large(big).
This is the main reason behind the success of Hadoop over
RDBMS.
Type of data
RDBMS can only be used for
either structured(data in
tabular form) or semi
structured data(e.g, JSON
data)
but because of the variety feature explained above in
hadoop, it can be used for either structured, semi-
structured or unstructured data.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1868
Fig8:spark context
3. COMPONENTS DESCRIPTION:
Spark context-the spark context is the driver program that is responsible for the creation of RDD(resilient Distributed
Datasets).The RDD represents a collection of items distributed across the cluster that can be manipulated in parallel. The
datasets gets converted into blocks of data but into same RDD. The blocks arecalledatomsofthedataset.Spark contextassigns
an executor to each worker node for instance py4J whose responsibility is to transform by default spark context session into
java-spark session for further processing of data. The transformation used most commonly are filter(),map(),join(),flatmap().
Cluster manager: the link between spark context and worker node is cluster manager. This manager works in three modes-
standaloneYARN,MESOS.
Worker nodes: These are the nodes which runs the application code in the cluster. Here two worker nodes represents two
machines. It is not recommended to run more than one worker node at a time.
Table2:The comparative analysis of Hadoop with spark
PARAMETERS MAP REDUCE SPARK
SPEED
Slow in speed as compared to spark
because of the storage and fetchingof
data in the following manner.
There are fast as the processing of data is done in
the following way. Thereisin disk andin_memory
transfer which is in general fast.
IMPLEMENTATION
LANGUAGES
DIFFICULTY LEVEL
JAVA,C++,PEARL,RUBY
It is difficult to code in map reduce for
data processing
PYTHON,R,PROGRAMMING
LANGUAGE,SCALA,JAVA,SQL
Because of the presence of high-level operators it
becomes easy to code in spark.
Fault Tolerance Not fault tolerant Spark is fault tolerant because the computations
on RDD are represented as a lineage graph, a
directed acyclic graph in spark.
4. CONCLUSION
We have entered in an era of billion or trillions of data that is also called Big Data. The paper describes the concept of Big Data
based on 3 Vs, stands for volume, velocity and variety of Big Data. The technology associated to deal with big data –Hadoop,
HDFS, Map Reduce. The challenges with Hadoop in comparision withspark.Thepaper alsodescribesHadoopecosystemwhich
is an open source software used for processing of Big Data. Hadoop is the need of today for processing and dealing with big
data.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072
© 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1869
5. REFERENCES
[1] International Journal of Scientific and Research Publications, Volume 4, Issue 10, October 2014 1 ISSN 2250-3153
www.ijsrp.org A Review Paper on Big Data and Hadoop Harshawardhan S. Bhosale1 , Prof. Devendra P. Gadekar.
[2] International Journal ofEngineeringTechnology,ManagementandApplied Scienceswww.ijetmas.comMarch 2017,Volume
5 Issue 3, ISSN 2349-4476 19 AnjanaRaviprolu The Big Data and Market Research Anjana Raviprolu, Dr.Lankapalli Bullayya.
[3] Jimmy Lin “MapReduce Is Good Enough?” The control project. IEEE Computer 32 (2013).
[4] Bernice Purcell “The emergence of “big data” technology and analytics” Journal of Technology Research 2013
[4] Kiran kumara Reddi & Dnvsl Indira “Different Technique to Transfer Big Data : survey” IEEE Transactions on 52(8)
(Aug.2013) 2348 { 2355}
[5] Kenn Slagter · Ching-Hsien Hsu “An improved partitioning mechanism for optimizing massive data analysis using
MapReduce” Published online: 11 April 2013
[6] Umasri.M.L, Shyamalagowri.D ,Suresh Kumar.S “Mining Big Data:-Currentstatusandforecasttothefuture”Volume4,Issue
1, January 2014 ISSN: 2277 128X
[7] S.Vikram Phaneendra, E.Madhusudhan Reddy,“Big Datasolutions for RDBMS problems- A survey”, In 12thIEEE/ IFIP
Network Operations & Management Symposium (NOMS 2010) (Osaka, Japan, Apr 19{23 2013).
[8] Kaisler, S., Armour, F., Espinosa, J. A., Money, W. (2013). Big Data: Issues and Challenges Moving Forward. International
Confrence on System Sciences (pp. 995-1004). Hawaii: IEEE Computer Soceity.
[9] Marr, B. (2013, November 13). The Awesome Ways Big Data is used Today to Change Our World. Retrieved November 14,
2013, from LinkedIn: https://www.linkedin.com/today /post/ article/20131113065157-64875646-the-awesome-ways-
bigdata-is-used-today-tochange-our-world.
[10] Varsha B.Bobade] Survey Paper on Big Data and Hadoop, International Research Journal of Engineering and Technology
(IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 01 | Jan-2016 www.irjet.net p-ISSN: 2395-0072
[11] Shilpa Manjit Kaur,” BIG Data and Methodology- A review” ,International Journal of Advanced Research in Computer
Science and Software Engineering, Volume 3, Issue 10, October 2013.
[12] D. P. Acharjya, S. Dehuri and S. Sanyal Computational Intelligence for Big Data Analysis, Springer International Publishing
AG, Switzerland, USA, ISBN 978-3-319-16597-4, 2015.
[13] Jyoti Kumari, Mr. Surender, Statically Analysis on Big Data Using Hadoop,IJCSMC, Vol. 6, Issue. 6, June 2017, pg.259 – 265
[14] C.L. Philip Chen, Chun-Yang Zhang, “Data intensive applications, challenges, techniques and technologies:AsurveyonBig
Data” Information Science 0020-0255 (2014), PP 341-347, elsevier.
[15] Katarina Grolinger At. Al.“Challenges for Map Reduce in Big Data”, IEEE (10th World Congress on Services)978-14799-
5069-0/14,PP 182-189.
[16]Gayathri Ravichandran ,Big Data Processing with Hadoop : A Review , International Research Journal of Engineering and
Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 02,Feb -2017 .
[17]Abdelladim Hadioui!!", Nour-eddine El Faddouli, Yassine Benjelloun Touimi, and Samir Bennani Machine Learning Based
On Big Data Extraction of Massive Educational Knowledge,iJET ‒ Vol. 12, No. 11, 2017.
[18]Kache, F., Kache, F., Seuring, S., Seuring, S.,Challenges andopportunitiesofdigital informationattheintersectionof BigData
Analytics and supply chain management. International Journal of Operations & Production Management 37, 10–36,
(2017)https://doi.org/10.1108/IJOPM-02-2015-0078
[19]Zhou, L., Pan, S., Wang, J., Vasilakos, A.V., Machine Learning on Big Data: Opportunities and Challenges. Neurocomputing,
(2017).

More Related Content

What's hot

A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
IRJET Journal
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
cscpconf
 
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
Smith Kim
 
Big data
Big dataBig data
Big data
revathireddyb
 
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An OverviewHadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overview
rahulmonikasharma
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
dbpublications
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
ijsrd.com
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
Editor IJCATR
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
paperpublications3
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
Bhushan Kulkarni
 
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
IRJET Journal
 
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIESSTUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES
ijdpsjournal
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
SANTOSH WAYAL
 
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCEPERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
ijdpsjournal
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET Journal
 
D04501036040
D04501036040D04501036040
D04501036040
ijceronline
 
Dremel
DremelDremel
Dremel
Anhua Xu
 

What's hot (17)

A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
 
Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce Survey of Parallel Data Processing in Context with MapReduce
Survey of Parallel Data Processing in Context with MapReduce
 
Hdf5 intro
Hdf5 introHdf5 intro
Hdf5 intro
 
Big data
Big dataBig data
Big data
 
Hadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An OverviewHadoop and its role in Facebook: An Overview
Hadoop and its role in Facebook: An Overview
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
IRJET - A Prognosis Approach for Stock Market Prediction based on Term Streak...
 
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIESSTUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES
STUDY ON EMERGING APPLICATIONS ON DATA PLANE AND OPTIMIZATION POSSIBILITIES
 
A data aware caching 2415
A data aware caching 2415A data aware caching 2415
A data aware caching 2415
 
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCEPERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
PERFORMANCE EVALUATION OF BIG DATA PROCESSING OF CLOAK-REDUCE
 
IRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOPIRJET - Survey Paper on Map Reduce Processing using HADOOP
IRJET - Survey Paper on Map Reduce Processing using HADOOP
 
D04501036040
D04501036040D04501036040
D04501036040
 
Dremel
DremelDremel
Dremel
 

Similar to IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop

Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
IRJET Journal
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
IRJET Journal
 
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET Journal
 
G017143640
G017143640G017143640
G017143640
IOSR Journals
 
Big data
Big dataBig data
Big data
revathireddyb
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoop
Aditi Yadav
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmPerformance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
IRJET Journal
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
 
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET Journal
 
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
IRJET Journal
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Cognizant
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
AM Publications,India
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
Manoj Jangalva
 
An Analytical Study on Research Challenges and Issues in Big Data Analysis.pdf
An Analytical Study on Research Challenges and Issues in Big Data Analysis.pdfAn Analytical Study on Research Challenges and Issues in Big Data Analysis.pdf
An Analytical Study on Research Challenges and Issues in Big Data Analysis.pdf
April Knyff
 
BDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfBDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdf
KUMARRISHAV37
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
Cognizant
 
A NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATAA NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATA
ijdms
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
Sarvesh Meena
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
Sonal Tiwari
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
Mishika Bharadwaj
 

Similar to IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop (20)

Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
 
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
 
G017143640
G017143640G017143640
G017143640
 
Big data
Big dataBig data
Big data
 
Bigdata and hadoop
Bigdata and hadoopBigdata and hadoop
Bigdata and hadoop
 
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking AlgorithmPerformance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
Performance Improvement of Heterogeneous Hadoop Cluster using Ranking Algorithm
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
 
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
 
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
Performance Enhancement using Appropriate File Formats in Big Data Hadoop Eco...
 
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
Harnessing Hadoop: Understanding the Big Data Processing Options for Optimizi...
 
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCESURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE
 
project report on hadoop
project report on hadoopproject report on hadoop
project report on hadoop
 
An Analytical Study on Research Challenges and Issues in Big Data Analysis.pdf
An Analytical Study on Research Challenges and Issues in Big Data Analysis.pdfAn Analytical Study on Research Challenges and Issues in Big Data Analysis.pdf
An Analytical Study on Research Challenges and Issues in Big Data Analysis.pdf
 
BDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfBDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdf
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
A NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATAA NOVEL APPROACH FOR PROCESSING BIG DATA
A NOVEL APPROACH FOR PROCESSING BIG DATA
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
Big Data and Hadoop Basics
Big Data and Hadoop BasicsBig Data and Hadoop Basics
Big Data and Hadoop Basics
 
Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
IRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
IRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
IRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
IRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
IRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
IRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
IRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
IRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

Digital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes completeDigital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes complete
shubhamsaraswat8740
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
nedcocy
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
Roger Rozario
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
ydzowc
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
vmspraneeth
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
pvpriya2
 
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
b0754201
 
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdfSri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Balvir Singh
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
snaprevwdev
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Massimo Talia
 
This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...
DharmaBanothu
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
PreethaV16
 
Assistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdfAssistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdf
Seetal Daas
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
harshapolam10
 
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUESAN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
drshikhapandey2022
 
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
felixwold
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
ElakkiaU
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Transcat
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
Paris Salesforce Developer Group
 

Recently uploaded (20)

Digital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes completeDigital Image Processing Unit -2 Notes complete
Digital Image Processing Unit -2 Notes complete
 
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
一比一原版(爱大毕业证书)爱荷华大学毕业证如何办理
 
Transformers design and coooling methods
Transformers design and coooling methodsTransformers design and coooling methods
Transformers design and coooling methods
 
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
原版制作(Humboldt毕业证书)柏林大学毕业证学位证一模一样
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
 
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptxSENTIMENT ANALYSIS ON PPT AND Project template_.pptx
SENTIMENT ANALYSIS ON PPT AND Project template_.pptx
 
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdfSri Guru Hargobind Ji - Bandi Chor Guru.pdf
Sri Guru Hargobind Ji - Bandi Chor Guru.pdf
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
 
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
 
This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...This study Examines the Effectiveness of Talent Procurement through the Imple...
This study Examines the Effectiveness of Talent Procurement through the Imple...
 
Object Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOADObject Oriented Analysis and Design - OOAD
Object Oriented Analysis and Design - OOAD
 
Assistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdfAssistant Engineer (Chemical) Interview Questions.pdf
Assistant Engineer (Chemical) Interview Questions.pdf
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
SCALING OF MOS CIRCUITS m .pptx
SCALING OF MOS CIRCUITS m                 .pptxSCALING OF MOS CIRCUITS m                 .pptx
SCALING OF MOS CIRCUITS m .pptx
 
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUESAN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
 
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdfAsymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
Asymmetrical Repulsion Magnet Motor Ratio 6-7.pdf
 
An Introduction to the Compiler Designss
An Introduction to the Compiler DesignssAn Introduction to the Compiler Designss
An Introduction to the Compiler Designss
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
 

IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1864 Big Data-A Review Study with Comparitive Analysis of Hadoop Himani Tyagi1 1Department of CSE, BSAITM, Faridabad, Hariyana ---------------------------------------------------------------------***---------------------------------------------------------------------- Abstract - Data generated in daily life by human being is very large, this complete data is called big data. The data in this form is made useful for people by preprocessing it. To process this terabytes of data specialized hardwares and softwares are needed. And this data is going to get increased day by day. Therefore, big data analysis is a challenging and current area of research and development. The basic objective of this paper is to explore the potential impact, challenges, architectures, and various tools associated with it. As a result, this article providesa platformto explore bigdata atnumerousstages.Additionally, it opens a new horizon for researchers to develop the solution, based on the challenges and open research issues. A comparative study of hadoop, spark is also shown. Keywords- spark, hadoop, big data analysis, challenge. 1. INTRODUCTION Big data is a wide term that covers the non traditional strategiesandtechnologiesusedtoprocess,organizeandgatherinsights from large datasets . It is not shocking or new that to work with the data that is notinthecapabilityofa singlecomputersystem was a tedious task. With the introduction of big data with hadoop, gave a lot of ease and flexibility to store this ‘big data’. The data which follows following criteria is considered as big data, 1. Velocity:- it is defined as the speed of generation of data for example 500TB data per day can be considered as big data. 2. Volume:- this is defined as the size of the data for example 500TB data . 3. Variety :- it re presents the variety of data. there are three categories of data that is structured, semi-structured and unstructured. mostly the data is unstructured. 4. Veracity :-it can be defined as the truthfulness of data. The most important feature of hadoop which makes it different from spark is hadoop works on batch processing and spark works on stream processing. Fig1:big data concept There are two types of data in big data[2]: 1. Unstructured Data:. The data generated from the social medias like Facebook, twitter, LinkedIn, instagram, Google+ like audios, videos etc. 2. Machine data: this data is generated from RFID chip readings and global positioning results(GPS). 3. Structured data: The data collected by the companies on their operations or the sales or the data stored in the form oftables is called structured data. These data are structured because data has a defined length and format for big data. Examples of structured data include numbers, dates, and groups of words and numbers called strings. 1. HADOOP COMPONENTS Hadoop has two components HDFS and map reduce
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1865 A view of hadoop distributed file system(HDFS):- The difference between regular file systems and HDFS is that in regular file systems each block of data is small,approximately 51 bytes and the problem for multiple seek because of large access for Input/Output unlike HDFS where each block of data is large,64 MB and a single seek is required for reading huge data sequentially. Fig2:distributed file system The architecture of HDFS is considered as master/slave.it consists of a SingleNameNode,a masterserverthatmanagesthefile system namespace and regulates access to files by clients. The data node is usually responsible for management ofthestorage attached to the nodes. Due to the replication of data on different data nodes Hadoop distributed file system is called as highly fault tolerant. The important feature of HDFS is that it is the storage system for the Map reduce jobs, both for inputandoutput. Fig3:representation of data nodes There are three modes of Hadoop configuration 1. standalone mode:-in this mode, all Hadoop services runs in a single JVM on a single machine. 2. pseudo-distributed mode:- in pseudo-distributed mode, each Hadoop runs on its own JVM,but on a single machine. 3.Fully Distributed mode:-in Hadoop distributed mode, Hadoop services runs on individual JVM, buttheseresideinseparate commodity machine in single cluster. Hadoop services- the main services of Hadoop involves Data node:-data nodes in Hadoop distributed file system(HDFS) are the slaves[10] that is responsible for storing blocks of data. Name node:-It is the master node that is responsible for the managementofdata blocksthatresidesindata node.Itiscentrally placed node, which contains information about Hadoop file system[10]. Secondary name node:-the secondary name node is a specially dedicated node in HDFS totakea checkpointofthe filesystem metadata that is present in name node. It keeps track of the data that it is alive.it cannot be considered as the replacement for name node but can be considered as the helper node. if the name node fails, the data can be recovered from secondary name node’s logs.
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1866 In Hadoop 1.0 map reducer was responsible for performing all the tasks like “job tracker”, allocation of resources so, what if map reducer crashes so here comes YARN with Hadoop 2.0 that brings many flexibilities for map reducer and the allocation and management of resources is been allocated to YARN and map reduces is supposed to do only processing of data. YARN(yet another resource navigator) it is a resource management framework forschedulingandhandlingresourcerequests from distributed applications and can be called as resource manager. YARN split the two major functionalities of job tracker into two separate daemons : 1. Application manager 2. Application master The application master runs the job and give results to applicationmanager.oneapplicationmasterwill beallocatedforone job submission.it is responsible for containers and managingtheapplication.italsomanagestheresourcesrequiredforthe joband give the report to the resource manager. Fig 4:Hadoop cluster architecture B. MAP REDUCE The other component of hadoop apart from HDFS is map reduce. It is the programming model and an implementation for processing and generating large data sets with parallel and distributed algorithms on a cluster. Map reduce has become a ubiquitous framework for large-scale data processing [3].It is an initial ingestion and transformation step, where individual input records can be processed in parallel. Reduce process is the aggregation or summation step, in which all associated records must be processed together in a group. Task tracker keeps track of individual map tasks, and can run parallel. A map reduce job runs on a particular task tracker slave node. Jobs of a mapreducer Fig5:map reduce The steps involved in map reducer job 1.input 2.split 3.map 4.shuffle 5.Reduction
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1867 split map reduction fig6:steps involved in map reduce 2. THE HADOOP ECOSYSTEM Hadoop is a free and open source software framework. Hadoop is a Programmingframework usedtosupportthe processingof large data sets in a distributed computing environment. Hadoop was developed by Google’s MapReduce that is a software framework where an application break down into various parts. The Current Appache Hadoop ecosystem consists of the Hadoop Kernel, MapReduce, HDFS and numbers of various components like Apache Hive, Base and Zookeeper,HDFSandMap Reduce.[1].The Hadoop architecture can now be concluded and the place of HDFS,Map Reduce andYARN canbeseeninthefig there are two more tools in Hadoop ecosystem that are important are FLUME and SQOOP known as the data ingestion tools. Fig7:hadoop architecture Table1:Difference between traditional database and Hadoop Traditional DBMS Hadoop Processing Traditional RDBMScannotbe used to process and store large amount of data or big data but Hadoop has two main components HDFS that is responsible for storage of big data and Map Reduce that is responsible for processing large data by splitting it into several blocks of data and then distributing these blocks across the nodes on different machines. Throughput throughput can be defined as the total amount of data processed in a particulartime period so that the output is maximum.RDBMS could not achieve higher throughput in any case when the data is very large(big). This is the main reason behind the success of Hadoop over RDBMS. Type of data RDBMS can only be used for either structured(data in tabular form) or semi structured data(e.g, JSON data) but because of the variety feature explained above in hadoop, it can be used for either structured, semi- structured or unstructured data.
  • 5. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1868 Fig8:spark context 3. COMPONENTS DESCRIPTION: Spark context-the spark context is the driver program that is responsible for the creation of RDD(resilient Distributed Datasets).The RDD represents a collection of items distributed across the cluster that can be manipulated in parallel. The datasets gets converted into blocks of data but into same RDD. The blocks arecalledatomsofthedataset.Spark contextassigns an executor to each worker node for instance py4J whose responsibility is to transform by default spark context session into java-spark session for further processing of data. The transformation used most commonly are filter(),map(),join(),flatmap(). Cluster manager: the link between spark context and worker node is cluster manager. This manager works in three modes- standaloneYARN,MESOS. Worker nodes: These are the nodes which runs the application code in the cluster. Here two worker nodes represents two machines. It is not recommended to run more than one worker node at a time. Table2:The comparative analysis of Hadoop with spark PARAMETERS MAP REDUCE SPARK SPEED Slow in speed as compared to spark because of the storage and fetchingof data in the following manner. There are fast as the processing of data is done in the following way. Thereisin disk andin_memory transfer which is in general fast. IMPLEMENTATION LANGUAGES DIFFICULTY LEVEL JAVA,C++,PEARL,RUBY It is difficult to code in map reduce for data processing PYTHON,R,PROGRAMMING LANGUAGE,SCALA,JAVA,SQL Because of the presence of high-level operators it becomes easy to code in spark. Fault Tolerance Not fault tolerant Spark is fault tolerant because the computations on RDD are represented as a lineage graph, a directed acyclic graph in spark. 4. CONCLUSION We have entered in an era of billion or trillions of data that is also called Big Data. The paper describes the concept of Big Data based on 3 Vs, stands for volume, velocity and variety of Big Data. The technology associated to deal with big data –Hadoop, HDFS, Map Reduce. The challenges with Hadoop in comparision withspark.Thepaper alsodescribesHadoopecosystemwhich is an open source software used for processing of Big Data. Hadoop is the need of today for processing and dealing with big data.
  • 6. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 05 Issue: 11 | Nov 2018 www.irjet.net p-ISSN: 2395-0072 © 2018, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1869 5. REFERENCES [1] International Journal of Scientific and Research Publications, Volume 4, Issue 10, October 2014 1 ISSN 2250-3153 www.ijsrp.org A Review Paper on Big Data and Hadoop Harshawardhan S. Bhosale1 , Prof. Devendra P. Gadekar. [2] International Journal ofEngineeringTechnology,ManagementandApplied Scienceswww.ijetmas.comMarch 2017,Volume 5 Issue 3, ISSN 2349-4476 19 AnjanaRaviprolu The Big Data and Market Research Anjana Raviprolu, Dr.Lankapalli Bullayya. [3] Jimmy Lin “MapReduce Is Good Enough?” The control project. IEEE Computer 32 (2013). [4] Bernice Purcell “The emergence of “big data” technology and analytics” Journal of Technology Research 2013 [4] Kiran kumara Reddi & Dnvsl Indira “Different Technique to Transfer Big Data : survey” IEEE Transactions on 52(8) (Aug.2013) 2348 { 2355} [5] Kenn Slagter · Ching-Hsien Hsu “An improved partitioning mechanism for optimizing massive data analysis using MapReduce” Published online: 11 April 2013 [6] Umasri.M.L, Shyamalagowri.D ,Suresh Kumar.S “Mining Big Data:-Currentstatusandforecasttothefuture”Volume4,Issue 1, January 2014 ISSN: 2277 128X [7] S.Vikram Phaneendra, E.Madhusudhan Reddy,“Big Datasolutions for RDBMS problems- A survey”, In 12thIEEE/ IFIP Network Operations & Management Symposium (NOMS 2010) (Osaka, Japan, Apr 19{23 2013). [8] Kaisler, S., Armour, F., Espinosa, J. A., Money, W. (2013). Big Data: Issues and Challenges Moving Forward. International Confrence on System Sciences (pp. 995-1004). Hawaii: IEEE Computer Soceity. [9] Marr, B. (2013, November 13). The Awesome Ways Big Data is used Today to Change Our World. Retrieved November 14, 2013, from LinkedIn: https://www.linkedin.com/today /post/ article/20131113065157-64875646-the-awesome-ways- bigdata-is-used-today-tochange-our-world. [10] Varsha B.Bobade] Survey Paper on Big Data and Hadoop, International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 03 Issue: 01 | Jan-2016 www.irjet.net p-ISSN: 2395-0072 [11] Shilpa Manjit Kaur,” BIG Data and Methodology- A review” ,International Journal of Advanced Research in Computer Science and Software Engineering, Volume 3, Issue 10, October 2013. [12] D. P. Acharjya, S. Dehuri and S. Sanyal Computational Intelligence for Big Data Analysis, Springer International Publishing AG, Switzerland, USA, ISBN 978-3-319-16597-4, 2015. [13] Jyoti Kumari, Mr. Surender, Statically Analysis on Big Data Using Hadoop,IJCSMC, Vol. 6, Issue. 6, June 2017, pg.259 – 265 [14] C.L. Philip Chen, Chun-Yang Zhang, “Data intensive applications, challenges, techniques and technologies:AsurveyonBig Data” Information Science 0020-0255 (2014), PP 341-347, elsevier. [15] Katarina Grolinger At. Al.“Challenges for Map Reduce in Big Data”, IEEE (10th World Congress on Services)978-14799- 5069-0/14,PP 182-189. [16]Gayathri Ravichandran ,Big Data Processing with Hadoop : A Review , International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056 Volume: 04 Issue: 02,Feb -2017 . [17]Abdelladim Hadioui!!", Nour-eddine El Faddouli, Yassine Benjelloun Touimi, and Samir Bennani Machine Learning Based On Big Data Extraction of Massive Educational Knowledge,iJET ‒ Vol. 12, No. 11, 2017. [18]Kache, F., Kache, F., Seuring, S., Seuring, S.,Challenges andopportunitiesofdigital informationattheintersectionof BigData Analytics and supply chain management. International Journal of Operations & Production Management 37, 10–36, (2017)https://doi.org/10.1108/IJOPM-02-2015-0078 [19]Zhou, L., Pan, S., Wang, J., Vasilakos, A.V., Machine Learning on Big Data: Opportunities and Challenges. Neurocomputing, (2017).