SlideShare a Scribd company logo
1 of 4
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
Performance Improvement of Heterogeneous Hadoop Cluster Using
Ranking Algorithm
Shivani Soni1, Nidhi Singh2
1 M.tech Scholar, Dept. of Computer Science Engineering, L.N.C.T College, Bhopal (M.P), INDIA
2 Professor, Dept. of Computer Science Engineering, L.N.C.T College, Bhopal (M.P), INDIA
---------------------------------------------------------------------***---------------------------------------------------------------------
ABSTRACT: Enhancing technologies increases the use and
growth of information technology. As wecanseethegrowth of
data which is increasing rapidly in every single minute. This
exponential growth in data is one of the reason for the
generation of rapid data. Stored data is processed to extract
worth from inaccurate data to form a way for the paralleland
distributed processing for Hadoop. AllthenodesinHadoop are
assumed to be in homogeneous nature but it is not same as it
looks like, in cloud different configuration systems are used
which represents it logically. So data placement policy is used
to distribute data on the basis ofpowerofnode. Dynamicblock
placement strategy is used in Hadoop, this strategy work as
the distributing input data blocks to the nodes on the basis of
its computing capacity. The proposed approach balance and
reorganize the input data dynamically in accordance with
each node capability in a heterogeneous nature. Datatransfer
time is reduced in the proposed approach with the
improvement in performance. Block placement strategy, page
ranking algorithm andsamplingalgorithmstrategiesareused
in the proposed approach. The data placement strategy used
works as decreasing the execution time and improving the
performance of the clusters which are of heterogeneous
nature. Big data are handled using Hadoop. Small files are
handled using applications on Hadoop so that the issue of
performance can be reduced on the Hadoop platform. Better
performance improvement is shown in the proposed work.
Keywords: Hadoop; block placement strategy; page
ranking algorithm; sampling algorithm
1. INTRODUCTION
In the year 2006, Doug CuttingandMikeCafarella,developed
Hadoop as a framework which is an open source computing
and processing of large datasets in distributedenvironment.
System failure and data loss can be reduced in the proposed
work. Failure of any node does not matters because of the
availability of thousands of interconnected node.Thousands
of data and its frequent transfer is tackled among the nodes.
Big data is handled by the Hadoop.
Hadoop is widely known because of its increase in
popularity and handling of big data. To achieve better
performance several techniques are used.
1.1 Overview of Hadoop:
The Google Map Reduce programming model implemented
the open source framework called Apache Hadoop. Hadoop
distributed file system HDFS and Map Reduce model are the
two key parts of the Hadoop framework. HDFS stores
application data for Map Reduce and map Reduce process
the data using logics on map Reduce to store data on HDFS.
Versions available of Hadoop are Hadoop 1.x and Hadoop
2.x. YARN is the improvement feature of Hadoop 2.x. Node
Manager and Resource Manager are thetwocompositions of
YARN, where both works differently. For deploying and
managing resources, Resource Manager is responsible and
for managing and reporting data node status to resource
manager, Data node is responsible.
Figure 1. Hadoop Ecosystem
1.2 Big Data:
The collection of massive amount of data is called Big Data.
This data can be in any form either structured or
unstructured, relational ornon-relational.Unstructured data
is the audio, video, text, image or any different pattern of
data. In recent years, Big data has become very popular in
several different fields. This is a big opportunity in business
field. Large transmission and communication of data
generates large data from various sources. The need of data
mining algorithm is required to process big data. Earlier
production of large data is responsible due to corporate
world but in recent years users have become responsiblefor
its data.
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1339
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
1.3 Dynamic Block Placement Strategy:
Dynamic Block Placement Strategy works on two basis
Homogeneous cluster and Heterogeneous cluster.
Figure 2. Block Placement Strategy
Homogeneous Cluster:
Depending upon the availability of space in a cluster data is
distributed among the nodes in homogeneous cluster.
Hadoop has a feature of balancing the data, thisfunctionality
of balancing is called Balancer,whichbalancethedata before
running the applications. Whenever conditionsoccurlikeon
any node large data is accumulatedtheninthiscasebalancer
is an important functionality. Replication is an important
function, balancer is responsible for it to care for
replications. It is the key feature in data movement.
Figure 3. Homogeneous Cluster
Heterogeneous Cluster :
Data is transferred from one node to another node ideally in
an heterogeneous environment. The faster node faces
overheads while processing anddata transfer.Thisresultsin
exploring of the issues comes at the time of data transfer.
Data placement policy explores the arising consequences.
And this all occurs in the heterogeneous environment.
Implementation of data placement policy provides the
details of better goals.
Figure 4. Heterogeneous Cluster
2. RELATED WORK
Many of the author stated and researchedabouttheBigdata,
deal with them and also checks there functionality to work
on homogeneous and heterogeneous cluster of data.
Jeffrey Dean et al. In [1] described about Hadoop, Hadoop
process terabytes of data which is of large amount. The
Apache group written Hadoop using java technology. It
works as parallel processing to process large clusters.
Hadoop is attractive and open source framework. It process
the data and replicates it in reliable manner. It is designed in
a manner to run commodity cluster. Low cost low
performance working in parallel is preferred by commodity
computing. HDFS is an Apache project for Hadoop, it is
distributed, low-cost and have high fault-tolerance file
system. It is convenient for largedata andprovideswithhigh
throughput. Its deploymentisnotcostly.Single namenodein
each cluster maintains file system of Meta data and
application data is stored by multiple nodes. Map Reduce
analyzes large data and advantageous for various
organization. Machine learning, indexing, searching, mining
are some map Reduce applications. Traditional SQLareused
to implement these applications. Also helps in data
transformation,parallelization, network communicationand
handling fault tolerance.
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1340
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
Andrew Wang et al. In [2] explained about HDFS which is a
distributed file system. It stores files across cluster node
redundantly for the purpose of security. HDFS divides files
into blocks and replicates them depending on the factor of
replication. The block placement policy is the default in
HDFS, it works as distributing blocks across cluster node.
Many conditions like unnecessary load on cluster can be
possible at any time which results in reducing the overall
performance of cluster.
Konstantin Shvachko et al. In [3] proposed about block
placement which plays a vital role in performance and data
reliability terms. Reliability, availability and network
utilization are improved by this data block placement
strategy. At the time of creation of new block, the first
replicated block is assigned in first location of the block.And
the other replicates will be assigned randomly on different
nodes keeping in mind that only two replicates should be
placed in a single rack. Name node provides data node for
HDFS, which helps in reducing network traffic andimproves
performance.
Fang Zhou et al. In [4] describes that application master
generates input splits in a Hadoop map Reduce. One input
split is generated for a small file. One map container can use
only one input split, which explains that number of input
splits and number of map containers are equal, which is the
issue because it creates many map container for a small file.
If many containers are created then it require many
processes, resulting in many overheads. Similar overheads
are generated for reduce container
3. PROBLEM DOMAIN
Hadoop handles the big data, which is now become a great
deal to manage because of generation of data in every single
minute. Big data is also the opportunity for business. But
here we are looking forward on the issue of homogeneous
and heterogeneous cluster.
Homogeneous cluster where clusters nodes are of similar
form means they are homogeneous but loadallotedonevery
cluster are not similar. This overloading of any cluster
decreases the performance of cluster node and also reduces
overall performance of outcome.
Similar to heterogeneous cluster, where data nodes are of
different sizes and Hadoop works as distributing equal
amount of load on the nodes. In this case if one node of
larger size is experiencing low load then in that case that
particular node completes it work firstly but it waits for the
other node to complete their work because all these nodes
after processing computes on the result, so for it, it waits for
other nodes which results in reducing performance and
increasing computation time.
Problem in it comes as, there are number of cluster nodes of
different sizes like 2GB, 4GB, 6GB and 8GB. For example
these four nodes are taken and the master node here works
as dividing all the nodes with the number of nodes. Means
the master node will divide those nodes with four.
In this mitigation approach, we are working on the issue of
performance and computation time and will be achieved in
our solution domain.
4. SOLUTION DOMAIN
The solution provided to the above problem will be
described as using page ranking algorithm and sampling
algorithm.
Where page ranking algorithm works on the basis of
frequency. The one who is having better frequency will run
first. Page ranking is used for ranking purpose and on the
basis of frequency, ranking is allotted. Weightandfrequency
is calculated using page ranking algorithm and we used it in
work.
Figure 5. Page Ranking
Here, heterogeneous cluster is used and its performance is
increased using ranking algorithm which depends on the
frequency of occurrence. Whose frequency is maximum will
be run first.
And sampling algorithm works as randomly selecting the
nodes instead of mentioning all the possible samples.
Probability of selecting is the sum which is equal to the
sample size of data n. So, will result in increase in
performance with reducing computation time and
distributing the overall load in equal to all the data nodes.
5. RESULT ANALYSIS
Result analysis of the proposed work describes thatthetime
required in proposed work is less then the time required in
existing work. The below table shows the data of variant file
sizes at different time and time is taken in second.
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1341
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
TABLE 5.1. Comparison table of existing work and
proposed work
File Sizes Original System Proposed System
500 MB 249 sec 101 sec
1 GB 478 sec 226 sec
1.5 GB 633 sec 337 sec
2 GB 815 sec 462 sec
2.5 GB 1052 sec 560 sec
3 GB 1258 sec 652 sec
The below graph shows the graph representation of
proposed work for different file size at different time.
Figure 6. Graph representing the proposed work for
different file size
The below graph shows the comparison of existing work
with proposed work where the time taken for execution of
variant file sizes in proposed work is less in comparison to
existing work. Existing work requires more execution time.
Figure 7. Comparison graph
6. CONCLUSION
The mitigation approach concluded that the data nodes can
be of any size or of same size should not be overloaded and
each node should assign the equal amount ofloaddepending
on there size which will result in increasing performance.
This paper attempts to improve performance of
heterogeneous cluster in Hadoop using Ranking algorithm
which works on the basis of frequency,theone whoishaving
maximum frequency will be executed and run first. It
reduces the computation time and improving performance.
7. FUTUTRE WORK
Depending on the allotment of work load on every node this
issue of performance arises. In future implementation
performance can be more improved using different
techniques.
8. REFERENCE
1. Jeffrey Dean and Sanjay Ghemawat, Mapreduce:
simplified data processing on large clusters,
Communications of the ACM, vol. 51, no. 1, pp. 107113,
2008.
2. Andrew Wang. Better sorting in Network Topology
pseudo Sor tBy Distance when no local node is found.
https://issues.apache.org/jira/browse/HDFS-6268,
2014. [Accessed 28-April-2014].
3. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and
Robert Chansler, The hadoop distributed filesystem,in
Mass Storage Systems and Technologies (MSST), 2010
IEEE 26th Symposium on. IEEE, 2010, pp. 110.
4. Fang Zhou, Assessment of MultipleMapReduce Strategies
for Fast Analytics of Small Files, Ph.D. thesis, Auburn
University, 2015.
5. Fang Zhou, Hai Pham, Jianhui Yue, Hao Zou, and Weikuan
Yu, Sfmapreduce: An optimized mapreduceframework
for small files,” inNetworking,Architectureand Storage
(NAS), 2015 IEEE International Conference on. IEEE,
2015.
6. F. Ahmad, S. Chakradhar, A. Raghunathan, and T. N.
Vijaykumar, “Tarazu: optimizing mapreduce on
heterogeneous clusters,” in Proc. of Int’l Conf. on
Architecture Support for Programming Language and
Operating System (ASPLOS), 2012.
7. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I.
Stoica, “Improving mapreduce performance in
heterogeneous environments,” in Proc. of USENIX
Symposium on Operating System Design and
Implementation (OSDI), 2008
8. H. Herodotou and S. Babu, “Profiling, what-if analysis,
and cost based optimization of mapreduce programs,”
in Proc. Int’ Conf. on Very Large Data Bases (VLDB),
2011
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1342

More Related Content

What's hot

Hadoop by kamran khan
Hadoop by kamran khanHadoop by kamran khan
Hadoop by kamran khanKamranKhan587
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Ashok Royal
 
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET Journal
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopIRJET Journal
 
Apache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringApache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringBADR
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET Journal
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopIOSR Journals
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?cneudecker
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
Paralyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop ClusterParalyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop ClusterIOSR Journals
 
Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)Ashok Royal
 
A sql implementation on the map reduce framework
A sql implementation on the map reduce frameworkA sql implementation on the map reduce framework
A sql implementation on the map reduce frameworkeldariof
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce FrameworkEdureka!
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 

What's hot (20)

Hadoop by kamran khan
Hadoop by kamran khanHadoop by kamran khan
Hadoop by kamran khan
 
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
Detailed presentation on big data hadoop +Hadoop Project Near Duplicate Detec...
 
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
 
Survey Paper on Big Data and Hadoop
Survey Paper on Big Data and HadoopSurvey Paper on Big Data and Hadoop
Survey Paper on Big Data and Hadoop
 
Apache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringApache Hadoop - Big Data Engineering
Apache Hadoop - Big Data Engineering
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
IRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articlesIRJET- Systematic Review: Progression Study on BIG DATA articles
IRJET- Systematic Review: Progression Study on BIG DATA articles
 
Big Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – HadoopBig Data Analysis and Its Scheduling Policy – Hadoop
Big Data Analysis and Its Scheduling Policy – Hadoop
 
What is Hadoop?
What is Hadoop?What is Hadoop?
What is Hadoop?
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Paralyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop ClusterParalyzing Bioinformatics Applications Using Conducive Hadoop Cluster
Paralyzing Bioinformatics Applications Using Conducive Hadoop Cluster
 
Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)Presentation on Big Data Hadoop (Summer Training Demo)
Presentation on Big Data Hadoop (Summer Training Demo)
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
A sql implementation on the map reduce framework
A sql implementation on the map reduce frameworkA sql implementation on the map reduce framework
A sql implementation on the map reduce framework
 
Hadoop technology doc
Hadoop technology docHadoop technology doc
Hadoop technology doc
 
Hadoop MapReduce Framework
Hadoop MapReduce FrameworkHadoop MapReduce Framework
Hadoop MapReduce Framework
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
IJARCCE_49
IJARCCE_49IJARCCE_49
IJARCCE_49
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 

Similar to IRJET Performance Improvement Hadoop Ranking Algorithm

IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET Journal
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...IRJET Journal
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Modeinventionjournals
 
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dipayan Dev
 
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...IRJET Journal
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET Journal
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbAlexander Decker
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbAlexander Decker
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training reportSarvesh Meena
 
BDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfBDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfKUMARRISHAV37
 
Review on Big Data Security in Hadoop
Review on Big Data Security in HadoopReview on Big Data Security in Hadoop
Review on Big Data Security in HadoopIRJET Journal
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewIRJET Journal
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET Journal
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelEditor IJCATR
 

Similar to IRJET Performance Improvement Hadoop Ranking Algorithm (20)

IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
G017143640
G017143640G017143640
G017143640
 
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
 
Introduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone ModeIntroduction to Big Data and Hadoop using Local Standalone Mode
Introduction to Big Data and Hadoop using Local Standalone Mode
 
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
 
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...IRJET-  	  A Study of Comparatively Analysis for HDFS and Google File System ...
IRJET- A Study of Comparatively Analysis for HDFS and Google File System ...
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
 
A survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo dbA survey on data mining and analysis in hadoop and mongo db
A survey on data mining and analysis in hadoop and mongo db
 
hadoop seminar training report
hadoop seminar  training reporthadoop seminar  training report
hadoop seminar training report
 
BDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdfBDA Mod2@AzDOCUMENTS.in.pdf
BDA Mod2@AzDOCUMENTS.in.pdf
 
Review on Big Data Security in Hadoop
Review on Big Data Security in HadoopReview on Big Data Security in Hadoop
Review on Big Data Security in Hadoop
 
Big Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A ReviewBig Data Processing with Hadoop : A Review
Big Data Processing with Hadoop : A Review
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop Environment
 
BIG DATA
BIG DATABIG DATA
BIG DATA
 
Unstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus ModelUnstructured Datasets Analysis: Thesaurus Model
Unstructured Datasets Analysis: Thesaurus Model
 
Big data
Big dataBig data
Big data
 
Big data
Big dataBig data
Big data
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

IRJET Performance Improvement Hadoop Ranking Algorithm

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 Performance Improvement of Heterogeneous Hadoop Cluster Using Ranking Algorithm Shivani Soni1, Nidhi Singh2 1 M.tech Scholar, Dept. of Computer Science Engineering, L.N.C.T College, Bhopal (M.P), INDIA 2 Professor, Dept. of Computer Science Engineering, L.N.C.T College, Bhopal (M.P), INDIA ---------------------------------------------------------------------***--------------------------------------------------------------------- ABSTRACT: Enhancing technologies increases the use and growth of information technology. As wecanseethegrowth of data which is increasing rapidly in every single minute. This exponential growth in data is one of the reason for the generation of rapid data. Stored data is processed to extract worth from inaccurate data to form a way for the paralleland distributed processing for Hadoop. AllthenodesinHadoop are assumed to be in homogeneous nature but it is not same as it looks like, in cloud different configuration systems are used which represents it logically. So data placement policy is used to distribute data on the basis ofpowerofnode. Dynamicblock placement strategy is used in Hadoop, this strategy work as the distributing input data blocks to the nodes on the basis of its computing capacity. The proposed approach balance and reorganize the input data dynamically in accordance with each node capability in a heterogeneous nature. Datatransfer time is reduced in the proposed approach with the improvement in performance. Block placement strategy, page ranking algorithm andsamplingalgorithmstrategiesareused in the proposed approach. The data placement strategy used works as decreasing the execution time and improving the performance of the clusters which are of heterogeneous nature. Big data are handled using Hadoop. Small files are handled using applications on Hadoop so that the issue of performance can be reduced on the Hadoop platform. Better performance improvement is shown in the proposed work. Keywords: Hadoop; block placement strategy; page ranking algorithm; sampling algorithm 1. INTRODUCTION In the year 2006, Doug CuttingandMikeCafarella,developed Hadoop as a framework which is an open source computing and processing of large datasets in distributedenvironment. System failure and data loss can be reduced in the proposed work. Failure of any node does not matters because of the availability of thousands of interconnected node.Thousands of data and its frequent transfer is tackled among the nodes. Big data is handled by the Hadoop. Hadoop is widely known because of its increase in popularity and handling of big data. To achieve better performance several techniques are used. 1.1 Overview of Hadoop: The Google Map Reduce programming model implemented the open source framework called Apache Hadoop. Hadoop distributed file system HDFS and Map Reduce model are the two key parts of the Hadoop framework. HDFS stores application data for Map Reduce and map Reduce process the data using logics on map Reduce to store data on HDFS. Versions available of Hadoop are Hadoop 1.x and Hadoop 2.x. YARN is the improvement feature of Hadoop 2.x. Node Manager and Resource Manager are thetwocompositions of YARN, where both works differently. For deploying and managing resources, Resource Manager is responsible and for managing and reporting data node status to resource manager, Data node is responsible. Figure 1. Hadoop Ecosystem 1.2 Big Data: The collection of massive amount of data is called Big Data. This data can be in any form either structured or unstructured, relational ornon-relational.Unstructured data is the audio, video, text, image or any different pattern of data. In recent years, Big data has become very popular in several different fields. This is a big opportunity in business field. Large transmission and communication of data generates large data from various sources. The need of data mining algorithm is required to process big data. Earlier production of large data is responsible due to corporate world but in recent years users have become responsiblefor its data. © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1339
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 1.3 Dynamic Block Placement Strategy: Dynamic Block Placement Strategy works on two basis Homogeneous cluster and Heterogeneous cluster. Figure 2. Block Placement Strategy Homogeneous Cluster: Depending upon the availability of space in a cluster data is distributed among the nodes in homogeneous cluster. Hadoop has a feature of balancing the data, thisfunctionality of balancing is called Balancer,whichbalancethedata before running the applications. Whenever conditionsoccurlikeon any node large data is accumulatedtheninthiscasebalancer is an important functionality. Replication is an important function, balancer is responsible for it to care for replications. It is the key feature in data movement. Figure 3. Homogeneous Cluster Heterogeneous Cluster : Data is transferred from one node to another node ideally in an heterogeneous environment. The faster node faces overheads while processing anddata transfer.Thisresultsin exploring of the issues comes at the time of data transfer. Data placement policy explores the arising consequences. And this all occurs in the heterogeneous environment. Implementation of data placement policy provides the details of better goals. Figure 4. Heterogeneous Cluster 2. RELATED WORK Many of the author stated and researchedabouttheBigdata, deal with them and also checks there functionality to work on homogeneous and heterogeneous cluster of data. Jeffrey Dean et al. In [1] described about Hadoop, Hadoop process terabytes of data which is of large amount. The Apache group written Hadoop using java technology. It works as parallel processing to process large clusters. Hadoop is attractive and open source framework. It process the data and replicates it in reliable manner. It is designed in a manner to run commodity cluster. Low cost low performance working in parallel is preferred by commodity computing. HDFS is an Apache project for Hadoop, it is distributed, low-cost and have high fault-tolerance file system. It is convenient for largedata andprovideswithhigh throughput. Its deploymentisnotcostly.Single namenodein each cluster maintains file system of Meta data and application data is stored by multiple nodes. Map Reduce analyzes large data and advantageous for various organization. Machine learning, indexing, searching, mining are some map Reduce applications. Traditional SQLareused to implement these applications. Also helps in data transformation,parallelization, network communicationand handling fault tolerance. © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1340
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 Andrew Wang et al. In [2] explained about HDFS which is a distributed file system. It stores files across cluster node redundantly for the purpose of security. HDFS divides files into blocks and replicates them depending on the factor of replication. The block placement policy is the default in HDFS, it works as distributing blocks across cluster node. Many conditions like unnecessary load on cluster can be possible at any time which results in reducing the overall performance of cluster. Konstantin Shvachko et al. In [3] proposed about block placement which plays a vital role in performance and data reliability terms. Reliability, availability and network utilization are improved by this data block placement strategy. At the time of creation of new block, the first replicated block is assigned in first location of the block.And the other replicates will be assigned randomly on different nodes keeping in mind that only two replicates should be placed in a single rack. Name node provides data node for HDFS, which helps in reducing network traffic andimproves performance. Fang Zhou et al. In [4] describes that application master generates input splits in a Hadoop map Reduce. One input split is generated for a small file. One map container can use only one input split, which explains that number of input splits and number of map containers are equal, which is the issue because it creates many map container for a small file. If many containers are created then it require many processes, resulting in many overheads. Similar overheads are generated for reduce container 3. PROBLEM DOMAIN Hadoop handles the big data, which is now become a great deal to manage because of generation of data in every single minute. Big data is also the opportunity for business. But here we are looking forward on the issue of homogeneous and heterogeneous cluster. Homogeneous cluster where clusters nodes are of similar form means they are homogeneous but loadallotedonevery cluster are not similar. This overloading of any cluster decreases the performance of cluster node and also reduces overall performance of outcome. Similar to heterogeneous cluster, where data nodes are of different sizes and Hadoop works as distributing equal amount of load on the nodes. In this case if one node of larger size is experiencing low load then in that case that particular node completes it work firstly but it waits for the other node to complete their work because all these nodes after processing computes on the result, so for it, it waits for other nodes which results in reducing performance and increasing computation time. Problem in it comes as, there are number of cluster nodes of different sizes like 2GB, 4GB, 6GB and 8GB. For example these four nodes are taken and the master node here works as dividing all the nodes with the number of nodes. Means the master node will divide those nodes with four. In this mitigation approach, we are working on the issue of performance and computation time and will be achieved in our solution domain. 4. SOLUTION DOMAIN The solution provided to the above problem will be described as using page ranking algorithm and sampling algorithm. Where page ranking algorithm works on the basis of frequency. The one who is having better frequency will run first. Page ranking is used for ranking purpose and on the basis of frequency, ranking is allotted. Weightandfrequency is calculated using page ranking algorithm and we used it in work. Figure 5. Page Ranking Here, heterogeneous cluster is used and its performance is increased using ranking algorithm which depends on the frequency of occurrence. Whose frequency is maximum will be run first. And sampling algorithm works as randomly selecting the nodes instead of mentioning all the possible samples. Probability of selecting is the sum which is equal to the sample size of data n. So, will result in increase in performance with reducing computation time and distributing the overall load in equal to all the data nodes. 5. RESULT ANALYSIS Result analysis of the proposed work describes thatthetime required in proposed work is less then the time required in existing work. The below table shows the data of variant file sizes at different time and time is taken in second. © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1341
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 TABLE 5.1. Comparison table of existing work and proposed work File Sizes Original System Proposed System 500 MB 249 sec 101 sec 1 GB 478 sec 226 sec 1.5 GB 633 sec 337 sec 2 GB 815 sec 462 sec 2.5 GB 1052 sec 560 sec 3 GB 1258 sec 652 sec The below graph shows the graph representation of proposed work for different file size at different time. Figure 6. Graph representing the proposed work for different file size The below graph shows the comparison of existing work with proposed work where the time taken for execution of variant file sizes in proposed work is less in comparison to existing work. Existing work requires more execution time. Figure 7. Comparison graph 6. CONCLUSION The mitigation approach concluded that the data nodes can be of any size or of same size should not be overloaded and each node should assign the equal amount ofloaddepending on there size which will result in increasing performance. This paper attempts to improve performance of heterogeneous cluster in Hadoop using Ranking algorithm which works on the basis of frequency,theone whoishaving maximum frequency will be executed and run first. It reduces the computation time and improving performance. 7. FUTUTRE WORK Depending on the allotment of work load on every node this issue of performance arises. In future implementation performance can be more improved using different techniques. 8. REFERENCE 1. Jeffrey Dean and Sanjay Ghemawat, Mapreduce: simplified data processing on large clusters, Communications of the ACM, vol. 51, no. 1, pp. 107113, 2008. 2. Andrew Wang. Better sorting in Network Topology pseudo Sor tBy Distance when no local node is found. https://issues.apache.org/jira/browse/HDFS-6268, 2014. [Accessed 28-April-2014]. 3. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler, The hadoop distributed filesystem,in Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on. IEEE, 2010, pp. 110. 4. Fang Zhou, Assessment of MultipleMapReduce Strategies for Fast Analytics of Small Files, Ph.D. thesis, Auburn University, 2015. 5. Fang Zhou, Hai Pham, Jianhui Yue, Hao Zou, and Weikuan Yu, Sfmapreduce: An optimized mapreduceframework for small files,” inNetworking,Architectureand Storage (NAS), 2015 IEEE International Conference on. IEEE, 2015. 6. F. Ahmad, S. Chakradhar, A. Raghunathan, and T. N. Vijaykumar, “Tarazu: optimizing mapreduce on heterogeneous clusters,” in Proc. of Int’l Conf. on Architecture Support for Programming Language and Operating System (ASPLOS), 2012. 7. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica, “Improving mapreduce performance in heterogeneous environments,” in Proc. of USENIX Symposium on Operating System Design and Implementation (OSDI), 2008 8. H. Herodotou and S. Babu, “Profiling, what-if analysis, and cost based optimization of mapreduce programs,” in Proc. Int’ Conf. on Very Large Data Bases (VLDB), 2011 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 1342