COMPUSOFT, An international journal of advanced computer technology, 4 (5), May-2015 (Volume-IV, Issue-V)
EFFECT OF COUNTERS IN PERFORMANCE
OF HADOOP
Mrs. Preeti Jain, Ms. Juhi Kanungo
Department of Computer Science and Engineering
Acropolis Institute of Technology and Research, Indore
ABSTRACT: Recent technological advancements have led to a flood of data from diverse domains (e.g.,
health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems)
over the past two decades [1]. Big data is commonly unstructured, huge in volume, and requires more real-time
analysis. This paper discusses a Big Data problem from NCDC involving a huge volume of weather data collected from
various parts of the world. We wrote map() and reduce() functions to solve this problem, and we discuss the
experimental results of running this application on a Hadoop cluster. The performance of the application is
reported with respect to several of the available counters.
KEYWORDS: Big Data, Hadoop, Map Reduce, Hadoop Distributed File System (HDFS), CPU time spent, Size of
data, number of blocks, Heap Size, HDFS Bytes Read, Spilled Records.
I. INTRODUCTION
Big data analytics is the area where advanced
analytic techniques operate on big data sets. It is
really about two things, big data and analytics, and
how the two have teamed up to create one of the
most profound trends in business intelligence [2].
MapReduce is capable of analyzing large distributed
data sets, but the heterogeneity, velocity and
volume of big data make it a challenge for traditional
data analysis and management tools. This paper
analyzes experiments run on a Hadoop cluster to find
the effect of various counters on the execution time
with different data sizes. The study may also help in
understanding how to tune a Hadoop cluster for better
performance [5].
II. PERFORMANCE ANALYSIS OF
HADOOP
This section presents the analysis of the
experiments run on the Hadoop cluster. The study
measures the effect of the following counters on the
execution time with different data sizes, and may
also help in tuning a Hadoop cluster for better
performance.
The counters examined in this study are:
1. Size of Data
2. Number of Blocks
3. Heap Size
4. HDFS Bytes Read
5. Spilled Records
6. CPU Time Spent
The values obtained from this analysis are as follows:
Table 1: Comparison of the values of various counters for different data sizes
S. no   Size of data (MB)   No. of blocks   Heap size (MB)   HDFS bytes read   Spilled records   CPU time spent
1       49.7                6013            55.19            50257             10926             5860
2       50                  6013            55.19            50550             10948             5810
3       59.8                6005            55.19            60481             13122             6960
4       60.3                6013            55.25            62018             13170             6960
5       61.6                6215            55.12            62331             13164             6620
6       62.1                6198            55.12            62758             13108             6950
7       62.9                6578            37.25            63572             13130             6880
8       63.4                5360            53.64            64122             13130             6810
9       70                  6012            55.25            70870             15072             7700
10      70.5                6023            55.25            71293             15300             7530
11      71.5                6031            37.25            72331             15252             8140
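As a rough cross-check of Table 1, the following Python sketch computes the Pearson correlation between CPU time spent and the other counters, plus the CPU time per MB of input. It is only an illustration built from the eleven rows above; the names `TABLE_1`, `pearson`, `column`, `correlations` and `cpu_per_mb` are introduced here and are not part of the original experiment, and the CPU time is used in whatever unit the counter reported.

```python
from math import sqrt

# Rows of Table 1:
# (size MB, no. of blocks, heap MB, HDFS bytes read, spilled records, CPU time spent)
TABLE_1 = [
    (49.7, 6013, 55.19, 50257, 10926, 5860),
    (50.0, 6013, 55.19, 50550, 10948, 5810),
    (59.8, 6005, 55.19, 60481, 13122, 6960),
    (60.3, 6013, 55.25, 62018, 13170, 6960),
    (61.6, 6215, 55.12, 62331, 13164, 6620),
    (62.1, 6198, 55.12, 62758, 13108, 6950),
    (62.9, 6578, 37.25, 63572, 13130, 6880),
    (63.4, 5360, 53.64, 64122, 13130, 6810),
    (70.0, 6012, 55.25, 70870, 15072, 7700),
    (70.5, 6023, 55.25, 71293, 15300, 7530),
    (71.5, 6031, 37.25, 72331, 15252, 8140),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def column(index):
    """Extract one column of Table 1 as a list."""
    return [row[index] for row in TABLE_1]


cpu = column(5)

# Correlation of each counter with CPU time spent.
correlations = {
    "size_mb": pearson(column(0), cpu),
    "blocks": pearson(column(1), cpu),
    "heap_mb": pearson(column(2), cpu),
    "hdfs_bytes_read": pearson(column(3), cpu),
    "spilled_records": pearson(column(4), cpu),
}

# CPU time per MB of input, one value per row.
cpu_per_mb = [c / s for s, c in zip(column(0), cpu)]

if __name__ == "__main__":
    for name, r in correlations.items():
        print(f"{name:>16}: r = {r:+.3f}")
    print(f"CPU time per MB: min {min(cpu_per_mb):.1f}, max {max(cpu_per_mb):.1f}")
```

On these rows, CPU time per MB stays between roughly 107 and 118 counter units, which is one way to read the later claim that performance does not vary much with data size.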
ISSN:2320-0790
III. COMPARISON OF CPU TIME SPENT,
SIZE OF DATA AND NO. OF BLOCKS
In this experiment we vary the size of the input
data and observe the change in the execution time of
the cluster. In Hadoop, the larger the data block
size, the less information HDFS has to store at the
NameNode, and the less communication with the
NameNode is needed [7].
The values above can be plotted as follows:
Figure 1: Comparison between CPU time Spent, no.
of blocks and size of data
Figure 1 compares the values of CPU Time Spent,
Size of Data and No. of Blocks as the Size of Data
varies. The comparison shows that if the number of
blocks increases, then parallel execution increases
and, as a result, the efficiency of the system
increases.
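The relation between file size, block size and the number of blocks can be sketched as follows. This is a minimal illustration, not the procedure used in the experiments: the function name `hdfs_block_count` is hypothetical, and the 64 MB default is an assumption (it was the HDFS default block size in Hadoop 1.x; the block size used on the test cluster is not stated).

```python
import math

MB = 1024 * 1024


def hdfs_block_count(file_size_bytes, block_size_bytes=64 * MB):
    """Number of HDFS blocks needed to store a file of the given size.

    The last block may be only partially filled, hence the ceiling.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    return math.ceil(file_size_bytes / block_size_bytes)


if __name__ == "__main__":
    for size_mb in (50, 200, 1024):
        small = hdfs_block_count(size_mb * MB, 64 * MB)
        large = hdfs_block_count(size_mb * MB, 128 * MB)
        print(f"{size_mb} MB file: {small} blocks at 64 MB, {large} blocks at 128 MB")
```

A larger block size gives fewer blocks for the NameNode to track, while more blocks give more units of work that can be processed in parallel, which is the trade-off described above.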
a. CPU TIME SPENT VS HEAP
SIZE
Map and Reduce tasks are Java processes, so the JVM
heap size affects the performance of Hadoop. As
studied in Figure 2, if the heap size is too small, a
task may fail with an “out of memory” error, or it may
run slowly because of excessive garbage collection.
If the heap is too large, system resources may be
wasted.
Figure 2: Graph between CPU time Spent Vs Heap
size
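One practical check when choosing a heap size is how much of the task heap is left once the map-side sort buffer has been reserved. The sketch below assumes the task JVM options are given as a string such as `-Xmx200m`, the way they are set through the `mapred.child.java.opts` property in Hadoop 1.x; that property is not named in this paper, and the helper names `heap_mb_from_opts` and `heap_headroom_mb` are hypothetical.

```python
import re

# Conversion factors from the -Xmx unit suffix to megabytes.
_UNIT_TO_MB = {"k": 1 / 1024, "m": 1, "g": 1024}


def heap_mb_from_opts(java_opts):
    """Return the maximum heap size in MB from a JVM options string.

    Returns None when no -Xmx flag with a k/m/g suffix is present.
    """
    match = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if match is None:
        return None
    value = int(match.group(1))
    unit = match.group(2).lower()
    return value * _UNIT_TO_MB[unit]


def heap_headroom_mb(java_opts, sort_buffer_mb):
    """Heap left for user code after reserving the sort buffer, in MB."""
    heap_mb = heap_mb_from_opts(java_opts)
    if heap_mb is None:
        raise ValueError("no -Xmx setting found in JVM options")
    return heap_mb - sort_buffer_mb


if __name__ == "__main__":
    # A 200 MB heap with a 100 MB sort buffer leaves 100 MB for the task itself.
    print(heap_headroom_mb("-Xmx200m", 100))
```

A small or negative headroom suggests the task is at risk of the out-of-memory failures and heavy garbage collection described above.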
b. CPU TIME SPENT VS HDFS
BYTES READ
HDFS bytes read is the number of bytes read by the
mappers when the job starts. This value includes the
content of the source file as well as the metadata
about splits. Figure 3 shows how CPU time spent
changes with HDFS bytes read.
Figure 3: Graph between CPU time
Spent Vs HDFS bytes read
c. CPU TIME SPENT VS
SPILLED RECORDS
When the map function produces output, it is not
written directly to disk. Each map task has a circular
memory buffer where output is temporarily saved.
The buffer size is 100 MB by default and can be
changed by editing the io.sort.mb property [4]. When
the buffer fills up to a certain threshold size,
another thread is started and begins to spill the
contents to disk. If the size of the heap is a limiting
factor, then the number of spills should be minimized
by changing the value of the io.sort.record.percent
parameter [6]. Figure 4 shows that the number of
spilled records and the execution time increase
together.
Figure 4: Graph between CPU time Spent Vs Spilled
Records
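The spill behaviour described above can be illustrated with a simplified model. The sketch counts spill events for a stream of map output records, given a buffer size and a fill threshold. It is a deliberate simplification: the function name `count_spills` is hypothetical, the 0.80 threshold is an assumption (the default of Hadoop's `io.sort.spill.percent` property, which this paper does not name), per-record accounting metadata is ignored, and the model counts spill events whereas the Spilled Records counter counts the records written to disk.

```python
def count_spills(record_sizes, buffer_bytes, spill_threshold=0.80):
    """Count spill events for map output records written into a buffer.

    A spill is triggered whenever the buffered bytes reach
    buffer_bytes * spill_threshold; whatever remains at the end of the
    map task is flushed in one final spill.
    """
    if buffer_bytes <= 0:
        raise ValueError("buffer size must be positive")
    if not 0.0 < spill_threshold <= 1.0:
        raise ValueError("spill threshold must be in (0, 1]")

    limit = buffer_bytes * spill_threshold
    used = 0
    spills = 0
    for size in record_sizes:
        used += size
        if used >= limit:
            spills += 1  # buffer contents are written to disk
            used = 0
    if used > 0:
        spills += 1  # final flush when the map task finishes
    return spills


if __name__ == "__main__":
    records = [100] * 1000  # 1000 records of 100 bytes each
    print(count_spills(records, buffer_bytes=50_000))
    print(count_spills(records, buffer_bytes=200_000))
```

With the same output, a larger buffer produces fewer spills, which is why raising io.sort.mb (within the limits of the task heap) is a common way to reduce spilling.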
IV. CONCLUSION AND FUTURE SCOPE
Big data offers a prospect to dig out enormous
opportunities in new and emerging types of data.
Hadoop and other improved technologies make us
capable of storing, handling and analyzing structured,
unstructured and semi-structured data without much
effort [3]. HDFS has been designed to use
commodity hardware. The system is highly fault
tolerant and is designed to be deployed on low-cost
hardware. It provides high-throughput access to
application data and is best suited to applications
with large datasets [8].
From this work we conclude that the
performance of Hadoop for a Big Data analytics
problem does not vary much with the size of data.
The graphs drawn from the results of the
implementation show the variation in the time
required to perform the search task over data of
assorted sizes, along with other counters such as
spilled records, HDFS bytes read, heap size and
number of blocks. By observing the results of the
implementation and the graphs based on them, we
can conclude that Hadoop is an efficient system for
handling and analyzing big data.
FUTURE SCOPE
Though HDFS resolves the Big Data problem
efficiently, there is always scope for
improvement. Hadoop replicates data to avoid data
loss, thereby making the system “Fault Tolerant”.
The more copies are created, the more fault
tolerant the system becomes. If a DataNode fails,
the map and reduce processes of that node are
shifted to other nodes carrying the replicated
data. But there is only one NameNode, which
contains the metadata of all the DataNodes, and if
this node crashes, processing cannot continue.
This weakness of Hadoop is being resolved in
Hadoop 2 by providing multiple namespaces and
multiple NameNodes.
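The effect of replication on fault tolerance can be made concrete with a back-of-the-envelope model. The sketch below assumes that each node holding a replica fails independently with the same probability, which is a strong simplification of a real cluster (correlated rack or power failures are ignored); the function name `loss_probability` and the failure probability used in the example are illustrative assumptions, not measurements from this study.

```python
def loss_probability(node_failure_prob, replication_factor):
    """Probability that every replica of a block is lost.

    Assumes replicas sit on distinct nodes that fail independently,
    so the block is lost only if all of those nodes fail.
    """
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("failure probability must be between 0 and 1")
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return node_failure_prob ** replication_factor


if __name__ == "__main__":
    p = 0.1  # hypothetical per-node failure probability
    for copies in (1, 2, 3):
        print(f"{copies} replica(s): loss probability {loss_probability(p, copies):.4f}")
```

Under this model each extra copy multiplies the loss probability by the per-node failure probability, which illustrates why more replicas make the data more fault tolerant. It does not cover the single NameNode, which remains a separate point of failure.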
REFERENCES
[1] Maurya and Mahajan, S., “Performance
analysis of MapReduce programs on
Hadoop cluster”, Information and
Communication Technologies (WICT),
2012 World Congress, ISBN: 978-1-4673-
4806-5, Oct. 30 2012.
[2] A. Alexandrov, R. Bergmann, S. Ewen, J.C.
Freytag, F. Hueske, A. Heise, O. Kao, M.
Leich, U. Leser, V. M. Felix Naumann, M.
Peters, A. Rheinländer, M. J. Sax and S.
Schelter, “The Stratosphere Platform for
Big Data Analytics”, The VLDB Journal,
Springer-Verlag Berlin Heidelberg, July
2013.
[3] Shilpa and Manjit Kaur, “Big Data
Visualization tool with Advancement of
Chalange”, International Journal of
Advanced Research in Computer and
Software Engineering, ISSN: 2277 128X,
Volume 4, Issue 3, March 2014.
[4] D. Borthakur, “The Hadoop Distributed File
System: Architecture and Design”.
[5] Han Hu, Yonggang Wen, Tat-Seng Chua
and Xuelong Li, “Toward Scalable Systems
for Big Data Analytics: A Technology
Tutorial”, ISSN: 2169-3536, INSPEC
Accession Number: 14429402.
[6] P. Kumar and Dr. V. S. Rathore, “Efficient
Capabilities of Processing of Big Data using
Hadoop Map Reduce”, International Journal
of Advanced Research in Computer and
Communication Engineering, Vol. 3, Issue
6, June 2014.
[7] Shilpa and Manjit Kaur, “Big Data
Visualization tool with Advancement of
Chalange”, International Journal of
Advanced Research in Computer and
Software Engineering, ISSN: 2277 128X,
Volume 4, Issue 3, March 2014.
[8] X. Zhang, “Simple example to demonstrate
how does the map reduce work”, June 2013.

Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 

Effect of counters in performance of Hadoop

  • 1. COMPUSOFT, An international journal of advanced computer technology, 4 (5), May-2015 (Volume-IV, Issue-V), ISSN: 2320-0790
1714

EFFECT OF COUNTERS IN PERFORMANCE OF HADOOP
Mrs. Preeti Jain, Ms. Juhi Kanungo
Department of Computer Science and Engineering
Acropolis Institute of Technology and Research, Indore

ABSTRACT: Recent technological advancements have led to an overflow of data from distinctive domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades [1]. Big data is commonly unstructured, huge in volume, and requires more real-time analysis. This paper discusses a Big Data problem from NCDC involving a huge volume of weather data collected from various parts of the world. We wrote map() and reduce() functions to solve this problem, and the experimental results of running them on a Hadoop cluster are discussed. The performance of this application is shown with respect to some of the available counters.

KEYWORDS: Big Data, Hadoop, Map Reduce, Hadoop Distributed File System (HDFS), CPU time spent, Size of data, number of blocks, Heap Size, HDFS Bytes Read, Spilled Records.

I. INTRODUCTION

Big data analytics is the area where advanced analytic techniques operate on big data sets. It is really about two things, big data and analytics, and how the two have teamed up to create one of the most profound trends in business intelligence [2]. MapReduce is capable of analyzing large distributed data sets, but the heterogeneity, velocity and volume of big data make it a challenge for traditional data analysis and management tools. Here we show the analysis of experiments done on a Hadoop cluster. This study is done to find the effect of various counters on the execution time with different data sizes. It may also help in understanding how to tune a Hadoop cluster for better performance [5].

II. PERFORMANCE ANALYSIS OF HADOOP

Here we show the analysis of experiments done on the Hadoop cluster. This study is done to find the effect of the following counters on the execution time with different data sizes. It may also help in understanding how to tune a Hadoop cluster for better performance. The counters that we worked on in this study are:

1. Size of Data
2. Number of Blocks
3. Heap Size
4. HDFS Bytes Read
5. Spilled Records
6. CPU Time Spent

The values obtained from this analysis are as follows:

Table 1: Comparison of values of various counters on different data sizes

S. no   Size of data (MB)   No of blocks   Heap Size (MB)   HDFS bytes read   Spilled records   CPU time spent
1       49.7                6013           55.19            50257             10926             5860
2       50                  6013           55.19            50550             10948             5810
3       59.8                6005           55.19            60481             13122             6960
4       60.3                6013           55.25            62018             13170             6960
5       61.6                6215           55.12            62331             13164             6620
6       62.1                6198           55.12            62758             13108             6950
7       62.9                6578           37.25            63572             13130             6880
8       63.4                5360           53.64            64122             13130             6810
9       70                  6012           55.25            70870             15072             7700
10      70.5                6023           55.25            71293             15300             7530
11      71.5                6031           37.25            72331             15252             8140
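As a rough check on the values in Table 1 (this calculation is not part of the original study), the following Python sketch computes the CPU time spent per MB of input and the Pearson correlation between data size and CPU time. The helper names `cpu_time_per_mb` and `pearson` are hypothetical, and the unit of CPU time is not stated in the source, so the numbers are used exactly as reported.

```python
import math

# (size of data in MB, CPU time spent) copied from Table 1.
# The CPU-time unit is not stated in the source; values are used as reported.
TABLE_1 = [
    (49.7, 5860), (50.0, 5810), (59.8, 6960), (60.3, 6960),
    (61.6, 6620), (62.1, 6950), (62.9, 6880), (63.4, 6810),
    (70.0, 7700), (70.5, 7530), (71.5, 8140),
]


def cpu_time_per_mb(rows):
    """Return CPU time divided by input size for every row."""
    return [cpu / size for size, cpu in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


per_mb = cpu_time_per_mb(TABLE_1)
sizes = [size for size, _ in TABLE_1]
cpu_times = [cpu for _, cpu in TABLE_1]
r = pearson(sizes, cpu_times)

print(f"CPU time per MB: min={min(per_mb):.1f}, max={max(per_mb):.1f}")
print(f"Correlation between size of data and CPU time: {r:.3f}")
```

If the per-MB cost stays in a narrow band while the total CPU time tracks the data size, the cost of processing grows roughly in proportion to the input over the range measured.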
  • 2. III. COMPARISON OF CPU TIME SPENT, SIZE OF DATA AND NO. OF BLOCKS

In this experiment we change the size of the input data to observe the change in the execution time of the cluster. In Hadoop, the larger the data block size, the less block information HDFS has to store at the NameNode, and the less communication with the NameNode is needed [7]. The values in Table 1 can be plotted as follows:

Figure 1: Comparison between CPU time spent, no. of blocks and size of data

Figure 1 compares CPU Time Spent, Size of Data and No. of Blocks as the size of data varies. This comparison shows that if the number of blocks increases, then parallel execution increases, and as a result the efficiency of the system increases.

a. CPU TIME SPENT VS HEAP SIZE

Map and reduce tasks are Java processes, so the JVM heap size affects the performance of Hadoop. As studied in Figure 2, if the heap size is too small, a task may fail with an "out of memory" error or run slowly because of excessive garbage collection. If the heap size is too large, system resources may be wasted.

Figure 2: Graph between CPU time spent vs heap size

b. CPU TIME SPENT VS HDFS BYTES READ

HDFS bytes read is the count of bytes read from HDFS by the mappers when the job runs. This value includes the content of the source file as well as the metadata about splits. Figure 3 shows how the CPU time spent changes with the HDFS bytes read.

Figure 3: Graph between CPU time spent vs HDFS bytes read

c. CPU TIME SPENT VS SPILLED RECORDS

When the output of the map function is produced, it is not written directly to disk. Each map task has a circular memory buffer where its output is temporarily saved. The buffer size is 100 MB by default and can be changed by editing the io.sort.mb property [4].
When the buffer fills up to a certain threshold size, another thread is started and begins to spill the contents to disk. If the heap size is limited, the number of spills should be minimized by changing the value of the io.sort.record.percent parameter [6]. Figure 4 shows that the number of spilled records and the execution time increase together.

Figure 4: Graph between CPU time spent vs spilled records
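The spill behaviour described above can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not Hadoop's actual implementation: the function names are hypothetical, the threshold is taken to be the Hadoop 1.x io.sort.spill.percent property with its default of 0.80 (a value the paper does not give), and the model counts bytes only, ignoring the per-record accounting space controlled by io.sort.record.percent.

```python
import math

MB = 1024 * 1024


def spill_threshold_bytes(io_sort_mb=100, spill_percent=0.80):
    """Bytes of map output buffered before a background spill starts.

    io_sort_mb mirrors the io.sort.mb property (100 MB by default).
    spill_percent is assumed to mirror io.sort.spill.percent (0.80).
    """
    return int(io_sort_mb * MB * spill_percent)


def estimated_spills(map_output_bytes, io_sort_mb=100, spill_percent=0.80):
    """Estimate how many spill files one map task writes.

    Simplified model: one spill each time the threshold is reached,
    plus a final flush for whatever remains in the buffer.
    """
    if map_output_bytes <= 0:
        return 0
    threshold = spill_threshold_bytes(io_sort_mb, spill_percent)
    return math.ceil(map_output_bytes / threshold)


# A map task producing 200 MB of output with the default buffer ...
print(estimated_spills(200 * MB))
# ... and the same task with a 300 MB buffer.
print(estimated_spills(200 * MB, io_sort_mb=300))
```

In this model a larger buffer lets the same map output be written in fewer spills, which is the reason for tuning these properties, but the buffer is allocated from the task's JVM heap, so it cannot be raised beyond what the heap allows.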
  • 3. IV. CONCLUSION AND FUTURE SCOPE

Big data offers a prospect to dig out enormous opportunities in new and emerging types of data. Hadoop and other improved technologies make us capable of storing, handling and analyzing structured, unstructured and semi-structured data without much effort [3]. HDFS has been designed to use commodity hardware. The system is highly fault tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is best suited for applications with large datasets [8].

From this work we conclude that the performance of Hadoop on a big data analytics problem does not vary much with the size of data. The graphs drawn from the results of the implementation show the variation in the time required to perform the search task over data of assorted sizes, along with other counters such as spilled records, HDFS bytes read, heap size and number of blocks. By observing these results and graphs, we conclude that Hadoop is an efficient system for handling and analyzing big data.

FUTURE SCOPE

Though HDFS handles the Big Data problem efficiently, there is always scope for improvement. Hadoop replicates data to avoid data loss, thereby making the system fault tolerant. The more copies are created, the more fault tolerant the system becomes. If a DataNode fails, the map and reduce processes of that node are shifted to other nodes carrying the replicated data. However, there is only one NameNode, which contains the metadata of all the DataNodes, and if this node crashes, processing cannot continue. This weakness of Hadoop is being resolved in Hadoop 2 by providing multiple namespaces and multiple NameNodes.
REFERENCES

[1] Maurya and Mahajan, S., "Performance analysis of MapReduce programs on Hadoop cluster", Information and Communication Technologies (WICT), 2012 World Congress, ISBN: 978-1-4673-4806-5, Oct. 30 2012.
[2] A. Alexandrov, R. Bergmann, S. Ewen, J.C. Freytag, F. Hueske, A. Heise, O. Kao, M. Leich, U. Leser, V. M. Felix Naumann, M. Peters, A. Rheinländer, M. J. Sax and S. Schelter, "The Stratosphere Platform for Big Data Analytics", The VLDB Journal, Springer-Verlag Berlin Heidelberg, July 2013.
[3] Shilpa and Manjit Kaur, "Big Data Visualization tool with Advancement of Chalange", International Journal of Advanced Research in Computer and Software Engineering, ISSN: 2277 128X, Volume 4, Issue 3, March 2014.
[4] D. Borthakur, "The Hadoop Distributed File System: Architecture and Design".
[5] Han Hu, Yonggang Wen, Tat-Seng Chua and Xuelong Li, "Toward Scalable Systems for Big Data Analytics: A Technology Tutorial", ISSN: 2169-3536, INSPEC Accession Number: 14429402.
[6] P. Kumar and Dr. V. S. Rathore, "Efficient Capabilities of Processing of Big Data using Hadoop Map Reduce", International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 6, June 2014.
[7] Shilpa and Manjit Kaur, "Big Data Visualization tool with Advancement of Chalange", International Journal of Advanced Research in Computer and Software Engineering, ISSN: 2277 128X, Volume 4, Issue 3, March 2014.
[8] X. Zhang, "Simple example to demonstrate how does the map reduce work", June 2013.