VISVESVARAYA TECHNOLOGICAL UNIVERSITY
“Jnana Sangama” Belagavi 590018
A Project Report on
“PROJECT TITLE” (in caps)
Submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering
in Computer Science & Engineering during the academic year 2016-20.
By
Student Name USN
Student Name USN
Student Name USN
Student Name USN
Under the guidance of
Guide Name
Assistant Professor
Dept. of CS&E
MRIT, Mandya
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
MYSURU ROYAL INSTITUTE OF TECHNOLOGY, MANDYA
2019 - 2020
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Mysuru Royal Institute of Technology, Mandya – 571606
2019-2020
Department of Computer Science & Engineering
CERTIFICATE
This is to certify that the project work entitled “TITLE” is a bonafide
work carried out by name (usn), in partial fulfillment for the award of Bachelor of
Engineering in Computer Science and Engineering of the Visvesvaraya
Technological University, Belagavi, Karnataka during the year 2019-2020. It is certified
that all corrections/suggestions indicated for the Internal Assessment have been
incorporated in the report. The project report has been approved as it satisfies the
academic requirements in respect of project work prescribed for the Bachelor of
Engineering degree.
------------------------------------------ -----------------------------------------
Signature of Internal Guide Signature of Project Coordinator
Prof. Guide Name Prof. Chethan Raj C
Asst. Professor Asst. Professor
Dept. of CS&E Dept. of CS&E
MRIT, Mandya MRIT, Mandya
------------------------------------------ -----------------------------------------
Signature of HOD Signature of Principal
Prof. Soumya B Dr. Suresh Chandra
Asst. Professor Principal,
Dept. of CS&E MRIT, Mandya
MRIT, Mandya
EXTERNAL VIVA
Name of the Examiner Signature with date
1.
2.
Mysuru Royal Institute of Technology,
Mandya – 571606
Department of Computer Science and Engineering
DECLARATION
I, Student Name, studying in the eighth semester B.E., Computer Science and
Engineering, Mysuru Royal Institute of Technology, Mandya, hereby declare that the
project work entitled “----------TITLE------------------” has been carried out independently
under the guidance of Guide Name, Asst. Professor, Department of Computer Science
and Engineering, Mysuru Royal Institute of Technology, Mandya. This project work is
submitted to the Visvesvaraya Technological University, Belagavi, in partial
fulfillment of the requirements for the award of the degree of Bachelor of Engineering
during the academic year 2016-2020.
This dissertation has not been submitted previously for the award of any other degree or
diploma to any other institution or university.
Date:
Place:
____Sign_________________
Name
(USN)
ACKNOWLEDGEMENT
Happiness cannot be expressed in words, and help received cannot go
unacknowledged. I would like to thank everyone who was a part of my project work.
I am thankful to our principal, Dr. Suresh Chandra H S, MRIT Mandya for all
the facilities provided to us in the college.
I would like to convey my sincere thanks to Prof. Soumya B, Head of the
Department, Dept. of Computer Science and Engineering, MRIT.
I am especially thankful to Prof. Chethan Raj C, Project Coordinator, Dept. of
Computer Science and Engineering, MRIT, for his whole-hearted encouragement and
individual guidance in carrying out this project.
I express my profound gratitude to Prof. Guide Name, Assistant Professor,
Dept. of Computer Science and Engineering, MRIT, who has been my guide and who
guided me throughout my endeavor to complete this project successfully.
My profound thanks to all my lecturers for extending their kind co-operation and
help during this project work.
I would like to express my deepest gratitude to my family members, for their
support and love.
Finally, I would like to thank all my friends, who all made invaluable
contributions to my work.
Thanking You
Student Name
(USN)
ABSTRACT
A Wireless Sensor Network (WSN) is a collection of tiny devices, each equipped
with computational capability, wireless transmitter and receiver technology, and a
power supply. A sensor node consumes energy for sensing, communication, and data
processing, with data communication demanding the most energy. Among wireless
communication systems, the WSN is one of the most widely used networks; it consists
of spatially distributed sensor nodes with sensing, computation, and wireless
communication capabilities. These sensor nodes are scattered in an unattended
environment (i.e. the sensing field) to sense the physical world. It is very costly to deploy a
complete test bed containing multiple networked computers to validate and verify a
certain network protocol or a specific network algorithm; a network simulator saves
both money and time in accomplishing this task. In this project, we introduce two metrics,
a signal strength indicator and a desired distance estimator, to find an optimal and reliable
path between source and destination.
Table of Contents
CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF SNAPSHOTS
CHAPTER 1 INTRODUCTION
1.1 Domain Overview
1.2 Project Overview
1.3 Existing System
1.4 Disadvantages of Existing System
1.5 Problem Statement
1.6 Project Motivation
1.7 Proposed System
1.8 Advantages of Proposed System
1.9 Objective of the Project
1.10 Organization of Report
CHAPTER 2 LITERATURE SURVEY
2.1 Literature Review
2.2 Conclusion of Review
CHAPTER 3 SYSTEM REQUIREMENT SPECIFICATION
3.1 Introduction
3.2 Functional Requirements
3.3 Non-Functional Requirements
3.4 System Requirements
3.4.1 Hardware Requirements
3.4.2 Software Requirements
CHAPTER 4 SYSTEM DEVELOPMENT
4.1 Introduction to System Development
4.2 Modules & Methodology
4.2.1 Sub Module 1
4.2.2 Sub Module 2
CHAPTER 5 SYSTEM DESIGN
5.1 Introduction to System Design
5.2 High Level Design
5.2.1 Architecture of the System
5.2.2 Data Flow Diagram
5.3 Low Level Design
5.3.1 Process Diagram
5.3.2 Flow Chart
5.3.3 Sequence Diagram
5.3.4 Activity Diagram
5.3.5 Use Case Diagram
CHAPTER 6 SYSTEM IMPLEMENTATION
6.1 Introduction to System Implementation
6.2 Language Used for Implementation
6.3 Algorithms
6.3.1 Algorithm 1 Name
6.3.2 Algorithm 2 Name
6.4 Code Snippet
CHAPTER 7 TESTING
7.1 Introduction to Testing
7.2 Types of Testing
7.3 Test Cases
CHAPTER 8 RESULTS AND DISCUSSIONS
8.1 Introduction
8.2 Snapshots with Description
CONCLUSION AND FUTURE ENHANCEMENT
REFERENCES
LIST OF FIGURES
Figure No. Name of the Figure
Fig 1.1 Wireless Sensor Network (WSN)
Fig 1.2 Sensor Node Components
Fig 2.1 Components of NS2
Fig 2.2 Network of N nodes at time t1 = 0 sec
Fig 2.3 Network of N nodes at time t2 = 10 sec
Fig 4.1 Architecture of Existing System
Fig 4.2 Architecture of Proposed System
Fig 4.3 Work Flow of Existing System
Fig 4.4 Work Flow of Proposed System
Fig 4.5 Process Diagram
Fig 4.6 Flow Chart
Fig 4.7 Sequence Diagram
Fig 7.1 Initial Position of Nodes
Fig 7.2 Route Request from Source to Destination
Fig 7.3 Obstacle during Route Discovery
Fig 7.4 Finding an Alternate Path
Fig 7.5 Bit Error Rate
Fig 7.6 Packet Delivery Ratio
Fig 7.7 Throughput
LIST OF TABLES
Table No. Name of the Table
Table 1.1 Routing Table 1
Table 1.2 Path Table 1
Table 2.1 Routing Table 2
Table 2.2 Path Table 2
Table 6.1 Unit Test Cases
Table 6.2 Integration Test Cases
LIST OF SNAPSHOTS
Snapshot No. Name of the Snapshot
Snapshot 5.1 Map Function
Snapshot 5.2 Overridden Map Function
Snapshot 5.3 Overridden Reduce Function
Snapshot 5.4 Driver Configuration
Snapshot 7.1 Initialization of Hadoop Cluster
Snapshot 7.2 Parameters of Hadoop Cluster
Snapshot 7.3 Start Hadoop Services
Snapshot 7.4 Hadoop Page
Snapshot 7.5 JPS Verification
Snapshot 7.6 Setting Path for Source File
Snapshot 7.7 MapReduce Process
Snapshot 7.8 Main Page of Slot Configuration
Snapshot 7.9 Bar Graph illustrating the Result
Snapshot 7.10 Line Graph illustrating the Result
ABSTRACT
MapReduce is a prominent framework for scrutinizing and processing
substantially massive data, and the Hadoop framework, its open-source implementation,
has become the default platform today for examining, manipulating, and storing enormous
data. Since educational establishments, business industries, and research and
development centers all rely on Hadoop for processing their data, the performance of the
system must be maintained. One major obstacle of the Hadoop framework that degrades
performance and complicates the overall system is the long makespan, or completion time,
of MapReduce jobs.
The Hadoop scheme presently in use adopts a static assignment of slots, i.e. the map and
reduce slot numbers are predefined for the cluster at the inception of cluster
formation and remain fixed throughout its life. This setting causes underutilization of
resources and long completion times. To reduce these limitations, this project presents a
mechanism in which the slots are assigned dynamically by self-tuning. It collects
execution details of foregoing jobs and, based on these details, allocates the slots for
map and reduce, which in turn leverages the performance of the overall application.
Chapter 1
INTRODUCTION
In recent years the MapReduce programming standard has turned out to be the
prominent technology for analyzing and processing big data, and Apache Hadoop is a
free, open-source implementation of it that can be used for analyzing a broad range of
data. Hadoop is a framework adapted to processing and storing huge bulks of data in a
distributed, parallel environment. Hadoop is designed and written so that it can scale
from a single server to many thousands of systems, each of which offers local storage
and processing of the abundant data submitted by users.
Due to the advancement of cloud computing, Hadoop MapReduce is suitable
not only for large companies and research centres working on data-intensive projects
but also for regular users, who can launch a Hadoop cluster on the cloud.
1.1 Objective
With the rapid advancement of technology, and as more and more data is
generated, applications are employing MapReduce techniques for scrutinizing,
processing, and extracting their data. In this circumstance, the main concerns of the
programmer are how to achieve good reliability and how to enhance the performance of a
Hadoop cluster. The Hadoop framework constitutes a large set of predefined system
parameters, and these parameters play a salient role in leveraging application performance.
The preeminent intentions behind the development of this application are:
First and foremost, to formulate new methods for modifying primitive
attributes of the system in order to improve its overall performance.
Second, to curtail the completion time, also called the makespan, of a
batch of jobs by incorporating the newly formulated methods.
Finally, by achieving the two objectives above, to increase resource
utilization while processing unstable workloads.
1.2 Existing System
In the elementary Hadoop core architecture, the cluster comprises a solitary
master node, responsible for managing and examining all the worker (also called
slave) nodes, and several worker nodes, which host the TaskTracker routine to
execute MapReduce jobs. The JobTracker component resides in the master node; its
main operation is allocating jobs and organizing map and reduce tasks to be executed
on map and reduce slots, respectively, in an adept manner.
The number of tasks which can be accommodated on an individual node is
represented by a term called a slot, and in the elementary Hadoop structure each slot can
run only one task at a time. Based on this, the total number of slots present on every
node indicates the maximal degree of parallelism which can be achieved.
The slot setting is a primitive parameter, fixed at its default value throughout the
cluster's lifetime, and it has a crucial impact on the performance of the system. The
basic Hadoop framework makes use of a fixed slot configuration: the numbers of map
and reduce slots are both predefined for each separate node at the beginning of cluster
creation.
The numbers assigned in this static configuration are arbitrary values chosen without
taking any job attributes into account, so the static configuration of Hadoop is not
optimized and the performance of the whole system may be hindered.
Some of the drawbacks of classic Hadoop MapReduce are:
 It uses a static slot setting, i.e. the numbers of map and reduce slots are predefined for the individual nodes of the cluster throughout its lifetime (see the configuration sketch below).
 A static arrangement of slots causes improper resource utilization.
 It scales down the performance of the overall system under diverse and unstable workloads.
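For concreteness, in classic Hadoop (MRv1) this static setting lives in each TaskTracker's mapred-site.xml. A minimal sketch is shown below; the two property names are the standard MRv1 slot knobs, while the values are purely illustrative:

```xml
<!-- mapred-site.xml: static per-TaskTracker slot counts in classic MRv1.
     These counts stay fixed for the daemon's lifetime; changing them
     requires a TaskTracker restart, which is exactly the rigidity the
     proposed system removes. Values here are illustrative. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```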
1.3 Proposed System
In order to overcome the limitations of the existing system, this project aims at
designing algorithms that modify primitive system attributes and increase the
performance of a batch of jobs. A new concept of dynamically assigning slots is
proposed. The vital goal of this new technique is to decrease the completion time of the
executed tasks while the simplicity of the Hadoop implementation is retained as it is.
The newly projected system is termed TuMM, which stands for
TUnable knob for minimizing the Makespan of MapReduce jobs. Its major goal is to make
the slot allotment proportion of map and reduce tasks automatic. The projected system
is composed of two primary components: the Workload Estimator (WE) and the Slot
Scheduler (SS).
The Workload Estimator is present in the JobTracker routine, and it acquires
details such as the execution times of foregoing completed tasks. These details are used
to compute the current workload in the Hadoop cluster. The second integral component,
the Slot Scheduler, fine-tunes the ratio of map and reduce slots for each worker node
based on the result computed by the Workload Estimator.
A variation of the TuMM technique called H-TuMM is implemented for
heterogeneous clusters; it assigns slots for each node separately to lessen the
makespan of the job batch.
Some of the advantages of this proposed system are:
 It minimizes the completion times of the two phases, thereby scaling down the makespan of multiple jobs by individually allocating slots for nodes in a heterogeneous environment.
 The projected system shows that up to a 28% curtailment of the completion time, or makespan, of a job batch can be achieved.
 This in turn yields about a 20% rise in the proper usage of system resources.
1.4 Organization Of Report
This chapter summarizes the introduction to the project, which is elaborately described
in the later sections of the report. The second chapter provides a detailed survey of the
projected system, covering various papers related to how the performance of Hadoop
can be improved. In the third chapter the requirements and constraints listed by the
user for designing this application are described, and the following chapter illustrates
the design of the application, which comprises the sequence diagram, architecture, and
so on. The fifth chapter gives insight into the implementation, and then the testing
techniques and test cases used for verifying this application are depicted in chapter six.
The analysis of results, containing snapshots, is presented in chapter seven, and finally
the conclusion and future enhancement are specified at the end.
Chapter 2
LITERATURE SURVEY
Improving MapReduce Performance through Data Placement in
Heterogeneous Hadoop Clusters [1]
J. Xie et al. designed a data placement approach in the Hadoop
distributed file system to calibrate the data load in a heterogeneous Hadoop cluster. The
newly designed data placement component first distributes a vast data set to multiple
nodes with respect to the computing capacity of each node. They designed a data
reorganization algorithm along with a data redistribution algorithm in HDFS, and these
two algorithms can be used to solve the data skew problem caused by dynamic data
addition and removal. The initial algorithm is used to divide and distribute file chunks
to the heterogeneous nodes in a cluster at the beginning of cluster formation. Once the
fragments of the input file required by the computing nodes have been distributed to
them, the second algorithm is incorporated to rearrange the file chunks and solve the
data skew problem.
The first, data placement, algorithm starts off by splitting a vast input into
numerous fragments of the same size. These fragments are then allotted to the nodes in
the cluster based on each node's data processing speed; comparatively, high-performance
nodes can store and process more file chunks than low-performance nodes. The input
file fragments distributed by this algorithm may later become unbalanced for the
following reasons: first, new data may be added to the current input file; second, data
fragments may be deleted from the current input file; and third, new computing nodes
may be added to the existing cluster.
To overcome this data load balancing problem, the data redistribution algorithm
is incorporated. It reorders file chunks based on computing ratios: first, information
about disk space utilization and the network topology of the cluster is compiled by the
data distribution server; next, two lists, of over-utilized and under-utilized nodes, are
created; then the server shifts file chunks from the over-utilized node list to the
under-utilized node list until the data load is allocated evenly among the nodes. A rough
sketch of this rebalancing loop follows.
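The sketch below uses our own simplifying assumptions (in-memory chunk lists, a computing-ratio map that sums to one); it is illustrative Java, not the authors' HDFS implementation:

```java
import java.util.Deque;
import java.util.Map;

// Illustrative rebalancing loop: nodes hold file chunks; one chunk at a
// time migrates from the most over-utilized node to the most under-utilized
// one until every node is within its computing-ratio share of the chunks.
public class ChunkRebalancer {
    static void rebalance(Map<String, Deque<String>> chunksByNode,
                          Map<String, Double> computingRatio, int totalChunks) {
        while (true) {
            String over = null, under = null;
            double worstOver = 0, worstUnder = 0;
            for (String node : chunksByNode.keySet()) {
                double target = computingRatio.get(node) * totalChunks;
                double diff = chunksByNode.get(node).size() - target;
                if (diff > worstOver)  { worstOver = diff;  over = node; }
                if (diff < worstUnder) { worstUnder = diff; under = node; }
            }
            // Stop once no node is a full chunk above its target share.
            if (over == null || under == null || worstOver < 1) break;
            chunksByNode.get(under).add(chunksByNode.get(over).poll()); // move one chunk
        }
    }
}
```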
MARLA: MapReduce for Heterogeneous Clusters [2]
Z. Fadika et al. implemented MARLA, a MapReduce paradigm with dynamic load
balancing which can be adapted for homogeneous, heterogeneous, and even
load-imbalanced environments. MARLA relies on a basic shared file system as its input/
output management technique. The idea of this model rests on dynamic task scheduling,
which allows the nodes in the Hadoop cluster to request tasks when required. Previously
in Hadoop MapReduce the tasks were evenly distributed and pre-assigned to the nodes
before running a given application, but in MARLA the nodes in the cluster must request
work once they are done executing their foregoing tasks. The main node is responsible
for registering the number of tasks available, and the nodes are assigned a token
identifying the process, which can be used for requesting tasks. When a task is requested
by a particular node, that specific task becomes unavailable to the rest of the processing
nodes. A node can request a new task only when it has executed and successfully
completed its foregoing tasks, and hence in this scheme fast and slow nodes each
process their fair share.
MARLA is composed of three integral components: the splitter, the task-controller,
and the fault-tracker. The first component, the splitter, is used for the management of
input and output; the second component, the task-controller, is responsible for task
assignment and for checking concurrency; and the last component, the fault-tracker, is
used for fault tolerance. The splitter handles splitting the dataset and distributing it: the
framework takes input fragments, whose number is user-defined, as tasks. The scheme
uses the data visibility provided by the shared-disk file system to present its input data
to the cluster nodes, so input distribution is carried out directly through the shared file
system.
The task-controller is responsible for making the tasks (the data fragments produced
by the splitter) and the user's map and reduce code available to the processing
nodes in the cluster through the shared file system. It frequently checks the progress
of tasks, and failed tasks are sent to a task bag through the fault-tracker component.
Failed tasks in the task bag are put on short-term leave and retried later, while completed
tasks are shifted to a completed-task bag and moved on to the reduce phase. A sketch of
the pull-based scheduling idea follows.
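To make the pull model concrete, here is a minimal, self-contained sketch of the idea, with plain Java threads and a shared queue standing in for MARLA's nodes and task bag (entirely illustrative, not MARLA's code):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Pull-based scheduling sketch: each "node" takes the next task from a
// shared bag only after finishing the previous one, so faster nodes
// naturally end up processing more tasks than slower ones.
public class PullScheduler {
    public static void main(String[] args) {
        BlockingQueue<Runnable> taskBag = new LinkedBlockingQueue<>();
        for (int i = 0; i < 12; i++) {
            int id = i;
            taskBag.add(() -> System.out.println("task " + id + " on " +
                    Thread.currentThread().getName()));
        }
        for (int w = 0; w < 3; w++) {            // three "nodes"
            new Thread(() -> {
                Runnable task;
                while ((task = taskBag.poll()) != null) {
                    task.run();                  // request next task only when done
                }
            }, "node-" + w).start();
        }
    }
}
```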
HadoopCL: MapReduce on Distributed Heterogeneous Platforms
through Seamless Integration of Hadoop and OpenCL [3]
In distributed parallel computing, as complexity rises, three challenges rise with it: the
programmability, reliability, and energy efficiency of the system. Attempts to sidestep
these three problems can, in turn, hinder system performance.
In this work M. Grossman et al. introduced the idea of integrating Hadoop
MapReduce with OpenCL to facilitate the use of heterogeneous processors in a distributed
system. Incorporating OpenCL with Hadoop provides: first, a user-friendly, flexible, and
easily learnable application programming interface in a high-level, widely used
programming language; second, the reliability of a distributed file system; and third,
minimal power utilization while leveraging the performance of heterogeneous
processors.
By adopting the new HadoopCL paradigm, all three challenges can be managed
without sacrificing performance in a Hadoop distributed system. The functionality of
HadoopCL includes the following. First, to lessen the modification done to legacy code,
HadoopCL extends the Hadoop framework's mapper and reducer classes to support
execution of user-written Java kernels on heterogeneous Hadoop clusters. Next is the
adoption of dedicated communication threads and asynchronous communication to
escalate the utilization of available bandwidth and restrain communication blockage.
Third, HadoopCL translates Java bytecode to OpenCL kernels automatically using
APARAPI, along with extensions to APARAPI's existing features. Lastly, HadoopCL's
performance is evaluated on two multi-node clusters comprising multicore CPUs, GPUs,
and APUs.
HadoopCL depends on the APARAPI tool for translating Java bytecode to OpenCL
kernels; OpenCL kernel code is produced both for the user-written map and reduce
modules and for the HadoopCL glue code, whose function is to pass keys and values into
the user-written functions. HadoopCL can modify its own memory access arrangement
and loop iteration for the best system performance. Presently it provides optimizations
for GPUs and multicore CPUs, and APARAPI was extended to support asynchronous
kernel execution, accomplished by reworking the APARAPI C++ runtime to store
references to OpenCL events.
Performance Modeling of MapReduce Jobs in Heterogeneous
Cloud Environments [4]
At present, Hadoop is used for heterogeneous data handling and management,
which brings the additional challenges of efficient cluster administration and job
management. Given this heterogeneity of data and resources, it is not clear which system
resources lead to performance hindrance and bottlenecks. In order to provide a mechanism
for configuring and optimizing such Hadoop clusters, Z. Zhang et al. analyzed the
efficiency and precision of the bounds-based performance (BBP) model and used it to
estimate the completion time of MapReduce jobs in heterogeneous clusters.
The BBP (bounds-based performance) paradigm measures the upper and lower limits of
job finishing time. The model relies on the makespan theorem, which is used to calculate
performance bounds on the completion time of a given set of n tasks processed by k
servers.
A greedy algorithm is used for the allotment of tasks to slots; this is an online
allocation technique in which the slot that finishes executing its foregoing task earliest
is assigned a new task. The lower bound is then the product of the average task duration
and n/k, and the upper bound is the sum of the maximum task duration and the product of
the average task duration and (n-1)/k. The difference between these two values delimits
the set of attainable completion times due to task scheduling and non-determinism.
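Restated symbolically (notation ours, following the prose above): with $n$ tasks, $k$ servers, average task duration $\mu$, and maximum task duration $\lambda$,

$$T_{low} = \mu \cdot \frac{n}{k}, \qquad T_{up} = \mu \cdot \frac{n-1}{k} + \lambda .$$

The gap $T_{up} - T_{low}$ bounds the spread of completion times attributable to scheduling non-determinism.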
To approximately reckon the total finishing time of a submitted job, the average and
maximum task durations should first be measured at the different stages of job
execution: the map phase, the shuffle/sort phase, and the reduce phase. These statistics
can be recovered from past job execution records. The completion time of each
processing stage of the job can then be computed using the bound paradigm above.
Dynamic Job Ordering and Slot Configurations for MapReduce
Workloads [5]
MapReduce performance and resource utilization vary with different map/reduce
slot configurations and job execution orders, so S. Tang et al. initiated the use of two
classes of algorithms to reduce the makespan and total completion time of an offline
workload. The first set of algorithms optimizes the job order for a given map/reduce
slot configuration, and the next class optimizes the slot configuration itself.
The algorithm used for optimizing job order is MK_JR, based on Johnson's
Rule for makespan optimization. Johnson's rule provides the best job order for
makespan when there is only one map slot and one reduce slot; in general, when
arbitrary numbers of map and reduce slots are available, minimizing makespan is NP-
hard. The MK_JR algorithm produces a makespan within a factor of 1+δ of the minimum,
where δ<1 can be reckoned as the ratio of the sum of the maximum map and reduce task
sizes to the sum of all task sizes. δ is a very small value because the time needed to
process a single map or reduce task is tiny compared to the processing time of the
overall MapReduce workload. Another algorithm, presented for optimizing makespan and
total completion time concurrently, is MK_TCT_JR: a bi-criteria heuristic that tunes its
parameter values by observing the significant trade-off between completion time and
makespan.
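One plausible way to write this ratio (notation ours; let $t^{m}_{i}$ be the map task sizes and $t^{r}_{j}$ the reduce task sizes):

$$\delta = \frac{\max_i t^{m}_{i} + \max_j t^{r}_{j}}{\sum_i t^{m}_{i} + \sum_j t^{r}_{j}} < 1 .$$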
An optimized map/reduce slot configuration can be obtained by computing and
verifying all possible values from 1 to S-1, where S is the total number of slots. But when
S becomes very large this search may be inefficient, and to overcome the problem a
proportional configuration property is used.
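For context, a minimal sketch of the classic Johnson's-rule ordering that MK_JR builds on (this is the textbook two-machine rule, not the authors' MK_JR code; class and field names are ours, and the record syntax assumes Java 16+):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative job record: total map-phase and reduce-phase times.
record MRJob(String name, double mapTime, double reduceTime) {}

public class JohnsonOrder {
    // Johnson's rule: jobs whose map phase is shorter than their reduce
    // phase come first (ascending map time); the rest follow (descending
    // reduce time). This minimizes makespan for one map and one reduce slot.
    static List<MRJob> order(List<MRJob> jobs) {
        List<MRJob> first = new ArrayList<>(), second = new ArrayList<>();
        for (MRJob j : jobs) {
            if (j.mapTime() <= j.reduceTime()) first.add(j); else second.add(j);
        }
        first.sort(Comparator.comparingDouble(MRJob::mapTime));
        second.sort(Comparator.comparingDouble(MRJob::reduceTime).reversed());
        first.addAll(second);
        return first;
    }
}
```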
Chapter 3
SOFTWARE REQUIREMENT SPECIFICATION
3.1 Introduction
This chapter discusses the various requirements of the project, such as the software
and hardware preconditions, the functional and non-functional prerequisites, and the
constraints the system must adhere to; this section of the report also includes the
purpose and project perspective.
3.2 Purpose
The main purpose of this project is to enhance the performance of the system by
scaling down completion time using the dynamic self-tunable slot technique, which in
turn leverages the resource utilization of the overall system.
3.3 Project Perspective
The elementary Hadoop cluster is predefined with a fixed slot arrangement:
the numbers of slots for the map and reduce stages are permanently defined at the
beginning of cluster formation and cannot be altered later. This elementary mechanism
of the Hadoop framework hinders the performance and optimality of the entire system
and induces underutilization of resources among the nodes. Many techniques were
projected to address this problem and the complications of former methods, for
example:
 Quincy [6] adopted locality constraints and fairness constraints to deal with the job
allocation and management complication.
 Zaharia et al. [7] suggested delay scheduling to boost and facilitate the optimality of
the fair scheduler by leveraging dataset locality.
 Verma et al. [8] projected a heuristic to scale down the makespan of a set of separate,
self-reliant MapReduce jobs by applying the classic Johnson's algorithm.
In order to overcome the limitations of all the above techniques, a self-tunable
slot assignment technique has been implemented in this project. In this technique, map
and reduce slots are allocated dynamically based on feedback collected from the workload
computing component. The first integral component, the workload estimator, calculates
the execution times of foregoing jobs, and these are sent to the next vital component, the
slot scheduler, which properly assigns the map and reduce slots.
3.4 Functional Requirements
Functional requirements express the behavior of a project and the purpose
and role of each component. They are represented using inputs, outputs, and the behavior
produced for a specified input. During the design phase of the system, the functional
requirements are considered, and the behavior of the whole project is realized using use
cases. A use case depicts the behavior of, and the relationships between, the components
or modules of the project; the use case description for this project is illustrated in the
system design chapter.
3.5 Non-Functional Requirements
Non-functional requirements depict conditions that can be used to analyze
the operation of a project rather than its behavior. Requirements of this kind are
depicted elaborately in the framework of this project. They indicate characteristics
like security and usability, which can be termed execution qualities; traits like
reliability, performance, optimality, consistency, and maintainability, which can be
termed evolution qualities, are also described for the projected application.
3.5.1 Performance
Compared to other existing slot assignment techniques, the self-tunable slot
allocation mechanism adapted in this project performs better: it lowers the
makespan of the job batch and increases resource utilization.
3.5.2 Optimality
Optimality defines how well the application runs under any circumstances,
irrespective of the input dataset, resulting in an effective and efficient processing
method. In this application optimality is realized through the newly formulated
methodology we have incorporated for the specific dataset.
3.5.3 Reliability
The reliability of an application mainly depends on the efficiency, throughput, and
performance that directly or indirectly affect the overall behavior of the system for
given data and its required properties. This application is more reliable because it
allocates slots based on feedback from the timing of foregoing jobs, thereby ensuring
proper utilization of resources.
3.5.4 Portability
Portability furnishes insight into how a project can be carried out irrespective of
platform or environment. This application makes use of Hadoop's open-source
architecture and Java as the coding language, so it is compatible with different platforms
and datasets, and easily portable.
3.5.5 Security
The security of this application depends on the Hadoop tool utilized; it is not
affected by the dataset being used.
3.6 System Specification
3.6.1 Hardware Specification
 System Processor : Pentium IV, 2.4 GHz
 Secondary Storage : 40 GB
 Primary Memory : 4 GB
3.6.2 Software Specification
 Platform tool : Windows 7 / Ubuntu
 Programmed in : Java 1.7, Hadoop 0.8.1
 Interface : Eclipse
 RDBMS : MySQL
Chapter 4
SYSTEM DESIGN
System design is a substantially important phase in the building and creation
stages of the whole development cycle. It is a mechanism for depicting and
representing the overall architecture of the system, the interfaces between the different
components, the methods and parameters defined for each module, and the data for the
system, according to the requirements specified by the user.
In order to design a system, the first step is to collect the system requirements,
the functional and non-functional requirements, and the constraints from the user. The
second step is designing the system in an abstract manner; this step provides an outline
of all the major components required for the system architecture. The third step is
detecting and addressing bottlenecks generated in the abstract or high-level design due
to the violation of constraints specified by the user. The next step is designing the
system in a more elaborate and detailed manner; this step constitutes specifying the
methods, parameters, and interfaces of the application components.
4.1 High Level Design
High-level design reveals an abstract layout of the entire application,
pictorially depicting the primitive constituents of the system to be developed. The
architecture of the system and the diagrams depicting the relationships and the flow of
data are all considered high-level designs, and these designs are written using
non-technical terms with a few additional technical terms.
4.1.1 System Architecture
The architecture of an application projects a blueprint of the entire system in
pictorial form. In this project the architecture has three major components: the job
assigner, the slot assigner, and the task processor, as shown in figure 4.1. When the user
submits a batch of jobs to the system, it is first sent to the job-assigner component.
This component in turn contains two sub-components, the slot scheduler and the
workload estimator. The workload estimator repeatedly collects the execution-time
details of lately completed tasks at periodic intervals, and these values are used to
reckon the map/reduce work currently at hand. Based on these estimates, the second
integral part, the slot scheduler, then adjusts and assigns the slot ratio of the map and
reduce slave nodes.
Figure 4.1 System Architecture
4.1.2 Data Flow Diagram
A data flow diagram is constructed using geometric components to represent the
flow of data between the modules of a system; another name for a DFD is a bubble
chart. A DFD may be used to define an abstraction of the whole application at any level,
with the context diagram contemplated as the topmost level of abstraction.
Figure 4.2 portrays the data flow diagram. When the user logs in, he is provided
with two options, working on a homogeneous or a heterogeneous cluster, after which the
user submits the job to be processed. The system then examines whether the job is
scheduled for processing or waiting in the queue. Once the job is scheduled, the workload
is verified, the job is split into tasks, and in the next step each task is assigned to a slot
for execution.
Figure 4.2 Data Flow Diagram
4.2 Low Level Design
Low-level design describes the major components of the system in a detailed and
elaborate manner, so this technique is also called detailed design. Here the diagrams
are constructed by iteratively refining the given details, requirements, and constraints,
and they depict the modules, their parameters and methods, and the relationships among
them.
4.2.1 Use Case Diagram
A use case diagram is a simple mechanism for demonstrating how the user interacts
with the system's functions. It falls under the behavioral drawing category in UML design
and can be used to identify the different users and describe their behavior towards
different use cases.
Figure 4.3 Use Case Diagram
Figure 4.3 portrays the use case representation, where the user interacts with functional
components like the job tracker and the reduce process to assign a job to the system. In
this diagram there are two different users: the first submits the job for processing, and
the second is the one who requires the content.
4.2.2 Sequence Diagram
A sequence diagram portrays how interplay occurs between the different modules
formulated in the application; it falls under the interaction drawing category of UML
design. This diagram shows how processes interact, their order, and the sequence of
messages sent and received, all within a time frame, so such diagrams are also referred
to as event diagrams.
Figure 4.4 Sequence Diagram
Figure 4.4 portrays the sequence diagram, which depicts the interplay between three
components: the user job, the job tracker, and the task tracker. The user job process
interacts with the job tracker by sending a request message for processing the user's
job; the job tracker then sends the user's data to the task tracker, which processes the
job; after execution, the task tracker sends a response message with the results back to
the user.
4.2.3 Activity Diagram
An activity diagram is a kind of behavioral representation which describes the
workflow of the overall system; it is used to describe the actions, interactions, and
activities of the system in a step-by-step manner, and can also be seen as a type of
flowchart. Figure 4.5 showcases the activity diagram, which describes the workflow
between all the modules, i.e. the job tracker, the mapping process, and the reducing job.
Figure 4.5 Activity Diagram
4.2.4 Collaboration Diagram
A collaboration diagram, also called a communication diagram, is a type of interaction
diagram in UML design which describes the interplay among the modules of a system
through messages, as depicted in figure 4.6. This diagram combines both the dynamic
behavior and the static details of a system, and therefore it can be formed from details
taken from the use case diagram, sequence diagram, class diagram, and so on.
Figure 4.6 Collaboration Diagram
Chapter 5
IMPLEMENTATION
5.1 System Implementation
The implementation stage of application creation is the actualization of the ideas,
design, and requirement specification into source code. The primary objective of this
part of building a project is the production of source code in good style, with comments
where necessary, by applying a proper and suitable coding technique with the help of
proper documentation.
Program code is created in accordance with structured coding techniques,
which adhere to control flow, so that the execution sequence follows the order in which
the code is written. This makes the code unambiguous and more readable, which eases
understanding, modifying, debugging, testing, and documenting the programs.
5.2 Modules
The modules implemented in this application for scaling down the makespan of
the submitted jobs and for efficient utilization of resources are described as follows:
5.2.1 Batch Processing of jobs
In this module, jobs are submitted in batches, and the jobs are then processed
batch by batch for ease of understanding.
5.2.2 Estimation of Workload
In the basic version, the assessment of workload was obtained from the number of
remaining jobs for the map and reduce stages. The new idea projected here assumes
workload details known beforehand; such details can be collected from task
configurations, a training stage, or some historical data settings, but in some cases the
information regarding the workload may not be precise or accessible for use.
For this situation, this module estimates the workload without any previous data.
It first considers the incomplete tasks of both the map and reduce stages, then sums
their execution times and uses the sum as the estimate; jobs still waiting in the queue
are not considered in this calculation. A sketch of this estimate follows.
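A minimal sketch of that estimate (our own illustrative types; the report does not show this code): sum the estimated times of the still-incomplete map or reduce tasks, ignoring queued jobs entirely.

```java
import java.util.List;

// Illustrative task record: phase ("map"/"reduce"), estimated time, done flag.
record TaskInfo(String phase, double estimatedTime, boolean completed) {}

// Remaining workload for one phase = sum of estimated times of tasks in
// that phase that have not completed yet. Queued jobs never appear in the
// task list, so they are excluded by construction.
public class WorkloadEstimator {
    static double remainingWorkload(List<TaskInfo> tasks, String phase) {
        return tasks.stream()
                .filter(t -> !t.completed() && t.phase().equals(phase))
                .mapToDouble(TaskInfo::estimatedTime)
                .sum();
    }
}
```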
5.2.3 Feedback for workload
The technique adopted in the last section is most suitable for a homogeneous
environment where all the nodes run the same type of job, with similar configurations
and similar system resource usage. For the now-common heterogeneous setting, where
the slot allocation changes dynamically, a new technique is implemented which uses
feedback from foregoing jobs. This proposal helps balance and accommodate slots
automatically.
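One plausible way to state the feedback rule (notation ours; the report does not give an explicit formula): with $W_{map}$ and $W_{reduce}$ the remaining map and reduce workloads estimated from foregoing jobs, and $s_{map} + s_{reduce}$ fixed at a node's total slot capacity, the scheduler tunes

$$\frac{s_{map}}{s_{reduce}} \approx \frac{W_{map}}{W_{reduce}} ,$$

so the phase with the larger outstanding workload receives proportionally more slots.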
5.2.4 Job manager
In this module, the job manager is used to manage the resources available to the task
administrator. The job manager has three sections: the workload estimator, the slot
scheduler, and the scheduler; the integral slot scheduler component is used to assign
jobs to the task administrator.
5.2.5 Task Administrator
In this module, the task administrator performs the jobs instructed by the job manager,
carrying out each task under the guidance of the task manager. The task manager
implements a job in two phases: map and reduce.
5.3 Snapshot of Code Snippets
Snapshot 5.1: Map Function
The code snippet above shows the map stage, which extends a predefined base class.
Snapshot 5.2: Overridden Map Function
Snapshot 5.2 is the code snippet describing the overridden map procedure.
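The snapshot images themselves are not reproduced in this extract. As an illustration of what snapshots 5.1 and 5.2 describe, here is a minimal mapper against Hadoop's org.apache.hadoop.mapreduce API (a word-count-style example; class and field names are ours, not necessarily the report's):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Snapshot 5.1's idea: the map stage extends the predefined Mapper base class.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    // Snapshot 5.2's idea: the inherited map() procedure is overridden.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (token.isEmpty()) continue; // skip blanks from leading whitespace
            word.set(token);
            context.write(word, ONE);      // emit (token, 1)
        }
    }
}
```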
Snapshot 5.3: Overridden Reduce Function
Snapshot 5.3 above shows the code for the overridden reduce function.
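Correspondingly, a minimal reducer sketch for snapshot 5.3 (same assumptions as the mapper sketch above):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The reduce stage extends the predefined Reducer base class and overrides
// reduce() to fold all values of one key into a single sum.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```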
Snapshot 5.4: Driver Configuration
Snapshot 5.4, portrayed above, gives insight into the driver configuration
component.
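And a sketch of the driver configuration described in snapshot 5.4, assuming the Hadoop 2.x Job API (input/output paths come from the command line; the job name is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Wires the mapper and reducer sketched above into a job and submits it.
public class Driver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "token count");
        job.setJarByClass(Driver.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```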
Chapter 6
TESTING
The function of testing is to discover errors; it is used for finding every
plausible fault or glitch that may have developed in the product. Testing presents a way
of assessing the functionality of components, modules, assemblies, sub-assemblies, and
the completed product through thorough examination. It is the phase of product
development used to make sure that the system is designed according to the specified
requirements, adheres to the user-specified constraints, meets user expectations, and
does not fail in an unacceptable manner.
6.1 Testing Types
Testing can be done methodically, using several kinds of testing at the different
stages of the developed product. Some of the testing types are:
6.1.1 Unit Testing
Unit testing is a technique for checking an individual module or piece of code in the
final product with related program inputs so that it produces valid and desired outputs.
For unit testing, the individual constituents of the application are tested with test cases
written for each integral component, considering both positive and negative inputs.
Each decision condition and internal code flow should be verified for the desired
output with thorough examination and scrutiny.
6.1.2 Integration Testing
Integration testing is a mechanism for examining integrated module code. It uses
an iterative approach in which individual components that have been unit tested
are combined and verified for errors and proper functionality. This step is done many
times until all designed and tested components are integrated. Integration testing is
specifically used for finding problems that arise from the combination of components.
6.1.3 System Testing
System testing involves checking the complete integrated software of the developed
product to verify whether it adheres to the user requirements; it can detect errors in
integrated modules as well as in the entire system. System testing considers traits such
as usability, performance, optimality, and exception handling, along with volume and
load testing.
6.1.4 Functional Testing
Functional tests provide systematic demonstrations that the functions tested are
available as specified by the business and technical specifications, documentation, and
user manuals. This kind of testing is focused on key functionality, requirements, and
special test cases. In addition, systematic coverage of business process flows, data
fields, predefined processes, and successive processes must be considered for testing;
before testing is complete, additional tests are identified and the effective value of the
current tests is determined.
6.1.5 Acceptance Testing
Acceptance testing is a method of testing specifically designed for examining
whether the product or application meets all the requirements described by the user and
adheres to the specified constraints. It is conducted after system testing, before the
product is made available to the user, and can be carried out as a black-box testing
technique.
6.1.6 White Box Testing
White-box testing is a mechanism which checks the internal workings, or code, of the
product; test cases are designed based on details of the control and data flow and of
path and branch conditions. This technique checks the system at the code level and can
be incorporated in unit test verification, integration testing, and regression testing.
6.1.7 Black Box Testing
Black-box testing is a mechanism designed for verifying the functionality of the
system's components, modules, or the whole project. It does not consider the internal
flow of the code structures: it provides inputs and checks the outputs without
considering how the inner software structure works. This testing is suitable for most
testing levels, such as acceptance, system, integration, and unit testing.
6.2 Test Cases
A test case is a document containing a set of prerequisites with which an application
can be tested; it can be used for checking individual modules or the whole application.
A test case can be formal (also called technical) or informal (non-technical): in a
formal draft the application is tested for both positive and negative circumstances,
while the latter kind has no technical specification and is based on the working of the
application. A test case can be documented using the technicalities of modules, the
methodology, scenarios, and cases.
6.2.1 Scenario for Hadoop Initialization
Test Case ID: Test_Case_01
Scenario: Hadoop Initialization
Description: This case checks for proper initialization of Hadoop. To initialize Hadoop, it must be installed and configured, and the path must be set properly.
Input: Enter the script to start initialization of Hadoop in the terminal.
Expected Result: The specific Hadoop version with its related files should be added.
Actual Result: Hadoop of the specified version is initialized with its related files.
Remarks: Pass
Table 6.1 Test Case of Hadoop Initialization
6.2.2 Scenario for Hadoop Cluster Formation
Test Case ID: Test_Case_02
Scenario: Hadoop Cluster Formation
Description: The prerequisite for cluster formation is making sure that the Hadoop package is available for use from the same specified path on all the nodes. The cluster is then formed with the required number of nodes.
Input: Enter the scripts start-dfs.sh and start-mapred.sh in the terminal.
Expected Result: A Hadoop cluster with the given number of nodes should be defined.
Actual Result: The required Hadoop cluster is created.
Remarks: Pass
Table 6.2 Test Case of Hadoop Cluster Formation
6.2.3 Scenario for Verification by JPS Command
Test Case ID: Test_Case_03
Scenario: Verification by JPS Command
Description: The Java Virtual Machine Process Status tool (jps) is used to verify that Hadoop daemons like the job history server, name node, and resource manager are functioning properly.
Input: Enter the jps command in the terminal.
Expected Result: JPS should run and all the required daemons and properties should be listed.
Actual Result: JPS runs, and the required daemons, like the name node and resource manager, are listed.
Remarks: Pass
Table 6.3 Test Case of Verification using JPS Command
6.2.4 Scenario for Port Programming
Test Case ID: Test_Case_04
Scenario: Port Programming
Description: Processing of the port assigned to Hadoop.
Expected Result: The port allocated to Hadoop should be properly configured and programmed.
Actual Result: The port assigned to Hadoop is programmed.
Remarks: Pass
Table 6.4 Test Case for Port Programming
6.2.5 Scenario for Configuring Environment for Dataset
Test Case ID: Test_Case_05
Scenario: Setup for submitting a dataset
Description: Configure a directory/folder to hold the source file.
Input: A specific dataset.
Expected Result: The source file for the specific dataset should be created according to the Hadoop parameters in the Hadoop environment.
Actual Result: The source file for the specific dataset is created.
Remarks: Pass
Table 6.5 Test Case for Configuring Environment for Dataset
6.2.6 Scenario for Processing Data Using Mapreduce
Test Case ID: Test_Case_06
Scenario: Processing data using MapReduce
Description: Hadoop processes the large dataset with the MapReduce technique, which contains map and reduce stages for analyzing data effectively.
Input: Any specific dataset.
Expected Result: Hadoop should reduce the processing effort by utilizing the MapReduce technique on the specific dataset.
Actual Result: The Hadoop processing effort is reduced by using the MapReduce technique.
Remarks: Pass
Table 6.6 Test Case for Usage of MapReduce
6.2.7 Scenario for Result Analysis
Test Case ID: Test_Case_07
Scenario: Result Analysis
Description: After a specific dataset is submitted for processing, Hadoop uses the MapReduce mechanism to process the data; the results are then produced as graphs.
Input: Submit a dataset to the MapReduce framework.
Expected Result: Different kinds of graphs which describe the results effectively should be generated.
Actual Result: The application generates graphs for the given dataset.
Remarks: Pass
Table 6.7 Test Case for Result Analysis
Chapter 7
RESULT ANALYSIS
Snapshot 7.1: Initialization of Hadoop Cluster
Snapshot 7.1 shows the initialization of the Hadoop cluster, and snapshot 7.2
illustrates Hadoop cluster parameters like the namenode, tmp, datanode, and pid
settings. Initialization is carried out through a set of commands in the terminal.
Snapshot 7.2: Parameters of Hadoop Cluster
Snapshot 7.3: Start Hadoop Services
The snapshot above shows how the Hadoop services are started with the start-all.sh
command, which starts all the services: the namenode, resource manager, secondary
namenode, and node manager.
Snapshot 7.4: Hadoop Page
The snapshot shows an overview of the Hadoop cluster, comprising the start date,
version, cluster id, and some additional configuration details.
Snapshot 7.5: JPS Verification
Snapshot 7.5 shows JPS tracking. In all Hadoop deployments, the jps command
(the Java Virtual Machine Process Status tool) is used to check that all the Hadoop
daemons are functioning properly. Snapshot 7.6 portrays how the path for the source
file is set.
Snapshot 7.6: Setting Path for Source File
Snapshot 7.7: MapReduce Process
Snapshot 7.8: Main Page of Slot Configuration
Snapshot 7.9: Bar Graph illustrating the Result
Snapshot 7.10: Line Graph illustrating the result
The results of the application are depicted in snapshots 7.9 and 7.10, which present
line and bar graphs of the result analysis.
CONCLUSION
This project presented a new slot assignment technique called TuMM for
dynamically assigning slots in Hadoop. The vital purpose of the application is to boost
the utilization of resources and scale down the makespan of a given batch of n jobs.
This mechanism is suitable for homogeneous clusters; for heterogeneous Hadoop
clusters a modified version of the mechanism, called H-TuMM, is introduced, which
improves performance by configuring the slots of every node separately. With this new
slot allocation method, the project shows about a 28% decrease in completion time.
REFERENCES
[1] J. Xie, S. Yin, X. Ruan, Z. Ding, Y. Tian, J. Majors et al., "Improving MapReduce
Performance through Data Placement in Heterogeneous Hadoop Clusters", in
Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010
IEEE International Symposium.
[2] Z. Fadika, E. Dede, J. Hartog, M. Govindaraju, "MARLA: MapReduce for
Heterogeneous Clusters", in Cluster, Cloud and Grid Computing (CCGrid), 2012
12th IEEE/ACM International Symposium.
[3] M. Grossman, M. Breternitz, V. Sarkar, "HadoopCL: MapReduce on Distributed
Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL",
in Parallel and Distributed Processing Symposium Workshops and PhD Forum
(IPDPSW), 2013 IEEE 27th International.
[4] Z. Zhang, L. Cherkasova, B. T. Loo, "Performance Modeling of MapReduce Jobs
in Heterogeneous Cloud Environments", in Cloud Computing (CLOUD), 2013
IEEE Sixth International Conference.
[5] S. Tang, B. S. Lee, B. He, "Dynamic Job Ordering and Slot Configurations for
MapReduce Workloads", in IEEE Transactions on Services Computing (vol. 9,
issue 1).
[6] M. Isard, V. Prabhakaran, J. Currey et al., "Quincy: Fair Scheduling for
Distributed Computing Clusters", in SOSP '09, 2009, pp. 261-276.
[7] M. Zaharia, D. Borthakur, J. S. Sarma et al., "Delay Scheduling: A Simple
Technique for Achieving Locality and Fairness in Cluster Scheduling", in
EuroSys '10, 2010.
[8] A. Verma, L. Cherkasova, and R. H. Campbell, "Two Sides of a Coin: Optimizing
the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve
Cluster Performance", in MASCOTS '12, Aug 2012.
 
Student portal system application -Project Book
Student portal system application -Project BookStudent portal system application -Project Book
Student portal system application -Project BookS.M. Fazla Rabbi
 
Major Report on ADIAN
Major Report on ADIANMajor Report on ADIAN
Major Report on ADIANsmittal121
 
IRJET- Course outcome Attainment Estimation System
IRJET-  	  Course outcome Attainment Estimation SystemIRJET-  	  Course outcome Attainment Estimation System
IRJET- Course outcome Attainment Estimation SystemIRJET Journal
 

Similar to Here are the key objectives of the project based on the introduction provided:1. To improve the performance of MapReduce jobs running on Hadoop by reducing job completion time (makespan). 2. To dynamically allocate map and reduce slots based on the execution details of previous jobs, instead of static pre-defined slot allocation. 3. To leverage cluster resources more efficiently by avoiding underutilization through dynamic slot allocation.4. To develop a mechanism for self-tuning of map and reduce slots that can adapt to the resource requirements of different jobs.5. To make Hadoop more scalable and suitable for processing large volumes of data for organizations, research centers and regular users by optimizing job performance (20)

automatic database schema generation
automatic database schema generationautomatic database schema generation
automatic database schema generation
 
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
LOAD BALANCED CLUSTERING WITH MIMO UPLOADING TECHNIQUE FOR MOBILE DATA GATHER...
 
Digital Intelligence, a walkway to Chirology
Digital Intelligence, a walkway to ChirologyDigital Intelligence, a walkway to Chirology
Digital Intelligence, a walkway to Chirology
 
An evaluation of distributed datastores using the app scale cloud platform
An evaluation of distributed datastores using the app scale cloud platformAn evaluation of distributed datastores using the app scale cloud platform
An evaluation of distributed datastores using the app scale cloud platform
 
Project.12
Project.12Project.12
Project.12
 
Minor project report format for 2018 2019 final
Minor project report format for 2018 2019 finalMinor project report format for 2018 2019 final
Minor project report format for 2018 2019 final
 
Adaptive Computing Seminar Report - Suyog Potdar
Adaptive Computing Seminar Report - Suyog PotdarAdaptive Computing Seminar Report - Suyog Potdar
Adaptive Computing Seminar Report - Suyog Potdar
 
Interference Aware Multi-path Routing in Wireless Sensor Networks
Interference Aware Multi-path Routing in Wireless Sensor NetworksInterference Aware Multi-path Routing in Wireless Sensor Networks
Interference Aware Multi-path Routing in Wireless Sensor Networks
 
Face detection
Face detectionFace detection
Face detection
 
WIRELESS ROBOT
WIRELESS ROBOTWIRELESS ROBOT
WIRELESS ROBOT
 
Bit Serial multiplier using Verilog
Bit Serial multiplier using VerilogBit Serial multiplier using Verilog
Bit Serial multiplier using Verilog
 
Front Pages_pdf_format
Front Pages_pdf_formatFront Pages_pdf_format
Front Pages_pdf_format
 
THESIS
THESISTHESIS
THESIS
 
Documentation
DocumentationDocumentation
Documentation
 
Auto Metro Train to Shuttle Between Stations
Auto Metro Train to Shuttle Between StationsAuto Metro Train to Shuttle Between Stations
Auto Metro Train to Shuttle Between Stations
 
Bachelors project
Bachelors projectBachelors project
Bachelors project
 
IRJET- Design of Water Distribution Network System by using Branch Software
IRJET-  	  Design of Water Distribution Network System by using Branch SoftwareIRJET-  	  Design of Water Distribution Network System by using Branch Software
IRJET- Design of Water Distribution Network System by using Branch Software
 
Student portal system application -Project Book
Student portal system application -Project BookStudent portal system application -Project Book
Student portal system application -Project Book
 
Major Report on ADIAN
Major Report on ADIANMajor Report on ADIAN
Major Report on ADIAN
 
IRJET- Course outcome Attainment Estimation System
IRJET-  	  Course outcome Attainment Estimation SystemIRJET-  	  Course outcome Attainment Estimation System
IRJET- Course outcome Attainment Estimation System
 

Recently uploaded

Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 

Recently uploaded (20)

Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 

Here are the key objectives of the project based on the introduction provided:1. To improve the performance of MapReduce jobs running on Hadoop by reducing job completion time (makespan). 2. To dynamically allocate map and reduce slots based on the execution details of previous jobs, instead of static pre-defined slot allocation. 3. To leverage cluster resources more efficiently by avoiding underutilization through dynamic slot allocation.4. To develop a mechanism for self-tuning of map and reduce slots that can adapt to the resource requirements of different jobs.5. To make Hadoop more scalable and suitable for processing large volumes of data for organizations, research centers and regular users by optimizing job performance

  • 5. II ABSTRACT

A Wireless Sensor Network (WSN) is a collection of tiny devices, each equipped with computational processing ability, a wireless receiver and transmitter, and a power supply. A sensor node consumes energy for sensing, communication, and data processing, and data communication demands the most energy of the three. Among wireless communication systems, the WSN is the most widely used network; it consists of spatially distributed sensor nodes with sensing, computation, and wireless communication capabilities. These sensor nodes are scattered in an unattended environment (the sensing field) to sense the physical world. Deploying a complete test bed of multiple networked computers to validate and verify a network protocol or algorithm is very costly, so a network simulator saves both money and time in accomplishing this task. In this project, we introduce two metrics, a signal strength indicator and a desired distance estimator, to find an optimal and reliable path between source and destination.
  • 6. III Table of Contents

CONTENTS PAGE NO.
ACKNOWLEDGEMENT I
ABSTRACT II
LIST OF CONTENTS III
LIST OF FIGURES VI
LIST OF TABLES VII
LIST OF SNAPSHOTS VIII

CHAPTER 1 INTRODUCTION 01
1.1 Domain Overview 01
1.2 Project Overview 02
1.3 Existing System 03
1.4 Disadvantages of Existing System 03
1.5 Problem Statement 04
1.6 Project Motivation 05
1.7 Proposed System 04
1.8 Advantages of Proposed System 05
1.9 Objective of the Project 06
1.10 Organization of Report 07

CHAPTER 2 LITERATURE SURVEY 09
2.1 Literature Review 09
2.2 Conclusion of Review 09

CHAPTER 3 SYSTEM REQUIREMENT SPECIFICATION 16
3.1 Introduction 16
3.2 Functional Requirements 16
3.3 Non-Functional Requirements 17
  • 7. IV
3.4 System Requirements 17
3.4.1 Hardware Requirements 17
3.4.2 Software Requirements 17

CHAPTER 4 SYSTEM DEVELOPMENT 18
4.1 Introduction to System Development 18
4.2 Modules & Methodology 18
4.2.1 Sub Module 1 18
4.2.2 Sub Module 2 21

CHAPTER 5 SYSTEM DESIGN 18
5.1 Introduction to System Design 18
5.2 High Level Design 18
5.2.1 Architecture of the System 18
5.2.2 Data Flow Diagram 21
5.3 Low Level Design 24
5.3.1 Process Diagram 24
5.3.2 Flow Chart 25
5.3.3 Sequence Diagram 26
5.3.4 Collaboration Diagram 24
5.3.5 Activity Diagram 25
5.3.6 Use Case Diagram 26

CHAPTER 6 SYSTEM IMPLEMENTATION 27
6.1 Introduction to System Implementation 27
6.2 Language Used for Implementation 27
6.3 Algorithms 28
6.3.1 Algorithm 1 Name 28
  • 8. V
6.3.2 Algorithm 2 Name 29
6.4 Code Snippets 30

CHAPTER 7 TESTING 36
7.1 Introduction to Testing 36
7.2 Types of Testing 36
7.3 Test Cases 37

CHAPTER 8 RESULTS AND DISCUSSIONS 39
8.1 Introduction 39
8.2 Snapshots with Description 43

CONCLUSION AND FUTURE ENHANCEMENT 46
REFERENCES 47
  • 9. VI LIST OF FIGURES

Figure No. Name of the Figure Page No.
Fig 1.1 Wireless Sensor Network (WSN) 02
Fig 1.2 Sensor Node Components 02
Fig 2.1 Components of NS2 11
Fig 2.2 Network consists of N nodes at time t1 = 0 sec 13
Fig 2.3 Network consists of N nodes at time t2 = 10 sec 14
Fig 4.1 Architecture of Existing System 19
Fig 4.2 Architecture of Proposed System 20
Fig 4.3 Work Flow of Existing System 21
Fig 4.4 Work Flow of Proposed System 23
Fig 4.5 Process Diagram 24
Fig 4.6 Flow Chart 25
Fig 4.7 Sequence Diagram 26
Fig 7.1 Initial Position of Nodes 39
Fig 7.2 Route Request from Source to Destination 40
Fig 7.3 Obstacle During Route Discovery 41
Fig 7.4 Finding an Alternate Path 42
Fig 7.5 Bit Error Rate 43
Fig 7.6 Packet Delivery Ratio 44
Fig 7.7 Throughput 45
  • 10. VII LIST OF TABLES

Table No. Name of the Table Page No.
Table 1.1 Routing Table-1 13
Table 1.2 Path Table-1 14
Table 2.1 Routing Table-2 15
Table 2.2 Path Table-2 15
Table 6.1 Unit Test Cases 37
Table 6.2 Integration Test Cases 38
  • 11. VIII LIST OF SNAPSHOTS

Snapshot No. Name of the Snapshot Page No.
Fig 8.1 Routing Table-1 13
Fig 8.2 Path Table-1 14
Fig 8.3 Routing Table-2 15
Fig 8.4 Path Table-2 15
Fig 8.5 Unit Test Cases 37
Fig 8.6 Integration Test Cases 38
  • 12. ABSTRACT

MapReduce is a prominent framework for analyzing and processing massive data, and Hadoop, its open-source implementation, has become the default platform for examining, manipulating, and storing enormous datasets. Since educational establishments, business industries, and research and development centers all rely on Hadoop to process their data, the performance of the system must be maintained. One major obstacle in the Hadoop framework that degrades performance and complicates the overall system is the long makespan, i.e. the completion time of MapReduce jobs. The Hadoop scheme presently in use adopts a static assignment of slots: the map and reduce slot numbers are predefined for the cluster at the inception of cluster formation and remain fixed throughout its life. This setting causes under-utilization of resources and long completion times. To reduce these limitations, this project presents a mechanism in which the slots are assigned dynamically by self-tuning. It collects execution details of foregoing jobs and, based on these details, allocates the map and reduce slots, which in turn improves the performance of the overall application.
  • 13. Chapter 1 INTRODUCTION

In recent years the MapReduce programming model has become the prominent technology for analyzing and processing big data, and Apache Hadoop is a free, open-source implementation of it that can be used to analyze a broad range of data. Hadoop is a framework adapted to process and store huge bulks of data in a distributed, parallel environment. Hadoop is designed and written so that it can scale from a single server to many thousands of machines, each of which offers local storage and processing of the abundant data submitted by users. Due to the advancement of cloud computing, Hadoop MapReduce is suitable not only for large companies and research centres working on data-intensive projects but also for regular users, who can launch a Hadoop cluster in the cloud.

1.1 Objective

With the rapid advancement of technology, and as more and more data is generated, applications employ MapReduce techniques for scrutinizing, processing, and extracting their data. In this circumstance, the programmer's main concerns are how to achieve good reliability and how to enhance the performance of a Hadoop cluster. The Hadoop framework constitutes a large set of predefined system parameters, and these parameters play a salient role in the performance of an application. The preeminent intentions for the development of this application are as follows. The first and foremost objective is to formulate new methods for modifying primitive attributes of the system to improve its overall performance. The second target is to curtail the completion time, also called the makespan, of a batch of jobs by incorporating the newly formulated methods. Finally, by achieving the two objectives above, the third goal of increasing resource utilization while processing unstable workloads can also be achieved.
  • 14. 1.2 Existing System

In the elementary Hadoop core architecture, the cluster comprises a solitary master node responsible for managing and examining all of the worker (also called slave) nodes, and several worker nodes, each hosting the TaskTracker routine to execute map-reduce jobs. The JobTracker component resides in the master node; its main operation is allocating jobs and organizing map or reduce tasks to be executed on map or reduce slots, respectively, in an adept manner. The number of tasks which can be accommodated on an individual node is represented by a term called a slot, and in the elementary Hadoop structure each slot can run only one task at a given time. Based on this design, the total number of slots present on each node indicates the maximal degree of parallelism which can be achieved.

The slot arrangement is a primitive parameter, fixed by default throughout the cluster's lifetime, and it has a crucial impact on the performance of the system. The basic Hadoop framework uses a fixed slot configuration: the numbers of map and reduce slots are both predefined for each separate node at the beginning of cluster creation. The numbers assigned in this static configuration are arbitrary values chosen without taking any job attributes into account, so the static configuration of Hadoop is not optimized and the performance of the whole system may be hindered. Some drawbacks of classic Hadoop MapReduce are: the system uses a static slot setting, i.e. a predefined number of map and reduce slots on each node of the cluster throughout its lifetime; a static arrangement of slots causes improper resource utilization; and it scales down the performance of the overall system under diverse and unstable workloads.
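To make the static setting concrete, below is a minimal sketch, assuming a classic Hadoop 1.x deployment where the two per-node slot counts are fixed in mapred-site.xml and read once at TaskTracker start-up. The class name, defaults, and printed text are ours for illustration; only the two property names come from Hadoop itself.

import org.apache.hadoop.conf.Configuration;

public class StaticSlotConfig {
    public static void main(String[] args) {
        // In Hadoop 1.x these properties normally live in mapred-site.xml and
        // are read once when the TaskTracker starts; they never change afterwards.
        Configuration conf = new Configuration();
        int mapSlots = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
        int reduceSlots = conf.getInt("mapred.tasktracker.reduce.tasks.maximum", 2);
        System.out.println("static map slots per node    = " + mapSlots);
        System.out.println("static reduce slots per node = " + reduceSlots);
        // This fixed split, chosen without looking at any job attributes, is the
        // source of the under-utilization the proposed system addresses.
    }
}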
  • 15. 1.3 Proposed System

To overcome the limitations of the existing system, this project aims at designing algorithms that modify primitive system attributes and increase system performance for a batch of jobs. A new concept of dynamically assigned slots is proposed. The vital goal of this new technique is to decrease the completion time of the executed tasks while retaining the simplicity of the Hadoop implementation as it is. The newly designed system is termed TuMM, which stands for TUnable knob for minimizing the Makespan of MapReduce jobs; its major goal is to automate the slot allotment ratio between map and reduce tasks. The projected system is composed of two primary components: the Workload Estimator (WE) and the Slot Scheduler (SS). The Workload Estimator is present in the JobTracker routine and acquires details such as the execution times of recently completed tasks; these details are used to compute the current workload in the Hadoop cluster. The second integral component, the Slot Scheduler, fine-tunes the ratio of map and reduce slots for each worker node based on the result computed by the Workload Estimator. A variation of the TuMM technique called H-TuMM is implemented for heterogeneous clusters; it assigns slots for each node separately to lessen the makespan of the job cluster.

Some advantages of the proposed system are: it minimizes the completion time of the two phases, thereby scaling down the makespan of multiple jobs by individually allocating slots for nodes in a heterogeneous environment; the projected system shows that up to a 28% reduction in completion time (makespan) for a batch of jobs can be achieved, which in turn yields a 20% rise in proper usage of system resources.
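To make the two components concrete, here is a minimal, self-contained sketch of the idea, not the report's actual TuMM code: the estimator turns pending task counts and recent task durations into a workload figure, and the scheduler splits a node's slots in proportion to the two workloads. All names and numbers below are illustrative assumptions.

import java.util.Arrays;
import java.util.List;

public class SlotTuningSketch {

    // Remaining workload ~ pending tasks x average duration of recently finished tasks.
    static double estimateWorkload(int pendingTasks, List<Long> recentDurationsMs) {
        long sum = 0;
        for (long d : recentDurationsMs) sum += d;
        double avg = recentDurationsMs.isEmpty() ? 1.0 : (double) sum / recentDurationsMs.size();
        return pendingTasks * avg;
    }

    // Split totalSlots between map and reduce in proportion to their workloads,
    // keeping at least one slot of each kind.
    static int[] tuneSlots(int totalSlots, double mapWork, double reduceWork) {
        double ratio = mapWork / Math.max(mapWork + reduceWork, 1.0);
        int mapSlots = Math.min(totalSlots - 1, Math.max(1, (int) Math.round(totalSlots * ratio)));
        return new int[] { mapSlots, totalSlots - mapSlots };
    }

    public static void main(String[] args) {
        double mapWork = estimateWorkload(40, Arrays.asList(12000L, 9500L, 11000L));
        double reduceWork = estimateWorkload(10, Arrays.asList(30000L, 28000L));
        int[] slots = tuneSlots(8, mapWork, reduceWork);
        System.out.println("map slots = " + slots[0] + ", reduce slots = " + slots[1]);
    }
}

H-TuMM would, in this sketch, simply call tuneSlots once per node with per-node workload estimates instead of one cluster-wide ratio.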
  • 16. 1.4 Organization of Report

This chapter has summarized the introduction to the project, which is described elaborately in later sections of the report. The second chapter provides a detailed survey of the projected system, covering various papers on how the performance of Hadoop can be improved. The third chapter describes the requirements and constraints listed by the user for designing this application, and the following chapter illustrates the design of the application, comprising the sequence diagram, the architecture, and so on. The fifth chapter gives insight into the implementation, and the testing techniques and test cases used to verify the application are depicted in chapter six. The analysis of results, with snapshots, is presented in the seventh chapter, and finally the conclusion and future enhancements are specified at the end.
  • 17. Chapter 2 LITERATURE SURVEY

Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters [1]

J. Xie et al. designed a data placement approach in the Hadoop Distributed File System (HDFS) to calibrate the data load in a heterogeneous Hadoop cluster. The newly designed data placement component first distributes a vast data set to multiple nodes according to the computing capacity of each node. They designed a data reorganization algorithm along with a data redistribution algorithm in HDFS; these two algorithms can be used to solve the data skew problem caused by dynamic data addition and removal. The initial algorithm is used to divide and distribute file chunks to the heterogeneous nodes of a cluster at the beginning of cluster formation. When all file fragments of the input file currently required by the computing nodes are present on a node, the chunks are distributed to the computing nodes, and the second algorithm is then incorporated to rearrange file chunks and resolve the data skew.

The data placement algorithm starts off by splitting a vast input into numerous fragments of the same size. These fragments are then allotted to the nodes in the cluster based on each node's data processing speed; comparatively, high-performance nodes can store and process more file chunks than low-performance nodes. The input file segments distributed by this algorithm may become unbalanced for the following reasons: first, new data may be added to the current input file; second, data fragments may be deleted from the current input file; and third, new computing nodes may be added to the existing cluster. To overcome this load-balancing problem, the data redistribution mechanism is incorporated. It reorders file chunks based on computing ratios: first, information about disk space utilization and the network topology of the cluster is compiled by the data distribution server; next, two lists, of over-utilized and under-utilized nodes, are created; the server then shifts file chunks from nodes on the over-utilized list to nodes on the under-utilized list until the data load is allocated evenly among the nodes.
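As a rough illustration of the proportional idea in [1] (our own sketch; the ratios, names, and rounding policy are assumptions, not the paper's code), each node receives a share of the file chunks proportional to its computing ratio:

import java.util.LinkedHashMap;
import java.util.Map;

public class PlacementSketch {

    // Assign ~ totalChunks * (ratio_i / sum of ratios) chunks to each node,
    // giving any rounding remainder to the fastest node.
    static Map<String, Integer> placeChunks(int totalChunks, Map<String, Double> computingRatio) {
        double sum = 0;
        for (double r : computingRatio.values()) sum += r;
        Map<String, Integer> plan = new LinkedHashMap<String, Integer>();
        int assigned = 0;
        String fastest = null;
        double best = -1;
        for (Map.Entry<String, Double> e : computingRatio.entrySet()) {
            int share = (int) Math.floor(totalChunks * e.getValue() / sum);
            plan.put(e.getKey(), share);
            assigned += share;
            if (e.getValue() > best) { best = e.getValue(); fastest = e.getKey(); }
        }
        plan.put(fastest, plan.get(fastest) + (totalChunks - assigned)); // leftover chunks
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Double> ratios = new LinkedHashMap<String, Double>();
        ratios.put("nodeA", 3.3);  // fast node
        ratios.put("nodeB", 2.0);
        ratios.put("nodeC", 1.0);  // slow node
        System.out.println(placeChunks(100, ratios)); // {nodeA=54, nodeB=31, nodeC=15}
    }
}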
  • 18. MARLA: MapReduce for Heterogeneous Clusters [2]

Z. Fadika et al. implemented MARLA, a MapReduce framework with dynamic load balancing which can be adapted to homogeneous, heterogeneous, and even load-imbalanced environments. MARLA relies on a basic shared file system as its input/output management technique. The idea of this model rests on dynamic task scheduling, which allows the nodes of a Hadoop cluster to request tasks when required. Previously, in Hadoop MapReduce, tasks were evenly distributed and pre-assigned to the nodes before running a given application, but in MARLA the nodes in the cluster must request work when they are done executing their previous tasks. The main node is responsible for registering the number of available tasks, and nodes are assigned a token identifying the process, which can be used to request tasks. When a task is requested by a particular node, that task becomes unavailable to the rest of the processing nodes. A node can request a job only when it has successfully completed its previous tasks, and hence in this scheme both fast and slow nodes process their fair share.

MARLA is composed of three integral components: the splitter, the task-controller, and the fault-tracker. The splitter is used for input and output management, the task-controller is responsible for task assignment and for checking concurrency, and the fault-tracker is used for fault tolerance. The splitter handles splitting and distributing the dataset: the framework takes input fragments as tasks, whose number is chosen by the user. The scheme exploits the data visibility provided by the shared-disk file system to present input data to the cluster nodes, so input distribution is executed directly through the shared file system. The task-controller makes tasks, created from the data fragments produced by the splitter, and the user's map and reduce code available to the processing nodes via the shared file system. It frequently checks the progress of tasks; failed tasks are sent to a task bag through the fault-tracker component, put on short-term leave, and retried later, while completed tasks are shifted to a completed-task bag and moved to the reduce phase.
  • 19. HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL [3]

In distributed parallel computing, as complexity rises, three challenges grow with it: the programmability, reliability, and energy efficiency of the system, and attempts to avoid these three problems can themselves hinder performance. In this work M. Grossman et al. introduced the idea of integrating Hadoop MapReduce with OpenCL to facilitate the use of heterogeneous processors in a distributed system. Incorporating OpenCL with Hadoop provides: first, a user-friendly, flexible, and easily learnable application programming interface in a high-level, widely used programming language; second, the reliability of a distributed file system; and third, minimal power utilization while leveraging the performance of heterogeneous processors. By adopting the new HadoopCL paradigm, all three challenges can be managed without sacrificing performance in a Hadoop distributed system.

The functionality of HadoopCL includes the following. First, to lessen the modification of legacy code, HadoopCL extends the Hadoop framework's mapper and reducer classes to support execution of user-written Java kernels on heterogeneous Hadoop clusters. Next is the adoption of dedicated communication threads and asynchronous communication to escalate utilization of the available bandwidth and restrain communication blockage. Third, HadoopCL translates Java bytecode to OpenCL kernels automatically using APARAPI, with translated extensions to APARAPI's existing features. Lastly, HadoopCL's performance was evaluated on two multi-node clusters comprising multicore CPUs, GPUs, and APUs. HadoopCL depends on the APARAPI tool to translate Java bytecode to OpenCL kernels; OpenCL kernel code is produced for the user-written map and reduce modules and even for the HadoopCL glue code, whose function is to pass keys and values into the user-written functions. HadoopCL can modify its own memory access arrangement and loop iteration for the best system performance. Presently it provides optimization for GPUs and multicore CPUs, and APARAPI was extended to aid asynchronous kernel execution, accomplished by reforming the APARAPI C++ runtime to store references to OpenCL events.
  • 20. Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments [4]

At present, Hadoop is used for heterogeneous data handling and management, which brings the additional challenges of efficient cluster administration and job management. Given this heterogeneity of data and resources, it is not clear which system resources lead to performance hindrance and bottlenecks. To provide a mechanism for configuring and optimizing such Hadoop clusters, Z. Zhang et al. analyzed the efficiency and precision of the bounds-based performance (BBP) model, and using this model they estimated the completion time of MapReduce jobs on heterogeneous clusters.

The BBP paradigm measures the upper and lower limits of job finishing time. The model relies on the makespan theorem, which is used to calculate performance bounds on the completion time of a given set of n tasks processed by k servers. A greedy algorithm is used for allotting tasks to slots; this is an online allocation technique in which the slot that finishes executing its previous task earliest is assigned a new task. The lower bound is then the product of the average task duration and the ratio of the n tasks to the k servers, and the upper bound is the sum of the maximum task duration and the product of the average task duration and the ratio of n-1 tasks to k servers. The difference between the two values indicates the range of attainable completion times due to task scheduling and non-determinism. To approximately reckon the total finishing time of a submitted job, the median and maximum task durations are first measured at the different stages of job execution, i.e. the map phase, the shuffle/sort phase, and the reduce phase; these statistics can be retrieved from the job execution record. The completion time of each processing stage can then be computed using the bound paradigm above.
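In symbols (our notation, not the paper's), let \mu be the average and \lambda the maximum duration of the n tasks executed on k slots. The bounds just described are then

T_{low} = \mu \cdot n / k, \qquad T_{up} = \mu \cdot (n - 1) / k + \lambda

and the achievable makespan of a stage always lies between T_{low} and T_{up}.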
  • 21. Dynamic Job Ordering and Slot Configurations for MapReduce Workloads [5]

MapReduce performance and resource utilization vary with the map-reduce slot configuration and the job execution order, so S. Tang et al. introduced two classes of algorithms to reduce the makespan and the total completion time of an offline workload. The first class of algorithms optimizes job ordering for a given map-reduce slot configuration, and the second class optimizes the slot configuration itself.

The algorithm used for optimizing job order is MK_JR, based on Johnson's Rule for makespan optimization. Johnson's Rule provides the best job order for makespan when there is exactly one map slot and one reduce slot; in general, when arbitrary numbers of map and reduce slots are available, minimizing makespan is NP-hard. The MK_JR algorithm produces a makespan within a factor 1+δ of the minimum, where δ<1 can be reckoned as the ratio of the sum of the maximum map and reduce task sizes to the sum of all task sizes. δ is a very small value because the time needed to process a single map or reduce task is very small compared to the processing time of the overall MapReduce workload.

Another algorithm, MK_TCT_JR, optimizes makespan and total completion time concurrently; it is a bi-criteria heuristic that tunes parameter values by observing the significant trade-off between completion time and makespan. An optimized map-reduce slot configuration is obtained by computing and verifying all possible values from 1 to S-1, where S is the total number of slots; when S becomes very large this search may be inefficient, so a proportional configuration property is used to overcome the problem.
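Written out (with m_i and r_j denoting the map and reduce task durations; the notation is ours), the approximation factor above is

\delta = ( \max_i m_i + \max_j r_j ) / ( \sum_i m_i + \sum_j r_j ) < 1

so the larger the workload is relative to its largest single task, the tighter the MK_JR guarantee becomes.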
  • 22. Chapter 3 SOFTWARE REQUIREMENT SPECIFICATION

3.1 Introduction
This chapter discusses the various requirements of the project, such as the software and hardware prerequisites, the functional and non-functional requirements, and the constraints the system must adhere to. This section of the report also includes the purpose and the project perspective.

3.2 Purpose
The main purpose of this project is to enhance the performance of the system by scaling down completion time using the dynamic, self-tunable slot technique, which in turn improves the resource utilization of the overall system.

3.3 Project Perspective
The elementary Hadoop cluster is predefined with a fixed arrangement of the slots: the number of slots for the map and reduce stages is permanently defined at the beginning of cluster formation and cannot be altered later on. This elementary mechanism hinders the performance and optimality of the entire system and induces under-utilization of resources among the nodes. Many techniques were projected to address this problem and the complications generated by former methods, for example:
- Quincy et al. [6] adopted locality constraints and fairness constraints for dealing with job allocation and management complications.
- Zaharia et al. [7] suggested delay scheduling to boost the optimality of the fair scheduler by leveraging dataset locality.
- Verma et al. [8] projected a heuristic to scale down the makespan of a set of separate, self-reliant MapReduce jobs by applying the classic Johnson's algorithm.

To overcome the limitations of all the above techniques, this project implements a self-tunable slot assignment technique, in which map and reduce slots are allocated dynamically based on feedback collected from the workload-computing component. The first integral component, the workload estimator, calculates the execution time of foregoing jobs; this is then sent to the next vital component, the slot scheduler, which properly assigns slots to map and reduce.
  • 23. 3.4 Functional Requirements
Functional requirements express the behavior of the project and the purpose and role of each component. They are represented by inputs, outputs, and the behavior produced for a specified input. Functional requirements are considered while implementing the design phase of the system, and the behavior of the whole project is realized using use cases. A use case depicts the behavior of, and relationships between, the components or modules of the project; the use case description for this project is illustrated in the system design chapter.

3.5 Non-Functional Requirements
Non-functional requirements depict conditions which can be used to analyze the working of a project rather than its behavior. Requirements of this kind are depicted elaborately in the groundwork of this project. They indicate characteristics like security and usability, termed execution qualities, and traits like reliability, performance, optimality, consistency, and maintainability, termed evolution qualities, which are also described for the projected application.

3.5.1 Performance
Compared to other existing slot assignment techniques, the self-tunable slot allocation mechanism adopted in this project performs well: it lowers the makespan of the job batch and increases resource utilization.

3.5.2 Optimality
Optimality defines how well the application runs in any circumstance, irrespective of the input dataset, resulting in an effective and efficient processing method. In this application, optimality is realized through the newly formulated methodology applied to the specific dataset.
  • 24. 3.5.3 Reliability
The reliability of the application mainly depends on efficiency, throughput, and performance, which directly or indirectly affect the overall behavior of the system for the given data and its required properties. This application is highly reliable: it allocates slots based on feedback about the timing of foregoing jobs, thereby making proper use of resources.

3.5.4 Portability
Portability indicates how readily a project can be carried over to another platform or environment. This application makes use of the open-source Hadoop architecture and Java as the coding language, so it is compatible with different platforms and datasets and is easily portable.

3.5.5 Security
The security of this application depends on the Hadoop tool utilized; it is not affected by the dataset being used.

3.6 System Specification

3.6.1 Hardware Specification
- System Processor: Pentium IV, 2.4 GHz
- Secondary Storage: 40 GB
- Primary Memory: 4 GB

3.6.2 Software Specification
- Platform: Windows 7 / Ubuntu
- Programmed in: Java 1.7, Hadoop 0.8.1
- Interface: Eclipse
- RDB: MySQL
  • 25. Chapter 4 SYSTEM DESIGN

System design is a substantially important phase in the project building and creation stages of the whole development cycle. It is a mechanism for depicting and representing the overall architecture of the system, the interfaces between its different components, the methods and parameters defined for each module, and the data of the system, according to the requirements specified by the user. To design a system, the first step is to collect the system requirements, the functional and non-functional requirements, and the constraints from the user. The second step is designing the system in an abstract manner; this step provides an outline of all major components required for the system architecture. The third step is detecting and addressing bottlenecks generated in the abstract, high-level design due to the violation of constraints specified by the user. The next step is designing the system in a more elaborate, detailed manner, specifying the methods, parameters, and interfaces of the application components.

4.1 High Level Design
High-level design reveals an abstract layout of the entire application, pictorially depicting the primitive constituents of the system to be developed. The architecture of the system and the diagrams depicting the flow of relationships and of data are all considered high-level designs; these designs are written in non-technical terms with slight additional technical vocabulary.

4.1.1 System Architecture
The architecture projects a blueprint of the entire system in a pictorial illustration. In this project the architecture has three major components, the job assigner, the slot assigner, and the task processor, as shown in figure 4.1. When the user submits a batch of jobs to the system, it is first sent to the job-assigner component. This component in turn contains two sub-components, the slot scheduler and the workload estimator. The workload estimator repeatedly collects the execution time details of lately completed tasks at periodic intervals.
  • 26. This value is then used for reckoning the current map-reduce workload at hand. Based on these estimations, the second integral part, the slot scheduler, adjusts and assigns the slot ratio for the map and reduce slave nodes.

Figure 4.1 System Architecture

4.1.2 Data Flow Diagram
A data flow diagram (DFD) is constructed using geometrical components to represent the flow of data between the modules of a system; another name for the DFD is the bubble chart. A DFD may be used to define an abstraction of the whole application at any level, the context diagram being contemplated as the topmost level of abstraction. Figure 4.2 portrays the data flow diagram: when the user logs in, he is provided with two options, working on a homogeneous or a heterogeneous cluster, after which he submits the job to be processed. The system then examines whether the job is scheduled for processing or is waiting in the queue.
  • 27. Once the job is scheduled for carrying out work, the workload is verified, the job is split into tasks, and in the next step each task is assigned to a slot for execution.

Figure 4.2 Data Flow Diagram
  • 28. 4.2 Low Level Design
Low-level design describes the major components of the system in a detailed and elaborate manner, so this technique is also called detailed design. Here the diagrams are constructed by iteratively refining the given details, requirements, and constraints, and they depict the modules, their parameters, their methods, and the relationships among them.

4.2.1 Use Case Diagram
A use case diagram is a simple mechanism for demonstrating how the user interacts with the functions of the system; it falls under the behavioral drawing category of UML design and can be used to identify the different users and describe their behavior towards the different use cases.

Figure 4.3 Use Case Diagram

Figure 4.3 portrays the use case representation, where the user interacts with functional components like the job tracker and the reduce process to assign a job to the system. In this diagram there are two different users: the first submits the job for processing, and the second is the one who requires the content.
  • 29. 4.2.2 Sequence Diagram
A sequence diagram portrays how interplay occurs between the different modules of the application; it falls under the interaction drawing category of UML design. This diagram shows how processes interact, their order, and the sequence of messages sent and received, all within a time frame, so these diagrams are also referred to as event diagrams.

Figure 4.4 Sequence Diagram

Figure 4.4 portrays the sequence diagram, which depicts the interplay between three components: the user job, the job tracker, and the task tracker. The user job process first interacts with the job tracker by sending a request message to process the user job.
  • 30. The job tracker then sends the user data to the task tracker, which processes the job; after execution, the task tracker sends a response message with the user's results back to the user.

4.2.3 Activity Diagram
An activity diagram is a kind of behavioral representation which describes how work flows through the overall system; it is used to describe the actions, interactions, and activities of the system step by step, and can also be viewed as a type of flowchart. Figure 4.5 showcases the activity diagram, which describes the workflow between all the modules, i.e. the job tracker, the mapping process, and the reducing job.

Figure 4.5 Activity Diagram
  • 31. 4.2.4 Collaboration Diagram
A collaboration diagram, also called a communication diagram, is a type of interaction diagram in UML design which describes the interplay among the modules of a system through messages, as depicted in figure 4.6. This diagram combines both the dynamic behavior and the static details of a system, and therefore it can be formed from details taken from the use case diagram, the sequence diagram, the class diagram, and so on.

Figure 4.6 Collaboration Diagram
  • 32. Chapter 5 IMPLEMENTATION

5.1 System Implementation
The implementation stage of application creation is the actualization of ideas, design, and requirement specification into source code. The primary objective of the implementation part of building a project is the production of source code in good style, with comments where necessary, by applying a proper and suitable coding technique with the help of proper documents. Program code is created in accordance with structured coding techniques, which adhere to control flow, so that the execution sequence follows the order in which the code is scripted. This makes the code unambiguous and more readable, which eases understanding, modifying, debugging, testing, and documenting the programs.

5.2 Modules
The modules implemented in this application for scaling down the makespan of the submitted jobs and efficiently utilizing resources are described as follows.

5.2.1 Batch Processing of Jobs
In this module, jobs are submitted in batches; the jobs are then processed batch by batch for ease of understanding.

5.2.2 Estimation of Workload
In the basic version, the assessment of workload was obtained from the amount of remaining jobs for the map and reduce stages. The projected idea relies on workload details known beforehand; such details can be collected from task configurations, a training stage, or some factual data settings, but in some cases the information regarding the workload may not be precise or accessible. For that situation, this module estimates the workload without any previous data: it first considers the incomplete tasks of both the map and reduce stages, then sums their execution times and uses the sum as the estimator value; jobs present in the waiting queue are not considered in this calculation.
  • 33. 5.2.3 Feedback for Workload
The technique described in the last section is most suitable for a homogeneous environment where all the nodes run the same type of job, with similar configuration and similar system resource usage. For the now-common heterogeneous setting, where the slot allocation changes dynamically, a new technique is implemented which uses feedback from foregoing jobs. This proposal helps to balance and accommodate slots automatically.

5.2.4 Job Manager
In this module, the job manager is used to manage the resources available to the task administrator. The job manager has three sections, the workload estimator, the slot scheduler, and the scheduler; the integral slot scheduler component is used to assign jobs to the task administrator.

5.2.5 Task Administrator
In this module, the task administrator performs the job instructed by the job manager, with the guidance of the task manager. The task manager implements a job in two phases: map and reduce.

5.3 Snapshots of Code Snippets

[Snapshot 5.1: Map Function — image not reproduced here]
The snippet shows the code for the map stage, which extends Hadoop's predefined base class.
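Because the snapshot itself is only an image, the following is a representative sketch of the kind of map class the report describes, extended from Hadoop's predefined Mapper base class. It is a WordCount-style illustration written by us, not the report's actual snippet; the class and variable names are assumptions.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Tokenize the input line and emit (token, 1) for the reduce phase.
        StringTokenizer it = new StringTokenizer(value.toString());
        while (it.hasMoreTokens()) {
            word.set(it.nextToken());
            context.write(word, ONE);
        }
    }
}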
Snapshot 5.2: Overridden Map Function

Snapshot 5.2 is the code snippet showing the overridden map procedure.

Snapshot 5.3: Overridden Reduce Function

Snapshot 5.3 above is the code snippet for the overridden reduce function.
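As with the map side, the reduce-side snapshots are images in the original report; a minimal sketch of an overridden reduce function consistent with that description could be as follows (the class name SumReducer is an illustrative assumption, paired with the hypothetical TokenMapper above):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reduce class overriding Hadoop's predefined reduce method;
// sums the counts emitted by the mapper for each key.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get(); // aggregate all partial counts for this key
        }
        result.set(sum);
        context.write(key, result);
    }
}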
Snapshot 5.4: Driver Configuration

Snapshot 5.4 above gives an insight into the driver configuration, an integral component.
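For completeness, a typical Hadoop driver that wires a mapper and reducer together and configures the job, in the spirit of the snapshot, might be sketched as follows (JobDriver, the job name and the reuse of the hypothetical TokenMapper and SumReducer above are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: configures the MapReduce job, sets the mapper,
// reducer and I/O paths, then submits it and waits for completion.
public class JobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(JobDriver.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // dataset directory
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // results directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}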
Chapter 6

TESTING

The function of testing is to discover errors: it is used to determine every plausible fault or glitch that may have been introduced into the product. It presents a way of assessing the functionality of components, modules, sub-assemblies, assemblies and the completed product through thorough examination. It is the phase of product development used to make sure that the system is designed according to the specified requirements, adheres to the user-specified constraints, meets user expectations and does not fail in an unacceptable manner.

6.1 Testing Types

Testing is carried out methodically, using several kinds of testing at different stages of the developed product. Some of the testing types are:

6.1.1 Unit Testing

Unit testing is a technique for checking an individual module or piece of code of the final product with related program inputs, so that it produces valid and desired outputs. For unit testing, the individual constituents of the application are tested with test cases written for each integral component, considering both positive and negative inputs. Every decision condition and internal code path should be verified for the desired output through thorough examination and scrutiny (a small example is sketched at the end of this section).

6.1.2 Integration Testing

Integration testing is a mechanism for examining integrated module code. This technique uses an iterative approach in which individually unit-tested components are combined and verified for errors and proper functionality. This step is repeated until all designed and tested components have been integrated. Integration testing is specifically used to uncover problems that arise from the combination of components.

6.1.3 System Testing

System testing involves checking the complete integrated software product to verify that it adheres to the user requirements; it can detect errors both in the integrated modules and in the system as a whole. System testing considers traits such as usability, performance, optimality and exception handling, as well as volume and load testing.

6.1.4 Functional Testing

Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical specifications, documentation and user manuals. This kind of testing is focused on key functionality, requirements and special test cases. In addition, systematic coverage of business process flows, data fields, predefined processes and successive processes must be considered; before functional testing is complete, additional tests are identified and the effective value of the current tests is determined.

6.1.5 Acceptance Testing

Acceptance testing is a method of testing specifically designed to examine whether the product or application meets all the requirements described by the user and adheres to the constraints specified. It is conducted after system testing, after which the product is made available to the user; it can be carried out as a black box testing technique.

6.1.6 White Box Testing

White box testing is a mechanism that checks the internal workings, i.e. the code, of the product; test cases are designed based on details of the control and data flow, paths and branch conditions. This technique checks the system at code level and can be incorporated into unit testing as well as integration and regression testing.

6.1.7 Black Box Testing

Black box testing is a mechanism designed to verify the functionality of the system's components, modules, or the whole project. It does not consider the internal structure of the code: it provides inputs and checks the outputs without considering how the inner software structure works. This testing is suitable for most testing levels, such as acceptance, system, integration and unit testing.
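As the example promised in Section 6.1.1, a single unit test for the hypothetical TokenMapper from Chapter 5 could be written with JUnit and the Apache MRUnit library (assumed here to be on the classpath; the test class and input values are illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

// Hypothetical unit test: feeds one input record to the mapper and
// asserts that the expected (word, 1) pairs are emitted in order.
public class TokenMapperTest {

    @Test
    public void mapEmitsOneCountPerToken() throws Exception {
        MapDriver<LongWritable, Text, Text, IntWritable> driver =
                MapDriver.newMapDriver(new TokenMapper());
        driver.withInput(new LongWritable(0), new Text("hadoop slot"))
              .withOutput(new Text("hadoop"), new IntWritable(1))
              .withOutput(new Text("slot"), new IntWritable(1))
              .runTest(); // fails the test if the actual output differs
    }
}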
6.2 Test Cases

A test case is a document that contains the set of prerequisites with which an application can be tested; it can be used for checking individual modules or the whole application. A test case can be formal (also called technical) or informal (non-technical): in a formal draft the application is tested for both positive and negative circumstances, while the informal kind has no technical specification and is based on the working of the application. Test cases can be documented in terms of the technicalities of the modules, the methodology, scenarios and cases.

6.2.1 Scenario for Hadoop Initialization

Test Case ID: Test_Case_01
Scenario: Hadoop Initialization
Description: This case checks for the proper initialization of Hadoop. To initialize Hadoop, it must be installed and configured, and the path must be set properly.
Input: Enter the script that starts the initialization of Hadoop in the terminal.
Expected Result: The specified Hadoop version with its related files should be loaded.
Actual Result: Hadoop of the specified version is initialized with its related files.
Remarks: Pass

Table 6.1 Test Case of Hadoop Initialization
6.2.2 Scenario for Hadoop Cluster Formation

Test Case ID: Test_Case_02
Scenario: Hadoop Cluster Formation
Description: The prerequisite for cluster formation is ensuring that the Hadoop package is available for use from the same specified path on all the nodes. The cluster is then formed from the required number of nodes.
Input: Enter the scripts start-dfs.sh and start-mapred.sh in the terminal.
Expected Result: A Hadoop cluster with the given number of nodes should be defined.
Actual Result: The required Hadoop cluster is created.
Remarks: Pass

Table 6.2 Test Case of Hadoop Cluster Formation

6.2.3 Scenario for Verification by JPS Command

Test Case ID: Test_Case_03
Scenario: Verification by JPS Command
Description: The Java Virtual Machine Process Status tool is used to verify that Hadoop daemons such as the job history server, name node and resource manager are functioning properly.
Input: Enter the jps command in the terminal.
Expected Result: JPS should run and list all the required Hadoop processes.
Actual Result: JPS runs and the required processes, such as the name node and resource manager, are listed.
Remarks: Pass

Table 6.3 Test Case of Verification Using the JPS Command

6.2.4 Scenario for Port Programming

Test Case ID: Test_Case_04
Scenario: Port Programming
Description: Processing of the port assigned to Hadoop.
Expected Result: The port allocated to Hadoop should be properly configured and programmed.
Actual Result: The port assigned to Hadoop is programmed.
Remarks: Pass

Table 6.4 Test Case for Port Programming
6.2.5 Scenario for Configuring the Environment for the Dataset

Test Case ID: Test_Case_05
Scenario: Setup for submitting a dataset
Description: Configure a directory/folder into which the source file is placed.
Input: A specific dataset.
Expected Result: The source file for the specific dataset should be created according to the Hadoop parameters in the Hadoop environment.
Actual Result: The source file for the specific dataset is created.
Remarks: Pass

Table 6.5 Test Case for Configuring the Environment for the Dataset

6.2.6 Scenario for Processing Data Using MapReduce

Test Case ID: Test_Case_06
Scenario: Processing data using MapReduce
Description: Hadoop processes the large dataset with the MapReduce technique, which contains map and reduce stages for analyzing the data effectively.
Input: Any specific dataset.
Expected Result: Hadoop should reduce the processing time by applying the MapReduce technique to the specific dataset.
Actual Result: The Hadoop processing time is reduced by using the MapReduce technique.
Remarks: Pass

Table 6.6 Test Case for Usage of MapReduce

6.2.7 Scenario for Result Analysis

Test Case ID: Test_Case_07
Scenario: Result Analysis
Description: After a specific dataset is submitted for processing, Hadoop uses the MapReduce mechanism to process the data; the results are then produced as graphs.
Input: Submit a dataset to the MapReduce framework.
Expected Result: Different kinds of graphs that describe the results effectively should be generated.
Actual Result: The application generates graphs for the given dataset.
Remarks: Pass

Table 6.7 Test Case for Result Analysis
Chapter 7

RESULT ANALYSIS

Snapshot 7.1: Initialization of Hadoop Cluster

Snapshot 7.1 shows the initialization of Hadoop cluster 1, and snapshot 7.2 illustrates the Hadoop cluster parameters, such as the name node, tmp, data node and pid settings. Initialization is carried out through a set of commands in the terminal.

Snapshot 7.2: Parameters of Hadoop Cluster
Snapshot 7.3: Start Hadoop Services

The snapshot above shows how the Hadoop services are started with the start-all.sh command, which starts all the services: the name node, resource manager, secondary name node and node manager.

Snapshot 7.4: Hadoop Page

The snapshot shows the overview page of the Hadoop cluster, which comprises the start date, version, cluster ID and some additional configuration details.
Snapshot 7.5: JPS Verification

Snapshot 7.5 shows the JPS check: in any Hadoop deployment, the jps command (the Java Virtual Machine Process Status tool) is used to verify that all the Hadoop daemons are functioning properly. Snapshot 7.6 portrays how the path for the source file is set.

Snapshot 7.6: Setting Path for Source File
Snapshot 7.7: MapReduce Process

Snapshot 7.8: Main Page of Slot Configuration
Snapshot 7.9: Bar Graph Illustrating the Result

Snapshot 7.10: Line Graph Illustrating the Result

The results of the application are depicted in snapshots 7.9 and 7.10, which show the bar and line graphs of the result analysis.
CONCLUSION

This project presented a new slot assignment technique, called TuMM, for dynamically assigning slots in Hadoop. Its vital purpose is to boost resource utilization and scale down the makespan of a given set of n jobs. The mechanism is suitable for homogeneous clusters; for heterogeneous Hadoop clusters, a modified version of the mechanism, called H-TuMM, is introduced, which improves performance by configuring the slots separately for every node. With this new slot allocation method, the project shows a decrease in completion time of about 28%.
REFERENCES

[1] J. Xie, S. Yin, X. Ruan, Z. Ding, Y. Tian, J. Majors et al., "Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters," in Proc. IEEE Int. Symp. on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010.

[2] Z. Fadika, E. Dede, J. Hartog and M. Govindaraju, "MARLA: MapReduce for Heterogeneous Clusters," in Proc. 12th IEEE/ACM Int. Symp. on Cluster, Cloud and Grid Computing (CCGrid), 2012.

[3] M. Grossman, M. Breternitz and V. Sarkar, "HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL," in Proc. IEEE 27th Int. Parallel and Distributed Processing Symp. Workshops and PhD Forum (IPDPSW), 2013.

[4] Z. Zhang, L. Cherkasova and B. T. Loo, "Performance Modeling of MapReduce Jobs in Heterogeneous Cloud Environments," in Proc. IEEE 6th Int. Conf. on Cloud Computing (CLOUD), 2013.

[5] S. Tang, B. S. Lee and B. He, "Dynamic Job Ordering and Slot Configuration for MapReduce Workloads," IEEE Transactions on Services Computing, vol. 9, no. 1.

[6] M. Isard, V. Prabhakaran, J. Currey et al., "Quincy: Fair Scheduling for Distributed Computing Clusters," in Proc. SOSP '09, 2009, pp. 261–276.

[7] M. Zaharia, D. Borthakur, J. S. Sarma et al., "Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling," in Proc. EuroSys '10, 2010.

[8] A. Verma, L. Cherkasova and R. H. Campbell, "Two Sides of a Coin: Optimizing the Schedule of MapReduce Jobs to Minimize Their Makespan and Improve Cluster Performance," in Proc. MASCOTS '12, Aug. 2012.