SlideShare a Scribd company logo
1 of 6
Download to read offline
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 162
GEO DISTRIBUTED PARALLELIZATION PACTS IN MAP REDUCE
FRAMEWORK IN MULTIPLE DATACENTER
C.Kirubanantham1
, C.Rajavenkateswaran2
1
Student, computer science engineering, Nandha College of Technology, India
2
Assistant professor, Information Technology, Nandha College of Technology, India
Abstract
MapReduce framework in Hadoop plays an important role in handling and processing big data. Hadoop is scalable that is it can
reliably store and process petabytes. MapReduce works by dividing input files into chunks and processing these in a series of
parallelizable steps. MapReduce framework offers a response to the problem by distributing computations among large sets of nodes.
we concentrate on geographical distribution of data for sequential execution of MapReduce jobs to optimize the execution time. The
fixed execution strategy of MapReduce program is not optimal for many task and as it does not know about the behavior of the
functions. Thus, to overcome these issues, we are enhancing our proposed work with parallelization contracts. The parallelization
contracts include input and output contract which includes the constraints and functions of data execution. These contracts help to
capture a reasonable amount of semantics for executing any type of task with reduced time consumption.
Keywords— Bigdata, Datacenter, Geo-distributed, Hadoop, MapReduce, PACT
----------------------------------------------------------------------***---------------------------------------------------------------------
1. INTRODUCTION
Data comes from many sources such as social media websites,
sensors to gather climate information, trajectory information,
transaction records, other web site usage data etc. it is called
Big data. The limits to what can be done are often times due to
how much data can be processed in a given time-frame. Paral-
lelization contract framework have become one of the toolkit
for processing large datasets using cloud computing resources,
and are provided by most cloud vendors. GEO-PACT, a system
for efficiently processing geo-distributed big data GEO-PACT
is a Hadoop based framework that can efficiently perform a
sequence of Parallelization Contracts jobs on a geo-distributed
dataset across multiple datacenters. Sequences of Paralleliza-
tion Contracts jobs are executed on a given input by applying
the first job on the given input, applying the second job on the
output of the first job, and so on. GEO-PACT acts much like
the atmosphere surrounding the clouds. The problem of execut-
ing geo-distributed Parallelization Contracts job sequences as
arising in “cloud of clouds” scenarios is analyzed for job ex-
ecution. Executing individual Parallelization Contracts jobs in
each datacenter on corresponding inputs and then aggregating
results is defined as multiple execution path. The datacenter
with optimized execution path is selected for job execution.
2 HDFS
The Hadoop Distributed File System (HDFS). Provides redun-
dant storage for massive amounts of data using inexpensive
commodity hardware and serves as the large scale data storage
system. The NameNode splits large files into fixed sized data
blocks which are scattered across the cluster. Typically the data
block size for the HDFS is conFigd as 64MB, but it can be
conFigd by file system clients as per usage requirements. The
data storage is of type write once/read many (WORM) and
once written, the files can only be appended and cannot be
modified to maintain data coherency. Since HDFS is built on
commodity hardware, the machine fault rate is high. In order to
make the system failure tolerant, data blocks are replicated
across multiple DataNodes.
2.1 NameNode
NameNode is important for HDFS to store its metadata relia-
bly. Furthermore, while the file data is accessed in a write once
and read many model, the metadata structures (e.g., the names
of files and directories) can be modified by a large number of
clients concurrently. It is important that this information is
never desynchronized. Therefore, it is all handled by a single
machine, called the NameNode. The NameNode stores all the
metadata for the file system. Because of the relatively low
amount of metadata per file (Information about file locations in
HDFS, Information about file ownership and permissions,
Names of the individual blocks, Locations of the blocks of
each file), all of this information can be stored in the main
memory of the NameNode machine, allowing fast access to the
metadata.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 163
Fig 1.1 HDFS Architecture
2.2 Secondary NameNode
The Secondary NameNode is not a failover NameNode and it
performs memory-intensive administrative functions for the
NameNode. Secondary NameNode should run on a separate
machine in a large installation. The secondary name-node is to
perform periodic checkpoints. The secondary name-node pe-
riodically downloads current name-node image and edits log
files, joins them into new image and uploads the new image
back to the (primary and the only) name-node. If the name-
node fails and it will restart on the same physical node then
there is no need to shutdown data-nodes, just the name-node
need to be restarted. If the old node cannot use anymore you
will need to copy the latest image somewhere else. The latest
image can be found either on the node that used to be the pri-
mary before failure if available; or on the secondary name-
node.
2.3 DataNode
A DataNode is a storage server that accepts read/write requests
from the NameNode. DataNodes store data blocks for local or
remote clients of HDFS. Each data block is saved as a separate
file in the local file system of the DataNode. The DataNode
also performs block creation, deletion and replication as a part
of file system operations. For keeping the records up-to-date,
the DataNode periodically reports all of its data block informa-
tion to the NameNode. DataNode instances can talk to each
other for data replication. To maintain its live status in the clus-
ter, it periodically sends heartbeat signals to the NameNode.
When the NameNode fails to receive heartbeat signals from the
DataNode, it is marked as a dead node in the cluster.
3. GEO-DISTRIBUTION
Applications on cloud environment are geographically distri-
buted, for the reasons: to access data frequently; data’s are col-
lected and stored by different organizations to shared towards a
common goal; datasets are replicated across datacenters for
availability. Geo-PACT is a Hadoop-based system that can
efficiently process concurrent jobs on a geographically distri-
buted dataset.
Consider n datacenters, given by DC1, DC2…DCn with input
sub datasets I1, I2,…,In respectively. The total amount of input
data is thus |I|=n
i=1|Ii|. The bandwidth between the datacenters
DCi and DCj (i!= j) is given by Bi,j and the cost of transmitting
one unit of data between two datacenters is Ci,j. We define an
initial minimum partition size of input dataset to be transferred
across datacenters. On this geo-distributed input dataset se-
quence of jobs J1, J2,…Jn have to be executed. In each job
parallelization contract (PACTs) is applied. PACTS consist of
input and output contract. Input contract consist of five con-
tracts and output contract may be optional.
4 THE PACT PROGRAMMING MODEL
The PACT Programming Model is an extension of map/reduce
programming model. It operates over a key/value couple data
model so called Parallelization Contracts (PACTs). Following
the PACT programming model, programs are implemented by
providing task specific user code (the user functions, UFs) for
selected PACTs and assembling them to a work flow. A PACT
defines properties on the input and output data of its associated
user function. Figure 2 shows the parts of a Parallelization
Contracts, contains only one Input Contract and an optional
Output Contract. The Input Contract of a Parallelization Con-
tracts designates how the user function can be evaluated in
parallel. Optional output Contracts allow the optimizer to infer
certain properties of the output data of a user function and to
create a more efficient execution strategy for a program.
Fig 4.1 PACT Components
4.1 The Map Input Contract
The Map contract is used to process each key/value couple
independently; the Map input contract states that the user code
is invoked only once for each key-value couple of the input
data set.
Fig 4.2 Map Input Contract
Data
access
Client app
Name node
Secondary
name node
Data node Data node Data node
Metadata request
Independent
subset
User code
First-
order
function
Output
contract
Output
data
ValueKey
Input
data
ValueKey
Input
data
Independent
Invocations
of user code
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 164
4.3 The Reduce Input Contract
PACT Reduce input contract, in which all key-value couple of
a PACTs input data are grouped with an identical key. The user
code is attached to the Reduce contract is invoked for each of
these groups independently.
Fig 4.3 Reduce input contract
4.4 The Cross Input Contract
Multi-input contracts is the Cross input contract. The user code
attached to those multi-input contracts expects to receive data
from two specific data sources as input. Multi-input contracts
also construct the subsets for the user code based on two dif-
ferent input set and builds the Cartesian product of the two
inputs. All couple in the Cartesian product are then processed
independently by separate calls of the user code.
Fig 4.4 Cross input contract
4.5 The CoGroup Input Contract
CoGroup input contract, which is also a multi-input contract.
Independent subsets are built by joining the groups with same
keys of all inputs. The key/value couple of all inputs with the
same key are assigned to the same subset.
Fig 4.5 Cogroup input contract
4.6 The Match Input Contract
The Match input contract is a multiple input contract. All com-
binations of key-value couples with same keys are built in the
input data sets. After all groups are processed independently by
separate invocations of the attached user code. The Match in-
put contract performs equi-join on the key.
Fig 4.6 Match input contract
4.7 The PACT Output Contracts
Output Contracts are capable to declare indisputable properties
of user functions to maximize the efficiency of the task execu-
tion. An example of such an output contract is the SameKey
contract. When attached to a Map function then the user code
will not change the key, i.e. the type and value of the key re-
main after the user code’s invocation and are the same in the
output as in the input. Those references can be used by an op-
timizer that to generates parallel execution plans. The men-
tioned SameKey contract can frequently help to avoid unneces-
sary repartitioning and therefore expensive data shipping.
Hence, Output Contracts can significantly improve the runtime
of a PACT program.
5. OPERATION
On given geo-distributed input dataset, a sequence of opera-
tions is performed. In the following we focus on parallelization
contracts jobs J1 , J2 , ..., J m giving rise to a sequence of 5xm
Input
data A
ValueKey
Independent
Invocations
of user code
Input data B
Value
Key
Independent
Invocations
of user code
Input
data
ValueKey
Input data B
Independent
Invocations
of user code
ValueKey
Input
data A
ValueKey
Independent
Invocations
of user code
Input
data A
Val-
ue
Key
Input data BValue
Key
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 165
operations. Since each parallelization contracts job consists of
five major phases/operations (map, cross, match, cogroup, re-
duce). We define the state of data before a phase as a
stage identified by 0..5m. So input data is in stage 0 and final
output data received after applying MapReduce jobs 1...m is in
stage 5m.
To move data from stage s to next stage s + 1 a paralleli-
zation contracts phase is applied to data partitions and the same
number of (output) data partitions are created. The initial kth
partition of data is denoted by Pk0, and Pks represents the out-
put after executing parallelization contracts phases 1..s on par-
tition Pk0. Before performing a MapReduce phase, a partition
present in a datacenter may be moved to another datacenter. To
make our solution tractable we only allow full partitions of
data to be copied. The move may be for an initial partition or a
derivative of it received after executing one or more paralleli-
zation contracts phases. Initial partition sizes can be used as
parameter to trade accuracy and computation costs.
In this section executing sequence of parallelization contracts
on given input datasets. One of the main execution path is mul-
tiple execution path. Execution path for executing individual
parallelization contracts in each datacenter on given sub data-
sets and aggregating result from that datacenter is called mul-
tiple execution path. Aggregating result is not done automati-
cally in a datacenter hence indexed aggregators are used. Mov-
ing a data partition from one datacenter to another is costly
since this involves copying data across inter-datacenter links
from one distributed file system to another. If a partition Psk is
copied from DCi to DCj at stage s, this partition should not be
copied back to DCj to DCi at s+1 stage and also not copied
within same stage. Input sub-datasets are replicated across da-
tacenters, by taking exactly one replica of each of the input
sub-datasets.
5.1 Architecture
GEO-PACTs consist of single group manager in only one data-
center and job manager is deployed in every participating data-
center. Group manager finds the execution path and each job
manager in datacenter manage phase of parallelization contract
jobs that executed within datacenter using Hadoop cluster. Job
manager contains two components which are copy manager
and aggregation manager. Copy manager manage copying data
from one datacenter to another and aggregation manager man-
age the aggregation of result copied from different datacenter.
Data center configuration file describe about the datacenter
which have to be participate in geographically distributed pa-
rallelization contracts jobs handled by geo-pacts. Identification
is provided to each datacenter to identify when it is running
parallelization contracts. Datacenters store their data in Hadoop
distributed file system.
Fig 5.1 Architecture of GEO-PACT
Geographically distributed parallelization contracts jobs se-
quences are submitted to the Group manager through a job
configuration file. Group manager breaks the job sequence into
a number of tasks and this information is informed to Job man-
ager components copy manager and aggregation manager to
describe the portion of task that should be performed within
corresponding datacenters. User can specify sub dataset of
geographically distributed input dataset through a xml based
job configuration file.
Once the group manager starts its execution in multiple execu-
tion paths it instructs each job manager about the paralleliza-
tion contracts jobs that should be executed in corresponding
datacenter and the respective sub-datasets those parallelization
contracts jobs should be executed on. Job managers execute the
jobs using the Hadoop clusters deployed in corresponding da-
tacenters. The Group manager informs a Job manager to copy
data to a corresponding datacenter or aggregate multiple sub
datasets copied from two or more corresponding datacenters. A
Job manager use local Copy manager and Aggregation manag-
er to execute these tasks.
When ever a copy operation has to be performed, Copy man-
ager in the datacenter from where the data has to be copied
from and reads data from the corresponding HDFS storage and
sends data to the Copy manager located in the datacenter to
which data has to be copied, to through a TCP stream. The
Copy manager in the receiving side stores this data in the cor-
responding HDFS distributed storage. GEO-PACT uses aggre-
gators to aggregate the results obtained from parallelization
contracts jobs in parts of the input. Once output data generated
in one or more datacenters is copied to a single destination
Job execu-
tion
Group Manager
DC config file Job config file
Copy manager Aggregation manager
Job Managers
Datacenters
Name Node Job tracker
Hadoop Cluster
Job execution
Data
node
Data
node
Task
Tracker
Task
Tracker
Job
execution
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 166
datacenter, the Group manager instructs the Job manager run-
ning in the destination datacenter to initiate an aggregate op-
eration.
6. DATACENTER ARCHITECTURE
Parallelization contract style computation systems have a min-
imum network bandwidth to move the data to the computation,
instead of the computation must move to the data. In datacen-
ter, compute nodes and storage nodes are attached to same
network switch. Compute node contains central processing unit
and network interface, and storage node contains disk to store
data. Instead of co-locating storage and computation in the
same box as in the traditional MapReduce storage architecture,
this design co-locates storage and computation on the same
Ethernet switch. The advantages of using remote disks in a
datacenter are many. First, both computation node and storage
node can be modified to meet the application requirements
during both cluster construction and operation. Second, com-
putation node and storage node can be decoupled from each
other when they went to failure. It avoids wasted resources that
would have been used to reconstruct lost storage when a com-
putation node fails.
Third, fine-grained power management techniques can be used,
whereby compute and storage nodes are enabled and disabled
to meet current application requirements. Finally, because
computation is now an independent resource, a mix of both
high and low power processors can be implemented. The run-
time environment managing application execution can change
the processors being used for a specific application in order to
meet administrative power and performance goals.
Fig 6.1 Remote Storage Architecture
7. FAILURE HANDLING
The Group manager must start in a trustworthy datacenter.
Each Job manager frequently sends a heartbeat message back
to the Group manager. If the Group manager does not receive a
heartbeat from a corresponding Job manager for some time, a
new Job manager is started in the same datacenter. If the Group
manager receives a late heartbeat message from the old Job
manager after starting the new Job manager, a message is sent
back to the old Job manager to quit itself. Once a Job manager
starts a parallelization contracts job it stores information about
this job in a predefined location of the distributed file system in
its datacenter so that a newly started Job manager can catch up.
Failure of a Group manager will result in the failure of the
GEO-PACT instance similar to the termination of the Job
Tracker component of Hadoop.
8 CONCLUSIONS AND FUTURE WORK
This paper presents Geo-Pact, a extension of MapReduce
framework that can efficiently execute a sequence of jobs on
geographically distributed datasets. Minimizing either execu-
tion time or cost. GEO-PACTS can substantially improve time
or cost of job execution compared to naive schedules of cur-
rently widely followed deployments. This framework is also
applicable to single datacenters with non uniform transmission
characteristics, such as datacenters divided into zones or other
network architectures. Future work is to form clusters from the
final result and with this analyzed data can be retrived are more
efficiently.
REFERENCES
[1] Agarwal S, Dunagan J, Jain N, Saroiu S, Wolman A,
and Bhogan H, “Volley: Automated Data Placement for
Geo-Distributed Cloud Services,” in National Spatial
Data Infrastructure, 2010.
[2] Alexander Alexandrov, Stephan Ewen, Max Heimel,
Fabian Hueske, Odej Kao, Volker Markl, Erik Nijkamp,
Daniel Warneke, “MapReduce and PACT - Comparing
Data Parallel Programming Models”.
[3] Apache Software Foundation, “Hadoop”,
http://hadoop.apache.org.
[4] “Big data”,
http://www.emc.com/campaign/bigdata/index.html
[5] ChamikaraJayalath, Julian Stephen, and Patrick Eugster,
“From the Cloud to the Atmosphere: Running MapRe-
duce across Datacenters”, IEEE Transactions on Com-
puters Special Issue On Cloud Of Clouds Volume 63,
issue 1, pages 74-87, May 2013.
[6] Data from Year 2000 US Census”,
http://aws.amazon.com/datasets/Economics/2290.
[7] Dean J and Ghemawat S, “MapReduce Simplified Data
Processing on Large clusters”, in Operating Systems
Design and Implementation, 2004.
[8] Dean J and Ghemawat S, “MapReduce Simplified Data
Processing on Large Clusters”, in Operating Systems
Rack uplinkLAN (Ethernet)
Storage Node 1 Storage Node m
Network disk
controller
Disk
Network disk
controller
Disk
Compute Node m
Network interface
CPU/Memory
Compute Node1
Network interface
CPU/Memory
Network switch
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 167
Design and Implementation, pages 137–150, 2004.
[9] Dejun J, Pierre D, and Chi C, “EC2 Performance Analy-
sis for Resource Provisioning of Service-Oriented Ap-
plications”, in International Conference on Service
Oriented Computing, 2009.
[10] Fushimi S, Kitsuregawa M, and Tanaka H, “An Over-
view of The System Software of A Parallel Relational
Database Machine GRACE”, Very Large Data Base,
pages 209–219, Morgan Kaufmann, 1986.
[11] “Hadoop the Definitive Guide”,
http://oreilly.com/catalog/ 9780596521981
[12] “Hadoop”, http://hadoop.apache.org/index.html
[13] “HDFS”,
http://developer.yahoo.com/hadoop/tutorial/module2.ht
ml
[14] Introduction to Analytics of Big Data and Hadoop”,
Storage Networking Industry Associa-
tion,http://www.snia.org/sites/default/files2/ABDS2012/
Tuto-
rials/RobPeglarIntroduction_Analytics%20_Big%20Dat
a_Hadoop.pdf
[15] Introduction to hadoop mapreduce”,
http://developer.yahoo.com/hadoop/tutorial/module4
[16] Jeffrey Shafer, Scott Rixner, Alan L. Cox Rice Universi-
ty “Datacenter Storage Architecture for MapReduce
Applications”.
[17] Lloyd W, Freedman M J, Kaminsky M, and Andersen D
J, “Don’t Settle for Eventual: Scalable Causal Consis-
tency for Wide area Storage with COPS”, in ACM
Symposium on Operating Systems Principles, 2011.
[18] Olston C, Reed B, Silberstein A, and Srivastava U, “Au-
tomatic Optimization of Parallel Dataflow Programs”,
Unix Users Group Annual Technical Conference, pages
267–273, Unix Users Group Association, 2008.
[19] Ousterhout J, Agrawal P, Erickson D, Kozyrakis C, Le-
verich J, Mazieres D, Mitra S, Narayanan A, Parulkar
G, Rosenblum M, Rumble S M, Stratmann E, and
Stutsman R, “The Case for RAMClouds: Scalable High-
performance Storage Entirely in DRAM”, ACM Special
Interest Group on Operating Systems Operating Sys-
tems Review, volume 43, no 4, pages 92–105, January
2010.
[20] Pavlo A, Paulson E, RasinA, Abadi D J, DeWitt D J,
Madden S, and Stonebraker M, “A Comparison of Ap-
proaches to Large-Scale Data Analysis”, Special Interest
Group on Management Of Data Conference, pages 165–
178, ACM, 2009.
[21] Selinger P G, Astrahan M M, Chamberlin D D, Lorie D
D, and Price T. G “Access Path Selection in a Relation-
al Database Management System”. In P. A Bernstein,
editor, Special Interest Group on Management Of Data
Conference, pages 23–34, ACM, 1979.
[22] Sovran Y, Power R, Aguilera M K, and Li J, “Transac-
tional Storage for Geo-replicated Systems”, in ACM
Symposium on Operating Systems Principles, 2011.
[23] “Teradata”, http://www.teradata.com
[24] vorgelegt von, Daniel Warneke, Berlin, “Massively Pa-
rallel Data Processing on Infrastructure as a Service
Platforms”.
[25] Yang H, Dasdan A, Hsiao R, and Parker D S, “MapRe-
duce-Merge: Simplified Relational Data Processing on
Large Clusters”, in Special Interest Group on Manage-
ment of Data, 2007.
[26] Zahariab M, Konwinski A, JosephA D, Katz R H, and
Stoica I, “Improving MapReduce Performance in Hete-
rogeneous Environments”, in Operating Systems Design
and Implementation, 2008.

More Related Content

What's hot

HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...Kyong-Ha Lee
 
Survey on Load Rebalancing for Distributed File System in Cloud
Survey on Load Rebalancing for Distributed File System in CloudSurvey on Load Rebalancing for Distributed File System in Cloud
Survey on Load Rebalancing for Distributed File System in CloudAM Publications
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...ijdpsjournal
 
An experimental evaluation of performance
An experimental evaluation of performanceAn experimental evaluation of performance
An experimental evaluation of performanceijcsa
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce PerformanceAM Publications
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniquesijsrd.com
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceAIRCC Publishing Corporation
 
Efficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsEfficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsIJERA Editor
 
Exploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed UpExploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed UpIJERD Editor
 
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...AM Publications
 
Distributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private CloudDistributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private CloudIJERA Editor
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniquejournalBEEI
 
Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...redpel dot com
 

What's hot (19)

HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
HadoopXML: A Suite for Parallel Processing of Massive XML Data with Multiple ...
 
Dremel
DremelDremel
Dremel
 
Survey on Load Rebalancing for Distributed File System in Cloud
Survey on Load Rebalancing for Distributed File System in CloudSurvey on Load Rebalancing for Distributed File System in Cloud
Survey on Load Rebalancing for Distributed File System in Cloud
 
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES...
 
IJET-V3I2P24
IJET-V3I2P24IJET-V3I2P24
IJET-V3I2P24
 
An experimental evaluation of performance
An experimental evaluation of performanceAn experimental evaluation of performance
An experimental evaluation of performance
 
A Brief on MapReduce Performance
A Brief on MapReduce PerformanceA Brief on MapReduce Performance
A Brief on MapReduce Performance
 
A Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis TechniquesA Survey on Big Data Analysis Techniques
A Survey on Big Data Analysis Techniques
 
Cloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for MapreduceCloak-Reduce Load Balancing Strategy for Mapreduce
Cloak-Reduce Load Balancing Strategy for Mapreduce
 
Efficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsEfficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in Clouds
 
Database System Architectures
Database System ArchitecturesDatabase System Architectures
Database System Architectures
 
Distributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - IntroductionDistributed DBMS - Unit 1 - Introduction
Distributed DBMS - Unit 1 - Introduction
 
Exploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed UpExploiting Multi Core Architectures for Process Speed Up
Exploiting Multi Core Architectures for Process Speed Up
 
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
[IJET V2I5P18] Authors:Pooja Mangla, Dr. Sandip Kumar Goyal
 
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
Enhancement of Map Function Image Processing System Using DHRF Algorithm on B...
 
Distributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private CloudDistributed Framework for Data Mining As a Service on Private Cloud
Distributed Framework for Data Mining As a Service on Private Cloud
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce Technique
 
Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...Performance evaluation and estimation model using regression method for hadoo...
Performance evaluation and estimation model using regression method for hadoo...
 
Parallel Database
Parallel DatabaseParallel Database
Parallel Database
 

Viewers also liked

An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataeSAT Publishing House
 
Mining elevated service itemsets on transactional recordsets using slicing
Mining elevated service itemsets on transactional recordsets using slicingMining elevated service itemsets on transactional recordsets using slicing
Mining elevated service itemsets on transactional recordsets using slicingeSAT Publishing House
 
Unity 7 gui baesd design platform for ubuntu touch mobile os
Unity 7 gui baesd design platform for ubuntu touch mobile osUnity 7 gui baesd design platform for ubuntu touch mobile os
Unity 7 gui baesd design platform for ubuntu touch mobile oseSAT Publishing House
 
Studies on pyrolysis of spent engine oil in a quartz load cell
Studies on pyrolysis of spent engine oil in a quartz load cellStudies on pyrolysis of spent engine oil in a quartz load cell
Studies on pyrolysis of spent engine oil in a quartz load celleSAT Publishing House
 
Applications of matlab in optimization of bridge
Applications of matlab in optimization of bridgeApplications of matlab in optimization of bridge
Applications of matlab in optimization of bridgeeSAT Publishing House
 
Temperature analysis of lna with improved linearity for rf receiver
Temperature analysis of lna with improved linearity for rf receiverTemperature analysis of lna with improved linearity for rf receiver
Temperature analysis of lna with improved linearity for rf receivereSAT Publishing House
 
Background differencing algorithm for moving object detection using system ge...
Background differencing algorithm for moving object detection using system ge...Background differencing algorithm for moving object detection using system ge...
Background differencing algorithm for moving object detection using system ge...eSAT Publishing House
 
Hydrogen production from glycerol using microbial electrolysis cell
Hydrogen production from glycerol using microbial electrolysis cellHydrogen production from glycerol using microbial electrolysis cell
Hydrogen production from glycerol using microbial electrolysis celleSAT Publishing House
 
Rate adaptive resource allocation in ofdma using bees algorithm
Rate adaptive resource allocation in ofdma using bees algorithmRate adaptive resource allocation in ofdma using bees algorithm
Rate adaptive resource allocation in ofdma using bees algorithmeSAT Publishing House
 
Gis in assessing topographical aspects of hilly regions
Gis in assessing topographical aspects of hilly regionsGis in assessing topographical aspects of hilly regions
Gis in assessing topographical aspects of hilly regionseSAT Publishing House
 
Diabetic maculopathy detection using fundus fluorescein angiogram images a ...
Diabetic maculopathy detection using fundus fluorescein angiogram images   a ...Diabetic maculopathy detection using fundus fluorescein angiogram images   a ...
Diabetic maculopathy detection using fundus fluorescein angiogram images a ...eSAT Publishing House
 
In data streams using classification and clustering
In data streams using classification and clusteringIn data streams using classification and clustering
In data streams using classification and clusteringeSAT Publishing House
 
Performance investigation of a flat plate collector
Performance investigation of a flat plate collectorPerformance investigation of a flat plate collector
Performance investigation of a flat plate collectoreSAT Publishing House
 
Proposed aes for image steganography in different medias
Proposed aes for image steganography in different mediasProposed aes for image steganography in different medias
Proposed aes for image steganography in different mediaseSAT Publishing House
 
A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...eSAT Publishing House
 
Application of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mixApplication of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mixeSAT Publishing House
 
Hydrostatic transmission as an alternative to conventional gearbox
Hydrostatic transmission as an alternative to conventional gearboxHydrostatic transmission as an alternative to conventional gearbox
Hydrostatic transmission as an alternative to conventional gearboxeSAT Publishing House
 
Efficiently searching nearest neighbor in documents
Efficiently searching nearest neighbor in documentsEfficiently searching nearest neighbor in documents
Efficiently searching nearest neighbor in documentseSAT Publishing House
 
Behaviour of concrete beams reinforced with glass
Behaviour of concrete beams reinforced with glassBehaviour of concrete beams reinforced with glass
Behaviour of concrete beams reinforced with glasseSAT Publishing House
 
Effect of age and seasonal variations on leachate
Effect of age and seasonal variations on leachateEffect of age and seasonal variations on leachate
Effect of age and seasonal variations on leachateeSAT Publishing House
 

Viewers also liked (20)

An enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical dataAn enhanced fuzzy rough set based clustering algorithm for categorical data
An enhanced fuzzy rough set based clustering algorithm for categorical data
 
Mining elevated service itemsets on transactional recordsets using slicing
Mining elevated service itemsets on transactional recordsets using slicingMining elevated service itemsets on transactional recordsets using slicing
Mining elevated service itemsets on transactional recordsets using slicing
 
Unity 7 gui baesd design platform for ubuntu touch mobile os
Unity 7 gui baesd design platform for ubuntu touch mobile osUnity 7 gui baesd design platform for ubuntu touch mobile os
Unity 7 gui baesd design platform for ubuntu touch mobile os
 
Studies on pyrolysis of spent engine oil in a quartz load cell
Studies on pyrolysis of spent engine oil in a quartz load cellStudies on pyrolysis of spent engine oil in a quartz load cell
Studies on pyrolysis of spent engine oil in a quartz load cell
 
Applications of matlab in optimization of bridge
Applications of matlab in optimization of bridgeApplications of matlab in optimization of bridge
Applications of matlab in optimization of bridge
 
Temperature analysis of lna with improved linearity for rf receiver
Temperature analysis of lna with improved linearity for rf receiverTemperature analysis of lna with improved linearity for rf receiver
Temperature analysis of lna with improved linearity for rf receiver
 
Background differencing algorithm for moving object detection using system ge...
Background differencing algorithm for moving object detection using system ge...Background differencing algorithm for moving object detection using system ge...
Background differencing algorithm for moving object detection using system ge...
 
Hydrogen production from glycerol using microbial electrolysis cell
Hydrogen production from glycerol using microbial electrolysis cellHydrogen production from glycerol using microbial electrolysis cell
Hydrogen production from glycerol using microbial electrolysis cell
 
Rate adaptive resource allocation in ofdma using bees algorithm
Rate adaptive resource allocation in ofdma using bees algorithmRate adaptive resource allocation in ofdma using bees algorithm
Rate adaptive resource allocation in ofdma using bees algorithm
 
Gis in assessing topographical aspects of hilly regions
Gis in assessing topographical aspects of hilly regionsGis in assessing topographical aspects of hilly regions
Gis in assessing topographical aspects of hilly regions
 
Diabetic maculopathy detection using fundus fluorescein angiogram images a ...
Diabetic maculopathy detection using fundus fluorescein angiogram images   a ...Diabetic maculopathy detection using fundus fluorescein angiogram images   a ...
Diabetic maculopathy detection using fundus fluorescein angiogram images a ...
 
In data streams using classification and clustering
In data streams using classification and clusteringIn data streams using classification and clustering
In data streams using classification and clustering
 
Performance investigation of a flat plate collector
Performance investigation of a flat plate collectorPerformance investigation of a flat plate collector
Performance investigation of a flat plate collector
 
Proposed aes for image steganography in different medias
Proposed aes for image steganography in different mediasProposed aes for image steganography in different medias
Proposed aes for image steganography in different medias
 
A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...A novel way of verifiable redistribution of the secret in a multiuser environ...
A novel way of verifiable redistribution of the secret in a multiuser environ...
 
Application of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mixApplication of ibearugbulem’s model for optimizing granite concrete mix
Application of ibearugbulem’s model for optimizing granite concrete mix
 
Hydrostatic transmission as an alternative to conventional gearbox
Hydrostatic transmission as an alternative to conventional gearboxHydrostatic transmission as an alternative to conventional gearbox
Hydrostatic transmission as an alternative to conventional gearbox
 
Efficiently searching nearest neighbor in documents
Efficiently searching nearest neighbor in documentsEfficiently searching nearest neighbor in documents
Efficiently searching nearest neighbor in documents
 
Behaviour of concrete beams reinforced with glass
Behaviour of concrete beams reinforced with glassBehaviour of concrete beams reinforced with glass
Behaviour of concrete beams reinforced with glass
 
Effect of age and seasonal variations on leachate
Effect of age and seasonal variations on leachateEffect of age and seasonal variations on leachate
Effect of age and seasonal variations on leachate
 

Similar to IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308

IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET Journal
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...IOSR Journals
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...IJSRD
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...IJSRD
 
Cloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control MethodCloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control MethodIRJET Journal
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTijwscjournal
 
Application-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud EnvironmentApplication-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud EnvironmentSafayet Hossain
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & HadoopAhmed Gamil
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
IRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET Journal
 
assignment3
assignment3assignment3
assignment3Kirti J
 
A Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-DuplicationA Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-DuplicationEditor IJMTER
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computinghuda2018
 
IRJET- Cross User Bigdata Deduplication
IRJET-  	  Cross User Bigdata DeduplicationIRJET-  	  Cross User Bigdata Deduplication
IRJET- Cross User Bigdata DeduplicationIRJET Journal
 

Similar to IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 (20)

IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
IRJET- Generate Distributed Metadata using Blockchain Technology within HDFS ...
 
H04502048051
H04502048051H04502048051
H04502048051
 
H017144148
H017144148H017144148
H017144148
 
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
Comparative Analysis, Security Aspects & Optimization of Workload in Gfs Base...
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl...
 
Cloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control MethodCloud Computing Ambiance using Secluded Access Control Method
Cloud Computing Ambiance using Secluded Access Control Method
 
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENTLARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
LARGE-SCALE DATA PROCESSING USING MAPREDUCE IN CLOUD COMPUTING ENVIRONMENT
 
A01260104
A01260104A01260104
A01260104
 
Hadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and AssessmentHadoop Cluster Analysis and Assessment
Hadoop Cluster Analysis and Assessment
 
E031201032036
E031201032036E031201032036
E031201032036
 
Application-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud EnvironmentApplication-Aware Big Data Deduplication in Cloud Environment
Application-Aware Big Data Deduplication in Cloud Environment
 
Big data & Hadoop
Big data & HadoopBig data & Hadoop
Big data & Hadoop
 
Eg4301808811
Eg4301808811Eg4301808811
Eg4301808811
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
IRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication SystemIRJET - A Secure Access Policies based on Data Deduplication System
IRJET - A Secure Access Policies based on Data Deduplication System
 
assignment3
assignment3assignment3
assignment3
 
A Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-DuplicationA Hybrid Cloud Approach for Secure Authorized De-Duplication
A Hybrid Cloud Approach for Secure Authorized De-Duplication
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
IRJET- Cross User Bigdata Deduplication
IRJET-  	  Cross User Bigdata DeduplicationIRJET-  	  Cross User Bigdata Deduplication
IRJET- Cross User Bigdata Deduplication
 

More from eSAT Publishing House

Likely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnamLikely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnameSAT Publishing House
 
Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...eSAT Publishing House
 
Hudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnamHudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnameSAT Publishing House
 
Groundwater investigation using geophysical methods a case study of pydibhim...
Groundwater investigation using geophysical methods  a case study of pydibhim...Groundwater investigation using geophysical methods  a case study of pydibhim...
Groundwater investigation using geophysical methods a case study of pydibhim...eSAT Publishing House
 
Flood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaFlood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaeSAT Publishing House
 
Enhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingEnhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingeSAT Publishing House
 
Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...eSAT Publishing House
 
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...eSAT Publishing House
 
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...eSAT Publishing House
 
Shear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a reviewShear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a revieweSAT Publishing House
 
Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...eSAT Publishing House
 
Risk analysis and environmental hazard management
Risk analysis and environmental hazard managementRisk analysis and environmental hazard management
Risk analysis and environmental hazard managementeSAT Publishing House
 
Review study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallsReview study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallseSAT Publishing House
 
Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...eSAT Publishing House
 
Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...eSAT Publishing House
 
Coastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaCoastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaeSAT Publishing House
 
Can fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structuresCan fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structureseSAT Publishing House
 
Assessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingsAssessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingseSAT Publishing House
 
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...eSAT Publishing House
 
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...eSAT Publishing House
 

More from eSAT Publishing House (20)

Likely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnamLikely impacts of hudhud on the environment of visakhapatnam
Likely impacts of hudhud on the environment of visakhapatnam
 
Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...Impact of flood disaster in a drought prone area – case study of alampur vill...
Impact of flood disaster in a drought prone area – case study of alampur vill...
 
Hudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnamHudhud cyclone – a severe disaster in visakhapatnam
Hudhud cyclone – a severe disaster in visakhapatnam
 
Groundwater investigation using geophysical methods a case study of pydibhim...
Groundwater investigation using geophysical methods  a case study of pydibhim...Groundwater investigation using geophysical methods  a case study of pydibhim...
Groundwater investigation using geophysical methods a case study of pydibhim...
 
Flood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, indiaFlood related disasters concerned to urban flooding in bangalore, india
Flood related disasters concerned to urban flooding in bangalore, india
 
Enhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity buildingEnhancing post disaster recovery by optimal infrastructure capacity building
Enhancing post disaster recovery by optimal infrastructure capacity building
 
Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...Effect of lintel and lintel band on the global performance of reinforced conc...
Effect of lintel and lintel band on the global performance of reinforced conc...
 
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
Wind damage to trees in the gitam university campus at visakhapatnam by cyclo...
 
Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...Wind damage to buildings, infrastrucuture and landscape elements along the be...
Wind damage to buildings, infrastrucuture and landscape elements along the be...
 
Shear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a reviewShear strength of rc deep beam panels – a review
Shear strength of rc deep beam panels – a review
 
Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...Role of voluntary teams of professional engineers in dissater management – ex...
Role of voluntary teams of professional engineers in dissater management – ex...
 
Risk analysis and environmental hazard management
Risk analysis and environmental hazard managementRisk analysis and environmental hazard management
Risk analysis and environmental hazard management
 
Review study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear wallsReview study on performance of seismically tested repaired shear walls
Review study on performance of seismically tested repaired shear walls
 
Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...Monitoring and assessment of air quality with reference to dust particles (pm...
Monitoring and assessment of air quality with reference to dust particles (pm...
 
Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...Low cost wireless sensor networks and smartphone applications for disaster ma...
Low cost wireless sensor networks and smartphone applications for disaster ma...
 
Coastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of indiaCoastal zones – seismic vulnerability an analysis from east coast of india
Coastal zones – seismic vulnerability an analysis from east coast of india
 
Can fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structuresCan fracture mechanics predict damage due disaster of structures
Can fracture mechanics predict damage due disaster of structures
 
Assessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildingsAssessment of seismic susceptibility of rc buildings
Assessment of seismic susceptibility of rc buildings
 
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
A geophysical insight of earthquake occurred on 21 st may 2014 off paradip, b...
 
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
Effect of hudhud cyclone on the development of visakhapatnam as smart and gre...
 

Recently uploaded

Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction managementMariconPadriquez1
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfme23b1001
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)dollysharma2066
 

Recently uploaded (20)

Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
computer application and construction management
computer application and construction managementcomputer application and construction management
computer application and construction management
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
POWER SYSTEMS-1 Complete notes examples
POWER SYSTEMS-1 Complete notes  examplesPOWER SYSTEMS-1 Complete notes  examples
POWER SYSTEMS-1 Complete notes examples
 
Electronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdfElectronically Controlled suspensions system .pdf
Electronically Controlled suspensions system .pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
Call Us ≽ 8377877756 ≼ Call Girls In Shastri Nagar (Delhi)
 

IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 162 GEO DISTRIBUTED PARALLELIZATION PACTS IN MAP REDUCE FRAMEWORK IN MULTIPLE DATACENTER C.Kirubanantham1 , C.Rajavenkateswaran2 1 Student, computer science engineering, Nandha College of Technology, India 2 Assistant professor, Information Technology, Nandha College of Technology, India Abstract MapReduce framework in Hadoop plays an important role in handling and processing big data. Hadoop is scalable that is it can reliably store and process petabytes. MapReduce works by dividing input files into chunks and processing these in a series of parallelizable steps. MapReduce framework offers a response to the problem by distributing computations among large sets of nodes. we concentrate on geographical distribution of data for sequential execution of MapReduce jobs to optimize the execution time. The fixed execution strategy of MapReduce program is not optimal for many task and as it does not know about the behavior of the functions. Thus, to overcome these issues, we are enhancing our proposed work with parallelization contracts. The parallelization contracts include input and output contract which includes the constraints and functions of data execution. These contracts help to capture a reasonable amount of semantics for executing any type of task with reduced time consumption. Keywords— Bigdata, Datacenter, Geo-distributed, Hadoop, MapReduce, PACT ----------------------------------------------------------------------***--------------------------------------------------------------------- 1. INTRODUCTION Data comes from many sources such as social media websites, sensors to gather climate information, trajectory information, transaction records, other web site usage data etc. it is called Big data. The limits to what can be done are often times due to how much data can be processed in a given time-frame. Paral- lelization contract framework have become one of the toolkit for processing large datasets using cloud computing resources, and are provided by most cloud vendors. GEO-PACT, a system for efficiently processing geo-distributed big data GEO-PACT is a Hadoop based framework that can efficiently perform a sequence of Parallelization Contracts jobs on a geo-distributed dataset across multiple datacenters. Sequences of Paralleliza- tion Contracts jobs are executed on a given input by applying the first job on the given input, applying the second job on the output of the first job, and so on. GEO-PACT acts much like the atmosphere surrounding the clouds. The problem of execut- ing geo-distributed Parallelization Contracts job sequences as arising in “cloud of clouds” scenarios is analyzed for job ex- ecution. Executing individual Parallelization Contracts jobs in each datacenter on corresponding inputs and then aggregating results is defined as multiple execution path. The datacenter with optimized execution path is selected for job execution. 2 HDFS The Hadoop Distributed File System (HDFS). Provides redun- dant storage for massive amounts of data using inexpensive commodity hardware and serves as the large scale data storage system. The NameNode splits large files into fixed sized data blocks which are scattered across the cluster. Typically the data block size for the HDFS is conFigd as 64MB, but it can be conFigd by file system clients as per usage requirements. The data storage is of type write once/read many (WORM) and once written, the files can only be appended and cannot be modified to maintain data coherency. Since HDFS is built on commodity hardware, the machine fault rate is high. In order to make the system failure tolerant, data blocks are replicated across multiple DataNodes. 2.1 NameNode NameNode is important for HDFS to store its metadata relia- bly. Furthermore, while the file data is accessed in a write once and read many model, the metadata structures (e.g., the names of files and directories) can be modified by a large number of clients concurrently. It is important that this information is never desynchronized. Therefore, it is all handled by a single machine, called the NameNode. The NameNode stores all the metadata for the file system. Because of the relatively low amount of metadata per file (Information about file locations in HDFS, Information about file ownership and permissions, Names of the individual blocks, Locations of the blocks of each file), all of this information can be stored in the main memory of the NameNode machine, allowing fast access to the metadata.
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 163 Fig 1.1 HDFS Architecture 2.2 Secondary NameNode The Secondary NameNode is not a failover NameNode and it performs memory-intensive administrative functions for the NameNode. Secondary NameNode should run on a separate machine in a large installation. The secondary name-node is to perform periodic checkpoints. The secondary name-node pe- riodically downloads current name-node image and edits log files, joins them into new image and uploads the new image back to the (primary and the only) name-node. If the name- node fails and it will restart on the same physical node then there is no need to shutdown data-nodes, just the name-node need to be restarted. If the old node cannot use anymore you will need to copy the latest image somewhere else. The latest image can be found either on the node that used to be the pri- mary before failure if available; or on the secondary name- node. 2.3 DataNode A DataNode is a storage server that accepts read/write requests from the NameNode. DataNodes store data blocks for local or remote clients of HDFS. Each data block is saved as a separate file in the local file system of the DataNode. The DataNode also performs block creation, deletion and replication as a part of file system operations. For keeping the records up-to-date, the DataNode periodically reports all of its data block informa- tion to the NameNode. DataNode instances can talk to each other for data replication. To maintain its live status in the clus- ter, it periodically sends heartbeat signals to the NameNode. When the NameNode fails to receive heartbeat signals from the DataNode, it is marked as a dead node in the cluster. 3. GEO-DISTRIBUTION Applications on cloud environment are geographically distri- buted, for the reasons: to access data frequently; data’s are col- lected and stored by different organizations to shared towards a common goal; datasets are replicated across datacenters for availability. Geo-PACT is a Hadoop-based system that can efficiently process concurrent jobs on a geographically distri- buted dataset. Consider n datacenters, given by DC1, DC2…DCn with input sub datasets I1, I2,…,In respectively. The total amount of input data is thus |I|=n i=1|Ii|. The bandwidth between the datacenters DCi and DCj (i!= j) is given by Bi,j and the cost of transmitting one unit of data between two datacenters is Ci,j. We define an initial minimum partition size of input dataset to be transferred across datacenters. On this geo-distributed input dataset se- quence of jobs J1, J2,…Jn have to be executed. In each job parallelization contract (PACTs) is applied. PACTS consist of input and output contract. Input contract consist of five con- tracts and output contract may be optional. 4 THE PACT PROGRAMMING MODEL The PACT Programming Model is an extension of map/reduce programming model. It operates over a key/value couple data model so called Parallelization Contracts (PACTs). Following the PACT programming model, programs are implemented by providing task specific user code (the user functions, UFs) for selected PACTs and assembling them to a work flow. A PACT defines properties on the input and output data of its associated user function. Figure 2 shows the parts of a Parallelization Contracts, contains only one Input Contract and an optional Output Contract. The Input Contract of a Parallelization Con- tracts designates how the user function can be evaluated in parallel. Optional output Contracts allow the optimizer to infer certain properties of the output data of a user function and to create a more efficient execution strategy for a program. Fig 4.1 PACT Components 4.1 The Map Input Contract The Map contract is used to process each key/value couple independently; the Map input contract states that the user code is invoked only once for each key-value couple of the input data set. Fig 4.2 Map Input Contract Data access Client app Name node Secondary name node Data node Data node Data node Metadata request Independent subset User code First- order function Output contract Output data ValueKey Input data ValueKey Input data Independent Invocations of user code
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 164 4.3 The Reduce Input Contract PACT Reduce input contract, in which all key-value couple of a PACTs input data are grouped with an identical key. The user code is attached to the Reduce contract is invoked for each of these groups independently. Fig 4.3 Reduce input contract 4.4 The Cross Input Contract Multi-input contracts is the Cross input contract. The user code attached to those multi-input contracts expects to receive data from two specific data sources as input. Multi-input contracts also construct the subsets for the user code based on two dif- ferent input set and builds the Cartesian product of the two inputs. All couple in the Cartesian product are then processed independently by separate calls of the user code. Fig 4.4 Cross input contract 4.5 The CoGroup Input Contract CoGroup input contract, which is also a multi-input contract. Independent subsets are built by joining the groups with same keys of all inputs. The key/value couple of all inputs with the same key are assigned to the same subset. Fig 4.5 Cogroup input contract 4.6 The Match Input Contract The Match input contract is a multiple input contract. All com- binations of key-value couples with same keys are built in the input data sets. After all groups are processed independently by separate invocations of the attached user code. The Match in- put contract performs equi-join on the key. Fig 4.6 Match input contract 4.7 The PACT Output Contracts Output Contracts are capable to declare indisputable properties of user functions to maximize the efficiency of the task execu- tion. An example of such an output contract is the SameKey contract. When attached to a Map function then the user code will not change the key, i.e. the type and value of the key re- main after the user code’s invocation and are the same in the output as in the input. Those references can be used by an op- timizer that to generates parallel execution plans. The men- tioned SameKey contract can frequently help to avoid unneces- sary repartitioning and therefore expensive data shipping. Hence, Output Contracts can significantly improve the runtime of a PACT program. 5. OPERATION On given geo-distributed input dataset, a sequence of opera- tions is performed. In the following we focus on parallelization contracts jobs J1 , J2 , ..., J m giving rise to a sequence of 5xm Input data A ValueKey Independent Invocations of user code Input data B Value Key Independent Invocations of user code Input data ValueKey Input data B Independent Invocations of user code ValueKey Input data A ValueKey Independent Invocations of user code Input data A Val- ue Key Input data BValue Key
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 165 operations. Since each parallelization contracts job consists of five major phases/operations (map, cross, match, cogroup, re- duce). We define the state of data before a phase as a stage identified by 0..5m. So input data is in stage 0 and final output data received after applying MapReduce jobs 1...m is in stage 5m. To move data from stage s to next stage s + 1 a paralleli- zation contracts phase is applied to data partitions and the same number of (output) data partitions are created. The initial kth partition of data is denoted by Pk0, and Pks represents the out- put after executing parallelization contracts phases 1..s on par- tition Pk0. Before performing a MapReduce phase, a partition present in a datacenter may be moved to another datacenter. To make our solution tractable we only allow full partitions of data to be copied. The move may be for an initial partition or a derivative of it received after executing one or more paralleli- zation contracts phases. Initial partition sizes can be used as parameter to trade accuracy and computation costs. In this section executing sequence of parallelization contracts on given input datasets. One of the main execution path is mul- tiple execution path. Execution path for executing individual parallelization contracts in each datacenter on given sub data- sets and aggregating result from that datacenter is called mul- tiple execution path. Aggregating result is not done automati- cally in a datacenter hence indexed aggregators are used. Mov- ing a data partition from one datacenter to another is costly since this involves copying data across inter-datacenter links from one distributed file system to another. If a partition Psk is copied from DCi to DCj at stage s, this partition should not be copied back to DCj to DCi at s+1 stage and also not copied within same stage. Input sub-datasets are replicated across da- tacenters, by taking exactly one replica of each of the input sub-datasets. 5.1 Architecture GEO-PACTs consist of single group manager in only one data- center and job manager is deployed in every participating data- center. Group manager finds the execution path and each job manager in datacenter manage phase of parallelization contract jobs that executed within datacenter using Hadoop cluster. Job manager contains two components which are copy manager and aggregation manager. Copy manager manage copying data from one datacenter to another and aggregation manager man- age the aggregation of result copied from different datacenter. Data center configuration file describe about the datacenter which have to be participate in geographically distributed pa- rallelization contracts jobs handled by geo-pacts. Identification is provided to each datacenter to identify when it is running parallelization contracts. Datacenters store their data in Hadoop distributed file system. Fig 5.1 Architecture of GEO-PACT Geographically distributed parallelization contracts jobs se- quences are submitted to the Group manager through a job configuration file. Group manager breaks the job sequence into a number of tasks and this information is informed to Job man- ager components copy manager and aggregation manager to describe the portion of task that should be performed within corresponding datacenters. User can specify sub dataset of geographically distributed input dataset through a xml based job configuration file. Once the group manager starts its execution in multiple execu- tion paths it instructs each job manager about the paralleliza- tion contracts jobs that should be executed in corresponding datacenter and the respective sub-datasets those parallelization contracts jobs should be executed on. Job managers execute the jobs using the Hadoop clusters deployed in corresponding da- tacenters. The Group manager informs a Job manager to copy data to a corresponding datacenter or aggregate multiple sub datasets copied from two or more corresponding datacenters. A Job manager use local Copy manager and Aggregation manag- er to execute these tasks. When ever a copy operation has to be performed, Copy man- ager in the datacenter from where the data has to be copied from and reads data from the corresponding HDFS storage and sends data to the Copy manager located in the datacenter to which data has to be copied, to through a TCP stream. The Copy manager in the receiving side stores this data in the cor- responding HDFS distributed storage. GEO-PACT uses aggre- gators to aggregate the results obtained from parallelization contracts jobs in parts of the input. Once output data generated in one or more datacenters is copied to a single destination Job execu- tion Group Manager DC config file Job config file Copy manager Aggregation manager Job Managers Datacenters Name Node Job tracker Hadoop Cluster Job execution Data node Data node Task Tracker Task Tracker Job execution
  • 5. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 166 datacenter, the Group manager instructs the Job manager run- ning in the destination datacenter to initiate an aggregate op- eration. 6. DATACENTER ARCHITECTURE Parallelization contract style computation systems have a min- imum network bandwidth to move the data to the computation, instead of the computation must move to the data. In datacen- ter, compute nodes and storage nodes are attached to same network switch. Compute node contains central processing unit and network interface, and storage node contains disk to store data. Instead of co-locating storage and computation in the same box as in the traditional MapReduce storage architecture, this design co-locates storage and computation on the same Ethernet switch. The advantages of using remote disks in a datacenter are many. First, both computation node and storage node can be modified to meet the application requirements during both cluster construction and operation. Second, com- putation node and storage node can be decoupled from each other when they went to failure. It avoids wasted resources that would have been used to reconstruct lost storage when a com- putation node fails. Third, fine-grained power management techniques can be used, whereby compute and storage nodes are enabled and disabled to meet current application requirements. Finally, because computation is now an independent resource, a mix of both high and low power processors can be implemented. The run- time environment managing application execution can change the processors being used for a specific application in order to meet administrative power and performance goals. Fig 6.1 Remote Storage Architecture 7. FAILURE HANDLING The Group manager must start in a trustworthy datacenter. Each Job manager frequently sends a heartbeat message back to the Group manager. If the Group manager does not receive a heartbeat from a corresponding Job manager for some time, a new Job manager is started in the same datacenter. If the Group manager receives a late heartbeat message from the old Job manager after starting the new Job manager, a message is sent back to the old Job manager to quit itself. Once a Job manager starts a parallelization contracts job it stores information about this job in a predefined location of the distributed file system in its datacenter so that a newly started Job manager can catch up. Failure of a Group manager will result in the failure of the GEO-PACT instance similar to the termination of the Job Tracker component of Hadoop. 8 CONCLUSIONS AND FUTURE WORK This paper presents Geo-Pact, a extension of MapReduce framework that can efficiently execute a sequence of jobs on geographically distributed datasets. Minimizing either execu- tion time or cost. GEO-PACTS can substantially improve time or cost of job execution compared to naive schedules of cur- rently widely followed deployments. This framework is also applicable to single datacenters with non uniform transmission characteristics, such as datacenters divided into zones or other network architectures. Future work is to form clusters from the final result and with this analyzed data can be retrived are more efficiently. REFERENCES [1] Agarwal S, Dunagan J, Jain N, Saroiu S, Wolman A, and Bhogan H, “Volley: Automated Data Placement for Geo-Distributed Cloud Services,” in National Spatial Data Infrastructure, 2010. [2] Alexander Alexandrov, Stephan Ewen, Max Heimel, Fabian Hueske, Odej Kao, Volker Markl, Erik Nijkamp, Daniel Warneke, “MapReduce and PACT - Comparing Data Parallel Programming Models”. [3] Apache Software Foundation, “Hadoop”, http://hadoop.apache.org. [4] “Big data”, http://www.emc.com/campaign/bigdata/index.html [5] ChamikaraJayalath, Julian Stephen, and Patrick Eugster, “From the Cloud to the Atmosphere: Running MapRe- duce across Datacenters”, IEEE Transactions on Com- puters Special Issue On Cloud Of Clouds Volume 63, issue 1, pages 74-87, May 2013. [6] Data from Year 2000 US Census”, http://aws.amazon.com/datasets/Economics/2290. [7] Dean J and Ghemawat S, “MapReduce Simplified Data Processing on Large clusters”, in Operating Systems Design and Implementation, 2004. [8] Dean J and Ghemawat S, “MapReduce Simplified Data Processing on Large Clusters”, in Operating Systems Rack uplinkLAN (Ethernet) Storage Node 1 Storage Node m Network disk controller Disk Network disk controller Disk Compute Node m Network interface CPU/Memory Compute Node1 Network interface CPU/Memory Network switch
  • 6. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 03 Special Issue: 07 | May-2014, Available @ http://www.ijret.org 167 Design and Implementation, pages 137–150, 2004. [9] Dejun J, Pierre D, and Chi C, “EC2 Performance Analy- sis for Resource Provisioning of Service-Oriented Ap- plications”, in International Conference on Service Oriented Computing, 2009. [10] Fushimi S, Kitsuregawa M, and Tanaka H, “An Over- view of The System Software of A Parallel Relational Database Machine GRACE”, Very Large Data Base, pages 209–219, Morgan Kaufmann, 1986. [11] “Hadoop the Definitive Guide”, http://oreilly.com/catalog/ 9780596521981 [12] “Hadoop”, http://hadoop.apache.org/index.html [13] “HDFS”, http://developer.yahoo.com/hadoop/tutorial/module2.ht ml [14] Introduction to Analytics of Big Data and Hadoop”, Storage Networking Industry Associa- tion,http://www.snia.org/sites/default/files2/ABDS2012/ Tuto- rials/RobPeglarIntroduction_Analytics%20_Big%20Dat a_Hadoop.pdf [15] Introduction to hadoop mapreduce”, http://developer.yahoo.com/hadoop/tutorial/module4 [16] Jeffrey Shafer, Scott Rixner, Alan L. Cox Rice Universi- ty “Datacenter Storage Architecture for MapReduce Applications”. [17] Lloyd W, Freedman M J, Kaminsky M, and Andersen D J, “Don’t Settle for Eventual: Scalable Causal Consis- tency for Wide area Storage with COPS”, in ACM Symposium on Operating Systems Principles, 2011. [18] Olston C, Reed B, Silberstein A, and Srivastava U, “Au- tomatic Optimization of Parallel Dataflow Programs”, Unix Users Group Annual Technical Conference, pages 267–273, Unix Users Group Association, 2008. [19] Ousterhout J, Agrawal P, Erickson D, Kozyrakis C, Le- verich J, Mazieres D, Mitra S, Narayanan A, Parulkar G, Rosenblum M, Rumble S M, Stratmann E, and Stutsman R, “The Case for RAMClouds: Scalable High- performance Storage Entirely in DRAM”, ACM Special Interest Group on Operating Systems Operating Sys- tems Review, volume 43, no 4, pages 92–105, January 2010. [20] Pavlo A, Paulson E, RasinA, Abadi D J, DeWitt D J, Madden S, and Stonebraker M, “A Comparison of Ap- proaches to Large-Scale Data Analysis”, Special Interest Group on Management Of Data Conference, pages 165– 178, ACM, 2009. [21] Selinger P G, Astrahan M M, Chamberlin D D, Lorie D D, and Price T. G “Access Path Selection in a Relation- al Database Management System”. In P. A Bernstein, editor, Special Interest Group on Management Of Data Conference, pages 23–34, ACM, 1979. [22] Sovran Y, Power R, Aguilera M K, and Li J, “Transac- tional Storage for Geo-replicated Systems”, in ACM Symposium on Operating Systems Principles, 2011. [23] “Teradata”, http://www.teradata.com [24] vorgelegt von, Daniel Warneke, Berlin, “Massively Pa- rallel Data Processing on Infrastructure as a Service Platforms”. [25] Yang H, Dasdan A, Hsiao R, and Parker D S, “MapRe- duce-Merge: Simplified Relational Data Processing on Large Clusters”, in Special Interest Group on Manage- ment of Data, 2007. [26] Zahariab M, Konwinski A, JosephA D, Katz R H, and Stoica I, “Improving MapReduce Performance in Hete- rogeneous Environments”, in Operating Systems Design and Implementation, 2008.