Next Generation Hadoop: High Availability for YARN

                         Arinto Murdopo                                              Jim Dowling
               KTH Royal Institute of Technology                        Swedish Institute of Computer Science
            Hanstavägen 49 - 1065A, 164 53 Kista, Sweden                  Isafjordsgatan 22, 164 40 Kista, Sweden
                         arinto@kth.se                                            jdowling@kth.se


ABSTRACT
Hadoop is one of the most widely adopted cluster computing frameworks for big data processing, but it is not free from limitations, and computer scientists and engineers are continuously working to eliminate them and improve Hadoop. One of the improvements in Hadoop is YARN, which removes the scalability limitation of the first-generation MapReduce. However, YARN still suffers from an availability limitation: the single point of failure in the YARN ResourceManager. In this paper we propose an architecture to solve this availability limitation. The novelty of the architecture lies in its stateless failure model, which enables multiple YARN ResourceManagers to run concurrently and maintains high availability. MySQL Cluster (NDB) is proposed as the storage technology in our architecture. Furthermore, we implemented a proof-of-concept of the proposed architecture. The evaluations show that the proof-of-concept increases the availability of YARN. In addition, NDB is shown to have a higher throughput than the storage systems proposed by Apache (ZooKeeper and HDFS). Finally, the evaluations show that NDB achieves close to linear scalability, which makes it suitable for our proposed stateless failure model.

Categories and Subject Descriptors
D.4.7 [Operating Systems]: Distributed Systems, Batch Processing Systems

General Terms
Big Data, Storage Management

1. INTRODUCTION
Big data has become widespread across industries, especially among web companies. It has reached the petabyte scale and will keep growing in the coming years. Traditional storage systems, such as regular file systems and relational databases, are not designed to handle data of this magnitude; scalability is their main issue when handling big data. This situation has resulted in several cluster computing frameworks designed to handle big data effectively.

One of the widely adopted cluster computing frameworks commonly used by web companies is Hadoop1. It mainly consists of the Hadoop Distributed File System (HDFS) [11] to store the data; on top of HDFS, a MapReduce framework inspired by Google's MapReduce [1] was developed to process the stored data. Although Hadoop has arguably become the standard solution for managing big data, it is not free from limitations, and these limitations have triggered significant efforts from academia and industry to improve Hadoop. Cloudera tried to reduce the availability limitation of HDFS using NameNode replication [9]. KTHFS solved the HDFS availability limitation by utilizing MySQL Cluster to make the HDFS NameNode stateless [12]. The scalability of MapReduce has become a prominent limitation: MapReduce has reached its scalability limit at around 4000 nodes. To solve this limitation, the open source community proposed the next generation of MapReduce, called YARN (Yet Another Resource Negotiator) [8]. From the enterprise world, Corona was released by Facebook to overcome the same scalability limitation [2]. Another limitation is Hadoop's inability to perform fine-grained resource sharing between multiple computation frameworks; Mesos tried to solve this by implementing a distributed two-level scheduling mechanism called resource offers [3].

1 http://hadoop.apache.org/

However, few solutions have addressed the availability limitation of the MapReduce framework. When a MapReduce JobTracker failure occurs, the corresponding application cannot continue, reducing MapReduce's availability. The current YARN architecture does not solve this availability limitation: the ResourceManager, the JobTracker equivalent in YARN, remains a single point of failure. The open source community has recently started to address this issue, but no final and proven solution is available yet2. The current proposal from the open source community is to use ZooKeeper [4] or HDFS as a persistent storage to store the ResourceManager's states; upon failure, the ResourceManager is recovered using the stored states.

2 https://issues.apache.org/jira/browse/YARN-128

Solving this availability limitation will bring YARN into a cloud-ready state: YARN can then be executed in the cloud, such as on Amazon EC2, while being resistant to the failures that often happen in cloud environments.

In this report, we present a new architecture for YARN. The main goal of the new architecture is to solve the aforementioned availability limitation in YARN. The architecture provides a better alternative than the existing ZooKeeper-based proposal, since it eliminates the potential scalability limitation caused by ZooKeeper's relatively limited throughput.

To achieve the desired availability, the new architecture utilizes a distributed in-memory database called MySQL Cluster (NDB)3 to persist the ResourceManager states. NDB automatically replicates the stored data across different NDB data-nodes to ensure high availability. Moreover, NDB is able to handle up to 1.8 million write queries per second [5].

3 http://www.mysql.com/products/cluster/

This report is organized as follows. Section 2 presents the existing YARN architecture, its availability limitation and the solution proposed by Apache. The proposed architecture is presented in Section 3. Section 4 presents our evaluation of the availability and the scalability of the proposed architecture. Related work on improving the availability of cluster computing frameworks is presented in Section 5. Finally, we conclude this report and propose future work in Section 6.

2. YARN ARCHITECTURE
This section explains the current YARN architecture, YARN's availability limitation, and Apache's proposed solution to overcome the limitation.

2.1 Architecture Overview
YARN's main goal is to provide more flexibility than Hadoop in terms of the data processing frameworks that can be executed on top of it [7]. It is equipped with a generic distributed application framework and resource-management components. Therefore, YARN supports not only MapReduce, but also other data processing frameworks such as Apache Giraph, Apache Hama and Spark.

In addition, YARN aims to solve the scalability limitation of the original implementation of Apache's MapReduce [6]. To achieve this, YARN splits the MapReduce job-tracker responsibilities of application scheduling, resource management and application monitoring into separate processes, or daemons. The new processes that take over the job-tracker responsibilities are the resource-manager, which handles global resource management and job scheduling, and the application-master, which is responsible for job monitoring, job life-cycle management and resource negotiation with the resource-manager. Each submitted job corresponds to one application-master process. Furthermore, YARN converts the original MapReduce task-tracker into the node-manager, which manages task execution in YARN's unit of resource called a container.

Figure 1: YARN Architecture

Figure 1 shows the current YARN architecture. The resource-manager has three core components:

1. Scheduler, which schedules submitted jobs based on a specific policy and the available resources. The policy is pluggable, which means we can implement our own scheduling policy to be used in our YARN deployment. YARN currently provides three policies to choose from: the fair-scheduler, the FIFO-scheduler and the capacity-scheduler. For the available resources, the scheduler should ideally consider CPU, memory, disk and other computing resources during scheduling; however, current YARN only supports memory as the scheduling resource.

2. Resource-tracker, which handles the management of computing nodes. "Computing nodes" in this context means nodes that run a node-manager process and have computing resources. The management tasks include registering new nodes, handling requests from invalid or decommissioned nodes, and processing node heartbeats. The resource-tracker works closely with the node-liveness-monitor (NMLivenessMonitor class), which keeps track of live and dead computing nodes based on node heartbeats, and the node-list-manager (NodesListManager class), which stores the lists of valid and excluded computing nodes based on the YARN configuration files.

3. Applications-manager, which maintains the collection of user-submitted jobs and a cache of completed jobs. It is the entry point for clients to submit their jobs.

In YARN, clients submit jobs through the applications-manager, and the submission triggers the scheduler to try to schedule the job. When the job is scheduled, the resource-manager allocates a container and launches a corresponding application-master. The application-master takes over and processes the job by splitting it into smaller tasks, requesting additional containers from the resource-manager, launching them with the help of the node-managers, assigning the tasks to the available containers and keeping track of the job progress. Clients learn the job progress by polling the application-master at an interval defined in the YARN configuration. When the job is completed, the application-master cleans up its working state.

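As an illustration of this submission flow, the sketch below shows a client creating and submitting an application to the resource-manager. It is a minimal sketch using the YarnClient API of later Hadoop 2.x releases (the 2012 trunk code this report is based on exposes a slightly different client interface), and the application name, queue, resource values and launch command are placeholder values.

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitJobSketch {
  public static void main(String[] args) throws Exception {
    // Connect to the resource-manager (applications-manager interface).
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the resource-manager for a new application id and fill in the
    // application-submission-context.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("example-app");
    appContext.setPriority(Priority.newInstance(0));
    appContext.setQueue("default");

    // Describe the container that will run the application-master.
    // The launch command below is a placeholder for a real AM command line.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList("/bin/true"));
    appContext.setAMContainerSpec(amContainer);

    // Memory is currently the only resource dimension the scheduler considers.
    appContext.setResource(Resource.newInstance(1024, 1));

    // Submit; the scheduler allocates a container and launches the AM.
    ApplicationId appId = yarnClient.submitApplication(appContext);
    System.out.println("Submitted application " + appId);

    yarnClient.stop();
  }
}

The application-master launched by this submission would then request the additional containers for its tasks, as described above.
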
2.2 Availability Limitation in YARN
Although YARN solves the scalability limitation of the original MapReduce, it still suffers from an availability limitation: the single-point-of-failure nature of the resource-manager. This section explains why the YARN resource-manager is a single point of failure.

Referring to Figure 1, container and task failures are handled by the node-manager. When a container fails or dies, the node-manager detects the failure event, launches a new container to replace the failing one and restarts the task execution in the new container.

In the event of an application-master failure, the resource-manager detects the failure and starts a new instance of the application-master in a new container. The ability to recover the associated job state depends on the application-master implementation; the MapReduce application-master is able to recover the state, but this feature is not enabled by default. Besides the resource-manager, the associated client also reacts to the failure: it contacts the resource-manager to locate the new application-master's address.

Upon failure of a node-manager, the resource-manager updates its list of available node-managers. The application-master should recover the tasks that ran on the failed node-manager, but again this depends on the application-master implementation. The MapReduce application-master has the additional capability to recover the failed tasks and to blacklist node-managers that fail often.

Failure of the resource-manager is severe: clients cannot submit new jobs, and existing running jobs cannot negotiate and request new containers. Existing node-managers and application-masters try to reconnect to the failed resource-manager, and the job progress is lost when they are unable to reconnect. This loss of job progress is likely to frustrate the engineers and data scientists who use YARN, because typical production jobs running on top of YARN are long-running, often in the order of a few hours. Furthermore, this limitation prevents YARN from being used efficiently in cloud environments (such as Amazon EC2), where node failures happen frequently.

2.3 Proposed Solution from Apache
To tackle this availability issue, Apache proposed a recovery failure model using ZooKeeper- or HDFS-based persistent storage4. The proposed recovery failure model is transparent to clients, which means clients do not need to re-submit their jobs. In this model, the resource-manager saves the relevant information upon job submission.

4 https://issues.apache.org/jira/browse/YARN-128

This information currently includes the application-identification-number, the application-submission-context and the list of application-attempts. An application-submission-context contains information related to the job submission, such as the application name, the user who submitted the job, and the amount of requested resources. An application-attempt represents each resource-manager attempt to run a job by creating a new application-master process. The saved information related to an application-attempt consists of the attempt identification number and the details of the first allocated container, such as the container identification number, the container node, the requested resources and the job priority.

Upon restart, the resource-manager reloads the saved information and restarts all node-managers and application-masters. This restart mechanism does not retain the jobs that are currently executing in the cluster; in the worst case, all progress is lost and the jobs are started from the beginning. To minimize this effect, a new application-master should be designed to read the states of the previous application-master that executed under the failed resource-manager. For example, the MapReduce application-master handles this case by storing the progress in another process called the job-history-server; upon restart, a new application-master obtains the job progress from the job-history-server.

The main drawback of this model is the down-time needed to start a new resource-manager process when the old one fails. If the down-time is too long, all processes reach their time-out and clients need to re-submit their jobs to the new resource-manager. Furthermore, HDFS is not suitable for storing lots of small data items (in this case, the application states and application-attempts). ZooKeeper is suitable for the current data size, but it is likely to introduce problems when the amount of stored data grows, since ZooKeeper is designed to store typically small configuration data.

3. YARN WITH HIGH AVAILABILITY
This section explains our proposed failure model and architecture to solve the YARN availability limitation, as well as the implementation of the proposal.

3.1 Stateless Failure Model
We propose a stateless failure model, in which all necessary information and states used by the resource-manager are stored in a persistent storage. Based on our observation, this information includes:

1. Application related information such as the application-id, application-submission-context and application-attempts.

2. Resource related information such as the list of node-managers and the available resources.

Figure 2: Stateless Failure Model

Figure 2 shows the architecture of the stateless failure model. Since all the necessary information is stored in persistent storage, it is possible to have more than one resource-manager running at the same time. All of the resource-managers share the information through the storage, and none of them holds the information in its own memory.

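To make the two categories of state concrete, the following is a hypothetical Java interface for the shared store in the stateless failure model. The interface name and method signatures are our own illustration and do not correspond to an actual YARN or YARN-NDB class; they merely show that every resource-manager would read and write state through the storage instead of keeping it in memory.

import java.io.IOException;
import java.util.List;

/**
 * Hypothetical interface for the shared persistent store in the stateless
 * failure model: every resource-manager reads and writes through this store
 * instead of keeping the state in its own memory.
 */
public interface SharedRMStateStore {

  // Application related information, keyed by (clusterTimestamp, applicationId).
  void storeApplication(long clusterTimestamp, int applicationId,
                        byte[] serializedSubmissionContext) throws IOException;

  void storeApplicationAttempt(long clusterTimestamp, int applicationId, int attemptId,
                               byte[] serializedMasterContainer) throws IOException;

  // Resource related information: registered node-managers and their free resources.
  void storeNodeManager(String nodeId, int availableMemoryMB) throws IOException;

  // Used by a peer resource-manager that takes over after a failure.
  List<byte[]> loadAllApplications() throws IOException;

  void removeApplication(long clusterTimestamp, int applicationId) throws IOException;
}
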
Figure 3: YARN with High Availability Architecture

When a resource-manager fails, the other resource-managers can easily take over its jobs, since all the needed states are stored in the storage. Clients, node-managers and application-masters need to be modified so that they can point to a new resource-manager upon failure.

To achieve high availability through this failure model, we need a storage that satisfies the following requirements:

1. The storage should be highly available. It should not have a single point of failure.

2. The storage should be able to handle high read and write rates for small data (in the order of at most a few kilobytes), since this failure model performs very frequent reads and writes to the storage.

ZooKeeper and HDFS satisfy the first requirement, but they do not satisfy the second one: ZooKeeper is not designed as a persistent storage for data, and HDFS is not designed to handle high read and write rates for small data. We need another storage technology, and MySQL Cluster (NDB) is suitable for these requirements. Section 3.2 explains NDB in more detail.

Figure 3 shows the high-level diagram of the proposed architecture. NDB is introduced to store the resource-manager states.

3.2 MySQL Cluster (NDB)
MySQL Cluster (NDB) is a scalable in-memory distributed database. It is designed for availability, which means there is no single point of failure in an NDB cluster. Furthermore, it complies with the ACID transactional properties. Horizontal scalability is achieved by automatic data sharding based on a user-defined partition key. The latest benchmark from Oracle shows that MySQL Cluster version 7.2 achieves horizontal scalability: when the number of datanodes is increased 15 times, the throughput increases 13.63 times [5].

Regarding performance, NDB has fast read and write rates. The aforementioned benchmark [5] shows that a 30-node NDB cluster supports 19.5 million writes per second. NDB supports fine-grained locking, which means only the affected rows are locked during a transaction; updates on two different rows in the same table can be executed concurrently. Both SQL and NoSQL interfaces are supported, which makes NDB highly flexible depending on users' needs and requirements.

3.3 NDB Storage Module
As a proof-of-concept of our proposed architecture, we designed and implemented an NDB storage module for the YARN resource-manager. Due to limited time, the recovery failure model was used in our implementation. In this report, we refer to this proof-of-concept of NDB-based YARN as YARN-NDB.

3.3.1 Database Design
We designed two NDB tables to store application states and their corresponding application-attempts, called applicationstate and attemptstate. Table 1 shows the columns of the applicationstate table. id is a running number and it is only unique within a resource-manager. clustertimestamp is the timestamp at which the corresponding resource-manager was started. When more than one resource-manager runs at a time (as in the stateless failure model), we need to differentiate the applications that run among them; therefore, the primary key of this table is (id, clustertimestamp). appcontext is a serialized ApplicationSubmissionContext object, thus its type is varbinary.

              Column               Type
              id                   int
              clustertimestamp     bigint
              submittime           bigint
              appcontext           varbinary(13900)

          Table 1: Properties of application state

The columns of the attemptstate table are shown in Table 2. applicationid and clustertimestamp are foreign keys to the applicationstate table. attemptid is the id of an attempt, and mastercontainer contains serialized information about the first container that is assigned to the corresponding application-master. The primary key of this table is (attemptid, applicationid, clustertimestamp).

              Column               Type
              attemptid            int
              applicationid        int
              clustertimestamp     bigint
              mastercontainer      varbinary(13900)

          Table 2: Properties of attempt state

To enhance table performance in terms of read and write throughput, a partitioning technique was used5. Both tables were partitioned by applicationid and clustertimestamp. With this technique, NDB locates the desired data without contacting NDB's location resolver service, hence it is faster than NDB tables without partitioning.

5 http://dev.mysql.com/doc/refman/5.5/en/partitioning-key.html

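A sketch of how these two tables can be mapped to ClusterJ annotated interfaces (ClusterJ is the NDB access library used for YARN-NDB, see Section 3.3.2) is shown below. The interface names, the connection properties and the use of a class-level @PartitionKey annotation to mirror the key partitioning are our own assumptions for illustration; the code is not taken from the YARN-NDB source.

import java.util.Properties;

import com.mysql.clusterj.ClusterJHelper;
import com.mysql.clusterj.Session;
import com.mysql.clusterj.annotation.Column;
import com.mysql.clusterj.annotation.PartitionKey;
import com.mysql.clusterj.annotation.PersistenceCapable;
import com.mysql.clusterj.annotation.PrimaryKey;

// Maps the applicationstate table of Table 1. The class-level annotation is
// intended to mirror the key partitioning on (id, clustertimestamp).
@PersistenceCapable(table = "applicationstate")
@PartitionKey(columns = { @Column(name = "id"), @Column(name = "clustertimestamp") })
interface ApplicationState {
  @PrimaryKey int getId();                void setId(int id);
  @PrimaryKey long getClustertimestamp(); void setClustertimestamp(long ts);
  long getSubmittime();                   void setSubmittime(long t);
  byte[] getAppcontext();                 void setAppcontext(byte[] ctx); // serialized ApplicationSubmissionContext
}

// Maps the attemptstate table of Table 2.
@PersistenceCapable(table = "attemptstate")
interface AttemptState {
  @PrimaryKey int getAttemptid();         void setAttemptid(int id);
  @PrimaryKey int getApplicationid();     void setApplicationid(int id);
  @PrimaryKey long getClustertimestamp(); void setClustertimestamp(long ts);
  byte[] getMastercontainer();            void setMastercontainer(byte[] c); // serialized first container
}

public class NdbStateStoreSketch {
  public static void main(String[] args) {
    // Connection string and database name are deployment-specific placeholders.
    Properties props = new Properties();
    props.setProperty("com.mysql.clusterj.connectstring", "localhost:1186");
    props.setProperty("com.mysql.clusterj.database", "yarn");
    Session session = ClusterJHelper.getSessionFactory(props).getSession();

    // Store one application state row.
    ApplicationState app = session.newInstance(ApplicationState.class);
    app.setId(1);
    app.setClustertimestamp(System.currentTimeMillis());
    app.setSubmittime(System.currentTimeMillis());
    app.setAppcontext(new byte[0]); // a real store would persist the serialized submission context
    session.persist(app);

    // Read it back by its composite primary key (id, clustertimestamp).
    ApplicationState loaded =
        session.find(ApplicationState.class, new Object[] { 1, app.getClustertimestamp() });
    System.out.println("submittime = " + loaded.getSubmittime());

    session.close();
  }
}
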
Figure 4: NDB Storage Unit Test Flowchart

3.3.2 Integration with Resource-Manager
We developed YARN-NDB using ClusterJ6 in two development iterations, based on patches released by Apache. The first YARN-NDB implementation is based on the YARN-128.full-code.5 patch on top of the Hadoop trunk dated 11 November 2012. The second implementation7 is based on the YARN-231-2 patch8 on top of the Hadoop trunk dated 23 December 2012. In this report, we refer to the second implementation of YARN-NDB unless otherwise specified. The NDB storage module in YARN-NDB has the same functionality as Apache YARN's HDFS and ZooKeeper storage modules, such as adding and deleting application states and attempts.

6 http://dev.mysql.com/doc/ndbapi/en/mccj.html
7 https://github.com/arinto/hadoop-common
8 https://issues.apache.org/jira/browse/YARN-231

Furthermore, we developed a unit test module for the storage module. Figure 4 shows the flowchart of this unit test module. In this module, three MapReduce jobs are submitted to YARN-NDB. The first job finishes its execution before the resource-manager fails. The second job is successfully submitted and scheduled, hence an application-master is launched, but no container is allocated. The third job is successfully submitted but not yet scheduled. These three jobs represent three different scenarios when a resource-manager fails.

Restarting a resource-manager is achieved by connecting the existing application-masters and node-managers to the new resource-manager. All application-master and node-manager processes are rebooted by the new resource-manager, and all unfinished jobs are re-executed with a new application-attempt.

4. EVALUATION
We designed two types of evaluation in this project. The first evaluation tests whether the NDB storage module works as expected. The second evaluation investigates and compares the throughput of ZooKeeper, HDFS and NDB when storing YARN's application state.

4.1 NDB Storage Module Evaluation
4.1.1 Unit Test
This evaluation used the unit test class explained in Section 3.3.2. It was performed on a single-node NDB cluster, i.e. two NDB datanode processes in one node, on top of a computer with 4 GB of RAM and an Intel dual-core i3 CPU at 2.40 GHz. We changed the ClusterJ Java-properties-file accordingly to point to our single-node NDB cluster.

The unit test class was executed using Maven and NetBeans, and the result was positive. We tested consistency by executing the unit test class several times, and the tests always passed.

4.1.2 Actual Resource-Manager Failure Test
In this evaluation, we used the Swedish Institute of Computer Science (SICS) cluster. Each node in SICS's cluster had 30 GB of RAM and two six-core AMD Opteron processors at 2.6 GHz, which effectively could run 12 threads without significant context-switching overhead. Ubuntu 11.04 with Linux kernel 2.6.38-12-server was installed as the operating system, and Java(TM) SE Runtime Environment (JRE) version 1.6.0 was the Java runtime environment.

NDB was deployed on a 6-node cluster and YARN-NDB was configured using a single-node setting. We executed the pi and bbp examples that come with the Hadoop distribution. In the middle of the pi and bbp execution, we terminated the resource-manager process using the Linux kill command. A new resource-manager with the same address and port was started three seconds after the old one was successfully terminated.

We observed that the currently running job finished properly, which means the resource-manager was correctly restarted. Several connection-retry attempts to contact the resource-manager by node-managers, application-masters and MapReduce clients were observed. To check for consistency, we submitted a new job to the new resource-manager, and the new job finished correctly. We repeated this experiment several times and observed the same results, i.e. the new resource-manager was successfully restarted and correctly took over the killed resource-manager's roles.

4.2 NDB Performance Evaluation
We utilised the same set of machines in the SICS cluster as in the evaluation of Section 4.1.2. NDB was deployed on the same 6-node cluster, and ZooKeeper was deployed on three SICS nodes. The maximum memory for each ZooKeeper process was set to 5 GB of RAM. HDFS was also deployed on three SICS nodes and used the same maximum memory configuration of 5 GB of RAM as ZooKeeper.

Figure 5: zkndb Architecture

4.2.1 zkndb Framework
We developed the zkndb framework9 to benchmark storage systems effectively with minimum effort. Figure 5 shows the architecture of the zkndb framework. The framework consists of three main packages:

1. The storage package, which contains the configurable load generator (StorageImpl) in terms of the number of reads and writes per time unit.

2. The metrics package, which contains the metrics parameters (MetricsEngine), for example write or read requests and acknowledgements. Additionally, this package contains the metrics logging mechanism (ThroughputEngineImpl).

3. The benchmark package, which contains the benchmark applications and manages benchmark executions.

The zkndb framework offers flexibility in integrating new storage technologies, defining new metrics and storing benchmark results. To integrate a new storage technology, framework users can implement the storage interface in the storage package. A new metric can be developed by implementing the metric interface in the metrics package. Additionally, framework users can design a new metrics logging mechanism by implementing the throughput-engine interface in the metrics package. The data produced by ThroughputEngineImpl were processed by our custom scripts for further analysis. For this evaluation, three storage implementations were added to the framework: NDB, HDFS and ZooKeeper.

9 https://github.com/4knahs/zkndb

4.2.2 Benchmark Implementation in zkndb
For ZooKeeper and HDFS, we ported YARN's storage module implementations based on the YARN-128.full-code.5 patch10 into our benchmark. The first iteration of YARN-NDB's NDB storage module was ported into our zkndb NDB storage implementation.

10 https://issues.apache.org/jira/browse/YARN-128

Each data-write into the storage consisted of an application identification and application state information. The application identification was a Java long data type with a size of eight bytes. The application state information was an array of random bytes with a length of 53 bytes; this length was determined after observing the actual application state information stored when executing YARN-NDB jobs. Each data-read consisted of reading an application identification and its corresponding application state information.

Three types of workload were used in our experiment:

1. Read-intensive. One set of data was written into the database, and zkndb always read that written data.

2. Write-intensive. No reads were performed; zkndb always wrote a new set of data into a different location.

3. Read-write balanced. Reads and writes were performed alternately.

Furthermore, we varied the throughput rate by configuring the number of threads that accessed the database for reading and writing. To maximize the throughput, no delay was configured between reads and writes. We compared the throughput of ZooKeeper, HDFS and NDB for equal configurations of the number of threads and workload type. In addition, the scalability of each storage was investigated by increasing the number of threads while keeping the other configurations unchanged.

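As an illustration of how such a measurement loop can be driven, the following is a simplified sketch of a write-intensive worker in the spirit of zkndb. The interface and class names are our own and do not reproduce zkndb's actual API; the no-op storage stand-in is a placeholder for an NDB-, ZooKeeper- or HDFS-backed implementation.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Minimal storage abstraction: one write of an (appId, state) pair. */
interface BenchmarkStorage {
  void writeApplicationState(long appId, byte[] state) throws Exception;
}

public class WriteIntensiveBenchmarkSketch {

  public static void main(String[] args) throws Exception {
    // No-op stand-in; replace with an NDB, ZooKeeper or HDFS implementation.
    BenchmarkStorage storage = (appId, state) -> { };

    int threads = 8;              // varied from 4 to 36 in the experiments
    long durationMs = 60_000;     // one minute per run
    byte[] state = new byte[53];  // same payload size as the observed application state

    AtomicLong completedRequests = new AtomicLong();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    long deadline = System.currentTimeMillis() + durationMs;

    for (int t = 0; t < threads; t++) {
      pool.submit(() -> {
        long appId = Thread.currentThread().getId() << 32; // distinct key range per thread
        try {
          // No delay between requests: each thread writes as fast as the storage allows.
          while (System.currentTimeMillis() < deadline) {
            storage.writeApplicationState(appId++, state);
            completedRequests.incrementAndGet();
          }
        } catch (Exception e) {
          e.printStackTrace();
        }
      });
    }

    pool.shutdown();
    pool.awaitTermination(durationMs + 10_000, TimeUnit.MILLISECONDS);
    System.out.printf("%.1f completed requests/s%n",
        completedRequests.get() * 1000.0 / durationMs);
  }
}
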
4.2.3 Throughput Benchmark Result

Figure 6: zkndb Throughput Benchmark Result for 8 Threads and 1 Minute of Benchmark Execution

Figure 6 shows the throughput benchmark results for eight threads and one minute of execution, for the three workload types and the three storage implementations: ZooKeeper, NDB and HDFS.

For all three workload types, NDB had the highest throughput compared to ZooKeeper and HDFS. These results can be attributed to the nature of NDB as a high-performance persistent storage that is capable of handling high read and write request rates. Referring to the error bars in Figure 6, NDB shows a large deviation between its average and its lowest value during the experiment. This deviation could be attributed to infrequent interventions by the NDB management process to recalculate the data index for fast access.

Interestingly, ZooKeeper's throughput was stable for all workload types. This stability can be accounted for by ZooKeeper's behaviour of linearizing incoming requests, which causes read and write requests to have approximately the same execution time. Another possible explanation is the implementation of YARN's ZooKeeper storage module, whose code could cause the read and write execution times to be equal.

As expected, HDFS had the lowest throughput for all workload types. HDFS' low throughput may be attributed to its NameNode-locking overhead and its inefficient data access pattern when processing lots of small files. Each time HDFS receives a read or write request, the HDFS NameNode needs to acquire a lock on the file path so that HDFS can return a valid result; acquiring a lock frequently increases the data access time and hence decreases the throughput. The inefficient data access pattern in HDFS is due to the splitting of data to fit into HDFS blocks and to data replication. Furthermore, the need to write the data to disk in HDFS decreases the throughput, as observed in the write-intensive and read-write balanced workloads.

4.2.4 Scalability Benchmark Result

Figure 7: Scalability Benchmark Results for Read-Intensive Workload

Figure 8: Scalability Benchmark Results for Write-Intensive Workload

Figure 7 shows the increase in throughput when we increased the number of threads for the read-intensive workload. All of the storage implementations increased their throughput when the number of threads was increased. NDB had the highest increase compared to HDFS and ZooKeeper: for NDB, doubling the number of threads increased the throughput by a factor of 1.69, which is close to linear scalability.

The same trend was observed for the write-intensive workload, as shown in Figure 8. NDB still had the highest increase in throughput compared to HDFS and ZooKeeper; for NDB, doubling the number of threads increased the throughput by a factor of 1.67. On the other hand, HDFS performed very poorly for this workload: the highest throughput achieved by HDFS, with 36 threads, was only 534.92 requests per second. The poor performance of HDFS can be attributed to the same reasons as explained in Section 4.2.3, namely the NameNode-locking overhead and the inefficient data access pattern for small files.

5. RELATED WORK
5.1 Corona
Corona [2] introduces a new process called the cluster-manager to take over cluster management functions from the MapReduce job-tracker. The main purposes of the cluster-manager are to keep track of the amount of free resources and to manage the nodes in the cluster. Corona utilizes push-based scheduling, i.e. the cluster-manager pushes the allocated resources back to the job-tracker after it receives resource requests. Furthermore, Corona claims that the scheduling latency is low since no periodic heartbeat is involved in this resource scheduling. Although Corona solves the MapReduce scalability limitation, it has a single point of failure in the cluster-manager, hence the MapReduce availability limitation is still present.

5.2 KTHFS
KTHFS [12] solves the scalability and availability limitations of the HDFS NameNode. The filesystem metadata of the HDFS NameNodes is stored in NDB, hence the HDFS NameNodes are fully stateless. By being stateless, more than one HDFS NameNode can run simultaneously, and the failure of an HDFS NameNode can easily be mitigated by the remaining alive NameNodes. Furthermore, KTHFS has linear throughput scalability, that is, the throughput can be increased by adding HDFS NameNodes or NDB DataNodes. KTHFS inspired the use of NDB to solve the YARN availability limitation.

5.3 Mesos
Mesos [3] is a resource management platform that enables commodity-cluster sharing between different cluster computing frameworks. Cluster utilization is improved due to this sharing mechanism.

Mesos has several master processes that have similar roles to the YARN resource-manager. The availability of Mesos is achieved by having several stand-by master processes to replace a failed active master process. Mesos utilizes ZooKeeper to monitor the group of master processes, and during master process failures, ZooKeeper performs leader election to choose the new active master process. Reconstruction of state is performed by the newly active master process; this reconstruction mechanism may introduce a significant delay when the state is big.

5.4 Apache HDFS-1623
Apache utilizes a failover recovery model to solve the HDFS NameNode single-point-of-failure limitation [9, 10]. In this solution, additional HDFS NameNodes are introduced as stand-by NameNodes. The active NameNode writes all changes to the file system namespace into a write-ahead log in a persistent storage. An overhead when storing data is likely to be introduced, and its magnitude depends on the choice of storage system. This solution supports automatic failover, but its complexity increases due to the additional processes that act as failure detectors. These failure detectors trigger the automatic failover mechanism when they detect NameNode failures.

6. CONCLUSION AND FUTURE WORK
We have presented an architecture for a highly available cluster computing management framework. The proposed architecture incorporates a stateless failure model into the existing Apache YARN. To achieve high availability with the stateless failure model, MySQL Cluster (NDB) was proposed as the storage technology for storing the necessary state information.

As a proof-of-concept, we implemented Apache YARN's recovery failure model using NDB (YARN-NDB), and we developed the zkndb benchmark framework to test it. The availability and scalability of the implementation have been examined and verified using a unit test, an actual resource-manager failure test and throughput benchmark experiments. The results showed that YARN-NDB is better in terms of throughput and ability to scale than the existing ZooKeeper- and HDFS-based solutions.

For future work, we plan to further develop YARN-NDB with the fully stateless failure model. As the first step of this plan, a more detailed analysis of the resource-manager states is needed. After the states are successfully analysed, we plan to re-design the database to accommodate the additional state information identified by the analysis. In addition, modifications to the YARN-NDB code are needed to remove the information from memory and always access NDB when the information is needed. Next, we will perform an evaluation to measure the throughput and overhead of the new implementation. Finally, after the new implementation successfully passes the evaluations, we should deploy YARN-NDB in a significantly larger cluster with real-world workloads to check its actual scalability. The resulting YARN-NDB is expected to run properly in cloud environments and to handle node failures correctly.

7. ACKNOWLEDGEMENT
The authors would like to thank our partner Mário Almeida for his contribution to the project. We would also like to thank our colleagues Ümit Çavuş Büyükşahin, Strahinja Lazetic and Vasiliki Kalavri for providing feedback throughout this project. Additionally, we would like to thank our EMDC friends Muhammad Anis uddin Nasir, Emmanouil Dimogerontakis, Maria Stylianou and Mudit Verma for their continuous support throughout the report writing process.

8. REFERENCES
 [1] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, Jan. 2008.
 [2] Facebook. Under the hood: Scheduling MapReduce jobs more efficiently with Corona, Nov. 2012. Retrieved November 18, 2012 from http://on.fb.me/109FHPD.
 [3] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: a platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI'11, page 22, Berkeley, CA, USA, 2011. USENIX Association.
 [4] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: wait-free coordination for internet-scale systems. In USENIX ATC, volume 10, 2010.
 [5] M. Keep. MySQL Cluster 7.2 GA released, delivers 1 BILLION queries per minute, Apr. 2012. Retrieved November 18, 2012 from http://dev.mysql.com/tech-resources/articles/mysql-cluster-7.2-ga.html.
 [6] A. C. Murthy. The next generation of Apache Hadoop MapReduce, Feb. 2011. Retrieved November 18, 2012 from http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/.
 [7] A. C. Murthy. Introducing Apache Hadoop YARN, Aug. 2012. Retrieved November 11, 2012 from http://hortonworks.com/blog/introducing-apache-hadoop-yarn/.
 [8] A. C. Murthy, C. Douglas, M. Konar, O. O'Malley, S. Radia, S. Agarwal, and V. KV. Architecture of next generation Apache Hadoop MapReduce framework. Retrieved November 18, 2012 from https://issues.apache.org/jira/secure/attachment/12486023/MapR.
 [9] A. Myers. High availability for the Hadoop Distributed File System (HDFS), Mar. 2012. Retrieved November 18, 2012 from http://bit.ly/ZT1xIc.
[10] S. Radia. High availability framework for HDFS NN, Feb. 2011. Retrieved January 4, 2012 from https://issues.apache.org/jira/browse/HDFS-1623.
[11] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop Distributed File System. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–10, May 2010.
[12] M. Wasif. A distributed namespace for a distributed file system, 2012. Retrieved November 18, 2012 from http://kth.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:548037.
High Availability Architecture for YARN

  • 1. Next Generation Hadoop: High Availability for YARN Arinto Murdopo Jim Dowling KTH Royal Institute of Technology Swedish Institute of Computer Science Hanstavägen 49 - 1065A, Isafjordsgatan 22, 164 53 Kista, Sweden 164 40 Kista, Sweden arinto@kth.se jdowling@kth.se ABSTRACT several cluster computing frameworks to handle big data ef- Hadoop is one of the widely-adopted cluster computing frame- fectively. works for big data processing, but it is not free from limi- tations. Computer scientists and engineers are continuously One of the widely adopted cluster computing frameworks making efforts to eliminate those limitations and improve that commonly used by web-companies is Hadoop1 . It mainly Hadoop. One of the improvements in Hadoop is YARN, consists of Hadoop Distributed File System (HDFS) [11] to which eliminates scalability limitation of the first generation store the data. On top of HDFS, MapReduce framework MapReduce. However, YARN still suffers from availabil- inspired by Google’s MapReduce [1] was developed to pro- ity limitation, i.e. single-point-of-failure in YARN resource- cess the data inside. Although Hadoop arguably has be- manager. In this paper we propose an architecture to solve come the standard solution for managing big data, it is not YARN availability limitation. The novelty of this architec- free from limitations. These limitations have triggered sig- ture lies on its stateless failure model, which enables multiple nificant efforts from academia and enterprise to improve YARN resource-managers to run concurrently and maintains Hadoop. Cloudera tried to reduce availability limitation high availability. MySQL Cluster (NDB) is proposed as the of HDFS using NameNode replication [9]. KTHFS solved storage technology in our architecture. Furthermore, we im- the HDFS availability limitation by utilizing MySQL Clus- plemented a proof-of-concept for the proposed architecture. ter to make HDFS NameNode stateless [12]. Scalability of The evaluations show that the proof-of-concept is able to in- MapReduce has become prominent limitation. MapReduce crease the availability of YARN. In addition, NDB is shown has reached scalability limit of 4000 nodes. To solve this to have the highest throughput compared to Apache’s pro- limitation, the open source community proposed the next posed storages (ZooKeeper and NDB). Finally, the evalua- generation MapReduce called YARN (Yet Another Resource tions show the NDB achieves linear scalability hence it is Negotiator) [8]. From the enterprise world, Corona was re- suitable for our proposed stateless failure model. leased by Facebook to overcome the aforementioned scalabil- ity limitation [2]. Another limitation is Hadoop’s inability Categories and Subject Descriptors to perform fine-grained resource sharing between multiple computation frameworks. Mesos tried to solve this limita- D.4.7 [Operating Systems]: Distributed Systems, Batch tion by implementation of distributed two-level scheduling Processing Systems mechanism called resource offers [3]. General Terms However, few solutions have addressed the availability lim- Big Data, Storage Management itation in MapReduce framework. When a MapReduce’s JobTracker failures occur, the corresponding application is 1. INTRODUCTION not able to continue, reducing MapReduce’s availability. Cur- Big data has become widespread across industries, especially rent YARN architecture is unable to solve this availability web-companies. It has reached petabytes scale and it will limitation. 
ResourceManager, the JobTracker-equivalent in keep increasing in the upcoming years. Traditional storage YARN, remains a single-point-of-failure. The open source systems such as regular file systems and relational databases community has recently started to solve this issue but no are not designed to handle this petabytes-scale of magnitude. final and proven solution is available yet2 . The current pro- Scalability is the main issue for the traditional storage sys- posal from the open source community is to use ZooKeeper tems in handling big data. This situation has resulted in [4] or HDFS as a persistent storage to store ResourceMan- ager’s states. Upon failure, ResourceManager will be recov- ered using the stored states. Solving this availability limitation will bring YARN into cloud-ready state. YARN can be executed in the cloud, such as Amazon EC2, and it is resistant to failures that often happen in the cloud. 1 http://hadoop.apache.org/ 2 https://issues.apache.org/jira/browse/YARN-128
  • 2. In this report, we present a new architecture for YARN. The main goal of the new architecture is to solve the aforemen- tioned availability limitation in YARN. This architecture provides better alternatives than the existing Zoo-Keeper- based architecture since it eliminates the potential scalabil- ity limitation due to ZooKeeper’s relatively limited through- put. For achieving the desired availability, the new architecture utilizes a distributed in-memory database called MySQL Cluster(NDB)3 to persist the ResourceManager states. NDB itself automatically replicates the stored data into different NDB data-nodes to ensure high availability. Moreover, NDB is able to handle up to 1.8 million write queries per sec- ond [5]. This report is organized as following. Section 2 presents existing YARN architecture, its availability limitations and proposed solution from Apache. The proposed architecture Figure 1: YARN Architecture is presented in Section 3. Section 4, presents our evaluation to verify the availability and the scalability of the proposed architecture. The related works in improving availability pluggable, which means we can implement our own of cluster computing framework are presented in Section 5. scheduling policy to be used in our YARN deployment. And we conclude this report and propose future work for YARN currently provides three policies to choose from, this project in Section 6. i.e. fair-scheduler, FIFO-scheduler and capacity-scheduler. For the available resources, scheduler should ideally 2. YARN ARCHITECTURE use CPU, memory, disk and other computing resources This section explains the current YARN architecture, YARN as factor of resources during scheduling. However, cur- availability limitation, and Apache’s proposed solution to rent YARN only supports memory as the factor of re- overcome the limitation. source during scheduling. 2. Resource-tracker, which handles computing-nodes man- 2.1 Architecture Overview agement. ”Computing-nodes” in this context means YARN’s main goal is to provide more flexibility compared nodes that have node-manager process run on it and to Hadoop in term of data processing framework that can have computing resources. The management tasks in- be executed on top of it [7]. It is equipped with generic dis- clude new nodes registration, handling requests from tributed application framework and resource-management invalid or decommisioned nodes, and nodes’ heartbeats components. Therefore, YARN supports not only MapRe- processing. Resource-tracker works closely with node- duce, but also other data processing frameworks such as liveness-monitor(NMLivenessMonitor class), which keeps Apache Giraph, Apache Hama and Spark. track of live and dead computing nodes based on nodes’ heartbeats, and node-list-manager(NodesListManager In addition, YARN is aimed to solve scalability limitation class), which store the list of valid and excluded com- in original implementation of Apache’s MapReduce [6]. To puting nodes based on YARN configuration files. achieve this aim, YARN splits MapReduce job-tracker re- 3. Applications-manager, which maintains collection of sponsibilities of application scheduling, resource manage- user submitted jobs and cache of completed jobs. It is ment and application monitoring into separate processes the entry point for clients to submit their jobs. or daemons. 
The new processes that handle job-tracker responsibilities are resource-manager which handles global resource management and job scheduling, and application- In YARN, clients submit jobs through applications-manager master which is responsible for job monitoring, job life-cyle and the submission triggers scheduler to try to schedule management and resource negotiation with the resource- the job. When the job is scheduled, resource-manager allo- manager. Each submitted job corresponds to an application- cates a container and launches a corresponding application- master process. Furthermore, YARN converts original MapRe- master. The application-master takes over and process the duce task-tracker into node-manager, which manages task job by splitting them into smaller tasks, requesting addi- execution in YARN’s unit of resource called container. tional containers to resource-manager, launching them with the help of node-manager, assigning the tasks into the avail- Figure 1 shows the current YARN architecture. Resource- able containers and keeping track of the job progress. Clients manager has three core components, they are: learn the job progress by polling application-master every specific seconds based on YARN configuration. When the job is completed, application-master cleans up its working 1. Scheduler, which schedules submitted jobs based on state. specific policy and available resources. The policy is 3 http://www.mysql.com/products/cluster/ 2.2 Availability Limitation in YARN
  • 3. Although YARN solves the scalability limitation of origi- nal MapReduce, it still suffers from an availability limita- tion which is the single-point-of-failure nature of resource- manager. This section explains why YARN resource-manager is a single-point-of-failure. Refer to Figure 1, container and task failures are handled by node-manager. When a container fails or dies, node- manager detects the failure event and launches a new con- tainer to replace the failing container and restart the task execution in the new container. Figure 2: Stateless Failure Model In the event of application-master failure, the resource-manager detects the failure and start a new instance of the application- and the first allocated container details such as container master with a new container. The ability to recover the as- identification number, container node detail, requested re- sociated job state depends on the application-master imple- source and job priority. mentation. MapReduce application-master has the ability to recover the state but it is not enabled by default. Other Upon restart, resource-manager reloads the saved informa- than resource-manager, associated client also reacts with the tion and restarts all node-managers and application-masters. failure. The client contacts the resource-manager to locate This restart mechanism does not retain the jobs that cur- the new application-master’s address. rently executing in the cluster. In the worst case, all progress will be lost and the job will be started from the beginning. Upon failure of a node-manager, the resource-manager up- To minimize this effect, a new application-master should be dates its list of available node-managers. Application-master designed to read the previous application-master states that should recover the tasks run on the failing node-managers executes under the failed resource-manager. For example,a but it depends on the application-master implementation. MapReduce application-master handles this case by stor- MapReduce application-master has an additional capability ing the progress in another process called job-history-server to recover the failing task and blacklist the node-managers and upon restart, a new application-master obtains the job that often fail. progress from a job-history-server. Failure of the resource-manager is severe since clients can The main drawback of this model is the existence of down- not submit a new job and existing running job could not time to start a new resource-manager process when the old negotiate and request for new container. Existing node- one fails. If the down-time is too long, all processes reach managers and application-masters try to reconnect to the time-out and clients need to re-submit their jobs to the new failed resource-manager. The job progress will be lost when resource-manager. Furthermore, HDFS is not suitable for they are unable to reconnect. This lost of job progress will storing lots of data with small size (in this case, the data likely frustrate engineers or data scientists that use YARN are the application states and the application-attempts). because typical production jobs that run on top of YARN ZooKeeper is suitable for current data size, but it is likely are expected to have long running time and typically they to introduce problem when the amount of stored data in- are in the order of few hours. 
Furthermore, this limitation is creased since ZooKeeper is designed to store typically small preventing YARN to be used efficiently in cloud environment configuration data. (such as Amazon EC2) since node failures often happen in cloud environment. 3. YARN WITH HIGH AVAILABILITY We explain our proposed failure model and architecture to 2.3 Proposed Solution from Apache solve YARN availability limitation. Furthermore, implemen- To tackle this availability issue, Apache proposed to have tation of the proposal is explained in this section. recovery failure model using ZooKeeper or HDFS-based per- sistent storage4 . The proposed recovery failure model is transparent to clients, that means clients does not need to 3.1 Stateless Failure Model re-submit the jobs. In this model, resource-manager saves We propose stateless failure model, which means all neces- relevant information upon job submission. sary information and states used by resource-manager are stored in a persistent storage. Based on our observation, These information currently include application-identification- these information include: number, application-submission-context and list of application- attempts. An application-submission-context contains in- 1. Application related information such as application-id, formation related to the job submission such as applica- application-submission-context and application-attempts. tion name, user who submits the job, and amount of re- quested resource. An application-attempt represents each 2. Resource related information such as list of node-managers resource-manager attempt to run a job by creating a new and available resources. application-master process. The saved information related to application-attempt are attempt identification number Figure 2 shows the architecture of stateless failure model. 4 https://issues.apache.org/jira/browse/YARN-128 Since all the necessary information are stored in persistent
  • 4. Column Type id int clustertimestamp bigint submittime bigint appcontext varbinary(13900) Table 1: Properties of application state scalability is achieved by auto-data-sharding based on user- defined partition key. The latest benchmark from Oracle shows that MySQL Cluster version 7.2 achieves horizontal scalability, i.e when number of datanodes is increased 15 times, the throughput is increased 13.63 times [5]. Regarding the performance, NDB has fast read and write rate. The aforementioned benchmark [5] shows that 30- node-NDB cluster supports 19.5 million writes per second. It supports fine-grained locking, which means only affected Figure 3: YARN with High Availability Architec- rows are locked during a transaction. Updates on two dif- ture ferent rows in the same table can be executed concurrently. SQL and NoSQL interfaces are supported which makes NDB highly flexible depending on users’ needs and requirements. storage, it is possible to have more than one resource-managers running at the same time. All of the resource-managers share the information through the storage and none of them 3.3 NDB Storage Module hold the information in their memory. As a proof-of-concept of our proposed architecture, we de- signed and implemented NDB storage module for YARN When a resource-manager fails, the other resource-managers resource-manager. Due to limited time, recovery failure can easily take over the job since all the needed states are model was used in our implementation. In this report, we stored in the storage. Clients, node-managers and application- will refer the proof-of-concept of NDB-based-YARN as YARN- masters need to be modified so that they can point to new NDB. resource-managers upon the failure. To achieve high availability through this failure model, we 3.3.1 Database Design need to have a storage that has these following requirements: We designed two NDB tables to store application states and their corresponding application-attempts. They are called applicationstate and attemptstate. Table 1 shows the columns 1. The storage should be highly available. It does not for applicationstate table. id is a running number and it is have single-point-of-failure. only unique within a resource-manager. clustertimestamp is the timestamp when the corresponding resource-manager 2. The storage should be able to handle high read and is started. When we have more than one resource-manager write rates for small data (in the order of at most few running at a time (as in stateless failure model), we need to kilo bytes), since this failure model needs to perform differentiate the applications that run among them. There- very frequent read and write to the storage. fore, the primary keys for this table are id and clustertimes- tamp. appcontext is a serialized ApplicationSubmissionCon- text object, thus the type is varbinary. ZooKeeper and HDFS satisfy the first requirement, but they do not satisfy the second requirement. ZooKeeper is not de- The columns for attemptstate table are shown in Table 2. signed as a persistent storage for data and HDFS is not de- applicationid and clustertimestampe are the foreign keys to signed to handle high read and write rates for small data. We applicationstate table. attemptid is the id of an attempt need other storage technology and MySQL Cluster (NDB) and mastercontainer contains serialized information about is suitable for these requirements. Section 3.2 explain NDB the first container that is assigned into the corresponding in more details. application-master. 
The primary keys of this table are at- temptid, applicationid and clustertimestamp. Figure 3 shows the high level diagram of the proposed ar- chitecture. NDB is introduced to store resource-manager To enhance table performance in term of read and write states. throughput, partitioning technique was used5 . Both ta- bles were partitioned by applicationid and clustertimestamp. With this technique, NDB located the desired data with- 3.2 MySQL Cluster (NDB) out contacting NDB’s location resolver service, hence it was MySQL Cluster (NDB) is a scalable in-memory distributed faster compared to NDB tables without partitioning. database. It is designed for availability, which means there is no single-point-of-failure in NDB cluster. Furthermore, 5 http://dev.mysql.com/doc/refman/5.5/en/partitioning- it complies with ACID-transactional properties. Horizontal key.html
  • 5. Column Type and all unfinished jobs are re-executed with a new application- attemptid int attempt. applicationid int clustertimestamp bigint mastercontainer varbinary(13900) 4. EVALUATION We designed two types of evaluation in this project. The Table 2: Properties of attempt state first evaluation was to test whether the NDB storage module works as expected or not. The second evaluation was to investigate and compare the throughput among ZooKeeper, HDFS and NDB when storing YARN’s application state. 4.1 NDB Storage Module Evaluation 4.1.1 Unit Test This evaluation used the unit test class explained in Sec- tion 3.3.2. It was performed using single-node-NDB-cluster i.e. two NDB datanode-processes in a node. on top of a computer with 4 GB of RAM and Intel dual-core i3 CPU at 2.40 GHz. We changed accordingly the ClusterJ’s Java- properties-file to point into our single-node-NDB-cluster. The unit test class was executed using Maven and Netbeans, and the result was positive. We tested the consistency by executing the unit test class several times and the results were always pass. 4.1.2 Actual Resource-Manager Failure Test In this evaluation, we used Swedish Institute of Computer Science (SICS) cluster. Each node in SICS’s cluster had 30 GB of RAM and two six-core AMD Opteron processor Figure 4: NDB Storage Unit Test Flowchart at 2.6GHz, which effectively could run 12 threads without significant context-switching overhead. Ubuntu 11.04 with Linux Kernel 2.6.38-12-server was installed as the operat- 3.3.2 Integration with Resource-Manager ing system and Java(TM) SE Runtime Environment (JRE) We developed YARN-NDB using ClusterJ6 for two develop- version 1.6.0 was the Java runtime environment. ment iterations based on patches released by Apache. The first YARN-NDB implementation is based on YARN-128.full- NDB was deployed in 6-node-cluster and YARN-NDB was code.5 patch on top of Hadoop trunk dated 11 November configured using single-node setting. We executed pi and 2012. The second implementation7 is based on YARN-231-2 bbp examples that come from Hadoop distribution. In the patch8 on top of Hadoop trunk dated 23 December 2012. In middle of pi and bbp execution, we terminated the resource- this report, we refer to the second implementation of YARN- manager process using Linux kill command. The new resource- NDB unless otherwise specified. The NDB storage module manager with the same address and port was started three in YARN-NDB has same functionalities as Apache YARN’s seconds after the old one was successfully terminated. HDFS and ZooKeeper storage module such as adding and deleting application states and attempts. We observed that the currently running job finished prop- erly, which means the resource-manager was correctly restarted. Furthermore, we developed unit test module for the storage Several connection-retry-attempts to contact the resource- module. Figure 4 shows the flowchart of this unit test mod- manager by node-managers, application-masters and MapRe- ule. In this module, three MapReduce jobs are submitted duce clients were observed. To check for consistency, we into YARN-NDB. The first job finishes the execution before submitted a new job to the new resource-manager and the a resource-manager fails. The second job is successfully sub- new job was finished correctly. We repeated this experi- mitted and scheduled, hence application-master is launched, ment several times and same results were observed, i.e the but no container is allocated. 
The third job is successfully new resource-manager was successfully restarted and took submitted but not yet scheduled. These three jobs represent over the killed resource-manager’s roles correctly. three different scenarios when a resource-manager fails. Restarting a resource-manager is achieved by connecting 4.2 NDB Performance Evaluation the existing application-masters and node-managers to the We utilised the same set of machines in SICS cluster as our new resource-manager. All application-masters and node- evaluation in Section4.1.2. NDB was deployed in the same managers process are rebooted by the new resource-manager 6-node-cluster and ZooKeeper were deployed to three SICS nodes. The maximum memory for each ZooKeeper process 6 http://dev.mysql.com/doc/ndbapi/en/mccj.html was set to 5GB of RAM. HDFS were also deployed to three 7 https://github.com/arinto/hadoop-common SICS nodes and it used ZooKeeper’s maximum memory con- 8 https://issues.apache.org/jira/browse/YARN-231 figuration of 5GB of RAM.
  • 6. 18000 ZooKeeper NDB 16000 HDFS 14000 Completed requests/s 12000 10000 8000 6000 4000 2000 0 R intensive W intensive R/W intensive Workload type Figure 5: zkndb Architecture Figure 6: zkndb Throughput Benchmark Result for 8 Threads and 1 Minute of Benchmark Execution 4.2.1 zkdnb Framework We developed zkndb framework9 to effectively benchmark storage systems with minimum effort. Figure 5 shows the Application state information was an array of random bytes, architecture of zkndb framework. The framework consists with length of 53 bytes. The length of application state in- of three main packages: formation was determined after observing actual application state information that stored when executing YARN-NDB jobs. Each data-read consisted of reading an application 1. storage package, which contains the configurable load identification and its corresponding application state infor- generator (StorageImpl ) in term of number of reads mation. and writes per time unit. 2. metrics package, which contains metrics parameters Three types of workload were used in our experiment, they (MetricsEngine), for example write or read request were: and acknowledge. Additionally, this package contains also the metrics logging mechanism (ThroughputEngineImpl ). 1. Read-intensive. One set of data was written into database, 3. benchmark package, which contains benchmark appli- and zkndb always read on the written data. cations and manages benchmark executions. 2. Write-intensive. No read was performed, zkndb always wrote a new set of data into different location. zkndb framework offers flexibility in integrating new storage technologies, defining new metrics and storing benchmark 3. Read-write balance. Read and write were performed results. To integrate a new storage technology, framework alternately. users can implement storage-interface in storage package. A new metric can be developed by implementing metric in- Furthermore, we varied the throughput rate by configuring terface in metrics package. Additionally, framework users the number of threads that accessed the database for read- can design new metrics logging mechanism by implementing ing and writing. To maximize the throughput, no delay was throughput-engine-interface in metrics package. Resulting configured in between each read and each write. We com- data produced by ThroughputEngineImpl were further pro- pared the throughput between ZooKeeper, HDFS, and NDB cessed by our custom scripts for further analysis. For this for equal configurations of number of threads and workload evaluation, three storage implementations were added into types. In addition, scalability of each storage was investi- the framework, which are NDB, HDFS and ZooKeeper. gated by increasing the number of threads, while keeping the other configurations unchanged. 4.2.2 Benchmark Implementation in zkndb For ZooKeeper and HDFS, we ported YARN’s storage mod- 4.2.3 Throughput Benchmark Result ule implementation based on YARN-128.full-code.5 patch10 Figure 6 shows the throughput benchmark result for eight into our benchmark. The first iteration of YARN-NDB’s threads and one minute of execution with the three types of NDB storage module is ported into our zkndb NDB storage different workload and three types of storage implementa- implementation. tion: ZooKeeper, NDB and HDFS. Each data-write into the storage had an application identifi- For all three workload types, NDB had the highest through- cation and application state information. 
4.2.2 Benchmark Implementation in zkndb
For ZooKeeper and HDFS, we ported YARN's storage module implementations based on the YARN-128.full-code.5 patch (https://issues.apache.org/jira/browse/YARN-128) into our benchmark. The first iteration of YARN-NDB's NDB storage module was ported into our zkndb NDB storage implementation.

Each data-write into the storage consisted of an application identification and application state information. The application identification was a Java long with a size of eight bytes. The application state information was an array of random bytes with a length of 53 bytes; this length was chosen after observing the actual application state information stored when executing YARN-NDB jobs. Each data-read consisted of reading an application identification and its corresponding application state information.

Three types of workload were used in our experiments:

1. Read-intensive. One set of data was written into the database, and zkndb repeatedly read that data.
2. Write-intensive. No reads were performed; zkndb repeatedly wrote a new set of data to a different location.
3. Read-write balanced. Reads and writes were performed alternately.

Furthermore, we varied the offered load by configuring the number of threads that accessed the database for reading and writing. To maximize throughput, no delay was configured between consecutive reads and writes. We compared the throughput of ZooKeeper, HDFS, and NDB for equal configurations of thread count and workload type. In addition, the scalability of each storage system was investigated by increasing the number of threads while keeping the other configurations unchanged. A sketch of such a workload loop is shown below.
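The sketch below is not zkndb's actual driver code; it reuses the hypothetical Storage interface from the previous sketch and illustrates the write-intensive workload as described: each thread writes an 8-byte application id and 53 random state bytes in a tight loop with no delay, and completed requests are counted over a fixed one-minute window.

import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Minimal write-intensive workload driver in the spirit of the benchmark described above. */
public class WriteIntensiveBenchmark {
    private static final int STATE_SIZE_BYTES = 53;            // observed application state size
    private static final long DURATION_NANOS = TimeUnit.MINUTES.toNanos(1);

    public static void main(String[] args) throws Exception {
        int numThreads = 8;                                     // varied from 4 to 36 in the experiments
        Storage storage = new InMemoryStorage();                // swap in an NDB/HDFS/ZooKeeper backend
        AtomicLong completedRequests = new AtomicLong();
        AtomicLong nextAppId = new AtomicLong();                // every write targets a new location

        Runnable writer = () -> {
            Random random = new Random();
            byte[] state = new byte[STATE_SIZE_BYTES];
            long deadline = System.nanoTime() + DURATION_NANOS;
            while (System.nanoTime() < deadline) {
                random.nextBytes(state);                        // 53 bytes of random application state
                try {
                    storage.writeApplicationState(nextAppId.getAndIncrement(), state);
                    completedRequests.incrementAndGet();        // no delay between consecutive writes
                } catch (Exception e) {
                    // a failed request is simply not counted as completed
                }
            }
        };

        Thread[] threads = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            threads[i] = new Thread(writer, "writer-" + i);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }

        long seconds = TimeUnit.NANOSECONDS.toSeconds(DURATION_NANOS);
        System.out.println("Completed requests/s: " + completedRequests.get() / seconds);
    }
}

The read-intensive and read-write balanced workloads differ only in the loop body: the former repeatedly reads one pre-written record, and the latter alternates a write with a read.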
Figure 6: zkndb Throughput Benchmark Result for 8 Threads and 1 Minute of Benchmark Execution (completed requests/s for the read-intensive, write-intensive, and read/write-intensive workloads; ZooKeeper vs. NDB vs. HDFS)

4.2.3 Throughput Benchmark Result
Figure 6 shows the throughput benchmark results for eight threads and one minute of execution, covering the three workload types and the three storage implementations: ZooKeeper, NDB, and HDFS.

For all three workload types, NDB had the highest throughput compared to ZooKeeper and HDFS. These results can be attributed to the nature of NDB as a high-performance persistent storage system capable of handling high read and write request rates. As the error bars in Figure 6 show, NDB exhibits a large deviation between its average and its lowest value observed during the experiment. This deviation could be attributed to infrequent intervention by the NDB management process to recalculate the data index for fast access.

Interestingly, ZooKeeper's throughput was stable for all workload types. This stability can be attributed to ZooKeeper's linearization of incoming requests, which causes read and write requests to have approximately the same execution time. Another possible explanation is YARN's ZooKeeper storage module implementation, whose code could make read and write execution times equal.

As expected, HDFS had the lowest throughput for all workload types. HDFS' low throughput may be attributed to NameNode-locking overhead and an inefficient data access pattern when processing many small files. Each time HDFS receives a read or write request, the HDFS NameNode needs to acquire a lock on the file path so that HDFS can return a valid result. Acquiring locks this frequently increases data access time and hence decreases throughput. The inefficient data access pattern in HDFS is due to splitting data to fit into HDFS blocks and to data replication. Furthermore, the need to write data to disk in HDFS decreases throughput, as observed in the write-intensive and read-write balanced workloads.

Figure 7: Scalability Benchmark Results for Read-Intensive Workload (completed requests/s vs. number of threads)

Figure 8: Scalability Benchmark Results for Write-Intensive Workload (completed requests/s vs. number of threads)

4.2.4 Scalability Benchmark Result
Figure 7 shows the increase in throughput as we increased the number of threads for the read-intensive workload. All of the storage implementations increased their throughput when the number of threads was increased. NDB had the highest increase compared to HDFS and ZooKeeper: for NDB, doubling the number of threads increased the throughput by a factor of 1.69, which is close to linear scalability.
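To make the scalability figures concrete, the per-doubling factor quoted here can be read as the ratio of measured throughputs at successive thread counts, a shorthand of ours that the text does not define explicitly: s(n) = T(2n) / T(n), where T(n) is the completed requests per second with n threads. Perfectly linear scaling corresponds to s(n) = 2, so NDB's measured factors of 1.69 and 1.67 are close to, but below, the ideal.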
The same trend was observed for the write-intensive workload, as shown in Figure 8. NDB still had the highest increase in throughput compared to HDFS and ZooKeeper; for NDB, doubling the number of threads increased the throughput by a factor of 1.67. On the other hand, HDFS performed very poorly for this workload: the highest throughput HDFS achieved, with 36 threads, was only 534.92 requests per second. This poor performance can be attributed to the same reasons explained in Section 4.2.3, namely NameNode-locking overhead and an inefficient data access pattern for small files.

5. RELATED WORK
5.1 Corona
Corona [2] introduces a new process called the cluster-manager to take over cluster management functions from the MapReduce job-tracker. The main purposes of the cluster-manager are to keep track of the amount of free resources in the cluster and to manage the cluster's nodes. Corona uses push-based scheduling, i.e. the cluster-manager pushes the allocated resources back to the job-tracker after it receives resource requests. Furthermore, Corona claims low scheduling latency since no periodic heartbeat is involved in this resource scheduling. Although Corona solves the MapReduce scalability limitation, it has a single point of failure in the cluster-manager, hence the MapReduce availability limitation is still present.

5.2 KTHFS
KTHFS [12] solves the scalability and availability limitations of HDFS NameNodes. The filesystem metadata of the HDFS NameNodes is stored in NDB, hence the HDFS NameNodes are fully stateless. By being stateless, more than one HDFS NameNode can run simultaneously, and the failure of a NameNode can easily be mitigated by the remaining live NameNodes. Furthermore, KTHFS has linear throughput scalability, that is, throughput can be increased by adding HDFS NameNodes or NDB DataNodes. KTHFS inspired the use of NDB to solve the YARN availability limitation.

5.3 Mesos
Mesos [3] is a resource management platform that enables commodity clusters to be shared between different cluster computing frameworks. Cluster utilization is improved by this sharing mechanism. Mesos has several master processes whose roles are similar to that of the YARN resource-manager. The availability of Mesos is achieved by having several standby master processes ready to replace a failed active master process. Mesos uses ZooKeeper to monitor the group of master processes, and during master process failures ZooKeeper performs leader election to choose the new active master process; a generic sketch of this pattern is shown below. Reconstruction of state is performed by the newly active master process, and this reconstruction mechanism may introduce a significant delay when the state is large.
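Mesos implements this election through its own ZooKeeper integration rather than the code below; the following is only a minimal Java sketch of the general ZooKeeper leader-election pattern, written with the Apache Curator recipes, where the ensemble address and election path are placeholders and not values from the paper.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class MasterElection {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble that monitors the master processes.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",           // placeholder ensemble address
                new ExponentialBackoffRetry(1000, 3));  // retry policy for the ZooKeeper session
        client.start();

        // Every master process competes for leadership under the same znode path.
        LeaderLatch latch = new LeaderLatch(client, "/cluster/master-election");
        latch.start();

        // Blocks until this process is elected the active master,
        // for example after the previous leader's ZooKeeper session expires.
        latch.await();
        System.out.println("Elected active master; state reconstruction would start here.");

        // On shutdown, release leadership so a standby can take over.
        latch.close();
        client.close();
    }
}

In this pattern the standby masters block in await(); when the active master fails, one standby is promoted and only then begins rebuilding state, which is the reconstruction step that a stateless design aims to avoid.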
5.4 Apache HDFS-1623
Apache uses a failover recovery model to solve the HDFS NameNode single-point-of-failure limitation [9, 10]. In this solution, additional HDFS NameNodes are introduced as standby NameNodes. The active NameNode writes all changes to the filesystem namespace into a write-ahead log in persistent storage. This is likely to introduce overhead when storing data, and the magnitude of that overhead depends on the choice of storage system. The solution supports automatic failover, but its complexity increases due to the additional processes that act as failure detectors. These failure detectors trigger the automatic failover mechanism when they detect NameNode failures.

6. CONCLUSION AND FUTURE WORK
We have presented an architecture for a highly available cluster computing management framework. The proposed architecture incorporates a stateless failure model into the existing Apache YARN. To achieve high availability and the stateless failure model, MySQL Cluster (NDB) was proposed as the storage technology for storing the necessary state information.

As a proof of concept, we implemented Apache YARN's recovery failure model using NDB (YARN-NDB) and developed the zkndb benchmark framework to test it. The availability and scalability of the implementation have been examined and demonstrated using unit tests, an actual resource-manager failure test, and throughput benchmark experiments. The results showed that YARN-NDB is better in terms of throughput and ability to scale than the existing ZooKeeper- and HDFS-based solutions.

For future work, we plan to develop YARN-NDB further with a fully stateless failure model. As a first step, a more detailed analysis of resource-manager state is needed. Once the state has been analysed, we plan to redesign the database to accommodate the additional state information identified by that analysis. In addition, modifications to the YARN-NDB code are needed to remove this information from memory and always access NDB when the information is required. Next, we will perform evaluations to measure the throughput and overhead of the new implementation. Finally, after the new implementation passes these evaluations, we intend to deploy YARN-NDB in a significantly bigger cluster with real-world workloads to check its actual scalability. The resulting YARN-NDB is expected to run well in cloud environments and to handle node failures properly.

7. ACKNOWLEDGEMENT
The authors would like to thank our partner Mário Almeida for his contribution to the project. We would also like to thank our colleagues Ümit Çavuş Büyükşahin, Strahinja Lazetic, and Vasiliki Kalavri for providing feedback throughout this project. Additionally, we would like to thank our EMDC friends Muhammad Anis uddin Nasir, Emmanouil Dimogerontakis, Maria Stylianou and Mudit Verma for their continuous support throughout the report writing process.
8. REFERENCES
[1] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, Jan. 2008.
[2] Facebook. Under the hood: Scheduling MapReduce jobs more efficiently with Corona, Nov. 2012. Retrieved November 18, 2012 from http://on.fb.me/109FHPD.
[3] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: a platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI'11, page 22, Berkeley, CA, USA, 2011. USENIX Association.
[4] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: wait-free coordination for internet-scale systems. In USENIX ATC, volume 10, 2010.
[5] M. Keep. MySQL Cluster 7.2 GA released, delivers 1 BILLION queries per minute, Apr. 2012. Retrieved November 18, 2012 from http://dev.mysql.com/tech-resources/articles/mysql-cluster-7.2-ga.html.
[6] A. C. Murthy. The next generation of Apache Hadoop MapReduce, Feb. 2011. Retrieved November 18, 2012 from http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/.
[7] A. C. Murthy. Introducing Apache Hadoop YARN, Aug. 2012. Retrieved November 11, 2012 from http://hortonworks.com/blog/introducing-apache-hadoop-yarn/.
[8] A. C. Murthy, C. Douglas, M. Konar, O. O'Malley, S. Radia, S. Agarwal, and V. KV. Architecture of next generation Apache Hadoop MapReduce framework. Retrieved November 18, 2012 from https://issues.apache.org/jira/secure/attachment/12486023/MapR.
[9] A. Myers. High availability for the Hadoop Distributed File System (HDFS), Mar. 2012. Retrieved November 18, 2012 from http://bit.ly/ZT1xIc.
[10] S. Radia. High availability framework for HDFS NN, Feb. 2011. Retrieved January 4, 2012 from https://issues.apache.org/jira/browse/HDFS-1623.
[11] K. Shvachko, H. Kuang, S. Radia, and R. Chansler. The Hadoop Distributed File System. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), pages 1–10, May 2010.
[12] M. Wasif. A distributed namespace for a distributed file system, 2012. Retrieved November 18, 2012 from http://kth.diva-portal.org/smash/record.jsf?searchId=1&pid=diva2:548037.