Abstract
Over the last few years, Grid computing technologies have rapidly evolved towards
a service-oriented architecture based on standards developed within the Web Services
and open Grid computing communities. However, few of the current large-scale and
national grid projects have adopted the latest WS-based grid middleware and standards
for constructing a service-oriented grid. Consequently, most current grid job
scheduling systems support only the older grid middleware and standards
rather than the latest WS-based grid technologies. Furthermore, whatever
standards or middleware these metaschedulers support, none of them offers the
advanced scheduling capabilities that are essential for building a completely
virtualized working environment for domain experts, one that lets them focus fully
on problem solving rather than on technical details. All of these problems have
become serious obstacles to delivering the full potential of a WS-based grid system.
This project aims to close the gap between the latest WS-based grid middleware
and information standards and the functionality provided by current grid job
scheduling systems. Our major work is to customize and deploy a powerful scheduling
system that fully supports forefront service-oriented grid middleware and standards,
enabling it to fully virtualize the high-performance computing resources across a
service-oriented Grid and to provide grid users with an easy-to-use, intelligent job
submission and execution environment. Our implementation initially focuses on the
Australian National Grid, a large-scale national grid that uses the latest web-based
grid middleware and standards. Our major contribution is to provide
information and practical experience on virtualizing computing resources across,
and deploying a metascheduling system on, a service-oriented Grid based on the latest
WS-based grid technologies.
1 Introduction
By aggregating computing power, software tools, data storage systems and scientific
instruments that are distributed in heterogeneous systems across multiple locations,
Grid Computing promises a global virtual supercomputer where users at different
physical locations can cooperate on a specific problem in a high-performance, secure,
reliable and cost-effective way. As a promising area of distributed and high-performance
computing, grid computing is receiving growing attention from academic
researchers and the IT industry, and has been considered a key technology of the
next-generation internet infrastructure.
Grid computing promises end users a single virtual supercomputer. Through resource
virtualization, the Grid hides the details of the underlying computing resources, such
as how these resources are organized and how computation jobs are
scheduled, thus providing end users with a single, unified system perspective, i.e.,
a single yet powerful virtual computer. In the ideal usage scenario, a user just needs
to “tell” the grid what he or she wants to do and what kind of resources (software tools,
data storage, instruments, etc.) are required; the Grid then takes care of everything
else, such as user identity authentication, requirements matchmaking, job submission,
scheduling and execution, failure handling and usage policy control. Implementing grid
resource virtualization requires a wide range of distributed computing middleware, such as
Grid File Transfer (GridFTP) [1], Grid Information Service (GIS) [2] and Grid Security
Infrastructure (GSI) [3], among which the Grid Information Service plays a key role in
virtualizing the resources on the Grid. Grid Information Services provide a set of
mechanisms for describing, discovering and monitoring resources and services on the
Grid, thus providing higher-level grid middleware and applications with an
information source for the whole Grid and enabling Grid users to access the Grid in a
seamless and transparent manner.
Using standards is extremely important for building a large-scale, stable
and scalable grid infrastructure, which in turn provides a collaborative context for
partners to work together; only in this way can the potential of the Grid be fully
delivered. From version 1.0 in 1998 to the latest version 4.0, which is based on the Web
Service Resource Framework (WSRF) [4], the Globus Toolkit (GT) [5] has rapidly evolved
into the de facto standard for Grid computing. The relevant standards, such as the Globus
API, the Monitoring and Discovery Service (MDS) and the Grid Laboratory Uniform
Environment Schema (GLUE Schema, a de facto standard for grid resource information
modelling) [7], have also entered a new development phase, which should accelerate the
process of national and global computing resource virtualization.
Although the latest versions of grid middleware and standards (i.e., GT4, MDS4,
GLUE 1.2) have already been released as guidelines for building a WS-based (Web
Service-based) Grid, which can share the benefits of a service-oriented
architecture (SOA), most current national Grids and major grid applications
still have not adopted these forefront grid technologies for constructing a
service-oriented grid. Instead, they still use non-WS-based grid middleware (e.g., GT2,
gLite [24]) and older, LDAP-based information standards such as MDS2 and BDII [26].
Consequently, most current grid job scheduling middleware lacks support for
the latest grid middleware and information standards, which becomes an obstacle to
delivering the full potential of WS-based grid systems.
As a promising and continually growing national grid program, the Australian
National Grid (ANG) [8] Program is building a national grid infrastructure using the
latest versions of grid middleware and information standards (i.e., GT4, MDS4,
GLUE 1.2, etc.). To the best of our knowledge, the ANG is the first large-scale national
grid to use the latest WS-based grid middleware, which makes it a good testbed for grid
applications, such as job scheduling, that can exploit the new features provided by these
new standards and middleware.
The current GLUE Schema (1.2) cannot describe some aspects of grid resources, such as
the software available on a resource or the relationship between a cluster and the
queues it manages. To describe the heterogeneous computing resources across the ANG
more accurately, the ANG therefore uses an extended version of the GLUE Schema [7] that
provides extra information entities for these aspects. However, although a number of
metaschedulers (such as Condor/G [9], Nimrod/G [10], GridBus Broker [11] and
GridWay [12]) have been developed over the last few years to achieve
resource virtualization across the grid and to provide end users with a single entry point
to the grid, none of them supports the latest version of GLUE (1.2) (see Section 4, Survey
on Current Metaschedulers, for details), which is suitable for modelling heterogeneous
resources with various domain policies (such as access control, usage quotas and resource
usage priority allocation for a specific user group) on the grid. Furthermore, on top of
such a heterogeneous resource information model, a fully virtualized Grid needs more
advanced scheduling functionality, such as automatic software requirement
matchmaking, VOView requirement matchmaking and resource allocation policy
enforcement. Such functionality creates a completely virtualized computing environment
across the grid and enables end users (i.e., domain experts) to focus fully on problem
solving rather than on underlying technical details.
Our project aims to customize and deploy a metascheduling system on a
computational Grid that uses the latest Grid middleware and standards. The
scheduling system is customized to completely virtualize the computing resources
across the Grid, thus providing domain experts with an easy-to-use, automatic and
intelligent job execution environment. Our major contribution is to provide
information and practical experience on using the latest grid middleware (GT4) and
information standard (GLUE 1.2), and on virtualizing the computing resources across a
service-oriented Grid that uses these forefront WS-based grid technologies. Our
decision to choose GridWay as the base platform of our scheduling system is based on
the survey conducted in Section 4. On top of GridWay's basic scheduling framework, we
added several important features (i.e., support for the GLUE 1.2 resource information
model, software requirements, VOView requirements, software priority and advanced
failure handling) to achieve real virtualization of the grid resources (see Section 5
for details). With the GridWay version modified and customized for the
Australian National Grid (i.e., ANGGridWay), the end user just needs to specify the
input data (what I have), the required resources (what I need) and, optionally, the user's
VO (Virtual Organization) group identity (who I am). The scheduler then
schedules the tasks appropriately across the ANG and hides all the technical details,
such as requirements matchmaking, VOView checking and failure handling. Our
scheduling policies also enable the grid administrator to enforce
advanced Virtual Organization policies that do not exist in the current GLUE
specification, such as allowing the scheduler to choose a “preferred” queue for specific
software.
The rest of the thesis is organized as follows. Section 2 briefly introduces the
background of Grid Computing technologies, focusing mainly on the
grid information service and metascheduling on the grid. Based on the problems of
current grid middleware and information standards, Section 3 describes the motivation
for our project. In Section 4, following the criteria required by a fully virtualized
Grid, we present a general survey of the current major open-source metaschedulers, which
helps us choose an appropriate scheduler as the basic framework of our
customized scheduling system, which is described in Section 5. Section 6 concludes the
thesis.
2 Background
2.1 Grid Computing and Grid Service
Grid computing was proposed ten years ago as a new approach to distributed
computing that enables large-scale scientific and engineering applications to access
supercomputing power, data storage, software tools and scientific instruments that are
distributed in heterogeneous systems across multiple locations.
A common description of Grid Computing compares it with an electric power grid,
through which we consume electrical power on demand without knowing where and
how the energy is generated. Similarly, Grid Computing technologies hide the details of
the underlying computing resources, including how these resources are
organized and how computation jobs are scheduled, thus creating a single, unified
system image. As a result, end users are able to perform resource-intensive or
compute-intensive tasks on the Grid as if they were using a single yet powerful virtual
computer.
According to the W3C (World Wide Web Consortium) [27] and OASIS (Organization for the
Advancement of Structured Information Standards) [13], a Web Service (WS) is a software
system that lets applications share data and interact irrespective of how those
applications were implemented, what operating system or platform they run on, and
what devices are used to access them. Web services achieve this by using a set of
platform-independent protocols and standards based on XML (Extensible Markup
Language), a universal structured language used by different web service applications to
exchange information. The core specifications of Web Services mainly include the Simple
Object Access Protocol (SOAP), the Web Services Description Language (WSDL) [28] and
Universal Description, Discovery and Integration (UDDI) [16]. Because the Web Services
Architecture is common, standard and open, it enables us to
develop distributed service systems efficiently, with desirable advantages such as
interoperability, increased capacity and flexibility. It is therefore naturally considered
the best choice for grid-based applications. To adapt grid computing technology to a
service-oriented architecture, OASIS [13] published the specification of WSRF (Web Service
Resource Framework) [14], a generic and open framework for modeling and accessing
stateful resources using Web services. WSRF defines standard means by which web
services can be associated with one or more stateful resources. A grid-enabled web service
is called a grid service. By encapsulating stateful resources, grid services enable service
requestors to access state indirectly in a standard, consistent and interoperable manner.
Built on a variety of technologies and open standards such as the Open Grid
Services Architecture (OGSA) [15], a grid-enabled network provides highly scalable, secure
and high-performance mechanisms for discovering, and negotiating access to, remote
computing resources in a seamless manner. This makes it possible for scientific
organizations to share computing resources on an unprecedented scale, and for
distributed groups to collaborate in ways that were previously impossible.
The fourth version of the Globus Toolkit (GT4) [5] is the major implementation of WSRF.
The Globus Toolkit is the de facto standard for open grid computing. It is a
component-based system that includes software for security, information infrastructure,
resource management, data management, communication, fault detection, etc. These
components can be used either independently or together to construct grid-based systems.
Today, especially with the release of GT4, which fully supports the key Web Service
standards and the WSRF specification, the Globus Toolkit has rapidly become the most
popular middleware for grid system development.
It is worth noting that although most of the latest grid middleware and
standards (i.e., GT4, MDS4, GLUE 1.2, etc.) and related grid applications are moving
towards a service-oriented architecture (SOA), most current national grids still
have not adopted these forefront grid technologies. Instead, most of them still use
older grid middleware and standards such as GT2, MDS2, BDII [26] and gLite
[24]. To the best of our knowledge, the Australian National Grid (ANG) is the first
large-scale national grid to use the latest WS-based grid middleware and standards, thus
providing us with a good testbed for these forefront grid technologies.
2.2 Resource Sharing on the Grid
Resource sharing is one of the most important topics in Grid computing. To some
extent, solving a computational problem on the Grid is a process of
negotiating access to, and using, various resources distributed across the Grid, under the
restrictions of usage policies enforced by the global and local system administrators. In
this subsection, we briefly describe the resources on the Grid and the Virtual
Organization, an important concept that describes a group of individuals and
organizations that share the same interests and goals and need to share resources on the
Grid.
2.2.1 Definition of Resource on the Grid
Generally speaking, anything on the Grid that can be shared and used by users across
geographically dispersed locations and/or different administrative domains can be
considered a resource. Computing power, software programs, network
bandwidth, data storage systems, scientific instruments and even human beings can
therefore fall into the category of “Grid resource”. Although these resources differ in
their characteristics and may be used for various purposes, they have in common that
they can be shared across the Grid network under certain Grid protocols. Therefore, in
order to discover, monitor and use these resources interoperably, a set of standards
must be agreed across multiple Grid sites. Currently the de facto standards for describing
and publishing resources on the Grid are the GLUE Schema and MDS, respectively, which
are discussed in the subsequent sections.
2.2.2 Virtual Organization on the Grid
A Virtual Organisation (VO) is formed by a group of geographically dispersed individuals,
institutions and organisations that have common interests and objectives and need
to share various resources across the Grid under a set of administrative
policies. Members of a VO usually have shared responsibilities, shared control and shared
access to computing resources and services.
In GLUE Schema Specification 1.2, VO information is contained in the VOView
entity, which describes the status of resources specific to a particular VO. By
making use of the VOView information, more advanced metascheduling can be achieved,
such as access control, resource usage quota control and many potential customizable
resource allocation policies, like the software priority discussed in Section 5.5.
2.3 Grid Information Services
Grid Information Services (GIS) are a vital part of any Grid software infrastructure
because they provide fundamental mechanisms for discovery and monitoring, and thus for
planning and adapting application behaviour [25], such as appropriately scheduling a
computational job according to the resource information provided by the GIS.
This section gives a high-level overview of the GIS working mechanisms and related
concepts, focusing mainly on Globus's latest implementation (i.e., MDS4 [2]).
2.3.1 Concept of Discovery and Monitoring
• Discovery: In the context of Grid computing, discovery is the process of
finding available resources on which a task can be performed. For example,
a metascheduler (i.e., a global scheduler which dispatches computational tasks
to local schedulers such as PBS [19], SGE [44], etc.) might use the discovery
service to locate all the available hosts on the Grid for future job submission
and execution.
• Monitoring: The grid resource monitoring service
monitors the status of the resources on the Grid. For example, after
obtaining a list of available hosts on the Grid, the metascheduler might use the
monitoring service to locate the subset of the discovered hosts that satisfies
the criteria specified by the grid end user as part of the user requirements and
the scheduling policies specified by the grid administrator.
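The difference between the two steps can be sketched in a few lines of Python. This is only a minimal illustration and not MDS4 code: the `Host` record, its fields and the functions `discover` and `monitor` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical record for one host known to the information service."""
    name: str
    available: bool   # is the host currently usable at all?
    free_slots: int   # free job slots reported for the host
    os: str


def discover(hosts):
    """Discovery: return every host on which a task could be run at all."""
    return [h for h in hosts if h.available]


def monitor(hosts, min_free_slots, required_os):
    """Monitoring: keep the discovered hosts whose current status meets the criteria."""
    return [h for h in hosts
            if h.free_slots >= min_free_slots and h.os == required_os]


grid = [
    Host("node-a", True, 8, "linux"),
    Host("node-b", True, 0, "linux"),
    Host("node-c", False, 16, "linux"),
]
candidates = discover(grid)
suitable = monitor(candidates, min_free_slots=1, required_os="linux")
print([h.name for h in suitable])
```

Here discovery returns `node-a` and `node-b`, and the monitoring step then narrows the list to `node-a`, the only discovered host with a free job slot.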
2.3.2 MDS Information Services
This subsection gives an overview of the Monitoring and Discovery Service (MDS),
which is the Globus project's implementation of GIS. We mainly discuss the latest
version of MDS, i.e., MDS4, which is WS-based and used by the current ANG. It is
worth noting, however, that the old version of MDS (i.e., MDS2) and the Berkeley Database
Information Index (BDII) [26], which are based on LDAP rather than Web Services, are
still the most commonly used implementations of GIS.
Overview
As a key component of GT4, the Monitoring and Discovery System (MDS4) [2] provides a
set of web services that streamline the tasks of discovering, publishing, aggregating
and monitoring the configuration and status information of resources and services
distributed across multiple locations. MDS plays a key role in metascheduling because
the metascheduler, which uses MDS as its source of resource information, depends heavily
on that information to make appropriate
scheduling decisions. In general, the MDS service consists of the following two subservices:
• Index Service
The index service is a WSRF-based service provided by MDS4 that collects grid
resource information from registered information sources, publishes the
information as WSRF resource properties and provides a query/subscription
interface to the resource information. In the context of the grid, an information
source can be any kind of entity (e.g., a file, a program, a web service, or another
network-enabled service) from which the index service can obtain resource
information. The index service is similar to a UDDI [16] registry but is more
flexible. The index service is self-cleaning, which means that each registered index
entry has a specified lifetime and is removed if it is not refreshed before it
expires. Moreover, index services can be registered with each other in a
hierarchical fashion, with the upper-level index services aggregating information from
the lower-level index services.
• Trigger Service
The trigger service is another MDS service, designed to collect resource
information and take action according to the available data.
The index service and trigger service are built on the MDS Aggregator Framework
[17], a software framework for constructing higher-level services that
collect and aggregate resource information and provide notification
mechanisms based on changes in the status of the resources. Because the index
service and trigger service are built on the aggregator framework, they are also
known as aggregator services.
Dynamic Resource Discovery
The following is a brief description of how MDS4 service is used to perform dynamic
resource discovery.
• Step 1 Registration: Each information source is registered with an aggregator
service so that the service can collect the information of the local resources. An
aggregator service can in turn be registered with a higher-level aggregator service, in a
hierarchical fashion.
• Step 2 Collection: Each aggregator service collects the status information of its
underlying resources, such as available job slots, memory, disk space, software
tools, etc.
• Step 3 Publish: Each aggregator service publishes the resource information it has
collected.
• Step 4 Aggregation: The aggregation program collects the up-to-date information
from the information sources and consolidates the information from the MDS
hierarchy to a central location, making it available via aggregator-specific Web
interfaces, thus providing outside entities (e.g., the end user, a metascheduler
or any other entity that needs to discover and access the resource information of
the Grid) with a complete description of the Grid resources.
Note that Steps 2, 3 and 4 are performed periodically so that the resource information
published to outside entities is kept up to date. In addition, each registered service has
a lifetime and is automatically removed if it is not renewed within that
lifetime.
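The registration, lifetime and hierarchy behaviour described above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not the MDS4 API: the class `IndexService` and its methods are hypothetical, and time is passed in explicitly as a number instead of being read from a clock so that the expiry logic is easy to follow.

```python
class IndexService:
    """Toy stand-in for an aggregator service with self-cleaning entries."""

    def __init__(self, name):
        self.name = name
        self._entries = {}  # source name -> (expiry time, latest information)

    def register(self, source, info, now, lifetime):
        """Register or refresh a source; the entry expires at now + lifetime."""
        self._entries[source] = (now + lifetime, info)

    def query(self, now):
        """Drop expired entries, then return the information that is still live."""
        self._entries = {
            source: (expiry, info)
            for source, (expiry, info) in self._entries.items()
            if expiry > now
        }
        return {source: info for source, (expiry, info) in self._entries.items()}

    def publish_to(self, parent, now, lifetime):
        """Register this index's aggregated view with a higher-level index."""
        parent.register(self.name, self.query(now), now, lifetime)


site = IndexService("site-index")
root = IndexService("root-index")
site.register("cluster-1", {"free_slots": 4}, now=0, lifetime=60)
site.register("cluster-2", {"free_slots": 2}, now=0, lifetime=60)
site.register("cluster-1", {"free_slots": 3}, now=50, lifetime=60)  # refreshed in time
site.publish_to(root, now=70, lifetime=60)
print(root.query(now=70))
```

In this run `cluster-2` is never refreshed, so by time 70 it has expired and the root index sees only the refreshed `cluster-1` entry.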
GLUE Schema
As we saw in the previous section, the Globus MDS services provide a standardized
approach to grid resource discovery and monitoring, thus playing a key role in
computing resource virtualization. The resource information itself is modelled with the
Grid Laboratory Uniform Environment
(GLUE) Schema [6][7], the de facto standard for abstracting real-world computing
resources into constructs that can be represented in computer systems. MDS services
use the GLUE Schema to describe grid resources in a precise and systematic manner,
thus enabling them to be discovered for subsequent management or use, for example by MDS
aggregator services and metaschedulers. The key GLUE (1.2) elements for
metascheduling are as follows:
• Site: An administrative concept used to aggregate a set of services and
resources that are managed by the same organization.
• Cluster: A set of physical resources managed by a local management system
(e.g., PBS, Condor, LSF, SGE, etc.). The resources managed by a cluster can be
heterogeneous or homogeneous.
• SubCluster: Provides information about a homogeneous set of hosts. A Cluster
can be considered as a set of SubClusters.
• ComputingElement: The common Grid abstraction for a queue of a system
managing computing resources.
• VOView: Provides a mechanism to describe different viewpoints related to
local policies on the same resources assigned to a queue.
• SoftwarePackage (added by ANG) [18]: An entity that describes the
characteristics of an installed software package, such as software name, version and
module name.
• StorageElement: The core concept of the model for abstracting storage
resources.
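To make the relationships between these elements more concrete, the following Python sketch models them as nested records. It is a simplified illustration only: the field names are hypothetical and are not the attribute names of the GLUE specification, StorageElement is omitted, and attaching SoftwarePackage to a SubCluster is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwarePackage:
    name: str
    version: str
    module: str


@dataclass
class SubCluster:
    """A homogeneous set of hosts (software attachment is assumed here)."""
    name: str
    software: List[SoftwarePackage] = field(default_factory=list)


@dataclass
class VOView:
    """The view that one VO has of a queue."""
    vo_name: str
    free_job_slots: int


@dataclass
class ComputingElement:
    """A queue of a system managing computing resources."""
    queue_name: str
    vo_views: List[VOView] = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    sub_clusters: List[SubCluster] = field(default_factory=list)
    computing_elements: List[ComputingElement] = field(default_factory=list)


@dataclass
class Site:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


site = Site("example-site", [
    Cluster(
        "cluster-1",
        sub_clusters=[SubCluster("sc-1", [SoftwarePackage("toolA", "1.0", "toolA/1.0")])],
        computing_elements=[ComputingElement("normal", [VOView("chem", 4)])],
    )
])
queue = site.clusters[0].computing_elements[0]
print(queue.queue_name, [(v.vo_name, v.free_job_slots) for v in queue.vo_views])
```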
ANG-Extended GLUE Schema
Although GLUE 1.2 is the latest grid information standard, it provides little support
for advanced metascheduling (i.e., global scheduling, see Section 2.4 for details),
which plays a very important role in fully virtualizing computing resources on the Grid.
For example, the current GLUE schema does not describe software tools, nor
does it describe the relationship between a particular computing resource (i.e., a queue
or a ComputingElement in the GLUE 1.2 Schema) and the environment (i.e., the
SubCluster element in the GLUE 1.2 Schema) to which it belongs. Consequently, in order
to submit and execute a job correctly, the user has to know in advance the location of the
desired resource and which software tools it provides. To some extent, this is not real
Grid resource virtualization, even though the resources are provided by the Grid.
Furthermore, as the underlying grid infrastructure becomes more complex
and dynamic, it is becoming unrealistic to expect users to have sufficient knowledge
about the required resources. The current GLUE schema therefore cannot satisfy the
needs of a metascheduling system that is to fully virtualize the resources across the Grid.
To close the gap between the latest information standard and the needs of
complete resource virtualization on the Grid, ANG has extended GLUE 1.2 with
extra information entities describing software tools, the mapping between queues and
SubClusters, VO resource usage control policies, etc. [18]. This makes the schema capable
of modelling the complete information of a Grid, including computing power, storage
systems, software tools, the relationships between these resources, and the VO policies
(usage quota control) enforced on them. Based on the extended GLUE Schema, it is
therefore possible to build a metascheduling system that fully supports user requirements.
End users do not need any prior knowledge of where the required
resources are located. Instead, they just need to “tell” the scheduler what they need at an
abstract level, such as the name and version of the desired software, and
the metascheduler automatically locates the resources according to the user requirements.
The GLUE extension made by ANG has been proposed to OGF [29] as a part of a future
release of the GLUE standard.
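The kind of software matchmaking that the extended schema makes possible can be sketched as follows. This is a hedged illustration rather than the scheduler's actual implementation: the two dictionaries stand in for information that would be obtained from the information service, and all queue, SubCluster and software names in them are invented for the example.

```python
# Hypothetical snapshot of extended-GLUE information, reduced to plain dictionaries.
# SubCluster name -> set of (software name, version) pairs installed on it.
SUBCLUSTER_SOFTWARE = {
    "sc-x86": {("toolA", "1.0"), ("toolB", "2.6")},
    "sc-ia64": {("toolB", "2.6")},
}

# Queue -> SubCluster mapping, the relationship added by the ANG extension.
QUEUE_TO_SUBCLUSTER = {
    "site1/normal": "sc-x86",
    "site1/express": "sc-x86",
    "site2/batch": "sc-ia64",
}


def match_queues(required):
    """Return the queues whose SubCluster provides every required (name, version)."""
    wanted = set(required)
    return sorted(
        queue
        for queue, sub_cluster in QUEUE_TO_SUBCLUSTER.items()
        if wanted <= SUBCLUSTER_SOFTWARE[sub_cluster]
    )


print(match_queues([("toolB", "2.6")]))
print(match_queues([("toolA", "1.0"), ("toolB", "2.6")]))
```

The user states only the software name and version; the queue-to-SubCluster mapping is what allows the scheduler to turn that abstract requirement into a list of concrete queues.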
2.4 Metascheduling on the Grid
One important goal of grid computing is resource utilization and virtualization, and
the grid metascheduler plays the key role in achieving this goal.
From the viewpoint of resources, a grid is composed of a set of resources and resource
managers. Each resource manager, such as PBS [19], Condor [9], SGE [20] or
LSF [21], coordinates and controls its local resources. The basic function of such local
resource managers is load balancing, i.e., allocating a specific job to a suitable local
resource.
Currently many users submit jobs directly to the local scheduler, or do the
same thing indirectly via Globus grid interfaces (e.g., Globus command lines or Globus
APIs). Users have to decide on their own which kind of resource is suitable for their
jobs, and the responsibility for monitoring job execution also falls on the end
users. If a job fails, they have to resubmit it manually. In addition, end users have to
log into different environments with different accounts if they want to use multiple
resource managers.
In a large-scale, dynamic and heterogeneous grid environment where
various types of resource managers manage their local computing resources,
this is not an efficient way to manage job execution. A global resource
manager/scheduler is needed to coordinate these local schedulers and to
negotiate access to the different resources they manage, thus
relieving end users of the burden of job execution management. This is the goal of
the grid metascheduler.
In the grid environment, a metascheduler is also known as a global scheduler, which
aims to achieve grid resource utilization and virtualization. The
metascheduler provides the end user with a single entry point to the grid computing
environment. It coordinates communications between multiple heterogeneous local
schedulers that operate at the local or cluster level. To schedule an individual job, the
metascheduler typically queries information about available compute resources and
their status from Grid Information Services (e.g., Globus MDS). After determining which
resource is suitable for the job, the metascheduler dispatches the job to the chosen
resource or resources, where the job is scheduled again by the local scheduler onto a
specific compute node within that resource.
By using the Grid Information Services, which abstract the information of all the
available resources on the grid into a single resource perspective, the grid metascheduler
presents the end user with a single virtual resource pool and hides all the details of
scheduling and monitoring jobs on the dynamic and heterogeneous grid.
An ideal grid metascheduler should have the following capabilities:
• Virtualizing the resources on the grid to the greatest possible extent, which means:
  - Hiding all the details of job and resource management
  - Enabling the end user to submit jobs without any grid-specific knowledge
• Utilizing the resources on the grid, which means:
  - Using every available resource to the greatest possible extent
• Executing each job on a suitable resource according to the user's
requirements and domain policies
With regard to ease of use, a good metascheduler that is capable of fully
virtualizing all the resources across the grid and concealing the heterogeneous and
dynamic nature of the grid should ask end users only three questions, under the
assumption that end users have no grid-specific knowledge:
• What do you need? (Requirements) This usually involves requirement
matchmaking between known resources on the grid and the resources required by
the end user, such as a particular operating system, a specified queue, free job
slots, desired disk space, a particular software package with a specified version, and so
forth.
• What do you have? (Input) This usually needs the end user to specify the input
of the job, such as input files on the local or remote machine, or the arguments of
the executable, or both.
• Who are you? (VO Identity) A grid user's VO identity is used for authentication
or authorization in terms of job submission, data access, accounting, etc. In the
context of metascheduling, the user's VO identity is used to retrieve user-specific
resource information such as the accessible free CPUs, disk space and software tools.
In GLUE Schema Specification 1.2, the grid user's VO information is contained in
the VOView entity. By making use of the VOView information, the metascheduler can
therefore provide the end user with a “personalized” resource
perspective of the Grid.
The three questions above capture the basic requirements for complete
resource virtualization from the viewpoint of the end user. An ideal metascheduler
should require the end user only to answer these three questions, after which it
automatically submits, schedules, executes and manages the computational job.
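As an illustration, the three questions can be captured in a single job request that the scheduler matches against VO-specific resource information. The sketch below is a simplified example under stated assumptions: `JobRequest`, `VO_VIEWS`, `QUEUE_OS` and `eligible_queues` are hypothetical names, and the data is invented rather than taken from a real information service.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class JobRequest:
    requirements: Dict[str, str]   # What do you need?
    inputs: List[str]              # What do you have?
    vo: Optional[str] = None       # Who are you? (optional)


# Hypothetical VOView-style data: queue -> {VO name -> free job slots for that VO}.
VO_VIEWS = {
    "site1/normal": {"chem": 4, "bio": 0},
    "site2/batch": {"bio": 10},
}
# Hypothetical operating system of the hosts behind each queue.
QUEUE_OS = {"site1/normal": "linux", "site2/batch": "linux"}


def eligible_queues(job):
    """Return the queues that satisfy the request, personalized by the user's VO."""
    result = []
    for queue, views in VO_VIEWS.items():
        wanted_os = job.requirements.get("os")
        if wanted_os is not None and wanted_os != QUEUE_OS[queue]:
            continue  # requirement mismatch
        if job.vo is not None and views.get(job.vo, 0) <= 0:
            continue  # this VO has no free slots on the queue
        result.append(queue)
    return sorted(result)


job = JobRequest({"os": "linux"}, ["input.dat"], vo="chem")
print(eligible_queues(job))
```

The same request submitted under a different VO yields a different set of queues, which is the “personalized” resource perspective that the VOView information provides.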
3 Motivations
As mentioned in previous sections, most current national grids and major
grid applications still have not adopted the forefront WS-based grid technologies for
constructing a service-oriented grid. Instead, they still use old, non-WS-based
grid middleware (e.g., GT2, gLite [24]) and old LDAP-based information standards such as
MDS2 and BDII [26]. Consequently, most current grid job
scheduling middleware lacks support for the latest grid middleware and information
standards. Furthermore, whatever standards or middleware these
metaschedulers support, according to our survey (see Section 4) none of them offers
advanced scheduling capabilities such as automatic software matchmaking according
to user requirements and VO-based resource usage policy enforcement. These capabilities
are essential for building a completely virtualized working environment for domain
experts that lets them focus fully on problem solving rather than on technical details.
All of these problems have become serious obstacles to delivering the full potential of a
WS-based grid system.
On the other hand, to the best of our knowledge, the Australian National Grid (ANG) is
the first large-scale national grid to use the latest WS-based grid middlewares
and standards (i.e., GT4, MDS4, GLUE1.2) to build a grid infrastructure supporting
e-Research in Australia. This makes it a good testbed for resource virtualization via a
metascheduling system based on all these forefront grid technologies.
Therefore, the motivation behind this project can be summarized as the task of
virtualizing the high performance computing resources across a service-oriented Grid,
thus creating a unified system image and providing grid users with a single entry
point to the service-oriented grid computing environment and an easy-to-use, intelligent
job submission and execution environment. Such a scheduling system should also
allow the grid administrator to enforce various local domain policies. Our main
contribution will be to provide important information and related experience on
virtualizing computing resources across, and deploying a full-featured metascheduling
system on, a service-oriented Grid based on the latest WS-based grid middlewares and
standards.
• Job Scheduling Automation
For a large-scale and continuously growing computational grid like ANG, it is no
longer practical or efficient for end users to manually submit and monitor jobs
in a dynamic grid environment. It is cumbersome for the user to
select a suitable resource, manually submit the job to the target resource, and
repeat the process if the job fails. Therefore, an automatic and intelligent job
scheduling system is highly desirable for the Grid, enabling its end users (i.e.,
domain experts) to focus fully on problem solving rather than on underlying
grid-specific technical details.
• Supporting the Latest Information Standard
As mentioned in Section 2.3.2, the GLUE Schema [6] is the de facto standard for
modelling the status and characteristics of various heterogeneous resources on the
Grid. With the release of GLUE 1.2 [7], the latest GLUE schema
specification, the improved and newly added information entities
make it possible to describe the resources, the relationships
between the resources and the various VO usage policies enforced on them more
precisely, which opens up many potential uses for advanced job scheduling.
However, according to our survey (see Section 4), none of the current major
metaschedulers has adopted the latest information standard. Adapting
our metascheduling system to cooperate smoothly with the latest grid information
standard, i.e., GLUE1.2, will therefore be one of our main tasks.
• Fully Supporting User Requirements
As mentioned in Section 2.3.2, the ANG-extended GLUE resource information
model provides information about the software running on remote resources. With it,
it becomes possible for the metascheduling system to fully support user
requirements, including normal requirements, software requirements and
VO-based requirements, which are not supported by any of the current metaschedulers.
Our metascheduling system will be the first scheduling system which utilises the
latest grid standards and fully supports user requirements, and our work will
provide valuable information about using these standards and related
4 A Survey of Current Metaschedulers
4.1 Motivation
During the last decade, many metaschedulers such as Condor-G [9], Community
Scheduler Framework (CSF) [22], GridBus Broker [11], GridWay [12] and Nimrod/G [10]
have been developed to provide grid users with a transparent job submission and
execution environment. Although they differ in focus, all of
them provide basic job submission and execution functionality. There is therefore no
need for us to develop a completely new metascheduler for ANG. Instead, we conducted a
general survey of current major open-source metaschedulers to help us choose an
appropriate one as the basic framework of our customized metascheduling
system.
4.2 Current Condition of ANG
• ANG uses the latest WS-based grid middleware (GT4) as its core grid
middleware, and the latest version of the information standard (GLUE 1.2) to describe
the status and characteristics of resources.
• ANG extended GLUE (1.2) in order to describe more completely and precisely the
resources, the relationships between these resources and the various VO policies
enforced on them [18].
• As a Grid using the latest WS-based grid middlewares and information standards,
ANG needs a metascheduling system which fully supports these forefront grid
technologies. Such a system also needs to be customized to work with the
extended information model, making it possible to build a fully virtualized
job submission and execution environment for end users.
• We do not need to start from scratch, since several open-source
metaschedulers are already available for basic job scheduling on the grid.
4.3 Survey of Current Open-source Metaschedulers
This section presents our survey of current major open-source grid metaschedulers,
based on the aim of fully virtualizing computing resources on a service-oriented Grid
which uses the latest WS-based grid middlewares and information standards, and the
need to cooperate with ANG's GLUE extension [18]. We considered five criteria when
conducting the survey: Scheduling Functionality, Fault Detection and Recovery
Capabilities, User Interfaces Functionality, Application Support, and Features and
Limitations. Details of these survey criteria and the corresponding survey content are
given in each subsection.
4.3.1 Scheduling Functionality
• Dynamic Resource Discovery
The scheduler should be able to dynamically discover
resources across the Grid, so that it can make the right decisions based on
the latest status of those resources.
• MDS4 and GLUE1.2 Support
The scheduler depends heavily on the grid resource information service to make
appropriate scheduling decisions; supporting the latest grid resource
information service standards (i.e., MDS4 and GLUE Schema 1.2) is therefore a highly
desirable feature.
• Basic Job Requirement Support
The scheduler should provide a basic mechanism to support ordinary job
requirements such as a specific host name, queue name, available memory, free job
slots, etc.
As for job description, the scheduler should accept jobs described using the Resource
Specification Language (RSL) [30], which is used by the current Globus middleware to
describe the requirements of computational jobs for submission to the Grid.
A more attractive feature of the expected metascheduler is support for the Job
Submission Description Language (JSDL) [31], a new OGF [29] standard
for describing the requirements of computational jobs. JSDL will be supported
in the next release of Globus Toolkit.
• Support for Automatic Software Searching and Matchmaking According to
User Requirements
A highly desirable feature is enabling end users to specify which software
tools they need without having to know where the software is
located on the Grid. Although all of the schedulers allow the user to specify the
executable they need to run, all of them assume that either the executable needs to
be copied to the target resource or the executable already exists on the target
resource. Providing automatic software searching and matchmaking
functionality would therefore be considerable progress towards virtualizing the compute
resources of the Grid based on the latest grid technologies.
• Scheduling Policy Support
Another very desirable feature of the expected scheduler is enforcing scheduling
policies based on users' VO information, which can be extracted from the VOView
element of GLUE 1.2. Such functionality enables the scheduler to perform advanced
scheduling based on user-specific information rather than general resource
information. We therefore expect the desired scheduler to have basic
scheduling policy enforcement functionality that is easy to integrate with the
latest resource information standard (i.e., GLUE1.2).
The following table (Table 4.1) compares the current major open-source
metaschedulers in terms of their scheduling functionality. Note that static
resource discovery is supported by all the schedulers listed in the table.
Condor-G (v6.8)
  Dynamic Discovery: No direct support
    • Resources can be added into the local Condor pool via Condor Glidein, which
      provides a mechanism to add a Globus resource to a local Condor resource pool.
    • A custom system is needed to achieve dynamic discovery
  MDS4 Support: Yes
  GLUE 1.2 Support: No direct support
  Normal Resource Requirement Support: Expressed by ClassAd Boolean expressions (a list
    of key-value pairs that describe a job, a machine or a grid site [32])
    • Supports RSL*
    • No JSDL* support
  Support for Automatic Software Searching and Matchmaking: No support
  Scheduling Policies: No direct support for customizing scheduling algorithms, but can
    possibly be achieved by complex ClassAd expressions

CSF (v4.0.3)
  Dynamic Discovery: No support, resources need to be pre-configured
  MDS4 Support: Yes
  GLUE 1.2 Support: No direct support
  Normal Resource Requirement Support:
    • Supports RSL
    • No JSDL support
  Support for Automatic Software Searching and Matchmaking: No support
  Scheduling Policies: Two built-in schedulers:
    • Simple round-robin scheduler
    • Job throttle, which makes sure that each resource manager is not overloaded
    Other schedulers can be developed via the CSF Java API

GridBus Broker (v3.0)
  Dynamic Discovery: No direct support. Computing resources and data storages must be
    specified in a service description (XML) file.
  MDS4 Support: Yes
  GLUE 1.2 Support: No direct support
  Normal Resource Requirement Support: Currently the job requirements feature works
    only with Globus. The supported parameters are the same as the ones supported in
    Globus RSL. Supports GGF JSDL (for non-parametric jobs)
  Support for Automatic Software Searching and Matchmaking: No support
  Scheduling Policies: Five built-in schedulers are available:
    • DBDataScheduler: takes into account both data and network costs
    • DBScheduler: a simple economy-based scheduler for the broker, which does not take
      into account remote data files
    • GroupingParameterSweepScheduler: a basic scheduler for the broker which implements
      a simple round-robin scheduling algorithm
    • ParameterSweepScheduler
    • Simple round-robin scheduler
    Other schedulers can be developed via the Gridbus broker API

GridWay (v5.2)
  Dynamic Discovery:
    • Needs to specify the MDS server
    • All the hosts and queues can be dynamically discovered
  MDS4 Support: Yes
  GLUE 1.2 Support: No direct support
  Normal Resource Requirement Support: Expressed by a set of predefined variables
    • Supports RSL
    • Supports JSDL
  Support for Automatic Software Searching and Matchmaking: No support
  Scheduling Policies: Two built-in scheduling algorithms available:
    • A default built-in scheduling algorithm using a weighted sum approach to select
      the candidate resource
    • A round-robin scheduler is given as an example scheduler
    User priority and resource priority are supported. Other scheduling algorithms
    must be customized.

Nimrod-G (v3.x)
  Dynamic Discovery: No support for dynamic resource discovery; the resources must be
    statically specified in the Nimrod database
  MDS4 Support: Yes
  GLUE 1.2 Support: No direct support
  Normal Resource Requirement Support: No direct support; the user has to specify the
    resource for the given job through the command line
  Support for Automatic Software Searching and Matchmaking: No support
  Scheduling Policies:
    • Resource needs to be allocated to the given job
    • Two computational economy scheduling algorithms:
      Deadline Scheduling and Cost Minimization
* Resource Specification Language (RSL): The de facto standard used to describe the requirements of a computational job for
submission to Globus execution system.
* Job Submission Description Language (JSDL): An OGF [29] recommended standard for computational job description and
submission to resources, particularly Grid environments but not restricted to the latter [31].
Table 4.1 Scheduling Functionality
4.3.2 Fault Detection and Recovery Capabilities
• Job Cancellation
A good metascheduler should keep submitted jobs under control; at a minimum,
it should allow end users to cancel the jobs they submitted through the scheduler.
• Fault Tolerance
Because of the dynamic nature of the grid environment and the network itself, a job
cannot be guaranteed to finish successfully according to user
requirements: unpredictable events such as a local or remote system crash or
a network disconnection may occur. The metascheduler should therefore offer basic
mechanisms that handle the job gracefully in case of failure, such as retry,
rescheduling or recovery.
Table 4.2 compares these metaschedulers' functionality with regard to
job cancellation and resource and client fault tolerance. As we can see, all of these
metaschedulers support job cancellation and fault tolerance to some degree, through
mechanisms such as a Command Line Interface (CLI), an Application Programming
Interface (API), job retry or resubmission, and job recovery.
Condor-G (v6.8)
  Job Cancellation: Users are allowed to cancel the job via CLI or API
  Resource Fault Tolerance:
    • Achieved by retrying and restarting the Globus JobManager
    • Provides an enhanced two-phase commit submit protocol
  Client Fault Tolerance: All relevant state for each submitted job is stored
    persistently in the Condor job queue for job recovery

CSF (v4.0.3)
  Job Cancellation: Users are allowed to cancel the job via Command Line Interface
    (CLI) and CSF API
  Resource and Client Fault Tolerance: No direct support but can be achieved by the CSF
    job service API

Gridbus Broker (v3.1)
  Job Cancellation:
    • No direct command for killing a job via CLI
    • Job can be reset by API
  Resource and Client Fault Tolerance: Persistence to enable failure management and
    recovery of an executing grid application

GridWay (v5.2)
  Job Cancellation: Users are allowed to cancel the job via CLI or API
  Resource Fault Tolerance:
    • Job will be migrated if the job exit code is not specified
    • Job is retried on the same resource in case of failure
    • Job is rescheduled if the number of retries reaches a specified limit
    • The host is banned for a specific time period if the resource on it fails
  Client Fault Tolerance: The state of GridWay running on the client host is
    periodically saved in order for job recovery

Nimrod-G (v3.x)
  Job Cancellation: Users are allowed to cancel the job via CLI
  Resource Fault Tolerance:
    • Achieved by job resubmission
  Client Fault Tolerance: Job statuses are recorded in Nimrod database for
submission interface that is similar to
Globus Grid Resource Allocation Manager
(GRAM)*
Nimrod-G (v6.8.x)
Specially designed for parameter sweep
applications
No scheduling functionalities
No software requirement support
No VO policy support
* Grid Resource Allocation Manager (GRAM) [35]: A part of Globus Toolkit, which provides a set of web services to submit, monitor
and cancel jobs on Grid computing resources. It is not a resource scheduler but rather an interface used to communicate with
various local resource managers (LRMs). GRAM is the de facto protocol between the metascheduling system and the LRM.
* GridGateWay (GGW) [34]: A job submission interface which can be integrated with GRAM and used for job submission to
the GridWay metascheduling system.
Table 4.5 Features and Limitations
4.4 Conclusion
From the survey we conducted, we drew the following conclusions:
• None of the current major (open-source) schedulers fully supports the latest
WS-based grid information service standards (i.e., Globus MDS4 with GLUE1.2).
• None of these metaschedulers supports automatic software searching and
matchmaking according to user requirements, which is highly desirable for
automating and virtualizing the end user's job execution environment.
• As the resources across the grid are shared by multiple organizations, a usage
control mechanism is very useful for allocating resources appropriately to users from
different virtual organizations; however, none of the current metaschedulers has such
support. It is noteworthy that GridWay 5.2 has similar support (see Table 4.1), but
that is for the GridWay software user rather than the VO user.
• As a further consequence of the lack of support for the latest GLUE standard, none
of the current metaschedulers can automatically enforce VO-specific resource usage
policies.
Considering the need to fully virtualize resources on the Grid and to support the
latest grid middlewares and information standards, we chose GridWay as the basic
scheduling framework. GridWay (5.2) has the following
advantages over the other metaschedulers in the survey:
• Modular software architecture, which makes the scheduling system easy to
customize and extend.
• Specially designed to work on top of Globus, which is used by most
large-scale grids. A notable advantage of GridWay is that it can be integrated with
GridGateWay [34], which has the same interface as GT4 GRAM and accepts job
submissions described using RSL (and JSDL [31] in a future release). By
integrating GridWay and GridGateWay, there is no need to modify
existing Grid applications that submit jobs to GT4.
• Already supports dynamic resource discovery, ensuring that the
metascheduling system can use up-to-date resource information.
• Provides basic automatic scheduling functionality such as normal requirements
declaration and failure handling.
• Supports both C and Java programming APIs, which is useful for more advanced
development in the future.
• The built-in scheduling algorithm makes it easy to integrate new resource
usage policies based on VO information.
• A very active project, for which technical support is easy to obtain.
5 Implementation
As discussed in the previous sections, our implementation aims to customize and
deploy a metascheduling system on a service-oriented Grid which uses the latest
WS-based grid middlewares and standards (i.e., GT4, MDS4 and GLUE1.2), thus fully
virtualizing the high performance computing resources on the Grid and providing domain
experts with an easy-to-use, automatic and intelligent problem-solving environment. As
few of the large-scale national Grids around the world have adopted these forefront
service-oriented grid technologies, our research and implementation on the Australian
National Grid will provide valuable information and experience about using these new
technologies in a real, large-scale Grid computing environment.
5.1 GridWay
Based on the survey presented in Section 4, we decided to use GridWay as the kernel
of our scheduling system. The most desirable characteristics of GridWay are
its dynamic resource discovery and its extensible modular architecture,
which makes it easy to customize for building a metascheduling
system on the service-oriented Grid. Further favourable features were discussed in
Section 4. The latest version of GridWay is 5.2. As GridWay is the base of our
scheduling system, we briefly introduce the GridWay modular software architecture
and its built-in scheduling policies. More details about both
can be obtained from GridWay's website [12].
5.1.1 GridWay Modular Architecture
This section briefly introduces different software modules of GridWay (see Figure 1).
Please refer to GridWay user guide [23] for more details.
• Fixed Resource Priority Policy: Each resource discovered by the
information manager can have a fixed priority assigned by the system
administrator.
• Rank Policy: Each resource can have a dynamic rank which is calculated
via the RANK expression.
• Usage Policy: The resources can be prioritised by current usage data
and historical usage data recorded in the accounting database.
In addition, the built-in scheduler uses the following Failure Rate Policy to handle
resource failure: a resource with persistent failures is discarded for a certain
time period according to an exponential linear backoff strategy [23].
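The Failure Rate Policy can be illustrated with a small sketch. The formula below is an assumption chosen for illustration, not taken from the GridWay documentation: it grows roughly linearly for a resource that has only just started failing and saturates exponentially towards a maximum, which is one plausible reading of "exponential linear backoff". The names `ban_duration`, `max_ban_s` and `time_constant_s` are hypothetical.

```python
import math


def ban_duration(seconds_failing, max_ban_s=3600.0, time_constant_s=600.0):
    """Return how long (in seconds) a failing resource is excluded.

    Assumed formula (illustrative only):
        max_ban_s * (1 - exp(-seconds_failing / time_constant_s))
    """
    if seconds_failing <= 0:
        return 0.0
    return max_ban_s * (1.0 - math.exp(-seconds_failing / time_constant_s))


# Early failures: close to the linear term max_ban_s * t / time_constant_s.
short = ban_duration(60)
# Persistent failures: the ban approaches, but never exceeds, max_ban_s.
long_ = ban_duration(36000)
```

With this shape, a resource that fails once is retried soon, while a resource that keeps failing is kept out of the candidate list for close to the configured maximum.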
5.1.2 Working Mechanisms
This section briefly describes the working mechanism of GridWay in terms of
resource discovery and job submission. Please see Sections 5.2 to 5.6 for details of our
implementation of a fully virtualized job processing environment on the service-oriented
Grid.
Step1. Dynamic Resource Discovery
• Information Manager (IM) periodically queries MDS information from the MDS4
(GLUE) server and stores the result in a temporary file on the local machine
• IM parses the information and returns a list of available hosts
• According to the host list, IM queries available status information of the host and
the queues it is managing.
• Finally all the hosts and queues managed by each grid site are ready for use.
Step2. Job Requirements Matchmaking and Ranking
• After the resource discovery stage, the job is ready to submit. However,
before submission, job requirements matchmaking and ranking must
be done if the user specifies any requirements and rank expressions in the job
plan file or in the application using the GridWay API; otherwise, the job is
submitted in a default way by the scheduler.
• Matchmaking and ranking are done by our customized matchmaking and ranking
algorithms; see Section 5.3.2 for descriptions of these algorithms.
• All the queues that satisfy the requirements are chosen as execution
candidates and are assigned a rank according to the ranking expression. The
final priority of each queue is computed according to the scheduling policy. If
the default built-in scheduler is enabled, the queue priority for the given job is
computed using the weighted sum approach described in Section 5.1.1.
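The matchmaking and ranking step can be sketched as a filter followed by a weighted sum. This is a minimal illustration under stated assumptions, not GridWay's actual scheduler: the `Queue` fields, the `schedule` function and the three weights are hypothetical, and the real policies may normalize their terms differently.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """One candidate queue; all field names are hypothetical."""
    host: str
    name: str
    free_slots: int
    software: set = field(default_factory=set)
    fixed_priority: float = 0.0   # assigned by the administrator
    usage_score: float = 0.0      # derived from accounting data


def matches(queue, min_free_slots, required_software):
    """Matchmaking: keep only queues that satisfy every requirement."""
    return (queue.free_slots >= min_free_slots
            and required_software <= queue.software)


def schedule(queues, min_free_slots, required_software, rank, weights):
    """Return the matching queues ordered by weighted-sum priority, best first."""
    w_fixed, w_rank, w_usage = weights
    candidates = [q for q in queues
                  if matches(q, min_free_slots, required_software)]

    def priority(q):
        return (w_fixed * q.fixed_priority
                + w_rank * rank(q)
                + w_usage * q.usage_score)

    return sorted(candidates, key=priority, reverse=True)


queues = [
    Queue("siteA", "short", free_slots=4, software={"X"}, fixed_priority=1.0),
    Queue("siteB", "long", free_slots=16, software={"X", "Y"}, fixed_priority=0.5),
    Queue("siteC", "short", free_slots=32, software={"Y"}),
]
ranked = schedule(queues, 2, {"X"}, rank=lambda q: q.free_slots,
                  weights=(1.0, 0.1, 0.0))
names = [f"{q.host}/{q.name}" for q in ranked]
```

Here `siteC` is dropped during matchmaking because it lacks software X, and `siteB/long` outranks `siteA/short` because its larger number of free slots outweighs its lower fixed priority.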
Step3. Job Submission
• According to the job plan description and the scheduling algorithm, a temporary
job description (RSL) file is generated, which specifies the name of the executable
on the remote machine, the queue selected by the scheduling algorithm,
the input and output files that need to be copied to and from the remote machine,
and environment information.
• Finally, GridWay uses the Execution Manager module to submit the job.
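As a rough illustration of the job description generated in Step 3, the sketch below assembles a string in the classic attribute-list RSL form. It is an assumption-laden example: GT4's WS-GRAM uses an XML job description, file staging is omitted, the function names `quote` and `build_rsl` are hypothetical, and the quote-doubling escape rule is assumed rather than confirmed by the thesis.

```python
def quote(value):
    """Quote a value as a string literal (assumed rule: embedded quotes are doubled)."""
    return '"' + str(value).replace('"', '""') + '"'


def build_rsl(executable, queue, arguments=(), environment=None):
    """Build a single-job description from the scheduler's decisions."""
    parts = [f"(executable={quote(executable)})", f"(queue={quote(queue)})"]
    if arguments:
        parts.append("(arguments=" + " ".join(quote(a) for a in arguments) + ")")
    if environment:
        # Sort the variables so the generated text is deterministic.
        pairs = "".join(f"({key} {quote(val)})"
                        for key, val in sorted(environment.items()))
        parts.append(f"(environment={pairs})")
    return "&" + "".join(parts)


rsl = build_rsl("/opt/x/bin/x", "normal", ["in.dat"], {"OMP_NUM_THREADS": "4"})
```

The queue argument is the one chosen in Step 2, so the generated description always reflects the scheduler's decision rather than a user-supplied queue name.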
5.2 Resource Information Representation
The representation of resources on the Grid is extremely important for the
metascheduler to carry out correct, reliable and intelligent scheduling. However, the
current GridWay metascheduler does not support job scheduling based on the GLUE1.2
resource information model. Furthermore, the GLUE1.2 information model itself does not
accurately describe the mapping between the queues and the computing
environments they belong to, which makes it unsuitable for accurate metascheduling
according to user requirements. In this section, we discuss the details of the problems
we are facing and propose a solution.
5.2.1 Problem Statement
Problem 1: The metascheduler (GridWay) assumes that one Grid site's GRAM (job
submission interface [30]) is used to communicate with only one single/homogeneous
computing resource, but according to GLUE1.2, a Grid site can be either homogeneous or
heterogeneous, depending on how many homogeneous resources (i.e., the SubCluster
elements in GLUE1.2) the Grid site's GRAM service is managing. A Grid site is
homogeneous if it has only one SubCluster while it will become heterogeneous if it has
two or more different SubClusters. Consequently, the default GridWay will not be able to
appropriately parse the resource information of a Grid site which uses one GRAM service
to manage its heterogeneous resources.
Problem 2: The latest GLUE Schema Specification (1.2) does not provide the mapping
between ComputingElements (i.e., the queues) and SubClusters (i.e., the Host concept
used by GridWay). According to GLUE1.2, a Grid site is a set of resources managed by
the same organization. Within a Grid site, there can be one or more Clusters, each of
which is a heterogeneous set of computing resources. A homogeneous environment is
described by a SubCluster. The problem with this information model (GLUE 1.2) is that a
queue (ComputingElement) sits in parallel with MORE THAN ONE
SubCluster; as a consequence, the relationship between the queue and the SubCluster
is ambiguous. This prevents a metascheduler from reliably performing advanced
scheduling if the Grid user specifies any host-related requirements. Figure 2 illustrates
the information model based on GLUE Schema (1.2), and the next subsection presents
a usage example based on this model.
</ Cluster >
…
</ Site >
Figure 3. A typical representation of a grid site, using the GLUE Schema Specification (1.2)
The example above describes the computing resources of the Grid site SAPAC, which
runs three different execution environments represented by three SubClusters. As
mentioned in Problem 1 in Section 5.2.1, the default GridWay assumes that a site has only
one SubCluster, so it cannot appropriately describe the resource information
of a Grid site which has multiple SubClusters; i.e., the default metascheduler cannot
work properly with heterogeneous resources represented by GLUE1.2.
Now, let us consider a usage scenario in which the scheduler cannot work correctly
using the resource information represented by GLUE1.2. In the job plan file (i.e., a file
using a set of predefined variables to describe the requirements of a computational task,
such as the name of the executable, the input and output, etc.), the end user specifies
that the job must be run on host SAPAC and that software X must be used for data
processing (the support for software requirements is discussed in Section 5.3). The
requirement expression is written as follows (assume that only the second SubCluster is
running the desired software and that the user does not care which queue is used):
REQUIREMENTS = HOST_NAME = “SAPAC” & SOFTWARE = “X”
After submission, the scheduler checks the MDS information and finds that a grid site named
“SAPAC” exists and that this site has an execution environment (SubCluster)
which is running the desired software. Because there is no description of the
relationship between the queues of this site and the execution environments (i.e., the
SubClusters), all the queues are chosen as candidates for job execution. According
to GridWay's default scheduling algorithm, the first queue (by order of appearance in the
MDS information) is chosen as the default candidate for the given job. However, the
problem is that we cannot guarantee that the first queue belongs to the desired
SubCluster (the second SubCluster in this example), which is running software “X”.
Therefore, for GridWay to perform more accurate scheduling, we need a
customized information model which not only describes the status and characteristics
of the resources, but also indicates the relationships between
these resources, especially the association between the ComputingElements and the
SubClusters.
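The scenario can be reproduced with a toy model. This is a sketch under stated assumptions: the dictionary layout, the SubCluster and queue identifiers, and the `subcluster` key that links a queue to its execution environment are all hypothetical and stand in for whatever association the customized information model provides.

```python
# A toy site with three SubClusters; only the second one runs software "X".
site = {
    "name": "SAPAC",
    "subclusters": [
        {"id": "sc1", "software": {"A"}},
        {"id": "sc2", "software": {"X"}},
        {"id": "sc3", "software": {"B"}},
    ],
    "queues": [
        {"name": "q1", "subcluster": "sc1"},
        {"name": "q2", "subcluster": "sc2"},
        {"name": "q3", "subcluster": "sc3"},
    ],
}


def candidates_without_mapping(site, software):
    """Plain GLUE 1.2 view: queues and SubClusters are unrelated siblings,
    so every queue qualifies as soon as any SubCluster has the software."""
    if any(software in sc["software"] for sc in site["subclusters"]):
        return [q["name"] for q in site["queues"]]
    return []


def candidates_with_mapping(site, software):
    """With a queue-to-SubCluster association, only queues that feed a
    SubCluster running the software qualify."""
    suitable = {sc["id"] for sc in site["subclusters"]
                if software in sc["software"]}
    return [q["name"] for q in site["queues"]
            if q.get("subcluster") in suitable]


ambiguous = candidates_without_mapping(site, "X")
precise = candidates_with_mapping(site, "X")
```

Without the association, the first candidate is `q1`, which belongs to a SubCluster that does not run X; with it, the only candidate is `q2`.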
5.2.3 Solution: ANG Resource Information Model
In order for the metascheduler to appropriately interpret the resource information of a
heterogeneous Grid site represented by GLUE1.2 and correctly schedule a job on the Grid
based on the GLUE1.2 standard, our solution is to extend the current GLUE1.2 information
model by establishing the correspondence between the queues (i.e.,
ComputingElements) and their execution environments (i.e., SubClusters). For example,
adding a new attribute or child element to the queue (i.e., the ComputingElement) with
Figure 5. The resource information model used by the current ANG Grid
5.2.4 Identifying Heterogeneous Resources
Based on the extended GLUE information model, the metascheduler
can perform accurate job scheduling because the relationship between queues
and SubClusters is deterministic. However, further work is needed for the
metascheduler to correctly interpret the resource information of a heterogeneous Grid
site whose GRAM service is used to communicate with multiple SubClusters.
As described in Problem 1 of Section 5.2.1, GridWay assumes that the GRAM service of
a Grid site is used to submit jobs to a single SubCluster. From the default
metascheduler's point of view, a Grid site is therefore equal to a Host, which has the
same meaning as a SubCluster in GLUE1.2 (i.e., a set of homogeneous computing
resources; the terms Host and SubCluster are used interchangeably in the rest of the
thesis). Under the GLUE1.2 standard, however, a Grid site may have multiple
SubClusters, so a Grid site is no longer a single Host but a set of Hosts or
SubClusters. We therefore need to change the way the default GridWay interprets
resource information based on the GLUE1.2 standard.
As the solution, a new resource information parser was written to interpret the
GLUE1.2-based resource information and convert the interpreted data into Host or
SubCluster units which the metascheduler can understand. Furthermore, the default
GridWay uses the name of the GRAM to uniquely identify a host (see Figure 6 for an
example). This is not appropriate in GLUE1.2, because a GRAM
might be used to communicate with multiple Hosts. We therefore need extra information to
separate the different SubClusters in a Grid site in order to make the metascheduling
more accurate and more adaptable in a dynamic environment.
Figure 6. The default GridWay uses the GRAM (ng2.sapac.edu.au in this example) name to uniquely
identify a homogeneous environment, which is not suitable when the host is heterogeneous
To adapt the metascheduler to the GLUE1.2 information model, we use the GRAM
name plus the SubCluster ID to uniquely identify a SubCluster, i.e., a homogeneous
environment in a Grid site:
HOSTNAME = GRAM Name / SubCluster ID
For example, the resource information of SAPAC is no longer represented by a
single host named ng2.sapac.edu.au, but by two hosts named
ng2.sapac.edu.au/hydra.sapac.edu.au and
ng2.sapac.edu.au/perseus.sapac.edu.au, with each host managing its own queues and
software. The new host name is not only unique, but also expressive enough to identify
the different homogeneous computing environments managed by a heterogeneous Grid site.
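The naming format can be sketched in a few lines. The function names below are hypothetical and are not part of GridWay; the sketch only shows how a composite name is built from the GRAM name and the SubCluster ID and split back into its two parts.

```python
def make_host_name(gram_name, subcluster_id):
    """Compose the host name as 'GRAM Name / SubCluster ID'."""
    if "/" in gram_name:
        raise ValueError("GRAM name must not contain '/'")
    return f"{gram_name}/{subcluster_id}"


def split_host_name(host_name):
    """Recover (GRAM name, SubCluster ID) from a composite host name."""
    gram_name, separator, subcluster_id = host_name.partition("/")
    if not separator:
        raise ValueError(f"not a composite host name: {host_name!r}")
    return gram_name, subcluster_id


def hosts_for_site(gram_name, subcluster_ids):
    """One scheduler-visible host per SubCluster behind the same GRAM."""
    return [make_host_name(gram_name, sc) for sc in subcluster_ids]


hosts = hosts_for_site("ng2.sapac.edu.au",
                       ["hydra.sapac.edu.au", "perseus.sapac.edu.au"])
```

Splitting on the first `/` lets the job submission code recover the GRAM contact to submit to, while the scheduler keeps treating each SubCluster as a separate host.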
The following figure shows the ANG computing resources successfully parsed by our
customized GLUE1.2 parser. Each entry in this figure represents a Host, which is
identified by using the new naming format.
Figure 7. The ANG computing resources discovered by the customized GridWay, by using GLUE1.2
information parser and the new naming format
5.3 Software Requirements
According to our survey, the current GridWay and other major metaschedulers do not
enable the user to specify which software is required without
knowing where the software is located on the Grid. In order to make full use of the MDS4
(GLUE1.2) resource information and support automatic software requirements
matchmaking, we developed a new software requirements supporting mechanism,