Designing and Analyzing Resource Allocation Based
on Grouped Grid Nodes
Chih-Ting Tsai
Department of Management Information
Chinese Culture University
55, Hwa Kung Road, Yang-Ming-San, Taipei, Taiwan
chihting.tsai@gmail.com
Huey-Ming Lee
Department of Management Information
Chinese Culture University
55, Hwa Kung Road, Yang-Ming-San, Taipei, Taiwan
hmlee@faculty.pccu.edu.tw
Abstract—Supervising and allocating resources in highly dynamic
grid environment is an important issue. Tsai et al. proposed a
non-grouped grid node model. With non-grouped model, the
CPU consumption caused by supervising nodes would raise with
the increasing number of nodes. To reduce the CPU consumption,
we proposed a grouped grid node model in this study. We
classified grid nodes into grouped nodes based on the specified
CPU scales. Nodes only supervise and allocate resources on the
other nodes in the same group, and supervising and allocating
grouped resources is handles by group-agent nodes in each group.
By implementing the proposed grouped grid node model, we have
good performance.
Keywords-Grid Computing; Resource Allocation
I. INTRODUCTION
The term “Grid” was coined in the mid 1990s to denote a
proposed distributed computing infrastructure for advanced
science and engineering [3]. Grid users should be able to
manage grid or assign jobs into grid with user interface
provided by grid program [4]. The grid would assign jobs by
analyzing nodes' information which is exchanged between each
other. Nodes' resources in grid would not be static as well,
resources on each node changes dynamically and it's difficult to
retrieve the information from each node.
Foster and Kesselman presented grid resource allocation
and management (GRAM) which could simplifies addressing
resources and assigning jobs in grid [3, 5]. It's meaningless
unless the grid delivers jobs to nodes without loose and return
jobs' results. The more insurance we need, the more monitoring
should be proceeded.
After realizing the resources changing dynamically, grid
developers should be able to define what the resources are, and
what the specifications of jobs are. Many precursors had
indicated the difficulty of addressing resources in the grid.
Condor jobs scheduling system was implemented on
workstation with minimized user interference [8]. This system
used a predefined language, ClassAds, which could let
developers to define the resources in grid and jobs specification
[8, 10].
A job allocating mechanism which could be used to match
jobs and nodes is needed by grid. Open Grid Services
Infrastructure known as OGSI provided an infrastructure and
algorithm for jobs scheduling [3]. WS-Resource Framework
that is known as WSRF can be viewed as a straightforward
refactoring of the concepts and interfaces developed in the
OGSI V1.0 specification in a manner that exploits recent
developments in Web services architecture [1]. WSRF provides
a guideline that defines what information should be formatted
and how this information is structured [1]. While dynamically
supervising grid nodes, a custom and predefined information
format would be useful.
Lee et al. [6] proposed a dynamic supervising grid model
which can monitor and utilize the grid resources, e.g. CPU,
storages, etc. With this model, there is only one supervising
node in grid environment, and this node would not only
monitor and allocate the resources on other nodes but also
accept jobs from client node. The supervising node would
retrieve information of resources on every node with an
arbitrary period. The information collected by supervising node
would be recorded and analyzed for job scheduling. When
users assign jobs into grid, the supervising node analyzes the
dynamical information of executing nodes and allocates an
appropriate execute node for job. With analyzing dynamical
information of grid status, the supervising node could pick up a
more suitable node for jobs. The grid would face a significant
impact while the supervising node fails with any kind of
reasons. Single node of failure is an important issue while
building a grid environment should be avoided.
Lee et al. [7] proposed a model based on grid environment
for avoiding single point of failure. With this model, the
administrators of the grid could dedicate few supervising nodes
to manage all execute nodes. Only one primary supervising
node acts as supervisor, and the other backup supervising nodes
act as execute nodes. While users need to assign jobs into grid
nodes, users have to access supervising nodes directly.
Tsai et al. [11] proposed a non-grouped grid node model
without specific supervising nodes, every node in grid could
supervise and allocate other nodes, reduce the overhead on
specific supervising nodes. Every Node performs as two role:
supervisor and worker. The supervisor role monitors and
manages nodes in grid, and the worker role executes jobs. The
monitoring processes are not only broadcasting local
information to other nodes but also gathering information from
other nodes. With non-grouped grid node model, nodes would
exchange XML based nodes' information such as nodes' name,
978-1-4577-0653-0/11/$26.00 ©2011 IEEE 1848
CPU utilities, memory usage, free disk space, and job queue
length. Every node would merge and parse information with a
arbitrary period. After users assign jobs, the node would
analyze jobs request and dispatch jobs based on gathered
information. However, gathering and analyzing information
from every node would make CPU too busy while there many
nodes in grid. The CPU consumption caused by spending too
many resource merging and parsing would make nodes out of
function. With the scenario mentioned above, non-grouped grid
node model is not suitable when there are more than fifty nodes
in grid environment. A new model would be needed while
there are more than fifty nodes in grid environment.
In this paper, we proposed a grouped grid node model.
With this model, nodes would be classified into five
classifications by CPU specified scales. Within each
classification, every node in the same classification would be
assigned into groups, and the CPU consumption caused by
supervising the resources would be effectively reduced.
Supervising and allocating grouped resources is handled by
group-agent node. There is only one group-agent node in each
group. This means there are three kinds of roles that nodes may
perform. Nodes must perform supervisor and worker, but only
one node in a group would perform as group agent. With
grouped grid node model, we proposed a grouped resource
ratio job dispatching algorithm. The detail of framework or
implementation would be discussed in following section.
II. FRAMEWORK OF PROPOSED MODEL
A. Framework overview
There is no specific supervisor in grid that is every node in
grid could manage nodes and execute jobs. While nodes
manage other nodes, node needs to communicate to each other.
The user interface is used for user accessing node. These mean
that there are at least four modules, resource module, worker
module, user interface module, and communication module, in
our model. For reducing internal friction caused by parsing and
exchanging information, we put ten or less nodes in a group.
There is only one group-agent node to exchanging group
information with other groups and deliver to other groups or
accept jobs from other groups. Dispatching jobs in grouped
grid node environment is difficult to handle with, so we
proposed a new module: group module which is handling
grouping processes. The framework of our model is shown in
Figure 1. The user interface module (UIM) is used to provide a
interface between users and grid, and display grid status
information to users. The group module (GP) deals all
grouping process including exchange group information to
other group. The resource module (RM) manages this node,
gather system information and broadcasting information to
other node in the same group. The worker module is only used
to maintain job queue and execute jobs on node. The
communication module (CM) just likes a messenger to deliver
jobs or information.
Figure 1. Framework of proposed model
B. User Interface Module
The user interface module is used to display the real-time
status on this node and information of other node. It also
provides an interface for user requesting job assignment into
grid environment. The framework of user interface module
would be shown in Figure 2.
Figure 2. Framework of User Interface Module
C. Group Module
This module keeps monitoring other nodes and gathering
information from agent nodes. There should not be more than
ten nodes in the same group, based on our experiment one
node would cause about one percent internal friction. For
reducing the monitoring overhead, we use group agent to be
the gate-keeper of each group. The agent node is an agent of
the group. Each node in the same group has a parameter as
priority, which is used to justify taking over agent node when
the agent node of the group off-line. The framework is as
shown in Figure 3. The simplified grouping algorithm is as
follow:
calculate how many nodes are in each specification level;
for each node in grid
assign groupID to node and each group contains no more
than ten nodes;
calculate group specified scale;
pick up agent node of groups;
end;
UserGM UIM
RM
Internal Communication
External Communication
UserUIM
RM WM
CM
Node Nk
Node Nj
GM
Node Ni
N
1849
for each group in grid
calculate group specification ration in grid;
end;
for each node in the same group
calculate node specification ratio in group;
end;
Figure 3. Framework of Group Module
D. Resource Module
There are five components: Initiator, Register, Monitor,
Manager, Dispatcher, which are assisting resource module
manages and monitors nodes in the same group. The grid
program is ignited with Initiator, it would wake up and
Manager and Group Module when finishing initiation.
Manager handle all access on local node and make sure each
module and component work correctly. The Dispatcher is
dealing job assignment by analyzing the information of nodes
in the same group. These five components' frameworks are
shown in Figure 4. The Algorithm of Dispatcher is as follow:
integer Grid-Total-Score equals grid’s total score;
integer Total-Job-Count equals get total job count;
integer jobs-count equals Total-Job-Count;
list group is the group of the grid nodes;
list group-ratio is the group score divided by Grid-Total-
Score;
integer i equals zero;
for i is from one to the number of group
if jobs-count <= Total-Jobs-Count*group-ratio of i-th
group
dispatch jobs-count jobs to i-th group;
integer group-job-count = jobs-count;
for node in i-th group
dispatch group-jobs-count*node-ratio to node;
end;
jobs = 0;
else
dispatch Total-Jobs-Count*(i-th group-ratio) jobs to i-th
group;
for node in i-th group
dispatch group-jobs-count*node-ratio to node;
end;
jobs = jobs – Total-Jobs-Count * (i-th group-ratio );
end;
end;
Figure 4 Framework of Resource Module
E. Worker Module
There are only three components in this module: Job
Queue Keeper, Executer and Redirector. The Job Queue
Keeper maintains all jobs in that node, and if the job queue is
too long, it would wake up Redirector to redirect jobs to other
nodes in the same group. The Executer just executes all jobs
on that node. The Redirector would be active when the Job
Queue Keeper requests, and notify Resource Module to find a
suitable node(s) to redirect jobs. The framework of Worker
Module is shown in Figure 5.
Figure 5 Framework of Worker Module[11]
F. Communication Module
The communication between Resource Module and
communication modules on other nodes is handled by this
module. It's not only exchanging information but also
delivering jobs to other nodes in the same group. The
framework would be shown in Figure 6.
Figure 6. Framework of Communication Module
III. MODEL SIMULATION
For model simulation, we need some parameters to build a
simulation model. We developed a grid program with Visual
C# 2008 and Microsoft .Net Framework 3.5 Service Pack1. We
WM
CM on Other Node
CMGM
Redirector
Job Queue
Keeper
Executer
RM
Worker Module
Initiator
Monitor Manager
Register
UIMCM WM
Resource Module
Dispatcher
GM
Group Agent
UIM
CM
RM
Group Module
1850
choose VMWare ESXi 4.0 as virtualization platform which is
deployed three grid nodes. These nodes are implemented with
Microsoft Server 2003 Service Pack1 and Microsoft .Net
Framework 3.5 Service Pack1. We've built three nodes
scenario to gather nodes' parameter for further simulation [2].
In this scenario, we collect two important parameters: internal
friction coefficient which is related with nodes number and
work-time in different node specified scale. The testing job is a
simple work job which is calculating a=a+1 for one Giga-times.
The average execute-time of one job with different CPU
specified scale would be shown in Table I.
TABLE I. AVERAGE EXECUTE TIME
CPU Specification 0.7GHz 1.4GHz 2.1GHz 2.8GHz
Execute Time (Sec.) 333.92 193.28 139.58 100.65
We built our simulation models with Matlab 2010a. We
simulate three kinds of scenarios: (1) non-grouped model
without internal friction, (2) non-grouped model with internal
friction, and (3) grouped model with internal friction. The node
specified scale for each node is a integer between 1 and 5. We
generate normalized distribution specified scale and assign
these nodes into grid nodes. The normalized distribution is
generated with mean value 3, and the sigma is 0.8. Each
simulation would use the normalized node specified scale and
get average value from executing ten times. Using normalized
node specified scale would be closer to the grid in real world.
The simulation of non-grouped model with internal friction
is shown in Figure 7. The high CPU idle ratio comes from too
many nodes with too few jobs, and this means those nodes
without jobs are idle. The dark blue part (right hand part) is
caused from too few nodes and too many jobs, and that is to
say the more CPU resource used for computing jobs the less
idle bring up. The wave in Figure 7 would be the jobs
dispatched to suitable nodes more precisely, and the CPU idle
ratio could be minimized with the job dispatching algorithm.
When considering internal friction caused by gathering
information, the result of simulation would be shown in Figure
8. In Figure 8, the right blue part representing zero percent of
CPU idle ratio, this means all CPU resource on each node are
used to gather and parse information from every nodes in grid.
With our experimental parameter, when number of nodes are
more than 120~130 nodes, this grid environment would not
able to execute jobs for users. The middle-orange part means
high CPU idle ratio, and it comes from other nodes wait for
low specified scale nodes with very few CPU resources were
left for execute jobs which cause long work-time. The fewer
CPU resources left for executing jobs, the longer work-time it
would be. With very long work time, the CPU idle ratio would
not be a proper performance index for estimating grid model.
This figure points out an important thing the more nodes in grid
would lead nodes spend too many resource to exchange and
parse information, but not execute jobs.
Figure 7. CPU idle ratio with non-grouped and internal friction
Figure 8. CPU idle Ratio with non-grouped but with internal
friction
Now, we consider to grouped grid node model and the
internal friction in each node, the simulation result as show in
Figure 9. In each group, there are no more than ten nodes in the
same group. If there are only less than ten nodes which have
same specified scale would lead job scheduling more balanced.
In this simulation, we calculate the CPU idle ratio with taking
grouped nodes and internal friction in every node into
consideration.
In Figure 9, the left and deep-blue part comes from long
execute time with too few grid nodes and too many jobs. If
there are too many jobs requested into grid, the work-time
would be very long and the idle time would much less than
work-time which is a denominator parameter in simulation.
Also in Figure 9, the red and yellow part comes from too few
jobs for too many nodes in the grid. In this case, there are too
1851
many nodes that are not assigned jobs, and just wait the busy
nodes finishing their execution.
When the number of jobs are more than the number of
nodes, the idle ratio of grid environment is down to between
5~30 percent.
Figure 9. CPU idle ratio with grouped grid model and internal
friction
While using grouped node model in grid, it's very difficult
to avoid single node group. Single node group includes only
one node in that group. Furthermore, single or few nodes group
are not avoidable. These kinds of group would have their group
score much less than other groups. With less group score, the
jobs may not be assigned to that group, and this situation may
cause that group idle all the time. However, grouped grid nodes
model could effectively reduce the internal friction while the
nodes' number raise up. When comparing Figure 7 and Figure 9,
the CPU idle ratio of grouped grid node model is very close to
those from non-grouped grid node model. That is to say this
model could be implemented in massive number of grid nodes.
IV. CONCLUSION
With the proposed model in this study, the CPU
consumption caused by supervising and allocating resources
would be effectively reduced. With grouped resource ratio job
dispatching algorithm, the idle ratio of each group would be
decreased, and the load of node in the same group would be
more balanced. We can have good overall performance of grid
with the proposed model.
REFERENCES
[1] K. Czajkowski, D. Ferguson, I. Foster, J. Frey, S. Graham, T. Maguire,
D. Snelling, and S. Tuecke, “From open grid services infrastructure to W
S-Rource Framework: refactoring & evolution”, Availpable: http://www.
globus.org/wsrf/specs/ogsi_to_wsrf_1.0.pdf, 2004
[2] L. Ferreira, B. Jacob, S. Slevin, M. Brown, S. Sundararajan, and J. Bank,
“Globus toolkit 3.0 quick start”. Available: http://www.redbooks.ibm.co
m/redpapers/pdfs/redp3697.pdf. 2003
[3] I. Foster and C. Kesselman, “The Grid 2: Blueprint for a new computing
infrastructure”, San Francisco: Morgan Kaufmann, 2004.
[4] I. Foster, C. Kesselman, and S. Tuecke, “GRAM: Key concept”,
Available: http://www-unix.globus.org/toolkit/docs/3.2/gram/key/index.
html, July, 2008.
[5] I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke, “The physiology of
the grid – and open grid services architecture for distributed systems
integration”, Available: http://www.globus.org/alliance/publications/pap
ers/ogsa.pdf, 2002.
[6] H.-.M. Lee, C.-C. Hsu, and M.-H. Hsu, “A dynamic supervising model
based on grid environment”, Lecture Notes in Computer Sciences,
LNCS 3682, Springer-Verlag, pp. 1258-1264, 2005.
[7] H.-M. Lee, J.-S. Su, and C.-H. Chung, “Resource allocation analysis
model based on grid environment”, International Journal of Innovative
Computing, Information and Control, Vol. 7, No. 5(A), pp. 2099-2108,
2011.
[8] M. J. Litzkow, M Livny, and M. W. Mutka, “Condor-A Hunter of Idle
Workstations”, in Proceedings of the 8th
International Conference of
Distributed Computing Systems, 1998
[9] L. Quirk, “Ownership of a queue for practical lock-free scheduling”, Av
ailable: http://www.cs.brown.edu/research/pubs/theses/ugrad/2008/quirk.
pdf
[10] R. Raman “ClassAds Programming Tutorial (C++)”, Available:
http://www.cs.wisc.edu/condor/classad/c++tut.html, 2000
[11] C-T Tsai, H-S Chen, J-S Su, and H-M Lee, “Designing and analizing
grid node job process schduling”, ICIC Express Letters,Vol. 5, No. 10,
pp. 3731-3735, 2011
1852

IEEE_SMC_2011

  • 2.
    Designing and AnalyzingResource Allocation Based on Grouped Grid Nodes Chih-Ting Tsai Department of Management Information Chinese Culture University 55, Hwa Kung Road, Yang-Ming-San, Taipei, Taiwan chihting.tsai@gmail.com Huey-Ming Lee Department of Management Information Chinese Culture University 55, Hwa Kung Road, Yang-Ming-San, Taipei, Taiwan hmlee@faculty.pccu.edu.tw Abstract—Supervising and allocating resources in highly dynamic grid environment is an important issue. Tsai et al. proposed a non-grouped grid node model. With non-grouped model, the CPU consumption caused by supervising nodes would raise with the increasing number of nodes. To reduce the CPU consumption, we proposed a grouped grid node model in this study. We classified grid nodes into grouped nodes based on the specified CPU scales. Nodes only supervise and allocate resources on the other nodes in the same group, and supervising and allocating grouped resources is handles by group-agent nodes in each group. By implementing the proposed grouped grid node model, we have good performance. Keywords-Grid Computing; Resource Allocation I. INTRODUCTION The term “Grid” was coined in the mid 1990s to denote a proposed distributed computing infrastructure for advanced science and engineering [3]. Grid users should be able to manage grid or assign jobs into grid with user interface provided by grid program [4]. The grid would assign jobs by analyzing nodes' information which is exchanged between each other. Nodes' resources in grid would not be static as well, resources on each node changes dynamically and it's difficult to retrieve the information from each node. Foster and Kesselman presented grid resource allocation and management (GRAM) which could simplifies addressing resources and assigning jobs in grid [3, 5]. It's meaningless unless the grid delivers jobs to nodes without loose and return jobs' results. The more insurance we need, the more monitoring should be proceeded. After realizing the resources changing dynamically, grid developers should be able to define what the resources are, and what the specifications of jobs are. Many precursors had indicated the difficulty of addressing resources in the grid. Condor jobs scheduling system was implemented on workstation with minimized user interference [8]. This system used a predefined language, ClassAds, which could let developers to define the resources in grid and jobs specification [8, 10]. A job allocating mechanism which could be used to match jobs and nodes is needed by grid. Open Grid Services Infrastructure known as OGSI provided an infrastructure and algorithm for jobs scheduling [3]. WS-Resource Framework that is known as WSRF can be viewed as a straightforward refactoring of the concepts and interfaces developed in the OGSI V1.0 specification in a manner that exploits recent developments in Web services architecture [1]. WSRF provides a guideline that defines what information should be formatted and how this information is structured [1]. While dynamically supervising grid nodes, a custom and predefined information format would be useful. Lee et al. [6] proposed a dynamic supervising grid model which can monitor and utilize the grid resources, e.g. CPU, storages, etc. With this model, there is only one supervising node in grid environment, and this node would not only monitor and allocate the resources on other nodes but also accept jobs from client node. The supervising node would retrieve information of resources on every node with an arbitrary period. The information collected by supervising node would be recorded and analyzed for job scheduling. When users assign jobs into grid, the supervising node analyzes the dynamical information of executing nodes and allocates an appropriate execute node for job. With analyzing dynamical information of grid status, the supervising node could pick up a more suitable node for jobs. The grid would face a significant impact while the supervising node fails with any kind of reasons. Single node of failure is an important issue while building a grid environment should be avoided. Lee et al. [7] proposed a model based on grid environment for avoiding single point of failure. With this model, the administrators of the grid could dedicate few supervising nodes to manage all execute nodes. Only one primary supervising node acts as supervisor, and the other backup supervising nodes act as execute nodes. While users need to assign jobs into grid nodes, users have to access supervising nodes directly. Tsai et al. [11] proposed a non-grouped grid node model without specific supervising nodes, every node in grid could supervise and allocate other nodes, reduce the overhead on specific supervising nodes. Every Node performs as two role: supervisor and worker. The supervisor role monitors and manages nodes in grid, and the worker role executes jobs. The monitoring processes are not only broadcasting local information to other nodes but also gathering information from other nodes. With non-grouped grid node model, nodes would exchange XML based nodes' information such as nodes' name, 978-1-4577-0653-0/11/$26.00 ©2011 IEEE 1848
  • 3.
    CPU utilities, memoryusage, free disk space, and job queue length. Every node would merge and parse information with a arbitrary period. After users assign jobs, the node would analyze jobs request and dispatch jobs based on gathered information. However, gathering and analyzing information from every node would make CPU too busy while there many nodes in grid. The CPU consumption caused by spending too many resource merging and parsing would make nodes out of function. With the scenario mentioned above, non-grouped grid node model is not suitable when there are more than fifty nodes in grid environment. A new model would be needed while there are more than fifty nodes in grid environment. In this paper, we proposed a grouped grid node model. With this model, nodes would be classified into five classifications by CPU specified scales. Within each classification, every node in the same classification would be assigned into groups, and the CPU consumption caused by supervising the resources would be effectively reduced. Supervising and allocating grouped resources is handled by group-agent node. There is only one group-agent node in each group. This means there are three kinds of roles that nodes may perform. Nodes must perform supervisor and worker, but only one node in a group would perform as group agent. With grouped grid node model, we proposed a grouped resource ratio job dispatching algorithm. The detail of framework or implementation would be discussed in following section. II. FRAMEWORK OF PROPOSED MODEL A. Framework overview There is no specific supervisor in grid that is every node in grid could manage nodes and execute jobs. While nodes manage other nodes, node needs to communicate to each other. The user interface is used for user accessing node. These mean that there are at least four modules, resource module, worker module, user interface module, and communication module, in our model. For reducing internal friction caused by parsing and exchanging information, we put ten or less nodes in a group. There is only one group-agent node to exchanging group information with other groups and deliver to other groups or accept jobs from other groups. Dispatching jobs in grouped grid node environment is difficult to handle with, so we proposed a new module: group module which is handling grouping processes. The framework of our model is shown in Figure 1. The user interface module (UIM) is used to provide a interface between users and grid, and display grid status information to users. The group module (GP) deals all grouping process including exchange group information to other group. The resource module (RM) manages this node, gather system information and broadcasting information to other node in the same group. The worker module is only used to maintain job queue and execute jobs on node. The communication module (CM) just likes a messenger to deliver jobs or information. Figure 1. Framework of proposed model B. User Interface Module The user interface module is used to display the real-time status on this node and information of other node. It also provides an interface for user requesting job assignment into grid environment. The framework of user interface module would be shown in Figure 2. Figure 2. Framework of User Interface Module C. Group Module This module keeps monitoring other nodes and gathering information from agent nodes. There should not be more than ten nodes in the same group, based on our experiment one node would cause about one percent internal friction. For reducing the monitoring overhead, we use group agent to be the gate-keeper of each group. The agent node is an agent of the group. Each node in the same group has a parameter as priority, which is used to justify taking over agent node when the agent node of the group off-line. The framework is as shown in Figure 3. The simplified grouping algorithm is as follow: calculate how many nodes are in each specification level; for each node in grid assign groupID to node and each group contains no more than ten nodes; calculate group specified scale; pick up agent node of groups; end; UserGM UIM RM Internal Communication External Communication UserUIM RM WM CM Node Nk Node Nj GM Node Ni N 1849
  • 4.
    for each groupin grid calculate group specification ration in grid; end; for each node in the same group calculate node specification ratio in group; end; Figure 3. Framework of Group Module D. Resource Module There are five components: Initiator, Register, Monitor, Manager, Dispatcher, which are assisting resource module manages and monitors nodes in the same group. The grid program is ignited with Initiator, it would wake up and Manager and Group Module when finishing initiation. Manager handle all access on local node and make sure each module and component work correctly. The Dispatcher is dealing job assignment by analyzing the information of nodes in the same group. These five components' frameworks are shown in Figure 4. The Algorithm of Dispatcher is as follow: integer Grid-Total-Score equals grid’s total score; integer Total-Job-Count equals get total job count; integer jobs-count equals Total-Job-Count; list group is the group of the grid nodes; list group-ratio is the group score divided by Grid-Total- Score; integer i equals zero; for i is from one to the number of group if jobs-count <= Total-Jobs-Count*group-ratio of i-th group dispatch jobs-count jobs to i-th group; integer group-job-count = jobs-count; for node in i-th group dispatch group-jobs-count*node-ratio to node; end; jobs = 0; else dispatch Total-Jobs-Count*(i-th group-ratio) jobs to i-th group; for node in i-th group dispatch group-jobs-count*node-ratio to node; end; jobs = jobs – Total-Jobs-Count * (i-th group-ratio ); end; end; Figure 4 Framework of Resource Module E. Worker Module There are only three components in this module: Job Queue Keeper, Executer and Redirector. The Job Queue Keeper maintains all jobs in that node, and if the job queue is too long, it would wake up Redirector to redirect jobs to other nodes in the same group. The Executer just executes all jobs on that node. The Redirector would be active when the Job Queue Keeper requests, and notify Resource Module to find a suitable node(s) to redirect jobs. The framework of Worker Module is shown in Figure 5. Figure 5 Framework of Worker Module[11] F. Communication Module The communication between Resource Module and communication modules on other nodes is handled by this module. It's not only exchanging information but also delivering jobs to other nodes in the same group. The framework would be shown in Figure 6. Figure 6. Framework of Communication Module III. MODEL SIMULATION For model simulation, we need some parameters to build a simulation model. We developed a grid program with Visual C# 2008 and Microsoft .Net Framework 3.5 Service Pack1. We WM CM on Other Node CMGM Redirector Job Queue Keeper Executer RM Worker Module Initiator Monitor Manager Register UIMCM WM Resource Module Dispatcher GM Group Agent UIM CM RM Group Module 1850
  • 5.
    choose VMWare ESXi4.0 as virtualization platform which is deployed three grid nodes. These nodes are implemented with Microsoft Server 2003 Service Pack1 and Microsoft .Net Framework 3.5 Service Pack1. We've built three nodes scenario to gather nodes' parameter for further simulation [2]. In this scenario, we collect two important parameters: internal friction coefficient which is related with nodes number and work-time in different node specified scale. The testing job is a simple work job which is calculating a=a+1 for one Giga-times. The average execute-time of one job with different CPU specified scale would be shown in Table I. TABLE I. AVERAGE EXECUTE TIME CPU Specification 0.7GHz 1.4GHz 2.1GHz 2.8GHz Execute Time (Sec.) 333.92 193.28 139.58 100.65 We built our simulation models with Matlab 2010a. We simulate three kinds of scenarios: (1) non-grouped model without internal friction, (2) non-grouped model with internal friction, and (3) grouped model with internal friction. The node specified scale for each node is a integer between 1 and 5. We generate normalized distribution specified scale and assign these nodes into grid nodes. The normalized distribution is generated with mean value 3, and the sigma is 0.8. Each simulation would use the normalized node specified scale and get average value from executing ten times. Using normalized node specified scale would be closer to the grid in real world. The simulation of non-grouped model with internal friction is shown in Figure 7. The high CPU idle ratio comes from too many nodes with too few jobs, and this means those nodes without jobs are idle. The dark blue part (right hand part) is caused from too few nodes and too many jobs, and that is to say the more CPU resource used for computing jobs the less idle bring up. The wave in Figure 7 would be the jobs dispatched to suitable nodes more precisely, and the CPU idle ratio could be minimized with the job dispatching algorithm. When considering internal friction caused by gathering information, the result of simulation would be shown in Figure 8. In Figure 8, the right blue part representing zero percent of CPU idle ratio, this means all CPU resource on each node are used to gather and parse information from every nodes in grid. With our experimental parameter, when number of nodes are more than 120~130 nodes, this grid environment would not able to execute jobs for users. The middle-orange part means high CPU idle ratio, and it comes from other nodes wait for low specified scale nodes with very few CPU resources were left for execute jobs which cause long work-time. The fewer CPU resources left for executing jobs, the longer work-time it would be. With very long work time, the CPU idle ratio would not be a proper performance index for estimating grid model. This figure points out an important thing the more nodes in grid would lead nodes spend too many resource to exchange and parse information, but not execute jobs. Figure 7. CPU idle ratio with non-grouped and internal friction Figure 8. CPU idle Ratio with non-grouped but with internal friction Now, we consider to grouped grid node model and the internal friction in each node, the simulation result as show in Figure 9. In each group, there are no more than ten nodes in the same group. If there are only less than ten nodes which have same specified scale would lead job scheduling more balanced. In this simulation, we calculate the CPU idle ratio with taking grouped nodes and internal friction in every node into consideration. In Figure 9, the left and deep-blue part comes from long execute time with too few grid nodes and too many jobs. If there are too many jobs requested into grid, the work-time would be very long and the idle time would much less than work-time which is a denominator parameter in simulation. Also in Figure 9, the red and yellow part comes from too few jobs for too many nodes in the grid. In this case, there are too 1851
  • 6.
    many nodes thatare not assigned jobs, and just wait the busy nodes finishing their execution. When the number of jobs are more than the number of nodes, the idle ratio of grid environment is down to between 5~30 percent. Figure 9. CPU idle ratio with grouped grid model and internal friction While using grouped node model in grid, it's very difficult to avoid single node group. Single node group includes only one node in that group. Furthermore, single or few nodes group are not avoidable. These kinds of group would have their group score much less than other groups. With less group score, the jobs may not be assigned to that group, and this situation may cause that group idle all the time. However, grouped grid nodes model could effectively reduce the internal friction while the nodes' number raise up. When comparing Figure 7 and Figure 9, the CPU idle ratio of grouped grid node model is very close to those from non-grouped grid node model. That is to say this model could be implemented in massive number of grid nodes. IV. CONCLUSION With the proposed model in this study, the CPU consumption caused by supervising and allocating resources would be effectively reduced. With grouped resource ratio job dispatching algorithm, the idle ratio of each group would be decreased, and the load of node in the same group would be more balanced. We can have good overall performance of grid with the proposed model. REFERENCES [1] K. Czajkowski, D. Ferguson, I. Foster, J. Frey, S. Graham, T. Maguire, D. Snelling, and S. Tuecke, “From open grid services infrastructure to W S-Rource Framework: refactoring & evolution”, Availpable: http://www. globus.org/wsrf/specs/ogsi_to_wsrf_1.0.pdf, 2004 [2] L. Ferreira, B. Jacob, S. Slevin, M. Brown, S. Sundararajan, and J. Bank, “Globus toolkit 3.0 quick start”. Available: http://www.redbooks.ibm.co m/redpapers/pdfs/redp3697.pdf. 2003 [3] I. Foster and C. Kesselman, “The Grid 2: Blueprint for a new computing infrastructure”, San Francisco: Morgan Kaufmann, 2004. [4] I. Foster, C. Kesselman, and S. Tuecke, “GRAM: Key concept”, Available: http://www-unix.globus.org/toolkit/docs/3.2/gram/key/index. html, July, 2008. [5] I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke, “The physiology of the grid – and open grid services architecture for distributed systems integration”, Available: http://www.globus.org/alliance/publications/pap ers/ogsa.pdf, 2002. [6] H.-.M. Lee, C.-C. Hsu, and M.-H. Hsu, “A dynamic supervising model based on grid environment”, Lecture Notes in Computer Sciences, LNCS 3682, Springer-Verlag, pp. 1258-1264, 2005. [7] H.-M. Lee, J.-S. Su, and C.-H. Chung, “Resource allocation analysis model based on grid environment”, International Journal of Innovative Computing, Information and Control, Vol. 7, No. 5(A), pp. 2099-2108, 2011. [8] M. J. Litzkow, M Livny, and M. W. Mutka, “Condor-A Hunter of Idle Workstations”, in Proceedings of the 8th International Conference of Distributed Computing Systems, 1998 [9] L. Quirk, “Ownership of a queue for practical lock-free scheduling”, Av ailable: http://www.cs.brown.edu/research/pubs/theses/ugrad/2008/quirk. pdf [10] R. Raman “ClassAds Programming Tutorial (C++)”, Available: http://www.cs.wisc.edu/condor/classad/c++tut.html, 2000 [11] C-T Tsai, H-S Chen, J-S Su, and H-M Lee, “Designing and analizing grid node job process schduling”, ICIC Express Letters,Vol. 5, No. 10, pp. 3731-3735, 2011 1852