An Investigation into Cluster CPU load balancing
in the JVM
Calum James Beck
Submitted in partial fulfilment of
the requirements of Edinburgh Napier University
for the Degree of Bachelor of Engineering with
Honours in Software Engineering
School of Computing
April 2016
Page | 2
Abstract
The JVM CPU Cluster Balancer is a scalable, proof of concept system designed
to distribute processes over a network to perform multiple tasks at once, in a
language of high abstraction. Once distributed, workers return results to an
access server, all while monitoring their respective CPUs for computational stress
in terms of CPU usage. CPUs incurring the set stress threshold then have their
respective processes moved to a less intensive area in the cluster, balancing work overall.
The system works by enrolling Universal Clients (CPUs waiting for work) with an
access server, which then requests processes to be sent from the user's desired
Process Server. Each Process comes in the form of a Process Definition
complying with the Agent interface, self-contained in an object. During run time,
the Process Definition object acts as a subtype of the process manager,
assuming responsibility for saving and restoring the state of the process.
Each Client has four Process Nodes which it can delegate work to. The selected
Process Node then connects to the received Process using two internal channels
and runs using an instance of a Process Manager. During runtime, the Client also
implements a Node Monitor which monitors the CPU usage of the Client in real
time. When a set percentage of CPU usage (stress) is reached, the Universal Client
informs the server that an alternative node, on a different machine, is needed to
finish the instance of work.
The Process Definition then stops its runnable logic. The server searches through
enrolled Clients and sends the address of an underwhelmed CPU in the cluster to
the requesting Node. A dynamic TCP/IP channel is then created between the
node and the foreign Process Manager. The process object is then serialised,
allowing it to be transferred in its paused state and resumed at the new client.
The system is developed using pre-set processes to ensure repeatability of
results and runs entirely on any system running the JVM.
This project results in a working system which can distribute work based on CPU
stress, but concludes that, in order to be considered complete, more functionality
needs to be added to find an adequate application for the system.
The Java language, JCSP, the Groovy scripting language and the Sigar
Application Programming Interface (API), which provides Java bindings to native C
system calls, have been used in this project. All code was written and compiled
using the Eclipse Mars IDE.
Contents
1 INTRODUCTION................................................................................................12
1.1 Background..............................................................................................................................................13
1.2 Aims and Objectives................................................................................................................................14
1.3 Scope and Limitations ............................................................................................................................15
1.4 Structure of Dissertation.........................................................................................................................16
2 BACKGROUND, KEY COMPONENTS AND THEORY....................................17
2.1 Data and Task Parallelism......................................................................................................................17
2.2 Hoare’s Communicating Sequential Processes (CSP).........................................................................17
2.3 Channels...................................................................................................................................................18
2.4 Groovy......................................................................................................................................................19
2.5 Communicating Sequential Processes for Java (JCSP).......................................................................19
2.6 Channel Mobility in JCSP......................................................................................................................19
3 METHODOLOGY...............................................................................................21
3.1 Monitoring CPU Usage...........................................................................................................................21
3.2 Process Creation and Distribution ........................................................................................................24
3.3 Process Movement associated Methods.................................................................................................26
4 INITIAL EXPERIMENTS....................................................................................29
4.1 Monitoring CPU usage............................................................................................................................29
5 ARCHITECTURAL DESIGN..............................................................................34
5.1 Central Repository..................................................................................................................................34
5.2 Ring System with Travelling Agents.....................................................................................................36
5.3 Work & Node Manager System.............................................................................................................37
5.4 Network Structure Analysis...................................................................................................................38
6 INTRODUCING PROCESS MOVEMENT ........................................................39
6.1 Java Memory Model ..............................................................................................................................39
6.2 Moving processes within a JVM............................................................................................................40
6.3 Thread Serialization impossible with current JVM.............................................................................40
6.4 Adapting Process definitions as Agents.................................................................................................42
6.5 Sending process definitions in current state.........................................................................................43
7 PROTOTYPE.....................................................................................................44
7.1 Design.......................................................................................................................................................44
7.2 Components.............................................................................................................................................46
7.3 Experiment Setup....................................................................................................................................55
7.4 Results ......................................................................................................................................................56
7.5 Comparative Analysis.............................................................................................................................56
7.6 Local Concurrency Vs Distributed........................................................................................................58
8 CONCLUSION...................................................................................................59
8.1 Has the Project met its Aim and Objectives?.......................................................................................59
8.2 Deployment Analysis and Critique........................................................................................................60
8.3 Further Research and Work..................................................................................................................61
8.4 Reflective Statements..............................................................................................................................64
9 REFERENCES...................................................................................................67
A. Searched Terms........................................................................................................................................70
B. Meeting Diagrams ....................................................................................................................................72
C. Github analytics........................................................................................................................................84
Initial Project Overview...............................................................................................86
SOC10101 Honours Project (40 Credits) .............................................86
List of Figures
FIGURE 1. BASIC CONCEPT OF PROCESS MIGRATION...............................14
FIGURE 2. JAVA BEANS STRUCTURE.............................................................22
FIGURE 3. BASIC JNI INTERFACE PROCESS..................................................23
FIGURE 4. VISUAL REPRESENTATION OF VALUE GENERATOR.................24
FIGURE 5. SERVER-CLIENT PATTERN DIAGRAM..........................................26
FIGURE 6. VISUAL REPRESENTATION OF AGENT RUNNING IN PROCESS
MANAGER............................................................................................................28
FIGURE 7. MK I: HOST NODE SYSTEM DIAGRAM..........................................35
FIGURE 8. NODE RING NETWORK DIAGRAM.................................................36
FIGURE 9. WORK AND NODE MANAGER NETWORK DIAGRAM..................37
FIGURE 10. LOGICAL VIEW OF JAVA MEMORY RELATIONS (JENKOV,
N.D.)......................................................................................................................39
FIGURE 11. JAVA MEMORY MODEL INTERACTION WITH CPU MEMORY
MODEL (JENKOV, N.D.).....................................................................................41
FIGURE 12. ORDER OF EVENTS FOR CONNECTING TO AGENT................42
FIGURE 13. METHOD AND CONTENTS OF PROCESS (THIS)........................43
FIGURE 14. FINAL PROTOTYPE, SERVER-CLIENT NETWORK.....................45
FIGURE 15. ANY2ONE CHANNEL CONCEPT...................................................48
FIGURE 16. INTERNAL CONNECTION MECHANISMS OF AGENT.................50
FIGURE 17. SERVER INTERACTION DIAGRAM FOR PROTOTYPE...............55
FIGURE 18. TABLE OF EXPERIMENT RESULTS.............................................56
FIGURE 19. TEST RESULTS GRAPH; CPU USAGE AND TIME SPENT.........57
FIGURE 20. NODE INTERACTION DIAGRAM...................................................62
List of Screenshots
SCREENSHOT 1. WINDOWS 10 TASK MANAGER AND RESOURCE
MANAGER............................................................................................................29
SCREENSHOT 2. CONSOLE LOG: BASE READING OF CPU USAGE ON
CLIENT 1...............................................................................................................31
SCREENSHOT 3. CONSOLE LOG: CLIENT 2 AFFECTING CLIENT 1 CPU
READINGS............................................................................................................32
SCREENSHOT 4. CLIENT INITIALISING UI.......................................................51
SCREENSHOT 5. SERVER NOT STARTED OR CRASHED ERROR MESSAGE
...............................................................................................................................51
SCREENSHOT 6. CONSOLE LOG: NODE REGISTERED ON SERVER..........52
SCREENSHOT 7. BASIC USER UI......................................................................52
SCREENSHOT 8. CONSOLE LOG: NODE SHOWING READY.......................52
SCREENSHOT 9. CONSOLE LOG: NODE DOING WORK AND RELEASING
PROCESS NODE 1 WHEN FINISHED................................................................53
SCREENSHOT 10. CONSOLE LOG: WHEN PROCESS 4 STARTS, CPU IS
HIGH (62%), AGENT IS CONTACTED (I AM READING), THE PROCESS IS
DISCONNECTED, SENT (LETS GO) AND PROCESS NODE 4 IS RELEASED
...............................................................................................................................54
SCREENSHOT 11. CONSOLE LOG: SERVER DELETES ADDRESS..............54
Acknowledgements
Firstly, I would like to profusely thank Professor Jon Kerridge who has been an
invaluable source of confidence and knowledge throughout this whole project. He
has been a guide and kept me steadfast in what needed to be completed through
challenging times.
Secondly, I’d like to thank Doctor Kevin Chalmers who has always been
compassionate and a nurturing presence throughout my time in University, from
my first to fourth year.
I would also like to personally thank Charlotte Leask for her constant support and
eternal patience throughout the whole process.
1 Introduction
As the world approaches the finite end of physical enhancement in computing, the
aim is to continue increasing speeds by finding new methods of surpassing these
limitations.
In the past, the first step in augmenting any computer in terms of speed and
performance has been to reduce transistor size and thereby increase speed.
Co-founder of Intel Gordon E. Moore stated that the number of transistors able to fit
on a processor would double every 18 months, fundamentally increasing the speed
of computers for at least the following decade. This model of thought is still used
regularly in the computing industry today; however, it was first stated in 1965, and
much has changed since then.
The problems we face today are distance, heat and conduction. The physical
distance between cache memory and cores is shrinking further and further; we are
approaching almost instantaneous transmission, and this brings its own set of
problems. Heat is generated when a CPU core is pushed to compute at the rates we
demand, requiring ever more intricate ways to cool the system, and much of this
can be traced to poor allocation of resources.
We therefore need to look at how we balance our work. Software needs to reflect the
modern multitasking environment that we have come to expect and must change in
order to cope with increasing demand, as hardware cannot be relied upon to be the
sole supporter in this venture. I plan to build a system which allows a proper
allocation of the resources available and increases the efficiency of hardware use in
order to achieve a faster, more reliable system.1
This project endeavours to meet these needs with a system which distributes
processes over a cluster of computers, regulating work based on CPU load. This is a
means of using idle CPUs without exceeding a threshold that would impede the
user's everyday use.
1 Taken from IPO
The final product aims to be a proof of concept that load balancing is possible in a
high-level language, in a portable environment. Hence, it demonstrates the means
and capabilities required to further develop a fully automated system for everyday
users with access to multiple Java-compatible devices.
1.1 Background
Most processing-enhancement implementations fall under cloud computing:
outsourcing processing to external data centres, platform services or application
hosting, whilst remotely managing computer resources (Winias & Brown, n.d.).
However, not all businesses have access to scalable hardware architectures, which
are expensive to build, run and maintain.
Shifting focus to performance, creating efficient software diminishes the need for
in-depth management of system architectures and is a fundamental code of conduct
for emerging professional IT bodies (such as the British Computer Society).
However, different programming languages support different levels of control over a
system. Programming in a language of high abstraction does not fundamentally
afford the efficiency that low-level languages can attain, while low-level languages
are platform-specific and do not lend themselves to portable methods.
Taking advantage of current user environments, rather than reimplementing code
or hardware, is therefore the most cost-effective and least disruptive route. This can
be done by effectively managing processing loads, maximising the capabilities of
the processing resources available.
Utilising idle CPU resources on a network of computers (a cluster) can
fundamentally speed up processing overall. In order to do so, these resources must
be directed to work together towards a common goal (i.e. task parallelism).
Many current systems, such as Incredibuild, implement this parallel design for build
environments, working with low-level code to facilitate high-level build concepts
(Xoreax Software Ltd., n.d.). With high-profile clients such as Microsoft, Google,
IBM and Disney using the product to maximise their system use, it is clear that this
task-distribution method works.
However, for the average user or start-up business, system specifics might still
prove elusive. So why not implement this distribution system in a portable,
high-level language?
Java is a widely used platform, built to be compiled in memory and run in a virtual
machine, with the aim of multi-platform portability. According to Oracle, 97% of
enterprise desktops run Java, alongside 3 billion mobile phones worldwide (Oracle,
2015). Building a system in Java allows the opportunity to port to multiple platforms
with relative ease, greatly widening the potential pool of networked devices that
could join the system.
It should be noted that, in researching this area, very little has been published on
load balancing in high-level languages in a cluster environment within the last 6-10
years. Appendix A documents the search criteria used and the relevancy of the
results.
1.2 Aims and Objectives
The aim of this project is to distribute and regulate processes over multiple CPUs in
a cluster setting using the Java programming language, with the Java Virtual
Machine (JVM) as the environment. This involves monitoring CPU usage in real
time, stopping processes which appear to overload a given terminal, and moving
them to CPUs experiencing less stress in the cluster.
Figure 1. Basic concept of process migration
The main objectives required to create such a system in practice are outlined below:
1) Monitor the CPU usage incurred by an instance of the JVM.
2) Processes must have a way to be interrupted and saved in their current state.
3) Processes need to have a way to move and reinitialise at different nodes, on
different CPUs.
This report documents the steps taken to achieve these goals from inception to
completion. This project aims to provide a system which endeavours to successfully
manage load over several terminals in a cluster, using a language with a high level of
abstraction: Java.
1.3 Scope and Limitations
In order to provide a proof of concept system within the project's allotted time,
certain areas of the project had to be kept within reasonable limitations. In this case,
a limited number of processes is programmed and sent automatically over the
cluster to ensure that overload can be attained with a known degree of certainty.
This means the system does not yet afford user input and runs fairly autonomously.
In addition, to show the scalability of the system, it must be ensured that the
computer distributing tasks runs at a proficient speed to facilitate access from
multiple user-end nodes, preferably with one underperforming CPU.
As the system relies on communication, many transmission options are available,
but these are restricted to TCP/IP network protocols. This form of communication
was chosen as it is a proven, reliable and widely used method supported by
virtually all operating systems and platforms on which Java can run.
This project also uses a Java scripting language called Groovy, which facilitates the
use of Communicating Sequential Processes for Java (JCSP). This allows the
manipulation of threads at a low level with high-level abstraction, resulting in a
parallelised system, and can use TCP/IP protocols as its main mechanism for
communication between systems.
As the project is intended to prove that Java can be utilised to distribute and
balance a system over a cluster, all aspects of the system will be implemented in
Java, within the constraints of the JVM, whilst maintaining a high level of
abstraction in the source code. Other programming languages will only be
considered when it is conceptually and physically impossible to implement the
requisites for completion with the author's current knowledge and skills.
1.4 Structure of Dissertation
The structure of this document is as follows:
• Section 2 introduces the background theory and key components behind the
message-passing mechanics of the system, which revolve around JCSP.
• Section 3 discusses the methods implemented throughout the project, as well
as the decisions made as a result of research, to reach the finished prototype.
• Section 4 presents the initial experiments conducted, documenting the
limitations and barriers which had to be overcome in order to develop a
functioning prototype.
• Section 5 describes the main incarnations of the system and how each
implementation led to a better design.
• Section 6 explains the mechanics behind moving processes and the
difficulties faced in doing so.
• Section 7 elaborates on and demonstrates the prototype system, reviewing
design and implementation as well as experimentation with the system.
• Section 8 details the results and evaluation of the system and project,
concluding with a critical evaluation covering shortcomings of the project and
possible avenues of future work on the system.
2 Background, Key Components and Theory
Throughout this report, the majority of the components described have been taught
through, and are defined by, "Using Concurrency and Parallelism Effectively" I & II
(Kerridge, 2014), which builds upon Hoare's Communicating Sequential Processes
(CSP) theory. Unless explicitly referenced otherwise, these are the main sources of
the information disclosed herein. This section explains the basic elements from
which the prototype product is derived.
2.1 Data and Task Parallelism
One of the driving forces in this project is concurrency and parallelism. Task
parallelism allows the user to run multiple processes simultaneously, on one CPU
or over a network. Sequential code follows a specified order, so programmers tend
not to think about the order of events in a system once it has been coded and
compiled.
In order for tasks to move around the intended system, processes must be fairly
autonomous and removed from the main body of code. This means that concurrent
and parallel code will have to stop and synchronise with each other on transfer;
interact in a timely manner so as not to disrupt running processes; and finish in an
expected order despite being intrinsically non-deterministic in nature, running on
different platforms at different speeds, all while the possibility of migration plays an
active role.
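As a minimal sketch of task parallelism (not code from this project; the class and task names are illustrative), two independent Java tasks can be submitted to a thread pool and allowed to finish in either order, with synchronisation happening only when the results are collected:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskParallelismDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Two independent tasks run simultaneously; their completion order
        // is non-deterministic, so results are gathered via Futures.
        Future<Integer> sumTask = pool.submit(() -> {
            int sum = 0;
            for (int i = 1; i <= 100; i++) sum += i;
            return sum;
        });
        Callable<Integer> square = () -> 9 * 9;
        Future<Integer> squareTask = pool.submit(square);
        // get() blocks until each task has finished, re-imposing an order
        // on otherwise non-deterministic execution.
        System.out.println(sumTask.get() + " " + squareTask.get());
        pool.shutdown();
    }
}
```

Moving such a task to another machine, as this project requires, additionally demands that the task's state be captured and shipped, which plain executors do not provide.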
2.2 Hoare’s Communicating Sequential Processes (CSP)
Hoare's CSP concepts (Hoare, 2004) dictate that everything encapsulated in code
can be broken down into algebraic expressions. In this way, everything within
programming can be reduced to simple, understandable functions, rules and
patterns.
All code can thus be reduced to smaller chunks which can be moved around to suit
the success of the formula: what you see is what you get. The following mechanisms
facilitate this concept and are the basis of the end prototype.
2.2.1 Process
A Process is a piece of code that can be executed in parallel with other processes. A
network of processes form a solution to a single problem, with processes
communicating with each other using Channels (detailed in 2.3). Processes typically
contain repeating sequences of sequential code with communication interspersed.
Any process that is idle consumes no processor resources.
2.2.2 Timer
A Timer is a means of introducing time management into processes. Timers can be
read to find the current time and introduce delays or alarms for future events. They
can also be used in ALTs as guards for reading channels.
2.2.3 Alternatives (ALT)
An Alternative (ALT) allows the selection of one ready guard from several possible
guards. Guards come in three types: input communications, timers, or SKIPs, and
dictate how a process should proceed. A guard is ready if input is available, an
alarm time has passed, or SKIP is a defined guard; SKIPs are always ready and
allow guards to run continuously.
The ALT waits until a guard is ready and then executes the associated code. If
exactly one guard is ready, its code is executed. If more than one is ready, one is
selected according to predefined options and its code is obeyed. These options
include priority selection, when several guards are ready, or fair, turn-based
selection.
2.3 Channels
This is a main mechanic of the system described in this report, as the main aim is to
send processes over a cluster network. A Channel is a one-way, point-to-point,
unbuffered connection between two processes. Channels synchronise the processes
to pass data from one to another and do not use polling or loops to determine their
status, meaning no processing is consumed during transactions.
The first process attempting to communicate goes idle while it waits to
synchronise. The second process attempting to communicate then discovers the
situation, undertakes the data transfer, and both processes continue in parallel, or
concurrently if they are executed on the same processor. It does not matter which
process attempts communication first, as the mechanism is symmetric.
When communication between processors takes place, the underlying system
creates a copy of the data object and transfers it. As such, objects containing
process logic can be transferred, to be executed by a Process Manager and run
asynchronously; this forms the basis of the project.
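JCSP channels are the project's actual mechanism, but their synchronising behaviour can be approximated in plain Java with a SynchronousQueue, which likewise has no buffer and forces writer and reader to rendezvous. This is a rough analogy only, not a JCSP channel:

```java
import java.util.concurrent.SynchronousQueue;

public class ChannelSketch {
    public static void main(String[] args) throws Exception {
        // A SynchronousQueue has no buffer: put() blocks until a matching
        // take() arrives, mimicking a CSP channel's synchronising handover.
        SynchronousQueue<String> channel = new SynchronousQueue<>();
        Thread writer = new Thread(() -> {
            try {
                channel.put("hello");   // idles here until the reader is ready
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        String received = channel.take(); // completes the rendezvous
        writer.join();
        System.out.println(received);     // hello
    }
}
```

Either side may arrive first; whichever does simply waits, which matches the symmetric behaviour described above.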
2.4 Groovy
The Groovy scripting language allows the programmer to write concurrent systems
with a high level of abstraction and is underpinned by the four basic principles
detailed above.
2.5 Communicating Sequential Processes for Java (JCSP)
JCSP is based on Hoare's algebraic foundations, allowing virtual connections to be
created via NetChannelLocation structures sent between nodes. Using Java gives
the programmer the ability to send objects via serialisation: breaking an object
down into a sequence of bytes to be transferred (Chalmers, Kerridge, & Romdhani,
A critique of JCSP Networking, 2008).
With this framework, objects containing code definitions can be sent along with a
control signal that recreates the object at the receiving end. Communicating
Sequential Processes for Java is the cornerstone of this project and allows us to
build upon Hoare's concepts to create an easily understood communication
network.
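The serialisation step can be sketched with standard Java object streams; the `Counter` class here is a hypothetical stand-in for an object carrying process state, not the project's actual process definition:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerialisationDemo {
    // A hypothetical self-contained work item; only its fields (its state)
    // are serialised, never the executing thread.
    static class Counter implements Serializable {
        int count;
        void step() { count++; }
    }

    public static void main(String[] args) throws Exception {
        Counter original = new Counter();
        original.step();
        original.step();                            // count == 2

        // Serialise to bytes, as would happen before a network send...
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        // ...and recreate the object at the "receiving end".
        Counter copy;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (Counter) in.readObject();
        }
        copy.step();                                // resumes from saved state
        System.out.println(copy.count);             // 3
    }
}
```

The receiving end gets a copy with the saved state intact, which is exactly the property the project relies on when pausing and moving a process.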
2.6 Channel Mobility in JCSP
Channel Mobility refers to the dynamic capabilities found when creating
self-propagating NetChannels and other communication models in this project.
Channels afford a robust connection between the input and output ends whilst
providing models that support the ubiquitous nature of the intended system
(Chalmers, Investigating Communicating Sequential Processes For Java To Support
Ubiquitous Computing, 2008).
As the project does not endeavour to change these underlying mechanisms, they
are presented at a high level only. It can, however, be stated that channel mobility is
paramount to attaining, transferring and moving processes successfully.
3 Methodology
The main aim of the system is to create a way to send processes from one node to
another within the same computing cluster, initiated by rising CPU usage at each
terminal. As such, there were three main problem areas which needed to be
addressed:
1. How to obtain CPU usage at any given time from within a JVM runtime.
2. How to create and deliver processes around a dynamic network.
3. How to stop a given process when CPU usage reaches a predetermined
threshold and send it to an underused node in the network.
3.1 Monitoring CPU Usage
At the time of writing, there were no pure Java APIs available for gathering CPU
information. The investigation therefore continued as a fact-finding exercise into
gathering as much system data as possible from within Java.
3.1.1 MBeans
MBeans are managed Java objects, similar to JavaBeans, which can represent a
device, an application or any resource that needs to be managed.
Figure 2. Java Beans Structure
This means we can monitor any of the resources being used by an instance of the
JVM. However, as an MBean can be any type of object and can expose attributes of
any type, each client has to implement class definitions every time an MBean is
called, which can itself lead to high overheads when queried repeatedly.
3.1.2 OperatingSystemMXBean
MXBeans are native to Java (1.6 upwards) and allow the user to utilize an MBean
with a reduced set of types, meaning there is no requirement for model-specific
classes. This makes the MBean accessible to any local or remote client, essentially
conforming to an interface.
OperatingSystemMXBean allows the user access to an interface developed for
retrieving system properties about the operating system on which the JVM is running.
This includes the free memory of the computer, the memory allocated to the JVM
and the CPU time dedicated to a task. MXBeans were the only native mechanism
provided by Java Management Extensions (JMX) which could facilitate the
objectives mentioned.
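As a minimal sketch of this mechanism, the platform bean can be queried through ManagementFactory. The class name below is illustrative; note that getSystemLoadAverage() is the portable reading and may return -1.0 on platforms (such as Windows) where it is unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Minimal sketch: querying the platform OperatingSystemMXBean for
// system properties visible to the JVM.
public class OsBeanProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        System.out.println("OS:         " + os.getName() + " " + os.getArch());
        System.out.println("Processors: " + os.getAvailableProcessors());
        // -1.0 where the platform does not supply a load average
        System.out.println("Load avg:   " + os.getSystemLoadAverage());
    }
}
```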
3.1.3 Java Native Interface (JNI)
The Java Native Interface is a native programming interface that is part of the Java
Software Development Kit. JNI allows Java code to use code fragments and libraries
written in other languages such as C and C++.
While Java breaks code down into objects to be interpreted, C allows for procedural
code which is compiled and broken down into functions. The JNI connects Java
class methods with C functions, fundamentally allowing the programmer to call
C functions at any given time.
Figure 3. Basic JNI interface process
This allows the user access to lower levels of programming, where values such as
CPU usage can be read close to the hardware. Although this approach seems the
most enticing, it can destabilise a JVM instance through subtle C errors. Writing
small scripts may not pose a huge problem, but garbage collection is not handled by
the JVM in these instances, so a basic understanding of memory allocation is also
required. Additionally, using the JNI results in a system which is not wholly portable,
as the code written in C is platform specific.
3.2 Process Creation and Distribution
As one of the main prerequisites of this system, a network architecture had to be
designed to facilitate communication.2 This section will focus on how data and work
are generated to test the proof of concept system.
It should be noted that although the aim of the project is a proof of concept, the ideal
system would spawn multiple instances of work which would accumulate to a large
amount of CPU usage in order to adequately balance the system.
3.2.1 Value Generator
Here, different volumes of data are generated by a data generator and sent to a
Node to be processed. The perceived complexity of the data should be proportional
to the increase in CPU usage created.
Figure 4. Visual Representation of Value Generator
2. Development and iterative design are documented in Section 5.
This would require a fixed process at each node's initialisation to manipulate the
randomly generated data sets being produced by the generators. All interactions are
handled over channels, as shown above in figure 4.
3.2.2 Random Process selection
In this instance, each node would have access to pre-set process definitions which
would generate varying loads. At run time, a timer would be initiated requesting a
random process to run, one of which would create a large spike in CPU usage. This
would allow an overloaded state to be reproduced with a high degree of certainty
while demonstrating the system. The structure would be similar to the above but
would not require the DataGenerator, as the initial input would remain the same.
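The random-selection scheme can be sketched as follows. The workload bodies, the sampling odds and all names here are illustrative stand-ins, not the project's actual process definitions:

```java
import java.util.Random;

// Sketch: a pool of pre-set workloads is sampled on a timer, one of
// which is much heavier than the others, reproducing an overloaded
// state with reasonable certainty during a demonstration.
public class RandomLoadPicker {
    static long lightWork()  { return 339398L * 33323L; }   // cheap operation
    static double heavyWork() {                             // CPU spike
        double acc = 0;
        for (int i = 0; i < 2_000_000; i++) acc += Math.sqrt(i);
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        Random rng = new Random(42);
        for (int tick = 0; tick < 5; tick++) {
            boolean heavy = rng.nextInt(4) == 0;            // 1-in-4 chance of the spike
            long t0 = System.nanoTime();
            if (heavy) heavyWork(); else lightWork();
            System.out.printf("tick %d: %s took %d us%n", tick,
                heavy ? "heavy" : "light", (System.nanoTime() - t0) / 1_000);
            Thread.sleep(50);                               // timer between selections
        }
    }
}
```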
3.2.3 Server Hosted Process Definitions
This method would require processes to be hosted remotely at a specific IP location
and requested by the client when needed. The client would access a server which
holds the network locations of all the relevant process servers. The request would
then be forwarded, with the client's location, to the process location, and the process
sent via a TCP/IP channel back to the client.
Figure 5. Server-Client Pattern Diagram
This would work by using objects containing serializable process definitions sent over
channels.
3.3 Process Movement and Associated Methods
As copying large amounts of data around a system would prove inefficient,
notwithstanding the large overheads in processing and memory allocation, the
system has to handle data manipulation locally, within one processor.
This means that processes have to be sent between nodes in their entirety to
complete the full process, sharing as little data during computation as possible. The
aim is to send only initial parameters and results.
The methods implemented for this aspect of the system rely heavily on the JCSP
API, which underpins the Groovy implementation used. Hence, the definitions and
descriptions of JCSP methods below are based on, and paraphrased from, the API
specifications hosted by the University of Kent at Canterbury (htt1). Implementing
process movement is covered in more detail in chapter 6, documenting limitations
and boundaries.
3.3.1 JCSP Process Manager
The ProcessManager class enables a CSProcess to be spawned concurrently with
the process doing the spawning. This means we can have multiple processes
running, and allows the nodes in the system to deal with multiple processes being
sent on the same channel.
Dealing with processes as they arrive allows the system to conform to a client-server
pattern, making the chance of deadlock in this area of the system very slim.
3.3.2 Process Definition Serialisation in Objects
In order to take advantage of the Process Manager's capabilities, process definitions
need to be designed as CSProcesses. To do so, a process is defined in its entirety
and encapsulated in an object.
In doing so, we ensure the object's class implements two interfaces: CSProcess and
Serializable.
3.3.2.1 CSProcess
According to the JCSP documentation, “a CSP process is a component that
encapsulates data structures and algorithms for manipulating that data” (htt). This
means the data involved is private and cannot be accessed outside the object itself.
Essentially, each instance of the process is alive, executing its own algorithms on its
own data, and its actions are defined by a single run method. To avoid race hazards,
the processes in this system do not require outside data or interaction with other
running threads. Only primitive data types will be sent to activate switches or request
new data. No procedures outside of defined data manipulation take place within the
Process Manager.
3.3.2.2 Serializable
A Serializable class implements the java.io.Serializable interface, which allows the
class and its subtypes to be serialized for communication transfer. The interface
itself does not have any methods but serves only to identify the semantics of being
serializable.
It should be noted here that CS classes which do not implement this interface, such
as CSTimer, do not conform to serializable semantics; this will be covered later in
this document.
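A minimal sketch of the combination, with Runnable standing in for JCSP's CSProcess (the class and helper names are illustrative): the definition's state travels with the serialized object, so a deserialized copy resumes from where the original left off.

```java
import java.io.*;

// Sketch: a process definition encapsulated in a serializable object.
// The round trip shows that the object's state, not a reference,
// survives serialization.
public class SerializableProcess implements Runnable, Serializable {
    private int counter = 0;
    public void run() { counter += 10; }       // the self-contained "algorithm"
    public int getCounter() { return counter; }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
        return bos.toByteArray();
    }
    static Object fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableProcess p = new SerializableProcess();
        p.run();                                   // state becomes 10
        SerializableProcess copy = (SerializableProcess) fromBytes(toBytes(p));
        copy.run();                                // the copy continues to 20
        System.out.println("original=" + p.getCounter() + " copy=" + copy.getCounter());
    }
}
```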
3.3.3 Agents
The Agent interface implements both CSProcess and Serializable but also adds
connect and disconnect methods. These are used to connect input and output
channels from the internal mechanisms of the sent process definition to an outside
host.
Figure 6. Visual Representation of Agent running in Process Manager
The agent has two channels by which it connects to the host during runtime. This
means the data inside the agent's CSProcess can be influenced from outside the
Process Manager. By exploiting the Agent interface, we can enable communication
from outside threads during run time, giving agents access to two different code
structures.
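The connect/disconnect idea can be sketched with BlockingQueues standing in for JCSP channels; the class and method names here are illustrative, not the project's actual Agent interface:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the Agent idea: the host attaches its channel ends to the
// agent's internal logic at runtime via connect(), so data inside the
// running agent can be influenced from outside.
public class EchoAgent implements Runnable {
    private BlockingQueue<Integer> in, out;

    public void connect(BlockingQueue<Integer> in, BlockingQueue<Integer> out) {
        this.in = in; this.out = out;
    }
    public void disconnect() { this.in = null; this.out = null; }

    public void run() {                            // doubles one value from the host
        try { out.put(in.take() * 2); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> toAgent = new ArrayBlockingQueue<>(1);
        BlockingQueue<Integer> fromAgent = new ArrayBlockingQueue<>(1);
        EchoAgent agent = new EchoAgent();
        agent.connect(toAgent, fromAgent);         // host attaches its channel ends
        Thread t = new Thread(agent); t.start();   // stand-in for ProcessManager
        toAgent.put(21);
        System.out.println("host received: " + fromAgent.take());
        t.join();
        agent.disconnect();
    }
}
```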
4 Initial Experiments
In order to evaluate which methods would lead to a successful system, the
aforementioned methodologies were investigated and implemented in different
circumstances, testing for compatibility with the project.
4.1 Monitoring CPU usage
Monitoring CPU usage would take place in two stages: designing code which
generates high usage, and code which can report CPU usage as a percentage.
Results would be compared against the Task Manager and Resource Monitor
native to Windows 10.
Screenshot 1. Windows 10 Task Manager and Resource Manager
4.1.1 Creating Work
Creating work consisted of two different functions which would change intermittently
to test increases in CPU usage. Small work creates an int value, comprising a basic
multiplication operation followed by a timer to create time between operations.
CSTimers, as part of JCSP, work as a guard for the code, acting as an ALT, meaning
no processing is wasted during execution.
For larger CPU usage, a more complicated problem has been run to generate more
work, creating a double variable, as seen below:
double j = Math.pow(Math.pow(60339.0 * 339398 / 2 * 33323, 2348958), 3.0e10)
    * Math.pow(Math.pow(454339.0 * 339765645398.0 / 26 * 354563323, 2.34845645958e11), 3.0000045645e15);
4.1.2 Monitoring Work
A basic system was implemented to create expected, repeatable workloads on the
CPU that could be measured to inspect whether monitoring usage was successful.
The system of operations is shown as a process diagram in figure 7 below.
Figure 7. Test Process Diagram
The process is simple: a timer is set for a predetermined time during which a
process of high CPU usage runs. CPU usage at this point is verified against the
Task Manager seen in screenshot 1.
4.1.3 Accessing CPU Usage
Measuring CPU usage from within Java is difficult. Firstly, for this project to succeed,
we need to distinguish the actual work being done on a processor from the memory
usage of the JVM. The latter is easily accomplished with native Java commands,
but as any Java program is essentially seen by the system as one ‘process’, it
cannot access the tools needed to gain CPU usage insight in the manner of the
Task Manager (screenshot 1).
4.1.3.1 Native Monitoring
There are ways to obtain CPU usage which do not offer real-time performance
monitoring but can be based on timed events. For multi-threaded tasks,
ThreadMXBean methods can give the CPU time and user time for any running
thread. However, using OperatingSystemMXBean (explained in section 3.1.2)
only returns the CPU usage for all JVMs running (i.e. it cannot distinguish between
processes with different PIDs). In screenshot 2, we can see the relationship between
two JVMs working concurrently.
Screenshot 2. Console Log: Base Reading of CPU usage on Client 1
Client 1 (right) is using independent code to monitor itself whilst Client 2 (left) is
waiting for work. operatingSystemMXBeans returns the use of the CPU with 1 being
100% usage and 0 being 0%. At the moment, on monitoring, the system sits at 12%
usage.
Screenshot 3. Console Log: Client 2 affecting Client 1 CPU readings
However, as new processes are started in Client 2, Client 1 reports high CPU
consumption proportional to the work of Client 2, despite having no work itself.
OperatingSystemMXBean readings are further influenced by any other Java
application running. Hence, a way to distinguish between running JVMs had to be
identified.
It should be mentioned that as of Java 9, there is a new process API that allows the
user to get the current process ID. However, at the time of writing this was still in
beta testing, and Java 8 was used due to its comparative stability.
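For illustration, both mechanisms can be sketched together: ThreadMXBean's per-thread CPU time, and the Java 9+ ProcessHandle API for the current PID (which was not available in the Java 8 environment the project targeted). The class name is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hedged sketch: per-thread CPU time via ThreadMXBean, plus the
// Java 9+ process API for the current JVM's PID.
public class ThreadCpuProbe {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        double acc = 0;                      // burn a little CPU to have something to measure
        for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
        long cpuNanos = threads.isCurrentThreadCpuTimeSupported()
                ? threads.getCurrentThreadCpuTime() : -1;   // -1 where unsupported
        long pid = ProcessHandle.current().pid();           // Java 9+ process API
        System.out.println("thread CPU ns: " + cpuNanos + ", pid: " + pid + ", acc: " + acc);
    }
}
```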
4.1.3.2 JNI interface
C affords the low-level access to physical components needed to identify a JVM's
Process Identifier (PID). PIDs are numbers which uniquely identify a process while it
runs and are used in Linux, Unix, Mac OS X and Windows.
The problem, however, is that system calls are defined differently on each OS.
Language libraries need to be recompiled for the specific target operating system in
order to utilize the particular underlying components of the operating system (kernel).
As this research was beginning to deviate from the original project scope by delving
further into low-level code, an API was imported to give multi-platform compatibility.
4.1.4 Sigar API
Sigar is a multiplatform API for Java and other languages. It allows the user to
monitor per-process memory, CPU, credential info, state, arguments and other
relevant information (MacEachern, n.d.). By incorporating Sigar, the program can
produce percentages based on the amount of CPU usage attributed to the PID of a
JVM.
4.1.5 Transferring Objects
By connecting two nodes via a TCP/IP connection, we can send an object very easily.
By implementing the Serializable interface, an empty object is sent to another node at
a defined IP. This was to ensure objects were being sent, and not references.
If a read was successful, a statement reading “Success!” would be printed to the
Eclipse console.
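A self-contained sketch of this test, assuming a loopback connection between two threads rather than two machines (the class names and payload are illustrative; port 0 lets the OS pick a free port):

```java
import java.io.*;
import java.net.*;

// Sketch of the object-transfer test: a serializable object is written
// over a local TCP connection and read back, printing "Success!" on a
// clean read. Sending bytes over a socket guarantees the object itself
// travels, not a reference.
public class ObjectTransferTest {
    public static class Payload implements Serializable { int value = 7; }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket("localhost", server.getLocalPort());
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    out.writeObject(new Payload());        // write the object
                } catch (IOException e) { throw new RuntimeException(e); }
            });
            sender.start();
            try (Socket client = server.accept();
                 ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
                Payload received = (Payload) in.readObject();
                if (received.value == 7) System.out.println("Success!");
            }
            sender.join();
        }
    }
}
```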
4.1.6 Running Process Definitions
As process definitions can be contained within objects, a simple system can be
created using two nodes and instances of a Process Manager.
Process definitions are sent using a timer, testing one process running and then two
concurrently, and the Task Manager is consulted to ensure processes are being run
correctly.
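The same experiment can be sketched with an ExecutorService as a stand-in for JCSP's ProcessManager (the real system runs ProcessManager instances on channel-connected nodes; the class name and workload here are illustrative):

```java
import java.util.concurrent.*;

// Sketch: spawning two process definitions concurrently, the way a
// ProcessManager spawns a CSProcess alongside its host.
public class MiniProcessManager {
    public static void main(String[] args) throws Exception {
        ExecutorService manager = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(2);
        Runnable def = () -> {                 // a trivial process definition
            double acc = 0;
            for (int i = 0; i < 100_000; i++) acc += Math.sqrt(i);
            done.countDown();
        };
        manager.submit(def);                   // first process running...
        manager.submit(def);                   // ...then two, concurrently
        boolean finished = done.await(10, TimeUnit.SECONDS);
        manager.shutdown();
        System.out.println("both finished: " + finished);
    }
}
```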
5 Architectural Design
Throughout the project, many different systems were designed to monitor processes
and to set up a communication architecture which could facilitate this. The various
designs are presented and critically evaluated below.
5.1 Central Repository
This design attempts to meet the aim of process movement. Each node has a
process node which creates and runs a process on the attached Process Manager.
The results are then sent to a host node, which keeps track of them.
Figure 7. MK I: Host Node System Diagram
Each node would monitor the CPU usage of the JVM. Once a certain level is met, the
process would then be packed and sent to another node.
5.1.1 Central Repository - Issues
The problem with this system is that all channels must be created on initialisation,
leaving no room for scalability. It also essentially works on a ring topology and is
more suited to a single system. This network is easily set up in a single JVM as well,
meaning only references are passed rather than the actual objects.
Although good for initial tests (scaled back to two nodes and a Host Node), the main
drawback of this design is the ring element itself. The design was expanded to work
with Agents below, where the problems of ring networks are explored in more detail.
5.2 Ring System with Travelling Agents
The Agent System opens up the network, allowing communication across different
JVMs. Processes are no longer spawned within the node, but sent by a manager as
Process Definitions.
The Manager then runs the process whilst a monitor reviews CPU usage. When
needed, an Agent is created with the relevant process.
Figure 8. Node Ring Network Diagram
5.2.1 Ring and Agents - Issues
It was during this iteration that the underlying principles of threads were explored in
more detail and found to be non-serializable, meaning a running process could not
be sent with the Agent in its current state. This meant the design fundamentally
could not work: the agent could carry the process definition, but only in its unedited
state, not mid-processing.
Moreover, when dealing with task parallelism, ring systems are inherently prone to
deadlock. As processes are created at nodes, the communication between ring
elements proved to be non-deterministic due to the uncertainty as to which
processes were being spawned where, when they exceeded the pre-set CPU
usage, and when they needed to be moved.
If too many events were triggered, all of the processes involved in the ring would
attempt to output at the same time, resulting in deadlock. In a non-uniform network,
where computer architectures differ (providing varying computational power), this
problem would become more prevalent.
To alleviate this, nodes could probe the ring first with empty packets and wait for
them to return, but then half the network activity on the ring would be empty data
packets; a detriment to efficiency.
5.3 Work & Node Manager System
The Work and Node Manager design took the ring element out and introduced
client-server properties.
The problem with this design is that the servers are very closely related and can
result in a closed system. The final prototype changed this.
Figure 9. Work and Node Manager Network Diagram
5.4 Network Structure Analysis
In order to minimise incidents of deadlock, the Client-Server pattern seemed the
most logical to implement. A server-oriented network permitted:
• Decreased chance of deadlock
• Process Discovery
o Nodes receive a complete set of required processes
o Allowing dynamic amendment of process definitions
• Process Control
o User not restricted to only one choice
o Timing of process delivery
• Centralised repository for client lists and results
• Scalability
o Users added by location (IP) rather than assigned place
6 Introducing Process Movement
Process movement was easily implemented when it occurred on the same physical
machine, as in the first two prototypes. However, the complication increases when
functionality is extended to a network.
Process Definitions are easily sent in a static state, but getting the state of a process
in execution requires finding all the relevant data saved in the JVM.
6.1 Java Memory Model
In order for Java to be architecture neutral, it is built to operate and exist solely within
memory (RAM). Hence, to mimic a computer's infrastructure, the JVM inherently
includes its own memory model.
The Java memory model divides memory between thread stacks and the heap. It can
be seen logically in figure 10.
Figure 10. Logical view of Java Memory Relations (Jenkov, n.d.)
Each thread running in the JVM has its own stack, which contains information about
which methods have been called, the point of execution, and the local variables of
those methods. Local variables consist of primitive types and are stored fully within
a thread stack. Hence, they cannot be seen by any other component of the JVM
during execution.
The heap contains all objects created in the Java application. The main point of
contention for moving processes in the JVM is the fact that all manipulation occurs
within a thread stack. If the object containing the process definition being worked on
is moved (even if the thread is suspended during processing), it will be moved in its
original, unedited state.
6.2 Moving processes within a JVM
As all classes exist within a single JVM during runtime, initial tests for moving
processes were misleading. Simply suspending a thread and calling that thread
from another class leads to a seemingly successful process manoeuvre.
This is achieved by suspending the process manager (essentially a concurrent
thread) and sending it through a channel. In this case, as the channel connects two
host processes within a single JVM, only the thread reference is communicated,
meaning the thread has technically remained in the same place and is merely being
restarted by another process.
6.3 Thread Serialization impossible with current JVM
Each method run in a Java program has a stack frame associated with it. The stack
frame holds the state of a method with three sets of data: the method's local
variables, the method's execution environment and the method's operand stack.
It would stand to reason that, by copying these values at suspension, copying a
thread could be achieved. However, the thread object would be allocated with none
of its native implementation. The JVM emulates a machine for each instance of a
Java program, and a thread run on one of these machines becomes intricately tied
into the internal mechanisms of that machine. The context of operations is simply
lost.
Reading the locations of the threads on the physical machine would prove difficult as
well. Not only would this require a separate language to access the data, but memory
allocation would have to be monitored from inside the JVM as well as outside.
Hardware memory does not distinguish between the heap and thread stacks; hence
parts of a thread stack can be present in CPU caches as well as in CPU registers.
Figure 11. Java Memory model interaction with CPU Memory Model (Jenkov, n.d.)
Also, Java relies on C procedures for some of its native methods. If the stack were to
be copied, it might contain native Java methods that have, in turn, called C
procedures. This implies a complicated mixture of Java constructs and C pointers
would have to be recorded.
At this point, not only does this increase the amount of data to be transferred over a
network at once, but it goes against the ethos of this investigation: to find a solution
with high abstraction. This is also why reconstructing bytecode (the instructions used
by the JVM, resembling assembler instructions) and monitoring the JVM instruction
set have not undergone further investigation.3
3. Using the Java class file disassembler proved to be a cumbersome way to determine the
sequence of events and was essentially the lowest-level format available in Java.
6.4 Adapting Process definitions as Agents
In order to move processes, we have to look at the object being edited itself. As the
supertype class, Process Manager, is not serializable, the subtype object must
assume responsibility for saving and restoring its state.
As the process definitions already contain a run function, the system must be
amended to stop the internal code from executing and to retrieve the edited values.
This means each process object must be created as a new instance, so as to keep
track of its own local variables, and must have a method of communicating with the
host process whilst running concurrently.
Adapting the processes to conform to an Agent interface introduces two new
methods which allow this: connect and disconnect (the agent is seen in figure 6). The
host is fitted with two new channels, generated at run time, which allow the agent to
connect when received. The basic order of events can be seen in figure 12.
Figure 12. Order of Events for connecting to Agent
6.5 Sending process definitions in current state
In Java, an object can refer to itself simply by calling “this”, meaning that once the
internal code has been paused and the variables saved, the object itself can be
packaged and written to a channel as a serializable object, to be run by a new
process manager.
Figure 13. Method and Contents of Process (this)
This way, as long as the process definition contains all the run code required, the
state of the process is reflected in the object's state. This meets the requirements for
process movement and is a main part of the prototype's design.
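The technique can be sketched in plain Java, with serialization to a byte array standing in for the channel write; the class and field names are illustrative:

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the key idea: once the internal loop is paused, the object
// packages itself ("this") as a serializable value, and a deserialized
// copy resumes from the saved state.
public class MovableCounter implements Runnable, Serializable {
    private int progress = 0;                        // saved state travels with "this"
    private transient AtomicBoolean stop = new AtomicBoolean(false);

    public void requestStop() { stop.set(true); }
    public int getProgress() { return progress; }

    public void run() {
        if (stop == null) stop = new AtomicBoolean(false);  // fresh flag after transfer
        while (progress < 10 && !stop.get()) progress++;
    }

    public static void main(String[] args) throws Exception {
        MovableCounter original = new MovableCounter();
        original.progress = 4;                        // pretend it was paused mid-work
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(original);                    // "write this to a channel"
        oos.flush();
        MovableCounter moved = (MovableCounter) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        moved.run();                                  // resumes from 4, runs to 10
        System.out.println("resumed and finished at: " + moved.getProgress());
    }
}
```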
7 Prototype
7.1 Design
The final implementation extends the Server-Client design by adding Process Nodes
to the universal client, so multiple instances of a sent Process can be run
concurrently whilst connecting to their respective hosts.
It is based on the six paradigms for code mobility (Chalmers, Kerridge, & Romdhani,
2007):
• Client-server
o Client executes code on the server.
• Remote evaluation
o Remote node downloads code then executes it.
• Code on demand
o Clients download code as required.
• Process migration
o Processes move from one node to another.
• Mobile agents
o Programs move based on their own logic.
• Active networks
o Packets reprogram the network infrastructure.
In the case of this design, agents are used as a means of internal communication as
well as movement. The final design is seen in figure 14.
Figure 14. Final Prototype, Server-Client Network
The Universal node comprises a Node Monitor, which periodically checks the
CPU usage of the JVM it is running in. To do so, a concurrent thread is spawned at
run time with the sole purpose of returning the current CPU usage. Using Sigar, the
CPU usage is checked every 10 milliseconds and, if it is above a certain threshold, a
new node request is sent to the Access Server.
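The monitoring loop can be sketched as follows; the DoubleSupplier stands in for the Sigar per-PID reading, and the threshold, sample values and class name are all illustrative:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.DoubleSupplier;

// Sketch of the Node Monitor loop: a concurrent thread samples a CPU
// reading every 10 ms and raises a request once a threshold is crossed.
public class NodeMonitorSketch {
    public static void main(String[] args) throws InterruptedException {
        double threshold = 0.60;                       // 60% CPU
        double[] samples = {0.12, 0.25, 0.40, 0.62};   // simulated readings
        int[] index = {0};
        DoubleSupplier cpu = () -> samples[Math.min(index[0]++, samples.length - 1)];
        AtomicBoolean requested = new AtomicBoolean(false);

        Thread monitor = new Thread(() -> {
            while (!requested.get()) {
                if (cpu.getAsDouble() > threshold) {
                    requested.set(true);               // "new node" request to the server
                    System.out.println("threshold crossed: requesting new node");
                }
                try { Thread.sleep(10); } catch (InterruptedException e) { return; }
            }
        });
        monitor.start();
        monitor.join();
    }
}
```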
The Node Monitor has four Process Nodes, each connected by two one2one
Channels. Each Process Node runs a Process Manager for incoming processes to
connect with. At any given point, if either the Client or Server is waiting, it idles,
consuming no processing power.
Process movement is handled mostly by the nodes, to avoid over-reliance on the
servers involved. If a process has to be stopped, it is sent directly from the Process
Manager running it, straight to a new Client, rather than via the Access Server. This
allows the system to move processes in the most direct manner conceived.
The system conforms to a Client-Server pattern between the Universal Node and the
Access Server. They are connected at initialisation by an any2net (toAccess) and a
numberedNet2One (processRecieve) Channel. This is also true for the relationship
between the Access Server and the Process Servers; however, there is only one
connection for interaction, as the Process Servers have nothing to return.
7.2 Components
Detailed below are all the components which connect the system together, as well
as their roles in the whole process.
7.2.1 Nodes
In the context of this system, Nodes are autonomous, concurrently running
processes. They control connectivity to the process locations, deal with work and
monitor CPU usage.
7.2.2 Node Monitor
The Node Monitor initialises the user system and creates a connection to the Access
Server, adding its IP and port location on connection and removing that location
when disconnecting. Currently, the server address is hard coded, but any server with
the same infrastructure could be added and defined by the user.
It self-monitors its respective instance of a JVM for CPU usage and keeps track of
which process nodes are in use.
The Node Monitor requests processes to be run and delegates the work to the
available Process Nodes asynchronously.
It can also stop Process Nodes from continuing work when CPU load is too high. It
then selects the last Node activated, requests another Universal Client location from
the Access Server and sends the location to the Process Node.
7.2.3 Process Nodes
Process Nodes receive process definitions and put them to work using a Process
Manager. Each Process Node provides channel ends to which the Process
Definitions can connect, facilitating interaction between the received process
definition and the host.
This connection allows the Process Node to tell Processes to stop and move when a
new channel location is received, as well as to alert the Node Monitor when a
process has finished.
7.2.3.1 Process Manager
The Process Manager (detailed in section 3.3.1) runs the processes received
concurrently.
7.2.4 Channels
Channels comprise of two channel ends:
• A channel input where data is read into the system component
• A channel output where data is written out of the system component
Channels in this system are one-to-one connections. The only exception is the stop
line from Process Node to Process Manager. This is an any2one connection, where
the input can come from any node but the output is a specific channel end.
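The any2one concept can be sketched with a shared queue standing in for the JCSP Any2One channel: several writers share one writing end while a single reader owns the reading end (the class name and messages are illustrative):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the any2one idea: three writers may all signal on the
// shared "stop line", but only one consumer drains it.
public class Any2OneSketch {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> stopLine = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 3; i++) {                 // any of three writers may signal
            final int id = i;
            new Thread(() -> stopLine.offer("stop from node " + id)).start();
        }
        for (int i = 0; i < 3; i++)                    // the single reader owns the input end
            System.out.println(stopLine.take());
    }
}
```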
Figure 15. Any2One channel concept
7.2.5 Net Channel
Net Channels work in the same way as regular Channels but the output is directed to
a designated port at a new IP address.
7.2.5.1 Automatic Net Channels
Generated during runtime, Automatic Net Channels create a Channel Input on the
fly and use input IP addresses as their location.
7.2.6 Servers
The Servers keep track of the Clients available and allow the Client hosts to
initialise, waiting for processes to run.
7.2.6.1 Access Server
The Access Server has the IP locations of the Process Servers and connects users
to requested processes. The Process Servers' IP addresses are stored and
connected to whenever an instance of an associated process is requested by the
user.
This server deals with user access requests (capabilities; in this system, an
interface), process requests, find-other-client requests and client dismissals.
7.2.6.2 Access Manager
The Access Manager registers newly initialised Nodes onto the server and keeps
track of active clients. This is the basis for finding new client locations when a client
becomes overloaded.
7.2.6.3 Process Servers
The Process Servers provide Process Definitions at an IP address and port at which
they are accessible to the Access Server. The Access Server must know these
locations at initialisation in order to incorporate them into the Client capabilities.
However, the Process Definitions themselves can be amended and adapted during
runtime, as the location is the only parameter needed between requests.
7.2.7 Process Definitions
Process Definitions are objects with their own self-contained logic and variables
activated by a run method. They conform to the CSProcess and Serializable
interfaces.
7.2.7.1 Agent Definitions
Agents afford the same capabilities as other Process Definitions but introduce
connect and disconnect methods. This allows Processes to travel with channels
defined, connecting on reception. It is up to the host process to establish the channel
connections.
7.2.7.2 Agent Channels
The Agent Channels allow the host process to connect to the internal logic being run
by the Process Manager. The channels are defined in the host process and then
connected on reception of the Agent (before running the agent's process definition),
during host run time, by the connect method.
The input and output of the Agent and the input and output of the host are then
connected together, as seen in figure 16.
Figure 16. Internal Connection Mechanisms of Agent
7.2.8 Request Identification
Request objects allow the Access Server to react in the required manner and
process the data received in the correct way. The simplest is ClientRequestData,
which dictates that the string sent within the object corresponds to the service
needed (i.e. “Process Spawn” requires service B) and carries the address of the
requesting client.
Other requests comprise simple IP addresses which need to be interpreted in
different ways. Address locations were packed into these objects to differentiate
between the contexts in which they were to be treated. These include:
• ClientLocation
o Registers the Client and sends capabilities
• LeaveRequest
o Removes Client details from the Access Server
• NodeRequest
o Requests that another Client with a different IP be found to send processes to
• NewRequest
o Same as NodeRequest, but used exclusively by the Process Manager and
includes the Node's ID
7.2.9 Implementation
The system runs in the following manner.
Migration
• Process Servers are initialised, followed by the Access Server, at set
IPs
• The Universal Client then instantiates itself with a base IP address and a
randomly generated port. It starts four Process Nodes, each with a Process
Manager connected.
Screenshot 4. Client Initialising UI
• If the port matches another, an error message is shown and the user asked to
try again (range 1 – 10,000)
Screenshot 5. Server Not Started or Crashed error message
• The Client connects to the Server, and the Server enrols the client into its list
Screenshot 6. Console log: Node registered on server
• The Server then sends back the Client capabilities
Screenshot 7. Basic user UI
• The Client can then choose different processes to call
o It shows “ready” in the Console; as the system does not need to show
the general public its workings, the Eclipse console is used to
monitor transactions
Screenshot 8. Console Log: Node showing ready
• The service needed and the IP of the Client are then sent to the Access
Server, which relays these values to the required process server.
• The Process Server then sends the process directly to the requesting node
• The Universal Client node then assigns the work to one of its free Process
Nodes and marks that node as unavailable
Screenshot 9. Console log: Node doing work and releasing Process Node 1 when finished
• When the first process is received, the Node Monitor then spawns a new
thread to monitor the CPU usage.
• The process (agent) is then connected to the Process Node and the
process is run.
• Once finished, the Process Node is released to work again
• At any given point, a new node can become active
Stopping and Moving
• Once a node's CPU usage exceeds the threshold, the Manager notifies the
server that it needs a new node.
• Another node is chosen and its address returned to the requesting node
• The manager then selects an active node (the last process manager started)
and sends a message with the new address to the Process Node.
• The Process Node then interprets that type of object and stops the Agent,
whilst simultaneously letting the Node Monitor know it can release that
Process Node
Screenshot 10. Console Log: When Process 4 starts, CPU is high (62%), agent is contacted (I
am reading), the Process is disconnected, sent (LETS GO) and Process Node 4 is released
• The Agent then packs itself and sends itself to the next node where it
continues
• When the node is closed, the server is alerted and removes it from its
active clients
Screenshot 11. Console Log: Server deletes address
To clarify, a server interaction diagram has been created to reflect the order of events
(Figure 17).
Figure 17. Server Interaction Diagram for Prototype
7.3 Experiment Setup
In order to test the validity of the system, the work described in 4.1.2 was completed
20,000 times per run for a total of 20 runs, and each run's duration was timed. The CPU
usage was also recorded using MXBeans (for accuracy) and averaged. The
experiment was conducted on computers with the specifications below.
Hardware
• CPU – i7 4770 @ 3.4GHz
• RAM – 16GB DDR3
• GPU - NVIDIA NVS 510 (2047 MB)
• OS - Windows 7 Professional 64-bit
• Network Speed – 1GB/s
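The measurement loop described above can be sketched roughly as follows. The `work()` body is a placeholder assumption (the real workload is the task from 4.1.2), and the JDK's `com.sun.management.OperatingSystemMXBean` stands in for the exact bean used in the experiments.

```java
import java.lang.management.ManagementFactory;

// Illustrative sketch of the timing harness: each run performs the unit of
// work a set number of times, recording elapsed time and sampling the
// process CPU load via the platform MXBean.
public class Experiment {
    // Stand-in workload; the real experiment ran the task described in 4.1.2
    static double work() {
        double acc = 0;
        for (int i = 1; i <= 1000; i++) acc += Math.sqrt(i);
        return acc;
    }

    public static double timedRunSeconds(int repetitions) {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) work();
        double seconds = (System.nanoTime() - start) / 1e9;
        // Fraction 0.0–1.0, or -1 before the first sample is available
        double cpuLoad = os.getProcessCpuLoad();
        System.out.printf("run: %.3fs, cpu load sample: %.2f%n", seconds, cpuLoad);
        return seconds;
    }
}
```

Averaging many such runs, as in 7.4, smooths out scheduler noise in both the timing and the CPU samples.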
7.4 Results
Each experiment was conducted 12 times and the results averaged, ignoring the
two polar outlying values. Three configurations were tested:
1. A single computer running the processes sequentially with processes hosted
locally.
2. A single computer running the processes concurrently over 4 process nodes
with processes hosted on process servers.
3. Two computers running the load balancing system with process from process
servers.
The results are detailed below (Figure 18).
Workers | Average time taken | CPU Usage
1 CPU: 1 Sequential Worker | 24.36 seconds | 12%
1 CPU: 4 Concurrent Workers | 10.18 seconds | 87%
2 CPU: 8 Concurrent Workers | 8.78 seconds | 46%
Figure 18. Table of Experiment Results
7.5 Comparative Analysis
By visualising the data collected, we can see the correlation between the number of
CPUs, time and work.
Figure 19. Test results Graph; CPU Usage and Time Spent
Speed:
• Increasing workers increases the speed of the work
o This is not proportional to the number added, but a vast improvement
o A directly proportional speed-up was never expected, due to
communication overheads
• Adding an additional CPU caused a minor increase compared to increasing
native resources
o Due to synchronisation and distribution times, limited by connection
protocols (the network speed is very fast)
o A speed-up is still apparent
CPU Usage:
• CPU usage for a single process is very low
o To be expected, as the CPU is doing the least work at any one time
during execution of the test
• CPU usage increases seven times over for 4 workers
o Although higher CPU usage was expected, it was not expected to
grow this much.
• The CPU added for balancing reduces CPU usage to almost half
o Considering the difference between the sequential and concurrent
methods, almost halving the stress is a great result
7.6 Local Concurrency Vs Distributed
The results trend toward better performance in terms of time and processing
consumption. Performance does not, however, grow proportionally when more CPUs
are added. Going into the experiments, it was assumed that there would be a
performance boundary based solely on communication times.
Judging from the sharp change in CPU usage, however, we can conclude that the
system does balance the load whilst increasing processing efficiency. This is
logical: more workers do more things.
With small amounts of work, however, sequential processing will yield better results,
since saving small values and performing little processing locally is cheaper than
moving data around a network. Small amounts of work, though, are not what the
system was designed for.
8 Conclusion
8.1 Has the Project met its Aim and Objectives?
The aim of this project was to create a system which can distribute work and regulate
set work over multiple computers, ensuring CPU usage does not exceed a specified
threshold on each terminal.
As the tests in 7.5 show, the functionality to facilitate regulation does exist in the
current prototype. The main objectives stated in 1.2 are recapped and addressed
below:
1) A method of monitoring CPU usage must be implemented in the JVM over
multiple CPUs.
The Sigar API (and Java's MXBeans to a certain extent) affords this
functionality. By spawning a thread in the Universal Client's Node Monitor,
the monitoring function remains active throughout execution. It is not affected
by other events and allows constant vigilance.
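The monitoring thread idea can be sketched as below. The prototype uses the Sigar API; this sketch substitutes the JDK's `OperatingSystemMXBean` for self-containment, and the threshold and poll interval values are assumptions.

```java
import java.lang.management.ManagementFactory;

// Minimal sketch of the Node Monitor: a daemon thread polls CPU load and
// flags when a threshold is crossed, at which point the Client would ask
// the Access Server for an alternative node.
public class NodeMonitor implements Runnable {
    private final double threshold;          // e.g. 0.6 for 60% CPU
    private volatile boolean stressed = false;

    public NodeMonitor(double threshold) { this.threshold = threshold; }

    public boolean isStressed() { return stressed; }

    @Override
    public void run() {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            double load = os.getProcessCpuLoad(); // may be -1 before first sample
            if (load >= threshold) stressed = true; // signal: request a new node
            try { Thread.sleep(250); } catch (InterruptedException e) { return; }
        }
    }

    public static Thread start(NodeMonitor m) {
        Thread t = new Thread(m, "node-monitor");
        t.setDaemon(true); // monitoring must not block JVM shutdown
        t.start();
        return t;
    }
}
```

Running the monitor on its own daemon thread is what keeps it unaffected by the work being executed on the Process Nodes.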
Although this project did set out to complete everything at a high level of
abstraction, this was one barrier which could not be dealt with otherwise. It can
be argued, though, that most Java native methods run C code through the JNI,
so this still conforms as implementation within the JVM.
2) Processes must have a way to be interrupted and saved in their current state.
With the system sending process definitions, the running position of a
process using a Process Manager is reflected in the state of the object. By
delegating saving responsibility to the subtype in process management, we
can essentially pick up the work from a previous running instance.
As explored in chapter 6, it is impossible to serialize and send threads using
high-level techniques; this method instead yields a large amount of efficiency,
providing variables are saved in a tolerable fashion.
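The state-in-the-object idea can be sketched as follows: progress lives in serializable fields, so a paused instance can be serialized, moved and resumed from its saved position. The class and field names are illustrative, not the prototype's actual process definitions.

```java
import java.io.*;

// Sketch of a state-carrying process definition. Because progress is held
// in ordinary fields, serializing the object captures the running position
// without needing to serialize a thread.
public class CountingProcess implements Serializable {
    private int next;          // saved position: work resumes from here
    private final int limit;
    private long sum;

    public CountingProcess(int limit) { this.limit = limit; }

    // Run a bounded slice of work, then stop (as when the monitor interrupts)
    public void runSlice(int steps) {
        for (int i = 0; i < steps && next < limit; i++) sum += next++;
    }

    public boolean finished() { return next >= limit; }
    public long result() { return sum; }

    // Round-trip through serialization, as if sent to another node
    public static CountingProcess migrate(CountingProcess p)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(p);
        out.flush();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        return (CountingProcess) in.readObject();
    }
}
```

The migrated copy continues exactly where the original stopped, which is the behaviour the Process Manager relies on.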
3) Processes need to have a way to move and reinitialise at different nodes on
different CPUs.
Using the Serializable interface, Channels, Process Managers and objects
containing process definitions, this aspect of the system has been successfully
implemented and rigorously studied.
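The hand-off itself can be sketched with raw TCP streams, as below. The prototype uses dynamically created JCSP Net channels rather than raw sockets, and the port number and method names here are assumptions.

```java
import java.io.*;
import java.net.*;

// Sketch of the dynamic TCP/IP hand-off: the sender serializes the process
// object over a socket, and the receiving node reads it back for its own
// Process Manager to resume.
public class HandOff {
    public static void send(String host, int port, Serializable process)
            throws IOException {
        try (Socket s = new Socket(host, port);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
            out.writeObject(process); // the agent "packs itself" and travels
        }
    }

    public static Object receiveOne(int port)
            throws IOException, ClassNotFoundException {
        try (ServerSocket server = new ServerSocket(port);
             Socket s = server.accept();
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
            return in.readObject(); // handed to the foreign Process Manager
        }
    }
}
```

As in the prototype, the receiving end must be set up first; the sender then connects to the address supplied by the Access Server.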
It can be concluded that the objectives and aims have been accomplished. The
system outlined at inception has been completed as a functional proof of concept,
as long as the user controls the processes introduced. However, during development
and implementation, further aspects were identified which need to be addressed
before this project can be labelled finished.
8.2 Deployment Analysis and Critique
8.2.1 CPU Monitoring Critique
With user supervision, the system can be seen to send, receive, run, stop and move
processes. The CPU monitoring gives adequate coverage and a timely response to
spikes in CPU usage. Ideally, MXBeans should be used if the environment can be
guaranteed to have no other instances running on the JVM, as the results tend to be
more accurate.
Using the JNI and C results in CPU polling roughly once for every thousand
instructions, and gives insight into a process's CPU usage at that instant. The
information available via Sigar (CPU usage time) does not update continuously and,
being instantaneous, can sometimes return 0, making viable readings even more
infrequent. However, the frequency and accuracy are still adequate for this system to
function.
8.2.2 Process Movement Critique
Within the time constraints, the project was built to prove that active process
migration could be achieved, and the mechanics and theory behind the actual
process movement are sound. However, user-end process management requires
more work.
The problem pertains to the number of Process Nodes at each Universal Client. As
each process definition needs a manager to connect to, one process manager does
not suffice for the intended process interaction. So, if more than 4 processes are sent,
the Manager Node has no way to deal with the excess processes read.
At this point the client–server environment breaks down, as the Client is no longer
waiting for input, and a deadlock can occur if a Process Node is in a busy state at the
point of reception. Having redundant nodes on the system which receive overflow
processes, or simply instantiating more Process Nodes at run time, could relieve
this.
Adapting this aspect of the system depends on whether the user intends to regulate
large amounts of work in a cluster, or wants to use the program in the background of
home systems to automate smaller projects. The system's scalability options in
these respects are a valuable resource.
8.3 Further Research and Work
Aside from user testing, small patches and implementing a targeted application (such
as distributed ray tracing), the identified improvements in functionality are listed
below.
8.3.1 Process Interaction
Currently, once processes are distributed, each process sent must be a standalone
procedure. To support interaction, the main server would have to be more involved,
keeping note of which processes have been distributed where. The list of current
clients could be expanded into a list of lists, containing the Node address as well as
the current processes. If we consider one process at each node for simplicity,
cross-process interaction could be implemented by doing the following:
Figure 20. Node interaction diagram
1) A Client would request additional data relevant to the process being run from
the server.
2) The Server, knowing which processes are running in the overall system, would
find a node holding the needed data and halt its procedure.
3) The required node would confirm it is ready to set up a connection with the
other node. The requesting node must initiate the setup, to a node which is
currently paused, because a channel must have its input end set up as a
precondition for communication.
4) The new node address would be sent to the initial client, where the relevant
Net Channels would be automatically created, similar to those used when
moving processes, for transfer and control mechanics.
5) The nodes would then act like a client and server. The server node would
send an initiation signal, causing the client node to run, and transfer would
begin.
With the current infrastructure of the implemented prototype, and some configuration,
this new system could be implemented successfully. The framework of this design is
not hard to implement in theory, but the semantics and order of communication would
have to be thoroughly deliberated upon.
8.3.2 Process Node Quantities
This is simply allowing the user to define how many Process Nodes they would like to
initialise. In order to keep processing limits within a reasonable window, the user's
processing capabilities would have to be assessed, limiting the number of concurrent
processes.
This would also require either the user or developer to have prior knowledge of the
estimated processing power that each individual process can consume; otherwise
the system could spend a lot of time moving processes.
8.3.3 User Defined Processes
Implementing user-defined processes raises two specific points of contention:
1) Methods would have to be adapted to conform to Agent classes
2) Code must be runnable.
This means code would have to be scanned or tested at run time to ensure all
aspects are serializable. This could be done by creating a Test Node comprising a
try/catch system which returns exceptions when they are met.
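The Test Node check described above can be sketched as a single try/catch around a trial serialization. The class and method names are illustrative.

```java
import java.io.*;

// Sketch of the proposed Test Node: attempt to serialize the user-supplied
// process object and report any NotSerializableException before the
// process is admitted to the cluster.
public class TestNode {
    public static String checkSerializable(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate); // throws if any field is not serializable
            return "OK";
        } catch (NotSerializableException e) {
            return "REJECTED: " + e.getMessage(); // names the offending class
        } catch (IOException e) {
            return "ERROR: " + e;
        }
    }
}
```

Serializing into a throwaway byte buffer exercises the whole object graph, so the check catches non-serializable fields buried inside the user's process as well.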
Having runnable code is the main function of the CSProcess class, so methods
would have to be identified at input. This could include an interface which asks for
variables and the associated process separately.
Another method, which involves having some knowledge of the system, would be
implementing wrapper classes which could affix the required connect methods to
Agents, provided the user understands CSProcesses.
8.3.4 Extended Network to Internet
This method is easily implementable, but does not conform to the aims of this report.
By simply changing the node and server IP addresses from local to public IPs,
the system's scale can be opened up to users in any location.
The problem then lies with security: there are currently no security measures in
place during communication. Although the mechanisms of the system are not
commonplace in Java, objects are still a universally used data type.
8.3.5 Automated Process Delivery
As the system stands, the Universal Clients are tasked with acquiring processes.
This was implemented to regulate the speed of requests and allow easier debugging.
Automated delivery could be implemented by keeping track, at the Server, of how
many processes are running at each node.
If the Server records a Client node with free Process Nodes, it can continue to send
more processes to the underloaded area. Polling for CPU usage on completion of
tasks, to indicate whether more processes are needed, would give a well-balanced
system overall, but would also generate higher volumes of traffic.
As previously stated, these options should be aligned to the chosen application of the
system and could be exposed as user controls at initialisation.
8.4 Reflective Statements
During this project there have been multiple setbacks, some avoidable and some
unforeseeable. As with most large projects, developers will never be truly happy
with what they have accomplished. Despite meeting the initial aims of this
investigation and being relatively pleased with the finished product, there are still
areas which could have been addressed sooner, and shortcomings which will not be
repeated in the future.
1) Progress trail
The first objective during this project was establishing a method of progress
monitoring. In turn, a blog was created to document progress. However, the first
incarnation was hacked after two weeks.
This was a major setback in the project and resulted in a decline in adequate tracking.
In the future, security measures for a public web space will be adhered to.
More importantly, a structured, documented development diary will be a higher
priority in the future. Keeping track of developments and meetings would have led to
a much more streamlined approach and a better implementation overall. This also
pertains to the week 7 report, which took place in the form of a viva voce in the Napier
Games Lab in the first week of December.
2) Inadequate background understanding
Going into this project, I believed I had sufficient understanding of the
fundamental concepts and technologies involved to create this system. Searching for
previous attempts at the problem proved fruitless (see appendix A), indicating there
was not a lot of reading on the subject. The IPO, although it has the same conceptual
ethos, talks about accessing system hardware from a high-level language and was
naïve in some of its goals given the time permitted and the level of work
expected.
However, we never know the depth of our own ignorance; this proved true when,
halfway through the project, I realised that a thread, the main method of running
work, was not serializable.
With future development, I will ensure that I read not only papers on implementation,
but technical documentation on processes and data types to ensure I grasp the
conceptual limitations as well as technical limitations.
In summation, I have learned that preparation and the process are just as important
as the actual development.
3) Time Management
For some of the project, personal circumstances dictated a lack of work, but time
management could have been much better from the start. Wednesdays were
established as work days, but this was not particularly adhered to at the start of the
project. A Gantt chart was drafted but, after personal circumstances interfered, it was
not reviewed until after half the allotted time had transpired.
More tests into the efficiency of the system can still be run and should be considered
part of further work.
9 References
1. Austin, P., & Welch, P. (2008). CSP for JavaTM (JCSP) 1.1-rc4 API
Specification. Retrieved from CSP for Java:
https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-doc/
2. Austin, P., & Welch, P. (2008). Interface CSProcess. Retrieved from CSP for
Java: https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-
doc/org/jcsp/lang/CSProcess.html
3. Chalmers, K. (2008). Investigating Communicating Sequential Processes For
Java To Support Ubiquitous Computing. Edinburgh Napier University.
Retrieved April 22, 2016, from
https://www.researchgate.net/publication/239568086_INVESTIGATING_COM
MUNICATING_SEQUENTIAL_PROCESSES_FOR_JAVA_TO_SUPPORT_U
BIQUITOUS_COMPUTING
4. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2007, July 8-11). Mobility in
JCSP: New Mobile Channel and Mobile Process Models. Retrieved 04 24,
2016, from ResearchGate: https://www.researchgate.net
5. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2008). A critique of JCSP
Networking. The thirty-first Communicating Process Architectures Conference,
(pp. 7-10). York: P.H. Welch et al. doi:DOI: 10.3233/978-1-58603-907-3-27
6. Doallo, R., Expósito, R. R., Ramos, S., Taboada, G. L., & Touriño, J. (2013,
May 1). Java in the High Performance Computing arena: Research, practice
and experience. Science of Computer Programming, 78(5), 425-444.
Retrieved April 22, 2016, from
http://www.sciencedirect.com/science/article/pii/S0167642311001420
7. Doallo, R., Taboada, G. L., & Juan, T. (2009, April). F-MPJ: scalable Java
message-passing communications on parallel systems. The Journal of
Supercomputing, 60(1), 117-140. Retrieved April 22, 2016, from
http://link.springer.com/article/10.1007/s11227-009-0270-0
8. Funika, W., Godowski, P., & Pęgiel, P. (2008). A Semantic-Oriented Platform
for Performance Monitoring of Distributed Java Applications. Computational
Science – ICCS 2008, 5103, 233-242. Retrieved April 22, 2016, from
http://link.springer.com/chapter/10.1007/978-3-540-69389-5_27#page-1
9. Hoare, C. A. R. (2004). Communicating Sequential Processes. Prentice Hall
International. Retrieved April 22, 2016, from
http://www.usingcsp.com/cspbook.pdf
10. Islam, N., & Shoaib, S. (2002, June 24). US Patent No. US 7454458 B2.
Retrieved April 22, 2016, from https://www.google.com/patents/US7454458
11. Jenkov, J. (n.d.). Java Memory Model. Retrieved from http://tutorials.jenkov.com/java-
concurrency/java-memory-model.html
12.Kerridge, J. (2014). Using Concurrency and Parallelism Effectively - 2nd
edition. BookBoon.
13.Lam, K. T., Luo, Y., & Wang, C.-L. (2010). Adaptive sampling-based profiling
techniques for optimizing the distributed JVM runtime. Parallel & Distributed
Processing (IPDPS), 2010 IEEE International Symposium on (pp. 1-11).
Atlanta: IEEE. doi:10.1109/IPDPS.2010.5470461
14. Lemos, J., Simão, J., & Veiga, L. (2011). A²-VM: A Cooperative Java VM
with Support for Resource-Awareness and Cluster-Wide Thread Scheduling.
On the Move to Meaningful Internet Systems: OTM 2011, 7044, 302-320.
Retrieved April 22, 2016, from
http://link.springer.com/chapter/10.1007%2F978-3-642-25109-2_20
15. MacEachern, D. (n.d.). (C. Technologies, Producer, & Hyperic) Retrieved from
https://support.hyperic.com/display/SIGAR/Home
16. Meddeber, M., & Yagoubi, B. (2010, September 22). Distributed Load
Balancing Model for Grid Computing. ARIMA Journal, 12. Retrieved April 22,
2016, from http://arima.inria.fr/012/pdf/Vol.12.pp.43-60.pdf
17. Olivier, S. (2008). Scalable Dynamic Load Balancing Using UPC. 2008 37th
International Conference on Parallel Processing. Portland: IEEE. Retrieved
April 22, 2016
18. Oracle. (2015, 02 14). Learn About Java Technology. Retrieved from Java:
http://java.com/en/about/
19. Oracle. (2016). Interface OperatingSystemMXBean. Retrieved from Java™
Platform, Standard Edition 7:
https://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSy
stemMXBean.html
20. Shaw, B. (n.d.). Retrieved from
http://www.codeproject.com/Articles/30422/How-the-Java-Virtual-Machine-
JVM-Works
21. Winias, T. B., & Brown, J. S. (n.d.). Retrieved from
http://www.johnseelybrown.com/cloudcomputingpapers.pdf
22. Xoreax Software Ltd. (2016). Incredibuild. Retrieved from Incredibuild Beyond
Acceleration: https://www.incredibuild.com/
Appendix
A. Searched Terms
All results from 2005 onward were considered for inclusion.
Some results were duplicated across searches, resulting in “0 Relevant” for later
searches. Checked as of 22/04/2016.
• “Load Balancing in Java” :
o “Distributed Load Balancing Model for Grid Computing” (Meddeber &
Yagoubi, 2010) – Focusses on modelling topologies of balancing, with
basic information on system implementation
o “Scalable Dynamic Load Balancing Using UPC” (Olivier, 2008) – Uses
Unified Parallel C
o “Method and system for application load balancing” (US Patent No. US
7454458 B2, 2002) – Patent for similar system with no implementation.
Only conceptual with ambiguity in implementation.
• “CPU load balancing in Java” :
o “A Semantic-Oriented Platform for Performance Monitoring of
Distributed Java Applications” (Funika, Godowski, & Pęgiel, 2008) –
Platform for monitoring resources for online Java technologies
• “Java cluster computing”
o “Java in the High Performance Computing arena: Research, practice
and experience” (Doallo, Expósito, Ramos, Taboada, & Touriño, 2013)
– Looks into the methods facilitating the possibilities of High
Performance code using Java (Shared memory model, MPI etc...)
o “F-MPJ: scalable Java message-passing communications on parallel
systems” (Doallo, Taboada, & Juan, F-MPJ: scalable Java message-
passing communications on parallel systems, 2009) – Different MPI
implementation Document
• “Load balancing cluster computing Java” : 0 Relevant
• “CPU balancing cluster Java” : 0 Relevant
• “Load balancing cluster JVM” :
o “A²-VM: A Cooperative Java VM with Support for Resource-
Awareness and Cluster-Wide Thread Scheduling” (Lemos, Simão, &
Veiga, 2011) – Cluster infrastructure for Cloud computing systems
o “Adaptive sampling-based profiling techniques for optimizing the
distributed JVM runtime” (Lam, Luo, & Wang, 2010) – Builds a system
based on global variables for the cluster, paying close attention to
thread stacks
• “Load balancing cluster JCSP” : 0 Relevant
• “Load balancing asynchronous cluster Java” : 0 Relevant
• “CPU monitoring load balance cluster Java” : 0 Relevant
• “Cluster process sending Java” : 0 Relevant
B. Meeting Diagrams
Appendix Item 1. Basic concepts
Appendix Item 2. Agent structure
Appendix Item 3. Ring implementation Conversation
Appendix Item 4. Ring Evolution
Appendix Item 5. Extended Ring Elements
Appendix Item 6. Implementing Agent Channels
Appendix Item 7. Losing the Ring
Appendix Item 8. Closed Client Server
Appendix Item 9. Client Server with Managers
Appendix Item 10. Interacting Processes
Further comments and discussion can be found at
http://honsproject.calumbeck.com/
C. Github analytics
Appendix Item 11. Work distribution by day
Appendix Item 12. Git Activity Concentrations
Appendix Item 13. Busy commit periods
Initial Project Overview
SOC10101 Honours Project (40 Credits)
Title of Project: CPU Load Balancer
Overview of Project Content and Milestones
The Main Deliverable(s):
I intend to create a system which monitors CPU core usage over a cluster of
computers and calls another terminal to take on more load when one is starting to
reach maximum capacity; increasing speed and efficiency overall.
The system will implement the use of Agents which will move around the system,
arriving at each node (processor or core in this case) and connect to their main
processing stack to ascertain the current efficiency. Once finished, the Agent
disconnects and then moves itself on to the next core in the system. Using multiple
agents will be a goal for the project and attaining basic concurrency will be the first
milestone event.
As such, the system will be designed and implemented using the Groovy 2.3
libraries for Java. This allows the user to easily manipulate threads at a high level
through the predominant use of message passing. It is not certain whether a hybrid
of message passing and shared memory will be attainable, as pure message passing
is noted to have a large overhead for copying messages from one process to
another. This is not a problem at a high level of programming, but at CPU or even
GPU instruction speeds it is worth noting that it is not certain whether this will have a
positive or negative impact.
Testing in the system will include the use of software metrics to ensure results are
expected in certain situations such as the coherency of specific function calls at point
of load shifting. CPU usage will be constantly observed and compared with different
methodologies and will be documented and collated in full throughout the whole
report.
The final product will be discreet during use and will not increase processing
overhead between operations when Agents are idle or in transit between nodes. It
will be easy to initiate and close, with a basic visual monitoring system for the user
including concrete feedback for changes or problems. It should automatically detect
the number of cores in use and be proficient over different architectures, although
Intel-based chips will be the basis for development. It is not obvious at the moment
whether hyper-threading can be used in conjunction, but any attempt will be
documented.
The Target Audience for the Deliverable(s):
As the system will spread over multiple computers, it will be hindered by physical
constraints and the associated speed ramifications. Hence, as a proof of concept,
the system will handle large computational problems which are not I/O dependent.
As such, the system will be used to aid with large computations, or by those in need
of makeshift data farms.
The Work to be Undertaken:
• Design a system which allows concurrent processing in a cluster computing
environment
• Dealing with interaction with other devices over network
o Adapting system to work on Mobile Devices
• Comparative analysis of communication methods (i.e. Ethernet, Wi-Fi etc.)
o Analysis of result output in correlation with message passing
parameters
• Comparative tests on different hardware architectures
Additional Information / Knowledge Required:
• Java Language
o Groovy library knowledge
• Concurrent and Parallel architecture knowledge
• Fundamental Android understanding (for mobile development)
• CPU usage metrics
Information Sources that Provide a Context for the Project:
Background and Rationale:
Computer hardware has evolved, and so has the amount we attempt to run at
any given point. From the initial single-core processors to the octa-cores of today,
engineers have strived to build the most powerful computers with ever greater
speeds. However, over time it has become apparent that the implementation methods
we have been working from and towards are starting to level off. In the past, the first
step in augmenting any computer in terms of speed and performance has been
reducing transistor size and thereby increasing speed. Co-founder of Intel Gordon E.
Moore stated that the number of transistors able to fit on a processor would double
every 18 months, fundamentally increasing the speed of computers for at least the
next decade. This model of thought is still used regularly in the computing industry
today; however, it was initially stated in 1965 and since then many things have
changed.
The problems we are met with today are distance, heat and conduction. The
physical distance between cache memory and cores is becoming smaller and
smaller. We are approaching almost instantaneous transmission, and this comes
with another set of problems. Heat is generated when a CPU core is pushed to
compute at the rates we demand, which can require more intricate ways to cool the
system, and much of this can come down to bad allocation of resources.
We therefore need to look at how we balance our work. Software needs to reflect the
modern multitasking environment we have come to expect and must change to cope
with increasing demand, as hardware cannot be relied on to be the sole supporter in
this venture. I plan to build a system which allows proper allocation of the available
resources and increases the efficiency of hardware use, in order to achieve a faster,
more reliable system.
The Importance of the Project:
This project will be a proof of concept for using multiple computers in a personal
environment to complete large computational problems, with little impact on overall
performance, in a discreet manner.
The Key Challenge(s) to be Overcome:
The initial challenge will be to ascertain whether an agent can become active when
CPU usage reaches a certain level on a terminal. On activation, the agent will report
to a central repository of addresses and move to a new terminal with lower CPU
usage. From there it should be able to display a message on that machine. This will
be done as outlined below:
• Use a Monte Carlo algorithm to process a large computation
o Create an Agent to look at CPU usage
o CPU usage should report high
o Have the Agent report to another resource
o Println “I am overloaded”
o Then build an event handler that has access to the channel which is
waiting for input from the processor
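A Monte Carlo workload of the kind outlined above can be sketched as follows: estimating pi by sampling random points in the unit square. The sample count and class name are illustrative; the real system would run such a workload under agent supervision.

```java
import java.util.Random;

// Illustrative Monte Carlo workload: estimate pi from the fraction of
// random points in the unit square that fall inside the quarter circle.
public class MonteCarloPi {
    public static double estimate(long samples, long seed) {
        Random rnd = new Random(seed); // seeded for reproducibility
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble(), y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++; // inside the quarter circle
        }
        return 4.0 * inside / samples; // area ratio approximates pi
    }
}
```

Work of this shape suits the system well: it is CPU-bound, not I/O dependent, and splits naturally into independent batches of samples.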
From here, we can move on to moving key data. The intention is to create a
central repository of agents which then looks for a node which does not have an
agent active. From there we can move resources to the new processor.
The biggest challenge to overcome, if the above system is completed in due course,
is implementing it on a single CPU. Spreading even use across the cores of one
terminal would be the ultimate goal, but in choosing Java as the main platform, the
JVM gives little control over working across cores. Using a different language could
be an answer, but would require a large amount of research and development. For
the time being, what is detailed in the main deliverables is the main aim.
Servletarchitecture,lifecycle,get,post
vamsi krishna
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your stream
Enno Runne
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and Kubernetes
Will Hall
 
ZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project reportZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project report
pramodbiligiri
 

What's hot (20)

Solve it Differently with Reactive Programming
Solve it Differently with Reactive ProgrammingSolve it Differently with Reactive Programming
Solve it Differently with Reactive Programming
 
Advance Java Programming (CM5I)5.Interacting with-database
Advance Java Programming (CM5I)5.Interacting with-databaseAdvance Java Programming (CM5I)5.Interacting with-database
Advance Java Programming (CM5I)5.Interacting with-database
 
Kubernetes deployment strategies - CNCF Webinar
Kubernetes deployment strategies - CNCF WebinarKubernetes deployment strategies - CNCF Webinar
Kubernetes deployment strategies - CNCF Webinar
 
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging CapabilitiesIBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
 
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
 
Apache ActiveMQ - Enterprise messaging in action
Apache ActiveMQ - Enterprise messaging in actionApache ActiveMQ - Enterprise messaging in action
Apache ActiveMQ - Enterprise messaging in action
 
Service Discovery in Distributed System with DCOS & Kubernettes. - Sahil Sawhney
Service Discovery in Distributed System with DCOS & Kubernettes. - Sahil SawhneyService Discovery in Distributed System with DCOS & Kubernettes. - Sahil Sawhney
Service Discovery in Distributed System with DCOS & Kubernettes. - Sahil Sawhney
 
IBM Managing Workload Scalability with MQ Clusters
IBM Managing Workload Scalability with MQ ClustersIBM Managing Workload Scalability with MQ Clusters
IBM Managing Workload Scalability with MQ Clusters
 
Continuous deployment of polyglot microservices: A practical approach
Continuous deployment of polyglot microservices: A practical approachContinuous deployment of polyglot microservices: A practical approach
Continuous deployment of polyglot microservices: A practical approach
 
draft_myungho
draft_myunghodraft_myungho
draft_myungho
 
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
 
IBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster RecoveryIBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster Recovery
 
Containers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes LeoContainers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes Leo
 
The mysqlnd replication and load balancing plugin
The mysqlnd replication and load balancing pluginThe mysqlnd replication and load balancing plugin
The mysqlnd replication and load balancing plugin
 
Grokking TechTalk #16: React stack at lozi
Grokking TechTalk #16: React stack at loziGrokking TechTalk #16: React stack at lozi
Grokking TechTalk #16: React stack at lozi
 
weblogic perfomence tuning
weblogic perfomence tuningweblogic perfomence tuning
weblogic perfomence tuning
 
Servletarchitecture,lifecycle,get,post
Servletarchitecture,lifecycle,get,postServletarchitecture,lifecycle,get,post
Servletarchitecture,lifecycle,get,post
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your stream
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and Kubernetes
 
ZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project reportZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project report
 

Similar to An investigation into Cluster CPU load balancing in the JVM

A Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and ContainersA Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and Containers
prashant desai
 
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Qualcomm Developer Network
 
Cloud 2010
Cloud 2010Cloud 2010
Cloud 2010
steccami
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Theofilos Papapanagiotou
 
A Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery NetworksA Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery Networks
Sruthi Kamal
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
ijceronline
 
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud EnvironmentEffective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
International Journal of Science and Research (IJSR)
 
Day in the life event-driven workshop
Day in the life  event-driven workshopDay in the life  event-driven workshop
Day in the life event-driven workshop
Christina Lin
 
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized APIImplementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
IJCSIS Research Publications
 
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCERESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
ijcses
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing Environment
Swapnil Shahade
 
1844 1849
1844 18491844 1849
1844 1849
Editor IJARCET
 
1844 1849
1844 18491844 1849
1844 1849
Editor IJARCET
 
Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!
A Jorge Garcia
 
GCF
GCFGCF
Chat application through client server management system project.pdf
Chat application through client server management system project.pdfChat application through client server management system project.pdf
Chat application through client server management system project.pdf
Kamal Acharya
 
NodeJS guide for beginners
NodeJS guide for beginnersNodeJS guide for beginners
NodeJS guide for beginners
Enoch Joshua
 
Srushti_M.E_PPT.ppt
Srushti_M.E_PPT.pptSrushti_M.E_PPT.ppt
Srushti_M.E_PPT.ppt
khalid aberbach
 
Server-side JS with NodeJS
Server-side JS with NodeJSServer-side JS with NodeJS
Server-side JS with NodeJS
Lilia Sfaxi
 
Cs556 section2
Cs556 section2Cs556 section2
Cs556 section2
farshad33
 

Similar to An investigation into Cluster CPU load balancing in the JVM (20)

A Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and ContainersA Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and Containers
 
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
 
Cloud 2010
Cloud 2010Cloud 2010
Cloud 2010
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
 
A Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery NetworksA Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery Networks
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
 
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud EnvironmentEffective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
 
Day in the life event-driven workshop
Day in the life  event-driven workshopDay in the life  event-driven workshop
Day in the life event-driven workshop
 
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized APIImplementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
 
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCERESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing Environment
 
1844 1849
1844 18491844 1849
1844 1849
 
1844 1849
1844 18491844 1849
1844 1849
 
Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!
 
GCF
GCFGCF
GCF
 
Chat application through client server management system project.pdf
Chat application through client server management system project.pdfChat application through client server management system project.pdf
Chat application through client server management system project.pdf
 
NodeJS guide for beginners
NodeJS guide for beginnersNodeJS guide for beginners
NodeJS guide for beginners
 
Srushti_M.E_PPT.ppt
Srushti_M.E_PPT.pptSrushti_M.E_PPT.ppt
Srushti_M.E_PPT.ppt
 
Server-side JS with NodeJS
Server-side JS with NodeJSServer-side JS with NodeJS
Server-side JS with NodeJS
 
Cs556 section2
Cs556 section2Cs556 section2
Cs556 section2
 

Recently uploaded

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
lorraineandreiamcidl
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
Gerardo Pardo-Castellote
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 

Recently uploaded (20)

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 

An investigation into Cluster CPU load balancing in the JVM

  • 1. An Investigation into Cluster CPU load balancing in the JVM Calum James Beck Submitted in partial fulfilment of the requirements of Edinburgh Napier University for the Degree of Bachelor of Engineering with Honours in Software Engineering School of Computing
  • 3. Abstract The JVM CPU Cluster Balancer is a scalable, proof of concept system designed to distribute processes over a network to perform multiple tasks at once, in a language of high abstraction. Once distributed, workers return results to an access server, all while monitoring their respective CPUs for computational stress in terms of CPU usage. CPU’s incurring set stress then have their respective processes moved to a less intensive area in the cluster, balancing work overall. The system works by enrolling Universal Clients (CPU’s waiting for work) to an access server, which then requests processes to be sent from the users desired Process Server. Each Process comes in the form of a Process Definition complying with the Agent interface, self-contained in an object. During run time, the Process Definition object acts as a subtype of the process manager, assuming responsibility for saving and restoring the state of the process. Each Client has four Process Nodes which it can delegate work to. The selected Process Node then connects to the received Process using two internal channels and runs using an instance of a Process Manager. During runtime, the Client also implements a Node Monitor which monitors the CPU usage of the Client in real time. When a set percentage of stress is met (CPU usage), the Universal Client informs the server that an alternative node is needed, on a different machine to finish the instance of work. The Process Definition then stops its runnable logic. The server searches through enrolled Clients and sends the address of an underwhelmed CPU in the cluster to the requesting Node. A dynamic TCP/IP channel is then created between the node and the foreign Process Manager. The process object is then serialized allowing transferal, in its paused state, and is resumed at the new client. The system is developed using pre-set processes to ensure repeatability of results and is based entirely in any system running the JVM. 
This project results in a working system which can distribute work based on CPU stress, but concludes that in order to be labelled complete, more functionality needs to be added to find the system an adequate application. Page | 3
  • 4. The Java language, JCSP, the Groovy scripting language and the Sigar Application Programming Interface (API), which provides pure C bindings to Java, have been used in this project. All code, written and complied using Eclipse Mars IDE. Page | 4
  • 5. Contents 1 INTRODUCTION................................................................................................12 1.1 Background..............................................................................................................................................13 1.2 Aims and Objectives................................................................................................................................14 1.3 Scope and Limitations ............................................................................................................................15 1.4 Structure of Dissertation.........................................................................................................................16 2 BACKGROUND, KEY COMPONENTS AND THEORY....................................17 2.1 Data and Task Parallelism......................................................................................................................17 2.2 Hoare’s Communicating Sequential Processes (CSP).........................................................................17 2.3 Channels...................................................................................................................................................18 2.4 Groovy......................................................................................................................................................19 2.5 Communicating Sequential Processes for Java (JCSP).......................................................................19 2.6 Channel Mobility in JCSP......................................................................................................................19 3 METHODOLOGY...............................................................................................21 3.1 Monitoring CPU Usage...........................................................................................................................21 3.2 Process Creation and 
Distribution .......................................................... 24
3.3 Process Movement associated Methods ........................... 26
4 INITIAL EXPERIMENTS ............................................. 29
4.1 Monitoring CPU usage .......................................... 29
5 ARCHITECTURAL DESIGN ............................................ 34
5.1 Central Repository ............................................ 34
5.2 Ring System with Travelling Agents ............................ 36
5.3 Work & Node Manager System .................................... 37
5.4 Network Structure Analysis .................................... 38
6 INTRODUCING PROCESS MOVEMENT .................................... 39
6.1 Java Memory Model ............................................. 39
6.2 Moving processes within a JVM ................................. 40
6.3 Thread Serialization impossible with current JVM .............. 40
6.4 Adapting Process definitions as Agents ........................ 42
6.5 Sending process definitions in current state .................. 43
7 PROTOTYPE ....................................................... 44
7.1 Design ........................................................ 44
7.2 Components .................................................... 46
7.3 Experiment Setup .............................................. 55
7.4 Results ....................................................... 56
7.5 Comparative Analysis .......................................... 56
7.6 Local Concurrency Vs Distributed .............................. 58
8 CONCLUSION ...................................................... 59
8.1 Has the Project met its Aim and Objectives? ................... 59
8.2 Deployment Analysis and Critique .............................. 60
8.3 Further Research and Work ..................................... 61
8.4 Reflective Statements ......................................... 64
9 REFERENCES ...................................................... 67
A. Searched Terms ................................................. 70
B. Meeting Diagrams ............................................... 72
C. Github analytics ............................................... 84
Initial Project Overview .......................................... 86
SOC10101 Honours Project (40 Credits) ............................. 86
List of Figures

FIGURE 1. BASIC CONCEPT OF PROCESS MIGRATION ..................... 14
FIGURE 2. JAVA BEANS STRUCTURE ................................... 22
FIGURE 3. BASIC JNI INTERFACE PROCESS ............................ 23
FIGURE 4. VISUAL REPRESENTATION OF VALUE GENERATOR ............... 24
FIGURE 5. SERVER-CLIENT PATTERN DIAGRAM .......................... 26
FIGURE 6. VISUAL REPRESENTATION OF AGENT RUNNING IN PROCESS MANAGER ... 28
FIGURE 7. MK I: HOST NODE SYSTEM DIAGRAM ......................... 35
FIGURE 8. NODE RING NETWORK DIAGRAM .............................. 36
FIGURE 9. WORK AND NODE MANAGER NETWORK DIAGRAM ................. 37
FIGURE 10. LOGICAL VIEW OF JAVA MEMORY RELATIONS (JENKOV, N.D.) ... 39
FIGURE 11. JAVA MEMORY MODEL INTERACTION WITH CPU MEMORY MODEL (JENKOV, N.D.) ... 41
FIGURE 12. ORDER OF EVENTS FOR CONNECTING TO AGENT ............... 42
FIGURE 13. METHOD AND CONTENTS OF PROCESS (THIS) ................. 43
FIGURE 14. FINAL PROTOTYPE, SERVER-CLIENT NETWORK ................ 45
FIGURE 15. ANY2ONE CHANNEL CONCEPT ............................... 48
FIGURE 16. INTERNAL CONNECTION MECHANISMS OF AGENT ............... 50
FIGURE 17. SERVER INTERACTION DIAGRAM FOR PROTOTYPE .............. 55
FIGURE 18. TABLE OF EXPERIMENT RESULTS ........................... 56
FIGURE 19. TEST RESULTS GRAPH; CPU USAGE AND TIME SPENT .......... 57
FIGURE 20. NODE INTERACTION DIAGRAM .............................. 62
List of Screenshots

SCREENSHOT 1. WINDOWS 10 TASK MANAGER AND RESOURCE MANAGER ....... 29
SCREENSHOT 2. CONSOLE LOG: BASE READING OF CPU USAGE ON CLIENT 1 ... 31
SCREENSHOT 3. CONSOLE LOG: CLIENT 2 AFFECTING CLIENT 1 CPU READINGS ... 32
SCREENSHOT 4. CLIENT INITIALISING UI ............................. 51
SCREENSHOT 5. SERVER NOT STARTED OR CRASHED ERROR MESSAGE ........ 51
SCREENSHOT 6. CONSOLE LOG: NODE REGISTERED ON SERVER ............. 52
SCREENSHOT 7. BASIC USER UI ...................................... 52
SCREENSHOT 8. CONSOLE LOG: NODE SHOWING READY .................... 52
SCREENSHOT 9. CONSOLE LOG: NODE DOING WORK AND RELEASING PROCESS NODE 1 WHEN FINISHED ... 53
SCREENSHOT 10. CONSOLE LOG: WHEN PROCESS 4 STARTS, CPU IS HIGH (62%), AGENT IS CONTACTED (I AM READING), THE PROCESS IS DISCONNECTED, SENT (LETS GO) AND PROCESS NODE 4 IS RELEASED ... 54
SCREENSHOT 11. CONSOLE LOG: SERVER DELETES ADDRESS ............... 54
Acknowledgements

Firstly, I would like to profusely thank Professor Jon Kerridge, who has been an invaluable source of confidence and knowledge throughout this whole project. He has been a guide and kept me steadfast in what needed to be completed through challenging times. Secondly, I'd like to thank Doctor Kevin Chalmers, who has always been compassionate and a nurturing presence throughout my time in University, from my first to fourth year. I would also like to personally thank Charlotte Leask for her constant support and eternal patience throughout the whole process.
1 Introduction

As the world tends towards the finite end of physical enhancements in computing, the aim is to continue increasing speeds by finding new methods of surpassing these limitations. In the past, the first step in augmenting any computer's speed and performance has been reducing transistor size. Co-founder of Intel Gordon E. Moore stated that the number of transistors able to fit on a processor would double every 18 months, fundamentally increasing the speed of computers for at least the following decade. This model of thought is still used regularly in the computing industry today; however, it was initially stated in 1965, and much has changed since then.

The problems we are met with today are distance, heat and conduction. The physical distance between cache memory and cores is shrinking further and further; transmissions are becoming almost instantaneous, and this brings another set of problems. Heat is generated when a CPU core is pushed to compute at the rates we demand, requiring ever more intricate ways to cool the system, and much of this can be put down to poor allocation of resources. We therefore need to look at how we balance our work. Software needs to reflect the modern multitasking environment that we have come to expect and must change in order to cope with increasing demand, as hardware cannot be relied on to be the sole supporter in this venture. I plan to build a system which allows a proper allocation of the resources available and increases the efficiency of hardware use in order to achieve a faster, more reliable system.1

1 Taken from IPO

This project endeavours to meet these needs with a system which distributes processes over a cluster of computers, regulating work based on CPU load. This is a
means of using idle CPUs without exceeding a threshold that impedes the user's everyday use. The final product aims to be a proof of concept that load balancing is possible in a high-level language, in a portable environment. Hence, it demonstrates the means and capabilities required to further develop a fully automated system for everyday users with access to multiple Java-compatible devices.

1.1 Background

Most process-enhancing implementations fall under cloud computing: outsourcing processing to external data centres, platform services or application hosting, whilst remotely managing computer resources (Winias & Brown, n.d.). However, not all businesses have access to scalable hardware architectures, which are expensive to build, run and maintain.

Shifting focus to performance, creating efficient software diminishes the need for in-depth management of system architectures and is a fundamental code of conduct for professional IT bodies (such as the British Computer Society). However, different programming languages support different levels of control over a system. Programming in languages of high abstraction does not fundamentally afford the efficiency that low-level languages can attain, while low-level languages are platform-specific and do not lend themselves to portable methods. Taking advantage of current user environments, rather than reimplementing code or hardware, is therefore the most cost-effective and least disruptive route. This can be done by effectively managing processing loads, maximising processing resource capabilities. Utilising idle CPU resources on a network of computers (a cluster) can speed up processing overall, but to do so these resources must be directed to work together towards a common goal (i.e. task parallelism).
Many current systems, such as Incredibuild, implement this parallel design for build environments, working with low-level code to facilitate high-level build concepts.
(Xoreax Software Ltd., n.d.) With high-profile clients such as Microsoft, Google, IBM and Disney using their product to maximise system use, this task distribution method is clearly proven to work. However, for the average user or start-up business, system specifics might still prove elusive. So why not implement this distribution system in a portable, high-level language?

Java is a widely used platform, compiled in memory and run in a virtual machine aiming for multi-platform portability. According to Oracle, 97% of enterprise desktops run Java, alongside 3 billion mobile phones worldwide (Oracle, 2015). Building the system in Java allows the opportunity to port it to multiple platforms with relative ease, greatly increasing the potential for networked devices to join the system.

It should be noted that, in researching this area, very little has been published on load balancing in high-level languages in a cluster environment within the last 6–10 years. Appendix A documents the search criteria used and the relevancy of the results.

1.2 Aims and Objectives

The aim of this project is to distribute and regulate processes over multiple CPUs in a cluster setting using the Java programming language, with the Java Virtual Machine (JVM) as the environment. This involves monitoring CPU usage in real time, stopping processes which overload a given terminal, and moving them to CPUs experiencing less stress in the cluster.

Figure 1. Basic concept of process migration
The main objectives required to create such a system, in practice, are outlined below:

1) Monitor the CPU usage incurred by an instance of the JVM.
2) Processes must have a way to be interrupted and saved in their current state.
3) Processes need to have a way to move and reinitialise at different nodes, on different CPUs.

This report documents the steps taken to achieve these goals from inception to completion. This project aims to provide a system which successfully manages load over several terminals in a cluster, using a language with a high level of abstraction: Java.

1.3 Scope and Limitations

In order to provide a proof of concept system within the project's allotted time, certain areas of the project had to be kept within reasonable limitations. In this case, a limited number of processes are programmed and sent automatically over the cluster, to ensure that overload occurs a predictable percentage of the time. This means the system does not yet afford user input and runs fairly autonomously. In addition, to show the scalability of the system, it must be ensured that the computer distributing tasks runs at a proficient speed, to facilitate access from multiple user-end nodes with, preferably, one underperforming CPU.

As the system relies on communication, many transmission options are available, but these are kept to TCP/IP network protocols only. This form of communication was chosen as it is a proven, reliable and widely used method, supported by virtually all operating systems and platforms on which Java can run.

This project will also use a Java scripting language called Groovy, which facilitates the use of Communicating Sequential Processes for Java (JCSP). This allows the manipulation of threads at a low level with high-level abstraction, resulting
in a parallelised system, and can use TCP/IP protocols as the main mechanism for communication between systems. As the project is created to prove that Java can be utilised to distribute and balance work over a cluster, all aspects of the system will be implemented in Java, within the constraints of the JVM, whilst maintaining a high level of abstraction in the source code. Other programming languages will only be considered where it is conceptually or physically impossible to implement the requisites with the author's current knowledge and skills.

1.4 Structure of Dissertation

The structure of this document is as follows:

• Section 2 introduces the methodology, the theory and the practices behind the message-passing mechanics of the system, which revolve around JCSP.
• Section 3 discusses the methods implemented throughout the project, as well as the decisions made as a result of research, to reach the finished prototype.
• Section 4 presents the initial experiments conducted. This documents the limitations and barriers which had to be overcome in order to develop a functioning prototype.
• Section 5 describes the main incarnations of the system and how each implementation led to a better system.
• Section 6 explains the mechanics behind moving processes and the difficulties faced in doing so.
• Section 7 elaborates on and demonstrates the prototype system, reviewing design and implementation as well as experimentation with the system.
• Section 8 details the results and evaluation of the system and project, and concludes with a critical evaluation covering the project's shortcomings and possible avenues of future work on the system.
2 Background, Key Components and Theory

Throughout this report, the majority of the components described have been taught through, and are defined by, "Using Concurrency and Parallelism Effectively" I & II (Kerridge, 2014), which builds upon Hoare's Communicating Sequential Processes (CSP) theory. Unless explicitly referenced otherwise, these are the main sources of the information disclosed herein. In this section the basic elements from which the prototype product is derived are explained.

2.1 Data and Task Parallelism

One of the driving forces in this project is concurrency and parallelism. Task parallelism allows the user to run multiple processes simultaneously on one CPU or over a network. Sequential code follows a specified order, so programmers tend not to think about the order of events in a system once it has been coded and compiled. In order to process tasks moving around the intended system, processes will have to be fairly autonomous and removed from the main body of code. This means that concurrent and parallel code will have to stop and synchronise with each other on transfer, interact in a timely manner so as not to disrupt running processes, and finish in an expected order despite being intrinsically non-deterministic in nature (due to running on different platforms at different speeds), all whilst the possibility of migration plays an active role.

2.2 Hoare's Communicating Sequential Processes (CSP)

Hoare's CSP concepts (Hoare, 2004) dictate that everything encapsulated in code can be broken down into algebraic functions. In this way, everything within programming can be reduced to simple, understandable functions, rules and patterns, and all code can be reduced to smaller chunks which can be moved around to suit the success of the formula. What you see is what you get. The following mechanisms facilitate this concept and are the basis of the end prototype.
2.2.1 Process

A Process is a piece of code that can be executed in parallel with other processes. A network of processes forms a solution to a single problem, with processes communicating with each other using Channels (detailed in 2.3). Processes typically contain repeating sequences of sequential code with communication interspersed. Any process that is idle consumes no processor resources.

2.2.2 Timer

A Timer is a means of introducing time management into processes. Timers can be read to find the current time, and can introduce delays or alarms for future events. They can also be used in ALTs as guards for reading channels.

2.2.3 Alternatives (ALT)

An Alternative (ALT) allows the selection of one ready guard from several possible guards. Guards come in three types (input communications, timers, or SKIPs) and dictate how a process should proceed. A guard is ready if input is ready, an alarm time has passed, or a SKIP is a defined guard. SKIPs are always ready and allow guards to run continuously. The ALT will wait until a guard is ready and then undertake the associated code. If one guard is ready, it undertakes the associated code; if more than one is ready, it selects one according to predefined options and then obeys the code. These options include priority reading, where more than one is ready, or fair, turn-based reading.

2.3 Channels

This is a main mechanic of the system described in this report, as the main aim is to send processes over a cluster network. A Channel is a one-way, point-to-point, unbuffered connection between two processes. Channels synchronise the processes to pass data from one to another and do not use polling or loops to determine their status, meaning no processing is consumed during transactions. The first process attempts to communicate and goes idle when synchronising. The second process attempting to communicate will then discover the situation,
undertake the data transfer, and then both processes will continue in parallel, or concurrently if they were both executed on the same processor. It does not matter which process attempts communication first, as the mechanism is symmetric. When communication between processors takes place, the underlying system creates a copy of the data object and begins the transfer. As such, objects containing process logic can be transferred, to be executed by a Process Manager and run asynchronously; this forms the basis of the project.

2.4 Groovy

The Groovy scripting language allows the programmer to write concurrent systems with a high level of abstraction and is underpinned by the four basic principles detailed above.

2.5 Communicating Sequential Processes for Java (JCSP)

JCSP is based on Hoare's basic algebraic functions, allowing virtual connections to be created via NetChannelLocation structures sent between nodes. Using Java gives the programmer the ability to send objects via serialisation methods, breaking the components down into sequences of bytes to be transferred (Chalmers, Kerridge, & Romdhani, A critique of JCSP Networking, 2008). With this framework, objects containing code definitions can be sent along with a control signal to recreate the object at the receiving end. Communicating Sequential Processes for Java is the cornerstone of this project and allows us to build upon Hoare's concepts to create a simple-to-understand communication network.

2.6 Channel Mobility in JCSP

Channel Mobility refers to the dynamic capabilities found when creating self-propagating NetChannels and other communication models in this project. Channels afford us robustness of connection between the input and output ends whilst allowing sufficient models to support the ubiquitous nature of the intended system. (Chalmers, Investigating Communicating Sequential Processes For Java To Support Ubiquitous Computing, 2008)
As the project does not endeavour to change these underlying mechanisms, only a high-level description is presented. It can be stated, however, that channel mobility is paramount to attaining, transferring and moving processes successfully.
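As a concrete illustration of the channel rendezvous described in Section 2.3: JCSP itself is not reproduced here, but a plain-Java sketch using `SynchronousQueue` (an assumed stand-in, not the JCSP implementation) shows the same synchronising, unbuffered behaviour, where the writer blocks until a reader arrives.

```java
import java.util.concurrent.SynchronousQueue;

public class RendezvousDemo {
    public static void main(String[] args) throws InterruptedException {
        // A SynchronousQueue has no capacity: put() blocks until a matching
        // take() arrives, mimicking CSP's unbuffered channel rendezvous.
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread writer = new Thread(() -> {
            try {
                channel.put(42);   // blocks here until the reader is ready
                System.out.println("writer: sent");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        int value = channel.take();  // completes the rendezvous
        writer.join();
        System.out.println("reader: received " + value);
    }
}
```

Neither side polls while waiting, which is the property the text highlights: an idle process consumes no processor resources.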
3 Methodology

The main aim of the system is to create a way to send processes from one node to another running in the same computing cluster, initiated by rising CPU usage at each terminal. As such, there were three main problem areas which needed to be addressed:

1. How to get CPU usage at any given time from within a JVM runtime.
2. How to create and deliver processes around a dynamic network.
3. How to stop a given process when CPU usage reaches a predetermined amount and send it to an underused node in the network.

3.1 Monitoring CPU Usage

At the time of writing, there were no pure Java APIs available to gather CPU information. The investigation therefore continued as fact-finding into gathering as much system data as possible from within Java.

3.1.1 MBeans

MBeans are managed Java objects, similar to JavaBeans, which can represent a device, an application or any resource that needs to be managed.
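A managed resource of this kind can be defined and queried through the platform MBean server. The sketch below registers a hypothetical Counter MBean; the `Counter` class and the `demo:type=Counter` object name are illustrative assumptions, not part of the project.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanDemo {
    // Standard MBean pattern: the management interface's name is the
    // implementing class's name plus the "MBean" suffix.
    public interface CounterMBean {
        int getCount();
        void increment();
    }

    public static class Counter implements CounterMBean {
        private int count;
        public int getCount() { return count; }
        public void increment() { count++; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=Counter"); // illustrative name
        server.registerMBean(new Counter(), name);

        server.invoke(name, "increment", null, null);          // an operation
        Object count = server.getAttribute(name, "Count");     // an attribute
        System.out.println("Count = " + count);                // prints "Count = 1"
    }
}
```

The cost the text mentions is visible here: every attribute read goes through the generic `getAttribute` reflection path, which is why repeated queries carry overhead.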
Figure 2. Java Beans Structure

This means we can monitor any of the resources being used by an instance of the JVM. However, as an MBean can be any type of object and can expose attributes of any type, each client has to implement class definitions each time an MBean is called, which can lead to high overheads when the MBean is repeatedly queried.

3.1.2 OperatingSystemMXBean

MXBeans are native to Java (1.6 upwards) and allow the user to utilise an MBean with a reduced set of types, meaning there is no requirement for model-specific classes. This makes the MBean accessible by any local or remote client, essentially conforming to an interface. OperatingSystemMXBean gives the user access to an interface developed for retrieving system properties about the operating system on which the JVM is running.
This includes the free memory of the computer, the memory allocated to the JVM and the CPU time dedicated to a task. MXBeans were the only native mechanism provided by Java Management Extensions (JMX) which could facilitate the objectives mentioned.

3.1.3 Java Native Interface (JNI)

The Java Native Interface is a native programming interface that is part of the Java Software Development Kit. JNI allows Java code the use of fragments and libraries written in other languages such as C and C++. While Java breaks code down into objects to be interpreted, C allows for the use of procedural code which is compiled and broken down into functions. The JNI connects Java class methods with C functions, fundamentally allowing the programmer to call C functions at any given time.

Figure 3. Basic JNI interface process

This gives the user access to lower levels of programming and can read values, such as CPU usage, from assembly code. Although this approach seems the most enticing, it can lead to the destabilisation of a JVM instance through subtle C errors. Writing small scripts may not pose a huge problem, but garbage collection is not
handled by the JVM in these instances, and a basic understanding of memory allocation is also required. Additionally, using the JNI results in a system which is not wholly portable, as the code written in C is platform-specific.

3.2 Process Creation and Distribution

As one of the main prerequisites of this system, a network architecture had to be designed to facilitate communication.2 This section focuses on how data and work are spawned to test the proof of concept system. It should be noted that although the aim of the project is a proof of concept, the ideal system would spawn multiple instances of work which would accumulate to a large amount of CPU usage, in order to adequately balance the system.

3.2.1 Value Generator

Here, different volumes of data are generated by a data generator and sent to a Node to be processed. The perceived complexity of the data should be proportional to the increase in CPU usage created.

Figure 4. Visual Representation of Value Generator

2 Development and iterative design is documented in Section 5.
This would require a fixed process at each node initialisation to manipulate the randomly generated data sets being produced by the generators. All interactions are handled by channel interactions, as shown above in Figure 4.

3.2.2 Random Process Selection

In this instance, each node would have access to pre-set process definitions which would generate varying loads. At run time, a timer would be initiated requesting a random process to run, one of which would create a large spike in CPU usage. This would allow an overloaded state to be reproduced with a high degree of certainty during demonstrations. The structure would be similar to the above, but would not require the DataGenerator, as the initial input would remain the same.

3.2.3 Server Hosted Process Definitions

This method would require processes to be hosted remotely at a specific IP location and requested by the client when needed. The client would access a server which holds the network locations of all the relevant process servers. The request would then be forwarded, with the client's location, to the process location, and the process sent via a TCP/IP channel back to the client.
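The server-hosted approach hinges on a serialised process definition crossing a TCP connection and being executed on arrival. Below is a minimal, self-contained Java sketch of that idea, using plain sockets in place of JCSP networked channels; the `SquareTask` class and names are illustrative assumptions, not the project's code.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ProcessShippingDemo {
    // A self-contained, serialisable "process definition": the logic and its
    // initial parameters travel together as one object.
    public static class SquareTask implements Serializable, Runnable {
        private static final long serialVersionUID = 1L;
        private final int input;
        private int result;
        public SquareTask(int input) { this.input = input; }
        public void run() { result = input * input; }
        public int getResult() { return result; }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {      // the "process server"
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept();
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    out.writeObject(new SquareTask(7));        // ship the definition
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            // The "client": receive the definition over TCP and execute it locally.
            try (Socket s = new Socket("localhost", server.getLocalPort());
                 ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
                SquareTask task = (SquareTask) in.readObject();
                task.run();
                System.out.println("result = " + task.getResult()); // prints "result = 49"
            }
            sender.join();
        }
    }
}
```

Only the definition and its parameters cross the network; the computation itself happens entirely at the receiving node, which is the property the prototype relies on.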
Figure 5. Server-Client Pattern Diagram

This works by sending objects containing serializable process definitions over channels.

3.3 Process Movement associated Methods

As copying large amounts of data around a system would prove inefficient, notwithstanding the large overheads in processing and memory allocation, the system has to handle data manipulation locally, within one processor. This means that processes in their entirety have to be sent between nodes, to complete the full process, sharing as little data during computation as possible. The aim is to send only initial parameters and results. The methods implemented for this aspect of the system rely heavily on the JCSP API, the functions underpinning Groovy. Hence, the definitions and descriptions pertaining to JCSP methods below are based on, and paraphrased from, the API specifications hosted by the University of Kent at Canterbury (htt1). Implementing
process movement is covered in more detail in Chapter 6, which documents limitations and boundaries.

3.3.1 JCSP Process Manager

The ProcessManager class enables a CSProcess to be spawned concurrently with the process doing the spawning. This means we can have multiple processes running, and allows the nodes in the system to deal with multiple processes being sent on the same channel. Dealing with processes as they arrive allows the system to adhere to a client-server pattern, making the chances of deadlock in this area of the system very slim.

3.3.2 Process Definition Serialisation in Objects

In order to take advantage of the Process Manager's capabilities, process definitions need to be designed as CSProcesses. To do so, a process is defined in its entirety and encapsulated in an object whose class implements two interfaces: CSProcess and Serializable.

3.3.2.1 CSProcess

According to the JCSP documentation, "a CSP process is a component that encapsulates data structures and algorithms for manipulating that data" (htt). This means the data involved is private and cannot be accessed outside the object itself. Essentially, each instance of the process is alive, executing its own algorithms on its own data, and its actions are defined by a single run method. To avoid race hazards, the processes in this system do not require outside data or interaction with other running threads. Only primitive data types will be sent to activate switches or request new data. No procedures outside of defined data manipulation take place within the Process Manager.
3.3.2.2 Serializable

A Serializable class uses the java.io.Serializable interface, which allows instances of the class and its subtypes to be serialized for communication transfer. The interface itself does not have any methods but serves only to identify the semantics of being serializable. It should be noted here that CS classes not implementing this interface, such as CSTimer, do not conform to serializable semantics; this is covered later in this document.

3.3.3 Agents

The Agent interface implements both CSProcess and Serializable, but also adds connect and disconnect methods. These are used to connect input and output channels from the internal mechanisms of the sent process definition to an outside host.

Figure 6. Visual Representation of Agent running in Process Manager

The agent has two channels by which it connects to the host during runtime. This means the data inside the agent CSProcess can be influenced from outside the Process Manager. By exploiting the Agent interface, we can enable communication from outside threads during run time, giving agents access to two different code structures.
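The shape of the Agent pattern described above can be sketched without JCSP. The types below (`Agent`, `SummingAgent`, and plain queues standing in for channel ends) are hypothetical stand-ins for illustration, not the real JCSP API: serialisable state travels with the process definition, while channel ends are transient and must be re-connected at each host.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class AgentSketch {
    // Illustrative analogue of the JCSP Agent idea: runnable logic plus
    // connect/disconnect hooks, with the whole object serialisable.
    public interface Agent extends Serializable, Runnable {
        void connect(Queue<Integer> fromHost, Queue<Integer> toHost);
        void disconnect();
    }

    public static class SummingAgent implements Agent {
        private static final long serialVersionUID = 1L;
        private int total;                          // state travels with the agent
        private transient Queue<Integer> in, out;   // channel ends do not serialise

        public void connect(Queue<Integer> fromHost, Queue<Integer> toHost) {
            in = fromHost; out = toHost;
        }
        public void disconnect() { in = null; out = null; }

        public void run() {                         // one burst of the agent's logic
            Integer v;
            while ((v = in.poll()) != null) total += v;
            out.add(total);
        }
    }

    public static void main(String[] args) throws Exception {
        SummingAgent agent = new SummingAgent();
        Queue<Integer> out = new ArrayDeque<>();
        agent.connect(new ArrayDeque<>(List.of(1, 2, 3)), out);
        agent.run();                                // total becomes 6
        agent.disconnect();

        // Serialise the agent, as if shipping it to another node, and restore it.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        oos.writeObject(agent);
        oos.flush();
        SummingAgent moved = (SummingAgent) new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray())).readObject();

        // The restored copy reconnects and continues from its saved total.
        moved.connect(new ArrayDeque<>(List.of(10)), out);
        moved.run();                                // total becomes 16
        System.out.println("totals sent: " + out);  // prints "totals sent: [6, 16]"
    }
}
```

Marking the channel ends `transient` is the key design point: only the agent's state crosses the network, and each host supplies fresh channel ends via `connect`, mirroring the order of events the text describes.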
4 Initial Experiments

In order to evaluate which methods would lead to a successful system, the aforementioned methodologies were investigated and implemented in different circumstances, testing for compatibility with the project.

4.1 Monitoring CPU usage

Monitoring CPU usage would take place in two stages: designing code which generates high usage, and code which can interpret CPU usage as a percentage. Results would be compared in conjunction with the Task Manager and Resource Manager native to Windows 10.

Screenshot 1. Windows 10 Task Manager and Resource Manager

4.1.1 Creating Work

Creating work consisted of two different functions which would change intermittently to test increases in CPU usage. Small work creates an int value, comprising a basic multiplication operation, followed by a timer to create time between operations. CSTimers, as part of JCSP, work as guards for the code within an ALT, meaning no processing is wasted during execution. For larger CPU usage, a more complicated calculation is run to generate more work, creating a long variable, as seen below:
double j = Math.pow(Math.pow(60339 * 339398 / 2 * 33323, 2348958), 3.0e10)
         * Math.pow(Math.pow(454339L * 339765645398L / 26 * 354563323, 2.3484564595e9), 3.0000045645e15);

4.1.2 Monitoring Work

A basic system was implemented to create expected, repeatable workloads on the CPU, which could be measured to inspect whether monitoring usage was successful. The system of operations is shown as a process diagram in figure 7 below.

Figure 7. Test Process Diagram
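The "small work" generator from 4.1.1 can be sketched as below. This is a stdlib-only stand-in: Thread.sleep replaces JCSP's CSTimer, and the method name smallWork is illustrative. With a real CSTimer the wait would be an ALT guard consuming no CPU, rather than a sleeping thread.

```java
// Sketch of the small-work generator: a cheap multiplication followed by an
// idle gap. Thread.sleep stands in for a CSTimer guard here.
public class SmallWork {
    static int smallWork(int x) {
        return x * 31;                    // trivial operation: negligible CPU load
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            int v = smallWork(i);
            Thread.sleep(100);            // pause between operations
            System.out.println(v);
        }
    }
}
```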
The process is simple: a timer is set for a predetermined time, during which a process of high CPU usage runs. CPU usage at this point was verified against the Task Manager seen in screenshot 1.

4.1.3 Accessing CPU Usage

Measuring CPU usage is difficult in Java. Firstly, in order for this project to succeed, we need to distinguish the actual work being done on a processor from the memory usage of the JVM. The latter is very easily accomplished with native Java commands, but as any Java program is essentially interpreted by the system as a 'process', it cannot access the necessary tools to gain CPU usage insight in the likeness of the Task Manager (screenshot 1).

4.1.3.1 Native Monitoring

There are ways to obtain CPU usage which do not offer real-time performance monitoring but can be based on timed events. For multi-threaded tasks, ThreadMXBean methods can give the CPU usage and user time for any running thread. However, using operatingSystemMXBeans (explained in Chapter 2, figure 2) only returns the CPU usage for all JVMs running; it cannot distinguish between processes with different PIDs. In screenshot 2, we can see the relationship between two JVMs working concurrently.

Screenshot 2. Console Log: Base Reading of CPU usage on Client 1

Client 1 (right) is using independent code to monitor itself whilst Client 2 (left) is waiting for work. operatingSystemMXBeans return CPU use with 1 being 100% usage and 0 being 0%. At the moment of monitoring, the system sits at 12% usage.
Screenshot 3. Console Log: Client 2 affecting Client 1 CPU readings

However, as new processes are started in Client 2, Client 1 continues to report high CPU consumption, proportional to the work of Client 2, despite having no work itself. operatingSystemMXBeans are further influenced by any other Java application running. Hence a way to distinguish between running JVMs had to be identified.

It should be mentioned that as of Java 9, there is a new process API that allows the user to get the current process ID. However, at the date of writing, this was still in beta testing and Java 8 was opted for due to its comparative stability.

4.1.3.2 JNI interface

C affords the low-level access to physical components needed to identify a JVM's Process Identifier (PID). PIDs are numbers which uniquely identify a process while it runs, and are used in Linux, Unix, Mac OS X and Windows. The problem, however, is that system calls are defined differently on each OS. Language libraries need to be recompiled for the specific target operating system in order to utilise the particular underlying components of the operating system (kernel).
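The MXBean readings discussed in 4.1.3.1 can be reproduced with a short probe. On HotSpot JVMs the platform bean can be cast to com.sun.management.OperatingSystemMXBean, whose getProcessCpuLoad() returns a value in [0, 1] (or a negative value when no reading is available yet). Per its javadoc this reports the current JVM's own load, though the experiments above record readings influenced by other JVMs in practice; the class name CpuProbe is illustrative.

```java
import java.lang.management.ManagementFactory;

// Sketch of reading CPU load via the platform MXBean (HotSpot-specific cast).
public class CpuProbe {
    static double processCpuLoad() {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        return os.getProcessCpuLoad();   // 0.0–1.0, or negative if unavailable
    }

    public static void main(String[] args) {
        System.out.println("process CPU load: " + processCpuLoad());
    }
}
```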
As this research was beginning to deviate from the original project scope, delving further into low-level code, an API was imported to give multi-platform compatibility.

4.1.4 Sigar API

Sigar is a multi-platform API for Java and other languages. It allows the user to monitor per-process memory, CPU, credential info, state, arguments and other relevant information (MacEachern, n.d.). By incorporating Sigar, the program can produce percentages based on the amount of CPU usage attributed to the PID of a JVM.

4.1.5 Transferring Objects

By connecting two nodes with a TCP/IP connection, we can send an object very easily. Implementing the Serializable interface, an empty object is sent to another node at a defined IP. This ensured objects were being sent, not references. If a read was successful, a printed statement displaying "Success!" would appear in the Eclipse console.

4.1.6 Running Process Definitions

As process definitions can be contained within objects, a simple system can be created using two nodes and instances of a Process Manager. Process definitions are sent using a timer, testing one process running and then two concurrently, and the Task Manager is consulted to ensure processes are being run correctly.
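The object-transfer test described in 4.1.5 can be sketched over a loopback TCP connection. This is a minimal reconstruction, not the project's code: the class names SendObject and Token are illustrative, and port 0 asks the OS for any free port.

```java
import java.io.*;
import java.net.*;

// Sketch of the transfer test: an empty Serializable object is written over
// a loopback TCP connection and read back on the other side.
public class SendObject {
    static class Token implements Serializable {}     // the "empty object"

    static boolean transfer() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort());
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    out.writeObject(new Token());     // a copy is sent, not a reference
                } catch (IOException e) { throw new RuntimeException(e); }
            });
            sender.start();
            try (Socket s = server.accept();
                 ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
                Object received = in.readObject();
                sender.join();
                return received instanceof Token;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (transfer()) System.out.println("Success!");
    }
}
```

Because serialization writes the object's bytes into the stream, the receiver necessarily gets a distinct copy, which is the property the test above was verifying.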
5 Architectural Design

Throughout the project, many different systems were designed to monitor processes and set up a communication architecture which could theoretically facilitate this. The various designs are presented and critically evaluated below.

5.1 Central Repository

This design attempts to meet the aim of process movement. Each node has a Process Node which creates and runs a process on the attached Process Manager. The results are then sent to a host node which keeps track.
Figure 7. MK I: Host Node System Diagram

Each node would monitor the CPU usage of the JVM. Once a certain level is met, the process would be packed and sent to another node.

5.1.1 Central Repository - Issues

The problem with this system is that all channels must be created at initialisation, leaving no room for scalability. It is also essentially working on a ring topology and is more suited to a single system. This network is easily set up in a single JVM as well, meaning only references are passed rather than the actual objects. Although good for initial tests (scaled back to two nodes and a Host Node), the main drawback of this design is the ring element itself. This design was expanded to work with Agents below, where the problems of ring networks are explored in more detail.
5.2 Ring System with Travelling Agents

The Agent System opens up the network, allowing communication across different JVMs. The processes are no longer spawned within the node, but sent by a manager as Process Definitions. The Manager then runs the process whilst a monitor reviews CPU usage. When needed, an Agent is created with the relevant process.

Figure 8. Node Ring Network Diagram

5.2.1 Ring and Agents - Issues

It was during this iteration that the underlying principles of threads were explored in more detail and found to be non-serializable, meaning the running process could not be sent with the Agent in its current state. This meant the design would fundamentally not work: the agent could carry the process definition, but only in its unedited state, not mid-processing.

Not only that, but when dealing with task parallelism, ring systems are inherently prone to deadlock. As processes are created at nodes, the communication between ring elements proved to be non-deterministic, due to the uncertainty as to
which processes were being spawned where, and when they exceeded the pre-set CPU usage and needed to be moved. If too many events were triggered, all of the processes in the ring would attempt to output at the same time, resulting in deadlock. In a non-uniform network, where computer architectures differ (providing varying computational power), this problem would become more prevalent. To alleviate this, nodes could probe the ring first with empty packets and wait for them to return, but this would make half the network activity on the ring empty data packets; a detriment to efficiency.

5.3 Work & Node Manager System

The Work and Node Manager took the ring element out of the design and introduced server-client properties. The problem with this design is that the servers are very closely related and can end up as a closed system. The final prototype changed this.

Figure 9. Work and Node Manager Network Diagram
5.4 Network Structure Analysis

In order to minimise incidents of deadlock, the Client-Server pattern seemed most logical to implement. A server-oriented network permitted:

• Decreased chance of deadlock
• Process Discovery
  o Nodes receive a complete set of required processes
  o Allowing dynamic amendment of process definitions
• Process Control
  o User not restricted to only one choice
  o Timing of process delivery
• Centralised repository for client lists and results
• Scalability
  o Users added by location (IP) rather than assigned place
6 Introducing Process Movement

Process movement was easily implemented when it occurred on the same physical machine, as in the first two prototypes. However, complication increases when functionality is extended to a network. Process Definitions are easily sent in a static state, but getting the state of a process mid-execution requires finding all relevant data saved in the JVM.

6.1 Java Memory Model

In order for Java to be architecture neutral, it is built to operate and exist solely within memory (RAM). Hence, to mimic a computer's infrastructure, the JVM includes its own memory model. The Java memory model divides memory between thread stacks and the heap. It can be seen logically in figure 10.

Figure 10. Logical view of Java Memory Relations (Jenkov, n.d.)
Each thread running in the JVM has its own stack, which contains information about which methods have been called, the point of execution and the local variables for those methods. The local variables consist of primitive types and are fully stored within a thread stack. Hence, they cannot be seen by any other components of the JVM during execution. The heap contains all objects created in the Java application.

The main point of contention for moving processes in the JVM is the fact that all manipulation occurs within a thread stack. If the object containing the process definition being worked on is moved (even if the thread is suspended during processing), it will be moved in its original, unedited state.

6.2 Moving processes within a JVM

As all classes exist within a single JVM during runtime, initial tests for moving processes were misleading. Simply suspending a thread and calling said thread in another class leads to a seemingly successful process manoeuvre. This is achieved by suspending the process manager (essentially a concurrent thread) and sending it through a channel. In this case, as the channel connects two host processes within a single JVM, only the thread reference is communicated, meaning it has technically remained in the same place and is only being restarted; just by another process.

6.3 Thread Serialization impossible with current JVM

Each method run in a Java program has a stack frame associated with it. The stack frame holds the state of a method with three sets of data: the method's local variables, the method's execution environment and the method's operand stack. It would stand to reason that by copying these values at suspension, copying a thread could be achieved. However, the thread object would be allocated with none of the native implementation. The JVM emulates a machine for each instance a Java program is started, and a thread run on one of these machines becomes intricately tied into the internal mechanisms of that machine.
The context of operations is simply lost.
Reading the locations of the threads on the physical machine would prove difficult as well. Not only would this require a separate language to access the data, but memory allocation would have to be monitored from inside the JVM as well as outside. Hardware memory does not distinguish between the heap and threads; hence parts of a thread stack can be present in CPU caches as well as CPU registers.

Figure 11. Java Memory model interaction with CPU Memory Model (Jenkov, n.d.)

Also, Java relies on C procedures for some of its native methods. If the stack were to be copied, it may contain native Java methods that, in turn, have called C procedures. This indicates a complicated mixture of Java constructs and C pointers would have to be recorded. At this point, not only does this increase the amount of data to be transferred at once over a network, but it goes against the ethos of this investigation to find a solution with high abstraction. This is also why reconstructing bytecode (instructions used by the JVM resembling Assembler instructions) and monitoring the JVM instruction set have not undergone further investigation.3

3 Using the Java Class file disassembler proved to be a cumbersome method to determine the sequence of events and was essentially the lowest-level format possible with Java.
6.4 Adapting Process definitions as Agents

In order to move processes, we have to look at the object itself which is being edited. As the supertype class, Process Manager, is not serializable, the subtype object must assume responsibility for saving and restoring. As the process definitions already contain a run function, the system must be amended to stop the internal code from executing and retrieve the edited values. This means each process object must be created as a new instance, so as to keep track of its own local variables, and must have a method of communicating with the host process whilst running concurrently.

Adapting the processes to conform to an Agent interface introduces two new methods which allow this: connect and disconnect (agent seen in figure 6). The host is fitted with two new channels, generated at run time, which allow the agent to connect when received. The basic order of events can be seen in figure 12.

Figure 12. Order of Events for connecting to Agent
6.5 Sending process definitions in current state

In Java, objects can refer to themselves simply by calling "this", meaning that once the internal code has been paused and variables saved, the object itself can be packaged and written to a channel as a serializable object, to be run by a new Process Manager.

Figure 13. Method and Contents of Process (this)

This way, as long as the process definition contains all the run code required, the state of the process is reflected in the object state. This meets the requirements for process movement and is a main part of the prototype's design.
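The "package this" idea above can be sketched with plain Java serialization. This is a minimal stand-in, assuming the simplest possible process definition: Counter plays the role of the process object, step() the role of its run logic, and pack() serializes "this" exactly as the prototype writes the agent to a channel. All names are illustrative.

```java
import java.io.*;

// Sketch: a process object whose fields carry its progress. pack() serializes
// "this"; the deserialized copy resumes where the original stopped.
public class SelfPack {
    static class Counter implements Serializable {
        int progress = 0;
        void step() { progress++; }          // the "run" logic editing its own state

        byte[] pack() throws IOException {   // serialize this very object
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new ObjectOutputStream(buf).writeObject(this);
            return buf.toByteArray();
        }

        static Counter unpack(byte[] bytes) throws Exception {
            return (Counter) new ObjectInputStream(
                new ByteArrayInputStream(bytes)).readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counter c = new Counter();
        c.step(); c.step();                  // work done before the move
        Counter moved = Counter.unpack(c.pack());
        moved.step();                        // resumes from the saved state
        System.out.println(moved.progress);  // prints 3
    }
}
```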
7 Prototype

7.1 Design

The final implementation extends the Server-Client design by adding Process Nodes to the Universal Client, so multiple instances of a sent Process can be run concurrently whilst connecting to their respective host. It is based on the six paradigms for code mobility (Chalmers, Kerridge, & Romdhani, 2007):

• Client-server
  o Client executes code on the server.
• Remote evaluation
  o Remote node downloads code then executes it.
• Code on demand
  o Clients download code as required.
• Process migration
  o Processes move from one node to another.
• Mobile agents
  o Programs move based on their own logic.
• Active networks
  o Packets reprogram the network infrastructure.
In the case of this design, agents are being manipulated as a means of internal communication as well as movement. The final design is seen in figure 14.

Figure 14. Final Prototype, Server-Client Network

The Universal Node comprises a Node Monitor, which periodically checks the CPU usage of the JVM it is running in. In order to do so, a concurrent thread is spawned at run time with the sole purpose of returning the current CPU usage. Using Sigar, the CPU usage is checked every 10 milliseconds and, if it is above a certain threshold, a new node request is sent to the Access Server.

The Node Monitor has four Process Nodes which are connected by two one2one Channels. Each Process Node runs a Process Manager for incoming processes to connect with. At any given point, if either the Client or Server is waiting, it does so idle, consuming no processing power.
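The Node Monitor's sampling loop can be sketched as below. The CPU reading is abstracted as a DoubleSupplier (in the real system Sigar or an MXBean would supply it); the method and class names are illustrative, not the prototype's.

```java
import java.util.function.DoubleSupplier;

// Sketch of the monitor loop: poll every 10 ms and report once a sample
// crosses the threshold, at which point a new-node request would be sent.
public class MonitorSketch {
    static boolean watch(DoubleSupplier cpu, double threshold, int samples)
            throws InterruptedException {
        for (int i = 0; i < samples; i++) {
            if (cpu.getAsDouble() > threshold) {
                return true;                 // would trigger a new-node request
            }
            Thread.sleep(10);                // 10 ms sampling interval
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        double[] fake = {0.10, 0.35, 0.90}; // scripted readings for the demo
        int[] idx = {0};
        boolean tripped = watch(() -> fake[idx[0]++], 0.75, fake.length);
        System.out.println(tripped);         // trips on the 0.90 sample
    }
}
```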
Process Movement is handled mostly by nodes, to avoid over-reliance on the servers involved. If a process has to be stopped, it is sent directly from the Process Manager running it, straight to a new Client, rather than via the Access Server. This allows the system to move processes in the most direct manner conceived.

The system conforms to a Client-Server pattern between the Universal Node and the Access Server. They are connected at initialisation by an any2net (toAccess) and a numberedNet2One (processRecieve) Channel. This is also true for the relationship between the Access Server and the Process Servers; however, there is only one connection for interaction, as the Process Servers have nothing to return.

7.2 Components

Detailed below are all the components which connect the system together, as well as their role in the whole process.

7.2.1 Nodes

In the context of this system, Nodes are autonomous, concurrently running processes. They control connectivity to the process locations, deal with work and monitor CPU usage.

7.2.2 Node Monitor

The Node Monitor initialises the user system and creates a connection to the Access Server, adding its IP and port location on connection and removing said location when disconnecting. Currently, the server address is hard-coded, but any server with the same infrastructure could be added and defined by the user. It self-monitors its respective instance of a JVM for CPU usage and keeps track of which Process Nodes are in use. The Node Monitor requests processes to be run and delegates the work to the available Process Nodes asynchronously.
It can also stop Process Nodes from continuing work when CPU load is too high. It then selects the last Node activated, requests another Universal Client location from the Access Server and sends the location to the Process Node.

7.2.3 Process Nodes

Process Nodes receive process definitions and put them to work using a Process Manager. Each Process Node provides channel ends to which the process definitions can connect, facilitating interaction between the received process definition and the host. This connection allows the Process Node to inform Processes to stop and move when a new channel location is received, as well as alert the Node Manager when a process has finished.

7.2.3.1 Process Manager

The Process Manager (detailed in section 3.3.1) runs the processes received concurrently.

7.2.4 Channels

Channels comprise two channel ends:

• A channel input, where data is read into the system component
• A channel output, where data is written out of the system component

Channels in this system are one-to-one connections. The only exception is the stop line from Process Node to Process Manager. This is an any2one connection, where the input can come from any node but the output is a specific channel end.
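The any2one pattern above can be sketched by analogy with a JDK BlockingQueue: many writers share one input end while a single reader owns the output end. This is only an analogy under stated assumptions; JCSP's any2one channels add CSP synchronisation semantics a queue does not have, and the names here are illustrative.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of an any-to-one channel: several writer threads feed one shared
// line, mirroring the stop line shared by the Process Nodes.
public class Any2OneSketch {
    // n writer threads each put one message; the single reader drains them all.
    static Set<String> collect(int n) throws InterruptedException {
        BlockingQueue<String> stopLine = new LinkedBlockingQueue<>();
        ExecutorService writers = Executors.newFixedThreadPool(n);
        for (int i = 1; i <= n; i++) {
            final int id = i;
            writers.submit(() -> stopLine.add("stop from node " + id));
        }
        writers.shutdown();
        writers.awaitTermination(5, TimeUnit.SECONDS);
        Set<String> seen = new TreeSet<>();
        for (int i = 0; i < n; i++) seen.add(stopLine.take());
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(3));      // all three writers reach one reader
    }
}
```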
Figure 15. Any2One channel concept

7.2.5 Net Channel

Net Channels work in the same way as regular Channels, but the output is directed to a designated port at a new IP address.

7.2.5.1 Automatic Net Channels

Generated during runtime, Automatic Net Channels create a Channel Input on the fly and use input IP addresses as their location.

7.2.6 Servers

The Servers keep track of the Clients available and allow the Client hosts to initialise, waiting for processes to run.
7.2.6.1 Access Server

The Access Server has the IP locations of the Process Servers and connects users to the processes requested. The Process Servers' IP addresses are stored and connected whenever an instance of the associated process is requested by the user. This server deals with user access requests (capabilities; in this system, an interface), process requests, find-other-client requests and client dismissals.

7.2.6.2 Access Manager

The Access Manager registers newly initialised Nodes onto the server and keeps track of active clients. This is the basis for finding new client locations when a client becomes overloaded.

7.2.6.3 Process Servers

The Process Servers provide Process Definitions an IP address and port at which they are accessible by the Access Server. The Access Server must know the locations at initialisation in order to incorporate them into the Client capabilities. However, the Process Definitions themselves can be amended and adapted during runtime, as the location is the only parameter needed between requests.

7.2.7 Process Definitions

Process Definitions are objects with their own self-contained logic and variables, activated by a run method. They conform to the CSProcess and Serializable interfaces.

7.2.7.1 Agent Definitions

Agents afford the same capabilities as other Process Definitions but introduce connect and disconnect methods. This allows Processes to travel with channels defined, connecting on reception. It is up to the host process to establish the channel connections.
7.2.7.2 Agent Channels

The Agent Channels allow the host process to connect to the internal logic being run by the Process Manager. The channels are defined in the host process, to be connected on reception of the Agent (before running the agent process definition), during host run time, by the connect method. The input and output of the Agent, and the input and output of the host, are then connected together as seen in figure 16.

Figure 16. Internal Connection Mechanisms of Agent

7.2.8 Request Identification

Request objects allow the Access Server to react in the manner required, processing the data received in the correct way. The simplest is ClientRequestData, which dictates that the string sent within the object corresponds with the service needed (i.e. "Process Spawn" requires service B) and the address of the requesting client. Other requests comprise simple IP addresses which need to be interpreted in different ways. Address locations were packed into said objects to differentiate between the contexts in which they were to be treated. These include:

• ClientLocation
  o Registers the Client and sends capabilities
• LeaveRequest
  o Removes Client details from the Access Server
• NodeRequest
  o Requests another Client be found, with a different IP, to send processes to
• NewRequest
  o Same as NodeRequest, used exclusively by the Process Manager, and includes the Node's ID

7.2.9 Implementation

The system runs in the following manner.

Migration

• Process Servers are initialised, followed by the Access Server, at set IPs
• The Universal Client then instantiates itself with a base IP address and a randomly generated port. It starts four connected Process Managers.

Screenshot 4. Client Initialising UI

• If the port matches another, an error message is shown to try again (range 1 – 10,000)

Screenshot 5. Server Not Started or Crashed error message
• The Client connects to the Server and the Server enrols the Client into its list

Screenshot 6. Console log: Node registered on server

• The Server then sends back the Client capabilities

Screenshot 7. Basic user UI

• The Client can then choose different processes to call
  o It shows ready in the console; as the system does not need to show the general public its workings, the console in Eclipse is used to monitor transactions

Screenshot 8. Console Log: Node showing ready
• The service needed and the IP of the Client are then sent to the Access Server, which relays these values to the Process Server needed.
• The Process Server then sends the process to the requesting node directly
• The Universal Client node then assigns the work to one of its free Process Nodes and marks that node as unavailable

Screenshot 9. Console log: Node doing work and releasing Process Node 1 when finished

• When the first process is received, the Node Manager spawns a new thread to monitor the CPU usage.
• The process (agent) is then connected to the Process Node and the process is run.
• Once finished, the Process Node is released to work again
• At any given point, a new node can become active

Stopping and Moving

• Once a node consumes too much CPU, the Manager notifies the server that it needs a new node.
• Another node is chosen and the address returned to the requesting node
• The Manager then selects an active node (last Process Manager started) and sends a message with the new address to the Process Node.
• The Process Node then interprets that type of object and stops the Agent, whilst simultaneously letting the Node Monitor know it can release that Process Node
Screenshot 10. Console Log: When Process 4 starts, CPU is high (62%), the agent is contacted (I am reading), the Process is disconnected, sent (LETS GO) and Process Node 4 is released

• The Agent then packs itself and sends itself to the next node, where it continues
• When the node is closed, the server is alerted and removes it from its active clients

Screenshot 11. Console Log: Server deletes address

To clarify, a server interaction diagram has been created to reflect the order of events, in figure 17.
Figure 17. Server Interaction Diagram for Prototype

7.3 Experiment Setup

In order to test the validity of the system, the work described in 4.1.2 was completed 20,000 times per run, for a total of 20 runs, and each run was timed. The CPU usage was also recorded using MXBeans (for accuracy) and averaged. The experiment was conducted on computers with the specifications below.
Hardware

• CPU – i7 4770 @ 3.4GHz
• RAM – 16GB DDR3
• GPU – NVIDIA NVS 510 (2047 MB)
• OS – Windows 7 Professional 64-bit
• Network Speed – 1GB/s

7.4 Results

These experiments were conducted 12 times and the results averaged, ignoring the two polar outlying values:

1. A single computer running the processes sequentially, with processes hosted locally.
2. A single computer running the processes concurrently over 4 Process Nodes, with processes hosted on Process Servers.
3. Two computers running the load-balancing system, with processes from Process Servers.

The results are detailed below (figure 18).

Workers                        Average time taken   CPU Usage
1 CPU: 1 Sequential Worker     24.36 seconds        12%
1 CPU: 4 Concurrent Workers    10.18 seconds        87%
2 CPU: 8 Concurrent Workers    8.78 seconds         46%

Figure 18. Table of Experiment Results

7.5 Comparative Analysis

By visualising the data collected, we can see the correlation between the number of CPUs, time and work.
Figure 19. Test results Graph; CPU Usage and Time Spent

Speed:

• Increasing workers increases the speed of the work
  o This is not proportional to the number added, but a vast improvement
  o Directly proportional speed-up was never expected, due to communication overheads
• Adding an additional CPU caused a minor increase compared to increasing native resources
  o Due to synchronisation and distribution times, limited by connection protocols (network speed very fast)
  o Speed-up still apparent

CPU Usage

• CPU usage for a single process is very low
  o To be expected, as the CPU is doing the least possible at a time during execution of the test
• CPU usage increases seven times over for 4 workers
  o Although more CPU usage was expected to be consumed, it was not expected to grow this much.
• An added CPU for balancing reduces CPU usage to almost half
  o Considering the difference compared to sequential and concurrent methods, almost halving the stress is a great result

7.6 Local Concurrency Vs Distributed

The results trend toward better performance in terms of time and processing consumption. Performance does not, however, grow proportionally when more CPUs are added. It was assumed going into the experiments that there would be a boundary for performance based solely on communication times. Judging from the sharp change in CPU usage, however, we can conclude that the system does balance the load whilst increasing processing efficiency. This is logical: more workers, doing more things. With small amounts of work, however, sequential processing will yield better results, due to the nature of saving small values with little processing needed, compared to moving data around a network. However, small amounts of work are not what the system was designed for.
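The speed-up implied by the averaged timings in figure 18 can be checked with a line of arithmetic: perfect scaling would give 4x with four workers and 8x with eight, and the measured ratios fall well short, consistent with the communication overheads discussed in 7.5.

```java
// Speed-up = sequential time / parallel time, using the averages in figure 18.
public class SpeedUp {
    static double speedUp(double sequentialSeconds, double parallelSeconds) {
        return sequentialSeconds / parallelSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("4 workers, 1 CPU:  %.2fx%n", speedUp(24.36, 10.18));
        System.out.printf("8 workers, 2 CPUs: %.2fx%n", speedUp(24.36, 8.78));
    }
}
```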
8 Conclusion

8.1 Has the Project met its Aim and Objectives?

The aim of this project was to create a system which can distribute work and regulate said work over multiple computers, ensuring CPU usage does not exceed a specified threshold on each terminal. As the tests in 7.5 show, the functionality to facilitate regulation does exist in the current prototype. The main objectives stated in 1.2 are recapped and addressed below:

1) A method of monitoring CPU usage in the JVM over multiple CPUs must be implemented.

The Sigar API (and Java Beans to a certain extent) afford this functionality. By spawning a thread in the Universal Client's Node Monitor, the monitoring function remains active throughout execution. It is not affected by other events and allows constant vigilance. Although this project did set out to complete everything at a high level, this was one barrier which could not be dealt with otherwise. It can be argued, though, that most Java native methods run through the JNI are C underneath, so it still qualifies as implementation within the JVM.

2) Processes must have a way to be interrupted and saved in their current state.

With the system sending process definitions, the running position of a process using a Process Manager is reflected in the state of the object. By delegating saving responsibility to the subtype in process management, we can essentially pick up the work from a previously running instance.
As explored in chapter 6, it is impossible to serialize and send threads using high-level techniques; this method nevertheless yields a large amount of efficiency, provided variables are saved in a tolerable fashion.

3) Processes need to have a way to move and reinitialise at different nodes on different CPUs.

Using the Serializable interface, Channels, Process Managers and objects containing process definitions, this aspect of the system has been successfully implemented and rigorously studied.

It can be concluded that the objectives and aims have been accomplished. The system outlined at inception has been completed as a proof-of-concept, functional system, as long as the user controls the processes introduced. However, during development and implementation, more aspects have been identified which need to be addressed in order to label this project finished.

8.2 Deployment Analysis and Critique

8.2.1 CPU Monitoring Critique

With user supervision, the system can be seen to send, receive, run, stop and move processes. The CPU monitoring gives adequate coverage and timely response to spikes in CPU usage. Ideally, MXBeans should be used if implementation can be guaranteed in an environment with no other JVM instances running, as the results tend to be more accurate. Using the JNI and C results in CPU polling roughly once for every thousand instructions and gives insight into that instant of per-process CPU usage. Information available via Sigar (CPU usage time) does not update continuously and, being instantaneous, can sometimes return 0, making viable readings even more infrequent. However, the frequency and scope of accuracy are still adequate for this system to function.

8.2.2 Process Movement Critique

Within the time constraints, the project was built to prove that active process migration could be achieved, and the mechanics and theory behind the actual
process movement are sound. However, user-end process management requires more work.

The problem pertains to the number of Process Nodes at each Universal Client. As each process definition needs a manager to connect to, one Process Manager does not suffice for the intended process interaction. So, if more than 4 processes are sent, the Manager Node has no option for dealing with the excess process read. At this point the Client-Server environment breaks down, as the Client is no longer waiting for input, and a deadlock can occur if a Process Node is in a busy state at the point of reception. Having redundant nodes on the system which receive overloading processes could relieve nodes in this case, or more Process Nodes could simply be instantiated at run time. Adapting this aspect of the system really depends on whether the user intends to regulate large amounts of work in a cluster, or wants to use the program in the background of home systems to automate smaller projects. The scalability options of the system in these respects are a great resource.

8.3 Further Research and Work

Aside from user testing, small patches and implementing a targeted application (such as distributed raytracing), identified improvements in functionality are listed below.

8.3.1 Process Interaction

As currently distributed, each process sent must be a standalone procedure. For cross-process interaction, the main server would have to be more involved, keeping note of which processes have been distributed where. The list of current clients could be expanded into a list of lists, containing the Node address as well as the current processes. If we consider one process at each node for simplicity, cross-process interaction could be implemented as follows:
Figure 20. Node interaction diagram

1) A Client would request, from the server, additional data relevant to the process being run.
2) The Server, knowing which processes are running in the overall system, would find a node running the needed data and halt its procedure.
3) The required node would confirm it is ready to set up a connection with the other node. The requesting node must initiate the setup, to a node which is currently paused, because a channel must have its input end set up as a precondition for communication.
4) The new node address would be sent to the initial client, where the relevant Net Channels would be created automatically, similar to when moving processes, for transfer and control mechanics.
5) The nodes would then act like a client and server. The server node would send an initiation signal, causing the client node to run, and transfer would begin.
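The "list of lists" registry that step 2 relies on can be sketched in a few lines of standard Java. This is a minimal illustration under stated assumptions, not code from the prototype; the class name, the string-keyed addresses and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ClusterRegistry {
    // Each enrolled node address is mapped to the identifiers of the
    // processes currently running there (the "list of lists" idea).
    final Map<String, List<String>> running = new HashMap<>();

    void enrol(String nodeAddress) {
        running.putIfAbsent(nodeAddress, new ArrayList<>());
    }

    void record(String nodeAddress, String processId) {
        enrol(nodeAddress);
        running.get(nodeAddress).add(processId);
    }

    // Step 2 of the interaction protocol: locate the node hosting the
    // process that holds the data another client has requested.
    Optional<String> nodeRunning(String processId) {
        return running.entrySet().stream()
                .filter(e -> e.getValue().contains(processId))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

A real implementation would also have to keep this registry consistent as processes migrate, which is exactly the bookkeeping burden the text attributes to a more involved server.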
With some configuration of the implemented prototype's current infrastructure, this new system could be implemented successfully. The framework of this design is not hard to implement in theory, but the semantics and order of communication would have to be thoroughly deliberated upon.

8.3.2 Process Node Quantities

This is simply allowing the user to define how many Process Nodes they would like to initialise. In order to keep processing limits within a reasonable window, the user's processing capabilities would have to be assessed, limiting the number of concurrent processes. This would also require either the user or the developer to have prior knowledge of the estimated processing power that each individual process consumes; otherwise the system could spend a lot of time moving processes.

8.3.3 User Defined Processes

Implementing user-defined processes raises two specific points of contention:
1) Methods would have to be adapted to conform to Agent classes.
2) Code must be runnable. This means code would have to be scanned or tested at run time to ensure all aspects are serializable. This could be done by creating a Test Node comprising a try/catch system which returns exceptions when they are met. Having runnable code is the main function of the CSProcess class, so methods would have to be identified at input. This could include an interface which asks for variables and the associated process separately. Another method, which would involve having some knowledge of the system, would be implementing a wrapper class which could affix the required connect methods for Agents, provided the user understands CSProcesses.
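The try/catch Test Node described in point 2 can be prototyped with standard Java serialization: attempt to write the candidate object with an ObjectOutputStream and report any NotSerializableException up front, rather than discovering it at migration time. A minimal sketch (the class name is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializabilityCheck {
    // Returns true if the given object graph can be fully serialized.
    // Writing to an in-memory buffer exercises the whole graph, so a
    // non-serializable field buried inside the object is also caught.
    static boolean isFullySerializable(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isFullySerializable("a plain string")); // true
        System.out.println(isFullySerializable(new Object()));     // false
    }
}
```

Running the check before accepting a user-supplied process would let the Test Node reject unsuitable definitions with a clear error instead of failing mid-migration.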
8.3.4 Extended Network to Internet

This method is easily implementable, but does not conform to the aims of this report. By simply changing the node and server IP addresses from local to public IPs, the system's scale can be opened up to users in any location. The problem then lies with security: there are currently no security measures in place during communication. Although the mechanisms of the system are not commonplace in Java, objects are still a universally used data type.

8.3.5 Automated Process Delivery

As the system stands, the Universal Clients are tasked with acquiring processes. This was implemented to regulate the speed of requests and to allow easier debugging. Automated delivery could be implemented by keeping track, at the Server, of how many processes are running at each node. If the Server records a Client with free Process Nodes, it can continue to send more processes to the underloaded area. Polling for CPU usage on completion of tasks, to indicate whether more processes are needed, would result in a well-balanced system overall, but would also result in higher volumes of traffic. As previously stated, these options should be aligned to the chosen application of the system, and could be user controls put in place at initialisation.

8.4 Reflective Statements

During this project there have been multiple setbacks, some avoidable and some unforeseeable. As with most large projects, the developer will never be truly happy with what has been accomplished. Despite meeting the initial aims of this investigation and being relatively pleased with the finished product, there are still areas which could have been addressed sooner, and shortcomings which will not be repeated in the future.
1) Progress trail

The first objective of this project was establishing a method of progress monitoring, and a blog was created to document progress. However, the first incarnation was hacked after two weeks. This was a major setback and resulted in a decline in adequate tracking. In the future, security measures for a public web space will be adhered to. More importantly, a structured, documented development diary will be a higher priority; keeping track of developments and meetings would have led to a much more streamlined approach and a better implementation overall. This also pertains to the week 7 report, which took place in the form of a viva voce in the Napier Games Lab in the first week of December.

2) Inadequate background understanding

Going into this project, I believed I had sufficient understanding of the fundamental concepts and technologies involved to create this system. Searching for previous attempts at the problem proved fruitless (see Appendix A), indicating there was not a lot of reading on the subject. The IPO, although it has the same conceptual ethos, talks about accessing system hardware from a high-level language, and was naïve in some of its goals given the time permitted and the level of work expected. However, we never know the depth of our own ignorance: this proved true when, halfway through the project, I realised that a thread, the main method of running work, is not serializable. In future development, I will read not only papers on implementation, but also technical documentation on processes and data types, to ensure I grasp the conceptual as well as the technical limitations. In summation, I have learned that preparation and process are just as important as the actual development.
3) Time Management

For some of the project, personal circumstances dictated a lack of work, but time management could have been much better from the start. Wednesdays were established as work days, but this was not particularly adhered to at the start of the project. A Gantt chart was drafted, but after personal circumstances interfered it was not reviewed until after half the allotted time had transpired. More tests into the efficiency of the system can still be run and should be considered part of further work.
9 References

1. Austin, P., & Welch, P. (2008). CSP for Java (JCSP) 1.1-rc4 API Specification. Retrieved from CSP for Java: https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-doc/
2. Austin, P., & Welch, P. (2008). Interface CSProcess. Retrieved from CSP for Java: https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-doc/org/jcsp/lang/CSProcess.html
3. Chalmers, K. (2008). Investigating Communicating Sequential Processes for Java to Support Ubiquitous Computing. Edinburgh Napier University. Retrieved April 22, 2016, from https://www.researchgate.net/publication/239568086_INVESTIGATING_COMMUNICATING_SEQUENTIAL_PROCESSES_FOR_JAVA_TO_SUPPORT_UBIQUITOUS_COMPUTING
4. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2007, July 8-11). Mobility in JCSP: New Mobile Channel and Mobile Process Models. Retrieved April 24, 2016, from ResearchGate: https://www.researchgate.net
5. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2008). A Critique of JCSP Networking. The Thirty-First Communicating Process Architectures Conference (pp. 7-10). York: P. H. Welch et al. doi:10.3233/978-1-58603-907-3-27
6. Doallo, R., Expósito, R. R., Ramos, S., Taboada, G. L., & Touriño, J. (2013, May 1). Java in the High Performance Computing arena: Research, practice and experience. Science of Computer Programming, 78(5), 425-444. Retrieved April 22, 2016, from http://www.sciencedirect.com/science/article/pii/S0167642311001420
7. Doallo, R., Taboada, G. L., & Juan, T. (2009, April). F-MPJ: scalable Java message-passing communications on parallel systems. The Journal of Supercomputing, 60(1), 117-140. Retrieved April 22, 2016, from http://link.springer.com/article/10.1007/s11227-009-0270-0
8. Funika, W., Godowski, P., & Pęgiel, P. (2008). A Semantic-Oriented Platform for Performance Monitoring of Distributed Java Applications. Computational Science – ICCS 2008, 5103, 233-242. Retrieved April 22, 2016, from http://link.springer.com/chapter/10.1007/978-3-540-69389-5_27
9. Hoare, C. A. R. (2004). Communicating Sequential Processes. Prentice Hall International. Retrieved April 22, 2016, from http://www.usingcsp.com/cspbook.pdf
10. Islam, N., & Shoaib, S. (2002, June 24). US Patent No. US 7454458 B2. Retrieved April 22, 2016, from https://www.google.com/patents/US7454458
11. Jenkov, J. (n.d.). Java Memory Model. Retrieved from http://tutorials.jenkov.com/java-concurrency/java-memory-model.html
12. Kerridge, J. (2014). Using Concurrency and Parallelism Effectively (2nd ed.). Bookboon.
13. Lam, K. T., Luo, Y., & Wang, C.-L. (2010). Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime. Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on (pp. 1-11). Atlanta: IEEE. doi:10.1109/IPDPS.2010.5470461
14. Lemos, J., Simão, J., & Veiga, L. (2011). A2-VM: A Cooperative Java VM with Support for Resource-Awareness and Cluster-Wide Thread Scheduling. On the Move to Meaningful Internet Systems: OTM 2011, 7044, 302-320. Retrieved April 22, 2016, from http://link.springer.com/chapter/10.1007%2F978-3-642-25109-2_20
15. MacEachern, D. (n.d.). SIGAR. Hyperic. Retrieved from https://support.hyperic.com/display/SIGAR/Home
16. Meddeber, M., & Yagoubi, B. (2010, September 22). Distributed Load Balancing Model for Grid Computing. ARIMA Journal, 12. Retrieved April 22, 2016, from http://arima.inria.fr/012/pdf/Vol.12.pp.43-60.pdf
17. Olivier, S. (2008). Scalable Dynamic Load Balancing Using UPC. 2008 37th International Conference on Parallel Processing. Portland: IEEE. Retrieved April 22, 2016.
18. Oracle. (2015, February 14). Learn About Java Technology. Retrieved from Java: http://java.com/en/about/
19. Oracle. (2016). Interface OperatingSystemMXBean. Retrieved from Java Platform, Standard Edition 7: https://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html
20. Shaw, B. (n.d.). How the Java Virtual Machine (JVM) Works. Retrieved from http://www.codeproject.com/Articles/30422/How-the-Java-Virtual-Machine-JVM-Works
21. Winias, T. B., & Brown, J. S. (n.d.). Retrieved from http://www.johnseelybrown.com/cloudcomputingpapers.pdf
22. Xoreax Software Ltd. (2016). IncrediBuild. Retrieved from IncrediBuild Beyond Acceleration: https://www.incredibuild.com/
Appendix A. Searched Terms

All results from 2005 onwards were considered for inclusion. Some results were duplicated across searches, resulting in "0 Relevant" for later searches. Checked as of 22/04/2016.

• "Load Balancing in Java":
o "Distributed Load Balancing Model for Grid Computing" (Meddeber & Yagoubi, 2010) – Focuses on modelling topologies of balancing, with basic information on system implementation
o "Scalable Dynamic Load Balancing Using UPC" (Olivier, 2008) – Uses Unified Parallel C
o "Method and system for application load balancing" (US Patent No. US 7454458 B2, 2002) – Patent for a similar system with no implementation; only conceptual, with ambiguity in implementation
• "CPU load balancing in Java":
o "A Semantic-Oriented Platform for Performance Monitoring of Distributed Java Applications" (Funika, Godowski, & Pęgiel, 2008) – Platform for monitoring resources for online Java technologies
• "Java cluster computing":
o "Java in the High Performance Computing arena: Research, practice and experience" (Doallo, Expósito, Ramos, Taboada, & Touriño, 2013) – Looks into the methods facilitating high-performance code in Java (shared memory model, MPI, etc.)
o "F-MPJ: scalable Java message-passing communications on parallel systems" (Doallo, Taboada, & Juan, 2009) – A different MPI implementation
• "Load balancing cluster computing Java": 0 Relevant
• "CPU balancing cluster Java": 0 Relevant
• "Load balancing cluster JVM":
o "A2-VM: A Cooperative Java VM with Support for Resource-Awareness and Cluster-Wide Thread Scheduling" (Lemos, Simão, & Veiga, 2011) – Cluster infrastructure for cloud computing systems
o "Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime" (Lam, Luo, & Wang, 2010) – Builds a system based on global variables for the cluster, paying close attention to thread stacks
• "Load balancing cluster JCSP": 0 Relevant
• "Load balancing asynchronous cluster Java": 0 Relevant
• "CPU monitoring load balance cluster Java": 0 Relevant
• "Cluster process sending Java": 0 Relevant
Appendix Item 1. Basic concepts
Appendix Item 2. Agent structure
Appendix Item 3. Ring implementation Conversation
Appendix Item 4. Ring Evolution
Appendix Item 5. Extended Ring Elements
Appendix Item 6. Implementing Agent Channels
Appendix Item 7. Losing the Ring
Appendix Item 8. Closed Client Server
Appendix Item 9. Client Server with Managers
Appendix Item 10. Interacting Processes

Further comments and discussion can be found at http://honsproject.calumbeck.com/

C. Github analytics

Appendix Item 11. Work distribution by day
Appendix Item 12. Git Activity Concentrations
Appendix Item 13. Busy commit periods
Initial Project Overview

SOC10101 Honours Project (40 Credits)

Title of Project: CPU Load Balancer

Overview of Project Content and Milestones

The Main Deliverable(s):

I intend to create a system which monitors CPU core usage over a cluster of computers and calls another terminal to take on more load when one is starting to reach maximum capacity, increasing speed and efficiency overall. The system will use Agents which move around the system, arriving at each node (processor or core in this case) and connecting to its main processing stack to ascertain its current efficiency. Once finished, the Agent disconnects and moves itself on to the next core in the system. Using multiple agents will be a goal for the project, and attaining basic concurrency will be the first milestone. The system will be designed and implemented using the Groovy 2.3 libraries for Java, which allow the user to easily manipulate threads at a high level through the predominant use of message passing. It is not certain whether a hybrid of message passing and shared memory will be attainable, as pure message passing has a large overhead for copying messages from one process to another. This is not a problem at a high level of programming, but at CPU or even GPU instruction speeds it is worth noting that it is not certain whether it will have a positive or negative impact. Testing of the system will include the use of software metrics to ensure results are as expected in certain situations, such as the coherency of specific function calls at the point of load shifting. CPU usage will be constantly observed, compared across different methodologies, and documented and collated in full throughout the report. The final product will be discreet during use and will not increase processing overhead between operations when Agents are idle or in transit between nodes.
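The per-node usage check the Agents rely on can be approximated with the standard Java management API. This is a hedged sketch only (in plain Java rather than Groovy, using the portable one-minute load average rather than an instantaneous CPU percentage, and with an illustrative class name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class NodeProbe {
    // Rough per-core load figure an Agent could compare against a threshold.
    // getSystemLoadAverage() returns the 1-minute load average, or a
    // negative value on platforms (e.g. Windows) where it is unavailable.
    static double loadPerCore() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double avg = os.getSystemLoadAverage();
        return avg < 0 ? avg : avg / os.getAvailableProcessors();
    }

    public static void main(String[] args) {
        double load = loadPerCore();
        if (load > 0.8) { // arbitrary placeholder threshold
            System.out.println("Node saturated, requesting handover");
        }
        System.out.printf("load per core: %.2f%n", load);
    }
}
```

More precise instantaneous figures require the com.sun.management extension of this bean, or a native library such as Sigar, both of which the report discusses as trade-offs.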
It will be easy to initiate and close, with a basic visual monitoring system for the user, including concrete feedback for changes or problems. It should automatically detect the number of cores in use and be proficient across different architectures, although Intel-based chips will be the basis for development. It is not obvious at the moment whether the use of hyper-threading in conjunction will be possible, but it will be documented when attempted.
The Target Audience for the Deliverable(s):

As the system will spread over multiple computers, it will be hindered by physical restraints and the associated speed ramifications. Hence, as a proof of concept, the system will handle large computational problems which are not I/O dependent. As such, the system will be used to aid with large computations, or by those in need of makeshift data farms.

The Work to be Undertaken:
• Design a system which allows concurrent processing in a cluster computing environment
• Dealing with interaction with other devices over a network
o Adapting the system to work on mobile devices
• Comparative analysis of communication methods (i.e. Ethernet, Wi-Fi, etc.)
o Analysis of result output in correlation with message passing parameters
• Comparative tests on different hardware architectures

Additional Information / Knowledge Required:
• Java language
o Groovy library knowledge
• Concurrent and parallel architecture knowledge
• Fundamental Android understanding (for mobile development)
• CPU usage metrics

Information Sources that Provide a Context for the Project:

Background and Rationale:

Computer hardware has evolved, and so has the amount we attempt to run at any given point. From the initial single-core processors to the octa-cores of today, engineers have strived to build the most powerful computers at ever greater speeds. However, over time it has become apparent that the implementation methods we have been working from and towards are starting to level off. In the past, the first step in augmenting any computer in terms of speed and performance has been reducing transistor size and increasing clock speed accordingly. Intel co-founder Gordon E. Moore stated that the number of transistors able to fit on a processor would double every 18 months, fundamentally increasing the speed of computers for at least the
next decade. This model of thought is still used regularly in the computing industry today; however, it was initially stated in 1965, and many things have changed since then. The problems we face today are distance, heat and conduction. The physical distance between cache memory and cores is being reduced more and more; we are approaching almost instantaneous transmission, and this comes with another set of problems. Heat is generated when a CPU core is pushed to compute at the rates we demand, requiring ever more intricate ways to cool the system, and this can all come down to bad allocation of resources. We therefore need to look at how we balance our work. Software needs to reflect the modern multitasking environment that we have come to expect, and hence must change in order to cope with increasing demand, as hardware cannot be relied on to be the sole supporter in this venture. I plan to build a system which allows a proper allocation of the available resources and increases the efficiency of hardware use, in order to achieve a faster, more reliable system.

The Importance of the Project:

This project will be a proof of concept for using multiple computers in a personal environment to complete large computational problems, with little impact on performance as a whole, in a discreet manner.

The Key Challenge(s) to be Overcome:

The initial challenge will be to ascertain whether an agent can become active when CPU usage reaches a certain level on a terminal. On activation, the agent will report to a central repository of addresses and move to a new terminal with lower CPU usage. From here it should be able to display a message on this machine.
This will be done as outlined below:
• Use a Monte Carlo algorithm to process a large computation
o Create an Agent to look at CPU usage
o CPU usage should report high
o Have the Agent report to another resource
o Println "I am overloaded"
o Then build an event handler that has access to the channel which is waiting for input from the processor

From here, we can then move on to moving key data. The intention is to create a central repository of agents which then looks for a node which does not have an agent active. From here we can move resources to the new processor. The biggest challenge to overcome, if the above system is completed in due course, is to implement it on a single CPU. Using cores would be the ultimate goal, to spread use evenly on one terminal, but in choosing Java as the main platform, the JVM involved gives little potential for working over individual cores. Using a different language could be an answer, but would require a large amount of research and development. For the time being, what is detailed in the main deliverables is the main aim.
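The first two bullets above can be mocked up in plain Java: a Monte Carlo estimate of pi as the deliberately CPU-heavy computation, followed by the threshold check an Agent would perform. This is an illustrative sketch, not the project's code; the 0.8 threshold and the class name are arbitrary placeholders.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ThreadLocalRandom;

public class OverloadSketch {
    // Monte Carlo estimate of pi: draw random points in the unit square
    // and count how many fall inside the quarter circle. A convenient,
    // embarrassingly parallel stand-in for real cluster work.
    static double estimatePi(long samples) {
        long inside = 0;
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        return 4.0 * (double) inside / samples;
    }

    public static void main(String[] args) {
        double pi = estimatePi(2_000_000);
        // Agent-style check: load average per core against a threshold
        // (negative load average means the figure is unavailable).
        double perCore = ManagementFactory.getOperatingSystemMXBean()
                .getSystemLoadAverage()
            / Runtime.getRuntime().availableProcessors();
        if (perCore > 0.8) {
            System.out.println("I am overloaded");
        }
        System.out.printf("pi ~= %.3f%n", pi);
    }
}
```

With roughly a million samples the estimate is typically within a few thousandths of pi, which makes the computation both heavy enough to register on a CPU monitor and cheap to verify.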