An Investigation into Cluster CPU load balancing
in the JVM
Calum James Beck
Submitted in partial fulfilment of
the requirements of Edinburgh Napier University
for the Degree of Bachelor of Engineering with
Honours in Software Engineering
School of Computing
April 2016
Page | 2
Abstract
The JVM CPU Cluster Balancer is a scalable, proof of concept system designed
to distribute processes over a network to perform multiple tasks at once, in a
language of high abstraction. Once distributed, workers return results to an
access server, all while monitoring their respective CPUs for computational stress
in terms of CPU usage. CPUs incurring the set stress threshold then have their
respective processes moved to a less intensive area in the cluster, balancing work overall.
The system works by enrolling Universal Clients (CPUs waiting for work) with an
access server, which then requests processes to be sent from the user's desired
Process Server. Each Process comes in the form of a Process Definition
complying with the Agent interface, self-contained in an object. During run time,
the Process Definition object acts as a subtype of the process manager,
assuming responsibility for saving and restoring the state of the process.
Each Client has four Process Nodes which it can delegate work to. The selected
Process Node then connects to the received Process using two internal channels
and runs using an instance of a Process Manager. During runtime, the Client also
implements a Node Monitor which monitors the CPU usage of the Client in real
time. When a set percentage of CPU usage (stress) is reached, the Universal Client
informs the server that an alternative node, on a different machine, is needed to
finish the instance of work.
The Process Definition then stops its runnable logic. The server searches through
enrolled Clients and sends the address of an underwhelmed CPU in the cluster to
the requesting Node. A dynamic TCP/IP channel is then created between the
node and the foreign Process Manager. The process object is then serialised,
allowing it to be transferred in its paused state and resumed at the new client.
The system is developed using pre-set processes to ensure repeatability of
results and runs entirely on any system running the JVM.
This project results in a working system which can distribute work based on CPU
stress, but concludes that, in order to be considered complete, more functionality
needs to be added to find an adequate application for the system.
The Java language, JCSP, the Groovy scripting language and the Sigar
Application Programming Interface (API), which provides Java bindings to native C
system calls, have been used in this project. All code was written and compiled
using the Eclipse Mars IDE.
Contents
1 INTRODUCTION................................................................................................12
1.1 Background..............................................................................................................................................13
1.2 Aims and Objectives................................................................................................................................14
1.3 Scope and Limitations ............................................................................................................................15
1.4 Structure of Dissertation.........................................................................................................................16
2 BACKGROUND, KEY COMPONENTS AND THEORY....................................17
2.1 Data and Task Parallelism......................................................................................................................17
2.2 Hoare’s Communicating Sequential Processes (CSP).........................................................................17
2.3 Channels...................................................................................................................................................18
2.4 Groovy......................................................................................................................................................19
2.5 Communicating Sequential Processes for Java (JCSP).......................................................................19
2.6 Channel Mobility in JCSP......................................................................................................................19
3 METHODOLOGY...............................................................................................21
3.1 Monitoring CPU Usage...........................................................................................................................21
3.2 Process Creation and Distribution ........................................................................................................24
3.3 Process Movement associated Methods.................................................................................................26
4 INITIAL EXPERIMENTS....................................................................................29
4.1 Monitoring CPU usage............................................................................................................................29
5 ARCHITECTURAL DESIGN..............................................................................34
5.1 Central Repository..................................................................................................................................34
5.2 Ring System with Travelling Agents.....................................................................................................36
5.3 Work & Node Manager System.............................................................................................................37
5.4 Network Structure Analysis...................................................................................................................38
6 INTRODUCING PROCESS MOVEMENT ........................................................39
6.1 Java Memory Model ..............................................................................................................................39
6.2 Moving processes within a JVM............................................................................................................40
6.3 Thread Serialization impossible with current JVM.............................................................................40
6.4 Adapting Process definitions as Agents.................................................................................................42
6.5 Sending process definitions in current state.........................................................................................43
7 PROTOTYPE.....................................................................................................44
7.1 Design.......................................................................................................................................................44
7.2 Components.............................................................................................................................................46
7.3 Experiment Setup....................................................................................................................................55
7.4 Results ......................................................................................................................................................56
7.5 Comparative Analysis.............................................................................................................................56
7.6 Local Concurrency Vs Distributed........................................................................................................58
8 CONCLUSION...................................................................................................59
8.1 Has the Project met its Aim and Objectives?.......................................................................................59
8.2 Deployment Analysis and Critique........................................................................................................60
8.3 Further Research and Work..................................................................................................................61
8.4 Reflective Statements..............................................................................................................................64
9 REFERENCES...................................................................................................67
A. Searched Terms........................................................................................................................................70
B. Meeting Diagrams ....................................................................................................................................72
C. Github analytics........................................................................................................................................84
Initial Project Overview...............................................................................................86
SOC10101 Honours Project (40 Credits) .............................................86
List of Figures
FIGURE 1. BASIC CONCEPT OF PROCESS MIGRATION...............................14
FIGURE 2. JAVA BEANS STRUCTURE.............................................................22
FIGURE 3. BASIC JNI INTERFACE PROCESS..................................................23
FIGURE 4. VISUAL REPRESENTATION OF VALUE GENERATOR.................24
FIGURE 5. SERVER-CLIENT PATTERN DIAGRAM..........................................26
FIGURE 6. VISUAL REPRESENTATION OF AGENT RUNNING IN PROCESS
MANAGER............................................................................................................28
FIGURE 7. MK I: HOST NODE SYSTEM DIAGRAM..........................................35
FIGURE 8. NODE RING NETWORK DIAGRAM.................................................36
FIGURE 9. WORK AND NODE MANAGER NETWORK DIAGRAM..................37
FIGURE 10. LOGICAL VIEW OF JAVA MEMORY RELATIONS (JENKOV,
N.D.)......................................................................................................................39
FIGURE 11. JAVA MEMORY MODEL INTERACTION WITH CPU MEMORY
MODEL (JENKOV, N.D.).....................................................................................41
FIGURE 12. ORDER OF EVENTS FOR CONNECTING TO AGENT................42
FIGURE 13. METHOD AND CONTENTS OF PROCESS (THIS)........................43
FIGURE 14. FINAL PROTOTYPE, SERVER-CLIENT NETWORK.....................45
FIGURE 15. ANY2ONE CHANNEL CONCEPT...................................................48
FIGURE 16. INTERNAL CONNECTION MECHANISMS OF AGENT.................50
FIGURE 17. SERVER INTERACTION DIAGRAM FOR PROTOTYPE...............55
FIGURE 18. TABLE OF EXPERIMENT RESULTS.............................................56
FIGURE 19. TEST RESULTS GRAPH; CPU USAGE AND TIME SPENT.........57
FIGURE 20. NODE INTERACTION DIAGRAM...................................................62
List of Screenshots
SCREENSHOT 1. WINDOWS 10 TASK MANAGER AND RESOURCE
MANAGER............................................................................................................29
SCREENSHOT 2. CONSOLE LOG: BASE READING OF CPU USAGE ON
CLIENT 1...............................................................................................................31
SCREENSHOT 3. CONSOLE LOG: CLIENT 2 AFFECTING CLIENT 1 CPU
READINGS............................................................................................................32
SCREENSHOT 4. CLIENT INITIALISING UI.......................................................51
SCREENSHOT 5. SERVER NOT STARTED OR CRASHED ERROR MESSAGE
...............................................................................................................................51
SCREENSHOT 6. CONSOLE LOG: NODE REGISTERED ON SERVER..........52
SCREENSHOT 7. BASIC USER UI......................................................................52
SCREENSHOT 8. CONSOLE LOG: NODE SHOWING READY.......................52
SCREENSHOT 9. CONSOLE LOG: NODE DOING WORK AND RELEASING
PROCESS NODE 1 WHEN FINISHED................................................................53
SCREENSHOT 10. CONSOLE LOG: WHEN PROCESS 4 STARTS, CPU IS
HIGH (62%), AGENT IS CONTACTED (I AM READING), THE PROCESS IS
DISCONNECTED, SENT (LETS GO) AND PROCESS NODE 4 IS RELEASED
...............................................................................................................................54
SCREENSHOT 11. CONSOLE LOG: SERVER DELETES ADDRESS..............54
Acknowledgements
Firstly, I would like to profusely thank Professor Jon Kerridge who has been an
invaluable source of confidence and knowledge throughout this whole project. He
has been a guide and kept me steadfast in what needed to be completed through
challenging times.
Secondly, I’d like to thank Doctor Kevin Chalmers who has always been
compassionate and a nurturing presence throughout my time in University, from
my first to fourth year.
I would also like to personally thank Charlotte Leask for her constant support and
eternal patience throughout the whole process.
1 Introduction
As the world approaches the finite end of physical enhancement in computing, the
aim is to continue increasing speeds by finding new methods of surpassing these
limitations.
In the past, the first step in augmenting any computer in terms of speed and
performance has been to reduce transistor size and thereby increase speed.
Co-founder of Intel Gordon E. Moore stated that the number of transistors able to fit
on a processor would double every 18 months, fundamentally increasing the speed
of computers for at least the following decade. This model of thought is still used
regularly in the computing industry today; however, it was first stated in 1965, and
much has changed since then.
The problems we face today are distance, heat and conduction. The physical
distance between cache memory and cores is shrinking further and further; we are
approaching almost instantaneous transmission, and this brings its own set of
problems. Heat is generated when a CPU core is pushed to compute at the rates we
demand, requiring ever more intricate ways to cool the system, and much of this
can be traced to poor allocation of resources.
We therefore need to look at how we balance our work. Software needs to reflect the
modern multitasking environment that we have come to expect and must change in
order to cope with increasing demand, as hardware cannot be relied upon to be the
sole supporter in this venture. I plan to build a system which allows a proper
allocation of the resources available and increases the efficiency of hardware use in
order to achieve a faster, more reliable system.1
This project endeavours to meet these needs with a system which distributes
processes over a cluster of computers, regulating work based on CPU load. This is a
means of using idle CPUs without exceeding a threshold that would impede the
user's everyday use.
1 Taken from IPO
The final product aims to be a proof of concept that load balancing is possible in a
high-level language, in a portable environment. Hence, it demonstrates the means
and capabilities required to further develop a fully automated system for everyday
users with access to multiple Java-compatible devices.
1.1 Background
Most processing-enhancement implementations fall under cloud computing:
outsourcing processing to external data centres, platform services or application
hosting, whilst remotely managing computer resources (Winias & Brown, n.d.).
However, not all businesses have access to scalable hardware architectures, which
are expensive to build, run and maintain.
Shifting focus to performance, creating efficient software diminishes the need for
in-depth management of system architectures and is a fundamental code of conduct
for emerging professional IT bodies (such as the British Computer Society).
However, different programming languages support different levels of control over a
system. Programming in a language of high abstraction does not fundamentally
afford the efficiency that low-level languages can attain, while low-level languages
are platform-specific and do not lend themselves to portable methods.
Taking advantage of current user environments, rather than reimplementing code
or hardware, is therefore the most cost-effective and least disruptive route. This can
be done by effectively managing processing loads, maximising the capabilities of
the processing resources available.
Utilising idle CPU resources on a network of computers (a cluster) can
fundamentally speed up processing overall. In order to do so, these resources must
be directed to work together towards a common goal (i.e. task parallelism).
Many current systems, such as Incredibuild, implement this parallel design for build
environments, working with low-level code to facilitate high-level build concepts
(Xoreax Software Ltd., n.d.). With high-profile clients such as Microsoft, Google,
IBM and Disney using the product to maximise their system use, it is clear that this
task-distribution method works.
However, for the average user or start-up business, system specifics might still
prove elusive. So why not implement this distribution system in a portable,
high-level language?
Java is a widely used platform, built to be compiled in memory and run in a virtual
machine, with the aim of multi-platform portability. According to Oracle, 97% of
enterprise desktops run Java, alongside 3 billion mobile phones worldwide (Oracle,
2015). Building a system in Java allows the opportunity to port to multiple platforms
with relative ease, greatly widening the potential pool of networked devices that
could join the system.
It should be noted that, in researching this area, very little has been published on
load balancing in high-level languages in a cluster environment within the last 6-10
years. Appendix A documents the search criteria used and the relevancy of the
results.
1.2 Aims and Objectives
The aim of this project is to distribute and regulate processes over multiple CPUs in
a cluster setting using the Java programming language, with the Java Virtual
Machine (JVM) as the environment. This involves monitoring CPU usage in real
time, stopping processes which appear to overload a given terminal, and moving
them to CPUs experiencing less stress in the cluster.
Figure 1. Basic concept of process migration
The main objectives required to create such a system in practice are outlined below:
1) Monitor the CPU usage incurred by an instance of the JVM.
2) Processes must have a way to be interrupted and saved in their current state.
3) Processes need to have a way to move and reinitialise at different nodes, on
different CPUs.
This report documents the steps taken to achieve these goals from inception to
completion. This project aims to provide a system which endeavours to successfully
manage load over several terminals in a cluster, using a language with a high level of
abstraction: Java.
1.3 Scope and Limitations
In order to provide a proof of concept system within the project's allotted time,
certain areas of the project had to be kept within reasonable limitations. In this case,
a limited number of processes is programmed and sent automatically over the
cluster to ensure that overload can be attained with a known degree of certainty.
This means the system does not yet afford user input and runs fairly autonomously.
In addition, to show the scalability of the system, it must be ensured that the
computer distributing tasks runs at a proficient speed to facilitate access from
multiple user-end nodes, preferably with one underperforming CPU.
As the system relies on communication, many transmission options are available,
but these are restricted to TCP/IP network protocols. This form of communication
was chosen as it is a proven, reliable and widely used method supported by
virtually all operating systems and platforms on which Java can run.
This project also uses a Java scripting language called Groovy, which facilitates the
use of Communicating Sequential Processes for Java (JCSP). This allows the
manipulation of threads at a low level with high-level abstraction, resulting in a
parallelised system, and can use TCP/IP protocols as its main mechanism for
communication between systems.
As the project is intended to prove that Java can be utilised to distribute and
balance a system over a cluster, all aspects of the system will be implemented in
Java, within the constraints of the JVM, whilst maintaining a high level of
abstraction in the source code. Other programming languages will only be
considered when it is conceptually and physically impossible to implement the
requisites for completion with the author's current knowledge and skills.
1.4 Structure of Dissertation
The structure of this document is as follows:
• Section 2 introduces the background theory and key components behind the
message-passing mechanics of the system, which revolve around JCSP.
• Section 3 discusses the methods implemented throughout the project, as well
as the decisions made as a result of research, to reach the finished prototype.
• Section 4 presents the initial experiments conducted, documenting the
limitations and barriers which had to be overcome in order to develop a
functioning prototype.
• Section 5 describes the main incarnations of the system and how each
implementation led to a better design.
• Section 6 explains the mechanics behind moving processes and the
difficulties faced in doing so.
• Section 7 elaborates on and demonstrates the prototype system, reviewing
design and implementation as well as experimentation with the system.
• Section 8 details the results and evaluation of the system and project,
concluding with a critical evaluation covering shortcomings of the project and
possible avenues of future work on the system.
2 Background, Key Components and Theory
Throughout this report, the majority of the components described have been taught
through, and are defined by, "Using Concurrency and Parallelism Effectively" I & II
(Kerridge, 2014), which builds upon Hoare's Communicating Sequential Processes
(CSP) theory. Unless explicitly referenced otherwise, these are the main sources of
the information disclosed herein. This section explains the basic elements from
which the prototype product is derived.
2.1 Data and Task Parallelism
One of the driving forces in this project is concurrency and parallelism. Task
parallelism allows the user to run multiple processes simultaneously, on one CPU
or over a network. Sequential code follows a specified order, so programmers tend
not to think about the order of events in a system once it has been coded and
compiled.
In order for tasks to move around the intended system, processes must be fairly
autonomous and removed from the main body of code. This means that concurrent
and parallel code will have to stop and synchronise with each other on transfer;
interact in a timely manner so as not to disrupt running processes; and finish in an
expected order despite being intrinsically non-deterministic in nature, running on
different platforms at different speeds, all while the possibility of migration plays an
active role.
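As a minimal sketch of task parallelism (not code from this project; the class and task names are illustrative), two independent Java tasks can be submitted to a thread pool and allowed to finish in either order, with synchronisation happening only when the results are collected:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class TaskParallelismDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Two independent tasks run simultaneously; their completion order
        // is non-deterministic, so results are gathered via Futures.
        Future<Integer> sumTask = pool.submit(() -> {
            int sum = 0;
            for (int i = 1; i <= 100; i++) sum += i;
            return sum;
        });
        Callable<Integer> square = () -> 9 * 9;
        Future<Integer> squareTask = pool.submit(square);
        // get() blocks until each task has finished, re-imposing an order
        // on otherwise non-deterministic execution.
        System.out.println(sumTask.get() + " " + squareTask.get());
        pool.shutdown();
    }
}
```

Moving such a task to another machine, as this project requires, additionally demands that the task's state be captured and shipped, which plain executors do not provide.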
2.2 Hoare’s Communicating Sequential Processes (CSP)
Hoare's CSP concepts (Hoare, 2004) dictate that everything encapsulated in code
can be broken down into algebraic expressions. In this way, everything within
programming can be reduced to simple, understandable functions, rules and
patterns.
All code can thus be reduced to smaller chunks which can be moved around to suit
the success of the formula: what you see is what you get. The following mechanisms
facilitate this concept and are the basis of the end prototype.
2.2.1 Process
A Process is a piece of code that can be executed in parallel with other processes. A
network of processes form a solution to a single problem, with processes
communicating with each other using Channels (detailed in 2.3). Processes typically
contain repeating sequences of sequential code with communication interspersed.
Any process that is idle consumes no processor resources.
2.2.2 Timer
A Timer is a means of introducing time management into processes. Timers can be
read to find the current time and introduce delays or alarms for future events. They
can also be used in ALTs as guards for reading channels.
2.2.3 Alternatives (ALT)
An Alternative (ALT) allows the selection of one ready guard from several possible
guards. Guards come in three types: input communications, timers, or SKIPs, and
dictate how a process should proceed. A guard is ready if input is available, an
alarm time has passed, or SKIP is a defined guard; SKIPs are always ready and
allow guards to run continuously.
The ALT waits until a guard is ready and then executes the associated code. If
exactly one guard is ready, its code is executed. If more than one is ready, one is
selected according to predefined options and its code is obeyed. These options
include priority selection, when several guards are ready, or fair, turn-based
selection.
2.3 Channels
This is a main mechanic of the system described in this report, as the main aim is to
send processes over a cluster network. A Channel is a one-way, point-to-point,
unbuffered connection between two processes. Channels synchronise the processes
to pass data from one to another and do not use polling or loops to determine their
status, meaning no processing is consumed during transactions.
The first process attempting to communicate goes idle while it waits to
synchronise. The second process attempting to communicate then discovers the
situation, undertakes the data transfer, and both processes continue in parallel, or
concurrently if they are executed on the same processor. It does not matter which
process attempts communication first, as the mechanism is symmetric.
When communication between processors takes place, the underlying system
creates a copy of the data object and transfers it. As such, objects containing
process logic can be transferred, to be executed by a Process Manager and run
asynchronously; this forms the basis of the project.
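JCSP channels are the project's actual mechanism, but their synchronising behaviour can be approximated in plain Java with a SynchronousQueue, which likewise has no buffer and forces writer and reader to rendezvous. This is a rough analogy only, not a JCSP channel:

```java
import java.util.concurrent.SynchronousQueue;

public class ChannelSketch {
    public static void main(String[] args) throws Exception {
        // A SynchronousQueue has no buffer: put() blocks until a matching
        // take() arrives, mimicking a CSP channel's synchronising handover.
        SynchronousQueue<String> channel = new SynchronousQueue<>();
        Thread writer = new Thread(() -> {
            try {
                channel.put("hello");   // idles here until the reader is ready
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        String received = channel.take(); // completes the rendezvous
        writer.join();
        System.out.println(received);     // hello
    }
}
```

Either side may arrive first; whichever does simply waits, which matches the symmetric behaviour described above.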
2.4 Groovy
The Groovy scripting language allows the programmer to write concurrent systems
with a high level of abstraction and is underpinned by the four basic principles
detailed above.
2.5 Communicating Sequential Processes for Java (JCSP)
JCSP is based on Hoare's algebraic foundations, allowing virtual connections to be
created via NetChannelLocation structures sent between nodes. Using Java gives
the programmer the ability to send objects via serialisation: breaking an object
down into a sequence of bytes to be transferred (Chalmers, Kerridge, & Romdhani,
A critique of JCSP Networking, 2008).
With this framework, objects containing code definitions can be sent along with a
control signal that recreates the object at the receiving end. Communicating
Sequential Processes for Java is the cornerstone of this project and allows us to
build upon Hoare's concepts to create an easily understood communication
network.
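The serialisation step can be sketched with standard Java object streams; the `Counter` class here is a hypothetical stand-in for an object carrying process state, not the project's actual process definition:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerialisationDemo {
    // A hypothetical self-contained work item; only its fields (its state)
    // are serialised, never the executing thread.
    static class Counter implements Serializable {
        int count;
        void step() { count++; }
    }

    public static void main(String[] args) throws Exception {
        Counter original = new Counter();
        original.step();
        original.step();                            // count == 2

        // Serialise to bytes, as would happen before a network send...
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        // ...and recreate the object at the "receiving end".
        Counter copy;
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            copy = (Counter) in.readObject();
        }
        copy.step();                                // resumes from saved state
        System.out.println(copy.count);             // 3
    }
}
```

The receiving end gets a copy with the saved state intact, which is exactly the property the project relies on when pausing and moving a process.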
2.6 Channel Mobility in JCSP
Channel Mobility refers to the dynamic capabilities found when creating
self-propagating NetChannels and other communication models in this project.
Channels afford a robust connection between the input and output ends whilst
providing models that support the ubiquitous nature of the intended system
(Chalmers, Investigating Communicating Sequential Processes For Java To Support
Ubiquitous Computing, 2008).
As the project does not endeavour to change these underlying mechanisms, they
are presented at a high level only. It can, however, be stated that channel mobility is
paramount to attaining, transferring and moving processes successfully.
3 Methodology
The main aim of the system is to create a way to send processes from one node to
another within the same computing cluster, initiated by rising CPU usage at each
terminal. As such, there were three main problem areas which needed to be
addressed:
1. How to obtain CPU usage at any given time from within a JVM runtime.
2. How to create and deliver processes around a dynamic network.
3. How to stop a given process when CPU usage reaches a predetermined
threshold and send it to an underused node in the network.
3.1 Monitoring CPU Usage
At the time of writing, there were no pure Java APIs available for gathering CPU
information. The investigation therefore continued as a fact-finding exercise into
gathering as much system data as possible from within Java.
3.1.1 MBeans
MBeans are managed Java objects, similar to JavaBeans, which can represent a
device, an application or any resource that needs to be managed.
Figure 2. Java Beans Structure
This means we can monitor any of the resources being used by an instance of the
JVM. However, as an MBean can be any type of object and can expose attributes of
any type, each client has to implement class definitions every time an MBean is
called, which can itself lead to high overheads when queried repeatedly.
3.1.2 OperatingSystemMXBean
MXBeans are native to Java (1.6 upwards) and allow the user to utilize an MBean
with a reduced set of types, meaning there is no requirement for model-specific
classes. This makes the MBean accessible to any local or remote client, essentially
conforming to an interface.
OperatingSystemMXBean allows the user access to an interface developed for
retrieving system properties about the operating system on which the JVM is running.
This includes the free memory of the computer, the memory allocated to the JVM
and the CPU time dedicated to a task. MXBeans were the only native mechanism
provided by Java Management Extensions (JMX) which could facilitate the
objectives mentioned.
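As a minimal sketch of this mechanism, the platform bean can be queried through ManagementFactory. The class name below is illustrative; note that getSystemLoadAverage() is the portable reading and may return -1.0 on platforms (such as Windows) where it is unavailable.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Minimal sketch: querying the platform OperatingSystemMXBean for
// system properties visible to the JVM.
public class OsBeanProbe {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        System.out.println("OS:         " + os.getName() + " " + os.getArch());
        System.out.println("Processors: " + os.getAvailableProcessors());
        // -1.0 where the platform does not supply a load average
        System.out.println("Load avg:   " + os.getSystemLoadAverage());
    }
}
```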
3.1.3 Java Native Interface (JNI)
The Java Native Interface is a native programming interface that is part of the Java
Software Development Kit. JNI allows Java code to use code fragments and libraries
written in other languages such as C and C++.
While Java breaks code down into objects to be interpreted, C allows for procedural
code which is compiled and broken down into functions. The JNI connects Java
class methods with C functions, fundamentally allowing the programmer to call
C functions at any given time.
Figure 3. Basic JNI interface process
This allows the user access to lower levels of programming, where values such as
CPU usage can be read close to the hardware. Although this approach seems the
most enticing, it can destabilise a JVM instance through subtle C errors. Writing
small scripts may not pose a huge problem, but garbage collection is not handled by
the JVM in these instances, so a basic understanding of memory allocation is also
required. Additionally, using the JNI results in a system which is not wholly portable,
as the code written in C is platform specific.
3.2 Process Creation and Distribution
As one of the main prerequisites of this system, a network architecture had to be
designed to facilitate communication.2 This section will focus on how data and work
are generated to test the proof of concept system.
It should be noted that although the aim of the project is a proof of concept, the ideal
system would spawn multiple instances of work which would accumulate to a large
amount of CPU usage in order to adequately balance the system.
3.2.1 Value Generator
Here, different volumes of data are generated by a data generator and sent to a
Node to be processed. The perceived complexity of the data should be proportional
to the increase in CPU usage created.
Figure 4. Visual Representation of Value Generator
2. Development and iterative design are documented in Section 5.
This would require a fixed process at each node's initialisation to manipulate the
randomly generated data sets being produced by the generators. All interactions are
handled over channels, as shown above in figure 4.
3.2.2 Random Process selection
In this instance, each node would have access to pre-set process definitions which
would generate varying loads. At run time, a timer would be initiated requesting a
random process to run, one of which would create a large spike in CPU usage. This
would allow an overloaded state to be reproduced with a high degree of certainty
while demonstrating the system. The structure would be similar to the above but
would not require the DataGenerator, as the initial input would remain the same.
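The random-selection scheme can be sketched as follows. The workload bodies, the sampling odds and all names here are illustrative stand-ins, not the project's actual process definitions:

```java
import java.util.Random;

// Sketch: a pool of pre-set workloads is sampled on a timer, one of
// which is much heavier than the others, reproducing an overloaded
// state with reasonable certainty during a demonstration.
public class RandomLoadPicker {
    static long lightWork()  { return 339398L * 33323L; }   // cheap operation
    static double heavyWork() {                             // CPU spike
        double acc = 0;
        for (int i = 0; i < 2_000_000; i++) acc += Math.sqrt(i);
        return acc;
    }

    public static void main(String[] args) throws InterruptedException {
        Random rng = new Random(42);
        for (int tick = 0; tick < 5; tick++) {
            boolean heavy = rng.nextInt(4) == 0;            // 1-in-4 chance of the spike
            long t0 = System.nanoTime();
            if (heavy) heavyWork(); else lightWork();
            System.out.printf("tick %d: %s took %d us%n", tick,
                heavy ? "heavy" : "light", (System.nanoTime() - t0) / 1_000);
            Thread.sleep(50);                               // timer between selections
        }
    }
}
```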
3.2.3 Server Hosted Process Definitions
This method would require processes to be hosted remotely at a specific IP location
and requested by the client when needed. The client would access a server which
holds the network locations of all the relevant process servers. The request would
then be forwarded, with the client's location, to the process location, and the process
sent via a TCP/IP channel back to the client.
Figure 5. Server-Client Pattern Diagram
This would work by using objects containing serializable process definitions sent over
channels.
3.3 Process Movement and Associated Methods
As copying large amounts of data around a system would prove inefficient,
notwithstanding the large overheads in processing and memory allocation, the
system has to handle data manipulation locally, within one processor.
This means that processes have to be sent between nodes in their entirety to
complete the full process, sharing as little data during computation as possible. The
aim is to send only initial parameters and results.
The methods implemented for this aspect of the system rely heavily on the JCSP
API, which underpins the Groovy implementation used. Hence, the definitions and
descriptions of JCSP methods below are based on, and paraphrased from, the API
specifications hosted by the University of Kent at Canterbury (htt1). Implementing
process movement is covered in more detail in chapter 6, documenting limitations
and boundaries.
3.3.1 JCSP Process Manager
The ProcessManager class enables a CSProcess to be spawned concurrently with
the process doing the spawning. This means we can have multiple processes
running, and allows the nodes in the system to deal with multiple processes being
sent on the same channel.
Dealing with processes as they arrive allows the system to conform to a client-server
pattern, making the chance of deadlock in this area of the system very slim.
3.3.2 Process Definition Serialisation in Objects
In order to take advantage of the Process Manager's capabilities, process definitions
need to be designed as CSProcesses. To do so, a process is defined in its entirety
and encapsulated in an object.
In doing so, we ensure the object's class implements two interfaces: CSProcess and
Serializable.
3.3.2.1 CSProcess
According to the JCSP documentation, “a CSP process is a component that
encapsulates data structures and algorithms for manipulating that data” (htt). This
means the data involved is private and cannot be accessed outside the object itself.
Essentially, each instance of the process is alive, executing its own algorithms on its
own data, and its actions are defined by a single run method. To avoid race hazards,
the processes in this system do not require outside data or interaction with other
running threads. Only primitive data types will be sent to activate switches or request
new data. No procedures outside of defined data manipulation take place within the
Process Manager.
3.3.2.2 Serializable
A Serializable class implements the java.io.Serializable interface, which allows the
class and its subtypes to be serialized for communication transfer. The interface
itself does not have any methods but serves only to identify the semantics of being
serializable.
It should be noted here that CS classes which do not implement this interface, such
as CSTimer, do not conform to serializable semantics; this will be covered later in
this document.
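A minimal sketch of the combination, with Runnable standing in for JCSP's CSProcess (the class and helper names are illustrative): the definition's state travels with the serialized object, so a deserialized copy resumes from where the original left off.

```java
import java.io.*;

// Sketch: a process definition encapsulated in a serializable object.
// The round trip shows that the object's state, not a reference,
// survives serialization.
public class SerializableProcess implements Runnable, Serializable {
    private int counter = 0;
    public void run() { counter += 10; }       // the self-contained "algorithm"
    public int getCounter() { return counter; }

    static byte[] toBytes(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(o); }
        return bos.toByteArray();
    }
    static Object fromBytes(byte[] b) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(b))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SerializableProcess p = new SerializableProcess();
        p.run();                                   // state becomes 10
        SerializableProcess copy = (SerializableProcess) fromBytes(toBytes(p));
        copy.run();                                // the copy continues to 20
        System.out.println("original=" + p.getCounter() + " copy=" + copy.getCounter());
    }
}
```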
3.3.3 Agents
The Agent interface implements both CSProcess and Serializable but also adds
connect and disconnect methods. These are used to connect input and output
channels from the internal mechanisms of the sent process definition to an outside
host.
Figure 6. Visual Representation of Agent running in Process Manager
The agent has two channels by which it connects to the host during runtime. This
means the data inside the agent's CSProcess can be influenced from outside the
Process Manager. By exploiting the Agent interface, we can enable communication
from outside threads during run time, giving agents access to two different code
structures.
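The connect/disconnect idea can be sketched with BlockingQueues standing in for JCSP channels; the class and method names here are illustrative, not the project's actual Agent interface:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the Agent idea: the host attaches its channel ends to the
// agent's internal logic at runtime via connect(), so data inside the
// running agent can be influenced from outside.
public class EchoAgent implements Runnable {
    private BlockingQueue<Integer> in, out;

    public void connect(BlockingQueue<Integer> in, BlockingQueue<Integer> out) {
        this.in = in; this.out = out;
    }
    public void disconnect() { this.in = null; this.out = null; }

    public void run() {                            // doubles one value from the host
        try { out.put(in.take() * 2); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> toAgent = new ArrayBlockingQueue<>(1);
        BlockingQueue<Integer> fromAgent = new ArrayBlockingQueue<>(1);
        EchoAgent agent = new EchoAgent();
        agent.connect(toAgent, fromAgent);         // host attaches its channel ends
        Thread t = new Thread(agent); t.start();   // stand-in for ProcessManager
        toAgent.put(21);
        System.out.println("host received: " + fromAgent.take());
        t.join();
        agent.disconnect();
    }
}
```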
4 Initial Experiments
In order to evaluate which methods would lead to a successful system, the
aforementioned methodologies were investigated and implemented in different
circumstances, testing for compatibility with the project.
4.1 Monitoring CPU usage
Monitoring CPU usage would take place in two stages: designing code which
generates high usage, and code which can report CPU usage as a percentage.
Results would be compared against the Task Manager and Resource Monitor
native to Windows 10.
Screenshot 1. Windows 10 Task Manager and Resource Manager
4.1.1 Creating Work
Creating work consisted of two different functions which would change intermittently
to test increases in CPU usage. Small work creates an int value, comprising a basic
multiplication operation followed by a timer to create time between operations.
CSTimers, as part of JCSP, work as a guard for the code, acting as an ALT, meaning
no processing is wasted during execution.
For larger CPU usage, a more complicated problem has been run to generate more
work, creating a double variable, as seen below:
double j = Math.pow(Math.pow(60339.0 * 339398 / 2 * 33323, 2348958), 3.0e10)
    * Math.pow(Math.pow(454339.0 * 339765645398.0 / 26 * 354563323, 2.34845645958e11), 3.0000045645e15);
4.1.2 Monitoring Work
A basic system was implemented to create expected, repeatable workloads on the
CPU that could be measured to inspect whether monitoring usage was successful.
The system of operations is shown as a process diagram in figure 7 below.
Figure 7. Test Process Diagram
The process is simple: a timer is set for a predetermined time during which a
process of high CPU usage runs. CPU usage at this point is verified against the
Task Manager seen in screenshot 1.
4.1.3 Accessing CPU Usage
Measuring CPU usage from within Java is difficult. Firstly, for this project to succeed,
we need to distinguish the actual work being done on a processor from the memory
usage of the JVM. The latter is easily accomplished with native Java commands,
but as any Java program is essentially seen by the system as one ‘process’, it
cannot access the tools needed to gain CPU usage insight in the manner of the
Task Manager (screenshot 1).
4.1.3.1 Native Monitoring
There are ways to obtain CPU usage which do not offer real-time performance
monitoring but can be based on timed events. For multi-threaded tasks,
ThreadMXBean methods can give the CPU time and user time for any running
thread. However, using OperatingSystemMXBean (explained in section 3.1.2)
only returns the CPU usage for all JVMs running (i.e. it cannot distinguish between
processes with different PIDs). In screenshot 2, we can see the relationship between
two JVMs working concurrently.
Screenshot 2. Console Log: Base Reading of CPU usage on Client 1
Client 1 (right) is using independent code to monitor itself whilst Client 2 (left) is
waiting for work. operatingSystemMXBeans returns the use of the CPU with 1 being
100% usage and 0 being 0%. At the moment, on monitoring, the system sits at 12%
usage.
Screenshot 3. Console Log: Client 2 affecting Client 1 CPU readings
However, as new processes are started in Client 2, Client 1 reports high CPU
consumption proportional to the work of Client 2, despite having no work itself.
OperatingSystemMXBean readings are further influenced by any other Java
application running. Hence, a way to distinguish between running JVMs had to be
identified.
It should be mentioned that as of Java 9, there is a new process API that allows the
user to get the current process ID. However, at the time of writing this was still in
beta testing, and Java 8 was used due to its comparative stability.
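For illustration, both mechanisms can be sketched together: ThreadMXBean's per-thread CPU time, and the Java 9+ ProcessHandle API for the current PID (which was not available in the Java 8 environment the project targeted). The class name is illustrative:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hedged sketch: per-thread CPU time via ThreadMXBean, plus the
// Java 9+ process API for the current JVM's PID.
public class ThreadCpuProbe {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        double acc = 0;                      // burn a little CPU to have something to measure
        for (int i = 0; i < 1_000_000; i++) acc += Math.sqrt(i);
        long cpuNanos = threads.isCurrentThreadCpuTimeSupported()
                ? threads.getCurrentThreadCpuTime() : -1;   // -1 where unsupported
        long pid = ProcessHandle.current().pid();           // Java 9+ process API
        System.out.println("thread CPU ns: " + cpuNanos + ", pid: " + pid + ", acc: " + acc);
    }
}
```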
4.1.3.2 JNI interface
C affords the low-level access to physical components needed to identify a JVM's
Process Identifier (PID). PIDs are numbers which uniquely identify a process while it
runs and are used in Linux, Unix, Mac OS X and Windows.
The problem, however, is that system calls are defined differently on each OS.
Language libraries need to be recompiled for the specific target operating system in
order to utilize the particular underlying components of the operating system (kernel).
As this research was beginning to deviate from the original project scope by delving
further into low-level code, an API was imported to give multi-platform compatibility.
4.1.4 Sigar API
Sigar is a multiplatform API for Java and other languages. It allows the user to
monitor per-process memory, CPU, credential info, state, arguments and other
relevant information (MacEachern, n.d.). By incorporating Sigar, the program can
produce percentages based on the amount of CPU usage attributed to the PID of a
JVM.
4.1.5 Transferring Objects
By connecting two nodes via a TCP/IP connection, we can send an object very easily.
By implementing the Serializable interface, an empty object is sent to another node at
a defined IP. This was to ensure objects were being sent, and not references.
If a read was successful, a statement reading “Success!” would be printed to the
Eclipse console.
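A self-contained sketch of this test, assuming a loopback connection between two threads rather than two machines (the class names and payload are illustrative; port 0 lets the OS pick a free port):

```java
import java.io.*;
import java.net.*;

// Sketch of the object-transfer test: a serializable object is written
// over a local TCP connection and read back, printing "Success!" on a
// clean read. Sending bytes over a socket guarantees the object itself
// travels, not a reference.
public class ObjectTransferTest {
    public static class Payload implements Serializable { int value = 7; }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket("localhost", server.getLocalPort());
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    out.writeObject(new Payload());        // write the object
                } catch (IOException e) { throw new RuntimeException(e); }
            });
            sender.start();
            try (Socket client = server.accept();
                 ObjectInputStream in = new ObjectInputStream(client.getInputStream())) {
                Payload received = (Payload) in.readObject();
                if (received.value == 7) System.out.println("Success!");
            }
            sender.join();
        }
    }
}
```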
4.1.6 Running Process Definitions
As process definitions can be contained within objects, a simple system can be
created using two nodes and instances of a Process Manager.
Process definitions are sent using a timer, testing one process running and then two
concurrently, and the Task Manager is consulted to ensure processes are being run
correctly.
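The same experiment can be sketched with an ExecutorService as a stand-in for JCSP's ProcessManager (the real system runs ProcessManager instances on channel-connected nodes; the class name and workload here are illustrative):

```java
import java.util.concurrent.*;

// Sketch: spawning two process definitions concurrently, the way a
// ProcessManager spawns a CSProcess alongside its host.
public class MiniProcessManager {
    public static void main(String[] args) throws Exception {
        ExecutorService manager = Executors.newFixedThreadPool(2);
        CountDownLatch done = new CountDownLatch(2);
        Runnable def = () -> {                 // a trivial process definition
            double acc = 0;
            for (int i = 0; i < 100_000; i++) acc += Math.sqrt(i);
            done.countDown();
        };
        manager.submit(def);                   // first process running...
        manager.submit(def);                   // ...then two, concurrently
        boolean finished = done.await(10, TimeUnit.SECONDS);
        manager.shutdown();
        System.out.println("both finished: " + finished);
    }
}
```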
5 Architectural Design
Throughout the project, many different systems were designed to monitor processes
and to set up a communication architecture which could facilitate this. The various
designs are presented and critically evaluated below.
5.1 Central Repository
This design attempts to meet the aim of process movement. Each node has a
process node which creates and runs a process on the attached Process Manager.
The results are then sent to a host node, which keeps track of them.
Figure 7. MK I: Host Node System Diagram
Each node would monitor the CPU usage of the JVM. Once a certain level is met, the
process would then be packed and sent to another node.
5.1.1 Central Repository - Issues
The problem with this system is that all channels must be created on initialisation,
leaving no room for scalability. It also essentially works on a ring topology and is
more suited to a single system. This network is easily set up in a single JVM as well,
meaning only references are passed rather than the actual objects.
Although good for initial tests (scaled back to two nodes and a Host Node), the main
drawback of this design is the ring element itself. The design was expanded to work
with Agents below, where the problems of ring networks are explored in more detail.
5.2 Ring System with Travelling Agents
The Agent System opens up the network, allowing communication across different
JVMs. Processes are no longer spawned within the node, but sent by a manager as
Process Definitions.
The Manager then runs the process whilst a monitor reviews CPU usage. When
needed, an Agent is created with the relevant process.
Figure 8. Node Ring Network Diagram
5.2.1 Ring and Agents - Issues
It was during this iteration that the underlying principles of threads were explored in
more detail and found to be non-serializable, meaning a running process could not
be sent with the Agent in its current state. This meant the design fundamentally
could not work: the agent could carry the process definition, but only in its unedited
state, not mid-processing.
Moreover, when dealing with task parallelism, ring systems are inherently prone to
deadlock. As processes are created at nodes, the communication between ring
elements proved to be non-deterministic due to the uncertainty as to which
processes were being spawned where, when they exceeded the pre-set CPU
usage, and when they needed to be moved.
If too many events were triggered, all of the processes involved in the ring would
attempt to output at the same time, resulting in deadlock. In a non-uniform network,
where computer architectures differ (providing varying computational power), this
problem would become more prevalent.
To alleviate this, nodes could probe the ring first with empty packets and wait for
them to return, but then half the network activity on the ring would be empty data
packets; a detriment to efficiency.
5.3 Work & Node Manager System
The Work and Node Manager design took the ring element out and introduced
client-server properties.
The problem with this design is that the servers are very closely related and can
result in a closed system. The final prototype changed this.
Figure 9. Work and Node Manager Network Diagram
5.4 Network Structure Analysis
In order to minimise incidents of deadlock, the Client-Server pattern seemed the
most logical to implement. A server-oriented network permitted:
• Decreased chance of deadlock
• Process Discovery
o Nodes receive a complete set of required processes
o Allowing dynamic amendment of process definitions
• Process Control
o User not restricted to only one choice
o Timing of process delivery
• Centralised repository for client lists and results
• Scalability
o Users added by location (IP) rather than assigned place
6 Introducing Process Movement
Process movement was easily implemented when it occurred on the same physical
machine, as in the first two prototypes. However, the complication increases when
functionality is extended to a network.
Process Definitions are easily sent in a static state, but getting the state of a process
in execution requires finding all the relevant data saved in the JVM.
6.1 Java Memory Model
In order for Java to be architecture neutral, it is built to operate and exist solely within
memory (RAM). Hence, to mimic a computer's infrastructure, the JVM inherently
includes its own memory model.
The Java memory model divides memory between thread stacks and the heap. It can
be seen logically in figure 10.
Figure 10. Logical view of Java Memory Relations (Jenkov, n.d.)
Each thread running in the JVM has its own stack, which contains information about
which methods have been called, the point of execution, and the local variables of
those methods. Local variables consist of primitive types and are stored fully within
a thread stack. Hence, they cannot be seen by any other component of the JVM
during execution.
The heap contains all objects created in the Java application. The main point of
contention for moving processes in the JVM is the fact that all manipulation occurs
within a thread stack. If the object containing the process definition being worked on
is moved (even if the thread is suspended during processing), it will be moved in its
original, unedited state.
6.2 Moving processes within a JVM
As all classes exist within a single JVM during runtime, initial tests for moving
processes were misleading. Simply suspending a thread and calling that thread
from another class leads to a seemingly successful process manoeuvre.
This is achieved by suspending the process manager (essentially a concurrent
thread) and sending it through a channel. In this case, as the channel connects two
host processes within a single JVM, only the thread reference is communicated,
meaning the thread has technically remained in the same place and is merely being
restarted by another process.
6.3 Thread Serialization impossible with current JVM
Each method run in a Java program has a stack frame associated with it. The stack
frame holds the state of a method with three sets of data: the method's local
variables, the method's execution environment and the method's operand stack.
It would stand to reason that, by copying these values at suspension, copying a
thread could be achieved. However, the thread object would be allocated with none
of its native implementation. The JVM emulates a machine for each instance of a
Java program, and a thread run on one of these machines becomes intricately tied
into the internal mechanisms of that machine. The context of operations is simply
lost.
Reading the locations of the threads on the physical machine would prove difficult as
well. Not only would this require a separate language to access the data, but memory
allocation would have to be monitored from inside the JVM as well as outside.
Hardware memory does not distinguish between the heap and thread stacks; hence
parts of a thread stack can be present in CPU caches as well as in CPU registers.
Figure 11. Java Memory model interaction with CPU Memory Model (Jenkov, n.d.)
Also, Java relies on C procedures for some of its native methods. If the stack were to
be copied, it might contain native Java methods that have, in turn, called C
procedures. This implies a complicated mixture of Java constructs and C pointers
would have to be recorded.
At this point, not only does this increase the amount of data to be transferred over a
network at once, but it goes against the ethos of this investigation: to find a solution
with high abstraction. This is also why reconstructing bytecode (the instructions used
by the JVM, resembling assembler instructions) and monitoring the JVM instruction
set have not undergone further investigation.3
3. Using the Java class file disassembler proved to be a cumbersome way to determine the
sequence of events and was essentially the lowest-level format available in Java.
6.4 Adapting Process definitions as Agents
In order to move processes, we have to look at the object being edited itself. As the
supertype class, Process Manager, is not serializable, the subtype object must
assume responsibility for saving and restoring its state.
As the process definitions already contain a run function, the system must be
amended to stop the internal code from executing and to retrieve the edited values.
This means each process object must be created as a new instance, so as to keep
track of its own local variables, and must have a method of communicating with the
host process whilst running concurrently.
Adapting the processes to conform to an Agent interface introduces two new
methods which allow this: connect and disconnect (the agent is seen in figure 6). The
host is fitted with two new channels, generated at run time, which allow the agent to
connect when received. The basic order of events can be seen in figure 12.
Figure 12. Order of Events for connecting to Agent
6.5 Sending process definitions in current state
In Java, an object can refer to itself simply by calling “this”, meaning that once the
internal code has been paused and the variables saved, the object itself can be
packaged and written to a channel as a serializable object, to be run by a new
process manager.
Figure 13. Method and Contents of Process (this)
This way, as long as the process definition contains all the run code required, the
state of the process is reflected in the object's state. This meets the requirements for
process movement and is a main part of the prototype's design.
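The technique can be sketched in plain Java, with serialization to a byte array standing in for the channel write; the class and field names are illustrative:

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the key idea: once the internal loop is paused, the object
// packages itself ("this") as a serializable value, and a deserialized
// copy resumes from the saved state.
public class MovableCounter implements Runnable, Serializable {
    private int progress = 0;                        // saved state travels with "this"
    private transient AtomicBoolean stop = new AtomicBoolean(false);

    public void requestStop() { stop.set(true); }
    public int getProgress() { return progress; }

    public void run() {
        if (stop == null) stop = new AtomicBoolean(false);  // fresh flag after transfer
        while (progress < 10 && !stop.get()) progress++;
    }

    public static void main(String[] args) throws Exception {
        MovableCounter original = new MovableCounter();
        original.progress = 4;                        // pretend it was paused mid-work
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(original);                    // "write this to a channel"
        oos.flush();
        MovableCounter moved = (MovableCounter) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        moved.run();                                  // resumes from 4, runs to 10
        System.out.println("resumed and finished at: " + moved.getProgress());
    }
}
```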
7 Prototype
7.1 Design
The final implementation extends the Server-Client design by adding Process Nodes
to the universal client, so multiple instances of a sent Process can be run
concurrently whilst connecting to their respective hosts.
It is based on the six paradigms for code mobility (Chalmers, Kerridge, & Romdhani,
2007):
• Client-server
o Client executes code on the server.
• Remote evaluation
o Remote node downloads code then executes it.
• Code on demand
o Clients download code as required.
• Process migration
o Processes move from one node to another.
• Mobile agents
o Programs move based on their own logic.
• Active networks
o Packets reprogram the network infrastructure.
In the case of this design, agents are used as a means of internal communication as
well as movement. The final design is seen in figure 14.
Figure 14. Final Prototype, Server-Client Network
The Universal node comprises a Node Monitor, which periodically checks the
CPU usage of the JVM it is running in. To do so, a concurrent thread is spawned at
run time with the sole purpose of returning the current CPU usage. Using Sigar, the
CPU usage is checked every 10 milliseconds and, if it is above a certain threshold, a
new node request is sent to the Access Server.
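The monitoring loop can be sketched as follows; the DoubleSupplier stands in for the Sigar per-PID reading, and the threshold, sample values and class name are all illustrative:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.DoubleSupplier;

// Sketch of the Node Monitor loop: a concurrent thread samples a CPU
// reading every 10 ms and raises a request once a threshold is crossed.
public class NodeMonitorSketch {
    public static void main(String[] args) throws InterruptedException {
        double threshold = 0.60;                       // 60% CPU
        double[] samples = {0.12, 0.25, 0.40, 0.62};   // simulated readings
        int[] index = {0};
        DoubleSupplier cpu = () -> samples[Math.min(index[0]++, samples.length - 1)];
        AtomicBoolean requested = new AtomicBoolean(false);

        Thread monitor = new Thread(() -> {
            while (!requested.get()) {
                if (cpu.getAsDouble() > threshold) {
                    requested.set(true);               // "new node" request to the server
                    System.out.println("threshold crossed: requesting new node");
                }
                try { Thread.sleep(10); } catch (InterruptedException e) { return; }
            }
        });
        monitor.start();
        monitor.join();
    }
}
```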
The Node Monitor has four Process Nodes, each connected by two one2one
Channels. Each Process Node runs a Process Manager for incoming processes to
connect with. At any given point, if either the Client or Server is waiting, it idles,
consuming no processing power.
Process movement is handled mostly by the nodes, to avoid over-reliance on the
servers involved. If a process has to be stopped, it is sent directly from the Process
Manager running it, straight to a new Client, rather than via the Access Server. This
allows the system to move processes in the most direct manner conceived.
The system conforms to a Client-Server pattern between the Universal Node and the
Access Server. They are connected at initialisation by an any2net (toAccess) and a
numberedNet2One (processRecieve) Channel. This is also true for the relationship
between the Access Server and the Process Servers; however, there is only one
connection for interaction, as the Process Servers have nothing to return.
7.2 Components
Detailed below are all the components which connect the system together, as well
as their roles in the whole process.
7.2.1 Nodes
In the context of this system, Nodes are autonomous, concurrently running
processes. They control connectivity to the process locations, deal with work and
monitor CPU usage.
7.2.2 Node Monitor
The Node Monitor initialises the user system and creates a connection to the Access
Server, adding its IP and port location on connection and removing that location
when disconnecting. Currently, the server address is hard coded, but any server with
the same infrastructure could be added and defined by the user.
It self-monitors its respective instance of a JVM for CPU usage and keeps track of
which process nodes are in use.
The Node Monitor requests processes to be run and delegates the work to the
available Process Nodes asynchronously.
It can also stop Process Nodes from continuing work when CPU load is too high. It
then selects the last Node activated, requests another Universal Client location from
the Access Server and sends the location to the Process Node.
7.2.3 Process Nodes
Process Nodes receive process definitions and put them to work using a Process
Manager. Each Process Node provides channel ends to which the Process
Definitions can connect, facilitating interaction between the received process
definition and the host.
This connection allows the Process Node to tell Processes to stop and move when a
new channel location is received, as well as to alert the Node Monitor when a
process has finished.
7.2.3.1 Process Manager
The Process Manager (detailed in section 3.3.1) runs the processes received
concurrently.
7.2.4 Channels
Channels comprise of two channel ends:
• A channel input where data is read into the system component
• A channel output where data is written out of the system component
Channels in this system are one-to-one connections. The only exception is the stop
line from Process Node to Process Manager. This is an any2one connection, where
the input can come from any node but the output is a specific channel end.
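The any2one concept can be sketched with a shared queue standing in for the JCSP Any2One channel: several writers share one writing end while a single reader owns the reading end (the class name and messages are illustrative):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the any2one idea: three writers may all signal on the
// shared "stop line", but only one consumer drains it.
public class Any2OneSketch {
    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> stopLine = new LinkedBlockingQueue<>();
        for (int i = 1; i <= 3; i++) {                 // any of three writers may signal
            final int id = i;
            new Thread(() -> stopLine.offer("stop from node " + id)).start();
        }
        for (int i = 0; i < 3; i++)                    // the single reader owns the input end
            System.out.println(stopLine.take());
    }
}
```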
Figure 15. Any2One channel concept
7.2.5 Net Channel
Net Channels work in the same way as regular Channels but the output is directed to
a designated port at a new IP address.
7.2.5.1 Automatic Net Channels
Generated during runtime, Automatic Net Channels create a Channel Input on the
fly and use input IP addresses as their location.
7.2.6 Servers
The Servers keep track of the Clients available and allow the Client hosts to
initialise, waiting for processes to run.
7.2.6.1 Access Server
The Access Server has the IP locations of the Process Servers and connects users
to requested processes. The Process Servers' IP addresses are stored and
connected to whenever an instance of an associated process is requested by the
user.
This server deals with user access requests (capabilities; in this system, an
interface), process requests, find-other-client requests and client dismissals.
7.2.6.2 Access Manager
The Access Manager registers newly initialised Nodes onto the server and keeps
track of active clients. This is the basis for finding new client locations when a client
becomes overloaded.
7.2.6.3 Process Servers
The Process Servers provide Process Definitions at an IP address and port at which
they are accessible to the Access Server. The Access Server must know these
locations at initialisation in order to incorporate them into the Client capabilities.
However, the Process Definitions themselves can be amended and adapted during
runtime, as the location is the only parameter needed between requests.
7.2.7 Process Definitions
Process Definitions are objects with their own self-contained logic and variables
activated by a run method. They conform to the CSProcess and Serializable
interfaces.
7.2.7.1 Agent Definitions
Agents afford the same capabilities as other Process Definitions but introduce
connect and disconnect methods. This allows Processes to travel with channels
defined, connecting on reception. It is up to the host process to establish the channel
connections.
7.2.7.2 Agent Channels
The Agent Channels allow the host process to connect to the internal logic being run
by the Process Manager. The channels are defined in the host process and then
connected on reception of the Agent (before running the agent's process definition),
during host run time, by the connect method.
The input and output of the Agent and the input and output of the host are then
connected together, as seen in figure 16.
Figure 16. Internal Connection Mechanisms of Agent
7.2.8 Request Identification
Request objects allow the Access Server to react in the required manner and
process the data received in the correct way. The simplest is ClientRequestData,
which dictates that the string sent within the object corresponds to the service
needed (i.e. “Process Spawn” requires service B) and carries the address of the
requesting client.
Other requests comprise simple IP addresses which need to be interpreted in
different ways. Address locations were packed into these objects to differentiate
between the contexts in which they were to be treated. These include:
• ClientLocation
o Registers the Client and sends capabilities
• LeaveRequest
o Removes Client details from the Access Server
• NodeRequest
o Requests that another Client with a different IP be found to send processes to
• NewRequest
o Same as NodeRequest, but used exclusively by the Process Manager and
includes the Node's ID
7.2.9 Implementation
The system runs in the following manner.
Migration
• Process Servers are initialised, followed by the Access Server, at set
IPs
• The Universal Client then instantiates itself with a base IP address and a
randomly generated port. It starts four Process Nodes, each with a Process
Manager connected.
Screenshot 4. Client Initialising UI
• If the port matches another, an error message is shown and the user asked to
try again (range 1 – 10,000)
Screenshot 5. Server Not Started or Crashed error message
• The Client connects to the Server, and the Server enrols the client into its list
Screenshot 6. Console log: Node registered on server
• The Server then sends back the Client capabilities
Screenshot 7. Basic user UI
• The Client can then choose different processes to call
o It shows “ready” in the Console; as the system does not need to show
the general public its workings, the Eclipse console is used to
monitor transactions
Screenshot 8. Console Log: Node showing ready
• The service needed and the IP of the Client are then sent to the Access
Server, which relays these values to the required process server.
• The Process Server then sends the process directly to the requesting node
• The Universal Client node then assigns the work to one of its free Process
Nodes and marks that node as unavailable
Screenshot 9. Console log: Node doing work and releasing Process Node 1 when finished
• When the first process is received, the Node Monitor then spawns a new
thread to monitor the CPU usage.
• The process (agent) is then connected to the Process Node and the
process is run.
• Once finished, the Process Node is released to work again
• At any given point, a new node can become active
Stopping and Moving
• Once a node's CPU usage exceeds the threshold, the Manager notifies the
server that it needs a new node.
• Another node is chosen and its address returned to the requesting node
• The manager then selects an active node (the last process manager started)
and sends a message with the new address to the Process Node.
• The Process Node then interprets that type of object and stops the Agent,
whilst simultaneously letting the Node Monitor know it can release that
Process Node
Screenshot 10. Console Log: When Process 4 starts, CPU is high (62%), agent is contacted (I
am reading), the Process is disconnected, sent (LETS GO) and Process Node 4 is released
• The Agent then packs itself and sends itself to the next node where it
continues
• When the node is closed, the server is alerted and removes it from its
active clients
Screenshot 11. Console Log: Server deletes address
To clarify, a server interaction diagram has been created to reflect the order of events
(Figure 17).
Figure 17. Server Interaction Diagram for Prototype
7.3 Experiment Setup
In order to test the validity of the system, the work described in 4.1.2 was completed
20,000 times per run for a total of 20 runs, and each run's duration was timed. The CPU
usage was also recorded using MXBeans (for accuracy) and averaged. The
experiment was conducted on computers with the specifications below.
Hardware
• CPU – i7 4770 @ 3.4GHz
• RAM – 16GB DDR3
• GPU - NVIDIA NVS 510 (2047 MB)
• OS - Windows 7 Professional 64-bit
• Network Speed – 1GB/s
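The measurement loop described above can be sketched roughly as follows. The `work()` body is a placeholder assumption (the real workload is the task from 4.1.2), and the JDK's `com.sun.management.OperatingSystemMXBean` stands in for the exact bean used in the experiments.

```java
import java.lang.management.ManagementFactory;

// Illustrative sketch of the timing harness: each run performs the unit of
// work a set number of times, recording elapsed time and sampling the
// process CPU load via the platform MXBean.
public class Experiment {
    // Stand-in workload; the real experiment ran the task described in 4.1.2
    static double work() {
        double acc = 0;
        for (int i = 1; i <= 1000; i++) acc += Math.sqrt(i);
        return acc;
    }

    public static double timedRunSeconds(int repetitions) {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) work();
        double seconds = (System.nanoTime() - start) / 1e9;
        // Fraction 0.0–1.0, or -1 before the first sample is available
        double cpuLoad = os.getProcessCpuLoad();
        System.out.printf("run: %.3fs, cpu load sample: %.2f%n", seconds, cpuLoad);
        return seconds;
    }
}
```

Averaging many such runs, as in 7.4, smooths out scheduler noise in both the timing and the CPU samples.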
7.4 Results
Each experiment was conducted 12 times and the results averaged, ignoring the
two polar outlying values. Three configurations were tested:
1. A single computer running the processes sequentially with processes hosted
locally.
2. A single computer running the processes concurrently over 4 process nodes
with processes hosted on process servers.
3. Two computers running the load balancing system with process from process
servers.
The results are detailed below (Figure 18).
Workers | Average time taken | CPU Usage
1 CPU: 1 Sequential Worker | 24.36 seconds | 12%
1 CPU: 4 Concurrent Workers | 10.18 seconds | 87%
2 CPU: 8 Concurrent Workers | 8.78 seconds | 46%
Figure 18. Table of Experiment Results
7.5 Comparative Analysis
By visualising the data collected, we can see the correlation between the number of
CPUs, time and work.
Figure 19. Test results Graph; CPU Usage and Time Spent
Speed:
• Increasing workers increases the speed of the work
o This is not proportional to the number added, but a vast improvement
o A directly proportional speed-up was never expected, due to
communication overheads
• Adding an additional CPU caused a minor increase compared to increasing
native resources
o Due to synchronisation and distribution times, limited by connection
protocols (the network speed is very fast)
o A speed-up is still apparent
CPU Usage:
• CPU usage for a single process is very low
o To be expected, as the CPU is doing the least work at any one time
during execution of the test
• CPU usage increases seven times over for 4 workers
o Although higher CPU usage was expected, it was not expected to
grow this much.
• The CPU added for balancing reduces CPU usage to almost half
o Considering the difference between the sequential and concurrent
methods, almost halving the stress is a great result
7.6 Local Concurrency Vs Distributed
The results trend toward better performance in terms of time and processing
consumption. Performance does not, however, grow proportionally when more CPUs
are added. Going into the experiments, it was assumed that there would be a
performance boundary based solely on communication times.
Judging from the sharp change in CPU usage, however, we can conclude that the
system does balance the load whilst increasing processing efficiency. This is
logical: more workers do more things.
With small amounts of work, however, sequential processing will yield better results,
since saving small values and performing little processing locally is cheaper than
moving data around a network. Small amounts of work, though, are not what the
system was designed for.
8 Conclusion
8.1 Has the Project met its Aim and Objectives?
The aim of this project was to create a system which can distribute work and regulate
set work over multiple computers, ensuring CPU usage does not exceed a specified
threshold on each terminal.
As the tests in 7.5 show, the functionality to facilitate regulation does exist in the
current prototype. The main objectives stated in 1.2 are recapped and addressed
below:
1) A method of monitoring CPU usage must be implemented in the JVM over
multiple CPUs.
The Sigar API (and Java's MXBeans to a certain extent) affords this
functionality. By spawning a thread in the Universal Client's Node Monitor,
the monitoring function remains active throughout execution. It is not affected
by other events and allows constant vigilance.
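The monitoring thread idea can be sketched as below. The prototype uses the Sigar API; this sketch substitutes the JDK's `OperatingSystemMXBean` for self-containment, and the threshold and poll interval values are assumptions.

```java
import java.lang.management.ManagementFactory;

// Minimal sketch of the Node Monitor: a daemon thread polls CPU load and
// flags when a threshold is crossed, at which point the Client would ask
// the Access Server for an alternative node.
public class NodeMonitor implements Runnable {
    private final double threshold;          // e.g. 0.6 for 60% CPU
    private volatile boolean stressed = false;

    public NodeMonitor(double threshold) { this.threshold = threshold; }

    public boolean isStressed() { return stressed; }

    @Override
    public void run() {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            double load = os.getProcessCpuLoad(); // may be -1 before first sample
            if (load >= threshold) stressed = true; // signal: request a new node
            try { Thread.sleep(250); } catch (InterruptedException e) { return; }
        }
    }

    public static Thread start(NodeMonitor m) {
        Thread t = new Thread(m, "node-monitor");
        t.setDaemon(true); // monitoring must not block JVM shutdown
        t.start();
        return t;
    }
}
```

Running the monitor on its own daemon thread is what keeps it unaffected by the work being executed on the Process Nodes.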
Although this project did set out to complete everything at a high level of
abstraction, this was one barrier which could not be dealt with otherwise. It can
be argued, though, that most Java native methods run C code through the JNI,
so this still conforms as implementation within the JVM.
2) Processes must have a way to be interrupted and saved in their current state.
With the system sending process definitions, the running position of a
process using a Process Manager is reflected in the state of the object. By
delegating saving responsibility to the subtype in process management, we
can essentially pick up the work from a previous running instance.
As explored in chapter 6, it is impossible to serialize and send threads using
high-level techniques; this method instead yields a large amount of efficiency,
providing variables are saved in a tolerable fashion.
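The state-in-the-object idea can be sketched as follows: progress lives in serializable fields, so a paused instance can be serialized, moved and resumed from its saved position. The class and field names are illustrative, not the prototype's actual process definitions.

```java
import java.io.*;

// Sketch of a state-carrying process definition. Because progress is held
// in ordinary fields, serializing the object captures the running position
// without needing to serialize a thread.
public class CountingProcess implements Serializable {
    private int next;          // saved position: work resumes from here
    private final int limit;
    private long sum;

    public CountingProcess(int limit) { this.limit = limit; }

    // Run a bounded slice of work, then stop (as when the monitor interrupts)
    public void runSlice(int steps) {
        for (int i = 0; i < steps && next < limit; i++) sum += next++;
    }

    public boolean finished() { return next >= limit; }
    public long result() { return sum; }

    // Round-trip through serialization, as if sent to another node
    public static CountingProcess migrate(CountingProcess p)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(buf);
        out.writeObject(p);
        out.flush();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
        return (CountingProcess) in.readObject();
    }
}
```

The migrated copy continues exactly where the original stopped, which is the behaviour the Process Manager relies on.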
3) Processes need to have a way to move and reinitialise at different nodes on
different CPUs.
Using the Serializable interface, Channels, Process Managers and objects
containing process definitions, this aspect of the system has been successfully
implemented and rigorously studied.
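The hand-off itself can be sketched with raw TCP streams, as below. The prototype uses dynamically created JCSP Net channels rather than raw sockets, and the port number and method names here are assumptions.

```java
import java.io.*;
import java.net.*;

// Sketch of the dynamic TCP/IP hand-off: the sender serializes the process
// object over a socket, and the receiving node reads it back for its own
// Process Manager to resume.
public class HandOff {
    public static void send(String host, int port, Serializable process)
            throws IOException {
        try (Socket s = new Socket(host, port);
             ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
            out.writeObject(process); // the agent "packs itself" and travels
        }
    }

    public static Object receiveOne(int port)
            throws IOException, ClassNotFoundException {
        try (ServerSocket server = new ServerSocket(port);
             Socket s = server.accept();
             ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
            return in.readObject(); // handed to the foreign Process Manager
        }
    }
}
```

As in the prototype, the receiving end must be set up first; the sender then connects to the address supplied by the Access Server.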
It can be concluded that the objectives and aims have been accomplished. The
system outlined at inception has been completed as a functional proof of concept,
as long as the user controls the processes introduced. However, during development
and implementation, further aspects were identified which need to be addressed
before this project can be labelled finished.
8.2 Deployment Analysis and Critique
8.2.1 CPU Monitoring Critique
With user supervision, the system can be seen to send, receive, run, stop and move
processes. The CPU monitoring gives adequate coverage and a timely response to
spikes in CPU usage. Ideally, MXBeans should be used if the environment can be
guaranteed to have no other instances running on the JVM, as the results tend to be
more accurate.
Using the JNI and C results in CPU polling roughly once for every thousand
instructions, and gives insight into a process's CPU usage at that instant. The
information available via Sigar (CPU usage time) does not update continuously and,
being instantaneous, can sometimes return 0, making viable readings even more
infrequent. However, the frequency and accuracy are still adequate for this system to
function.
8.2.2 Process Movement Critique
Within the time constraints, the project was built to prove that active process
migration could be achieved, and the mechanics and theory behind the actual
process movement are sound. However, user-end process management requires
more work.
The problem pertains to the number of Process Nodes at each Universal Client. As
each process definition needs a manager to connect to, one process manager does
not suffice for the intended process interaction. So, if more than 4 processes are sent,
the Manager Node has no way to deal with the excess processes read.
At this point the client–server environment breaks down, as the Client is no longer
waiting for input, and a deadlock can occur if a Process Node is in a busy state at the
point of reception. Having redundant nodes on the system which receive overflow
processes, or simply instantiating more Process Nodes at run time, could relieve
this.
Adapting this aspect of the system depends on whether the user intends to regulate
large amounts of work in a cluster, or wants to use the program in the background of
home systems to automate smaller projects. The system's scalability options in
these respects are a valuable resource.
8.3 Further Research and Work
Aside from user testing, small patches and implementing a targeted application (such
as distributed ray tracing), the identified improvements in functionality are listed
below.
8.3.1 Process Interaction
Currently, once processes are distributed, each process sent must be a standalone
procedure. To support interaction, the main server would have to be more involved,
keeping note of which processes have been distributed where. The list of current
clients could be expanded into a list of lists, containing the Node address as well as
the current processes. If we consider one process at each node for simplicity,
cross-process interaction could be implemented by doing the following:
Figure 20. Node interaction diagram
1) A Client would request additional data relevant to the process being run from
the server.
2) The Server, knowing which processes are running in the overall system, would
find a node holding the needed data and halt its procedure.
3) The required node would confirm it is ready to set up a connection with the
other node. The requesting node must initiate the setup, to a node which is
currently paused, because a channel must have its input end set up as a
precondition for communication.
4) The new node address would be sent to the initial client, where the relevant
Net Channels would be automatically created, similar to those used when
moving processes, for transfer and control mechanics.
5) The nodes would then act like a client and server. The server node would
send an initiation signal, causing the client node to run, and transfer would
begin.
With the current infrastructure of the implemented prototype, and some configuration,
this new system could be implemented successfully. The framework of this design is
not hard to implement in theory, but the semantics and order of communication would
have to be thoroughly deliberated upon.
8.3.2 Process Node Quantities
This is simply allowing the user to define how many Process Nodes they would like to
initialise. In order to keep processing limits within a reasonable window, the user's
processing capabilities would have to be assessed, limiting the number of concurrent
processes.
This would also require either the user or developer to have prior knowledge of the
estimated processing power that each individual process can consume; otherwise
the system could spend a lot of time moving processes.
8.3.3 User Defined Processes
Implementing user-defined processes raises two specific points of contention:
1) Methods would have to be adapted to conform to Agent classes
2) Code must be runnable.
This means code would have to be scanned or tested at run time to ensure all
aspects are serializable. This could be done by creating a Test Node comprising a
try/catch system which returns exceptions when they are met.
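The Test Node check described above can be sketched as a single try/catch around a trial serialization. The class and method names are illustrative.

```java
import java.io.*;

// Sketch of the proposed Test Node: attempt to serialize the user-supplied
// process object and report any NotSerializableException before the
// process is admitted to the cluster.
public class TestNode {
    public static String checkSerializable(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate); // throws if any field is not serializable
            return "OK";
        } catch (NotSerializableException e) {
            return "REJECTED: " + e.getMessage(); // names the offending class
        } catch (IOException e) {
            return "ERROR: " + e;
        }
    }
}
```

Serializing into a throwaway byte buffer exercises the whole object graph, so the check catches non-serializable fields buried inside the user's process as well.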
Having runnable code is the main function of the CSProcess class, so methods
would have to be identified at input. This could include an interface which asks for
variables and the associated process separately.
Another method, which involves having some knowledge of the system, would be
implementing wrapper classes which could affix the required connect methods to
Agents, provided the user understands CSProcesses.
8.3.4 Extended Network to Internet
This method is easily implementable, but does not conform to the aims of this report.
By simply changing the node and server IP addresses from local to public IPs,
the system's scale can be opened up to users in any location.
The problem then lies with security: there are currently no security measures in
place during communication. Although the mechanisms of the system are not
commonplace in Java, objects are still a universally used data type.
8.3.5 Automated Process Delivery
As the system stands, the Universal Clients are tasked with acquiring processes.
This was implemented to regulate the speed of requests and allow easier debugging.
Automated delivery could be implemented by keeping track, at the Server, of how
many processes are running at each node.
If the Server records a Client node with free Process Nodes, it can continue to send
more processes to the underloaded area. Polling for CPU usage on completion of
tasks, to indicate whether more processes are needed, would give a well-balanced
system overall, but would also generate higher volumes of traffic.
As previously stated, these options should be aligned to the chosen application of the
system and could be exposed as user controls at initialisation.
8.4 Reflective Statements
During this project there have been multiple setbacks, some avoidable and some
unforeseeable. As with most large projects, developers will never be truly happy
with what they have accomplished. Despite meeting the initial aims of this
investigation and being relatively pleased with the finished product, there are still
areas which could have been addressed sooner, and shortcomings which will not be
repeated in the future.
1) Progress trail
The first objective during this project was establishing a method of progress
monitoring. In turn, a blog was created to document progress. However, the first
incarnation was hacked after two weeks.
This was a major setback in the project and resulted in a decline in adequate tracking.
In the future, security measures for a public web space will be adhered to.
More importantly, a structured, documented development diary will be a higher
priority in the future. Keeping track of developments and meetings would have led to
a much more streamlined approach and a better implementation overall. This also
pertains to the week 7 report, which took place in the form of a viva voce in the Napier
Games Lab in the first week of December.
2) Inadequate background understanding
Going into this project, I believed I had sufficient understanding of the
fundamental concepts and technologies involved to create this system. Searching for
previous attempts at the problem proved fruitless (see appendix A), indicating there
was not a lot of reading on the subject. The IPO, although it has the same conceptual
ethos, talks about accessing system hardware from a high-level language and was
naïve in some of its goals given the time permitted and the level of work
expected.
However, we never know the depth of our own ignorance; this proved true when,
halfway through the project, I realised that a thread, the main method of running
work, was not serializable.
With future development, I will ensure that I read not only papers on implementation,
but technical documentation on processes and data types to ensure I grasp the
conceptual limitations as well as technical limitations.
In summation, I have learned that preparation and the process are just as important
as the actual development.
3) Time Management
For some of the project, personal circumstances dictated a lack of work, but time
management could have been much better from the start. Wednesdays were
established as work days, but this was not particularly adhered to at the start of the
project. A Gantt chart was drafted but, after personal circumstances interfered, it was
not reviewed until after half the allotted time had transpired.
More tests into the efficiency of the system can still be run and should be considered
part of further work.
9 References
1. Austin, P., & Welch, P. (2008). CSP for JavaTM (JCSP) 1.1-rc4 API
Specification. Retrieved from CSP for Java:
https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-doc/
2. Austin, P., & Welch, P. (2008). Interface CSProcess. Retrieved from CSP for
Java: https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-
doc/org/jcsp/lang/CSProcess.html
3. Chalmers, K. (2008). Investigating Communicating Sequential Processes For
Java To Support Ubiquitous Computing. Edinburgh Napier University.
Retrieved April 22, 2016, from
https://www.researchgate.net/publication/239568086_INVESTIGATING_COM
MUNICATING_SEQUENTIAL_PROCESSES_FOR_JAVA_TO_SUPPORT_U
BIQUITOUS_COMPUTING
4. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2007, July 8-11). Mobility in
JCSP: New Mobile Channel and Mobile Process Models. Retrieved 04 24,
2016, from ResearchGate: https://www.researchgate.net
5. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2008). A critique of JCSP
Networking. The thirty-first Communicating Process Architectures Conference,
(pp. 7-10). York: P.H. Welch et al. doi:DOI: 10.3233/978-1-58603-907-3-27
6. Doallo, R., Expósito, R. R., Ramos, S., Taboada, G. L., & Touriño, J. (2013,
May 1). Java in the High Performance Computing arena: Research, practice
and experience. Science of Computer Programming, 78(5), 425-444.
Retrieved April 22, 2016, from
http://www.sciencedirect.com/science/article/pii/S0167642311001420
7. Doallo, R., Taboada, G. L., & Juan, T. (2009, April). F-MPJ: scalable Java
message-passing communications on parallel systems. The Journal of
Supercomputing, 60(1), 117-140. Retrieved April 22, 2016, from
http://link.springer.com/article/10.1007/s11227-009-0270-0
8. Funika, W., Godowski, P., & Pęgiel, P. (2008). A Semantic-Oriented Platform
for Performance Monitoring of Distributed Java Applications. Computational
Science – ICCS 2008, 5103, 233-242. Retrieved April 22, 2016, from
http://link.springer.com/chapter/10.1007/978-3-540-69389-5_27#page-1
9. Hoare, C. A. R. (2004). Communicating Sequential Processes. Prentice Hall
International. Retrieved April 22, 2016, from
http://www.usingcsp.com/cspbook.pdf
10. Islam, N., & Shoaib, S. (2002, June 24). US Patent No. US 7454458 B2.
Retrieved April 22, 2016, from https://www.google.com/patents/US7454458
11. Jenkov, J. (n.d.). Java Memory Model. Retrieved from http://tutorials.jenkov.com/java-
concurrency/java-memory-model.html
12.Kerridge, J. (2014). Using Concurrency and Parallelism Effectively - 2nd
edition. BookBoon.
13.Lam, K. T., Luo, Y., & Wang, C.-L. (2010). Adaptive sampling-based profiling
techniques for optimizing the distributed JVM runtime. Parallel & Distributed
Processing (IPDPS), 2010 IEEE International Symposium on (pp. 1-11).
Atlanta: IEEE. doi:10.1109/IPDPS.2010.5470461
14. Lemos, J., Simão, J., & Veiga, L. (2011). A²-VM: A Cooperative Java VM
with Support for Resource-Awareness and Cluster-Wide Thread Scheduling.
On the Move to Meaningful Internet Systems: OTM 2011, 7044, 302-320.
Retrieved April 22, 2016, from
http://link.springer.com/chapter/10.1007%2F978-3-642-25109-2_20
15. MacEachern, D. (n.d.). (C. Technologies, Producer, & Hyperic) Retrieved from
https://support.hyperic.com/display/SIGAR/Home
16. Meddeber, M., & Yagoubi, B. (2010, September 22). Distributed Load
Balancing Model for Grid Computing. ARIMA Journal, 12. Retrieved April 22,
2016, from http://arima.inria.fr/012/pdf/Vol.12.pp.43-60.pdf
17. Olivier, S. (2008). Scalable Dynamic Load Balancing Using UPC. 2008 37th
International Conference on Parallel Processing. Portland: IEEE. Retrieved
April 22, 2016
18. Oracle. (2015, 02 14). Learn About Java Technology. Retrieved from Java:
http://java.com/en/about/
19. Oracle. (2016). Interface OperatingSystemMXBean. Retrieved from Java™
Platform, Standard Edition 7:
https://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSy
stemMXBean.html
20. Shaw, B. (n.d.). Retrieved from
http://www.codeproject.com/Articles/30422/How-the-Java-Virtual-Machine-
JVM-Works
21. Winias, T. B., & Brown, J. S. (n.d.). Retrieved from
http://www.johnseelybrown.com/cloudcomputingpapers.pdf
22. Xoreax Software Ltd. (2016). Incredibuild. Retrieved from Incredibuild Beyond
Acceleration: https://www.incredibuild.com/
Appendix
A. Searched Terms
All results from 2005 onward were considered for inclusion.
Some results were duplicated across searches, resulting in “0 Relevant” for later
searches. Checked as of 22/04/2016.
• “Load Balancing in Java” :
o “Distributed Load Balancing Model for Grid Computing” (Meddeber &
Yagoubi, 2010) – Focusses on modelling topologies of balancing, with
basic information on system implementation
o “Scalable Dynamic Load Balancing Using UPC” (Olivier, 2008) – Uses
Unified Parallel C
o “Method and system for application load balancing” (US Patent No. US
7454458 B2, 2002) – Patent for similar system with no implementation.
Only conceptual with ambiguity in implementation.
• “CPU load balancing in Java” :
o “A Semantic-Oriented Platform for Performance Monitoring of
Distributed Java Applications” (Funika, Godowski, & Pęgiel, 2008) –
Platform for monitoring resources for online Java technologies
• “Java cluster computing”
o “Java in the High Performance Computing arena: Research, practice
and experience” (Doallo, Expósito, Ramos, Taboada, & Touriño, 2013)
– Looks into the methods facilitating the possibilities of High
Performance code using Java (Shared memory model, MPI etc...)
o “F-MPJ: scalable Java message-passing communications on parallel
systems” (Doallo, Taboada, & Juan, F-MPJ: scalable Java message-
passing communications on parallel systems, 2009) – Different MPI
implementation Document
• “Load balancing cluster computing Java” : 0 Relevant
• “CPU balancing cluster Java” : 0 Relevant
• “Load balancing cluster JVM” :
o “A²-VM: A Cooperative Java VM with Support for Resource-
Awareness and Cluster-Wide Thread Scheduling” (Lemos, Simão, &
Veiga, 2011) – Cluster infrastructure for Cloud computing systems
o “Adaptive sampling-based profiling techniques for optimizing the
distributed JVM runtime” (Lam, Luo, & Wang, 2010) – Builds a system
based on global variables for the cluster, paying close attention to
thread stacks
• “Load balancing cluster JCSP” : 0 Relevant
• “Load balancing asynchronous cluster Java” : 0 Relevant
• “CPU monitoring load balance cluster Java” : 0 Relevant
• “Cluster process sending Java” : 0 Relevant
B. Meeting Diagrams
Appendix Item 1. Basic concepts
Appendix Item 2. Agent structure
Appendix Item 3. Ring implementation Conversation
Appendix Item 4. Ring Evolution
Appendix Item 5. Extended Ring Elements
Appendix Item 6. Implementing Agent Channels
Appendix Item 7. Losing the Ring
Appendix Item 8. Closed Client Server
Appendix Item 9. Client Server with Managers
Appendix Item 10. Interacting Processes
Further comments and discussion can be found at
http://honsproject.calumbeck.com/
C. Github analytics
Appendix Item 11. Work distribution by day
Appendix Item 12. Git Activity Concentrations
Appendix Item 13. Busy commit periods
Initial Project Overview
SOC10101 Honours Project (40 Credits)
Title of Project: CPU Load Balancer
Overview of Project Content and Milestones
The Main Deliverable(s):
I intend to create a system which monitors CPU core usage over a cluster of
computers and calls another terminal to take on more load when one is starting to
reach maximum capacity; increasing speed and efficiency overall.
The system will implement the use of Agents which will move around the system,
arriving at each node (processor or core in this case) and connect to their main
processing stack to ascertain the current efficiency. Once finished, the Agent
disconnects and then moves itself on to the next core in the system. Using multiple
agents will be a goal for the project and attaining basic concurrency will be the first
milestone event.
As such, the system will be designed and implemented using the Groovy 2.3
libraries for Java. This allows the user to easily manipulate threads at a high level
through the predominant use of message passing. It is not certain whether a hybrid
of message passing and shared memory will be attainable, as pure message passing
is noted to have a large overhead for copying messages from one process to
another. This is not a problem at a high level of programming, but at CPU or even
GPU instruction speeds it is worth noting that it is not certain whether this will have a
positive or negative impact.
Testing in the system will include the use of software metrics to ensure results are
expected in certain situations such as the coherency of specific function calls at point
of load shifting. CPU usage will be constantly observed and compared with different
methodologies and will be documented and collated in full throughout the whole
report.
The final product will be discreet during use and will not increase processing
overhead between operations when Agents are idle or in transit between nodes. It
will be easy to initiate and close, with a basic visual monitoring system for the user
including concrete feedback for changes or problems. It should automatically detect
the number of cores in use and be proficient over different architectures, although
Intel-based chips will be the basis for development. It is not obvious at the moment
whether hyper-threading can be used in conjunction, but any attempt will be
documented.
The Target Audience for the Deliverable(s):
As the system will spread over multiple computers, it will be hindered by physical
constraints and the associated speed ramifications. Hence, as a proof of concept,
the system will handle large computational problems which are not I/O dependent.
As such, the system will be used to aid with large computations, or by those in need
of makeshift data farms.
The Work to be Undertaken:
• Design a system which allows concurrent processing in a cluster computing
environment
• Dealing with interaction with other devices over network
o Adapting system to work on Mobile Devices
• Comparative analysis of communication methods (i.e. Ethernet, Wi-Fi etc.)
o Analysis of result output in correlation with message passing
parameters
• Comparative tests on different hardware architectures
Additional Information / Knowledge Required:
• Java Language
o Groovy library knowledge
• Concurrent and Parallel architecture knowledge
• Fundamental Android understanding (for mobile development)
• CPU usage metrics
Information Sources that Provide a Context for the Project:
Background and Rationale:
Computer hardware has evolved, and so has the amount we attempt to run at
any given point. From the initial single-core processors to the octa-cores of today,
engineers have strived to build the most powerful computers with ever greater
speeds. However, over time it has become apparent that the implementation methods
we have been working from and towards are starting to level off. In the past, the first
step in augmenting any computer in terms of speed and performance has been
reducing transistor size and thereby increasing speed. Co-founder of Intel Gordon E.
Moore stated that the number of transistors able to fit on a processor would double
every 18 months, fundamentally increasing the speed of computers for at least the
next decade. This model of thought is still used regularly in the computing industry
today; however, it was initially stated in 1965 and since then many things have
changed.
The problems we are met with today are distance, heat and conduction. The
physical distance between cache memory and cores is becoming smaller and
smaller. We are approaching almost instantaneous transmission, and this comes
with another set of problems. Heat is generated when a CPU core is pushed to
compute at the rates we demand, which can require more intricate ways to cool the
system, and much of this can come down to bad allocation of resources.
We therefore need to look at how we balance our work. Software needs to reflect the
modern multitasking environment we have come to expect and must change to cope
with increasing demand, as hardware cannot be relied on to be the sole supporter in
this venture. I plan to build a system which allows proper allocation of the available
resources and increases the efficiency of hardware use, in order to achieve a faster,
more reliable system.
The Importance of the Project:
This project will be a proof of concept for using multiple computers in a personal
environment to complete large computational problems, with little impact on overall
performance, in a discreet manner.
The Key Challenge(s) to be Overcome:
The initial challenge will be to ascertain whether an agent can become active when
CPU usage reaches a certain level on a terminal. On activation, the agent will report
to a central repository of addresses and move to a new terminal with lower CPU
usage. From there it should be able to display a message on that machine. This will
be done as outlined below:
• Use a Monte Carlo algorithm to process a large computation
o Create an Agent to look at CPU usage
o CPU usage should report high
o Have the Agent report to another resource
o Println “I am overloaded”
o Then build an event handler that has access to the channel which is
waiting for input from the processor
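A Monte Carlo workload of the kind outlined above can be sketched as follows: estimating pi by sampling random points in the unit square. The sample count and class name are illustrative; the real system would run such a workload under agent supervision.

```java
import java.util.Random;

// Illustrative Monte Carlo workload: estimate pi from the fraction of
// random points in the unit square that fall inside the quarter circle.
public class MonteCarloPi {
    public static double estimate(long samples, long seed) {
        Random rnd = new Random(seed); // seeded for reproducibility
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble(), y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++; // inside the quarter circle
        }
        return 4.0 * inside / samples; // area ratio approximates pi
    }
}
```

Work of this shape suits the system well: it is CPU-bound, not I/O dependent, and splits naturally into independent batches of samples.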
From here, we can move on to moving key data. The intention is to create a
central repository of agents which then looks for a node which does not have an
agent active. From there we can move resources to the new processor.
The biggest challenge to overcome, if the above system is completed in due course,
is implementing it on a single CPU. Spreading even use across the cores of one
terminal would be the ultimate goal, but in choosing Java as the main platform, the
JVM gives little control over working across cores. Using a different language could
be an answer, but would require a large amount of research and development. For
the time being, what is detailed in the main deliverables is the main aim.
Servletarchitecture,lifecycle,get,post
vamsi krishna
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your stream
Enno Runne
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and Kubernetes
Will Hall
 
ZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project reportZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project report
pramodbiligiri
 

What's hot (20)

Solve it Differently with Reactive Programming
Solve it Differently with Reactive ProgrammingSolve it Differently with Reactive Programming
Solve it Differently with Reactive Programming
 
Advance Java Programming (CM5I)5.Interacting with-database
Advance Java Programming (CM5I)5.Interacting with-databaseAdvance Java Programming (CM5I)5.Interacting with-database
Advance Java Programming (CM5I)5.Interacting with-database
 
Kubernetes deployment strategies - CNCF Webinar
Kubernetes deployment strategies - CNCF WebinarKubernetes deployment strategies - CNCF Webinar
Kubernetes deployment strategies - CNCF Webinar
 
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging CapabilitiesIBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
IBM IMPACT 2014 AMC-1866 Introduction to IBM Messaging Capabilities
 
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
IBM IMPACT 2014 - AMC-1883 - Where's My Message - Analyze IBM WebSphere MQ Re...
 
Apache ActiveMQ - Enterprise messaging in action
Apache ActiveMQ - Enterprise messaging in actionApache ActiveMQ - Enterprise messaging in action
Apache ActiveMQ - Enterprise messaging in action
 
Service Discovery in Distributed System with DCOS & Kubernettes. - Sahil Sawhney
Service Discovery in Distributed System with DCOS & Kubernettes. - Sahil SawhneyService Discovery in Distributed System with DCOS & Kubernettes. - Sahil Sawhney
Service Discovery in Distributed System with DCOS & Kubernettes. - Sahil Sawhney
 
IBM Managing Workload Scalability with MQ Clusters
IBM Managing Workload Scalability with MQ ClustersIBM Managing Workload Scalability with MQ Clusters
IBM Managing Workload Scalability with MQ Clusters
 
Continuous deployment of polyglot microservices: A practical approach
Continuous deployment of polyglot microservices: A practical approachContinuous deployment of polyglot microservices: A practical approach
Continuous deployment of polyglot microservices: A practical approach
 
draft_myungho
draft_myunghodraft_myungho
draft_myungho
 
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
IBM IMPACT 2014 - AMC-1882 Building a Scalable & Continuously Available IBM M...
 
IBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster RecoveryIBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster Recovery
 
Containers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes LeoContainers and Kubernetes -Notes Leo
Containers and Kubernetes -Notes Leo
 
The mysqlnd replication and load balancing plugin
The mysqlnd replication and load balancing pluginThe mysqlnd replication and load balancing plugin
The mysqlnd replication and load balancing plugin
 
Grokking TechTalk #16: React stack at lozi
Grokking TechTalk #16: React stack at loziGrokking TechTalk #16: React stack at lozi
Grokking TechTalk #16: React stack at lozi
 
weblogic perfomence tuning
weblogic perfomence tuningweblogic perfomence tuning
weblogic perfomence tuning
 
Servletarchitecture,lifecycle,get,post
Servletarchitecture,lifecycle,get,postServletarchitecture,lifecycle,get,post
Servletarchitecture,lifecycle,get,post
 
Let the alpakka pull your stream
Let the alpakka pull your streamLet the alpakka pull your stream
Let the alpakka pull your stream
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and Kubernetes
 
ZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project reportZooKeeper Partitioning - A project report
ZooKeeper Partitioning - A project report
 

Similar to An investigation into Cluster CPU load balancing in the JVM

A Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and ContainersA Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and Containers
prashant desai
 
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Qualcomm Developer Network
 
Cloud 2010
Cloud 2010Cloud 2010
Cloud 2010
steccami
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Theofilos Papapanagiotou
 
A Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery NetworksA Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery Networks
Sruthi Kamal
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
ijceronline
 
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud EnvironmentEffective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
International Journal of Science and Research (IJSR)
 
Day in the life event-driven workshop
Day in the life  event-driven workshopDay in the life  event-driven workshop
Day in the life event-driven workshop
Christina Lin
 
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized APIImplementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
IJCSIS Research Publications
 
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCERESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
ijcses
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing Environment
Swapnil Shahade
 
1844 1849
1844 18491844 1849
1844 1849
Editor IJARCET
 
1844 1849
1844 18491844 1849
1844 1849
Editor IJARCET
 
Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!
A Jorge Garcia
 
GCF
GCFGCF
Chat application through client server management system project.pdf
Chat application through client server management system project.pdfChat application through client server management system project.pdf
Chat application through client server management system project.pdf
Kamal Acharya
 
NodeJS guide for beginners
NodeJS guide for beginnersNodeJS guide for beginners
NodeJS guide for beginners
Enoch Joshua
 
Srushti_M.E_PPT.ppt
Srushti_M.E_PPT.pptSrushti_M.E_PPT.ppt
Srushti_M.E_PPT.ppt
khalid aberbach
 
Server-side JS with NodeJS
Server-side JS with NodeJSServer-side JS with NodeJS
Server-side JS with NodeJS
Lilia Sfaxi
 
Cs556 section2
Cs556 section2Cs556 section2
Cs556 section2
farshad33
 

Similar to An investigation into Cluster CPU load balancing in the JVM (20)

A Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and ContainersA Survey of Performance Comparison between Virtual Machines and Containers
A Survey of Performance Comparison between Virtual Machines and Containers
 
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
Power-Efficient Programming Using Qualcomm Multicore Asynchronous Runtime Env...
 
Cloud 2010
Cloud 2010Cloud 2010
Cloud 2010
 
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model ServingKubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
Kubecon 2023 EU - KServe - The State and Future of Cloud-Native Model Serving
 
A Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery NetworksA Distributed Control Law for Load Balancing in Content Delivery Networks
A Distributed Control Law for Load Balancing in Content Delivery Networks
 
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
Efficient Resource Allocation to Virtual Machine in Cloud Computing Using an ...
 
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud EnvironmentEffective VM Scheduling Strategy for Heterogeneous Cloud Environment
Effective VM Scheduling Strategy for Heterogeneous Cloud Environment
 
Day in the life event-driven workshop
Day in the life  event-driven workshopDay in the life  event-driven workshop
Day in the life event-driven workshop
 
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized APIImplementing a Solution to the Cloud Vendor Lock-In Using Standardized API
Implementing a Solution to the Cloud Vendor Lock-In Using Standardized API
 
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCERESEARCH ON DISTRIBUTED SOFTWARE TESTING  PLATFORM BASED ON CLOUD RESOURCE
RESEARCH ON DISTRIBUTED SOFTWARE TESTING PLATFORM BASED ON CLOUD RESOURCE
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing Environment
 
1844 1849
1844 18491844 1849
1844 1849
 
1844 1849
1844 18491844 1849
1844 1849
 
Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!Building A Linux Cluster Using Raspberry PI #1!
Building A Linux Cluster Using Raspberry PI #1!
 
GCF
GCFGCF
GCF
 
Chat application through client server management system project.pdf
Chat application through client server management system project.pdfChat application through client server management system project.pdf
Chat application through client server management system project.pdf
 
NodeJS guide for beginners
NodeJS guide for beginnersNodeJS guide for beginners
NodeJS guide for beginners
 
Srushti_M.E_PPT.ppt
Srushti_M.E_PPT.pptSrushti_M.E_PPT.ppt
Srushti_M.E_PPT.ppt
 
Server-side JS with NodeJS
Server-side JS with NodeJSServer-side JS with NodeJS
Server-side JS with NodeJS
 
Cs556 section2
Cs556 section2Cs556 section2
Cs556 section2
 

Recently uploaded

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
timtebeek1
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
Hironori Washizaki
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
lorraineandreiamcidl
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
lorraineandreiamcidl
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
Philip Schwarz
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
Ayan Halder
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
Gerardo Pardo-Castellote
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
aymanquadri279
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 

Recently uploaded (20)

Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdfAutomated software refactoring with OpenRewrite and Generative AI.pptx.pdf
Automated software refactoring with OpenRewrite and Generative AI.pptx.pdf
 
SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024SWEBOK and Education at FUSE Okinawa 2024
SWEBOK and Education at FUSE Okinawa 2024
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOMLORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
LORRAINE ANDREI_LEQUIGAN_HOW TO USE ZOOM
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptxLORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
LORRAINE ANDREI_LEQUIGAN_HOW TO USE WHATSAPP.pptx
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
Using Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional SafetyUsing Xen Hypervisor for Functional Safety
Using Xen Hypervisor for Functional Safety
 
DDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systemsDDS-Security 1.2 - What's New? Stronger security for long-running systems
DDS-Security 1.2 - What's New? Stronger security for long-running systems
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
What is Master Data Management by PiLog Group
What is Master Data Management by PiLog GroupWhat is Master Data Management by PiLog Group
What is Master Data Management by PiLog Group
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 

An investigation into Cluster CPU load balancing in the JVM

  • 1. An Investigation into Cluster CPU load balancing in the JVM Calum James Beck Submitted in partial fulfilment of the requirements of Edinburgh Napier University for the Degree of Bachelor of Engineering with Honours in Software Engineering School of Computing
  • 3. Abstract The JVM CPU Cluster Balancer is a scalable, proof of concept system designed to distribute processes over a network to perform multiple tasks at once, in a language of high abstraction. Once distributed, workers return results to an access server, all while monitoring their respective CPUs for computational stress in terms of CPU usage. CPU’s incurring set stress then have their respective processes moved to a less intensive area in the cluster, balancing work overall. The system works by enrolling Universal Clients (CPU’s waiting for work) to an access server, which then requests processes to be sent from the users desired Process Server. Each Process comes in the form of a Process Definition complying with the Agent interface, self-contained in an object. During run time, the Process Definition object acts as a subtype of the process manager, assuming responsibility for saving and restoring the state of the process. Each Client has four Process Nodes which it can delegate work to. The selected Process Node then connects to the received Process using two internal channels and runs using an instance of a Process Manager. During runtime, the Client also implements a Node Monitor which monitors the CPU usage of the Client in real time. When a set percentage of stress is met (CPU usage), the Universal Client informs the server that an alternative node is needed, on a different machine to finish the instance of work. The Process Definition then stops its runnable logic. The server searches through enrolled Clients and sends the address of an underwhelmed CPU in the cluster to the requesting Node. A dynamic TCP/IP channel is then created between the node and the foreign Process Manager. The process object is then serialized allowing transferal, in its paused state, and is resumed at the new client. The system is developed using pre-set processes to ensure repeatability of results and is based entirely in any system running the JVM. 
This project results in a working system which can distribute work based on CPU stress, but concludes that in order to be labelled complete, more functionality needs to be added to find the system an adequate application. Page | 3
  • 4. The Java language, JCSP, the Groovy scripting language and the Sigar Application Programming Interface (API), which provides pure C bindings to Java, have been used in this project. All code, written and complied using Eclipse Mars IDE. Page | 4
  • 5. Contents 1 INTRODUCTION................................................................................................12 1.1 Background..............................................................................................................................................13 1.2 Aims and Objectives................................................................................................................................14 1.3 Scope and Limitations ............................................................................................................................15 1.4 Structure of Dissertation.........................................................................................................................16 2 BACKGROUND, KEY COMPONENTS AND THEORY....................................17 2.1 Data and Task Parallelism......................................................................................................................17 2.2 Hoare’s Communicating Sequential Processes (CSP).........................................................................17 2.3 Channels...................................................................................................................................................18 2.4 Groovy......................................................................................................................................................19 2.5 Communicating Sequential Processes for Java (JCSP).......................................................................19 2.6 Channel Mobility in JCSP......................................................................................................................19 3 METHODOLOGY...............................................................................................21 3.1 Monitoring CPU Usage...........................................................................................................................21 3.2 Process Creation and 
Distribution .......................................................... 24
3.3 Process Movement associated Methods ........................... 26
4 INITIAL EXPERIMENTS ............................................. 29
4.1 Monitoring CPU usage .......................................... 29
5 ARCHITECTURAL DESIGN ............................................ 34
5.1 Central Repository ............................................ 34
5.2 Ring System with Travelling Agents ............................ 36
5.3 Work & Node Manager System .................................... 37
5.4 Network Structure Analysis .................................... 38
6 INTRODUCING PROCESS MOVEMENT .................................... 39
6.1 Java Memory Model ............................................. 39
6.2 Moving processes within a JVM ................................. 40
6.3 Thread Serialization impossible with current JVM .............. 40
6.4 Adapting Process definitions as Agents ........................ 42
6.5 Sending process definitions in current state .................. 43
7 PROTOTYPE ....................................................... 44
7.1 Design ........................................................ 44
7.2 Components .................................................... 46
7.3 Experiment Setup .............................................. 55
7.4 Results ....................................................... 56
7.5 Comparative Analysis .......................................... 56
7.6 Local Concurrency Vs Distributed .............................. 58
8 CONCLUSION ...................................................... 59
8.1 Has the Project met its Aim and Objectives? ................... 59
8.2 Deployment Analysis and Critique .............................. 60
8.3 Further Research and Work ..................................... 61
8.4 Reflective Statements ......................................... 64
9 REFERENCES ...................................................... 67
A. Searched Terms ................................................. 70
B. Meeting Diagrams ............................................... 72
C. Github analytics ............................................... 84
Initial Project Overview .......................................... 86
SOC10101 Honours Project (40 Credits) ............................. 86
List of Figures

FIGURE 1. BASIC CONCEPT OF PROCESS MIGRATION ..................... 14
FIGURE 2. JAVA BEANS STRUCTURE ................................... 22
FIGURE 3. BASIC JNI INTERFACE PROCESS ............................ 23
FIGURE 4. VISUAL REPRESENTATION OF VALUE GENERATOR ............... 24
FIGURE 5. SERVER-CLIENT PATTERN DIAGRAM .......................... 26
FIGURE 6. VISUAL REPRESENTATION OF AGENT RUNNING IN PROCESS MANAGER ... 28
FIGURE 7. MK I: HOST NODE SYSTEM DIAGRAM ......................... 35
FIGURE 8. NODE RING NETWORK DIAGRAM .............................. 36
FIGURE 9. WORK AND NODE MANAGER NETWORK DIAGRAM ................. 37
FIGURE 10. LOGICAL VIEW OF JAVA MEMORY RELATIONS (JENKOV, N.D.) ... 39
FIGURE 11. JAVA MEMORY MODEL INTERACTION WITH CPU MEMORY MODEL (JENKOV, N.D.) ... 41
FIGURE 12. ORDER OF EVENTS FOR CONNECTING TO AGENT ............... 42
FIGURE 13. METHOD AND CONTENTS OF PROCESS (THIS) ................. 43
FIGURE 14. FINAL PROTOTYPE, SERVER-CLIENT NETWORK ................ 45
FIGURE 15. ANY2ONE CHANNEL CONCEPT ............................... 48
FIGURE 16. INTERNAL CONNECTION MECHANISMS OF AGENT ............... 50
FIGURE 17. SERVER INTERACTION DIAGRAM FOR PROTOTYPE .............. 55
FIGURE 18. TABLE OF EXPERIMENT RESULTS ........................... 56
FIGURE 19. TEST RESULTS GRAPH; CPU USAGE AND TIME SPENT .......... 57
FIGURE 20. NODE INTERACTION DIAGRAM .............................. 62
List of Screenshots

SCREENSHOT 1. WINDOWS 10 TASK MANAGER AND RESOURCE MANAGER ....... 29
SCREENSHOT 2. CONSOLE LOG: BASE READING OF CPU USAGE ON CLIENT 1 ... 31
SCREENSHOT 3. CONSOLE LOG: CLIENT 2 AFFECTING CLIENT 1 CPU READINGS ... 32
SCREENSHOT 4. CLIENT INITIALISING UI ............................. 51
SCREENSHOT 5. SERVER NOT STARTED OR CRASHED ERROR MESSAGE ........ 51
SCREENSHOT 6. CONSOLE LOG: NODE REGISTERED ON SERVER ............. 52
SCREENSHOT 7. BASIC USER UI ...................................... 52
SCREENSHOT 8. CONSOLE LOG: NODE SHOWING READY .................... 52
SCREENSHOT 9. CONSOLE LOG: NODE DOING WORK AND RELEASING PROCESS NODE 1 WHEN FINISHED ... 53
SCREENSHOT 10. CONSOLE LOG: WHEN PROCESS 4 STARTS, CPU IS HIGH (62%), AGENT IS CONTACTED (I AM READING), THE PROCESS IS DISCONNECTED, SENT (LETS GO) AND PROCESS NODE 4 IS RELEASED ... 54
SCREENSHOT 11. CONSOLE LOG: SERVER DELETES ADDRESS ............... 54
Acknowledgements

Firstly, I would like to profusely thank Professor Jon Kerridge, who has been an invaluable source of confidence and knowledge throughout this whole project. He has been a guide and kept me steadfast in what needed to be completed through challenging times. Secondly, I'd like to thank Doctor Kevin Chalmers, who has always been compassionate and a nurturing presence throughout my time in University, from my first to fourth year. I would also like to personally thank Charlotte Leask for her constant support and eternal patience throughout the whole process.
1 Introduction

As the world tends towards the finite end of physical enhancements in computing, the aim is to continue increasing speeds by finding new methods of surpassing these limitations. In the past, the first step in augmenting any computer's speed and performance has been reducing transistor size. Co-founder of Intel Gordon E. Moore stated that the number of transistors able to fit on a processor would double every 18 months, fundamentally increasing the speed of computers for at least the following decade. This model of thought is still used regularly in the computing industry today; however, it was initially stated in 1965, and much has changed since then.

The problems we are met with today are distance, heat and conduction. The physical distance between cache memory and cores is shrinking further and further; transmissions are becoming almost instantaneous, and this brings another set of problems. Heat is generated when a CPU core is pushed to compute at the rates we demand, requiring ever more intricate ways to cool the system, and much of this can be put down to poor allocation of resources. We therefore need to look at how we balance our work. Software needs to reflect the modern multitasking environment that we have come to expect and must change in order to cope with increasing demand, as hardware cannot be relied on to be the sole supporter in this venture. I plan to build a system which allows a proper allocation of the resources available and increases the efficiency of hardware use in order to achieve a faster, more reliable system.1

1 Taken from IPO

This project endeavours to meet these needs with a system which distributes processes over a cluster of computers, regulating work based on CPU load. This is a
means of using idle CPUs without exceeding a threshold that impedes the user's everyday use. The final product aims to be a proof of concept that load balancing is possible in a high-level language, in a portable environment. Hence, it demonstrates the means and capabilities required to further develop a fully automated system for everyday users with access to multiple Java-compatible devices.

1.1 Background

Most process-enhancing implementations fall under cloud computing: outsourcing processing to external data centres, platform services or application hosting, whilst remotely managing computer resources (Winias & Brown, n.d.). However, not all businesses have access to scalable hardware architectures, which are expensive to build, run and maintain.

Shifting focus to performance, creating efficient software diminishes the need for in-depth management of system architectures and is a fundamental code of conduct for professional IT bodies (such as the British Computer Society). However, different programming languages support different levels of control over a system. Programming in languages of high abstraction does not fundamentally afford the efficiency that low-level languages can attain, while low-level languages are platform-specific and do not lend themselves to portable methods. Taking advantage of current user environments, rather than reimplementing code or hardware, is therefore the most cost-effective and least disruptive route. This can be done by effectively managing processing loads, maximising processing resource capabilities. Utilising idle CPU resources on a network of computers (a cluster) can speed up processing overall, but to do so these resources must be directed to work together towards a common goal (i.e. task parallelism).
Many current systems, such as Incredibuild, implement this parallel design for build environments, working with low-level code to facilitate high-level build concepts.
(Xoreax Software Ltd., n.d.) With high-profile clients such as Microsoft, Google, IBM and Disney using their product to maximise system use, this task distribution method is clearly proven to work. However, for the average user or start-up business, system specifics might still prove elusive. So why not implement this distribution system in a portable, high-level language?

Java is a widely used platform, compiled in memory and run in a virtual machine aiming for multi-platform portability. According to Oracle, 97% of enterprise desktops run Java, alongside 3 billion mobile phones worldwide (Oracle, 2015). Building the system in Java allows the opportunity to port it to multiple platforms with relative ease, greatly increasing the potential for networked devices to join the system.

It should be noted that, in researching this area, very little has been published on load balancing in high-level languages in a cluster environment within the last 6–10 years. Appendix A documents the search criteria used and the relevancy of the results.

1.2 Aims and Objectives

The aim of this project is to distribute and regulate processes over multiple CPUs in a cluster setting using the Java programming language, with the Java Virtual Machine (JVM) as the environment. This involves monitoring CPU usage in real time, stopping processes which overload a given terminal, and moving them to CPUs experiencing less stress in the cluster.

Figure 1. Basic concept of process migration
The main objectives required to create such a system, in practice, are outlined below:

1) Monitor the CPU usage incurred by an instance of the JVM.
2) Processes must have a way to be interrupted and saved in their current state.
3) Processes need to have a way to move and reinitialise at different nodes, on different CPUs.

This report documents the steps taken to achieve these goals from inception to completion. This project aims to provide a system which successfully manages load over several terminals in a cluster, using a language with a high level of abstraction: Java.

1.3 Scope and Limitations

In order to provide a proof of concept system within the project's allotted time, certain areas of the project had to be kept within reasonable limitations. In this case, a limited number of processes are programmed and sent automatically over the cluster, to ensure that overload occurs a predictable percentage of the time. This means the system does not yet afford user input and runs fairly autonomously. In addition, to show the scalability of the system, it must be ensured that the computer distributing tasks runs at a proficient speed, to facilitate access from multiple user-end nodes with, preferably, one underperforming CPU.

As the system relies on communication, many transmission options are available, but these are kept to TCP/IP network protocols only. This form of communication was chosen as it is a proven, reliable and widely used method, supported by virtually all operating systems and platforms on which Java can run.

This project will also use a Java scripting language called Groovy, which facilitates the use of Communicating Sequential Processes for Java (JCSP). This allows the manipulation of threads at a low level with high-level abstraction, resulting
in a parallelised system, and can use TCP/IP protocols as the main mechanism for communication between systems. As the project is created to prove that Java can be utilised to distribute and balance work over a cluster, all aspects of the system will be implemented in Java, within the constraints of the JVM, whilst maintaining a high level of abstraction in the source code. Other programming languages will only be considered where it is conceptually or physically impossible to implement the requisites with the author's current knowledge and skills.

1.4 Structure of Dissertation

The structure of this document is as follows:

• Section 2 introduces the methodology, the theory and the practices behind the message-passing mechanics of the system, which revolve around JCSP.
• Section 3 discusses the methods implemented throughout the project, as well as the decisions made as a result of research, to reach the finished prototype.
• Section 4 presents the initial experiments conducted. This documents the limitations and barriers which had to be overcome in order to develop a functioning prototype.
• Section 5 describes the main incarnations of the system and how each implementation led to a better system.
• Section 6 explains the mechanics behind moving processes and the difficulties faced in doing so.
• Section 7 elaborates on and demonstrates the prototype system, reviewing design and implementation as well as experimentation with the system.
• Section 8 details the results and evaluation of the system and project, and concludes with a critical evaluation covering the project's shortcomings and possible avenues of future work on the system.
2 Background, Key Components and Theory

Throughout this report, the majority of the components described have been taught through, and are defined by, "Using Concurrency and Parallelism Effectively" I & II (Kerridge, 2014), which builds upon Hoare's Communicating Sequential Processes (CSP) theory. Unless explicitly referenced otherwise, these are the main sources of the information disclosed herein. In this section the basic elements from which the prototype product is derived are explained.

2.1 Data and Task Parallelism

One of the driving forces in this project is concurrency and parallelism. Task parallelism allows the user to run multiple processes simultaneously on one CPU or over a network. Sequential code follows a specified order, so programmers tend not to think about the order of events in a system once it has been coded and compiled. In order to process tasks moving around the intended system, processes will have to be fairly autonomous and removed from the main body of code. This means that concurrent and parallel code will have to stop and synchronise with each other on transfer, interact in a timely manner so as not to disrupt running processes, and finish in an expected order despite being intrinsically non-deterministic in nature (due to running on different platforms at different speeds), all whilst the possibility of migration plays an active role.

2.2 Hoare's Communicating Sequential Processes (CSP)

Hoare's CSP concepts (Hoare, 2004) dictate that everything encapsulated in code can be broken down into algebraic functions. In this way, everything within programming can be reduced to simple, understandable functions, rules and patterns, and all code can be reduced to smaller chunks which can be moved around to suit the success of the formula. What you see is what you get. The following mechanisms facilitate this concept and are the basis of the end prototype.
2.2.1 Process

A Process is a piece of code that can be executed in parallel with other processes. A network of processes forms a solution to a single problem, with processes communicating with each other using Channels (detailed in 2.3). Processes typically contain repeating sequences of sequential code with communication interspersed. Any process that is idle consumes no processor resources.

2.2.2 Timer

A Timer is a means of introducing time management into processes. Timers can be read to find the current time, and can introduce delays or alarms for future events. They can also be used in ALTs as guards for reading channels.

2.2.3 Alternatives (ALT)

An Alternative (ALT) allows the selection of one ready guard from several possible guards. Guards come in three types (input communications, timers, or SKIPs) and dictate how a process should proceed. A guard is ready if input is ready, an alarm time has passed, or a SKIP is a defined guard. SKIPs are always ready and allow guards to run continuously. The ALT will wait until a guard is ready and then undertake the associated code. If one guard is ready, it undertakes the associated code; if more than one is ready, it selects one according to predefined options and then obeys the code. These options include priority reading, where more than one is ready, or fair, turn-based reading.

2.3 Channels

This is a main mechanic of the system described in this report, as the main aim is to send processes over a cluster network. A Channel is a one-way, point-to-point, unbuffered connection between two processes. Channels synchronise the processes to pass data from one to another and do not use polling or loops to determine their status, meaning no processing is consumed during transactions. The first process attempts to communicate and goes idle when synchronising. The second process attempting to communicate will then discover the situation,
undertake the data transfer, and then both processes will continue in parallel, or concurrently if they were both executed on the same processor. It does not matter which process attempts communication first, as the mechanism is symmetric. When communication between processors takes place, the underlying system creates a copy of the data object and begins the transfer. As such, objects containing process logic can be transferred, to be executed by a Process Manager and run asynchronously; this forms the basis of the project.

2.4 Groovy

The Groovy scripting language allows the programmer to write concurrent systems with a high level of abstraction and is underpinned by the four basic principles detailed above.

2.5 Communicating Sequential Processes for Java (JCSP)

JCSP is based on Hoare's basic algebraic functions, allowing virtual connections to be created via NetChannelLocation structures sent between nodes. Using Java gives the programmer the ability to send objects via serialisation methods, breaking the components down into sequences of bytes to be transferred (Chalmers, Kerridge, & Romdhani, A critique of JCSP Networking, 2008). With this framework, objects containing code definitions can be sent along with a control signal to recreate the object at the receiving end. Communicating Sequential Processes for Java is the cornerstone of this project and allows us to build upon Hoare's concepts to create a simple-to-understand communication network.

2.6 Channel Mobility in JCSP

Channel Mobility refers to the dynamic capabilities found when creating self-propagating NetChannels and other communication models in this project. Channels afford us robustness of connection between the input and output ends whilst allowing sufficient models to support the ubiquitous nature of the intended system. (Chalmers, Investigating Communicating Sequential Processes For Java To Support Ubiquitous Computing, 2008)
As the project does not endeavour to change these underlying mechanisms, only a high-level description is presented. It can be stated, however, that channel mobility is paramount to attaining, transferring and moving processes successfully.
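As a concrete illustration of the channel rendezvous described in Section 2.3: JCSP itself is not reproduced here, but a plain-Java sketch using `SynchronousQueue` (an assumed stand-in, not the JCSP implementation) shows the same synchronising, unbuffered behaviour, where the writer blocks until a reader arrives.

```java
import java.util.concurrent.SynchronousQueue;

public class RendezvousDemo {
    public static void main(String[] args) throws InterruptedException {
        // A SynchronousQueue has no capacity: put() blocks until a matching
        // take() arrives, mimicking CSP's unbuffered channel rendezvous.
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread writer = new Thread(() -> {
            try {
                channel.put(42);   // blocks here until the reader is ready
                System.out.println("writer: sent");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        int value = channel.take();  // completes the rendezvous
        writer.join();
        System.out.println("reader: received " + value);
    }
}
```

Neither side polls while waiting, which is the property the text highlights: an idle process consumes no processor resources.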
3 Methodology

The main aim of the system is to create a way to send processes from one node to another running in the same computing cluster, initiated by rising CPU usage at each terminal. As such, there were three main problem areas which needed to be addressed:

1. How to get CPU usage at any given time from within a JVM runtime.
2. How to create and deliver processes around a dynamic network.
3. How to stop a given process when CPU usage reaches a predetermined amount and send it to an underused node in the network.

3.1 Monitoring CPU Usage

At the time of writing, there were no pure Java APIs available to gather CPU information. The investigation therefore continued as fact-finding into gathering as much system data as possible from within Java.

3.1.1 MBeans

MBeans are managed Java objects, similar to JavaBeans, which can represent a device, an application or any resource that needs to be managed.
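A managed resource of this kind can be defined and queried through the platform MBean server. The sketch below registers a hypothetical Counter MBean; the `Counter` class and the `demo:type=Counter` object name are illustrative assumptions, not part of the project.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanDemo {
    // Standard MBean pattern: the management interface's name is the
    // implementing class's name plus the "MBean" suffix.
    public interface CounterMBean {
        int getCount();
        void increment();
    }

    public static class Counter implements CounterMBean {
        private int count;
        public int getCount() { return count; }
        public void increment() { count++; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("demo:type=Counter"); // illustrative name
        server.registerMBean(new Counter(), name);

        server.invoke(name, "increment", null, null);          // an operation
        Object count = server.getAttribute(name, "Count");     // an attribute
        System.out.println("Count = " + count);                // prints "Count = 1"
    }
}
```

The cost the text mentions is visible here: every attribute read goes through the generic `getAttribute` reflection path, which is why repeated queries carry overhead.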
Figure 2. Java Beans Structure

This means we can monitor any of the resources being used by an instance of the JVM. However, as an MBean can be any type of object and can expose attributes of any type, each client has to implement class definitions each time an MBean is called, which can lead to high overheads when the MBean is repeatedly queried.

3.1.2 OperatingSystemMXBean

MXBeans are native to Java (1.6 upwards) and allow the user to utilise an MBean with a reduced set of types, meaning there is no requirement for model-specific classes. This makes the MBean accessible by any local or remote client, essentially conforming to an interface. OperatingSystemMXBean gives the user access to an interface developed for retrieving system properties about the operating system on which the JVM is running.
This includes the free memory of the computer, the memory allocated to the JVM and the CPU time dedicated to a task. MXBeans were the only native mechanism provided by Java Management Extensions (JMX) which could facilitate the objectives mentioned.

3.1.3 Java Native Interface (JNI)

The Java Native Interface is a native programming interface that is part of the Java Software Development Kit. JNI allows Java code the use of fragments and libraries written in other languages such as C and C++. While Java breaks code down into objects to be interpreted, C allows for the use of procedural code which is compiled and broken down into functions. The JNI connects Java class methods with C functions, fundamentally allowing the programmer to call C functions at any given time.

Figure 3. Basic JNI interface process

This gives the user access to lower levels of programming and can read values, such as CPU usage, from assembly code. Although this approach seems the most enticing, it can lead to the destabilisation of a JVM instance through subtle C errors. Writing small scripts may not pose a huge problem, but garbage collection is not
handled by the JVM in these instances, and a basic understanding of memory allocation is also required. Additionally, using the JNI results in a system which is not wholly portable, as the code written in C is platform-specific.

3.2 Process Creation and Distribution

As one of the main prerequisites of this system, a network architecture had to be designed to facilitate communication.2 This section focuses on how data and work are spawned to test the proof of concept system. It should be noted that although the aim of the project is a proof of concept, the ideal system would spawn multiple instances of work which would accumulate to a large amount of CPU usage, in order to adequately balance the system.

3.2.1 Value Generator

Here, different volumes of data are generated by a data generator and sent to a Node to be processed. The perceived complexity of the data should be proportional to the increase in CPU usage created.

Figure 4. Visual Representation of Value Generator

2 Development and iterative design is documented in Section 5.
This would require a fixed process at each node initialisation to manipulate the randomly generated data sets being produced by the generators. All interactions are handled by channel interactions, as shown above in Figure 4.

3.2.2 Random Process Selection

In this instance, each node would have access to pre-set process definitions which would generate varying loads. At run time, a timer would be initiated requesting a random process to run, one of which would create a large spike in CPU usage. This would allow an overloaded state to be reproduced with a high degree of certainty during demonstrations. The structure would be similar to the above, but would not require the DataGenerator, as the initial input would remain the same.

3.2.3 Server Hosted Process Definitions

This method would require processes to be hosted remotely at a specific IP location and requested by the client when needed. The client would access a server which holds the network locations of all the relevant process servers. The request would then be forwarded, with the client's location, to the process location, and the process sent via a TCP/IP channel back to the client.
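The server-hosted approach hinges on a serialised process definition crossing a TCP connection and being executed on arrival. Below is a minimal, self-contained Java sketch of that idea, using plain sockets in place of JCSP networked channels; the `SquareTask` class and names are illustrative assumptions, not the project's code.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ProcessShippingDemo {
    // A self-contained, serialisable "process definition": the logic and its
    // initial parameters travel together as one object.
    public static class SquareTask implements Serializable, Runnable {
        private static final long serialVersionUID = 1L;
        private final int input;
        private int result;
        public SquareTask(int input) { this.input = input; }
        public void run() { result = input * input; }
        public int getResult() { return result; }
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {      // the "process server"
            Thread sender = new Thread(() -> {
                try (Socket s = server.accept();
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    out.writeObject(new SquareTask(7));        // ship the definition
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();

            // The "client": receive the definition over TCP and execute it locally.
            try (Socket s = new Socket("localhost", server.getLocalPort());
                 ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
                SquareTask task = (SquareTask) in.readObject();
                task.run();
                System.out.println("result = " + task.getResult()); // prints "result = 49"
            }
            sender.join();
        }
    }
}
```

Only the definition and its parameters cross the network; the computation itself happens entirely at the receiving node, which is the property the prototype relies on.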
Figure 5. Server-Client Pattern Diagram

This works by sending objects containing serializable process definitions over channels.

3.3 Process Movement associated Methods

As copying large amounts of data around a system would prove inefficient, notwithstanding the large overheads in processing and memory allocation, the system has to handle data manipulation locally, within one processor. This means that processes in their entirety have to be sent between nodes, to complete the full process, sharing as little data during computation as possible. The aim is to send only initial parameters and results. The methods implemented for this aspect of the system rely heavily on the JCSP API, the functions underpinning Groovy. Hence, the definitions and descriptions pertaining to JCSP methods below are based on, and paraphrased from, the API specifications hosted by the University of Kent at Canterbury (htt1). Implementing
process movement is covered in more detail in Chapter 6, which documents limitations and boundaries.

3.3.1 JCSP Process Manager

The ProcessManager class enables a CSProcess to be spawned concurrently with the process doing the spawning. This means we can have multiple processes running, and allows the nodes in the system to deal with multiple processes being sent on the same channel. Dealing with processes as they arrive allows the system to adhere to a client-server pattern, making the chances of deadlock in this area of the system very slim.

3.3.2 Process Definition Serialisation in Objects

In order to take advantage of the Process Manager's capabilities, process definitions need to be designed as CSProcesses. To do so, a process is defined in its entirety and encapsulated in an object whose class implements two interfaces: CSProcess and Serializable.

3.3.2.1 CSProcess

According to the JCSP documentation, "a CSP process is a component that encapsulates data structures and algorithms for manipulating that data" (htt). This means the data involved is private and cannot be accessed outside the object itself. Essentially, each instance of the process is alive, executing its own algorithms on its own data, and its actions are defined by a single run method. To avoid race hazards, the processes in this system do not require outside data or interaction with other running threads. Only primitive data types will be sent to activate switches or request new data. No procedures outside of defined data manipulation take place within the Process Manager.
3.3.2.2 Serializable

A Serializable class uses the java.io.Serializable interface, which allows instances of the class and its subtypes to be serialized for communication transfer. The interface itself does not have any methods but serves only to identify the semantics of being serializable. It should be noted here that CS classes not implementing this interface, such as CSTimer, do not conform to serializable semantics; this is covered later in this document.

3.3.3 Agents

The Agent interface implements both CSProcess and Serializable, but also adds connect and disconnect methods. These are used to connect input and output channels from the internal mechanisms of the sent process definition to an outside host.

Figure 6. Visual Representation of Agent running in Process Manager

The agent has two channels by which it connects to the host during runtime. This means the data inside the agent CSProcess can be influenced from outside the Process Manager. By exploiting the Agent interface, we can enable communication from outside threads during run time, giving agents access to two different code structures.
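The shape of the Agent pattern described above can be sketched without JCSP. The types below (`Agent`, `SummingAgent`, and plain queues standing in for channel ends) are hypothetical stand-ins for illustration, not the real JCSP API: serialisable state travels with the process definition, while channel ends are transient and must be re-connected at each host.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class AgentSketch {
    // Illustrative analogue of the JCSP Agent idea: runnable logic plus
    // connect/disconnect hooks, with the whole object serialisable.
    public interface Agent extends Serializable, Runnable {
        void connect(Queue<Integer> fromHost, Queue<Integer> toHost);
        void disconnect();
    }

    public static class SummingAgent implements Agent {
        private static final long serialVersionUID = 1L;
        private int total;                          // state travels with the agent
        private transient Queue<Integer> in, out;   // channel ends do not serialise

        public void connect(Queue<Integer> fromHost, Queue<Integer> toHost) {
            in = fromHost; out = toHost;
        }
        public void disconnect() { in = null; out = null; }

        public void run() {                         // one burst of the agent's logic
            Integer v;
            while ((v = in.poll()) != null) total += v;
            out.add(total);
        }
    }

    public static void main(String[] args) throws Exception {
        SummingAgent agent = new SummingAgent();
        Queue<Integer> out = new ArrayDeque<>();
        agent.connect(new ArrayDeque<>(List.of(1, 2, 3)), out);
        agent.run();                                // total becomes 6
        agent.disconnect();

        // Serialise the agent, as if shipping it to another node, and restore it.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);
        oos.writeObject(agent);
        oos.flush();
        SummingAgent moved = (SummingAgent) new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray())).readObject();

        // The restored copy reconnects and continues from its saved total.
        moved.connect(new ArrayDeque<>(List.of(10)), out);
        moved.run();                                // total becomes 16
        System.out.println("totals sent: " + out);  // prints "totals sent: [6, 16]"
    }
}
```

Marking the channel ends `transient` is the key design point: only the agent's state crosses the network, and each host supplies fresh channel ends via `connect`, mirroring the order of events the text describes.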
4 Initial Experiments

In order to evaluate which methods would lead to a successful system, the aforementioned methodologies were investigated and implemented in different circumstances, testing for compatibility with the project.

4.1 Monitoring CPU usage

Monitoring CPU usage would take place in two stages: designing code which generates high usage, and code which can interpret CPU usage as a percentage. Results would be compared in conjunction with the Task Manager and Resource Manager native to Windows 10.

Screenshot 1. Windows 10 Task Manager and Resource Manager

4.1.1 Creating Work

Creating work consisted of two different functions which would change intermittently to test increases in CPU usage. Small work creates an int value, comprising a basic multiplication operation, followed by a timer to create time between operations. CSTimers, as part of JCSP, work as guards for the code within an ALT, meaning no processing is wasted during execution. For larger CPU usage, a more complicated calculation is run to generate more work, creating a long variable, as seen below:
double j = Math.pow(Math.pow(60339 * 339398 / 2 * 33323, 2348958), 3.0e10)
         * Math.pow(Math.pow(454339L * 339765645398L / 26 * 354563323, 2.3484564595e9), 3.0000045645e15);

4.1.2 Monitoring Work

A basic system was implemented to create expected, repeatable workloads on the CPU, which could be measured to inspect whether monitoring usage was successful. The system of operations is shown as a process diagram in figure 7 below.

Figure 7. Test Process Diagram
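The "small work" generator from 4.1.1 can be sketched as below. This is a stdlib-only stand-in: Thread.sleep replaces JCSP's CSTimer, and the method name smallWork is illustrative. With a real CSTimer the wait would be an ALT guard consuming no CPU, rather than a sleeping thread.

```java
// Sketch of the small-work generator: a cheap multiplication followed by an
// idle gap. Thread.sleep stands in for a CSTimer guard here.
public class SmallWork {
    static int smallWork(int x) {
        return x * 31;                    // trivial operation: negligible CPU load
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            int v = smallWork(i);
            Thread.sleep(100);            // pause between operations
            System.out.println(v);
        }
    }
}
```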
The process is simple: a timer is set for a predetermined time, during which a process of high CPU usage runs. CPU usage at this point was verified against the Task Manager seen in screenshot 1.

4.1.3 Accessing CPU Usage

Measuring CPU usage is difficult in Java. Firstly, in order for this project to succeed, we need to distinguish the actual work being done on a processor from the memory usage of the JVM. The latter is very easily accomplished with native Java commands, but as any Java program is essentially interpreted by the system as a 'process', it cannot access the necessary tools to gain CPU usage insight in the likeness of the Task Manager (screenshot 1).

4.1.3.1 Native Monitoring

There are ways to obtain CPU usage which do not offer real-time performance monitoring but can be based on timed events. For multi-threaded tasks, ThreadMXBean methods can give the CPU usage and user time for any running thread. However, using operatingSystemMXBeans (explained in Chapter 2, figure 2) only returns the CPU usage for all JVMs running; it cannot distinguish between processes with different PIDs. In screenshot 2, we can see the relationship between two JVMs working concurrently.

Screenshot 2. Console Log: Base Reading of CPU usage on Client 1

Client 1 (right) is using independent code to monitor itself whilst Client 2 (left) is waiting for work. operatingSystemMXBeans return CPU use with 1 being 100% usage and 0 being 0%. At the moment of monitoring, the system sits at 12% usage.
Screenshot 3. Console Log: Client 2 affecting Client 1 CPU readings

However, as new processes are started in Client 2, Client 1 continues to report high CPU consumption, proportional to the work of Client 2, despite having no work itself. operatingSystemMXBeans are further influenced by any other Java application running. Hence a way to distinguish between running JVMs had to be identified.

It should be mentioned that as of Java 9, there is a new process API that allows the user to get the current process ID. However, at the date of writing, this was still in beta testing and Java 8 was opted for due to its comparative stability.

4.1.3.2 JNI interface

C affords the low-level access to physical components needed to identify a JVM's Process Identifier (PID). PIDs are numbers which uniquely identify a process while it runs, and are used in Linux, Unix, Mac OS X and Windows. The problem, however, is that system calls are defined differently on each OS. Language libraries need to be recompiled for the specific target operating system in order to utilise the particular underlying components of the operating system (kernel).
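The MXBean readings discussed in 4.1.3.1 can be reproduced with a short probe. On HotSpot JVMs the platform bean can be cast to com.sun.management.OperatingSystemMXBean, whose getProcessCpuLoad() returns a value in [0, 1] (or a negative value when no reading is available yet). Per its javadoc this reports the current JVM's own load, though the experiments above record readings influenced by other JVMs in practice; the class name CpuProbe is illustrative.

```java
import java.lang.management.ManagementFactory;

// Sketch of reading CPU load via the platform MXBean (HotSpot-specific cast).
public class CpuProbe {
    static double processCpuLoad() {
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean)
                ManagementFactory.getOperatingSystemMXBean();
        return os.getProcessCpuLoad();   // 0.0–1.0, or negative if unavailable
    }

    public static void main(String[] args) {
        System.out.println("process CPU load: " + processCpuLoad());
    }
}
```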
As this research was beginning to deviate from the original project scope, delving further into low-level code, an API was imported to give multi-platform compatibility.

4.1.4 Sigar API

Sigar is a multi-platform API for Java and other languages. It allows the user to monitor per-process memory, CPU, credential info, state, arguments and other relevant information (MacEachern, n.d.). By incorporating Sigar, the program can produce percentages based on the amount of CPU usage attributed to the PID of a JVM.

4.1.5 Transferring Objects

By connecting two nodes with a TCP/IP connection, we can send an object very easily. Implementing the Serializable interface, an empty object is sent to another node at a defined IP. This ensured objects were being sent, not references. If a read was successful, a printed statement displaying "Success!" would appear in the Eclipse console.

4.1.6 Running Process Definitions

As process definitions can be contained within objects, a simple system can be created using two nodes and instances of a Process Manager. Process definitions are sent using a timer, testing one process running and then two concurrently, and the Task Manager is consulted to ensure processes are being run correctly.
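The object-transfer test described in 4.1.5 can be sketched over a loopback TCP connection. This is a minimal reconstruction, not the project's code: the class names SendObject and Token are illustrative, and port 0 asks the OS for any free port.

```java
import java.io.*;
import java.net.*;

// Sketch of the transfer test: an empty Serializable object is written over
// a loopback TCP connection and read back on the other side.
public class SendObject {
    static class Token implements Serializable {}     // the "empty object"

    static boolean transfer() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {
            Thread sender = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort());
                     ObjectOutputStream out = new ObjectOutputStream(s.getOutputStream())) {
                    out.writeObject(new Token());     // a copy is sent, not a reference
                } catch (IOException e) { throw new RuntimeException(e); }
            });
            sender.start();
            try (Socket s = server.accept();
                 ObjectInputStream in = new ObjectInputStream(s.getInputStream())) {
                Object received = in.readObject();
                sender.join();
                return received instanceof Token;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (transfer()) System.out.println("Success!");
    }
}
```

Because serialization writes the object's bytes into the stream, the receiver necessarily gets a distinct copy, which is the property the test above was verifying.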
5 Architectural Design

Throughout the project, many different systems were designed to monitor processes and set up a communication architecture which could theoretically facilitate this. The various designs are presented and critically evaluated below.

5.1 Central Repository

This design attempts to meet the aim of process movement. Each node has a Process Node which creates and runs a process on the attached Process Manager. The results are then sent to a host node which keeps track.
Figure 7. MK I: Host Node System Diagram

Each node would monitor the CPU usage of the JVM. Once a certain level is met, the process would be packed and sent to another node.

5.1.1 Central Repository - Issues

The problem with this system is that all channels must be created at initialisation, leaving no room for scalability. It is also essentially working on a ring topology and is more suited to a single system. This network is easily set up in a single JVM as well, meaning only references are passed rather than the actual objects. Although good for initial tests (scaled back to two nodes and a Host Node), the main drawback of this design is the ring element itself. This design was expanded to work with Agents below, where the problems of ring networks are explored in more detail.
5.2 Ring System with Travelling Agents

The Agent System opens up the network, allowing communication across different JVMs. The processes are no longer spawned within the node, but sent by a manager as Process Definitions. The Manager then runs the process whilst a monitor reviews CPU usage. When needed, an Agent is created with the relevant process.

Figure 8. Node Ring Network Diagram

5.2.1 Ring and Agents - Issues

It was during this iteration that the underlying principles of threads were explored in more detail and found to be non-serializable, meaning the running process could not be sent with the Agent in its current state. This meant the design would fundamentally not work: the agent could carry the process definition, but only in its unedited state, not mid-processing.

Not only that, but when dealing with task parallelism, ring systems are inherently prone to deadlock. As processes are created at nodes, the communication between ring elements proved to be non-deterministic, due to the uncertainty as to
which processes were being spawned where, and when they exceeded the pre-set CPU usage and needed to be moved. If too many events were triggered, all of the processes in the ring would attempt to output at the same time, resulting in deadlock. In a non-uniform network, where computer architectures differ (providing varying computational power), this problem would become more prevalent. To alleviate this, nodes could probe the ring first with empty packets and wait for them to return, but this would make half the network activity on the ring empty data packets; a detriment to efficiency.

5.3 Work & Node Manager System

The Work and Node Manager took the ring element out of the design and introduced server-client properties. The problem with this design is that the servers are very closely related and can end up as a closed system. The final prototype changed this.

Figure 9. Work and Node Manager Network Diagram
5.4 Network Structure Analysis

In order to minimise incidents of deadlock, the Client-Server pattern seemed most logical to implement. A server-oriented network permitted:

• Decreased chance of deadlock
• Process Discovery
  o Nodes receive a complete set of required processes
  o Allowing dynamic amendment of process definitions
• Process Control
  o User not restricted to only one choice
  o Timing of process delivery
• Centralised repository for client lists and results
• Scalability
  o Users added by location (IP) rather than assigned place
6 Introducing Process Movement

Process movement was easily implemented when it occurred on the same physical machine, as in the first two prototypes. However, complication increases when functionality is extended to a network. Process Definitions are easily sent in a static state, but getting the state of a process mid-execution requires finding all relevant data saved in the JVM.

6.1 Java Memory Model

In order for Java to be architecture neutral, it is built to operate and exist solely within memory (RAM). Hence, to mimic a computer's infrastructure, the JVM includes its own memory model. The Java memory model divides memory between thread stacks and the heap. It can be seen logically in figure 10.

Figure 10. Logical view of Java Memory Relations (Jenkov, n.d.)
Each thread running in the JVM has its own stack, which contains information about which methods have been called, the point of execution and the local variables for those methods. The local variables consist of primitive types and are fully stored within a thread stack. Hence, they cannot be seen by any other components of the JVM during execution. The heap contains all objects created in the Java application.

The main point of contention for moving processes in the JVM is the fact that all manipulation occurs within a thread stack. If the object containing the process definition being worked on is moved (even if the thread is suspended during processing), it will be moved in its original, unedited state.

6.2 Moving processes within a JVM

As all classes exist within a single JVM during runtime, initial tests for moving processes were misleading. Simply suspending a thread and calling said thread in another class leads to a seemingly successful process manoeuvre. This is achieved by suspending the process manager (essentially a concurrent thread) and sending it through a channel. In this case, as the channel connects two host processes within a single JVM, only the thread reference is communicated, meaning it has technically remained in the same place and is only being restarted; just by another process.

6.3 Thread Serialization impossible with current JVM

Each method run in a Java program has a stack frame associated with it. The stack frame holds the state of a method with three sets of data: the method's local variables, the method's execution environment and the method's operand stack. It would stand to reason that by copying these values at suspension, copying a thread could be achieved. However, the thread object would be allocated with none of the native implementation. The JVM emulates a machine for each instance a Java program is started, and a thread run on one of these machines becomes intricately tied into the internal mechanisms of that machine.
The context of operations is simply lost.
Reading the locations of the threads on the physical machine would prove difficult as well. Not only would this require a separate language to access the data, but memory allocation would have to be monitored from inside the JVM as well as outside. Hardware memory does not distinguish between the heap and threads; hence parts of a thread stack can be present in CPU caches as well as CPU registers.

Figure 11. Java Memory model interaction with CPU Memory Model (Jenkov, n.d.)

Also, Java relies on C procedures for some of its native methods. If the stack were to be copied, it may contain native Java methods that, in turn, have called C procedures. This indicates a complicated mixture of Java constructs and C pointers would have to be recorded. At this point, not only does this increase the amount of data to be transferred at once over a network, but it goes against the ethos of this investigation to find a solution with high abstraction. This is also why reconstructing bytecode (instructions used by the JVM resembling Assembler instructions) and monitoring the JVM instruction set have not undergone further investigation.3

3 Using the Java Class file disassembler proved to be a cumbersome method to determine the sequence of events and was essentially the lowest-level format possible with Java.
6.4 Adapting Process definitions as Agents

In order to move processes, we have to look at the object itself which is being edited. As the supertype class, Process Manager, is not serializable, the subtype object must assume responsibility for saving and restoring. As the process definitions already contain a run function, the system must be amended to stop the internal code from executing and retrieve the edited values. This means each process object must be created as a new instance, so as to keep track of its own local variables, and must have a method of communicating with the host process whilst running concurrently.

Adapting the processes to conform to an Agent interface introduces two new methods which allow this: connect and disconnect (agent seen in figure 6). The host is fitted with two new channels, generated at run time, which allow the agent to connect when received. The basic order of events can be seen in figure 12.

Figure 12. Order of Events for connecting to Agent
6.5 Sending process definitions in current state

In Java, objects can refer to themselves simply by calling "this", meaning that once the internal code has been paused and variables saved, the object itself can be packaged and written to a channel as a serializable object, to be run by a new Process Manager.

Figure 13. Method and Contents of Process (this)

This way, as long as the process definition contains all the run code required, the state of the process is reflected in the object state. This meets the requirements for process movement and is a main part of the prototype's design.
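The "package this" idea above can be sketched with plain Java serialization. This is a minimal stand-in, assuming the simplest possible process definition: Counter plays the role of the process object, step() the role of its run logic, and pack() serializes "this" exactly as the prototype writes the agent to a channel. All names are illustrative.

```java
import java.io.*;

// Sketch: a process object whose fields carry its progress. pack() serializes
// "this"; the deserialized copy resumes where the original stopped.
public class SelfPack {
    static class Counter implements Serializable {
        int progress = 0;
        void step() { progress++; }          // the "run" logic editing its own state

        byte[] pack() throws IOException {   // serialize this very object
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new ObjectOutputStream(buf).writeObject(this);
            return buf.toByteArray();
        }

        static Counter unpack(byte[] bytes) throws Exception {
            return (Counter) new ObjectInputStream(
                new ByteArrayInputStream(bytes)).readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Counter c = new Counter();
        c.step(); c.step();                  // work done before the move
        Counter moved = Counter.unpack(c.pack());
        moved.step();                        // resumes from the saved state
        System.out.println(moved.progress);  // prints 3
    }
}
```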
7 Prototype

7.1 Design

The final implementation extends the Server-Client design by adding Process Nodes to the Universal Client, so multiple instances of a sent Process can be run concurrently whilst connecting to their respective host. It is based on the six paradigms for code mobility (Chalmers, Kerridge, & Romdhani, 2007):

• Client-server
  o Client executes code on the server.
• Remote evaluation
  o Remote node downloads code then executes it.
• Code on demand
  o Clients download code as required.
• Process migration
  o Processes move from one node to another.
• Mobile agents
  o Programs move based on their own logic.
• Active networks
  o Packets reprogram the network infrastructure.
In the case of this design, agents are being manipulated as a means of internal communication as well as movement. The final design is seen in figure 14.

Figure 14. Final Prototype, Server-Client Network

The Universal Node comprises a Node Monitor, which periodically checks the CPU usage of the JVM it is running in. In order to do so, a concurrent thread is spawned at run time with the sole purpose of returning the current CPU usage. Using Sigar, the CPU usage is checked every 10 milliseconds and, if it is above a certain threshold, a new node request is sent to the Access Server.

The Node Monitor has four Process Nodes which are connected by two one2one Channels. Each Process Node runs a Process Manager for incoming processes to connect with. At any given point, if either the Client or Server is waiting, it does so idle, consuming no processing power.
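The Node Monitor's sampling loop can be sketched as below. The CPU reading is abstracted as a DoubleSupplier (in the real system Sigar or an MXBean would supply it); the method and class names are illustrative, not the prototype's.

```java
import java.util.function.DoubleSupplier;

// Sketch of the monitor loop: poll every 10 ms and report once a sample
// crosses the threshold, at which point a new-node request would be sent.
public class MonitorSketch {
    static boolean watch(DoubleSupplier cpu, double threshold, int samples)
            throws InterruptedException {
        for (int i = 0; i < samples; i++) {
            if (cpu.getAsDouble() > threshold) {
                return true;                 // would trigger a new-node request
            }
            Thread.sleep(10);                // 10 ms sampling interval
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        double[] fake = {0.10, 0.35, 0.90}; // scripted readings for the demo
        int[] idx = {0};
        boolean tripped = watch(() -> fake[idx[0]++], 0.75, fake.length);
        System.out.println(tripped);         // trips on the 0.90 sample
    }
}
```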
Process Movement is handled mostly by nodes, to avoid over-reliance on the servers involved. If a process has to be stopped, it is sent directly from the Process Manager running it, straight to a new Client, rather than via the Access Server. This allows the system to move processes in the most direct manner conceived.

The system conforms to a Client-Server pattern between the Universal Node and the Access Server. They are connected at initialisation by an any2net (toAccess) and a numberedNet2One (processRecieve) Channel. This is also true for the relationship between the Access Server and the Process Servers; however, there is only one connection for interaction, as the Process Servers have nothing to return.

7.2 Components

Detailed below are all the components which connect the system together, as well as their role in the whole process.

7.2.1 Nodes

In the context of this system, Nodes are autonomous, concurrently running processes. They control connectivity to the process locations, deal with work and monitor CPU usage.

7.2.2 Node Monitor

The Node Monitor initialises the user system and creates a connection to the Access Server, adding its IP and port location on connection and removing said location when disconnecting. Currently, the server address is hard-coded, but any server with the same infrastructure could be added and defined by the user. It self-monitors its respective instance of a JVM for CPU usage and keeps track of which Process Nodes are in use. The Node Monitor requests processes to be run and delegates the work to the available Process Nodes asynchronously.
It can also stop Process Nodes from continuing work when CPU load is too high. It then selects the last Node activated, requests another Universal Client location from the Access Server and sends the location to the Process Node.

7.2.3 Process Nodes

Process Nodes receive process definitions and put them to work using a Process Manager. Each Process Node provides channel ends to which the process definitions can connect, facilitating interaction between the received process definition and the host. This connection allows the Process Node to inform Processes to stop and move when a new channel location is received, as well as alert the Node Manager when a process has finished.

7.2.3.1 Process Manager

The Process Manager (detailed in section 3.3.1) runs the processes received concurrently.

7.2.4 Channels

Channels comprise two channel ends:

• A channel input, where data is read into the system component
• A channel output, where data is written out of the system component

Channels in this system are one-to-one connections. The only exception is the stop line from Process Node to Process Manager. This is an any2one connection, where the input can come from any node but the output is a specific channel end.
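The any2one pattern above can be sketched by analogy with a JDK BlockingQueue: many writers share one input end while a single reader owns the output end. This is only an analogy under stated assumptions; JCSP's any2one channels add CSP synchronisation semantics a queue does not have, and the names here are illustrative.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of an any-to-one channel: several writer threads feed one shared
// line, mirroring the stop line shared by the Process Nodes.
public class Any2OneSketch {
    // n writer threads each put one message; the single reader drains them all.
    static Set<String> collect(int n) throws InterruptedException {
        BlockingQueue<String> stopLine = new LinkedBlockingQueue<>();
        ExecutorService writers = Executors.newFixedThreadPool(n);
        for (int i = 1; i <= n; i++) {
            final int id = i;
            writers.submit(() -> stopLine.add("stop from node " + id));
        }
        writers.shutdown();
        writers.awaitTermination(5, TimeUnit.SECONDS);
        Set<String> seen = new TreeSet<>();
        for (int i = 0; i < n; i++) seen.add(stopLine.take());
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(3));      // all three writers reach one reader
    }
}
```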
Figure 15. Any2One channel concept

7.2.5 Net Channel

Net Channels work in the same way as regular Channels, but the output is directed to a designated port at a new IP address.

7.2.5.1 Automatic Net Channels

Generated during runtime, Automatic Net Channels create a Channel Input on the fly and use input IP addresses as their location.

7.2.6 Servers

The Servers keep track of the Clients available and allow the Client hosts to initialise, waiting for processes to run.
7.2.6.1 Access Server

The Access Server has the IP locations of the Process Servers and connects users to the processes requested. The Process Servers' IP addresses are stored and connected whenever an instance of the associated process is requested by the user. This server deals with user access requests (capabilities; in this system, an interface), process requests, find-other-client requests and client dismissals.

7.2.6.2 Access Manager

The Access Manager registers newly initialised Nodes onto the server and keeps track of active clients. This is the basis for finding new client locations when a client becomes overloaded.

7.2.6.3 Process Servers

The Process Servers provide Process Definitions an IP address and port at which they are accessible by the Access Server. The Access Server must know the locations at initialisation in order to incorporate them into the Client capabilities. However, the Process Definitions themselves can be amended and adapted during runtime, as the location is the only parameter needed between requests.

7.2.7 Process Definitions

Process Definitions are objects with their own self-contained logic and variables, activated by a run method. They conform to the CSProcess and Serializable interfaces.

7.2.7.1 Agent Definitions

Agents afford the same capabilities as other Process Definitions but introduce connect and disconnect methods. This allows Processes to travel with channels defined, connecting on reception. It is up to the host process to establish the channel connections.
7.2.7.2 Agent Channels

The Agent Channels allow the host process to connect to the internal logic being run by the Process Manager. The channels are defined in the host process, to be connected on reception of the Agent (before running the agent process definition), during host run time, by the connect method. The input and output of the Agent, and the input and output of the host, are then connected together as seen in figure 16.

Figure 16. Internal Connection Mechanisms of Agent

7.2.8 Request Identification

Request objects allow the Access Server to react in the manner required, processing the data received in the correct way. The simplest is ClientRequestData, which dictates that the string sent within the object corresponds with the service needed (i.e. "Process Spawn" requires service B) and the address of the requesting client. Other requests comprise simple IP addresses which need to be interpreted in different ways. Address locations were packed into said objects to differentiate between the contexts in which they were to be treated. These include:

• ClientLocation
  o Registers the Client and sends capabilities
• LeaveRequest
  o Removes Client details from the Access Server
• NodeRequest
  o Requests another Client be found, with a different IP, to send processes to
• NewRequest
  o Same as NodeRequest, used exclusively by the Process Manager, and includes the Node's ID

7.2.9 Implementation

The system runs in the following manner.

Migration

• Process Servers are initialised, followed by the Access Server, at set IPs
• The Universal Client then instantiates itself with a base IP address and a randomly generated port. It starts four connected Process Managers.

Screenshot 4. Client Initialising UI

• If the port matches another, an error message is shown to try again (range 1 – 10,000)

Screenshot 5. Server Not Started or Crashed error message
• The Client connects to the Server and the Server enrols the Client into its list

Screenshot 6. Console log: Node registered on server

• The Server then sends back the Client capabilities

Screenshot 7. Basic user UI

• The Client can then choose different processes to call
  o It shows ready in the console; as the system does not need to show the general public its workings, the console in Eclipse is used to monitor transactions

Screenshot 8. Console Log: Node showing ready
• The service needed and the IP of the Client are then sent to the Access Server, which relays these values to the Process Server needed.
• The Process Server then sends the process to the requesting node directly
• The Universal Client node then assigns the work to one of its free Process Nodes and marks that node as unavailable

Screenshot 9. Console log: Node doing work and releasing Process Node 1 when finished

• When the first process is received, the Node Manager spawns a new thread to monitor the CPU usage.
• The process (agent) is then connected to the Process Node and the process is run.
• Once finished, the Process Node is released to work again
• At any given point, a new node can become active

Stopping and Moving

• Once a node consumes too much CPU, the Manager notifies the server that it needs a new node.
• Another node is chosen and the address returned to the requesting node
• The Manager then selects an active node (last Process Manager started) and sends a message with the new address to the Process Node.
• The Process Node then interprets that type of object and stops the Agent, whilst simultaneously letting the Node Monitor know it can release that Process Node
Screenshot 10. Console Log: When Process 4 starts, CPU is high (62%), the agent is contacted (I am reading), the Process is disconnected, sent (LETS GO) and Process Node 4 is released

• The Agent then packs itself and sends itself to the next node, where it continues
• When the node is closed, the server is alerted and removes it from its active clients

Screenshot 11. Console Log: Server deletes address

To clarify, a server interaction diagram has been created to reflect the order of events, in figure 17.
Figure 17. Server Interaction Diagram for Prototype

7.3 Experiment Setup

In order to test the validity of the system, the work described in 4.1.2 was completed 20,000 times per run, for a total of 20 runs, and each run was timed. The CPU usage was also recorded using MXBeans (for accuracy) and averaged. The experiment was conducted on computers with the specifications below.
Hardware

• CPU – i7 4770 @ 3.4GHz
• RAM – 16GB DDR3
• GPU – NVIDIA NVS 510 (2047 MB)
• OS – Windows 7 Professional 64-bit
• Network Speed – 1GB/s

7.4 Results

These experiments were conducted 12 times and the results averaged, ignoring the two polar outlying values:

1. A single computer running the processes sequentially, with processes hosted locally.
2. A single computer running the processes concurrently over 4 Process Nodes, with processes hosted on Process Servers.
3. Two computers running the load-balancing system, with processes from Process Servers.

The results are detailed below (figure 18).

Workers                        Average time taken   CPU Usage
1 CPU: 1 Sequential Worker     24.36 seconds        12%
1 CPU: 4 Concurrent Workers    10.18 seconds        87%
2 CPU: 8 Concurrent Workers    8.78 seconds         46%

Figure 18. Table of Experiment Results

7.5 Comparative Analysis

By visualising the data collected, we can see the correlation between the number of CPUs, time and work.
Figure 19. Test results Graph; CPU Usage and Time Spent

Speed:

• Increasing workers increases the speed of the work
  o This is not proportional to the number added, but a vast improvement
  o Directly proportional speed-up was never expected, due to communication overheads
• Adding an additional CPU caused a minor increase compared to increasing native resources
  o Due to synchronisation and distribution times, limited by connection protocols (network speed very fast)
  o Speed-up still apparent

CPU Usage

• CPU usage for a single process is very low
  o To be expected, as the CPU is doing the least possible at a time during execution of the test
• CPU usage increases seven times over for 4 workers
  o Although more CPU usage was expected to be consumed, it was not expected to grow this much.
• An added CPU for balancing reduces CPU usage to almost half
  o Considering the difference compared to sequential and concurrent methods, almost halving the stress is a great result

7.6 Local Concurrency Vs Distributed

The results trend toward better performance in terms of time and processing consumption. Performance does not, however, grow proportionally when more CPUs are added. It was assumed going into the experiments that there would be a boundary for performance based solely on communication times. Judging from the sharp change in CPU usage, however, we can conclude that the system does balance the load whilst increasing processing efficiency. This is logical: more workers, doing more things. With small amounts of work, however, sequential processing will yield better results, due to the nature of saving small values with little processing needed, compared to moving data around a network. However, small amounts of work are not what the system was designed for.
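The speed-up implied by the averaged timings in figure 18 can be checked with a line of arithmetic: perfect scaling would give 4x with four workers and 8x with eight, and the measured ratios fall well short, consistent with the communication overheads discussed in 7.5.

```java
// Speed-up = sequential time / parallel time, using the averages in figure 18.
public class SpeedUp {
    static double speedUp(double sequentialSeconds, double parallelSeconds) {
        return sequentialSeconds / parallelSeconds;
    }

    public static void main(String[] args) {
        System.out.printf("4 workers, 1 CPU:  %.2fx%n", speedUp(24.36, 10.18));
        System.out.printf("8 workers, 2 CPUs: %.2fx%n", speedUp(24.36, 8.78));
    }
}
```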
8 Conclusion

8.1 Has the Project met its Aim and Objectives?

The aim of this project was to create a system which can distribute work and regulate said work over multiple computers, ensuring CPU usage does not exceed a specified threshold on each terminal. As the tests in 7.5 show, the functionality to facilitate regulation does exist in the current prototype. The main objectives stated in 1.2 are recapped and addressed below:

1) A method of monitoring CPU usage in the JVM over multiple CPUs must be implemented.

The Sigar API (and Java Beans to a certain extent) afford this functionality. By spawning a thread in the Universal Client's Node Monitor, the monitoring function remains active throughout execution. It is not affected by other events and allows constant vigilance. Although this project did set out to complete everything at a high level, this was one barrier which could not be dealt with otherwise. It can be argued, though, that most Java native methods run through the JNI are C underneath, so it still qualifies as implementation within the JVM.

2) Processes must have a way to be interrupted and saved in their current state.

With the system sending process definitions, the running position of a process using a Process Manager is reflected in the state of the object. By delegating saving responsibility to the subtype in process management, we can essentially pick up the work from a previously running instance.
As explored in chapter 6, it is impossible to serialize and send threads using high-level techniques; this method nevertheless yields a large amount of efficiency, provided variables are saved in a tolerable fashion.

3) Processes need to have a way to move and reinitialise at different nodes on different CPUs.

Using the Serializable interface, Channels, Process Managers and objects containing process definitions, this aspect of the system has been successfully implemented and rigorously studied.

It can be concluded that the objectives and aims have been accomplished. The system outlined at inception has been completed as a proof-of-concept, functional system, as long as the user controls the processes introduced. However, during development and implementation, more aspects have been identified which need to be addressed in order to label this project finished.

8.2 Deployment Analysis and Critique

8.2.1 CPU Monitoring Critique

With user supervision, the system can be seen to send, receive, run, stop and move processes. The CPU monitoring gives adequate coverage and timely response to spikes in CPU usage. Ideally, MXBeans should be used if implementation can be guaranteed in an environment with no other JVM instances running, as the results tend to be more accurate. Using the JNI and C results in CPU polling roughly once for every thousand instructions and gives insight into that instant of per-process CPU usage. Information available via Sigar (CPU usage time) does not update continuously and, being instantaneous, can sometimes return 0, making viable readings even more infrequent. However, the frequency and scope of accuracy are still adequate for this system to function.

8.2.2 Process Movement Critique

Within the time constraints, the project was built to prove that active process migration could be achieved, and the mechanics and theory behind the actual
process movement are sound. However, user-end process management requires more work.

The problem pertains to the number of Process Nodes at each Universal Client. As each process definition needs a manager to connect to, one Process Manager does not suffice for the intended process interaction. So, if more than 4 processes are sent, the Manager Node has no option for dealing with the excess process read. At this point the Client-Server environment breaks down, as the Client is no longer waiting for input, and a deadlock can occur if a Process Node is in a busy state at the point of reception. Having redundant nodes on the system which receive overloading processes could relieve nodes in this case, or more Process Nodes could simply be instantiated at run time. Adapting this aspect of the system really depends on whether the user intends to regulate large amounts of work in a cluster, or wants to use the program in the background of home systems to automate smaller projects. The scalability options of the system in these respects are a great resource.

8.3 Further Research and Work

Aside from user testing, small patches and implementing a targeted application (such as distributed raytracing), identified improvements in functionality are listed below.

8.3.1 Process Interaction

As currently distributed, each process sent must be a standalone procedure. For cross-process interaction, the main server would have to be more involved, keeping note of which processes have been distributed where. The list of current clients could be expanded into a list of lists, containing the Node address as well as the current processes. If we consider one process at each node for simplicity, cross-process interaction could be implemented as follows:
Figure 20. Node interaction diagram

1) A Client would request, from the server, additional data relevant to the process being run.
2) The Server, knowing which processes are running in the overall system, would find a node running the needed data and halt its procedure.
3) The required node would confirm it is ready to set up a connection with the other node. The requesting node must initiate the setup, to a node which is currently paused, because a channel must have its input end set up as a precondition for communication.
4) The new node address would be sent to the initial client, where the relevant Net Channels would be created automatically, similar to when moving processes, for transfer and control mechanics.
5) The nodes would then act like a client and server. The server node would send an initiation signal, causing the client node to run, and transfer would begin.
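The "list of lists" registry that step 2 relies on can be sketched in a few lines of standard Java. This is a minimal illustration under stated assumptions, not code from the prototype; the class name, the string-keyed addresses and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class ClusterRegistry {
    // Each enrolled node address is mapped to the identifiers of the
    // processes currently running there (the "list of lists" idea).
    final Map<String, List<String>> running = new HashMap<>();

    void enrol(String nodeAddress) {
        running.putIfAbsent(nodeAddress, new ArrayList<>());
    }

    void record(String nodeAddress, String processId) {
        enrol(nodeAddress);
        running.get(nodeAddress).add(processId);
    }

    // Step 2 of the interaction protocol: locate the node hosting the
    // process that holds the data another client has requested.
    Optional<String> nodeRunning(String processId) {
        return running.entrySet().stream()
                .filter(e -> e.getValue().contains(processId))
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```

A real implementation would also have to keep this registry consistent as processes migrate, which is exactly the bookkeeping burden the text attributes to a more involved server.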
With some configuration of the implemented prototype's current infrastructure, this new system could be implemented successfully. The framework of this design is not hard to implement in theory, but the semantics and order of communication would have to be thoroughly deliberated upon.

8.3.2 Process Node Quantities

This is simply allowing the user to define how many Process Nodes they would like to initialise. In order to keep processing limits within a reasonable window, the user's processing capabilities would have to be assessed, limiting the number of concurrent processes. This would also require either the user or the developer to have prior knowledge of the estimated processing power that each individual process consumes; otherwise the system could spend a lot of time moving processes.

8.3.3 User Defined Processes

Implementing user-defined processes raises two specific points of contention:
1) Methods would have to be adapted to conform to Agent classes.
2) Code must be runnable. This means code would have to be scanned or tested at run time to ensure all aspects are serializable. This could be done by creating a Test Node comprising a try/catch system which returns exceptions when they are met. Having runnable code is the main function of the CSProcess class, so methods would have to be identified at input. This could include an interface which asks for variables and the associated process separately. Another method, which would involve having some knowledge of the system, would be implementing a wrapper class which could affix the required connect methods for Agents, provided the user understands CSProcesses.
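The try/catch Test Node described in point 2 can be prototyped with standard Java serialization: attempt to write the candidate object with an ObjectOutputStream and report any NotSerializableException up front, rather than discovering it at migration time. A minimal sketch (the class name is illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializabilityCheck {
    // Returns true if the given object graph can be fully serialized.
    // Writing to an in-memory buffer exercises the whole graph, so a
    // non-serializable field buried inside the object is also caught.
    static boolean isFullySerializable(Object candidate) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (IOException e) { // NotSerializableException is an IOException
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isFullySerializable("a plain string")); // true
        System.out.println(isFullySerializable(new Object()));     // false
    }
}
```

Running the check before accepting a user-supplied process would let the Test Node reject unsuitable definitions with a clear error instead of failing mid-migration.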
8.3.4 Extended Network to Internet

This method is easily implementable, but does not conform to the aims of this report. By simply changing the node and server IP addresses from local to public IPs, the system's scale can be opened up to users in any location. The problem then lies with security: there are currently no security measures in place during communication. Although the mechanisms of the system are not commonplace in Java, objects are still a universally used data type.

8.3.5 Automated Process Delivery

As the system stands, the Universal Clients are tasked with acquiring processes. This was implemented to regulate the speed of requests and to allow easier debugging. Automated delivery could be implemented by keeping track, at the Server, of how many processes are running at each node. If the Server records a Client with free Process Nodes, it can continue to send more processes to the underloaded area. Polling for CPU usage on completion of tasks, to indicate whether more processes are needed, would result in a well-balanced system overall, but would also result in higher volumes of traffic. As previously stated, these options should be aligned to the chosen application of the system, and could be user controls put in place at initialisation.

8.4 Reflective Statements

During this project there have been multiple setbacks, some avoidable and some unforeseeable. As with most large projects, the developer will never be truly happy with what has been accomplished. Despite meeting the initial aims of this investigation and being relatively pleased with the finished product, there are still areas which could have been addressed sooner, and shortcomings which will not be repeated in the future.
1) Progress trail

The first objective of this project was establishing a method of progress monitoring, and a blog was created to document progress. However, the first incarnation was hacked after two weeks. This was a major setback and resulted in a decline in adequate tracking. In the future, security measures for a public web space will be adhered to. More importantly, a structured, documented development diary will be a higher priority; keeping track of developments and meetings would have led to a much more streamlined approach and a better implementation overall. This also pertains to the week 7 report, which took place in the form of a viva voce in the Napier Games Lab in the first week of December.

2) Inadequate background understanding

Going into this project, I believed I had sufficient understanding of the fundamental concepts and technologies involved to create this system. Searching for previous attempts at the problem proved fruitless (see Appendix A), indicating there was not a lot of reading on the subject. The IPO, although it has the same conceptual ethos, talks about accessing system hardware from a high-level language, and was naïve in some of its goals given the time permitted and the level of work expected. However, we never know the depth of our own ignorance: this proved true when, halfway through the project, I realised that a thread, the main method of running work, is not serializable. In future development, I will read not only papers on implementation, but also technical documentation on processes and data types, to ensure I grasp the conceptual as well as the technical limitations. In summation, I have learned that preparation and process are just as important as the actual development.
3) Time Management

For some of the project, personal circumstances dictated a lack of work, but time management could have been much better from the start. Wednesdays were established as work days, but this was not particularly adhered to at the start of the project. A Gantt chart was drafted, but after personal circumstances interfered it was not reviewed until after half the allotted time had transpired. More tests into the efficiency of the system can still be run and should be considered part of further work.
9 References

1. Austin, P., & Welch, P. (2008). CSP for Java (JCSP) 1.1-rc4 API Specification. Retrieved from CSP for Java: https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-doc/
2. Austin, P., & Welch, P. (2008). Interface CSProcess. Retrieved from CSP for Java: https://www.cs.kent.ac.uk/projects/ofa/jcsp/jcsp-1.1-rc4/jcsp-doc/org/jcsp/lang/CSProcess.html
3. Chalmers, K. (2008). Investigating Communicating Sequential Processes for Java to Support Ubiquitous Computing. Edinburgh Napier University. Retrieved April 22, 2016, from https://www.researchgate.net/publication/239568086_INVESTIGATING_COMMUNICATING_SEQUENTIAL_PROCESSES_FOR_JAVA_TO_SUPPORT_UBIQUITOUS_COMPUTING
4. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2007, July 8-11). Mobility in JCSP: New Mobile Channel and Mobile Process Models. Retrieved April 24, 2016, from ResearchGate: https://www.researchgate.net
5. Chalmers, K., Kerridge, J. M., & Romdhani, I. (2008). A Critique of JCSP Networking. The Thirty-First Communicating Process Architectures Conference (pp. 7-10). York: P. H. Welch et al. doi:10.3233/978-1-58603-907-3-27
6. Doallo, R., Expósito, R. R., Ramos, S., Taboada, G. L., & Touriño, J. (2013, May 1). Java in the High Performance Computing arena: Research, practice and experience. Science of Computer Programming, 78(5), 425-444. Retrieved April 22, 2016, from http://www.sciencedirect.com/science/article/pii/S0167642311001420
7. Doallo, R., Taboada, G. L., & Juan, T. (2009, April). F-MPJ: scalable Java message-passing communications on parallel systems. The Journal of Supercomputing, 60(1), 117-140. Retrieved April 22, 2016, from http://link.springer.com/article/10.1007/s11227-009-0270-0
8. Funika, W., Godowski, P., & Pęgiel, P. (2008). A Semantic-Oriented Platform for Performance Monitoring of Distributed Java Applications. Computational Science – ICCS 2008, 5103, 233-242. Retrieved April 22, 2016, from http://link.springer.com/chapter/10.1007/978-3-540-69389-5_27
9. Hoare, C. A. R. (2004). Communicating Sequential Processes. Prentice Hall International. Retrieved April 22, 2016, from http://www.usingcsp.com/cspbook.pdf
10. Islam, N., & Shoaib, S. (2002, June 24). US Patent No. US 7454458 B2. Retrieved April 22, 2016, from https://www.google.com/patents/US7454458
11. Jenkov, J. (n.d.). Java Memory Model. Retrieved from http://tutorials.jenkov.com/java-concurrency/java-memory-model.html
12. Kerridge, J. (2014). Using Concurrency and Parallelism Effectively (2nd ed.). Bookboon.
13. Lam, K. T., Luo, Y., & Wang, C.-L. (2010). Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime. Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on (pp. 1-11). Atlanta: IEEE. doi:10.1109/IPDPS.2010.5470461
14. Lemos, J., Simão, J., & Veiga, L. (2011). A2-VM: A Cooperative Java VM with Support for Resource-Awareness and Cluster-Wide Thread Scheduling. On the Move to Meaningful Internet Systems: OTM 2011, 7044, 302-320. Retrieved April 22, 2016, from http://link.springer.com/chapter/10.1007%2F978-3-642-25109-2_20
15. MacEachern, D. (n.d.). SIGAR. Hyperic. Retrieved from https://support.hyperic.com/display/SIGAR/Home
16. Meddeber, M., & Yagoubi, B. (2010, September 22). Distributed Load Balancing Model for Grid Computing. ARIMA Journal, 12. Retrieved April 22, 2016, from http://arima.inria.fr/012/pdf/Vol.12.pp.43-60.pdf
17. Olivier, S. (2008). Scalable Dynamic Load Balancing Using UPC. 2008 37th International Conference on Parallel Processing. Portland: IEEE. Retrieved April 22, 2016.
18. Oracle. (2015, February 14). Learn About Java Technology. Retrieved from Java: http://java.com/en/about/
19. Oracle. (2016). Interface OperatingSystemMXBean. Retrieved from Java Platform, Standard Edition 7: https://docs.oracle.com/javase/7/docs/api/java/lang/management/OperatingSystemMXBean.html
20. Shaw, B. (n.d.). How the Java Virtual Machine (JVM) Works. Retrieved from http://www.codeproject.com/Articles/30422/How-the-Java-Virtual-Machine-JVM-Works
21. Winias, T. B., & Brown, J. S. (n.d.). Retrieved from http://www.johnseelybrown.com/cloudcomputingpapers.pdf
22. Xoreax Software Ltd. (2016). IncrediBuild. Retrieved from IncrediBuild Beyond Acceleration: https://www.incredibuild.com/
Appendix A. Searched Terms

All results from 2005 onwards were considered for inclusion. Some results were duplicated across searches, resulting in "0 Relevant" for later searches. Checked as of 22/04/2016.

• "Load Balancing in Java":
o "Distributed Load Balancing Model for Grid Computing" (Meddeber & Yagoubi, 2010) – Focuses on modelling topologies of balancing, with basic information on system implementation
o "Scalable Dynamic Load Balancing Using UPC" (Olivier, 2008) – Uses Unified Parallel C
o "Method and system for application load balancing" (US Patent No. US 7454458 B2, 2002) – Patent for a similar system with no implementation; only conceptual, with ambiguity in implementation
• "CPU load balancing in Java":
o "A Semantic-Oriented Platform for Performance Monitoring of Distributed Java Applications" (Funika, Godowski, & Pęgiel, 2008) – Platform for monitoring resources for online Java technologies
• "Java cluster computing":
o "Java in the High Performance Computing arena: Research, practice and experience" (Doallo, Expósito, Ramos, Taboada, & Touriño, 2013) – Looks into the methods facilitating high-performance code in Java (shared memory model, MPI, etc.)
o "F-MPJ: scalable Java message-passing communications on parallel systems" (Doallo, Taboada, & Juan, 2009) – A different MPI implementation
• "Load balancing cluster computing Java": 0 Relevant
• "CPU balancing cluster Java": 0 Relevant
• "Load balancing cluster JVM":
o "A2-VM: A Cooperative Java VM with Support for Resource-Awareness and Cluster-Wide Thread Scheduling" (Lemos, Simão, & Veiga, 2011) – Cluster infrastructure for cloud computing systems
o "Adaptive sampling-based profiling techniques for optimizing the distributed JVM runtime" (Lam, Luo, & Wang, 2010) – Builds a system based on global variables for the cluster, paying close attention to thread stacks
• "Load balancing cluster JCSP": 0 Relevant
• "Load balancing asynchronous cluster Java": 0 Relevant
• "CPU monitoring load balance cluster Java": 0 Relevant
• "Cluster process sending Java": 0 Relevant
Appendix Item 1. Basic concepts
Appendix Item 2. Agent structure
Appendix Item 3. Ring implementation Conversation
Appendix Item 4. Ring Evolution
Appendix Item 5. Extended Ring Elements
Appendix Item 6. Implementing Agent Channels
Appendix Item 7. Losing the Ring
Appendix Item 8. Closed Client Server
Appendix Item 9. Client Server with Managers
Appendix Item 10. Interacting Processes

Further comments and discussion can be found at http://honsproject.calumbeck.com/

C. Github analytics

Appendix Item 11. Work distribution by day
Appendix Item 12. Git Activity Concentrations
Appendix Item 13. Busy commit periods
Initial Project Overview

SOC10101 Honours Project (40 Credits)

Title of Project: CPU Load Balancer

Overview of Project Content and Milestones

The Main Deliverable(s):

I intend to create a system which monitors CPU core usage over a cluster of computers and calls another terminal to take on more load when one is starting to reach maximum capacity, increasing speed and efficiency overall. The system will use Agents which move around the system, arriving at each node (processor or core in this case) and connecting to its main processing stack to ascertain its current efficiency. Once finished, the Agent disconnects and moves itself on to the next core in the system. Using multiple agents will be a goal for the project, and attaining basic concurrency will be the first milestone. The system will be designed and implemented using the Groovy 2.3 libraries for Java, which allow the user to easily manipulate threads at a high level through the predominant use of message passing. It is not certain whether a hybrid of message passing and shared memory will be attainable, as pure message passing has a large overhead for copying messages from one process to another. This is not a problem at a high level of programming, but at CPU or even GPU instruction speeds it is worth noting that it is not certain whether it will have a positive or negative impact. Testing of the system will include the use of software metrics to ensure results are as expected in certain situations, such as the coherency of specific function calls at the point of load shifting. CPU usage will be constantly observed, compared across different methodologies, and documented and collated in full throughout the report. The final product will be discreet during use and will not increase processing overhead between operations when Agents are idle or in transit between nodes.
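The per-node usage check the Agents rely on can be approximated with the standard Java management API. This is a hedged sketch only (in plain Java rather than Groovy, using the portable one-minute load average rather than an instantaneous CPU percentage, and with an illustrative class name):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class NodeProbe {
    // Rough per-core load figure an Agent could compare against a threshold.
    // getSystemLoadAverage() returns the 1-minute load average, or a
    // negative value on platforms (e.g. Windows) where it is unavailable.
    static double loadPerCore() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double avg = os.getSystemLoadAverage();
        return avg < 0 ? avg : avg / os.getAvailableProcessors();
    }

    public static void main(String[] args) {
        double load = loadPerCore();
        if (load > 0.8) { // arbitrary placeholder threshold
            System.out.println("Node saturated, requesting handover");
        }
        System.out.printf("load per core: %.2f%n", load);
    }
}
```

More precise instantaneous figures require the com.sun.management extension of this bean, or a native library such as Sigar, both of which the report discusses as trade-offs.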
It will be easy to initiate and close, with a basic visual monitoring system for the user, including concrete feedback for changes or problems. It should automatically detect the number of cores in use and be proficient across different architectures, although Intel-based chips will be the basis for development. It is not obvious at the moment whether the use of hyper-threading in conjunction will be possible, but it will be documented when attempted.
The Target Audience for the Deliverable(s):

As the system will spread over multiple computers, it will be hindered by physical restraints and the associated speed ramifications. Hence, as a proof of concept, the system will handle large computational problems which are not I/O dependent. As such, the system will be used to aid with large computations, or by those in need of makeshift data farms.

The Work to be Undertaken:
• Design a system which allows concurrent processing in a cluster computing environment
• Dealing with interaction with other devices over a network
o Adapting the system to work on mobile devices
• Comparative analysis of communication methods (i.e. Ethernet, Wi-Fi, etc.)
o Analysis of result output in correlation with message passing parameters
• Comparative tests on different hardware architectures

Additional Information / Knowledge Required:
• Java language
o Groovy library knowledge
• Concurrent and parallel architecture knowledge
• Fundamental Android understanding (for mobile development)
• CPU usage metrics

Information Sources that Provide a Context for the Project:

Background and Rationale:

Computer hardware has evolved, and so has the amount we attempt to run at any given point. From the initial single-core processors to the octa-cores of today, engineers have strived to build the most powerful computers at ever greater speeds. However, over time it has become apparent that the implementation methods we have been working from and towards are starting to level off. In the past, the first step in augmenting any computer in terms of speed and performance has been reducing transistor size and increasing clock speed accordingly. Intel co-founder Gordon E. Moore stated that the number of transistors able to fit on a processor would double every 18 months, fundamentally increasing the speed of computers for at least the
next decade. This model of thought is still used regularly in the computing industry today; however, it was initially stated in 1965, and many things have changed since then. The problems we face today are distance, heat and conduction. The physical distance between cache memory and cores is being reduced more and more; we are approaching almost instantaneous transmission, and this comes with another set of problems. Heat is generated when a CPU core is pushed to compute at the rates we demand, requiring ever more intricate ways to cool the system, and this can all come down to bad allocation of resources. We therefore need to look at how we balance our work. Software needs to reflect the modern multitasking environment that we have come to expect, and hence must change in order to cope with increasing demand, as hardware cannot be relied on to be the sole supporter in this venture. I plan to build a system which allows a proper allocation of the available resources and increases the efficiency of hardware use, in order to achieve a faster, more reliable system.

The Importance of the Project:

This project will be a proof of concept for using multiple computers in a personal environment to complete large computational problems, with little impact on performance as a whole, in a discreet manner.

The Key Challenge(s) to be Overcome:

The initial challenge will be to ascertain whether an agent can become active when CPU usage reaches a certain level on a terminal. On activation, the agent will report to a central repository of addresses and move to a new terminal with lower CPU usage. From here it should be able to display a message on this machine.
This will be done as outlined below:
• Use a Monte Carlo algorithm to process a large computation
o Create an Agent to look at CPU usage
o CPU usage should report high
o Have the Agent report to another resource
o Println "I am overloaded"
o Then build an event handler that has access to the channel which is waiting for input from the processor

From here, we can then move on to moving key data. The intention is to create a central repository of agents which then looks for a node which does not have an agent active. From here we can move resources to the new processor. The biggest challenge to overcome, if the above system is completed in due course, is to implement it on a single CPU. Using cores would be the ultimate goal, to spread use evenly on one terminal, but in choosing Java as the main platform, the JVM involved gives little potential for working over individual cores. Using a different language could be an answer, but would require a large amount of research and development. For the time being, what is detailed in the main deliverables is the main aim.
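The first two bullets above can be mocked up in plain Java: a Monte Carlo estimate of pi as the deliberately CPU-heavy computation, followed by the threshold check an Agent would perform. This is an illustrative sketch, not the project's code; the 0.8 threshold and the class name are arbitrary placeholders.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.ThreadLocalRandom;

public class OverloadSketch {
    // Monte Carlo estimate of pi: draw random points in the unit square
    // and count how many fall inside the quarter circle. A convenient,
    // embarrassingly parallel stand-in for real cluster work.
    static double estimatePi(long samples) {
        long inside = 0;
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) inside++;
        }
        return 4.0 * (double) inside / samples;
    }

    public static void main(String[] args) {
        double pi = estimatePi(2_000_000);
        // Agent-style check: load average per core against a threshold
        // (negative load average means the figure is unavailable).
        double perCore = ManagementFactory.getOperatingSystemMXBean()
                .getSystemLoadAverage()
            / Runtime.getRuntime().availableProcessors();
        if (perCore > 0.8) {
            System.out.println("I am overloaded");
        }
        System.out.printf("pi ~= %.3f%n", pi);
    }
}
```

With roughly a million samples the estimate is typically within a few thousandths of pi, which makes the computation both heavy enough to register on a CPU monitor and cheap to verify.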