International Journal of Computer Networks & Communications (IJCNC) Vol.7, No.4, July 2015
DOI: 10.5121/ijcnc.2015.7408
PERFORMANCE IMPROVEMENT BY TEMPORARY EXTENSION OF AN HPC CLUSTER
Pil Seong Park
Department of Computer Science, University of Suwon, Hwasung, Korea
ABSTRACT
Small HPC clusters are widely used in many small labs since they are easy to build and cost-effective.
When more power is needed, instead of adding costly new nodes to old clusters, we may try to make use of
the idle times of some servers in the same building that work independently for their own purposes,
especially during the night. However, such extension across a firewall raises not only security
problems but also a load balancing problem caused by the heterogeneity of the resulting system. In this paper,
we devise a method to solve such problems using only old techniques applicable to our old cluster systems
as is, without requiring any hardware or software upgrade. We also discuss how to deal with
heterogeneity and load balancing in application, using a two-queue overflow queuing network problem as
a sample problem.
KEYWORDS
HPC clusters, Security, NFS, SSH tunnelling, Load balancing
1. INTRODUCTION
A variety of architectures and configurations have appeared in the quest for more computing
power and higher reliability obtained by orchestrating a number of low-cost commercial
off-the-shelf processors. One such example is a computer cluster, which consists of a set of
loosely or tightly coupled computers that are usually connected to each other through a fast LAN
and work together so that they can be viewed as a single system in many respects.
Many kinds of commercial clusters are already on the market. However, the technologies to build
similar systems using off-the-shelf components are widely known (e.g., see [1]), and it is easy to
build a low-cost small cluster [2]. Many small labs commonly build their own cluster and
gradually grow it by adding more dedicated nodes later. Alternatively, if there are some nodes
on the same LAN that are being used for other purposes, we may try to utilize their idle times,
as long as they are not always busy, especially during the night. The Berkeley NOW (Network of
Workstations) project is one of the early attempts to utilize the power of clustered machines on a
building-wide scale [3]. However, such extension gives rise to difficulties in security, cluster
administration, and load balancing for optimal performance.
In this paper, we consider some technical issues arising in extending a small Linux cluster to
include some non-dedicated nodes in the same building across a firewall. Of course there are
already various newer methods that could be used to avoid such technical issues. However, we
do not adopt such state-of-the-art technologies, since the sole purpose of this paper is to achieve
our goal with our old HPC cluster as is, without any hardware or software upgrade. Some results
of the experiments to evaluate the performance of the extended system are given at the end.
2. BACKGROUND
2.1. HPC Clusters
Computer clusters may be configured for different purposes. Load-balancing (LB) clusters are
configurations in which nodes share computational workload like a web server cluster. High-
performance computing (HPC) clusters are used for computation-intensive purposes, rather than
handling IO-oriented operations. High-availability (HA) clusters improve the availability of the
cluster, by having redundant nodes, which are then used to provide service when system
components fail. The activities of all compute nodes are orchestrated by cluster middleware that
allows the whole cluster to be treated as a single system image.
Well-known HPC middlewares based on message passing are the Message Passing Interface
(MPI) [4] and the Parallel Virtual Machine (PVM) [5], the former being the de facto standard.
Many commercial or non-commercial implementation libraries have been developed in
accordance with the MPI standard. LAM/MPI, FT-MPI, and LA-MPI are some widely used
non-commercial libraries, and their technologies and resources have been combined into the
ongoing Open MPI project [6].
An HPC cluster normally consists of the dedicated nodes that reside on a secure isolated private
network behind a firewall. Users normally access the master/login node (master, for short) only,
which sits in front of compute nodes (slave nodes, or slaves for short). All the nodes in an HPC
cluster share not only executable codes but also data via insecure NFS (Network File System).
This is perfectly fine as long as all the cluster nodes are on a secure LAN behind a firewall.
However, to extend the cluster to include other nodes outside of the firewall, we will be
confronted with many problems concerning security and data sharing among nodes. Moreover, such
growth results in a heterogeneous cluster with nodes of different power, possibly running
different Linux versions.
2.2. Linux/Unix File Sharing Protocols
NFS, created by Sun Microsystems in 1984, is a protocol that allows file sharing between
Linux/UNIX systems residing on a LAN. Linux NFS clients support three versions of the NFS
protocol: NFSv2 (1989), NFSv3 (1995), and NFSv4 (2000). However, the NFS protocol as is poses
many problems for extending the cluster, since its packets are not encrypted, besides other
shortcomings to be discussed later.
Other alternatives to NFS include AFS (Andrew File System), DFS (Distributed File System),
RFS (Remote File System), Netware, etc. [7]. There are also various clustered file systems shared
by multiple servers [8]. However we do not adopt such new technologies since they are not
supported by our old cluster’s hardware and software.
2.3. SSH Tunnelling
Encryption of NFS traffic is necessary for secure extension across a firewall. One common
technique is cryptographically protected tunnelling, in which an IP-level or TCP-level stream
of packets is used to tunnel application-layer segments [9]. A TCP tunnel is a technology that
aggregates and transfers packets between two hosts as a single TCP
connection. By using a TCP tunnel, several protocols can be transmitted transparently through a
firewall. Under certain conditions, however, it is known that the use of a TCP tunnel severely
degrades the end-to-end TCP performance, which is called the TCP meltdown problem [10].
The SSH protocol allows any client and server programs to communicate securely over an
insecure network. Furthermore, it allows tunnelling (port forwarding) of any TCP connection on
top of SSH, so as to cryptographically protect any application that uses clear-text protocols.
3. TEMPORARY EXTENSION OF AN HPC CLUSTER ACROSS A FIREWALL
We would like to extend our old cluster, consisting of the master and slave nodes, to
include some other nodes, say Ext1 and Ext2, outside of a firewall, as shown in Figure 1. The
master node generally has attached storage that is also accessible by diskless slave nodes using
insecure NFS. Since NFS relies on the inherently insecure unencrypted UDP protocol (up to
NFSv3), transactions between host and client are not encrypted, and IP spoofing is possible.
Moreover, firewall configuration is difficult because of the way NFS daemons work.
Figure 1. Cluster extension to include external nodes Ext1 and Ext2 across a firewall.
3.1. NFS over TCP and Fixing NFS Ports
SSH tunnelling, which is based on SSH port forwarding, is commonly used to encrypt otherwise
unencrypted packets or to bypass firewalls, e.g., see [9,11]. However, SSH tunnels support only TCP
connections on fixed ports, while NFS uses UDP by default and the ports of some daemons
essential for the operation of NFS are variable.
Fortunately, NFS over TCP is also supported from Linux kernel 2.4 onwards on the
NFS client side, and from kernel 2.4.19 on the server side [12]. Since all the nodes of our old
cluster satisfy this condition, we can make NFS work over TCP just by specifying the option "-o
tcp" in the mount command. The following shows an example of mounting the NFS server's
/nfs_dir directory on the client's mount point mount_pt.
# mount -t nfs -o tcp server:/nfs_dir mount_pt
For tunnelling, we need to fix the NFS ports. The daemons essential for NFS operation are of two
kinds: portmapper (port 111) and rpc.nfsd (port 2049) use fixed ports, while rpc.statd, rpc.lockd,
rpc.mountd, and rpc.rquotad use ports that are randomly assigned by the operating system.
However, the ports of the latter can be fixed by specifying port numbers in appropriate
configuration files [13], as shown below.
The ports of the first three (i.e., rpc.statd, rpc.lockd, and rpc.mountd) can be fixed by adding the
following lines in the NFS configuration file /etc/sysconfig/nfs, for example,
STATD_PORT=4001
LOCKD_TCPPORT=4002
LOCKD_UDPPORT=4002
MOUNTD_PORT=4003
The remaining port (rpc.rquotad) can be fixed by defining a new port number, 4004 for
example, in the configuration file /etc/services:
rquotad 4004/tcp
rquotad 4004/udp
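After restarting the NFS services so the new settings take effect, one way to check that the
daemons actually listen on the fixed ports is the rpcinfo utility (a sketch; the service script
names may differ across distributions):
# service nfs restart
# service nfslock restart
# rpcinfo -p localhost | egrep 'mountd|status|nlockmgr' # should now show ports 4001-4003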
3.2. Setting up an SSH Tunnel
For tunnelling of NFS, the server must export its NFS directory to itself, since mount requests
arriving through the tunnel appear to come from localhost. Hence, on the server side, the
configuration file /etc/exports has to be modified, for example, for exporting its /nfs_dir to itself,
/nfs_dir localhost (sync,rw,insecure,root_squash)
where "insecure" means that connections from ports higher than 1023 are allowed, and "root_squash"
means that root permissions are squashed for the client, so that root on the client is denied the
right to access or create files on the NFS server, for security.
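After editing /etc/exports, the export list can be reloaded without restarting the whole NFS
service:
# exportfs -ra # re-export all directories listed in /etc/exports
# exportfs -v # verify that /nfs_dir is now exported to localhost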
Now an SSH tunnel has to be set up from the client's side. To forward the client's ports
11000 and 12000, for example, to the fixed server ports 2049 (rpc.nfsd) and 4003 (rpc.mountd),
respectively, we can use the command
# ssh nfssvr -L 11000:localhost:2049 -L 12000:localhost:4003 -f sleep 600m
where "nfssvr" is the IP or the name of the NFS server registered in the configuration file
/etc/hosts, and "-f sleep 600m" means that port forwarding is to last for 600 minutes in the
background.
Figure 2. Local port forwarding for SSH tunnelling.
When connecting to the NFS server, the SSH tunnel will be opened once the correct password is entered.
Then the NFS server's exported directory can be mounted on the client's side. The following
command is an example that mounts the /nfs_dir directory of the NFS server on the /client_dir
directory of the client.
# mount -t nfs -o tcp,hard,intr,port=11000,mountport=12000 localhost:/nfs_dir /client_dir
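Putting the two client-side steps together, a minimal script might look as follows (a sketch
using the example names above; nfssvr, /nfs_dir, and /client_dir should be replaced with the
actual server name and directories):
#!/bin/sh
# Open the SSH tunnel to the NFS server (forwarded ports 11000/12000 as above).
ssh nfssvr -L 11000:localhost:2049 -L 12000:localhost:4003 -f sleep 600m
# Mount the exported directory through the tunnel.
mount -t nfs -o tcp,hard,intr,port=11000,mountport=12000 localhost:/nfs_dir /client_dir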
3.3. Suggested Structure
Even though NFS traffic through an SSH tunnel is encrypted, the setup has a serious drawback if we cannot
completely trust the local users on the NFS server [14]. For example, if some local user on the
NFS server can log in on the server and create an SSH tunnel, any ordinary user on the server can
mount the file systems with the same rights as root on the client.

One possible solution might be to prohibit local users from logging in directly to the NFS server,
so that they cannot create an SSH tunnel. One simple way is to change all local users' login shells
from /bin/bash to /sbin/nologin in the file /etc/passwd. However, this causes another problem: in
small HPC clusters, the master node normally works as the NFS server too, yet local users must be
allowed to log in to the master node to use the cluster, which is dangerous when using NFS through
an SSH tunnel, as was pointed out previously.
Figure 3 shows an alternative structure for the extended HPC cluster that takes all of this into
account. The structure has a separate NFS server which does not allow local users to log in. An SSH
tunnel can be created by the root user on Ext1 or Ext2, and local users just log in on the master
node to use the cluster.
Figure 3. The suggested structure.
The firewall setting of the master node does not need any modification. However, since the NFS
service is provided through SSH tunnelling, the NFS server only needs to open port 22
to the computation nodes outside of its firewall, for example as sketched below.
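With iptables, the NFS server's firewall could restrict SSH roughly as follows (a sketch; the
addresses are placeholders for Ext1 and Ext2):
# iptables -A INPUT -p tcp -s 192.168.10.11 --dport 22 -j ACCEPT # Ext1 (placeholder IP)
# iptables -A INPUT -p tcp -s 192.168.10.12 --dport 22 -j ACCEPT # Ext2 (placeholder IP)
# iptables -A INPUT -p tcp --dport 22 -j DROP # deny SSH from everywhere else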
The NFS server should be configured so that it reveals as little information about itself as
possible. For example, services like portmapper should be protected from outside. It is also
suggested to specify explicitly the hosts that are allowed to access any service on the NFS
server. This can be done by setting ALL:ALL in the configuration file /etc/hosts.deny, and listing
only the allowed hosts (or their IPs), together with the services they may access, in the file
/etc/hosts.allow, as in the example below.
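For instance, the two files might look as follows (a sketch; the addresses are placeholders for
the allowed nodes):
/etc/hosts.deny: ALL: ALL
/etc/hosts.allow: sshd: 192.168.10.11, 192.168.10.12
With this setting, every service built with TCP wrappers support is denied by default, and only
SSH connections from the two listed nodes are accepted.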
The use of the "rsh" command, which is not encrypted, is common on old HPC clusters with old
middleware. Fortunately, LAM/MPI v.7.1.2 allows SSH login with an authentication key but
without a password, e.g., see [15].
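Setting up such password-less key-based login is standard; for example, on each node that must
log in to another without a password (user@target-node is a placeholder):
# ssh-keygen -t rsa # generate a key pair; accept an empty passphrase
# ssh-copy-id user@target-node # append the public key to the target's authorized_keys
On very old systems without ssh-copy-id, the contents of ~/.ssh/id_rsa.pub can be appended to
~/.ssh/authorized_keys on the target node manually.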
4. DYNAMIC LOAD BALANCING
The original HPC cluster may be homogeneous, i.e., the power of all nodes may be the same.
However, the extended cluster may be heterogeneous, or may behave like a heterogeneous one even
if all the nodes have the same power, since the communication speeds between nodes differ
depending on various factors: the type of network, the existence of a firewall, and encryption.
Moreover, the workload of the non-dedicated nodes outside of the firewall may change
continually, and the communication speed may also change continually because of the traffic on the
outside LAN. Hence we need a dynamic run-time load balancing strategy [16], while
initially assigning an equal amount of work to the original slave nodes of the cluster.
Data partitioning and load balancing have long been important components of parallel computation.
Since the early works (e.g., see [17]), many authors have studied load balancing using different
strategies on dedicated/non-dedicated heterogeneous systems [18,19,20,21], but we could not
find related work on the security problems arising in cluster expansion, which are technical
rather than academic issues.
4.1. The Sample Problem
Dynamic load balancing strategies depend on the problem being solved, so we first introduce
our sample problem. We would like to solve the famous old two-queue overflow queuing problem
given by the Kolmogorov balance equation
\[
\begin{aligned}
\bigl[\lambda_1(1-\delta_{i,n_1}\delta_{j,n_2}) &+ \lambda_2(1-\delta_{j,n_2}) + \mu_1\min(i,s_1) + \mu_2\min(j,s_2)\bigr]\,p_{i,j} \\
= \lambda_1(1-\delta_{i,0})\,p_{i-1,j} &+ (\lambda_2+\lambda_1\delta_{i,n_1})(1-\delta_{j,0})\,p_{i,j-1} \\
&+ \mu_1\min(i+1,s_1)\,p_{i+1,j} + \mu_2\min(j+1,s_2)\,p_{i,j+1}, \qquad i,j = 1,2,\ldots
\end{aligned}
\]
where δ_{i,j} is the Kronecker delta and p_{i,j} is the steady-state probability that the system
is in the state with i customers in the first queue and j customers in the second. It is assumed
that the i-th queue has s_i parallel servers and n_i - s_i - 1 waiting spaces. Customers enter the
i-th queue with mean arrival rate λ_i, and depart at the mean rate µ_i. The model allows overflow
from the first queue to the second queue if the first queue is full. The total number of possible
states is n_1 × n_2.
It is known that no analytic solution exists, and the balance equation has to be solved
explicitly. We can write the discrete equation as a singular linear system Ax = 0, where x is the
vector consisting of all states p_{i,j}. Even for systems with relatively small numbers of queues,
waiting spaces, and servers per queue, the size of the resulting matrix is huge. The numbers of
waiting spaces in our problem are 200 and 100 in the two queues, respectively, and the resulting
matrix size is 20,000 × 20,000.
4.2. Job Assignment by Domain Decomposition
It is easy to see that the graph of the resulting matrix of a k-queue problem is the same as the
graph of the matrix corresponding to the k-dimensional Laplacian operator. Thus the graph of the
matrix A is a two-dimensional rectangular grid.
As an example, Figure 4.a) shows the grid of a two-queue overflow model with 9 × 5 = 45 states,
where each dot corresponds to some state p_{i,j}, i = 1,2,…,9 and j = 1,2,…,5. If we use 3
computation nodes (or processes, one on each node), we may assign 15 grid points to each node.
However, to compute the values at its boundary points (in vertical dotted boxes), each node needs
the values at the boundary points computed by its adjacent nodes. As a result, node 0 and node 2
should have enough memory to store 20 grid points (or 4 vertical grid lines) each, and node 1
needs to store 25 grid points (or 5 vertical grid lines), even though each node updates only 15
grid points (or 3 vertical grid lines). The extra grid points are for storing the values received
from adjacent nodes.
Figure 4.b) shows the partitioning of the corresponding matrix A of size 45 × 45, where each
non-zero entry is marked by a dot. Each submatrix A_i is of size 5 × 45, and for better load
balancing, workload is assigned in units of one vertical grid line (which corresponds to the
computation with one submatrix A_i) on the 2D grid. Since we deal with a huge matrix problem of
size 20,000 × 20,000, one unit workload is the computation with a 100 × 20,000 submatrix A_i.
Figure 4. Examples showing a) job assignment to 3 nodes, and b) matrix partitioning for a
9 × 5 = 45 state two-queue overflow queuing model.
To solve the singular system Ax = 0, we use the splitting A = D - C, where D is the diagonal
matrix that consists of all diagonal entries of A, and C = D - A. Then we can rewrite the system
as Dx = Cx, which leads to a convergent eigenvalue problem x = D^{-1}Cx. For the convergence test,
we use the size of the residual defined by

||r||_2 = ||(I - D^{-1}C)x||_2, where ||x||_2 = 1.
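Componentwise this is just a Jacobi-type fixed-point update; since C = D - A, we have c_{ii} = 0
and c_{ij} = -a_{ij} for i ≠ j, so the iteration x ← D^{-1}Cx reads

x_i^{(k+1)} = (1/a_{ii}) Σ_{j≠i} (-a_{ij}) x_j^{(k)},

where each node applies the update only to the components (vertical grid lines) assigned to it,
and x is renormalized so that ||x||_2 = 1, consistent with the residual definition above.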
4.3. Communication for Asynchronous Iteration
In normal synchronous parallel computation, each node updates the values of the grid points
assigned to it in lockstep fashion, so that it gives the same result as a serial computation.
During computation, each node must communicate with its adjacent nodes to exchange
boundary values, and all nodes must synchronize the computation.

The workload of the added nodes Ext1 and Ext2 may change dynamically at any time, in a way that
is unpredictable and uncontrollable. If either of them is busy doing some other job, it cannot
communicate with the other nodes immediately, degrading the overall performance by making all
the other nodes wait. Asynchronous algorithms appeared naturally in parallel computation and have
been heavily exploited in applications. Allowing nodes to operate in an asynchronous manner
simplifies the implementation of parallel algorithms and eliminates the overhead associated with
synchronization. Since the 1990s, extensive theories on asynchronous iteration have been
developed, e.g., see [22]. Fortunately, the matrix A in our problem satisfies the condition
required for the convergence of asynchronous iteration. Hence we adopt asynchronous iteration
freely, but some more work is needed on the implementation details.
We assume the master node is always alive and never so busy that it cannot immediately handle
communication with other nodes. Data exchange between any two nodes is done via the
master node, which always keeps the most recent values at all grid points. All communication is
initiated by the compute nodes (including the added non-dedicated nodes), and the master node
just responds to the compute nodes' requests.
We do not present our full asynchronous parallel algorithm, but show only the key part that
allows asynchronous computation. Figure 5 is the master's algorithm, using MPI functions, for
handling the other nodes' requests. The master node continually checks whether any message has
arrived from any other node by calling the MPI_Iprobe() function. If a message has arrived, it
identifies the sender, the tag, and, if necessary, the number of data items received. Based on
the tag received, the master takes an appropriate action. For example, the tag 1000 means
"Receive the boundary data I send, and send the boundary data I need.", etc. In this way, any
compute node can continue its work without delay.
do {
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, communicator, &flag, &status);
    if (flag) {                          /* TRUE if some message has arrived */
        sender = status.MPI_SOURCE;      /* identify the sender */
        tag = status.MPI_TAG;            /* identify the type of the received message */
        if ( some data has been received )
            MPI_Get_count(&status, datatype, &n);  /* identify the number of data items received */
        /* depending on the value of "tag", take an appropriate action */
    }
} until all other nodes finish computation
Figure 5. A master’s algorithm to handle other nodes’ request.
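The matching logic on a compute node's side is symmetric. For example, a boundary exchange
initiated by a compute node might look as follows (a sketch, not the paper's full code; MASTER,
NB, and the buffer names are our illustrative names):

/* Tag 1000: "Receive the boundary data I send, and send the boundary data I need." */
double my_bd[NB], nbr_bd[NB];  /* one boundary grid line each, NB values */
MPI_Status status;
MPI_Send(my_bd, NB, MPI_DOUBLE, MASTER, 1000, communicator);           /* ship my boundary line */
MPI_Recv(nbr_bd, NB, MPI_DOUBLE, MASTER, 1000, communicator, &status); /* get neighbour's line */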
We also adopt a dynamic load balancing strategy, by measuring the time interval between two
consecutive data requests by the same node. It is necessary only for the non-dedicated nodes,
since the others are dedicated and have the same computational power. The workload n of a
non-dedicated node should be reduced if

n < α (N t / T),

where N is the average workload (in units of the number of vertical grid lines) of a dedicated
node, T is the average time interval between two consecutive data requests by the same dedicated
node, t is the time interval between two consecutive requests of the non-dedicated node, and α is
some tolerance parameter less than 1 (we used 0.8).
Similarly, the workload of the non-dedicated node is increased if

n > β (N t / T)

for some β > 1 (we used 1.2).
In adjusting workloads, we used a simple strategy: when one of these conditions is satisfied, we
just remove/add two boundary grid lines from/to the non-dedicated node and assign them to/from its
adjacent nodes. For this, each non-dedicated node is assigned a grid region lying between two
regions assigned to dedicated nodes. Re-assigning grid lines to some other node takes very little
time, since the matrix is very sparse (just 5 nonzero entries in each row) and any submatrix A_i
can be generated by each node instantly whenever necessary.
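The decision itself is simple; a sketch in C of the test described above (the function and
variable names are ours, not from the paper):

/* Decide how to adjust the workload n of a non-dedicated node, given the
 * average workload N and request interval T of the dedicated nodes and the
 * node's own request interval t. Returns -1 to shrink, +1 to grow, 0 to keep. */
int adjust_workload(double n, double N, double T, double t, double alpha, double beta)
{
    double fair = N * t / T;          /* lines a dedicated node would process in time t */
    if (n < alpha * fair) return -1;  /* node is relatively slow: reduce its share */
    if (n > beta  * fair) return +1;  /* node is relatively fast: increase its share */
    return 0;
}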
5. PERFORMANCE TESTS
Our old HPC cluster to be extended has 2 nodes alive, each with dual 1 GHz Pentium III Xeon
processors, running Fedora Core 4 with LAM/MPI v.7.1.2 installed. The nodes are interconnected
via a Gigabit LAN, and NFS is used for file sharing. The non-dedicated nodes Ext1 and Ext2, which
sit on an outside Fast Ethernet LAN, each have a 2.4 GHz Intel Core i5 CPU with 8 GB of memory.
The performance of the NFS protocol through SSH tunnelling inevitably drops due to the
encryption overhead. Communication speed tests were performed between the NFS server node
and the node Ext1, using UDP or TCP, and with or without SSH tunnelling across the firewall.
The times it took to read or write a file of size 1 GB from the exterior node Ext1 were measured
at varying NFS block sizes. Since NFSSVC_MAXBLKSIZE (the maximum block size) of the NFS in
Fedora Core 4 is 32*1024 (e.g., see /usr/src/kernels/2.6.11-1.1369_FC4-i686/include/linux/
nfsd/const.h), tests were performed at block sizes of 4K, 8K, 16K, and 32K, three times each,
and the results were averaged. In addition, to flush any data remaining in the cache, the NFS
file system was manually unmounted and remounted between tests.
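For example, using the mount options of Section 3.2:
# umount /client_dir
# mount -t nfs -o tcp,hard,intr,port=11000,mountport=12000 localhost:/nfs_dir /client_dir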
The following example commands measure the time taken to first create the file
/home/testfile of size 1 GB on the NFS server and then read it back, using a block size of 16 KB.
# time dd if=/dev/zero of=/home/testfile bs=16k count=65536
# time dd if=/home/testfile of=/dev/null bs=16k
The results of the NFS performance test, with and without SSH tunnelling, over the Fast Ethernet
LAN are given in Table 1. For the common NFS without tunnelling, the figures in parentheses are
the times measured with TCP; the others are with UDP.

As expected, the larger the NFS block size, the faster the transfer in all cases. For the common
NFS without SSH tunnelling, the write operation using UDP is slightly faster than with TCP, but
this is not the case for the read operation. Moreover, the NFS with SSH tunnelling takes
3.9%-10.1% more time for write and 1.4%-4.3% more for read than the common NFS using UDP.
Table 1. The speed of NFS, with or without SSH tunnelling (sec). UDP times are shown plain;
TCP times are in parentheses.

Block size   Common NFS Read   Common NFS Write   SSH-tunnelled Read   SSH-tunnelled Write
4K           96.12 (95.31)     119.37 (120.95)    100.29               131.45
8K           95.07 (94.43)     115.74 (120.31)    98.12                122.13
16K          93.85 (92.21)     114.70 (118.32)    95.31                121.72
32K          92.51 (91.45)     112.35 (115.87)    93.89                116.83
As long as the NFS block size is taken as large as possible, the tunnelling overhead may not be
large in practice, since the external nodes outside of the firewall need not read or write often
through NFS, which is common in high performance parallel computation.
Figure 6 shows the decrease pattern of the residual until its size becomes less than 10^-5. The
line marked with A is from synchronous computation using only the dedicated nodes of our old
cluster, which has 4 cores in total. The line marked with B is the result when we add the node
Ext1, and that marked with C is when we add both Ext1 and Ext2 (we created just 1 MPI process on
each of Ext1 and Ext2). The size of the residual and the elapsed time were checked every time the
first slave reported an intermediate result. The running times to converge to a result with
residual less than 10^-5 were 955.47 sec for curve A, 847.24 sec for curve B, and 777.08 sec for
curve C. Hence the speedups achieved by B and C are 1.129 and 1.231, respectively.
Figure 6. Comparison results.
The result is somewhat disappointing, since the CPU core on Ext1 or Ext2 is almost twice as fast
as that of the old cluster. This is probably mainly due to the much slower speed (Fast Ethernet)
of the outside network on which both Ext1 and Ext2 reside. Other causes might be some extra
workload imposed on them, and the network traffic that influences the communication speed.
6. CONCLUSIONS
Small HPC clusters are widely used in many small labs because they are easy to build,
cost-effective, and easy to grow. Instead of adding new nodes, we can extend a cluster to include
some other Linux servers in the same building, so that we can make use of their idle times.
However, unlike a tightly coupled HPC cluster behind a firewall, the resulting system suffers
from a security problem with NFS, which is vital for HPC clusters. In addition, the resulting
system becomes heterogeneous.

Of course, there are many good solutions using recent technologies. However, we do not adopt such
strategies, because they require upgrades of hardware and/or software, including the operating
system. Instead, we devised a solution using SSH tunnelling, which can be applied to the old
system as is. This approach may be helpful for many small old cluster systems. We also considered
a dynamic load balancing method based on overlapped domain decomposition and asynchronous
iteration, using a two-queue overflow queuing network problem of matrix size 20,000 × 20,000.
The speedup obtained by using extra processes on much faster non-dedicated nodes is somewhat
disappointing, mainly because of the slow speed of the Fast Ethernet network between the cluster
and the exterior nodes. However, we can expect much higher speedup if the outer network is
upgraded to a Gigabit LAN, for example.
ACKNOWLEDGEMENTS
This work was supported by the GRRC program of Gyeonggi province [GRRC SUWON2014-B1,
Intelligent Context Aware & Realtime Security Response System].
REFERENCES
[1] G. W. Burris, "Setting Up an HPC Cluster", ADMIN Magazine, 2015, http://www.admin-magazine.com/HPC/Articles/real_world_hpc_setting_up_an_hpc_cluster
[2] M. Baker and R. Buyya, "Cluster Computing: the commodity supercomputer", Software-Practice and Experience, vol.29(6), 1999, pp.551-576.
[3] Berkeley NOW Project, http://now.cs.berkeley.edu/
[4] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. J. Dongarra, MPI: The Complete Reference.
MIT Press, Cambridge, MA, USA, 1996.
[5] G. A. Geist, A. Beguelin, J. J. Dongarra, W. Jiang, R. Manchek, and V. S. Sunderam, PVM: Parallel
Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press,
Cambridge, MA, USA 1994.
[6] Open MPI, http://www.open-mpi.org
[7] Linux NFS-HOWTO, http://nfs.sourceforge.net/nfs-howto/
[8] Clustered file system, http://en.wikipedia.org/wiki/Clustered_file_system
[9] M. Dusi, F. Gringoli, and L. Salgarelli, "A Preliminary Look at the Privacy of SSH Tunnels", in Proc.
17th International Conference on Computer Communications and Networks(ICCCN '08), 2008.
[10] O. Honda, H. Ohsaki, M. Imase, M. Ishizuka, and J. Murayama, "Understanding TCP over TCP:
Effects of TCP tunneling on end-to-end throughput and latency," in Proc. 2005 OpticsEast/ITCom,
Oct. 2005.
[11] Port Forwarding Using SSH Tunnel, http://www.fclose.com/b/linux/818/port-forwarding-using-ssh-tunnel/
[12] Linux NFS Overview, FAQ and HOWTO, http://nfs.sourceforge.net/
[13] http://www.linuxquestions.org/questions/linux-security-4/firewall-blocking-nfs-even-though-ports-are-open-294069/
[14] Linux NFS Overview, FAQ and HOWTO, http://nfs.sourceforge.net/
[15] ssh-keygen: password-less SSH login, http://rcsg-gsir.imsb-dsgi.nrc-cnrc.gc.ca/documents/internet/node31.html
[16] L. V. Kale, M. Bhandarkar, and R. Brunner, "Run-time Support for Adaptive Load Balancing", in
Lecture Notes in Computer Science, Proc. 4th Workshop on Runtime Systems for Parallel
Programming (RTSPP) Cancun - Mexico, vol.1800, 2000, pp.1152-1159.
[17] M. Zaki, W. Li, and S. Parthasarathy, "Customized Dynamic Load Balancing in a Heterogeneous
Network of Workstations", in 1996 Proc. 5th IEEE Int. Symposium on High Performance Distributed
Computing.
[18] M. Eggen, N. Franklin, and R. Eggen, "Load Balancing on a Non-dedicated Heterogeneous Network
of Workstations." in International Conference on Parallel and Distributed Processing Techniques and
Applications (PDPTA 2002), June 2002.
[19] J. Faik, L. G. Gervasio, J. E. Flaherty, J. Chang, J. D. Teresco, E.G. Boman, and K. D. Devine, "A
model for resource-aware load balancing on heterogeneous clusters", Tech. Rep. CS-03-03, Williams
College Dept. of Computer Science, http://www.cs.williams.edu/drum/, 2003.
[20] I. Galindo, F. Almeida, and J. M. Badia-Contelles, "Dynamic Load Balancing on Dedicated
Heterogeneous Systems", Recent Advances in Parallel Virtual Machine and Message Passing
Interface, Lecture Notes in Computer Science, vol.5205, 2008, pp 64-74
[21] K. Lu, R. Subrata, and A. Y. Zomaya, "On the performance-driven load distribution for
heterogeneous computational grids", J. of Computer and System Sciences vol.73, 2007, pp.1191-
1206.
[22] M. P. Raju and S. Khaitan, "Domain Decomposition Based High Performance Computing", International J. of Computer Science Issues, vol.5, 2009, pp.27-32.
AUTHOR
Prof. Pil Seong Park received his B.S. degree in Physical Oceanography from Seoul National
University, Korea, in 1977, and his M.S. degree in Applied Mathematics from Old Dominion
University, U.S.A., in 1984. He received his Ph.D. degree in Interdisciplinary Applied Mathematics
(with emphasis on computer science) from the University of Maryland at College Park, U.S.A., in
1991. He worked as the director of the Computer & Data Center at the Korea Ocean Research &
Development Institute from 1991 to 1995. Currently, he is a professor in the Department of
Computer Science, University of Suwon, Korea. His research interests include high performance
computing, Linux clusters, digital image processing, and knowledge-based information systems. He
is a member of several academic societies in Korea.
Rums floating Omkareshwar FSPV IM_16112021.pdf
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 

Performance improvement by

handling IO-oriented operations. High-availability (HA) clusters improve the availability of the cluster by having redundant nodes, which are used to provide service when system components fail.

The activities of all compute nodes are orchestrated by a cluster middleware that allows the whole cluster system to be treated as a single system image. Well-known HPC middlewares based on message passing are the Message Passing Interface (MPI) [4] and the Parallel Virtual Machine (PVM) [5], the former being the de facto standard. Many commercial and non-commercial implementation libraries have been developed in accordance with the MPI standard. LAM/MPI, FT-MPI, and LA-MPI are some widely used non-commercial libraries, and their technologies and resources have been combined into the ongoing Open MPI project [6].

An HPC cluster normally consists of dedicated nodes that reside on a secure, isolated private network behind a firewall. Users normally access only the master/login node (master, for short), which sits in front of the compute nodes (slave nodes, or slaves for short). All the nodes in an HPC cluster share not only executable codes but also data via insecure NFS (Network File System). This is perfectly all right as long as all the cluster nodes are on a secure LAN behind a firewall. However, to extend the cluster to include other nodes outside of the firewall, we are confronted with many problems concerning security and data sharing among nodes. Moreover, such growth results in a heterogeneous cluster with nodes of different power, possibly running different Linux versions.

2.2. Linux/Unix File Sharing Protocols

NFS, created by Sun Microsystems in 1984, is a protocol that allows file sharing between Linux/UNIX systems residing on a LAN. Linux NFS clients support three versions of the NFS protocol: NFSv2 (1989), NFSv3 (1995), and NFSv4 (2000). However, the NFS protocol as is poses many problems for extending the cluster, since its packets are not encrypted, in addition to other shortcomings discussed later.

Other alternatives to NFS include AFS (Andrew File System), DFS (Distributed File System), RFS (Remote File System), Netware, etc. [7]. There are also various clustered file systems shared by multiple servers [8]. However, we do not adopt such new technologies since they are not supported by our old cluster's hardware and software.

2.3. SSH Tunnelling

Encryption of NFS traffic is necessary for secure extension across a firewall. One commonly used technique is cryptographically protected tunnelling: an IP-level or TCP-level stream of packets is used to tunnel application-layer segments [9]. A TCP tunnel is a technology that aggregates and transfers packets between two hosts as a single TCP
connection. By using a TCP tunnel, several protocols can be transmitted transparently through a firewall. Under certain conditions, the use of a TCP tunnel is known to severely degrade end-to-end TCP performance; this is called the TCP meltdown problem [10].

The SSH protocol allows any client and server programs to communicate securely over an insecure network. Furthermore, it allows tunnelling (port forwarding) of any TCP connection on top of SSH, so as to cryptographically protect any application that uses clear-text protocols.

3. TEMPORARY EXTENSION OF AN HPC CLUSTER ACROSS A FIREWALL

We would like to extend our old cluster (consisting of the master and slave nodes in Figure 1) to include some other nodes, say Ext1 and Ext2, outside of a firewall, as shown in Figure 1. The master node generally has attached storage that is also accessible by diskless slave nodes using insecure NFS. Since NFS relies on the inherently insecure, unencrypted UDP protocol (up to NFSv3), transactions between host and client are not encrypted, and IP spoofing is possible. Moreover, firewall configuration is difficult because of the way NFS daemons work.

Figure 1. Cluster extension to include external nodes Ext1 and Ext2 across a firewall.

3.1. NFS over TCP and Fixing NFS Ports

SSH tunnelling, which is based on SSH port forwarding, is commonly used to encrypt otherwise unencrypted packets or to bypass firewalls, e.g., see [9,11]. SSH tunnels support only TCP protocols on fixed ports, but NFS uses UDP protocols by default, and the ports of some daemons essential for the operation of NFS are variable. Fortunately, NFS over TCP is supported from Linux kernel 2.4 and later on the NFS client side, and from kernel 2.4.19 on the server side [12]. Since all the nodes of our old cluster satisfy this condition, we can make NFS work over TCP just by specifying the option "-o tcp" in the mount command. The following shows an example of mounting the NFS server's /nfs_dir directory on the client's mount_pt.

# mount -t nfs -o tcp server:/nfs_dir mount_pt

For tunnelling, we need to fix the NFS ports. The daemons essential for NFS operation are of two kinds: portmapper (port 111) and rpc.nfsd (port 2049) use fixed ports, while rpc.statd, rpc.lockd, rpc.mountd, and rpc.rquotad use ports that are randomly assigned by the operating system. However, the ports of the latter can be fixed by specifying port numbers in appropriate configuration files [13], as shown below.
The ports of the first three (i.e., rpc.statd, rpc.lockd, and rpc.mountd) can be fixed by adding the following lines to the NFS configuration file /etc/sysconfig/nfs, for example:

STATD_PORT=4001
LOCKD_TCPPORT=4002
LOCKD_UDPPORT=4002
MOUNTD_PORT=4003

The remaining daemon, rpc.rquotad, can be given a fixed port by defining a new port number (4004, for example) in the configuration file /etc/services:

rquotad 4004/tcp
rquotad 4004/udp

3.2. Setting up an SSH Tunnel

For tunnelling of NFS, it is necessary for the server to mount its own NFS directory to be exported to clients. Hence, on the server side, the configuration file /etc/exports has to be modified, for example as follows to export its /nfs_dir to itself:

/nfs_dir localhost (sync,rw,insecure,root_squash)

where "insecure" allows connections from ports higher than 1023, and "root_squash" squashes root permissions for the client, denying the root user the right to access or create files on the NFS server, for security.

Now an SSH tunnel has to be set up from the client's side. On the client side, to forward the local ports 11000 and 12000, for example, to the fixed ports 2049 (rpc.nfsd) and 4003 (rpc.mountd) respectively, we can use the command

# ssh nfssvr -L 11000:localhost:2049 -L 12000:localhost:4003 -f sleep 600m

where "nfssvr" is the IP or the name of the NFS server registered in the configuration file /etc/hosts, and "-f sleep 600m" means that port forwarding is to last for 600 minutes in the background.

Figure 2. Local port forwarding for SSH tunnelling.

When connected to the NFS server, an SSH tunnel will be opened if the correct password is entered. Then the NFS server's exported directory can be mounted on the client's side. The following command is an example that mounts the /nfs_dir directory of the NFS server on the /client_dir directory of the client:

# mount -t nfs -o tcp,hard,intr,port=11000,mountport=12000 localhost:/nfs_dir /client_dir
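The client-side steps can be combined into a single script. The following is a minimal sketch, assuming the port assignments of Section 3.1 (2049 for rpc.nfsd, 4003 for rpc.mountd), the server name "nfssvr", and the directories used above; the local forward ports 11000 and 12000 are arbitrary choices, and the mkdir and df lines are our additions for illustration.

#!/bin/sh
# Sketch: bring up an SSH-tunnelled NFS mount from an external node (run as root).

# Open the tunnel in the background; forward local ports to rpc.nfsd and rpc.mountd.
ssh nfssvr -L 11000:localhost:2049 -L 12000:localhost:4003 -f sleep 600m

# Mount the server's exported directory through the tunnel.
mkdir -p /client_dir
mount -t nfs -o tcp,hard,intr,port=11000,mountport=12000 \
      localhost:/nfs_dir /client_dir

# Quick check that the tunnelled mount is up.
df -h /client_dir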
3.3. Suggested Structure

Even though NFS through an SSH tunnel is encrypted, it has a serious drawback if we cannot completely trust the local users on the NFS server [14]. For example, if some local user can log in on the NFS server and create an SSH tunnel, any ordinary user on the server can mount the file systems with the same rights as root on the client. One possible solution is to prohibit local users' direct login to the NFS server, so that they cannot create an SSH tunnel. One simple way is to change all local users' login shells from /bin/bash to /sbin/nologin in the file /etc/passwd. However, this causes another problem: in small HPC clusters, the master node normally works as the NFS server too, and local users must be allowed to log in on the master node to use the cluster, which is dangerous when using NFS through an SSH tunnel, as was pointed out above.

Figure 3 shows an alternative structure for an extended HPC cluster that takes all of this into account. The structure has a separate NFS server which does not allow local users to log in. An SSH tunnel can be created by the root user on Ext1 or Ext2, and local users just log in on the master node to use the cluster.

Figure 3. The suggested structure.

The firewall setting of the master node does not need any modification. Moreover, since the NFS service is provided through SSH tunnelling, the NFS server needs to open only port 22 to the computation nodes outside of its firewall. The NFS server should be configured so that it releases as little information about itself as possible. For example, services like portmapper should be protected from outside. It is also suggested to specify explicitly the hosts that are allowed to access any service on the NFS server. This can be done by setting ALL:ALL in the configuration file /etc/hosts.deny, and listing only the allowed hosts (or their IPs), together with the services they may access, in the file /etc/hosts.allow.

The use of the "rsh" command, which is not encrypted, is common on old HPC clusters with old middlewares. Fortunately, LAM/MPI v7.1.2 allows SSH login with an authentication key but without a password, e.g., see [15].

4. DYNAMIC LOAD BALANCING

The original HPC cluster may be homogeneous, i.e., all nodes may have the same power. However, the extended cluster may be heterogeneous, or may behave like a heterogeneous one even if all the nodes have the same power, since the communication speeds between nodes differ depending on various factors: the type of network, the existence of a firewall, and encryption. Moreover, the workload of the non-dedicated nodes outside of the firewall may change continually, and the communication speed may change continually because of the traffic on the outside LAN. Hence we need to use a dynamic run-time load balancing strategy [16], while initially assigning an equal amount of work to the original slave nodes of the cluster.
Data partitioning and load balancing have long been important components of parallel computation. Since earlier works (e.g., see [17]), many authors have studied load balancing using different strategies on dedicated/non-dedicated heterogeneous systems [18,19,20,21], but we could not find related work on the security problems arising in cluster expansion, which are more technical than academic.

4.1. The Sample Problem

Dynamic load balancing strategies depend on the problem we deal with, so we first introduce our problem. We would like to solve the well-known two-queue overflow queuing problem given by the Kolmogorov balance equation

\[
\begin{aligned}
\bigl[\lambda_1&(1-\delta_{i,n_1-1}\delta_{j,n_2-1})+\lambda_2(1-\delta_{j,n_2-1})+\mu_1\min(i,s_1)+\mu_2\min(j,s_2)\bigr]\,p_{i,j}\\
&=\lambda_1(1-\delta_{i,0})\,p_{i-1,j}+\bigl[\lambda_2+\lambda_1\delta_{i,n_1-1}\bigr](1-\delta_{j,0})\,p_{i,j-1}\\
&\qquad+\mu_1\min(i+1,s_1)\,p_{i+1,j}+\mu_2\min(j+1,s_2)\,p_{i,j+1},
\end{aligned}
\]

for 0 ≤ i ≤ n1−1 and 0 ≤ j ≤ n2−1, with the convention that p_{i,j} = 0 outside this range. Here δ_{i,j} is the Kronecker delta, and p_{i,j} is the steady-state probability that there are i customers in the first queue and j customers in the second. It is assumed that, in the i-th queue, there are s_i parallel servers and n_i−s_i−1 waiting spaces. Customers enter the i-th queue with mean arrival rate λ_i and depart at the mean rate µ_i. The model allows overflow from the first queue to the second queue if the first queue is full. The total number of possible states is n1×n2.

It is known that no analytic solution exists, and the balance equation has to be solved explicitly. We can write the discrete equation as a singular linear system Ax = 0, where x is the vector consisting of all states p_{i,j}. Even for systems with relatively small numbers of queues, waiting spaces, and servers per queue, the size of the resulting matrix is huge. The numbers of waiting spaces in our problem are 200 and 100 in each queue, respectively, and the resulting matrix size is 20,000×20,000.

4.2. Job Assignment by Domain Decomposition

It is easy to see that the graph of the resulting matrix of a k-queue problem is the same as the graph of the matrix corresponding to the k-dimensional Laplacian operator. Thus the graph of the matrix A is a two-dimensional rectangular grid. As an example, Figure 4.a) shows the grid of a two-queue overflow model with 9×5 = 45 states, where each dot corresponds to some state p_{i,j}, i=1,2,…,9 and j=1,2,…,5. If we use 3 computation nodes (or processes, one at each node), we may assign 15 grid points to each node. However, to compute the values at the boundary points (in vertical dotted boxes), each node needs the values at the boundary points computed by its adjacent nodes. As a result, node 0 and node 2 should have enough memory to store 20 grid points (or 4 vertical grid lines), and node 1 needs to store 25 grid points (or 5 vertical grid lines), even though each node updates only 15 grid points (or 3 vertical grid lines). The extra grid points are for storing the values received from adjacent nodes.

Figure 4.b) shows the partitioning of the corresponding matrix A of size 45×45, where each non-zero entry is marked by a dot. Each submatrix A_i is of size 5×45, and for better load balancing, workload will be assigned in units of one vertical grid line (which corresponds to the computation with one submatrix A_i) on the 2D grid. Since we deal with a huge matrix problem of size 20,000×20,000, one unit workload is the computation with a 100×20,000 submatrix A_i.
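To see what one unit of workload is in matrix terms, note that ordering the states one vertical grid line at a time makes A block tridiagonal; in our notation (not spelled out in the paper), with n1 diagonal blocks of order n2,

\[
A=\begin{pmatrix}
T_1 & B_1 & & \\
C_1 & T_2 & \ddots & \\
 & \ddots & \ddots & B_{n_1-1}\\
 & & C_{n_1-1} & T_{n_1}
\end{pmatrix},
\]

where each T_i is tridiagonal of order n2 (coupling within one vertical line) and the B_i, C_i are diagonal (coupling between adjacent lines). The submatrix A_i of Figure 4.b) is then just the i-th block row, consisting of T_i and its neighbouring B and C blocks, of size n2 × n1n2, i.e., 5×45 in the example and 100×20,000 in the actual problem.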
Figure 4. Examples showing a) job assignment to 3 nodes, and b) matrix partitioning for a 9×5 = 45 state two-queue overflow queuing model.

To solve the singular system Ax = 0, we use the splitting A = D − C, where D is the diagonal matrix consisting of all diagonal entries of A, and C = D − A. Then we can rewrite the system as Dx = Cx, which leads to a convergent eigenvalue problem x = D^{-1}Cx. For the convergence test, we use the size of the residual defined by

||r||_2 = ||(I − D^{-1}C)x||_2, where ||x||_2 = 1.

4.3. Communication for Asynchronous Iteration

In normal synchronous parallel computation, each node updates the values of the grid points assigned to it in lockstep fashion, so that it gives the same result as serial computation. During computation, each node should communicate with its adjacent nodes to exchange boundary values, and all nodes should synchronize the computation.

The workload of the added nodes Ext1 and Ext2 may change dynamically at any time, in a way that is unpredictable and uncontrollable. If either of these is busy doing some other job, it cannot communicate with the other nodes immediately, degrading the overall performance by making all the other nodes wait.

Asynchronous algorithms appeared naturally in parallel computation and have been heavily exploited in applications. Allowing nodes to operate in an asynchronous manner simplifies the implementation of parallel algorithms and eliminates the overhead associated with synchronization. Since the 1990s, extensive theories on asynchronous iteration have been developed, e.g., see [22]. Fortunately, the matrix A in our problem satisfies the condition required for convergence of asynchronous iteration. Hence we adopt asynchronous iteration freely, but some more work is needed for the implementation details.

We assume the master node is always alive and not busy, so that it can immediately handle communication with the other nodes. Data exchange between any two nodes should be done via the master node, and the master node should always keep the most recent values at all grid points. But all communication should be initiated by the compute nodes (including the added non-dedicated nodes), and the master node should just respond to the compute nodes' requests.

We do not present our full asynchronous parallel algorithm, but show only the key algorithm that allows asynchronous computation. Figure 5 is the master's algorithm, using MPI functions to handle the other nodes' requests. The master node continually checks whether any message has arrived from
any of the other nodes by calling the MPI_Iprobe() function. If a message has arrived, it identifies the sender, the tag, and, if necessary, the number of data items received. Based on the tag received, the master takes an appropriate action. For example, if the received tag is 1000, it means "Receive the boundary data I send, and send the boundary data I need.", etc. In this way, any compute node can continue its work without delay.

do {
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, communicator, &flag, &status);
    if (flag) {                        // TRUE if some message has arrived.
        sender = status.MPI_SOURCE;    // Identify the sender.
        tag = status.MPI_TAG;          // Identify the type of the received message.
        if (some data has been received)
            MPI_Get_count(&status, MPI_datatype, &n);  // Identify no. of data received.
        // Depending on the value of "tag", take an appropriate action.
    }
} until all other nodes finish computation

Figure 5. A master's algorithm to handle the other nodes' requests.

We also adopt a dynamic load balancing strategy, based on measuring the time interval between two consecutive data requests by the same node. This is necessary only for the non-dedicated nodes, since the others are dedicated and have the same computational power. Let N be the average workload (in units of the number of vertical grid lines) of a dedicated node, T the average time interval between two consecutive data requests by the same dedicated node, and t the time interval between two consecutive requests of a non-dedicated node carrying workload n. The workload of the non-dedicated node should be reduced if

    n / t < α (N / T)

where α is some tolerance parameter less than 1 (we used 0.8). Similarly, the workload of the non-dedicated node is increased if

    n / t > β (N / T)

for some β greater than 1 (we used 1.2).

In adjusting workloads, we used a simple strategy: when one of these conditions is satisfied, we just shrink or grow the region of the non-dedicated node by two boundary grid lines, reassigning them to or from the adjacent nodes. For this purpose, each non-dedicated node is assigned a grid region lying between two regions assigned to dedicated nodes. Reassigning grid lines to some other node takes very little time, since the matrix is very sparse (just 5 nonzero entries in each row) and any submatrix A_i can be generated by each node instantly whenever necessary.

5. PERFORMANCE TESTS

Our old HPC cluster that we want to extend has 2 nodes alive, each with dual 1 GHz Pentium 3 Xeon processors running Fedora Core 4, with LAM/MPI v7.1.2 installed. The nodes are interconnected via a Gigabit LAN, and NFS is used for file sharing. Both of the non-dedicated nodes Ext1 and Ext2, which are on the outside fast Ethernet LAN, have a 2.4 GHz Intel® Core™ i5 CPU with 8 GB memory.
The performance of the NFS protocol through SSH tunnelling inevitably drops due to encryption overhead. Communication speed tests were performed between the NFS server node and the node Ext1, using UDP or TCP, with or without SSH tunnelling across the firewall. The times taken to read or write a file of size 1 GB from the exterior node Ext1 were measured at varying NFS block sizes. Since NFSSVC_MAXBLKSIZE (the maximum block size) of the NFS in Fedora Core 4 is 32*1024 (e.g., see /usr/src/kernels/2.6.11-1.1369_FC4-i686/include/linux/nfsd/const.h), tests were performed at block sizes of 4k, 8k, 16k, and 32k, three times each, and the results were averaged. In addition, to delete any remaining data in the cache, the NFS file system was manually unmounted and mounted again between tests.

The following example commands measure the time taken to create the file /home/testfile of size 1 GB on the NFS server, and then to read it, using a block size of 16 KB.

# time dd if=/dev/zero of=/home/testfile bs=16k count=65536
# time dd if=/home/testfile of=/dev/null bs=16k

The results of the NFS performance test, with and without SSH tunnelling over the fast Ethernet LAN, are given in Table 1. For the common NFS without tunnelling, the figures in parentheses are the times taken when TCP is used; the others are for UDP. As expected, the larger the NFS block size, the faster the operation in all cases. For the common NFS without SSH tunnelling, the write operation using UDP is slightly faster than TCP, but this is not the case for the read operation. Moreover, the NFS with SSH tunnelling takes 3.9%-10.1% more time for write, and 1.4%-4.3% more for read, than the common NFS using UDP.

Table 1. The speed of NFS, with or without SSH tunnelling (sec).

Block size   Common NFS Read   Common NFS Write   SSH tunnelled Read   SSH tunnelled Write
4K           96.12 (95.31)     119.37 (120.95)    100.29               131.45
8K           95.07 (94.43)     115.74 (120.31)    98.12                122.13
16K          93.85 (92.21)     114.70 (118.32)    95.31                121.72
32K          92.51 (91.45)     112.35 (115.87)    93.89                116.83

As long as the NFS block size is taken as large as possible, the tunnelling overhead may not be large, since the external nodes outside of the firewall need not read or write so often through NFS, which is common in high performance parallel computation.

Figure 6 shows the decreasing pattern of the residual until its size becomes less than 10^-5. The line marked with A is from synchronous computation using only the dedicated nodes of our old cluster, which has 4 cores in total. The line marked with B is the result when we add the node Ext1, and that marked with C is when we add both Ext1 and Ext2 (we created just 1 MPI process on each of Ext1 and Ext2). The size of the residual and the elapsed time were checked every time the first slave reported an intermediate result. The running times to converge to a result with residual less than 10^-5 were 955.47 sec for curve A, 847.24 sec for curve B, and 777.08 sec for curve C. Hence the speedups by B and C are 1.129 and 1.231, respectively.
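Speedup here is the usual ratio of running times; with the figures above (our arithmetic, rounded),

\[
S_B=\frac{T_A}{T_B}=\frac{955.47}{847.24}\approx 1.13,\qquad
S_C=\frac{T_A}{T_C}=\frac{955.47}{777.08}\approx 1.23 .
\]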
Figure 6. Comparison results.

The result is somewhat disappointing, since the CPU core on Ext1 or Ext2 is almost twice as fast as that of the old cluster. This is probably mainly due to the much slower speed of the outside network (fast Ethernet) on which both Ext1 and Ext2 reside. Other reasons might be extra workload imposed on those nodes, and the network traffic that influences the communication speed.

6. CONCLUSIONS

Small HPC clusters are widely used in many small labs, because they are easy to build, cost-effective, and easy to grow. Instead of adding new nodes, we can extend a cluster to include some other Linux servers in the same building, so that we can make use of their idle times. However, unlike a tightly-coupled HPC cluster behind a firewall, the resulting system suffers from a security problem with NFS, which is vital for HPC clusters. In addition, the resulting system becomes heterogeneous. Of course there are many good solutions using recent technologies. However, we do not adopt such strategies, because they require upgrades of hardware and/or software, including the operating system. Instead, we devise a solution using SSH tunnelling, which can be applied to the old system as is. This approach may be helpful for many small old cluster systems.

We also considered a dynamic load balancing method based on overlapped domain decomposition and asynchronous iteration, using a two-queue overflow queuing network problem of matrix size 20,000×20,000 as a sample problem. The speedup obtained by using extra processes on the much faster non-dedicated nodes is somewhat disappointing, mainly because of the slow speed of the fast Ethernet network between the cluster and the exterior nodes. However, we can expect much higher speedup if the outer network is upgraded to a Gigabit LAN, for example.

ACKNOWLEDGEMENTS

This work was supported by the GRRC program of Gyeonggi province [GRRC SUWON2014-B1, Intelligent Context Aware & Realtime Security Response System].

REFERENCES

[1] G. W. Burris, "Setting Up an HPC Cluster", ADMIN Magazine, 2015, http://www.admin-magazine.com/HPC/Articles/real_world_hpc_setting_up_an_hpc_cluster
[2] M. Baker and R. Buyya, "Cluster Computing: the commodity supercomputer", Software-Practice and Experience, vol.29(6), 1999, pp.551-576.
[3] Berkeley NOW Project, http://now.cs.berkeley.edu/
[4] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, and J. J. Dongarra, MPI: The Complete Reference, MIT Press, Cambridge, MA, USA, 1996.
[5] G. A. Geist, A. Beguelin, J. J. Dongarra, W. Jiang, R. Manchek, and V. S. Sunderam, PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, Cambridge, MA, USA, 1994.
[6] Open MPI, http://www.open-mpi.org
[7] Linux NFS-HOWTO, http://nfs.sourceforge.net/nfs-howto/
[8] Clustered file system, http://en.wikipedia.org/wiki/Clustered_file_system
[9] M. Dusi, F. Gringoli, and L. Salgarelli, "A Preliminary Look at the Privacy of SSH Tunnels", in Proc. 17th International Conference on Computer Communications and Networks (ICCCN '08), 2008.
[10] O. Honda, H. Ohsaki, M. Imase, M. Ishizuka, and J. Murayama, "Understanding TCP over TCP: Effects of TCP tunneling on end-to-end throughput and latency", in Proc. 2005 OpticsEast/ITCom, Oct. 2005.
[11] Port Forwarding Using SSH Tunnel, http://www.fclose.com/b/linux/818/port-forwarding-using-ssh-tunnel/
[12] Linux NFS Overview, FAQ and HOWTO, http://nfs.sourceforge.net/
[13] http://www.linuxquestions.org/questions/linux-security-4/firewall-blocking-nfs-even-though-ports-are-open-294069/
[14] Linux NFS Overview, FAQ and HOWTO, http://nfs.sourceforge.net/
[15] ssh-keygen: password-less SSH login, http://rcsg-gsir.imsb-dsgi.nrc-cnrc.gc.ca/documents/internet/node31.html
[16] L. V. Kale, M. Bhandarkar, and R. Brunner, "Run-time Support for Adaptive Load Balancing", in Lecture Notes in Computer Science, Proc. 4th Workshop on Runtime Systems for Parallel Programming (RTSPP), Cancun, Mexico, vol.1800, 2000, pp.1152-1159.
[17] M. Zaki, W. Li, and S. Parthasarathy, "Customized Dynamic Load Balancing in a Heterogeneous Network of Workstations", in Proc. 5th IEEE Int. Symposium on High Performance Distributed Computing, 1996.
[18] M. Eggen, N. Franklin, and R. Eggen, "Load Balancing on a Non-dedicated Heterogeneous Network of Workstations", in International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2002), June 2002.
[19] J. Faik, L. G. Gervasio, J. E. Flaherty, J. Chang, J. D. Teresco, E. G. Boman, and K. D. Devine, "A model for resource-aware load balancing on heterogeneous clusters", Tech. Rep. CS-03-03, Williams College Dept. of Computer Science, http://www.cs.williams.edu/drum/, 2003.
[20] I. Galindo, F. Almeida, and J. M. Badia-Contelles, "Dynamic Load Balancing on Dedicated Heterogeneous Systems", Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, vol.5205, 2008, pp.64-74.
[21] K. Lu, R. Subrata, and A. Y. Zomaya, "On the performance-driven load distribution for heterogeneous computational grids", J. of Computer and System Sciences, vol.73, 2007, pp.1191-1206.
[22] M. P. Raju and S. Khaitan, "Domain Decomposition Based High Performance Computing", International J. of Computer Science Issues, vol.5, 2009, pp.27-32.

AUTHOR

Prof. Pil Seong Park received his B.S. degree in Physical Oceanography from Seoul National University, Korea, in 1977, and his M.S. degree in Applied Mathematics from Old Dominion University, U.S.A., in 1984. He received his Ph.D. degree in Interdisciplinary Applied Mathematics (with emphasis on computer science) from the University of Maryland at College Park, U.S.A., in 1991.
He worked as the director of the Computer & Data Center at the Korea Ocean Research & Development Institute from 1991 to 1995. Currently, he is a professor in the Department of Computer Science, University of Suwon, Korea. His research interests include high performance computing, Linux clusters, digital image processing, and knowledge-based information systems. He is a member of several academic societies in Korea.