The document provides instructions for setting up a cluster using Ubuntu and MPICH for parallel computing. It discusses prerequisites like installing MPICH, SSH and GCC on multiple nodes. It describes how to define hostnames, set up authorized keys for passwordless communication, and create a machine file specifying processes on each node. The document also shows how to write a sample MPI program, compile it using MPICH and execute across nodes using the machine file.
Cluster Setup Manual
Using Ubuntu and MPICH
Institute of Information Technology,
University of Dhaka
Date of Submission
13 June 2015
Submitted to
Emon Kumar Dey
Course Instructor of SE-501
Lecturer
Institute of Information Technology
University of Dhaka
Submitted by
Md. Rakib Hossain
(BSSE 0516)
Submitted to
Amit Seal Ami
Lab Instructor of SE-501
Lecturer
Institute of Information Technology
University of Dhaka
TABLE OF CONTENTS
BACKGROUND
OBJECTIVES
  BROAD OBJECTIVE:
  SPECIFIC OBJECTIVES:
ORIGIN OF THE DOCUMENT
WHAT IS PARALLEL COMPUTING?
WHERE IS PARALLEL COMPUTING USED?
  SCIENCE AND ENGINEERING:
  INDUSTRIAL AND COMMERCIAL:
WHAT IS CLUSTER COMPUTING?
WHAT ARE MPI AND MPICH?
  MPI:
  MPICH:
MAKE A CLUSTER FOR PARALLEL COMPUTING
  WHAT ARE THE PREREQUISITES?
WHAT ARE THE REQUIRED PACKAGES NEEDED TO BE INSTALLED?
  INSTALLING MPICH
  INSTALLING SSH SERVER
  INSTALLING GCC
HOW TO SET UP THE CLUSTER ENVIRONMENT?
  DEFINING HOSTNAMES:
  SETTING UP AUTHORIZED KEYS FOR PASSWORDLESS COMMUNICATION BETWEEN NODES:
  SETTING UP A MACHINE-FILE:
  HOW TO WRITE A PROGRAM USING MPICH?
  HOW TO COMPILE AND EXECUTE THE PROGRAM USING MPICH?
    1) Using USB Flash Drive:
    2) Using scp command:
    3) Using sharing master folder:
Background
Traditionally, software has been written for serial computation. To solve a
computing problem, an algorithm is formulated and implemented as a sequential stream of instructions.
These instructions are usually executed on one CPU in one computer. Only a single instruction
may execute at a time; after that instruction has finished, the next instruction is executed.
Many real-world computing problems need huge amounts of calculation and
concurrency, such as weather forecasting, planetary movements, and galaxy formation. Problems of
this type are so large or complex that it is impractical or impossible to solve them on a
single computer using serial computation, especially given limited computer memory.
Objectives
Broad Objective:
The main objective of this document is to show how to set up a cluster using two computers
running a Linux-based OS (Ubuntu 14.04) and MPICH.
Specific Objectives:
To learn what parallel computing is and how it works.
To understand why parallel computing is needed and important.
To learn what cluster computing is and how it works.
To learn what MPI and MPICH are and how they work.
To learn how to make a cluster.
To learn how to write a first parallel program and execute it on the cluster.
Origin of the Document
Our course tutor, Mr. Emon Kumar Dey, instructed us to submit a report as part of the
course evaluation, so I prepared this paper. By writing the report I have learnt how to set up a cluster
for parallel computing. I have also learnt how to write code that executes in parallel on a
cluster. I am very thankful to our course tutor for giving us this opportunity.
What is Parallel Computing?
Parallel computing is the simultaneous use of multiple processing components to solve a
problem. This is done by splitting the problem into independent sections so that each
processing component can execute its section of the algorithm simultaneously with
the others. The processing components may be diverse and include resources such as a single
computer with multiple processors, several networked computers, specialized hardware, or any
combination of the above.
Figure 1: Parallel Computing
Where is Parallel Computing Used?
Science and Engineering:
Historically, parallel computing has been considered to be "the high end of computing",
and has been used to model difficult problems in many areas of science and engineering:
o Atmosphere, Earth, Environment
o Physics - applied, nuclear, particle, condensed matter, high pressure, fusion, photonics
o Bioscience, Biotechnology, Genetics
o Chemistry, Molecular Sciences
o Geology, Seismology
o Mechanical Engineering - from prosthetics to spacecraft
o Electrical Engineering, Circuit Design, Microelectronics
o Computer Science, Mathematics
o Defense, Weapons
Industrial and Commercial:
Today, commercial applications provide an equal or greater driving force in the
development of faster computers. These applications require the processing of large
amounts of data in sophisticated ways. For example:
o Databases, data mining
o Oil exploration
o Web search engines, web-based business services
o Medical imaging and diagnosis
o Pharmaceutical design
o Financial and economic modeling
o Management of national and multinational corporations
o Advanced graphics and virtual reality, particularly in the entertainment industry
o Networked video and multimedia technologies
o Collaborative work environments
What is Cluster Computing?
Cluster computing is a model of computing in which a collection of computers is interconnected
so that the computers can behave like a single entity. The components of a cluster are
normally linked to one another through fast local area networks, with each node running its
own instance of an operating system. In most situations all the nodes use the same
hardware and the same operating system, although in a few configurations different operating
systems or different hardware may be used on each computer.
Clusters are generally deployed to improve performance and availability over that of a
single computer, while usually being much more cost-effective than single computers of
comparable speed or availability.
Cluster computing is used for parallel processing, load balancing, and similar tasks. Clustering is a popular
strategy for implementing parallel programs because it enables a program to run its
independent components simultaneously on different computers. Clusters are able to execute many
complex instructions by distributing the workload across all connected computers. Clustering
enhances the system's availability to users, its aggregate performance, and its overall tolerance to
faults and component failures.
What are MPI and MPICH?
MPI: Message Passing Interface (MPI) is a standardized and portable message-passing
system developed by a group of researchers from academia and industry to work on a wide range
of parallel computers. The goal of the Message Passing Interface is to establish a portable,
efficient, and flexible standard for message passing that will be widely used for writing
message-passing programs. Using this API, the computers connected in a cluster can communicate
with each other through message passing. It enables them to send and receive messages and to
synchronize themselves.
The standard defines the syntax and semantics of a core of library routines useful to a wide range
of users writing portable message-passing programs in Fortran and C. Programs in other
languages, such as C++ and Java, can use MPI through these interfaces or through additional bindings.
MPICH: MPICH is a high performance and widely portable implementation of the Message
Passing Interface (MPI) standard.
Make a Cluster for Parallel Computing
What are the Prerequisites?
Here we build a cluster using MPI and then write a program and compile and execute it
using MPICH.
The prerequisites are:
1. We need at least two computers with a Linux distribution installed (here we use
Ubuntu 14.04 LTS). We have to make sure that GCC is installed on each system. As
Ubuntu comes with GCC preinstalled, we need not install it for the time being.
2. A network connection between the two computers, each of which must have an IP
address assigned. Let us assume that we have two computers fulfilling these
prerequisites, and call each of them a node. The host names and IP addresses of our
nodes are:
misubeimp 10.255.4.125
minhas-pc 10.255.4.98
What are the Required Packages needed to be installed?
Installing MPICH
As we are using MPICH2 as our message-passing system, we should install its package on all
the nodes. To install it, we update the package index and then install the mpich2 package with apt-get.
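The installation step can be written out as the following terminal commands, which are taken from the Appendix of this manual. The package name mpich2 is the one used here on Ubuntu 14.04; on other releases the package may carry a different name (for example mpich), so the name should be treated as an assumption to verify.

```shell
# Run on every node.
# Refresh the package index so the newest package list is used.
sudo apt-get update
# Install MPICH2 (package name as used in this manual on Ubuntu 14.04).
sudo apt-get install mpich2
```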
*** One important point must be noted here: the same version of MPICH has to be installed on
all the nodes. We use MPICH2. To check the installed version, run the mpichversion command.
To check where MPICH2 was installed, run which mpiexec.
We may test that the program was indeed installed successfully by running which mpiexec and
which mpirun on all nodes; each command should print the path of the installed program.
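The version and installation checks described above correspond to these commands from the Appendix. This is only a sketch of what to type on each node; the exact version text and paths printed depend on the installation and are not shown here.

```shell
# Run on every node and compare the results between nodes.
# Print the MPICH version information; it should be the same on all nodes.
mpichversion
# Print the paths of the MPI launcher programs to confirm they are installed.
which mpiexec
which mpirun
```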
Installing SSH Server
MPICH communicates among the nodes through remote login and also distributes the processes
among the nodes through remote login. So we need to confirm that our nodes support
remote login. Remote login can be performed in different ways, for example with telnet or OpenSSH.
In our case we use OpenSSH, as it gives better data security than telnet. To install it, we
install the openssh-server package on all nodes.
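As a terminal command, the SSH server installation looks like the line below, taken from the Appendix. The package name openssh-server is the one used in this manual on Ubuntu.

```shell
# Run on every node: install the OpenSSH server so that other nodes can log in remotely.
sudo apt-get install openssh-server
```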
Installing GCC
As said before, Ubuntu comes with GCC installed, but to check which version of GCC is
installed we may run gcc --version.
If GCC happens to be missing, we can easily install it on all nodes through the
build-essential package.
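The GCC check and the fallback installation can be sketched as follows, using the two commands listed in the Appendix. The second command is only needed on a node where the first one reports that gcc is not found.

```shell
# Show the installed GCC version (fails with "command not found" if GCC is missing).
gcc --version
# Only if GCC is missing: install the compiler and the usual build tools.
sudo apt-get install build-essential
```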
How to Set Up the Cluster Environment?
Defining hostnames:
We have to define the host names of both nodes in the hosts file of every node. To do that, we
open the file /etc/hosts in an editor with root privileges (for example, sudo gedit /etc/hosts).
At first the hosts file contains only the default entries. We add one line for each of our two
nodes, giving the node's IP address followed by its host name.
After editing, the hosts file on every node contains an entry for both nodes.
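A minimal sketch of the hosts-file step is given below. It appends the two example nodes of this manual instead of opening gedit, which is a substitution made for illustration and not the method shown in the manual; the addresses and names are the ones listed in the prerequisites. Running it twice would add duplicate lines. On a default Ubuntu installation a node's own name may still resolve to a 127.0.1.1 entry that is already present in the file.

```shell
# Run on every node: append one "IP-address host-name" line per node to /etc/hosts.
printf '%s\n' '10.255.4.125 misubeimp' '10.255.4.98 minhas-pc' | sudo tee -a /etc/hosts

# Show how each host name is resolved after the change.
getent hosts misubeimp minhas-pc
```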
Add a New User for running MPI programs:
Now we have to add a new user on every node for running MPI programs. The new user must
have the same user name on every node. It is better to give it the same password as well, for
convenience. So first we make a directory under the root directory, and then we add our new
user with that directory as its home. Our
new user name is mpiuser and the directory name is cluster. To do this, we create /cluster with
mkdir and then run adduser with /cluster as the home directory.
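The directory and user creation can be sketched as below, following the Appendix commands with the misspelled home path corrected to /cluster. The options are placed before the user name, which is a stylistic choice here.

```shell
# Run on every node.
# Create the directory that will serve as the home of the MPI user.
sudo mkdir /cluster
# Create the user "mpiuser" with /cluster as its home directory.
# adduser asks interactively for a password and some optional details.
sudo adduser --home /cluster mpiuser
```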
Because mpiuser does not exist yet, adduser creates the account and asks for its details.
For the time being we just enter a password and leave everything else at its default.
Now mpiuser has been added with this directory as its home, and we change the ownership of
the directory to mpiuser with chown.
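The ownership change corresponds to the chown command in the Appendix (with the stray space in the path removed). The ls line is an addition for checking the result and is not part of the manual.

```shell
# Run on every node: make mpiuser the owner of its home directory.
sudo chown mpiuser /cluster
# Optional check (not in the manual): the owner column should now show mpiuser.
ls -ld /cluster
```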
Setting up Authorized Keys for passwordless communication between nodes:
After successfully adding the new user, we log in as that user with su - mpiuser.
Now we generate a new SSH key with the ssh-keygen command.
It will ask for a passphrase. Leave it blank, as we want to create passwordless SSH
(assuming that we have a trusted LAN with no security issues).
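The login and key generation steps can be sketched as follows, using the commands from the Appendix. These are meant to be typed interactively: su opens a new shell, and ssh-keygen is run inside it. The key type dsa is the one used in this manual on Ubuntu 14.04; newer OpenSSH releases disable DSA keys by default, in which case another type such as rsa would have to be substituted (and the file names id_dsa and id_dsa.pub would change accordingly).

```shell
# Switch to the MPI account; asks for mpiuser's password and opens a login shell.
su - mpiuser
# Inside that shell: generate a key pair.
# Press Enter to accept the default file and to leave the passphrase empty.
ssh-keygen -t dsa
```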
After executing the command, a folder called .ssh is created in the home directory. It is a hidden
folder. This folder contains a file id_dsa.pub that holds your public key. This public key is
used for sending encrypted messages. The distinguishing technique used in public key cryptography
is the use of asymmetric key algorithms, where the key used to encrypt a message is not the same
as the key used to decrypt it. Each user has a pair of cryptographic keys- a public encryption
key and a private decryption key. The publicly available encrypting-key is widely distributed,
while the private decrypting-key is known only to the recipient. Messages are encrypted with the
recipient's public key and can only be decrypted with the corresponding private key. The keys are
related mathematically, but the private key cannot feasibly be derived from the public key.
Now append this key to another file called authorized_keys in the same directory. To do this,
change into the .ssh directory and append id_dsa.pub to authorized_keys with cat.
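These are the corresponding commands from the Appendix, run as mpiuser. The chmod line is an addition that is not in the manual: the SSH server may ignore an authorized_keys file that other users can write to, so restricting its permissions is a common precaution.

```shell
# Run as mpiuser on each node.
cd ~/.ssh
# Append this node's public key to the list of keys allowed to log in.
cat id_dsa.pub >> authorized_keys
# Assumption, not in the manual: keep the file private to its owner.
chmod 600 authorized_keys
# Display the file to confirm that the key was added.
cat authorized_keys
```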
The authorized_keys file now contains the key for one node. On misubeimp it holds the
public key of that node's mpiuser.
The mpiuser on minhas-pc has its own authorized_keys file with its own key. Now we
have to make a common authorized_keys file so that both nodes contain the
same keys in their authorized_keys files. We can do this simply by copying and pasting the keys
with the nano editor.
After the files are merged, the authorized_keys file on each node contains the public keys of both nodes.
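As an alternative to copying and pasting in nano, the keys can usually be exchanged with the ssh-copy-id tool that ships with OpenSSH. This is a substitute technique and not the one used in this manual; the sketch assumes the host names and the id_dsa.pub key file described above.

```shell
# Run as mpiuser on misubeimp: append this node's public key to the
# authorized_keys file of mpiuser on minhas-pc (asks for the password once).
ssh-copy-id -i ~/.ssh/id_dsa.pub mpiuser@minhas-pc

# Repeat the same command on minhas-pc with mpiuser@misubeimp as the target.

# Check: this should print the remote host name without asking for a password.
ssh mpiuser@minhas-pc hostname
```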
With the keys set up, the environment is ready. Now log out from mpiuser
and restart the PC.
Setting up a machine-file:
Now we create a file called "machinefile" in the mpiuser home directory. Each line holds a node
name followed by a colon and the number of processes we want to execute on that node.
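A machine file in the format described above could be created like this. The process count of 4 per node is an assumption chosen so that the two nodes together provide the 8 processes used later in this manual; the original counts are not stated. The command is meant to be run from the mpiuser home directory.

```shell
# Write the machine file: one "host-name:process-count" entry per line.
# The count of 4 per node is an illustrative assumption.
cat > machinefile <<'EOF'
misubeimp:4
minhas-pc:4
EOF

# Display the result.
cat machinefile
```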
How to Write a Program using MPICH?
Now we write our very first program following the MPICH conventions. The demo program used in
the rest of this manual is saved in a file called I_am_alive.c.
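The original listing of the demo program is not reproduced in this copy of the manual, so the following is only a hypothetical stand-in with the same file name. It is written to disk from the shell so that it can be compiled in the next step. Each process reports its rank, the total number of processes, and the node it runs on, using standard MPI calls.

```shell
# Create a minimal MPI program (illustrative stand-in, not the author's original code).
cat > I_am_alive.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                   /* start the MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);     /* total number of processes */
    MPI_Get_processor_name(name, &name_len);  /* name of the node running this process */

    printf("I am alive: process %d of %d on %s\n", rank, size, name);

    MPI_Finalize();                           /* shut down the MPI environment */
    return 0;
}
EOF
```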
How to Compile and Execute the Program Using MPICH?
To compile the above program with MPICH, we run mpicc on I_am_alive.c and name the output
file I_am_alive.
After compiling, we can run the compiled file with MPICH on our local node without using
the machine file. To do this, we pass the number of processes and the program to mpiexec.
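The compile step and the local test run correspond to the following commands. They follow the Appendix with the argument order and the command name corrected; as printed there, the source file would have been given as the output file. The process count of 8 is the one used in the Appendix.

```shell
# Compile the source file into an executable called I_am_alive.
mpicc I_am_alive.c -o I_am_alive
# Run 8 processes on the local node only (no machine file).
mpiexec -n 8 ./I_am_alive
```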
Now, as our goal is to execute the compiled program on both nodes of our cluster, we have to
make sure that the compiled file and the machine file are present in the mpiuser
home directory on both nodes. After that we may execute the program with the machine file from any node.
We can transfer the executable file to all nodes in different ways:
1. Using a USB flash drive
2. Using the scp command
3. Using a shared master folder
1) Using USB Flash Drive: When we transfer the executable file between the nodes with a
USB flash drive, we have to place the file in the same location in the mpiuser account on
every node. We also have to make sure that the file is executable; if it is not, we can
make it executable with chmod +x.
2) Using scp command: We may transfer the executable file with the scp command from
one node to the same location on another node.
3) Using sharing master folder: Here we first make a folder on all nodes and store our
data and programs in it. Then we share the contents of this folder on the master node
with all the other nodes. We did not make a master folder in our cluster, so readers who
want to know how to set one up are requested to check points 2 and 3 of the
MpichCluster guide.
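The first two transfer options can be sketched as below. The scp target, mpiuser@minhas-pc:/cluster/, is an assumption based on the user name and home directory set up earlier; the scp line in the Appendix is incomplete, so it has been filled in for illustration only. Both commands are run from the directory that holds the files.

```shell
# Option 1 follow-up: after copying from a flash drive, make the file executable.
chmod +x I_am_alive

# Option 2: copy the executable and the machine file to the home directory
# of mpiuser on the other node (destination path is an assumption).
scp I_am_alive machinefile mpiuser@minhas-pc:/cluster/
```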
Now, to execute the program on multiple nodes, we run mpiexec and pass it
the machine file.
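The cluster run corresponds to the last command in the Appendix, with the command name spelled mpiexec and plain hyphens for the options. It assumes that the machine file and the executable are in the current directory and that the executable is present at the same location on every node.

```shell
# Start 8 processes, distributed over the nodes listed in the machine file.
mpiexec -n 8 -f machinefile ./I_am_alive
```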
The output of the processes on both nodes is shown in the terminal from which the program was launched.
Conclusion
In this manual we first discussed parallel computing: how it works,
in which situations we need to compute in parallel, and why parallel computing is necessary for
scientific research and industrial work. The main focus of the manual is how to set up a
cluster using the Ubuntu 14.04 operating system and MPICH, and how to write a first parallel
program and execute it on multiple PCs in parallel. The step-by-step procedures for doing so
are given in this manual. We hope it will be helpful for understanding cluster computing
and for setting up a first cluster.
Appendix
List of commands used in this manual:
1) misubeimp@misubeimp:~$ sudo apt-get update
2) misubeimp@misubeimp:~$ sudo apt-get install mpich2
3) misubeimp@misubeimp:~$ mpichversion
4) misubeimp@misubeimp:~$ which mpiexec
5) misubeimp@misubeimp:~$ which mpirun
6) misubeimp@misubeimp:~$ sudo apt-get install openssh-server
7) misubeimp@misubeimp:~$ gcc --version
8) misubeimp@misubeimp:~$ sudo apt-get install build-essential
9) misubeimp@misubeimp:~$ sudo gedit /etc/hosts
10) misubeimp@misubeimp:~$ sudo mkdir /cluster
11) misubeimp@misubeimp:~$ sudo adduser mpiuser --home /cluster
12) misubeimp@misubeimp:~$ sudo chown mpiuser /cluster
13) misubeimp@misubeimp:~$ su - mpiuser
14) mpiuser@misubeimp:~$ ssh-keygen -t dsa
15) mpiuser@misubeimp:~$ cd .ssh
16) mpiuser@misubeimp:~/.ssh$ cat id_dsa.pub >> authorized_keys
17) mpiuser@misubeimp:~/.ssh$ cat authorized_keys
18) mpiuser@misubeimp:~/.ssh$ nano authorized_keys
19) misubeimp@misubeimp:~$ mpicc I_am_alive.c -o I_am_alive
20) misubeimp@misubeimp:~$ mpiexec -n 8 ./I_am_alive
21) misubeimp@misubeimp:~$ scp I_am_alive mpiuser@minhas-pc:/cluster/
22) misubeimp@misubeimp:~$ sudo chmod +x I_am_alive
23) misubeimp@misubeimp:~$ mpiexec -n 8 -f machinefile ./I_am_alive