The ANDC Cluster

1. The ANDC Cluster Story.
   Animesh Kumar & Ankit Bhattacharjee
2. Chapter 1.
   In the beginning there was……
3. Then it exploded…..
   - The Idea: the cluster project started through a discussion between the Principal of ANDC, Dr Savithri Singh, and the Director of OpenLX, Mr Sudhir Gandotra, during a Linux workshop in 2007.
   - Dr Sanjay Chauhan's recruitment: Dr Savithri Singh inducted Dr Sanjay Chauhan from the Physics department into the cluster project.
   - Clueless students' involvement: Arjun, Animesh, Ankit and Sudhang.
4. Chapter 2
5. Initially the project was very challenging, the challenges being of two sorts:
   - Technical: especially the reclamation of the to-be-junked hardware.
   - Human: mostly relating to the team's lack of experience and know-how. This was especially costly, since significant man-hours were spent on suboptimal and downright incorrect 'solutions' that could have been avoided had the team been slightly more knowledgeable.
6. Chapter 3
   Not everything that can be counted counts, and not everything that counts can be counted.
7. Junkyard Reclamation….
   - The project officially started when the team was "presented" with 18-20 decrepit machines, of which barely 5 worked. The junk consisted of a gallery of PIs, PIIs and PIIIs at the end of their life, most of them not working, requiring some:
   - Upgradation: some of the machines that did work required significant upgrades to be worth deploying in the cluster.
   - Scavenging: over time a few could be repaired, while the rest were discarded after "scavenging" useful parts from them for future use in the salvageable machines.
   - Arjun's hardware knowledge served as a great foundation and learning experience.
8. Experiences don't come cheap…..
   - The first investment: since a fairly "impressive" cluster needed to be at least visibly fast to the lay observer, the machines had to be upgraded with more RAM. 25 x 256 MB SDRAM modules were purchased, and multiples of these were put into all the working machines.
9. Finally, the 6 computers that were in the best state were chosen:
   - 4 x PII with 512 MB RAM
   - 2 x PIII with 512 MB RAM
   These were connected via a 100 Mbps switch.
10. Chapter 4
   Wisdom Through Failure….
11. Our first mistake…..
   - ClusterKnoppix is chosen: based on thorough research on the topic by Dr Chauhan, we chose ClusterKnoppix.
   - ClusterKnoppix is a specialized Linux distribution based on the Knoppix distribution, but which uses the openMosix kernel.
   - openMosix, developed by the Israeli technologist, author, investor and entrepreneur Moshe Bar, was a fork of the once-open, then-proprietary MOSIX cluster system.
12. Why ClusterKnoppix?
   - We lacked the requisite knowledge to remaster a distribution or implement changes at the kernel level.
   - ClusterKnoppix aims to provide the same core features and software as Knoppix, but adds openMosix clustering capabilities.
   - It is specifically designed to be a good master node.
   - openMosix can build a cluster out of inexpensive hardware, giving you the effect of a traditional supercomputer. As long as the processors share the same architecture, almost any node configuration is possible.
13. No CD-ROM drive, hard disk or floppy needed for the clients; openMosix autodiscovery means new nodes automatically join the cluster (no configuration needed).
   - Cluster management tools: the openMosix userland and openMosixview (a rough usage sketch follows below).
   - Every node can run full-blown X (PC-room/demo setup), or console only, which leaves more memory available for user applications.
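   The slides only name the management tools; as a hypothetical illustration (command names recalled from the openmosix-tools package, with an invented PID and node number, so treat the exact syntax as an assumption), a session on the master node might look like:

      mosmon              # ncurses monitor showing the load on every node in the cluster
      mps                 # cluster-aware "ps": shows which node each process is running on
      migrate 1234 2      # ask openMosix to migrate (hypothetical) PID 1234 to node 2
      mosctl status       # query the local node's openMosix status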
14. What Could Have Been……
15. Problems up there…
   - Development of both ClusterKnoppix and openMosix had stopped, so not much support was available.
16. The openMosix terminal server uses PXE, DHCP and TFTP to boot Linux clients over the network (a sketch of such a configuration follows below):
   - So it wasn't compatible with the older cards in our fixed machines, which weren't PXE-enabled.
   - It wouldn't work on the WFC machines' LAN cards: with no support for post-2.4.x kernels, it couldn't be deployed in any of the other labs in the college, as the machines there had network cards that were incompatible with the GNU/Linux kernel versions with which openMosix worked.
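   For context, a PXE/DHCP/TFTP boot setup of this kind is typically driven by a DHCP stanza along the following lines. This is an illustrative sketch, not the actual terminal-server configuration: the address range is invented, 192.168.1.10 is reused from the next slide, and pxelinux.0 stands in for whatever boot image the terminal server actually serves.

      # Hypothetical ISC dhcpd.conf stanza for PXE-booting the compute nodes.
      subnet 192.168.1.0 netmask 255.255.255.0 {
          range 192.168.1.100 192.168.1.200;   # addresses handed out to booting clients
          next-server 192.168.1.10;            # TFTP server holding the boot image (the master node)
          filename "pxelinux.0";               # network boot loader fetched over TFTP
      }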
17. Problems down under……
   On the master node we executed the following commands:
      1) ifconfig eth0 192.168.1.10
      2) route add -net 0.0.0.0 gw 192.168.1.1
      3) tyd -f init
      4) tyd
   And on the drone node we executed:
      1) ifconfig eth0 192.168.1.20
      2) route add -net 0.0.0.0 gw 192.168.1.1
      3) tyd -f init
      4) tyd -m 192.168.1.10
   The error we got was: SIOCSIFFLAGS: no such device, i.e. the kernel had not registered an eth0 interface at all, pointing again at an unsupported network card (a quick diagnostic sketch follows below).
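   A minimal diagnostic sketch (not from the original slides) for confirming that the kernel really has no driver bound to the card; eth0 is simply the interface name assumed in the commands above:

      lspci | grep -i ethernet   # is the NIC visible on the PCI bus at all?
      dmesg | grep -i eth        # did any driver claim an ethX interface at boot?
      ifconfig -a                # list every interface the kernel knows about, configured or not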
18. Chapter 5
   Any port in a storm…
19. Other solutions tried….
   - The 'educational' BCCD from the University of Iowa:
     - The BCCD was created to facilitate instruction in parallel computing aspects and paradigms.
     - It is a bootable CD image that boots into a pre-configured distributed computing environment.
     - Its focus is on the educational aspects of High-Performance Computing (HPC) rather than the HPC core.
   - Problem: it asked for a password even from the live CD, due to the hardware incompatibility!
20. CHAOS:
   - A small (6 MB) Linux distribution designed for creating ad hoc computer clusters.
   - This tiny disc boots any i586-class PC (that supports CD booting) into a working openMosix node, without disturbing (or even touching) the contents of any local hard disk.
   Quantian OS:
   - A re-mastering of ClusterKnoppix for the computational sciences.
   - The environment is self-configuring and directly bootable.
21. Chapter 6.
   First taste of success….
22. Paralledigm Shift!!!
   - After a lot of frustrating trials, the ClusterKnoppix idea was dropped.
   - ParallelKnoppix (later upgraded to PelicanHPC) is chosen:
     - ParallelKnoppix is a live CD image that lets you set up a high-performance computing cluster in a few minutes.
     - Such a cluster allows you to do parallel computing using MPI (a minimal run-through follows below).
   - Advantages:
     - The frontend node (either a real computer or a virtual machine) boots from the CD image. The compute nodes boot by PXE, using the frontend node as the server.
     - The LAM-MPI and OpenMPI implementations of MPI are installed.
     - Contains extensive example programs.
     - Very easy to add packages.
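   As a minimal sketch of what "parallel computing using MPI" looks like on such a cluster: hello.c is assumed to be a standard MPI "hello world" source file and "machines" a hostfile listing the compute nodes; neither file name comes from the original slides.

      mpicc hello.c -o hello                     # compile against the installed MPI implementation
      mpirun -np 6 --hostfile machines ./hello   # launch 6 MPI processes across the listed nodes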
23. Didn't work immediately:
   - ParallelKnoppix needs LAN-booting support and our network cards didn't support it. We added the "no acpi" boot option and, accidentally, it worked.. ;)
   - Etherboot is used: gPXE/Etherboot is an open-source (GPL) network bootloader. It provides a direct replacement for proprietary PXE ROMs, with many extra features such as DNS, HTTP, iSCSI, etc.
   - This solution, thus, gave us our first cluster.
24. What the future holds……
   - A more permanent solution instead of the current temporary one, e.g. ROCKS, Hadoop, Disco.....
   - Implementing key parallel algorithms.
   - Developing a guide for future cluster administrators.. (who should be students.... :) )
   - Familiarizing other departments with the applications of the cluster for their research.
